

**OBJECTIVES - 10/4**

■ Course Introduction
■ Demographics Survey
■ AWS Cloud Credits Survey
■ Tutorial 1 - Intro to Linux
■ Cloud Computing - How did we get here? (10/4)
  • Chapter 4 Marinescu 2<sup>nd</sup> edition: Introduction to parallel and distributed systems

October 4, 2022






**REFERENCES**

[1] Cloud Computing: Concepts, Technology and Architecture\*, Thomas Erl, Prentice Hall, 2013
[2] Cloud Computing - Theory and Practice, Dan Marinescu, First Edition 2013\*, Second Edition 2018
[3] Cloud Computing: A Hands-On Approach, Arshdeep Bahga, 2013

available online via UW libra

Slides by Wes J. Lloyd L2.1



TCSS462/562 COURSE WORK

■ Project Proposal
■ Project Status Reports / Activities
  • ~2-4 total items (??)
  • Variety of formats: in class, online, reading, activity
■ Quizzes
  • Open book, note, etc.
■ Class Presentation (TCSS 562)
■ Class Presentation Summaries (TCSS 462)
■ Term Project / Paper / Presentation

**TERM PROJECT**

■ Project description to be posted
■ Teams of ~4, self formed, one project leader
■ Project scope can vary based on team size and background w/ instructor approval
■ Proposal due: Tuesday October 18, 11:59pm (tentative)
■ Approach:
  • Build a "cloud native" serverless application
  • App will consist of multiple FaaS functions (services)
  • Objective is to compare outcomes of design trade-offs
  • Performance (runtime)
  • Cost (\$)
  • How does application design impact cost and performance?

**TERM PROJECT - 2**

■ GOAL: Compare alternative implementations:
  • Different service compositions / services
  • Different external services (e.g. database, key-value store)
  • Application control flow - AWS Step Functions, laptop client, etc.
■ A & B Testing
  • As developers it is common to implement a system or algorithm multiple ways
  • But which implementation is more effective for a given set of goals, objectives, metrics?
■ WHAT are some metrics that would be interesting to compare?

TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2022]
School of Engineering and Technology, University of Washington - Tacoma
October 4, 2022

**TERM PROJECT - 3**

■ Deliverables
  • Short demo in class at end of quarter (< 5 min)
  • Project report paper (4-6 pgs IEEE format, template provided)
  • GitHub (project source)
  • How-To document (via GitHub markdown)
■ Standard project(s) will be suggested or propose your own:
  • (Example) Extract-Transform-Load (ETL) style serverless data processing pipeline combining AWS Lambda, S3, and Amazon Aurora Serverless DB

**COMPARING DESIGN TRADE-OFFS**

■ What design trade-offs can be compared?
■ Compare and contrast alternative designs using various cloud services, languages, platforms, etc.
■ Examples - Compare different:
  • Cloud storage services: object/blob storage services: Amazon S3, Google blobstore, Azure blobstore, vs. self-hosted
  • Cloud relational database services: Amazon Relational Database Service (RDS), Aurora, self-hosted DB
  • Platform-as-a-Service (PaaS) alternatives: Amazon Elastic Beanstalk, Heroku, others
  • Open source FaaS platforms: Apache OpenWhisk, OpenFaaS, Fn, others




TERM PROJECT: BIG PICTURE

1. BUILD A MULTI-FUNCTION SERVERLESS APPLICATION

• Typically consisting of AWS Lambda Functions or Google Cloud Functions, etc. (i.e. a FaaS platform)

2. CONTRAST THE USE OF ALTERNATIVE CLOUD SERVICES TO INSTRUMENT SOME OR MULTIPLE ASPECTS OF THE APPLICATION

3. CONDUCT A PERFORMANCE EVALUATION, REPORT ON YOUR FINDINGS IN A LIGHTNING TALK (5-minutes) AND TERM PAPER


 ■ Alternative I: conduct a cloud-related research project on any topic focused on specific research goals / questions

• Can be used to help spur MS Capstone/Thesis projects or honors thesis

• If you're interested in this option, please talk with the instructor

• First step is to identify 1 - 2 research questions

■ Alternative II: conduct a gap-analysis literature survey of cloud computing research papers, produce a report which identifies open problems for future research in cloud computing that have tractable next steps

• Suitable for 1-person teams and students interested in research

■ Instructor will help guide projects throughout the quarter


PROJECT SUPPORT

Project cloud infrastructure support:

Standard AWS Account (RECOMMENDED)

Create standard AWS account with UW email

Credit card required

Instructor provides students with \$50 credit vouchers from AWS

When voucher is used up, request another voucher from instructor

Credits provided throughout Fall quarter (within reason)

Instructor provided IAM AWS Account

No Credit Card required

Instructor creates and manages account security and permissions

More restricted


PROJECT SUPPORT - 2

Other support:






■ TCSS 562: CLASS PRESENTATION

■ TCSS 562 students will give a team presentation in teams of ~3

■ Technology sharing presentation

■ PPT Slides, demonstration

■ Provide technology overview of one cloud service offering

■ Present overview of features, performance, etc.

■ Cloud Research Paper Presentation

■ PPT slides, identify research contributions, strengths and weaknesses of paper, possible areas for future work


**CLASS PRESENTATION PEER REVIEWS**

■ Students will submit reviews of class presentations using rubric worksheet (~ 1-page)
■ Students will review a minimum of one presentation for each presentation day, for a minimum of 4 reviews
■ Optionally additional reviews can be submitted (Extra Credit)
■ In addition to the reviews, students will write two questions about content in the presentation. These can be questions to help clarify content from the presentation that was not clear, or any related questions inspired by the presentation.
■ To ensure intellectual depth of questions, questions should not have yes-no answers.
■ Peer reviews will be shared with presentation groups to provide feedback but will not factor into the grading of class presentations






Cloud Computing: How did we get here?
 Parallel and distributed systems
 (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)

 Data, thread-level, task-level parallelism
 Parallel architectures
 SIMD architectures, vector processing, multimedia extensions
 Graphics processing units
 Speed-up, Amdahl's Law, Scaled Speedup
 Properties of distributed systems
 Modularity






AMD'S 64-CORE 7NM CPUS

■ Epyc Rome CPUs
■ Announced August 2019
■ EPYC 7H12 requires liquid cooling

AMD EPYC 7002 Processors (2P)

| Model     | Cores / Threads | Base Frequency (GHz) | Max Frequency (GHz) | L3*    | TDP   | Price  |
|-----------|-----------------|----------------------|---------------------|--------|-------|--------|
| EPYC 7H12 | 64 / 128        | 2.60                 | 3.30                | 256 MB | 280 W | ?      |
| EPYC 7742 | 64 / 128        | 2.25                 | 3.40                | 256 MB | 225 W | \$6950 |
| EPYC 7702 | 64 / 128        | 2.00                 | 3.35                | 256 MB | 200 W | \$6450 |
| EPYC 7642 | 48 / 96         | 2.30                 | 3.20                | 256 MB | 225 W | \$4775 |
| EPYC 7552 | 48 / 96         | 2.20                 | 3.30                | 192 MB | 200 W | \$4025 |

**HOST SERVER VCPUS - AMAZON EC2**

■ INFRASTRUCTURE-AS-A-SERVICE CLOUD: cloud server virtual CPUs/host
■ Growth since 2006 - Amazon Elastic Compute Cloud (EC2)
  • 1<sup>st</sup> generation Intel: m1 - 8 vCPUs / host (Aug 2006)
  • 2<sup>nd</sup> generation Intel: m2 - 16 vCPUs / host (Oct 2009)
  • 3<sup>rd</sup> generation Intel: m3 - 32 vCPUs / host (Oct 2012)
  • 4<sup>th</sup> generation Intel: m4 - 48 vCPUs / host (June 2015)
  • 5<sup>th</sup> generation Intel: m5 - 96 vCPUs / host (Nov 2017)
  • 6<sup>th</sup> generation Intel: m6i - 128 vCPUs / host (Aug 2021)
  • 6<sup>th</sup> generation AMD: m6a - 192 vCPUs / host (Nov 2021)

**HYPER THREADING**

■ Modern CPUs provide multiple instruction pipelines, supporting multiple execution threads, usually 2, to feed instructions to a single CPU core...
■ Two hyper-threads are not equivalent to (2) CPU cores
■ Example: i7-4770 and i5-4670 are the same CPU, with and without HTT
  • 4770 with HTT vs. 4670 without HTT - 25% improvement w/ HTT
  • → hyperthreads add +32.9%

[Chart: CPU Mark Relative to Top 10 Common CPUs, as of 7th of February 2014. Higher results represent better performance.]

**CLOUD COMPUTING: HOW DID WE GET HERE? - 2**

■ To make computing faster, we must go "parallel"
■ Difficult to expose parallelism in scientific applications
■ Not every problem solution has a parallel algorithm
  • Chicken and egg problem...
■ Many commercial efforts promoting pure parallel programming efforts have failed
■ Enterprise computing world has been skeptical and less involved in parallel programming



**CLOUD COMPUTING: HOW DID WE GET HERE? - 4**

■ Big Data requires massive amounts of compute resources
■ MAP - REDUCE
  • Single instruction, multiple data (SIMD)
  • Exploit data level parallelism
■ Bioinformatics example

**SMITH WATERMAN USE CASE**

■ Applies dynamic programming to find best local alignment of two protein sequences
  • Embarrassingly parallel, each task can run in isolation
  • Use case for GPU acceleration
■ AWS Lambda Serverless Computing Use Case:
  • Goal: Pair-wise comparison of all unique human protein sequences (20,336)
  • Python client as scheduler
  • C Striped Smith-Waterman (SSW) execution engine

From: Zhao M, Lee WP, Garrison EP, Marth GT: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One 2013, 8:e82138

**SMITH WATERMAN RUNTIME**

■ Laptop server and client (2-core, 4-HT): 8.7 hours
■ AWS Lambda FaaS, laptop as client: 2.2 minutes
  • Partitions 20,336 sequences into 41 sets
  • Execution cost: ~ 82¢ (~237x speed-up)
■ AWS Lambda server, EC2 instance as client: 1.28 minutes
  • Execution cost: ~ 87¢ (~408x speed-up)
■ Hardware
  • Laptop client: Intel i5-7200U 2.5 GHz: 4 HT, 2 CPU
  • Cloud client: EC2 Virtual Machine - m5.24xlarge: 96 vCPUs
  • Cloud server: Lambda ~1000 Intel E5-2666v3 2.9GHz CPUs
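The speed-up figures quoted for the Smith-Waterman runs can be checked with a few lines of arithmetic. The sketch below assumes speed-up is simply the laptop runtime divided by the cloud runtime; the function and constant names are illustrative and not part of the original experiment.

```python
def speedup(t_baseline_s: float, t_parallel_s: float) -> float:
    """Speed-up as the ratio of baseline runtime to parallel runtime."""
    return t_baseline_s / t_parallel_s


# Runtimes from the slide, converted to seconds.
LAPTOP_S = 8.7 * 3600               # laptop as both server and client
LAMBDA_LAPTOP_CLIENT_S = 2.2 * 60   # AWS Lambda, laptop as client
LAMBDA_EC2_CLIENT_S = 1.28 * 60     # AWS Lambda, EC2 instance as client

s_laptop_client = speedup(LAPTOP_S, LAMBDA_LAPTOP_CLIENT_S)
s_ec2_client = speedup(LAPTOP_S, LAMBDA_EC2_CLIENT_S)

print(f"Lambda + laptop client: ~{round(s_laptop_client)}x")
print(f"Lambda + EC2 client:    ~{round(s_ec2_client)}x")
```

Both results round to the values on the slide (~237x and ~408x).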







PARALLELISM - 2

Coordination of nodes
Requires message passing or shared memory
Debugging parallel message passing code is easier than parallel shared memory code

Message passing: all of the interactions are clear
Coordination via specific programming API (MPI)

Shared memory: interactions can be implicit – must read the code!!

Processing speed is orders of magnitude faster than communication speed (CPU > memory bus speed)
Avoiding coordination achieves the best speed-up
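To illustrate why message passing makes interactions explicit, here is a minimal sketch in which workers communicate only through queues. It is an assumption-laden illustration: it uses Python threads and `queue.Queue` in a single process rather than MPI, and the names `worker` and `sum_of_squares` are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Square each number received; stop when the None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


def sum_of_squares(values, n_workers: int = 4) -> int:
    """Distribute values to workers via messages and combine their replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)            # every interaction is an explicit message
    for _ in threads:
        inbox.put(None)         # one stop sentinel per worker
    for t in threads:
        t.join()
    total = 0
    while not outbox.empty():   # safe: all producers have finished
        total += outbox.get()
    return total


print(sum_of_squares(range(10)))
```

All coordination is visible at the `put` and `get` calls, which is the property that makes message-passing code easier to trace than code that mutates shared variables implicitly.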




THREAD LEVEL PARALLELISM (TLP)

Number of threads an application runs at any one time
Varies throughout program execution
As a metric:
Minimum: 1 thread
Can measure average, maximum (peak)
QUESTION: What are the consequences of average (TLP) for scheduling an application to run on a computer with a fixed number of CPU cores and hyperthreads?
Let's say there are 4 cores, or 8 hyper-threads...
Key to avoiding waste of computing resources is knowing your application's TLP...
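One rough way to reason about the question above is to compare an application's average TLP with the number of hardware threads available. The sketch below is a simplified model under stated assumptions: the sample values are hypothetical, and real schedulers, hyper-thread contention, and I/O waits are ignored.

```python
def average_tlp(samples):
    """Average number of runnable threads over a series of observations."""
    return sum(samples) / len(samples)


def utilization(avg_tlp: float, hw_threads: int) -> float:
    """Fraction of hardware threads kept busy on average (capped at 1.0)."""
    return min(avg_tlp / hw_threads, 1.0)


# Hypothetical thread counts sampled while a program runs.
samples = [1, 2, 8, 8, 1]
avg = average_tlp(samples)

print(f"average TLP: {avg}, peak TLP: {max(samples)}")
print(f"utilization on 8 hyper-threads: {utilization(avg, 8):.0%}")
```

In this hypothetical case the peak TLP matches the 8 hyper-threads, but the average TLP of 4 means about half of the machine sits idle over the run.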




CONTROL-FLOW ARCHITECTURE

Typical architecture used today - w/ multiple threads
By John von Neumann (1945)
Also called the Von Neumann architecture
Dominant computer system architecture
Program counter (PC) determines next instruction to load into instruction register
Program execution is sequential




DATA FLOW ARCHITECTURE

■ Alternate architecture used by network routers, digital signal processors, special purpose systems
■ Operations performed when input (data) becomes available
■ Envisioned to provide much higher parallelism
■ Multiple problems have prevented wide-scale adoption
  • Efficiently broadcasting data tokens in a massively parallel system
  • Efficiently dispatching instruction tokens in a massively parallel system
  • Building content addressable memory large enough to hold all of the dependencies of a real program




BIT-LEVEL PARALLELISM

Computations on large words (e.g. 64-bit integer) are performed as a single instruction
Fewer instructions are required on 64-bit CPUs to process larger operands (A+B) providing dramatic performance improvements
Processors have evolved: 4-bit, 8-bit, 16-bit, 32-bit, 64-bit

QUESTION: How many instructions are required to add two 64-bit numbers on a 16-bit CPU? (Intel 8088)

64-bit MAX int = 9,223,372,036,854,775,807 (signed)

16-bit MAX int = 32,767 (signed)
Intel 8088 - limited to 16-bit registers
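The question above can be explored with a small simulation that adds two 64-bit values using only 16-bit additions and a carry, the way a CPU with 16-bit registers must. This is a sketch, not 8088 assembly: it counts only the add steps and ignores the loads and stores a real program would also need. The function name is illustrative.

```python
MASK16 = 0xFFFF  # largest value a 16-bit register can hold


def add64_on_16bit(a: int, b: int):
    """Add two unsigned 64-bit values in four 16-bit steps with carry.

    Returns (sum modulo 2**64, number of 16-bit add steps).
    """
    result, carry, steps = 0, 0, 0
    for limb in range(4):                      # 64 bits / 16 bits = 4 limbs
        shift = 16 * limb
        partial = ((a >> shift) & MASK16) + ((b >> shift) & MASK16) + carry
        result |= (partial & MASK16) << shift  # keep the low 16 bits
        carry = partial >> 16                  # propagate overflow to next limb
        steps += 1
    return result, steps


total, steps = add64_on_16bit(0x0000FFFFFFFFFFFF, 1)
print(hex(total), steps)
```

On a 64-bit CPU the same addition is a single instruction, which is the point of bit-level parallelism.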




CPU PIPELINING

[Figure: CPU pipeline diagram. Axes: clock cycle (0 to 9) and instructions, showing waiting instructions entering successive pipeline stages.]




## MICHAEL FLYNN'S COMPUTER ARCHITECTURE TAXONOMY

■ Michael Flynn's proposed taxonomy of computer architectures based on concurrent instructions and number of data streams (1966)
  • SISD (Single Instruction, Single Data)
  • SIMD (Single Instruction, Multiple Data)
  • MIMD (Multiple Instructions, Multiple Data)
■ LESS COMMON: MISD (Multiple Instructions, Single Data)
  • Pipeline architectures: functional units perform different operations on the same data
  • For fault tolerance, may want to execute same instructions redundantly to detect and mask errors – for task replication

FLYNN'S TAXONOMY

SISD (Single Instruction Single Data)
Scalar architecture with one processor/core.
Individual cores of modern multicore processors are "SISD"

SIMD (Single Instruction, Multiple Data)
Supports vector processing
When SIMD instructions are issued, operations on individual vector components are carried out concurrently
Two 64-element vectors can be added in parallel
Vector processing instructions added to modern CPUs
Example: Intel MMX (multimedia) instructions


(SIMD): VECTOR PROCESSING ADVANTAGES

Exploit data-parallelism: vector operations enable speedups

Vector architectures provide vector registers that can store entire matrices in a CPU register

SIMD CPU extensions (e.g. MMX) add support for vector operations on traditional CPUs

Vector operations reduce total number of instructions for large vector operations

Provides higher potential speedup vs. MIMD architecture

Developers can think sequentially; not worry about parallelism
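The claim that vector operations reduce the total number of instructions can be illustrated with a small counting model. This sketch only simulates the idea: Python does not issue real SIMD instructions here, and the `lanes` parameter (elements processed per simulated instruction) is an assumption made for illustration.

```python
def vector_add(a, b, lanes: int):
    """Element-wise a + b, handling `lanes` elements per simulated instruction.

    Returns (result list, number of simulated add instructions).
    """
    assert len(a) == len(b)
    out, instructions = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1       # one instruction covers the whole chunk
    return out, instructions


a = list(range(64))
b = [1] * 64

_, scalar_count = vector_add(a, b, lanes=1)    # scalar (SISD-style) adds
_, vector_count = vector_add(a, b, lanes=64)   # one 64-element vector add
print(scalar_count, vector_count)
```

With 64 lanes, adding two 64-element vectors takes one simulated instruction instead of 64, while the loop body a developer reasons about stays sequential.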


FLYNN'S TAXONOMY - 2

MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently

At any time, different processors/cores may execute different instructions on different data

Multi-core CPUs are MIMD

Processors share memory via interconnection networks

Hypercube, 2D torus, 3D torus, omega network, other topologies

MIMD systems have different methods of sharing memory

Uniform Memory Access (UMA)

Cache Only Memory Access (COMA)

Non-Uniform Memory Access (NUMA)




ROOFLINE MODEL

When program reaches a given arithmetic intensity performance of code running on CPU hits a "roof"

CPU performance bottleneck changes from: memory bandwidth (left) → floating point performance (right)

Key take-aways: When a program has low arithmetic intensity, memory bandwidth limits performance...

With high arithmetic intensity, the system has peak parallel performance...
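The roofline idea can be written as a one-line formula: attainable performance is the smaller of peak floating-point performance and memory bandwidth multiplied by arithmetic intensity. The sketch below uses hypothetical machine numbers (100 GFLOP/s peak, 25 GB/s bandwidth) chosen only for illustration.

```python
def attainable_gflops(arithmetic_intensity: float,
                      peak_gflops: float,
                      mem_bandwidth_gbs: float) -> float:
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, mem_bandwidth_gbs * arithmetic_intensity)


# Hypothetical machine, not taken from the slides.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = 25.0

# Arithmetic intensity (FLOPs per byte) where the two limits meet.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

for ai in (0.5, 1.0, 4.0, 8.0):
    bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    limit = "memory bandwidth" if ai < ridge_point else "floating point"
    print(f"AI={ai}: {bound} GFLOP/s (limited by {limit})")
```

Below the ridge point the program is memory bound; at or above it, performance hits the compute "roof".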




GRAPHICAL PROCESSING UNITS (GPUs)

■ GPU provides multiple SIMD processors
  • Typically 7 to 15 SIMD processors each
  • 32,768 total registers, divided into 16 lanes (2048 registers each)
■ GPU programming model: single instruction, multiple thread
■ Programmed using CUDA - C-like programming language by NVIDIA for GPUs
■ CUDA threads – single thread associated with each data element (e.g. vector or matrix)
■ Thousands of threads run concurrently



PARALLEL COMPUTING

■ Parallel hardware and software systems allow:
■ Solve problems demanding resources not available on single system.
■ Reduce time required to obtain solution

■ The speed-up (S) measures effectiveness of parallelization:

$S(N) = T(1) / T(N)$

■ $T(1)$ → execution time of the total sequential computation
■ $T(N)$ → execution time when the work is performed as N parallel computations




AMDAHL'S LAW

Amdahl's law is used to estimate the speed-up of a job using parallel computing

Divide job into two parts
Part A that will still be sequential
Part B that will be sped-up with parallel computing

Portion of computation which cannot be parallelized will determine (i.e. limit) the overall speedup

Amdahl's law assumes jobs are of a fixed size

Also, Amdahl's law assumes no overhead for distributing the work, and a perfectly even work distribution




AMDAHL'S LAW EXAMPLE

Program with two independent parts:
Part A is 75% of the execution time
Part B is 25% of the execution time
Part B is made 5 times faster with parallel computing
Estimate the percent improvement of task execution
Original Part A is 3 seconds, Part B is 1 second

N=5 (speedup of part B)

From Wikipedia
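The example above can be worked through numerically. The sketch below uses the standard form of Amdahl's law, $S = 1 / ((1 - p) + p / N)$, where $p$ is the fraction of time that is sped up; the symbol $p$ and the function name are introduced here for illustration and do not appear on the slide.

```python
def amdahl_speedup(parallel_fraction: float, n: float) -> float:
    """Overall speed-up when `parallel_fraction` of the runtime is made n times faster."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


# Values from the example: Part A = 3 s (unchanged), Part B = 1 s (5x faster).
part_a_s, part_b_s, n = 3.0, 1.0, 5
original_s = part_a_s + part_b_s
improved_s = part_a_s + part_b_s / n

direct = original_s / improved_s                  # ratio of total runtimes
formula = amdahl_speedup(part_b_s / original_s, n)

print(f"new runtime: {improved_s:.1f} s")
print(f"speed-up: {direct:.2f}x (formula: {formula:.2f}x)")
print(f"upper bound as N grows: {1.0 / (part_a_s / original_s):.2f}x")
```

Runtime drops from 4 s to 3.2 s, a speed-up of 1.25x. Even with unlimited parallelism for Part B, the sequential 75% caps the speed-up at about 1.33x.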


GUSTAFSON'S LAW

■ Calculates the scaled speed-up using "N" processors

$S(N) = N + (1 - N) \alpha$

■ N: Number of processors
■ $\alpha$: fraction of program run time which can't be parallelized (e.g. must run sequentially)
■ Can be used to estimate runtime of parallel portion of program
■ Where $\alpha = \sigma / (\pi + \sigma)$
■ Where $\sigma$ = sequential time, $\pi$ = parallel time
■ Our Amdahl's example: $\sigma$ = 3s, $\pi$ = 1s, $\alpha$ = .75




GUSTAFSON'S EXAMPLE

QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?

$S(N) = N + (1 - N) \alpha$
$N = 2, \alpha = .75$
$S(N) = 2 + (1 - 2) .75$
$S(N) = ?$

QUESTION: What is the maximum theoretical speed-up on a 16-core CPU?

$S(N) = N + (1 - N) \alpha$
$N = 16, \alpha = .75$
$S(N) = 16 + (1 - 16) .75$
$S(N) = ?$
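The two questions above can be evaluated directly from Gustafson's formula $S(N) = N + (1 - N) \alpha$. The following is a minimal sketch; the function name is illustrative.

```python
def gustafson_speedup(n: int, alpha: float) -> float:
    """Scaled speed-up S(N) = N + (1 - N) * alpha.

    n: number of processors
    alpha: fraction of run time that must run sequentially
    """
    return n + (1 - n) * alpha


ALPHA = 0.75  # from the Amdahl example: sigma = 3 s, pi = 1 s

for cores in (2, 16):
    print(f"N={cores}: S(N) = {gustafson_speedup(cores, ALPHA)}")
```

With $\alpha = .75$ this gives $S(2) = 1.25$ and $S(16) = 4.75$, showing how a large sequential fraction limits the benefit of adding cores.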


**DISTRIBUTED SYSTEMS**

■ Collection of autonomous computers, connected through a network with distribution software called "middleware" that enables coordination of activities and sharing of resources
■ Key characteristics:
  • Users perceive system as a single, integrated computing facility.
  • Compute nodes are autonomous
  • Scheduling, resource management, and security implemented by every node
  • Multiple points of control and failure
  • Nodes may not be accessible at all times
  • System can be scaled by adding additional nodes
  • Availability at low levels of HW/software/network reliability




TRANSPARENCY PROPERTIES OF DISTRIBUTED SYSTEMS

■ Access transparency: local and remote objects accessed using identical operations
■ Location transparency: objects accessed w/o knowledge of their location.
■ Concurrency transparency: several processes run concurrently using shared objects w/o interference among them
■ Replication transparency: multiple instances of objects are used to increase reliability
  • users are unaware if and how the system is replicated
■ Failure transparency: concealment of faults
■ Migration transparency: objects are moved w/o affecting operations performed on them
■ Performance transparency: system can be reconfigured based on load and quality of service requirements
■ Scaling transparency: system and applications can scale w/o change in system structure and w/o affecting applications



TYPES OF MODULARITY

■ Soft modularity: TRADITIONAL
  • Divide a program into modules (classes) that call each other and communicate with shared-memory
  • A procedure calling convention is used (or method invocation)
■ Enforced modularity: CLOUD COMPUTING
  • Program is divided into modules that communicate only through message passing
  • The ubiquitous client-server paradigm
  • Clients and servers are independent decoupled modules
  • System is more robust if servers are stateless
  • May be scaled and deployed separately
  • May also FAIL separately!

**CLOUD COMPUTING - HOW DID WE GET HERE? SUMMARY OF KEY POINTS**

■ Multi-core CPU technology and hyper-threading
■ What is a Heterogeneous system?
  • Homogeneous system?
  • Autonomous or self-organizing system?
■ Fine grained vs. coarse grained parallelism
■ Parallel message passing code is easier to debug than shared memory (e.g. p-threads)
■ Know your application's max/avg Thread Level Parallelism (TLP)
■ Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, Vector processing & GPUs

**CLOUD COMPUTING - HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 2**

■ Bit-level parallelism
■ Instruction-level parallelism (CPU pipelining)
■ Flynn's taxonomy: computer system architecture classification
  • SISD - Single Instruction, Single Data (modern core of a CPU)
  • SIMD - Single Instruction, Multiple Data (Data parallelism)
  • MIMD - Multiple Instruction, Multiple Data
  • MISD is RARE; application for fault tolerance...
■ Arithmetic intensity: ratio of calculations vs memory RW
■ Roofline model: Memory bottleneck with low arithmetic intensity
■ GPUs: ideal for programs with high arithmetic intensity
  • SIMD and Vector processing supported by many large registers

[Fall 2022]

TCSS 462: Cloud Computing TCSS 562: Software Engineering for Cloud Computing School of Engineering and Technology, UW-Tacoma




