





TCSS 562 - Online Daily Feedback Survey - 10/5
Quiz Instructions
On a scale of 1 to 10, please classify your perspective on material covered in today's class.

TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2024]
School of Engineering and Technology, University of Washington - Tacoma



MATERIAL / PACE

Please classify your perspective on material covered in today's class (53 respondents):
1-mostly review, 5-equal new/review, 10-mostly new
Average - 6.16 (↓ - first day f2023 - 6.79)

Please rate the pace of today's class:
1-slow, 5-just right, 10-fast
Average - 5.55 (↓ - first day f2023 - 5.66)

Response rates:
TCSS 462: 37/41 - 90.2% (enrollment increase from Thurs 41 → 44)
TCSS 562: 16/20 - 80.0% (enrollment increase from Thurs 20 → 21)

October 1, 2024


Slides by Wes J. Lloyd L2.1



FEEDBACK - 2

I'm pretty unfamiliar with the term "serverless" computing, is that synonymous with cloud computing, or is it a distinct paradigm?

Serverless computing is an attribute of cloud services
Serverless cloud services do not require the user to provision infrastructure (i.e. virtual machines or servers)
The paradigm of serverless cloud services did not become predominant until ~2016-2018
Many services, such as Amazon RDS (Relational Database Service), are 'serverful'. Using these services requires the user to provision an always-on server that sits idle and bills the customer for idle time
Popular services such as Amazon DocumentDB (MongoDB-compatible) and ElastiCache (Redis) can have fixed deployments where the user must specify a 'VM' size (# of cores, RAM)
Serverful services may be limited to vertical scaling

FEEDBACK - 3

Will we be going over SaaS (Software-as-a-Service) in this ...
What are examples of Software-as-a-Service?

Software-as-a-Service is software applications hosted as-a-service in the cloud
Can you think of some you use every day?
MS Outlook
Office 365
Google Docs
UW Workday
GitHub
A key feature of SaaS is customized configurations and deployments to support large-scale users, i.e. University of Washington
SaaS is cloud-provider hosted web applications where the user pays annual licensing fees for upkeep, etc.

FEEDBACK - 4

If the cloud provider (e.g. Amazon) puts you on a slower cloud server, do you just pay more for less performance given it takes more overall time?

YES - this is the 'double whammy' of cloud computing

'Double Whammy' was made famous by a game show called 'Press Your Luck'

A 'double whammy' is a twofold blow or setback

INFLATION: when the price of a good increases, so does the sales tax

With cloud computing, when the cloud service bills based on time, then slow performance due to cloud provider hardware (type and state), results in a higher customer bill
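The time-based billing effect described above can be sketched with a little arithmetic. This is a minimal illustration only: the hourly price, the runtime, and the 25% slowdown are hypothetical numbers chosen for the example, not figures from any cloud provider.

```python
def billed_cost(runtime_seconds, price_per_hour):
    """Cost of a time-billed resource: the bill grows linearly with runtime."""
    return price_per_hour * runtime_seconds / 3600.0


# Hypothetical values, for illustration only.
PRICE_PER_HOUR = 0.10    # assumed hourly price of one VM (USD)
FAST_RUNTIME_S = 3600.0  # assumed job runtime on a well-performing server
SLOWDOWN = 1.25          # assumed: the slower server takes 25% longer

fast_cost = billed_cost(FAST_RUNTIME_S, PRICE_PER_HOUR)
slow_cost = billed_cost(FAST_RUNTIME_S * SLOWDOWN, PRICE_PER_HOUR)

# The customer waits longer AND pays more: the "double whammy".
print(f"fast server: ${fast_cost:.3f}, slow server: ${slow_cost:.3f}")
```

Under these assumptions the same job costs 25% more on the slower server, even though the price per hour is identical.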

As customers, how can we avoid slow(er) servers?


Daily Feedback Surveys
Questions from Course Introduction

Demographics Survey

AWS Cloud Credits Survey

Tutorial 0 - Getting Started with AWS

Tutorial 1 - Intro to Linux

Cloud Computing - How did we get here?
Chapter 4 Marinescu 2<sup>nd</sup> edition:
Introduction to parallel and distributed systems


Please complete the ONLINE demographics survey:

We have received 19 of 65 responses so far.
Waiting on 46 responses.
Class office hours are set based on the demographics survey

https://forms.gle/6ER7PzfP521vdxYW9

Random drawing based on survey participants for two \$20 Amazon or Starbucks gift cards - October 8th in class

Linked from course webpage in Canvas:
http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html




AWS CLOUD CREDITS SURVEY

Please complete the AWS Cloud Credits survey

Please complete as part of Tutorial 0

https://forms.gle/fmKkLZbxZECbAay16

Linked from course webpage in Canvas:

http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html




OBJECTIVES - 10/1

Questions from Course Introduction
Daily Feedback Surveys
Demographics Survey
AWS Cloud Credits Survey
Tutorial 0 - Getting Started with AWS
Tutorial 1 - Intro to Linux
Cloud Computing - How did we get here?
Chapter 4 Marinescu 2<sup>nd</sup> edition:
Introduction to parallel and distributed systems







CLOUD COMPUTING:
HOW DID WE GET HERE?

General interest in parallel computing
Moore's Law - # of transistors doubles every 18 months
Post 2004: heat dissipation challenges:
can no longer easily increase clock speed
Overclocking to 7GHz takes
more than just liquid nitrogen:
https://tinyurl.com/y93s2yz2

Solutions:
Vary CPU clock speed
Add CPU cores
Multi-core technology




HYPER THREADING

Modern CPUs provide multiple instruction pipelines, supporting multiple execution threads, usually 2, to feed instructions to a single CPU core...
Two hyper-threads are not equivalent to (2) CPU cores
i7-4770 and i5-4670 same CPU, with and without HTT
Example: 4770 with HTT vs. 4670 without HTT - 25% improvement w/ HTT
→ hyperthreads add +32.9%
[Figure: CPU Mark Relative to Top 10 Common CPUs, as of 7th of February 2014 - higher results represent better performance]


AMD'S 64-CORE 7NM CPUS

Epyc Rome CPUs announced August 2019
EPYC 7H12 requires liquid cooling

Model: cores / threads, base / boost clock (GHz), L3 cache, TDP, price
EPYC 7702: 64 / 128, 2.00 / 3.35, 256 MB, 200 W
EPYC 7642: 48 / 96, 2.30 / 3.20, 256 MB, 225 W
EPYC 7552: 48 / 96, 2.20 / 3.30, 192 MB, 200 W, \$4025

AMD EPYC 9654/9654P/9684X/9R14 (AWS OEM):
June 2023: 96-core, 192 hyper-thread CPUs
Mixes 4nm: APU (combines CPUs+GPU), 5nm: L3 cache (8 CPU-chiplet), and 6nm: I/O dies
2.25 to 3.7 burst GHz, up to 400 watts
\$10,625 to \$14,756

AMD EPYC 9754: 128 cores, 256 hyperthreads!
2.25 to 3.1 burst GHz, 360 watts
\$11,900

AMD EPYC 9005: 192 cores, 384 threads, 3nm (in dev)




ARM64 HOST SERVER VCPUS - AMAZON EC2
INFRASTRUCTURE-AS-A-SERVICE CLOUD

Cloud server virtual CPUs/host (ARM64)
Launched in 2018 on the Amazon Elastic Compute Cloud (EC2)
64-bit ARM CPUs designed by AWS subsidiary Annapurna Labs
Lower energy consumption compared to x86-64
Fixed (non-variable) clock rates, no hyperthreading
Each new release - performance boost of ~30%
Cost savings of ~20% for ARM resources on AWS
1st generation Graviton: a1 - 16 vCPUs/host (Nov 2018)
2nd generation Graviton2: m6g - 64 vCPUs/host (Dec 2019)
AWS Lambda limited to Graviton2
3rd generation Graviton3: m7g - 64 vCPUs/host (May 2022)
4th generation Graviton4: m8g - 192 vCPUs/host (Sept 2024)


CLOUD COMPUTING:
HOW DID WE GET HERE? - 2

To make computing faster, we must go "parallel"
Difficult to expose parallelism in scientific applications
Not every problem solution has a parallel algorithm
Chicken and egg problem...
Many commercial efforts promoting pure parallel programming efforts have failed
Enterprise computing world has been skeptical and less involved in parallel programming

CLOUD COMPUTING:
HOW DID WE GET HERE? - 3

Cloud computing provides access to "infinite" scalable compute infrastructure on demand
Infrastructure availability is key to exploiting parallelism

Cloud applications
Based on client-server paradigm
Thin clients leverage compute hosted on the cloud
Applications run many web service instances
Employ load balancing




SMITH WATERMAN USE CASE

Applies dynamic programming to find best local alignment of two protein sequences
Embarrassingly parallel, each task can run in isolation
Use case for GPU acceleration

AWS Lambda Serverless Computing Use Case:
Goal: Pair-wise comparison of all unique human protein sequences (20,336)

Python client as scheduler
C Striped Smith-Waterman (SSW) execution engine
From: Zhao M, Lee WP, Garrison EP, Marth GT. SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One 2013, 8:e82138
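To make the dynamic programming idea concrete, here is a minimal sketch of Smith-Waterman local alignment scoring in plain Python. It is not the SSW library: the scoring values (match +2, mismatch -1, linear gap -1) and the helper names `smith_waterman_score` and `all_pairs_scores` are assumptions for illustration, whereas real protein alignment typically uses a substitution matrix and affine gap penalties.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (assumed simple scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def all_pairs_scores(sequences):
    """Score every unique unordered pair; each pair is an independent task."""
    return {
        (i, j): smith_waterman_score(sequences[i], sequences[j])
        for i, j in combinations(range(len(sequences)), 2)
    }


print(all_pairs_scores(["ACGT", "ACGT", "TTTT"]))
```

Because no pair depends on the result of any other pair, the loop in `all_pairs_scores` could be split across many workers without coordination, which is what makes the workload embarrassingly parallel.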




CLOUD COMPUTING:
HOW DID WE GET HERE? - 3

Compute clouds are large-scale distributed systems
Heterogeneous systems
Homogeneous systems
Autonomous
Self organizing




PARALLELISM

Discovering parallelism and development of parallel algorithms requires considerable effort
Example: numerical analysis problems, such as solving large systems of linear equations or solving systems of Partial Differential Equations (PDEs), require algorithms based on domain decomposition methods.
How can problems be split into independent chunks?
Fine-grained parallelism
Only small bits of code can run in parallel without coordination
Communication is required to synchronize state across nodes
Coarse-grained parallelism
Large blocks of code can run without coordination


Coordination of nodes
 Requires message passing or shared memory
 Debugging parallel message passing code is easier than parallel shared memory code

 Message passing: all of the interactions are clear
 Coordination via specific programming API (MPI)

 Shared memory: interactions can be implicit – must read the code!!

Processing speed is orders of magnitude faster than communication speed (CPU > memory bus speed)

Avoiding coordination achieves the best speed-up
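As a rough illustration of why message passing interactions are explicit, the sketch below has workers that share no variables and communicate only through two queues. It uses Python threads and `queue.Queue` as a stand-in; it is not MPI, and the names `worker` and `square_all` are hypothetical.

```python
import queue
import threading


def worker(tasks, results):
    """Receive work as messages, send answers back as messages."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel message: no more work
            break
        results.put((item, item * item))


def square_all(values, n_workers=2):
    """Square each (unique) value using workers that only exchange messages."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for v in values:
        tasks.put(v)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Every interaction went through tasks/results; nothing else is shared.
    return dict(results.get() for _ in values)


print(square_all([1, 2, 3, 4]))
```

Every exchange between the coordinator and a worker is a visible `put` or `get`, so the communication can be traced by reading those calls alone.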


TYPES OF PARALLELISM

Parallelism:
Goal: Perform multiple operations at the same time to achieve a speed-up

Thread-level parallelism (TLP)
Control flow architecture (Von Neumann architecture)
Data-level parallelism
Data flow architecture
Bit-level parallelism
Instruction-level parallelism (ILP)









Partition data into big chunks, run separate copies of the program on them with little or no communication

Problems are considered to be embarrassingly parallel

Also perfectly parallel or pleasingly parallel...

Little or no effort needed to separate problem into a number of parallel tasks

MapReduce programming model is an example
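A minimal sketch of the MapReduce pattern, assuming a word-count task: each chunk is mapped independently, and the partial results are then reduced into one total. The function names are hypothetical, and the thread pool only illustrates the structure; in CPython, threads do not give CPU-bound speed-up, so a real deployment would use separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: count words in one chunk, with no knowledge of other chunks."""
    return Counter(chunk.split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def word_count(chunks):
    # Each map_chunk call is independent, so the calls can run concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


totals = word_count(["a b a", "b c", "a"])
print(dict(totals))
```

The only coordination happens in the reduce step, after all of the independent map work is done.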


DATA FLOW ARCHITECTURE

Alternate architecture used by network routers, digital signal processors, special purpose systems

Operations performed when input (data) becomes available

Envisioned to provide much higher parallelism

Multiple problems have prevented wide-scale adoption:

Efficiently broadcasting data tokens in a massively parallel system

Efficiently dispatching instruction tokens in a massively parallel system

Building content addressable memory large enough to hold all of the dependencies of a real program


DATA FLOW ARCHITECTURE - 2

Architecture not as popular as control-flow

Modern CPUs have emulated data flow architecture for dynamic instruction scheduling since the 1990s

Out-of-order execution - reduces CPU idle time by defining execution windows, so the CPU does not block on instructions that are waiting for data

Execution windows: identify instructions that can be run, based on data dependencies

Instructions are completed in data dependency order within the execution window

Execution window size is typically 32 to 200 instructions

Utility of data flow architectures has been much less than envisioned




BIT-LEVEL PARALLELISM

Computations on large words (e.g. 64-bit integer) are performed as a single instruction
Fewer instructions are required on 64-bit CPUs to process larger operands (A+B), providing dramatic performance improvements
Processors have evolved: 4-bit, 8-bit, 16-bit, 32-bit, 64-bit

QUESTION: How many instructions are required to add two 64-bit numbers on a 16-bit CPU? (Intel 8088)
64-bit MAX int = 9,223,372,036,854,775,807 (signed)
16-bit MAX int = 32,767 (signed)
Intel 8088 - limited to 16-bit registers
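One way to reason about the question is to emulate the addition in 16-bit pieces. The sketch below is an illustration under simplifying assumptions: it counts only the add steps (one add, then three add-with-carry steps) and ignores the loads and stores a real 16-bit CPU would also need. The helper names are hypothetical.

```python
MASK16 = 0xFFFF  # a 16-bit register can hold values 0..65535


def to_limbs(value, limbs=4):
    """Split a 64-bit value into four 16-bit pieces, least significant first."""
    return [(value >> (16 * i)) & MASK16 for i in range(limbs)]


def add64_on_16bit(a, b):
    """Add two 64-bit values 16 bits at a time.

    Returns (sum modulo 2**64, number of 16-bit add steps).
    """
    carry = 0
    result = 0
    steps = 0
    for i, (x, y) in enumerate(zip(to_limbs(a), to_limbs(b))):
        total = x + y + carry          # one 16-bit add (with carry-in)
        result |= (total & MASK16) << (16 * i)
        carry = total >> 16            # carry-out feeds the next piece
        steps += 1
    return result, steps


print(add64_on_16bit(0xFFFF, 1))  # carry ripples into the second piece
```

Under these assumptions the 16-bit machine needs four dependent add steps where a 64-bit CPU needs one instruction, and each step must wait for the carry from the previous one.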




CPU PIPELINING


INSTRUCTION LEVEL PARALLELISM - 2

RISC CPU:
After 5 clock cycles, all 5 stages of an instruction are ...
Starting with 6th clock cycle, one full instruction completes each cycle
The CPU performs 5 tasks per clock cycle!
Fetch, decode, execute, memory read, memory write back
Pentium 4 (CISC CPU) - processing pipeline w/ 35 stages!
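The benefit of pipelining can be estimated with a simple cycle count. This is an idealized sketch: it assumes every stage takes exactly one clock cycle and that there are no stalls, hazards, or branch mispredictions, which real CPUs do not guarantee. The function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, stages=5):
    """Each instruction occupies the whole CPU for all of its stages."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages=5):
    """Fill the pipeline once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


n = 100
print(unpipelined_cycles(n), pipelined_cycles(n))
```

For long instruction streams the ratio of the two counts approaches the number of stages, which is the idealized upper bound on the speed-up from pipelining alone.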

OBJECTIVES

Cloud Computing: How did we get here?
Parallel and distributed systems (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
Data, thread-level, task-level parallelism
Parallel architectures
SIMD architectures, vector processing, multimedia extensions
Graphics processing units
Speed-up, Amdahl's Law, Scaled Speedup
Properties of distributed systems
Modularity


MICHAEL FLYNN'S COMPUTER ARCHITECTURE TAXONOMY

Michael Flynn's proposed taxonomy of computer architectures based on concurrent instructions and number of data streams (1966)
SISD (Single Instruction, Single Data)
SIMD (Single Instruction, Multiple Data)
MIMD (Multiple Instructions, Multiple Data)
LESS COMMON: MISD (Multiple Instructions, Single Data)
Pipeline architectures: functional units perform different operations on the same data
For fault tolerance, may want to execute same instructions redundantly to detect and mask errors – for task replication

FLYNN'S TAXONOMY

SISD (Single Instruction Single Data)
Scalar architecture with one processor/core.
Individual cores of modern multicore processors are "SISD"

SIMD (Single Instruction, Multiple Data)
Supports vector processing
When SIMD instructions are issued, operations on individual vector components are carried out concurrently
Two 64-element vectors can be added in parallel
Vector processing instructions added to modern CPUs
Example: Intel MMX (multimedia) instructions




FLYNN'S TAXONOMY - 2

MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently
At any time, different processors/cores may execute different instructions on different data
Multi-core CPUs are MIMD
Processors share memory via interconnection networks
Hypercube, 2D torus, 3D torus, omega network, other topologies
MIMD systems have different methods of sharing memory
Uniform Memory Access (UMA)
Cache Only Memory Access (COMA)
Non-Uniform Memory Access (NUMA)


ARITHMETIC INTENSITY

Arithmetic intensity: Ratio of work (W) to memory traffic r/w (Q)
Example: # of floating-point ops per byte of data read
Characterizes application scalability with SIMD support
SIMD can perform many fast matrix operations in parallel
High arithmetic intensity:
Programs with dense matrix operations scale up nicely (many calcs vs memory RW, supports lots of parallelism)
Low arithmetic intensity:
Programs with sparse matrix operations do not scale well with problem size (memory RW becomes bottleneck, not enough ops!)

ROOFLINE MODEL

When a program reaches a given arithmetic intensity, performance of code running on the CPU hits a "roof"

CPU performance bottleneck changes from:

memory bandwidth (left) → floating point performance (right)

Key take-aways:
When a program has low arithmetic intensity, memory bandwidth limits performance...

With high arithmetic intensity, the system has peak parallel performance...

> performance is limited by??
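The roofline idea is commonly written as: attainable performance = min(peak compute rate, arithmetic intensity × memory bandwidth). The sketch below applies that formula to a hypothetical machine; the peak of 100 GFLOP/s and bandwidth of 25 GB/s are assumed values for illustration, not measurements of any real CPU.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Work (W) divided by memory traffic (Q), in FLOPs per byte."""
    return flops / bytes_moved


def attainable_gflops(ai, peak_gflops, bandwidth_gb_s):
    """Roofline: limited by memory at low AI, by compute at high AI."""
    return min(peak_gflops, ai * bandwidth_gb_s)


# Hypothetical machine, for illustration only.
PEAK_GFLOPS = 100.0     # assumed peak floating point rate
BANDWIDTH_GB_S = 25.0   # assumed memory bandwidth

for ai in (0.5, 1.0, 4.0, 8.0):
    perf = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GB_S)
    limit = "memory bandwidth" if perf < PEAK_GFLOPS else "floating point peak"
    print(f"AI={ai}: {perf} GFLOP/s (limited by {limit})")
```

With these assumed numbers the "roof" is reached at an arithmetic intensity of 4 FLOPs per byte; below that point more bandwidth helps, and above it only more compute helps.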




CPU
Low compute density
Complex control logic
Large caches (L1\$/L2\$, etc.)
Optimized for serial operations
Higher clock speed
Shallow pipelines (<30 stages)
Low latency tolerance
Newer CPUs have more parallelism

GPU
High compute density
High computations per memory access
Built for parallel operations
...ics is the best kno...
Deep pipelines (hundreds of stages)
High throughput
High latency tolerance
Newer GPUs: Don't have one-way pipelines anymore





PARALLEL COMPUTING

Parallel hardware and software systems allow:
Solve problems demanding resources not available on single system.
Reduce time required to obtain solution

The speed-up (S) measures effectiveness of parallelization:
$S(N) = T(1) / T(N)$
$T(1) \rightarrow$ execution time of total sequential computation
$T(N) \rightarrow$ execution time for performing N parallel computations in parallel

SPEED-UP EXAMPLE

Consider embarrassingly parallel image processing
Eight images (multiple data)
Apply image transformation (greyscale) in parallel
8-core CPU, 16 hyper threads
Sequential processing: perform transformations one at a time using a single program thread
8 images, 3 seconds each: T(1) = 24 seconds
Parallel processing
8 images, 3 seconds each: T(N) = 3 seconds
Speedup: S(N) = 24 / 3 = 8x speedup
Called "perfect scaling"
Must consider data transfer and computation setup time
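The image example can be checked with a few lines of Python. The `overhead` parameter is an assumption added for illustration, standing in for data transfer and setup time; the 1-second value used below is hypothetical.

```python
import math


def speedup(t1, tn):
    """S(N) = T(1) / T(N)."""
    return t1 / tn


def parallel_time(images, secs_per_image, cores, overhead=0.0):
    """Time when images are spread evenly over cores, plus fixed overhead."""
    rounds = math.ceil(images / cores)
    return rounds * secs_per_image + overhead


t1 = 8 * 3.0                       # sequential: 24 seconds
tn_ideal = parallel_time(8, 3.0, 8)            # one image per core
tn_real = parallel_time(8, 3.0, 8, overhead=1.0)  # assumed 1 s of setup

print(speedup(t1, tn_ideal))  # perfect scaling
print(speedup(t1, tn_real))   # overhead reduces the speed-up
```

Even a small fixed overhead pulls the result below perfect scaling, which is why transfer and setup time must be included in T(N).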




$S = \frac{1}{(1-f) + \frac{f}{N}}$

S = theoretical speedup of the whole task
f = fraction of work that is parallel (ex. 25% or 0.25)
N = proposed speed up of the parallel part (ex. 5 times speedup)
% improvement of task execution = 100 * (1 - (1/S))

Using Amdahl's law, what is the maximum possible speed-up?
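A short sketch of Amdahl's law using the example values f = 0.25 and N = 5. The function names are hypothetical. The limit function follows from letting N grow without bound, so the f/N term vanishes and S approaches 1 / (1 - f); it assumes f < 1.

```python
def amdahl_speedup(f, n):
    """S = 1 / ((1 - f) + f / N)."""
    return 1.0 / ((1.0 - f) + f / n)


def percent_improvement(s):
    """% improvement of task execution = 100 * (1 - 1/S)."""
    return 100.0 * (1.0 - 1.0 / s)


def amdahl_limit(f):
    """Upper bound on S as N grows without bound (requires f < 1)."""
    return 1.0 / (1.0 - f)


s = amdahl_speedup(0.25, 5)
print(s, percent_improvement(s), amdahl_limit(0.25))
```

With only 25% of the work parallel, a 5x speed-up of that part gives about 1.25x overall, and no value of N can push the overall speed-up past roughly 1.33x.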





GUSTAFSON'S LAW

Calculates the scaled speed-up using "N" processors
$S(N) = N + (1 - N)\alpha$
N: Number of processors
$\alpha$: fraction of program run time which can't be parallelized (e.g. must run sequentially)

Can be used to estimate runtime of parallel portion of program

Where $\alpha = \sigma / (\pi + \sigma)$
Where $\sigma$ = sequential time, $\pi$ = parallel time

Our Amdahl's example: $\sigma$ = 3s, $\pi$ = 1s, $\alpha$ = .75


Example:
Consider a program that is embarrassingly parallel, but 75% cannot be parallelized. α=.75

QUESTION: If deploying the job on a 2-core CPU, what scaled speedup is possible assuming the use of two processes that run in parallel?




GUSTAFSON'S EXAMPLE

QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?
$S(N) = N + (1 - N)\alpha$
N=2, α=.75
S(N) = 2 + (1 - 2)(.75)
S(N) = ?
For 2 CPUs, speed up is 1.25x

QUESTION: What is the maximum theoretical speed-up on a 16-core CPU?
$S(N) = N + (1 - N)\alpha$
N=16, α=.75
S(N) = 16 + (1 - 16)(.75)
S(N) = ?
For 16 CPUs, speed up is 4.75x
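The two results can be reproduced with a small sketch of Gustafson's law. The function names are hypothetical; `alpha_from_times` uses the definition α = σ / (π + σ) with the example values σ = 3s and π = 1s.

```python
def alpha_from_times(sigma, pi):
    """Sequential fraction: alpha = sigma / (pi + sigma)."""
    return sigma / (pi + sigma)


def gustafson_speedup(n, alpha):
    """Scaled speed-up: S(N) = N + (1 - N) * alpha."""
    return n + (1 - n) * alpha


alpha = alpha_from_times(3.0, 1.0)  # 0.75
for n in (2, 16):
    print(n, gustafson_speedup(n, alpha))
```

Unlike the fixed-size bound from Amdahl's law, the scaled speed-up keeps growing with N, because the parallel part of the workload is assumed to grow along with the number of processors.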


■ The number of transistors on a chip doubles approximately every 1.5 years

■ CPUs now have billions of transistors

■ Power dissipation issues at faster clock rates lead to heat removal challenges

■ Transition from: increasing clock rates → to adding CPU cores

■ Symmetric core processor - multi-core CPU, all cores have the same computational resources and speed

■ Asymmetric core processor - on a multi-core CPU, some cores have more resources and speed

■ Dynamic core processor - processing resources and speed can be dynamically configured among cores

■ Observation: asymmetric processors offer a higher speedup.


Key non-functional attributes
Known as "ilities" in software engineering

Availability - 24/7 access?
Reliability - fault tolerance
Accessibility - reachable?
Usability - user friendly
Understandability - can under...
Scalability - responds to variable demand
Extensibility - can be easily modified, extended
Maintainability - can be easily fixed
Consistency - data is replicated correctly in timely manner




CLOUD COMPUTING - HOW DID WE GET HERE?
SUMMARY OF KEY POINTS

Multi-core CPU technology and hyper-threading
What is a heterogeneous system? Homogeneous system?
Autonomous or self-organizing system?
Fine grained vs. coarse grained parallelism
Parallel message passing code is easier to debug than shared memory (e.g. p-threads)
Know your application's max/avg Thread Level Parallelism (TLP)
Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, Vector processing & GPUs


CLOUD COMPUTING - HOW DID WE GET HERE?
SUMMARY OF KEY POINTS - 2

Bit-level parallelism
Instruction-level parallelism (CPU pipelining)
Flynn's taxonomy: computer system architecture classification
SISD - Single Instruction, Single Data (modern core of a CPU)
SIMD - Single Instruction, Multiple Data (Data parallelism)
MIMD - Multiple Instruction, Multiple Data
MISD is RARE; application for fault tolerance...
Arithmetic intensity: ratio of calculations vs memory RW
Roofline model: Memory bottleneck with low arithmetic intensity
GPUs: ideal for programs with high arithmetic intensity
SIMD and Vector processing supported by many large registers

CLOUD COMPUTING – HOW DID WE GET HERE?

SUMMARY OF KEY POINTS - 3

Speed-up (S)
S(N) = T(1) / T(N)
Amdahl's law:
S = 1 / ((1-f) + f/N), S = theoretical speedup of the whole task, f = parallel fraction, N = speed-up of the parallel part
Scaled speedup with N processes:
S(N) = N + (1 - N)α = N - α(N-1)
α = fraction of program that must be sequential
Moore's Law
Symmetric core, Asymmetric core, Dynamic core CPU
Distributed Systems - Non-functional quality attributes
Distributed Systems - Types of Transparency
Types of modularity - Soft, Enforced


TCSS 462: Cloud Computing
TCSS 562: Software Engineering for Cloud Computing
School of Engineering and Technology, UW-Tacoma


