

**OBJECTIVES - 10/3**
■ Daily Feedback Surveys
■ Questions from Course Introduction
■ Demographics Survey
■ AWS Cloud Credits Survey
■ Tutorial 0 - Getting Started with AWS
■ Tutorial 1 - Intro to Linux
■ Cloud Computing - How did we get here? (10/4)
■ Chapter 4 Marinescu 2<sup>nd</sup> edition: Introduction to parallel and distributed systems
October 3, 2023



TCSS 562 - Online Daily Feedback Survey - 10/5
**Quiz Instructions**
On a scale of 1 to 10, please classify your perspective on material covered in today's


MATERIAL / PACE
Please classify your perspective on material covered in today's class (58 respondents):
■ 1-mostly review, 5-equal new/review, 10-mostly new
■ Average - 6.79 (↓ - previous 7.43 f2022)
Please rate the pace of today's class:
■ 1-slow, 5-just right, 10-fast
■ Average - 5.66 (↓ - previous 5.83 f2022)
Response rates:
■ TCSS 462: 40/45 - 88.9%
■ TCSS 562: 18/24 - 75.0%

TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2023]
School of Engineering and Technology, University of Washington - Tacoma
October 3, 2023

FEEDBACK FROM 9/28
I was not clear on whether the term project group needs to consist of at most 4 people or exactly 4 people
• Ideally groups will be 4 people
• Even with 4 people per team there will still be 18 groups (large number)
• Smaller groups (< 4 people) have fewer resources
• Larger groups may be more difficult to coordinate
Are graduate students required to form groups of their own, or can groups include undergraduate and graduate students?
• Groups can consist of both undergrad and graduate students
• The grading criteria are the same
• The class presentation is separate from the term project

Slides by Wes J. Lloyd L2.1



TCSS562 - SOFTWARE ENGINEERING
FOR CLOUD COMPUTING

Course webpage is embedded into Canvas
In CANVAS to access links:
RIGHT-CLICK - Open in new window

Daily Feedback Surveys online at:
http://faculty.washington.edu/wlloyd/courses/tcss562/

Grading
Schedule
Assignments


DEMOGRAPHICS SURVEY

Please complete the ONLINE demographics survey:

We have received 54 of 69 responses so far.

We are waiting on 15 responses.

https://forms.gle/QLiWGnHqbXDeNdYq7

Linked from course webpage in Canvas:

http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html




CLOUD COMPUTING:
HOW DID WE GET HERE?

General interest in parallel computing
Moore's Law - # of transistors doubles every 18 months
Post 2004: heat dissipation challenges:
can no longer easily increase clock speed
Overclocking to 7GHz takes
more than just liquid nitrogen:
https://tinyurl.com/y93s2yz2
Solutions:
Vary CPU clock speed
Add CPU cores
Multi-core technology


**HOST SERVER VCPUS - AMAZON EC2**
INFRASTRUCTURE-AS-A-SERVICE CLOUD
■ Cloud server virtual CPUs/host
■ Growth since 2006 - Amazon Compute Cloud (EC2)
■ 1<sup>st</sup> generation Intel: m1 - 8 vCPUs / host (Aug 2006)
■ 2<sup>nd</sup> generation Intel: m2 - 16 vCPUs / host (Oct 2009)
■ 3<sup>rd</sup> generation Intel: m3 - 32 vCPUs / host (Oct 2012)
■ 4<sup>th</sup> generation Intel: m4 - 48 vCPUs / host (June 2015)
■ 5<sup>th</sup> generation Intel: m5 - 96 vCPUs / host (Nov 2017)
■ 6<sup>th</sup> generation Intel: m6i - 128 vCPUs / host (Aug 2021)
■ 6<sup>th</sup> generation AMD: m6a - 192 vCPUs / host (Nov 2021)

HYPER THREADING
■ Modern CPUs provide multiple instruction pipelines, supporting multiple execution threads, usually 2, to feed instructions to a single CPU core...
■ Two hyper-threads are not equivalent to (2) CPU cores
■ i7-4770 and i5-4670 same CPU, with and without HTT
■ Example: → hyperthreads add +32.9%
[Figure: 4770 with HTT vs. 4670 without HTT - 25% improvement w/ HTT. CPU Mark Relative to Top 10 Common CPUs, as of 7th of February 2014 - higher results represent better performance]


CLOUD COMPUTING:
HOW DID WE GET HERE? - 2

To make computing faster, we must go "parallel"
Difficult to expose parallelism in scientific applications
Not every problem solution has a parallel algorithm
Chicken and egg problem...
Many commercial efforts promoting pure parallel programming efforts have failed
Enterprise computing world has been skeptical and less involved in parallel programming

CLOUD COMPUTING:
HOW DID WE GET HERE? - 3

Cloud computing provides access to "infinite" scalable compute infrastructure on demand
Infrastructure availability is key to exploiting parallelism

Cloud applications
Based on client-server paradigm
Thin clients leverage compute hosted on the cloud
Applications run many web service instances
Employ load balancing






SMITH WATERMAN RUNTIME

Laptop server and client (2-core, 4-HT): 8.7 hours

AWS Lambda FaaS, laptop as client: 2.2 minutes
Partitions 20,336 sequences into 41 sets
Execution cost: ~82¢ (~237x speed-up)

AWS Lambda server, EC2 instance as client: 1.28 minutes
Execution cost: ~87¢ (~408x speed-up)

Hardware
Laptop client: Intel i5-7200U 2.5 GHz :4 HT, 2 CPU
Cloud client: EC2 Virtual Machine - m5.24xlarge: 96 vCPUs
Cloud server: Lambda ~1000 Intel E5-2666v3 2.9GHz CPUs


CLOUD COMPUTING:
HOW DID WE GET HERE? - 4

Compute clouds are large-scale distributed systems
Heterogeneous systems
Homogeneous systems
Autonomous
Self organizing


Cloud Computing: How did we get here?

Parallel and distributed systems
(Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)

Data, thread-level, task-level parallelism
Parallel architectures

SIMD architectures, vector processing, multimedia extensions
Graphics processing units
Speed-up, Amdahl's Law, Scaled Speedup
Properties of distributed systems
Modularity


PARALLELISM

Discovering parallelism and development of parallel algorithms requires considerable effort

Example: numerical analysis problems, such as solving large systems of linear equations or solving systems of Partial Differential Equations (PDEs), require algorithms based on domain decomposition methods.

How can problems be split into independent chunks?

Fine-grained parallelism

Only small bits of code can run in parallel without coordination

Communication is required to synchronize state across nodes

Coarse-grained parallelism

Large blocks of code can run without coordination
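As a minimal sketch of coarse-grained parallelism, the Python example below splits a list into independent chunks and sums each chunk in a separate worker, with no coordination until the final combine step. The function names (`split_into_chunks`, `parallel_sum`) and the use of a thread pool are illustrative assumptions, not part of the lecture. In CPython, threads generally do not speed up compute-bound work because of the global interpreter lock, so a process pool would be the usual substitute for real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split data into num_chunks nearly equal, independent slices."""
    size, extra = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, num_workers=4):
    """Coarse-grained: each worker sums a whole chunk without coordination."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # The only synchronization point: combine the partial results.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1_000_000))))
```

A fine-grained version of the same job would have workers update a shared total element by element, which would require a lock (communication) on every update.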




TYPES OF PARALLELISM

Parallelism:
Goal: Perform multiple operations at the same time to achieve a speed-up

Thread-level parallelism (TLP)
Control flow architecture
Data-level parallelism
Data flow architecture
Bit-level parallelism
Instruction-level parallelism (ILP)




TLP - PRIMES EXAMPLE

Multi-threaded prime number generation
Compute-bound workload
Can use variable # of threads
Generates n prime numbers
Runtimes: 100,000 primes
  1 thread: 59.15 s
  2 threads: 30.957 s
  4 threads: 15.539 s
  8 threads: 12.112 s
Observe TLP with top

time ./primes8 30000 >/dev/null
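The runtimes listed on this slide can be turned into speed-up figures with a short Python sketch. The `speedup` function is simply T(1) / T(N); the `efficiency` helper (speed-up divided by thread count) is an added convenience that the slide does not define, and both function names are hypothetical.

```python
# Runtimes for generating 100,000 primes, copied from the slide (seconds).
runtimes = {1: 59.15, 2: 30.957, 4: 15.539, 8: 12.112}


def speedup(t1, tn):
    """S(N) = T(1) / T(N)."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Speed-up per thread; 1.0 would be perfect linear scaling."""
    return speedup(t1, tn) / n


if __name__ == "__main__":
    t1 = runtimes[1]
    for n, tn in runtimes.items():
        print(f"{n} threads: speed-up {speedup(t1, tn):.2f}, "
              f"efficiency {efficiency(t1, tn, n):.2f}")
```

With these numbers, scaling is close to linear up to 4 threads and falls off at 8. The slide does not say which machine produced the timings, so the cause of the drop is left open here.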


DATA-LEVEL PARALLELISM

Partition data into big chunks, run separate copies of the program on them with little or no communication
Problems are considered to be embarrassingly parallel
Also perfectly parallel or pleasingly parallel...
Little or no effort needed to separate problem into a number of parallel tasks
MapReduce programming model is an example
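The MapReduce idea can be sketched in a few lines of Python using a word count, a common illustration of the model. This is a single-process sketch under stated assumptions: the function names (`map_phase`, `reduce_phase`, `word_count`) are hypothetical, and no real MapReduce framework is involved. The point is that each `map_phase` call touches only its own chunk, so the calls could run on separate nodes with no communication.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Map: count words in one chunk, independently of all other chunks."""
    return Counter(chunk.split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(chunks):
    # Each map_phase call is independent and could run on its own node.
    partial_counts = map(map_phase, chunks)
    return reduce(reduce_phase, partial_counts, Counter())


if __name__ == "__main__":
    chunks = ["the cloud scales", "the cloud is elastic", "scales on demand"]
    print(word_count(chunks))
```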

DATA FLOW ARCHITECTURE

Alternate architecture used by network routers, digital signal processors, special purpose systems

Operations performed when input (data) becomes available

Envisioned to provide much higher parallelism

Multiple problems have prevented wide-scale adoption

Efficiently broadcasting data tokens in a massively parallel system

Efficiently dispatching instruction tokens in a massively parallel system

Building content addressable memory large enough to hold all of the dependencies of a real program




BIT-LEVEL PARALLELISM

Computations on large words (e.g. 64-bit integer) are performed as a single instruction

Fewer instructions are required on 64-bit CPUs to process larger operands (A+B) providing dramatic performance improvements

Processors have evolved: 4-bit, 8-bit, 16-bit, 32-bit, 64-bit

QUESTION: How many instructions are required to add two 64-bit numbers on a 16-bit CPU? (Intel 8088)

64-bit MAX int = 9,223,372,036,854,775,807 (signed)

16-bit MAX int = 32,767 (signed)

Intel 8088 - limited to 16-bit registers
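One way to reason about the question is to simulate the addition with 16-bit pieces. The Python sketch below splits each 64-bit operand into four 16-bit words and adds them from least to most significant, propagating the carry. It counts only the arithmetic additions; a real 8088 program would also need instructions to load and store each word, so the total instruction count would be higher. The helper names are hypothetical.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1  # 0xFFFF


def to_words(value, num_words=4):
    """Split an unsigned 64-bit value into 16-bit words, least significant first."""
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(num_words)]


def add64_with_16bit_words(a, b):
    """Add two unsigned 64-bit values using only 16-bit additions.

    Returns (sum modulo 2**64, number of 16-bit add operations performed).
    """
    carry = 0
    result = 0
    adds = 0
    for i, (wa, wb) in enumerate(zip(to_words(a), to_words(b))):
        total = wa + wb + carry          # one add, or add-with-carry after the first word
        adds += 1
        carry = total >> WORD_BITS       # carry out feeds the next word
        result |= (total & WORD_MASK) << (WORD_BITS * i)
    return result, adds


if __name__ == "__main__":
    total, adds = add64_with_16bit_words(0x0000FFFFFFFFFFFF, 1)
    print(hex(total), adds)
```

On a 64-bit CPU the same addition is a single add instruction, which is the bit-level parallelism gain the slide describes.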




CPU PIPELINING

[Figure: CPU pipeline diagram showing instructions and waiting instructions advancing through the pipeline stages over clock cycles 0-9]


MICHAEL FLYNN'S COMPUTER ARCHITECTURE TAXONOMY

Michael Flynn's proposed taxonomy of computer architectures based on concurrent instructions and number of data streams (1966)
SISD (Single Instruction, Single Data)
SIMD (Single Instruction, Multiple Data)
MIMD (Multiple Instructions, Multiple Data)
LESS COMMON: MISD (Multiple Instructions, Single Data)
  Pipeline architectures: functional units perform different operations on the same data
  For fault tolerance, may want to execute same instructions redundantly to detect and mask errors - for task replication

FLYNN'S TAXONOMY

SISD (Single Instruction Single Data)
Scalar architecture with one processor/core.
Individual cores of modern multicore processors are "SISD"

SIMD (Single Instruction, Multiple Data)
Supports vector processing
When SIMD instructions are issued, operations on individual vector components are carried out concurrently
Two 64-element vectors can be added in parallel
Vector processing instructions added to modern CPUs
Example: Intel MMX (multimedia) instructions
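The instruction-count benefit of SIMD can be illustrated with a simple counting model in Python. This is only a sketch: Python executes both versions sequentially, and the lane width of 8 is an assumed value chosen for illustration, not a property of MMX or of any CPU named in the lecture. The model counts how many add operations would be issued for a scalar loop versus a vector unit that processes `lanes` elements per instruction.

```python
def scalar_add(a, b):
    """One add per element: len(a) add operations."""
    return [x + y for x, y in zip(a, b)], len(a)


def simd_add(a, b, lanes=8):
    """Add `lanes` elements per vector operation (lanes=8 is an assumed width)."""
    result, vector_ops = [], 0
    for start in range(0, len(a), lanes):
        # Conceptually a single instruction operating on all lanes at once.
        result.extend(x + y for x, y in zip(a[start:start + lanes],
                                            b[start:start + lanes]))
        vector_ops += 1
    return result, vector_ops


if __name__ == "__main__":
    a = list(range(64))
    b = list(range(64))
    _, scalar_ops = scalar_add(a, b)
    _, vector_ops = simd_add(a, b)
    print(f"scalar adds: {scalar_ops}, vector adds: {vector_ops}")
```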


(SIMD): VECTOR PROCESSING ADVANTAGES

Exploit data-parallelism: vector operations enable speedups
Vector architectures provide vector registers that can store entire matrices in a CPU register
SIMD CPU extensions (e.g. MMX) add support for vector operations on traditional CPUs
Vector operations reduce total number of instructions for large vector operations
Provides higher potential speedup vs. MIMD architecture
Developers can think sequentially; not worry about parallelism

FLYNN'S TAXONOMY - 2

MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently
At any time, different processors/cores may execute different instructions on different data
Multi-core CPUs are MIMD
Processors share memory via interconnection networks
  Hypercube, 2D torus, 3D torus, omega network, other topologies
MIMD systems have different methods of sharing memory
  Uniform Memory Access (UMA)
  Cache Only Memory Access (COMA)
  Non-Uniform Memory Access (NUMA)


GRAPHICAL PROCESSING UNITS (GPUs)

GPU provides multiple SIMD processors
Typically 7 to 15 SIMD processors each
32,768 total registers, divided into 16 lanes (2048 registers each)
GPU programming model: single instruction, multiple thread
Programmed using CUDA - C-like programming language by NVIDIA for GPUs
CUDA threads - single thread associated with each data element (e.g. vector or matrix)
Thousands of threads run concurrently


PARALLEL COMPUTING

■ Parallel hardware and software systems allow:

■ Solve problems demanding resources not available on single system.

■ Reduce time required to obtain solution

■ The speed-up (S) measures effectiveness of parallelization:

S(N) = T(1) / T(N)

T(1) → execution time of total sequential computation
T(N) → execution time for performing N parallel computations in parallel




AMDAHL'S LAW

Amdahl's law is used to estimate the speed-up of a job using parallel computing

Divide job into two parts
Part A that will still be sequential
Part B that will be sped-up with parallel computing

Portion of computation which cannot be parallelized will determine (i.e. limit) the overall speedup

Amdahl's law assumes jobs are of a fixed size

Also, Amdahl's law assumes no overhead for distributing the work, and a perfectly even work distribution




AMDAHL'S LAW EXAMPLE

Program with two independent parts:
Part A is 75% of the execution time
Part B is 25% of the execution time
Part B is made 5 times faster with parallel computing
Estimate the percent improvement of task execution
Original Part A is 3 seconds, Part B is 1 second

N=5 (speedup of part B)

f=.25 (only 25% of the whole job (A+B) will be sped-up)

S = 1 / ((1 - f) + f/N)
S = 1 / ((.75) + .25/5)
S = 1.25

% improvement = 100 * (1 - 1/1.25) = 20%
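The example's arithmetic can be checked with a small Python sketch. Here `f` is the fraction of the job that is sped up and `n` is the speed-up factor applied to that fraction (N=5 for Part B); the function names are hypothetical. The last line cross-checks the result against the raw times given in the example (Part A 3 seconds, Part B 1 second).

```python
def amdahl_speedup(f, n):
    """Overall speed-up when a fraction f of the job is made n times faster."""
    return 1.0 / ((1.0 - f) + f / n)


def percent_improvement(speedup):
    """Percent reduction in execution time for a given speed-up."""
    return 100.0 * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    s = amdahl_speedup(f=0.25, n=5)
    print(f"speed-up: {s:.2f}")
    print(f"improvement: {percent_improvement(s):.0f}%")
    # Cross-check with the raw times: 3 s + 1 s before, 3 s + 1/5 s after.
    print(f"time check: {(3 + 1) / (3 + 1 / 5):.2f}")
```

Letting `n` grow very large shows the limit the sequential part imposes: with f = 0.25 the speed-up can never exceed 1 / 0.75, about 1.33.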



GUSTAFSON'S LAW

Calculates the scaled speed-up using "N" processors
$S(N) = N + (1 - N) \alpha$
N: Number of processors
$\alpha$: fraction of program run time which can't be parallelized (e.g. must run sequentially)

Can be used to estimate runtime of parallel portion of program

Where $\alpha = \sigma / (\pi + \sigma)$
Where $\sigma = \text{sequential time}, \pi = \text{parallel time}$
Our Amdahl's example: $\sigma = 3s, \pi = 1s, \alpha = .75$




GUSTAFSON'S EXAMPLE

QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?
$S(N) = N + (1 - N) \alpha$
$N = 2, \alpha = .75$
S(N) = 2 + (1 - 2) .75
S(N) = ?

What is the maximum theoretical speed-up on a 16-core CPU?
$S(N) = N + (1 - N) \alpha$
$N = 16, \alpha = .75$
S(N) = 16 + (1 - 16) .75
S(N) = ?
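Both questions can be evaluated with a short Python sketch of the scaled speed-up formula. The function names are hypothetical; `sequential_fraction` follows the lecture's definition of α as sequential time over total time, using σ = 3 s and π = 1 s from the Amdahl's example.

```python
def gustafson_speedup(n, alpha):
    """Scaled speed-up S(N) = N + (1 - N) * alpha."""
    return n + (1 - n) * alpha


def sequential_fraction(sigma, pi):
    """alpha = sigma / (pi + sigma), with sigma sequential and pi parallel time."""
    return sigma / (pi + sigma)


if __name__ == "__main__":
    alpha = sequential_fraction(sigma=3.0, pi=1.0)  # 0.75, as in the Amdahl's example
    for n in (2, 16):
        print(f"N = {n}: S(N) = {gustafson_speedup(n, alpha)}")
```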




MOORE'S LAW

■ The number of transistors on a chip doubles approximately every 1.5 years
■ CPUs now have billions of transistors
■ Power dissipation issues at faster clock rates lead to heat removal challenges
■ Transition from: increasing clock rates ⇒ to adding CPU cores
■ Symmetric core processor: multi-core CPU, all cores have the same computational resources and speed
■ Asymmetric core processor: on a multi-core CPU, some cores have more resources and speed
■ Dynamic core processor: processing resources and speed can be dynamically configured among cores
■ Observation: asymmetric processors offer a higher speedup


**DISTRIBUTED SYSTEMS**
Collection of autonomous computers, connected through a network with distribution software called "middleware" that enables coordination of activities and sharing of resources
Key characteristics:
■ Users perceive system as a single, integrated computing facility
■ Compute nodes are autonomous
■ Scheduling, resource management, and security implemented by every node
■ Multiple points of control and failure
■ Nodes may not be accessible at all times
■ System can be scaled by adding additional nodes
■ Availability at low levels of HW/software/network reliability




TRANSPARENCY PROPERTIES OF DISTRIBUTED SYSTEMS

Access transparency: local and remote objects accessed using identical operations
Location transparency: objects accessed w/o knowledge of their location
Concurrency transparency: several processes run concurrently using shared objects w/o interference among them
Replication transparency: multiple instances of objects are used to increase reliability
  Users are unaware if and how the system is replicated
Failure transparency: concealment of faults
Migration transparency: objects are moved w/o affecting operations performed on them
Performance transparency: system can be reconfigured based on load and quality of service requirements
Scaling transparency: system and applications can scale w/o change in system structure and w/o affecting applications




TYPES OF MODULARITY
■ Soft modularity: TRADITIONAL
  Divide a program into modules (classes) that call each other and communicate with shared-memory
  A procedure calling convention is used (or method invocation)
■ Enforced modularity: CLOUD COMPUTING
  Program is divided into modules that communicate only through message passing
■ The ubiquitous client-server paradigm
  Clients and servers are independent decoupled modules
  System is more robust if servers are stateless
  May be scaled and deployed separately
■ May also FAIL separately!
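Enforced modularity can be sketched in Python as two modules that exchange only serialized messages. In this minimal example the server is a stateless function that receives everything it needs in the request, and the client knows only the message format. The JSON message layout, the `"add"` operation, and the function names are assumptions made for illustration; the `send` parameter stands in for a real network call.

```python
import json


def handle_request(message: str) -> str:
    """Stateless server: everything needed arrives in the message itself."""
    request = json.loads(message)
    if request.get("op") == "add":
        reply = {"status": "ok", "result": request["a"] + request["b"]}
    else:
        reply = {"status": "error", "reason": "unknown op"}
    return json.dumps(reply)


def client_add(a, b, send=handle_request):
    """Client: knows only the message format, not the server's internals.

    `send` stands in for a network call (e.g. an HTTP request).
    """
    reply = json.loads(send(json.dumps({"op": "add", "a": a, "b": b})))
    if reply["status"] != "ok":
        raise RuntimeError(reply["reason"])
    return reply["result"]


if __name__ == "__main__":
    print(client_add(2, 3))
```

Because the server keeps no state between requests, any number of identical server instances could sit behind a load balancer, and a failed instance can be replaced without the client noticing beyond a retry.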


**CLOUD COMPUTING - HOW DID WE GET HERE? SUMMARY OF KEY POINTS**
■ Multi-core CPU technology and hyper-threading
■ What is a heterogeneous system? Homogeneous system? Autonomous or self-organizing system?
■ Fine grained vs. coarse grained parallelism
■ Parallel message passing code is easier to debug than shared memory (e.g. p-threads)
■ Know your application's max/avg Thread Level Parallelism (TLP)
■ Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, Vector processing & GPUs

**CLOUD COMPUTING - HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 2**
■ Bit-level parallelism
■ Instruction-level parallelism (CPU pipelining)
■ Flynn's taxonomy: computer system architecture classification
  • SISD - Single Instruction, Single Data (modern core of a CPU)
  • SIMD - Single Instruction, Multiple Data (Data parallelism)
  • MIMD - Multiple Instruction, Multiple Data
  • MISD is RARE; application for fault tolerance...
■ Arithmetic intensity: ratio of calculations vs memory RW
■ Roofline model: Memory bottleneck with low arithmetic intensity
  • GPUs: ideal for programs with high arithmetic intensity
■ SIMD and Vector processing supported by many large registers


[Fall 2023]

TCSS 462: Cloud Computing TCSS 562: Software Engineering for Cloud Computing School of Engineering and Technology, UW-Tacoma




