

Syllabus

Course Introduction

Demographics Survey

AWS Cloud Credits Survey

Tutorial 0 - Getting Started with AWS

Tutorial 1 - Intro to Linux

Cloud Computing - How did we get here? (10/4) Chapter 4 Marinescu 2<sup>nd</sup> edition:
Introduction to parallel and distributed systems

September 26, 2024




IMPROVING PERFORMANCE
IN COLLEGE CLASSES

DVD/Book (1990s): "Where there's a will there's an A"
Three simple things the instructor remembers for improving grades in college classes:

1. Attend every class
2. Sit in the front row (or as close to the front as possible)
3. Read the book (or assigned reading) – all of it

If not satisfied with recent grades, are you doing these things?

TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2024]
School of Engineering and Technology, University of Washington - Tacoma


Fall 2024 TCSS 462/562:
 Video live-streams of lectures
 Video recordings of lectures
 Options to complete and submit most assignments remotely

Please note: UWT does not provide professional video production services. Quality of resources provided will be best-effort.

DISCLAIMER: use of online learning mechanisms can make it easier to disengage in class, fail to participate, not attend, not collaborate with your peers, produce lower quality work, and learn fewer skills to ultimately carry into a job search


Recently - the job market has become increasingly competitive
Many companies are going back to 100% in-person
Amazon - January 2025
Starbucks Corporate Offices
The instructor becomes exhausted repeating, over and over on Discord, assignment details that have already been covered in class
It is important to responsibly use online learning tools and not disengage
It can be difficult to participate and engage with peers in class when watching lecture from home synchronously or asynchronously (!!!)
SET GOALS for class attendance and engagement
Neither the instructor nor UW is responsible for academic and professional outcomes if students do not show up and engage


Slides by Wes J. Lloyd





REFERENCES

[1] Cloud Computing: Concepts, Technology and Architecture, Thomas Erl, Prentice Hall, 2013

[2] Cloud Computing - Theory and Practice, Dan Marinescu, Second Edition 2018\*, Third Edition 2023 (new)

[3] Cloud Computing: A Hands-On Approach, Arshdeep Bahga, 2013


TCSS 462/562 COURSE WORK

Project Proposal

Project Status Reports / Activities

- 2-4 total items (??)

Variety of formats: in class, online, reading, activity

Quizzes

Open book, note, etc.

Class Presentation (TCSS 562) (TCSS 462 - extra credit)

Class Presentation Summaries (TCSS 462/562)

Term Project / Paper or Presentation


TERM PROJECT

Project description to be posted
Teams of ~4, self formed, one project leader
Project scope can vary based on team size and background w/ instructor approval
Proposal due: Tuesday October 15, 11:59pm (tentative)

Approach:
Build a "cloud native" web services application
Using serverless computing, containerization, or other
App will consist of multiple services (FaaS functions)
Objective is to compare outcomes of design trade-offs
Performance (runtime)
Cost (\$)




TERM PROJECT - 3

A & B Testing
Compare performance of approaches: language A vs. B
Use statistical methods to infer which performs better
Statistical tests: Student's t-test, Welch's t-test (unequal sample sizes or variances), Mann-Whitney U test (non-normal data)
Specify and test specific performance goals
Performance goals: runtime, throughput, network latency, others possible

Other goals can be evaluated, but evaluation may be more difficult
Availability, accessibility, resilience to failure, usability
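
As an illustration of the statistical comparison described above, here is a minimal sketch of Welch's t-test using only the Python standard library. The function name `welch_t` and the runtime samples are hypothetical and chosen only for the example; a real project would collect many more samples and would typically obtain a p-value from a statistics package.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b  # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical runtimes (ms) of the same function written in language A and B
runtimes_a = [120.0, 118.0, 125.0, 121.0, 119.0]
runtimes_b = [135.0, 140.0, 132.0, 138.0, 136.0]
t_stat, dof = welch_t(runtimes_a, runtimes_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large negative t with these samples suggests that implementation A has the lower mean runtime; whether the difference is significant depends on the chosen significance level and the t distribution with `dof` degrees of freedom.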


## **TERM PROJECT - 4**

Deliverables:

- ALL GROUPS: Very short live presentation in class at end of quarter showcasing key results (1 slide, 3-5 minutes)
- TCSS 562: Project paper (4-6 pgs IEEE format, template provided)
- TCSS 462: Comprehensive recorded video presentation (12-15 minutes), project paper is an alternative
- GitHub (project source)
- How-To document describing how to test the system (via GitHub markdown)

Standard suggested application/use case(s) or propose

- (Example) Data Processing Pipeline: Extract-Transform-Load (ETL) data processing pipeline combining AWS Lambda, S3, and Amazon Aurora Serverless DB

**TERM PROJECT - 5**

- Primary goal for the term project is to implement a cloud-based application and investigate 1 or more design trade-offs
- Fall 2024 focus - programming language trade-offs for serverless (AWS Lambda)
- Teams evaluate the impact of different designs (implementations) on performance and cost objectives and report on the results
- Creative projects encouraged!
  - Groups do not have to follow the Fall 2024 THEME
  - Groups can propose and implement another project which analyzes other design trade-offs (besides programming languages)


## **COMPARING DIFFERENT DESIGN TRADE-OFFS**

- What other design trade-offs can be compared?
- Compare and contrast alternative designs using various cloud services, languages, platforms, etc.
- Examples - Compare different:
  - Cloud storage services: object/blob storage services: Amazon S3, Google blobstore, Azure blobstore, vs. self-hosted
  - Cloud relational database services: Amazon Relational Database Service (RDS), Aurora, self-hosted DB
  - Platform-as-a-Service (PaaS) alternatives for web app hosting: Amazon Elastic Beanstalk, Heroku, others
  - Open source FaaS platforms: Apache OpenWhisk, OpenFaaS, Fn, others

**COMPARING DIFFERENT DESIGN TRADE-OFFS - 2**

- Serverless storage alternatives from AWS Lambda: Amazon EFS, S3, Containers, others
- Container platforms: Amazon ECS/Fargate, AKS, Azure Kubernetes, self-hosted Kubernetes cluster on cloud VMs
- Contrasting queueing service alternatives: Amazon SQS, Amazon MQ, Apache Kafka, RabbitMQ, 0MQ
- NoSQL database services: DynamoDB, Google BigTable, MongoDB, Cassandra
- CPU architectures: Intel (x86\_64), AMD (x86\_64), ARM (Graviton), Mac (M1)
- Service designs or compositions




TERM PROJECT - KEY REQUIREMENTS

1. Application should involve multiple processing steps
2. Implementation does not have to be Function-as-a-Service (FaaS)
3. Implementation leverages external cloud services (e.g. databases, object stores, queues)
4. Projects will contrast alternate designs
5. Define your comparison metrics:

Which designs offer the fastest performance (runtime)?

Lowest cost (\$)?

Best maintainability?
Consider size, lines of code (LOC), smaller programs are generally considered to be easier to maintain


## TERM PROJECT: RESEARCH

- Alternative to analyzing design trade-offs for a cloud application:
  - Conduct a cloud-related research project on any topic focused on specific research goals / questions
  - Can help spur MS Capstone/Thesis or BS honors thesis projects
  - Identify and investigate 1 - 2 research questions
  - Implement a novel solution to an open problem
  - Complete initial research towards publishing a conference or workshop paper
- If you're interested in this option, please talk with the instructor, who will help guide projects throughout the quarter
- Explore our growing body of cloud research publications at: http://faculty.washington.edu/wlloyd/research.html

Project cloud infrastructure support:

Standard AWS Account (RECOMMENDED)

Create standard AWS account with UW email
Credit card required
Instructor provides students with \$100 credit vouchers from AWS
When voucher is used up, request another voucher from instructor
Additional credits available throughout Fall quarter (within reason)

Instructor provided IAM AWS Account
No Credit Card required
Instructor creates and manages account security and permissions
More restricted
Students must communicate with instructor to request additional permissions to cloud services – possible delays


PROJECT SUPPORT - 2

Other Support:

GitHub Student Developer Pack:

https://education.github.com/pack

Formerly offered AWS credits, but Microsoft bought GitHub
Includes up to \$200 in Digital Ocean Credits
Unlimited private git repositories
Several other benefits

Microsoft Azure for Students
\$100 free credit per account valid for 1 year - no credit card (?)
https://azure.microsoft.com/en-us/free/students/
Google Cloud
\$300 free credit for 1 year
https://cloud.google.com/free/
Chameleon / CloudLab
Bare metal NSF cloud - free


TERM PROJECT RESEARCH OPPORTUNITIES

- Projects can lead to papers or posters presented at ACM/IEEE/USENIX conferences, workshops
  - Networking and research opportunity ... travel ???
  - Conference participation (posters, papers) helps differentiate your resume/CV from others
- Project can support preliminary work for: UWT - BS honors, MS capstone/thesis projects
- Research projects provide valuable practicum experience with cloud systems analysis, prototyping
- Publications are key for building your resume/CV
  - Also very important for applying to PhD programs




TCSS 562: CLASS PRESENTATION

- TCSS 562 students will give a team presentation, teams of ~3
  - TCSS 462 students: team presentation for extra credit
- Technology sharing presentation
  - PPT slides, demonstration
  - Provide technology overview of one cloud service offering
  - Present overview of features, performance, etc.
- Cloud Research Paper Presentation
  - PPT slides, identify research contributions, strengths and weaknesses of paper, possible areas for future work




**CLASS PRESENTATION PEER REVIEWS - 2**

- For TCSS 462 - the 4 required peer reviews will count for the entire presentation score
- For TCSS 562 - the peer reviews will count as ~20% of the presentation score




**DEMOGRAPHICS SURVEY**

Please complete the ONLINE demographics survey:

https://forms.gle/6ER7PzfP521vdxYW9

Linked from course webpage in Canvas:

http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html




AWS CLOUD CREDITS SURVEY

Please complete the AWS CLOUD CREDITS survey:

https://forms.gle/fmKkLZbxZECbAay16

Linked from course webpage in Canvas:

http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html




OBJECTIVES - 9/26

Course Introduction
Syllabus
Demographics Survey
AWS Cloud Credits Survey
Tutorial 0 - Getting Started with AWS
Tutorial 1 - Intro to Linux

Cloud Computing - How did we get here?
Chapter 4 Marinescu 2<sup>nd</sup> edition:
Introduction to parallel and distributed systems




Cloud Computing: How did we get here?

Parallel and distributed systems
(Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)

Data, thread-level, task-level parallelism

Parallel architectures

SIMD architectures, vector processing, multimedia extensions

Graphics processing units

Speed-up, Amdahl's Law, Scaled Speedup

Properties of distributed systems

Modularity







HYPER THREADING

Modern CPUs provide multiple instruction pipelines, supporting multiple execution threads, usually 2, to feed instructions to a single CPU core...
Two hyper-threads are not equivalent to (2) CPU cores
i7-4770 and i5-4670: same CPU, with and without HTT
Example: → hyperthreads add +32.9%
4770 with HTT vs. 4670 without HTT - 25% improvement w/ HTT

[Figure: CPU Mark Relative to Top 10 Common CPUs, as of 7th of February 2014 (higher results represent better performance)]


AMD'S 64-CORE < 14NM CPUS

Epyc Rome CPUs
Announced August 2019
EPYC 7H12 requires liquid cooling

[Table: AMD EPYC 7002 series processors, legible entries only]
EPYC 7742, EPYC 7702: 64 cores / 128 threads
EPYC 7642: 48 cores / 96 threads, 2.30 / 3.20 GHz, 225 W
EPYC 7552: 48 cores / 96 threads, 2.20 / 3.30 GHz, 192 MB cache, 200 W, \$4025
Values not attributable to a row: 3.35, 256 MB, \$4775

AMD EPYC 9654/9654P/9684X/9R14 (AWS OEM):

June 2023: <u>96 cores</u>, 192 hyper-threads CPUs

Mixes 4nm:APU (combines CPUs+GPU), 5nm:L3 cache
(8 CPU-chiplet), and 6nm:I/O dies, 2.25 to 3.7 burst
GHz, up to 400 watts

\$10,625 to \$14,756

AMD EPYC 9754: <u>128 cores</u>, 256 hyperthreads!

2.25 to 3.1 burst GHz, 360 watts

\$11,900

AMD EPYC 9005: <u>192 cores</u>, 384 threads, 3nm (in dev)

CLOUD COMPUTING:
HOW DID WE GET HERE? - 2

To make computing faster, we must go "parallel"
Difficult to expose parallelism in scientific applications
Not every problem solution has a parallel algorithm
Chicken and egg problem...

Many commercial efforts promoting pure parallel programming efforts have failed
Enterprise computing world has been skeptical and less involved in parallel programming


CLOUD COMPUTING:
HOW DID WE GET HERE? - 4

Big Data requires massive amounts of compute resources

MAP - REDUCE
Single instruction, multiple data (SIMD)
Exploit data level parallelism
Bioinformatics example

SMITH WATERMAN USE CASE

Applies dynamic programming to find best local alignment of two protein sequences
 Embarrassingly parallel, each task can run in isolation
 Use case for GPU acceleration

AWS Lambda Serverless Computing Use Case:
Goal: Pair-wise comparison of all unique human protein sequences (20,336)

Python client as scheduler
 C Striped Smith-Waterman (SSW) execution engine
From: Zhao M, Lee WP, Garrison EP, Marth GT: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications.
PLoS One 2013, 8:e82138
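
The dynamic programming step behind Smith-Waterman can be sketched in a few lines of Python. This is a simplified illustration, not the SSW library's SIMD implementation: the function name `smith_waterman_score` and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions for the example, whereas protein alignment normally uses a substitution matrix and affine gap penalties.

```python
import math


def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of two sequences (linear gap penalty)."""
    prev = [0] * (len(seq_b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in seq_a:
        curr = [0]
        for j, ch_b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below zero
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Each pair is scored independently, which is what makes the workload
# embarrassingly parallel. If every unordered pair of the 20,336 sequences
# is compared once, the number of independent tasks is:
pair_count = math.comb(20336, 2)
print(smith_waterman_score("TTACGTTT", "GGACGGG"), pair_count)
```

Because no pair depends on the result of another, the pairs can be partitioned into sets and handed to separate workers, as the Lambda use case does.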


SMITH WATERMAN RUNTIME

Laptop server and client (2-core, 4-HT): 8.7 hours

AWS Lambda FaaS, laptop as client: 2.2 minutes
Partitions 20,336 sequences into 41 sets
Execution cost: ~82¢ (~237x speed-up)

AWS Lambda server, EC2 instance as client: 1.28 minutes
Execution cost: ~87¢ (~408x speed-up)

Hardware
Laptop client: Intel i5-7200U 2.5 GHz :4 HT, 2 CPU
Cloud client: EC2 Virtual Machine - m5.24xlarge: 96 vCPUs
Cloud server: Lambda ~1000 Intel E5-2666v3 2.9GHz CPUs




Cloud Computing: How did we get here?
 Parallel and distributed systems
 (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)

 Data, thread-level, task-level parallelism
 Parallel architectures

 SIMD architectures, vector processing, multimedia extensions
 Graphics processing units
 Speed-up, Amdahl's Law, Scaled Speedup
 Properties of distributed systems

 Modularity




PARALLELISM - 2

Coordination of nodes
Requires message passing or shared memory
Debugging parallel message passing code is easier than parallel shared memory code

Message passing: all of the interactions are clear
Coordination via specific programming API (MPI)

Shared memory: interactions can be implicit – must read the code!!

Processing speed is orders of magnitude faster than communication speed (CPU > memory bus speed)
Avoiding coordination achieves the best speed-up
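
To make the message passing style concrete, here is a small sketch in which workers coordinate only through explicit messages on queues. It uses Python threads and `queue.Queue` as a stand-in for nodes and a messaging API such as MPI; the names `worker` and `sum_of_squares` are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers as messages, reply with their squares; None means stop."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


def sum_of_squares(values, n_workers=4):
    """Distribute values to workers via messages and combine the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)
    for _ in threads:
        inbox.put(None)  # one stop message per worker
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in values)


print(sum_of_squares(list(range(10))))
```

Every interaction between the coordinator and the workers is a visible `put` or `get`, which is the property that makes message passing code easier to trace than code where threads silently read and write shared variables.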




THREAD LEVEL PARALLELISM (TLP)

Number of threads an application runs at any one time
Varies throughout program execution
As a metric:
Minimum: 1 thread
Can measure average, maximum (peak)
QUESTION: What are the consequences of average (TLP) for scheduling an application to run on a computer with a fixed number of CPU cores and hyperthreads?
Let's say there are 4 cores, or 8 hyper-threads...
Key to avoiding waste of computing resources is knowing your application's TLP...
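
One way to reason about the question above is to summarize a trace of an application's thread count and compare it with the hardware. The sketch below assumes a hypothetical list of periodic thread-count samples and a hypothetical helper `tlp_summary`; it is not a measurement tool.

```python
def tlp_summary(thread_samples, hardware_threads):
    """Summarize sampled thread counts against available hardware threads."""
    average = sum(thread_samples) / len(thread_samples)
    peak = max(thread_samples)
    return {
        "average": average,
        "peak": peak,
        # Fraction of hardware threads kept busy on average
        "avg_utilization": min(average, hardware_threads) / hardware_threads,
        # True when the peak demand exceeds what the hardware can run at once
        "oversubscribed": peak > hardware_threads,
    }


# Hypothetical samples of active threads taken at regular intervals
samples = [1, 2, 8, 8, 4, 1]
summary = tlp_summary(samples, hardware_threads=8)
print(summary)
```

With an average TLP well below the number of hardware threads, much of the machine sits idle for this application, so a smaller instance or co-scheduling other work may waste fewer resources.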




Partition data into big chunks, run separate copies of the program on them with little or no communication

Problems are considered to be embarrassingly parallel

Also perfectly parallel or pleasingly parallel...

Little or no effort needed to separate problem into a number of parallel tasks

MapReduce programming model is an example
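
The partition-then-combine pattern above can be sketched as a tiny word count in the MapReduce style. The helper names are hypothetical, and a thread pool is used only to keep the example portable; for CPU-bound work in CPython, separate processes or separate machines would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Map step: count words in one chunk, independent of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines, n_chunks=4):
    """Run the map step on each chunk in parallel, then reduce the results."""
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(count_words, chunked(lines, n_chunks)))
    total = Counter()
    for partial in partials:  # reduce step: merge the partial counts
        total.update(partial)
    return total


result = word_count(["a b", "b c", "c c", "a"], n_chunks=2)
print(result)
```

The map step needs no communication between chunks; the only coordination is the final merge.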






BIT-LEVEL PARALLELISM

Computations on large words (e.g. 64-bit integer) are performed as a single instruction
Fewer instructions are required on 64-bit CPUs to process larger operands (A+B) providing dramatic performance improvements
Processors have evolved: 4-bit, 8-bit, 16-bit, 32-bit, 64-bit

QUESTION: How many instructions are required to add two 64-bit numbers on a 16-bit CPU (Intel 8088)?

64-bit MAX int = 9,223,372,036,854,775,807 (signed)
Intel 8088 - limited to 16-bit registers
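
A sketch of how a 64-bit addition decomposes on 16-bit hardware: the operands are split into four 16-bit pieces that are added from least to most significant, carrying between steps. The function name is hypothetical, and the count covers only the additions; a real 8088 program would also need instructions to load and store the pieces.

```python
MASK16 = 0xFFFF


def add64_with_16bit_ops(a, b):
    """Add two unsigned 64-bit ints using only 16-bit additions with carry."""
    result, carry, additions = 0, 0, 0
    for limb in range(4):  # 64 bits / 16 bits = 4 pieces
        a_part = (a >> (16 * limb)) & MASK16
        b_part = (b >> (16 * limb)) & MASK16
        total = a_part + b_part + carry  # add (first piece) or add-with-carry
        additions += 1
        result |= (total & MASK16) << (16 * limb)
        carry = total >> 16
    return result, additions  # a final carry out wraps, as in 64-bit hardware


print(add64_with_16bit_ops(0xFFFF, 1))
```

A 64-bit CPU performs the same operation as a single add instruction, which is the gain that bit-level parallelism refers to.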


INSTRUCTION-LEVEL PARALLELISM (ILP)

CPU pipelining architectures enable ILP
CPUs have multi-stage processing pipelines
Pipelining: split instructions into sequence of steps that can execute concurrently on different CPU circuitry

Basic RISC CPU - Each instruction has 5 pipeline stages:
IF - Instruction fetch
ID - Instruction decode
EX - instruction execution
MEM - memory access
WB - write back




RISC CPU:

After 5 clock cycles, all 5 stages of an instruction are loaded

Starting with 6<sup>th</sup> clock cycle, one full instruction completes each cycle

The CPU performs 5 tasks per clock cycle!
Fetch, decode, execute, memory read, memory write back

Pentium 4 (CISC CPU) – processing pipeline w/ 35 stages!
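
The benefit of pipelining can be estimated with a simple cycle count. The sketch below assumes an ideal pipeline with no stalls, hazards, or branch mispredictions, so real CPUs will do worse; the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages=5):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=5):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n = 100
print(unpipelined_cycles(n), pipelined_cycles(n),
      unpipelined_cycles(n) / pipelined_cycles(n))
```

As the number of instructions grows, the ratio approaches the number of pipeline stages, which is why a full 5-stage pipeline is described as doing five tasks per clock cycle.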




MICHAEL FLYNN'S COMPUTER
ARCHITECTURE TAXONOMY

Michael Flynn proposed a taxonomy of computer
architectures based on the number of concurrent instruction
streams and data streams (1966)

SISD (Single Instruction Single Data)

SIMD (Single Instruction, Multiple Data)

MIMD (Multiple Instructions, Multiple Data)

LESS COMMON: MISD (Multiple Instructions, Single Data)

Pipeline architectures: functional units perform different
operations on the same data

For fault tolerance, may want to execute same instructions
redundantly to detect and mask errors – for task replication




(SIMD): VECTOR PROCESSING ADVANTAGES

Exploit data-parallelism: vector operations enable speedups

Vector architectures provide vector registers that can store entire matrices in CPU registers

SIMD CPU extensions (e.g. MMX) add support for vector operations on traditional CPUs

Vector operations reduce total number of instructions for large vector operations

Provides higher potential speedup vs. MIMD architecture

Developers can think sequentially; not worry about parallelism


FLYNN'S TAXONOMY - 2

■ MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently

■ At any time, different processors/cores may execute different instructions on different data

■ Multi-core CPUs are MIMD

■ Processors share memory via interconnection networks: hypercube, 2D torus, 3D torus, omega network, other topologies

■ MIMD systems have different methods of sharing memory: Uniform Memory Access (UMA), Cache Only Memory Access (COMA), Non-Uniform Memory Access (NUMA)

■ Arithmetic Intensity: Ratio of work (W) to memory traffic r/w (Q). Example: # of floating-point ops per byte of data read

■ Characterizes application scalability with SIMD support

■ SIMD can perform many fast matrix operations in parallel

■ High arithmetic Intensity:

Programs with dense matrix operations scale up nicely (many calcs vs memory RW, supports lots of parallelism)

■ Low arithmetic Intensity:

Programs with sparse matrix operations do not scale well with problem size (memory RW becomes bottleneck, not enough ops!)
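
A rough calculation shows how arithmetic intensity differs between workloads. The sketch below compares an element-wise vector addition (chosen here as a simple low-intensity case) with a dense matrix multiplication, assuming 8-byte values and ideal caching so that each operand is read or written exactly once. These assumptions and the function names are illustrative only.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations per byte of memory traffic (W / Q)."""
    return flops / bytes_moved


def vector_add_intensity(n, bytes_per_value=8):
    # n additions; reads 2n values and writes n values
    return arithmetic_intensity(n, 3 * n * bytes_per_value)


def dense_matmul_intensity(n, bytes_per_value=8):
    # 2 * n^3 flops (multiply and add); ideal traffic reads A and B, writes C
    return arithmetic_intensity(2 * n ** 3, 3 * n * n * bytes_per_value)


print(vector_add_intensity(1_000_000))  # constant, independent of n
print(dense_matmul_intensity(1200))     # grows linearly with n
```

Under these assumptions the vector addition stays at 1/24 flop per byte no matter how large the problem is, so memory traffic dominates, while the dense multiplication performs more work per byte as the matrices grow.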




**OBJECTIVES**

- Cloud Computing: How did we get here?
  - Parallel and distributed systems (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
  - Data, thread-level, task-level parallelism
  - Parallel architectures
  - SIMD architectures, vector processing, multimedia extensions
  - Graphics processing units
  - Speed-up, Amdahl's Law, Scaled Speedup
  - Properties of distributed systems
  - Modularity




PARALLEL COMPUTING

■ Parallel hardware and software systems allow:

Solve problems demanding resources not available on single system.

Reduce time required to obtain solution

■ The speed-up (S) measures effectiveness of parallelization:

$S(N) = T(1) / T(N)$

$T(1) \rightarrow$ execution time of total sequential computation

$T(N) \rightarrow$ execution time for performing N parallel computations in parallel

**SPEED-UP EXAMPLE**

- Consider embarrassingly parallel image processing
  - Eight images (multiple data)
  - Apply image transformation (greyscale) in parallel
  - 8-core CPU, 16 hyper threads
- Sequential processing: perform transformations one at a time using a single program thread
  - 8 images, 3 seconds each: T(1) = 24 seconds
- Parallel processing
  - 8 images, 3 seconds each: T(N) = 3 seconds
- Speedup: S(N) = 24 / 3 = 8x speedup
  - Called "perfect scaling"
  - Must consider data transfer and computation setup time
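
The speed-up calculation for the image example can be written out directly. The `overhead` parameter below is a hypothetical addition that models data transfer and setup time; the 1-second value is invented only to show how overhead erodes perfect scaling.

```python
def speedup(t_sequential, t_parallel, overhead=0.0):
    """S(N) = T(1) / T(N), with optional fixed overhead added to T(N)."""
    return t_sequential / (t_parallel + overhead)


t1 = 8 * 3.0  # eight images, 3 seconds each, processed one at a time
tn = 3.0      # eight images processed at once on eight cores

print(speedup(t1, tn))                # perfect scaling
print(speedup(t1, tn, overhead=1.0))  # hypothetical 1 s of transfer/setup
```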




AMDAHL'S LAW

$$S = \frac{1}{(1-f) + \frac{f}{N}}$$

■ $S$ = theoretical speedup of the whole task

■ $f$ = fraction of work that is parallel (ex. 25% or 0.25)

■ $N$ = proposed speed up of the parallel part (ex. 5 times speedup)

■ % improvement of task execution $= 100 \times (1 - (1/S))$

■ Using Amdahl's law, what is the maximum possible speed-up?
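
A short sketch of Amdahl's law using the example values from the slide (f = 0.25, N = 5). The function names are hypothetical. The limit function follows from letting N grow without bound, so that the f/N term vanishes and only the sequential fraction remains.

```python
def amdahl_speedup(f, n):
    """Speed-up of the whole task when fraction f is sped up n times."""
    return 1.0 / ((1.0 - f) + f / n)


def percent_improvement(s):
    """Percent improvement of task execution for overall speed-up s."""
    return 100.0 * (1.0 - 1.0 / s)


def amdahl_limit(f):
    """Upper bound on speed-up as n grows without bound (requires f < 1)."""
    return 1.0 / (1.0 - f)


s = amdahl_speedup(0.25, 5)
print(s, percent_improvement(s), amdahl_limit(0.25))
```

With only 25% of the work parallel, even an unlimited speed-up of that part cannot push the overall speed-up past about 1.33x.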




GUSTAFSON'S LAW

Calculates the scaled speed-up using "N" processors

$S(N) = N + (1 - N) \alpha$

N: Number of processors

$\alpha$: fraction of program run time which can't be parallelized (e.g. must run sequentially)

Can be used to estimate runtime of parallel portion of program


Example:
Consider a program that is embarrassingly parallel, but 75% cannot be parallelized. α=.75
QUESTION: If deploying the job on a 2-core CPU, what scaled speedup is possible assuming the use of two processes that run in parallel?




**GUSTAFSON'S EXAMPLE**

- QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?
  - $S(N) = N + (1 - N) \alpha$
  - $N = 2$, $\alpha = .75$
  - $S(N) = 2 + (1 - 2)(.75)$
  - $S(N) = ?$
  - For 2 CPUs, speed up is 1.25x
- What is the maximum theoretical speed-up on a 16-core CPU?
  - $S(N) = N + (1 - N) \alpha$
  - $N = 16$, $\alpha = .75$
  - $S(N) = 16 + (1 - 16)(.75)$
  - $S(N) = ?$
  - For 16 CPUs, speed up is 4.75x
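
The two results in this example can be checked with a one-line implementation of the scaled speed-up formula. The function name is hypothetical; the inputs are the values used on the slide.

```python
def gustafson_speedup(n, alpha):
    """Scaled speed-up S(N) = N + (1 - N) * alpha for n processors."""
    return n + (1 - n) * alpha


for cores in (2, 16):
    print(cores, gustafson_speedup(cores, 0.75))
```

Because the sequential fraction is large (75%), adding cores yields a speed-up far below the core count, although it keeps growing linearly with N.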


■ The number of transistors on a chip doubles approximately every 1.5 years

■ CPUs now have billions of transistors

■ Power dissipation issues at faster clock rates lead to heat removal challenges

■ Transition from: increasing clock rates → to adding CPU cores

■ Symmetric core processor — multi-core CPU, all cores have the same computational resources and speed

■ Asymmetric core processor — on a multi-core CPU, some cores have more resources and speed

■ Dynamic core processor — processing resources and speed can be dynamically configured among cores

■ Observation: asymmetric processors offer a higher speedup


Cloud Computing: How did we get here?

Parallel and distributed systems
(Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)

Data, thread-level, task-level parallelism

Parallel architectures

SIMD architectures, vector processing, multimedia extensions

Graphics processing units

Speed-up, Amdahl's Law, Scaled Speedup

Properties of distributed systems

Modularity


Collection of autonomous computers, connected through a network with distribution software called "middleware" that enables coordination of activities and sharing of resources

Key characteristics:
Users perceive system as a single, integrated computing facility.
Compute nodes are autonomous
Scheduling, resource management, and security implemented by every node
Multiple points of control and failure
Nodes may not be accessible at all times
System can be scaled by adding additional nodes
Availability at low levels of HW/software/network reliability


Key non-functional attributes
Known as "ilities" in software engineering
Availability - 24/7 access?
Reliability - Fault tolerance
Accessibility - reachable?
Usability - user friendly
Understandability - can under
Scalability - responds to variable demand
Extensibility - can be easily modified, extended
Maintainability - can be easily fixed
Consistency - data is replicated correctly in timely manner




Cloud Computing: How did we get here?
 Parallel and distributed systems
 (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
 Data, thread-level, task-level parallelism
 Parallel architectures
 SIMD architectures, vector processing, multimedia extensions
 Graphics processing units
 Speed-up, Amdahl's Law, Scaled Speedup
 Properties of distributed systems
 Modularity




**CLOUD COMPUTING - HOW DID WE GET HERE?**

SUMMARY OF KEY POINTS

- Multi-core CPU technology and hyper-threading
- What is a heterogeneous system? Homogeneous system? Autonomous or self-organizing system?
- Fine grained vs. coarse grained parallelism
- Parallel message passing code is easier to debug than shared memory (e.g. p-threads)
- Know your application's max/avg Thread Level Parallelism (TLP)
- Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, Vector processing & GPUs

87

CLOUD COMPUTING - HOW DID WE GET HERE?

**SUMMARY OF KEY POINTS - 2**

- Bit-level parallelism
- Instruction-level parallelism (CPU pipelining)
- Flynn's taxonomy: computer system architecture classification
  - SISD - Single Instruction, Single Data (modern core of a CPU)
  - SIMD - Single Instruction, Multiple Data (Data parallelism)
  - MIMD - Multiple Instruction, Multiple Data
  - MISD is RARE; application for fault tolerance...
- Arithmetic intensity: ratio of calculations vs memory RW
- Roofline model: Memory bottleneck with low arithmetic intensity
- GPUs: ideal for programs with high arithmetic intensity
- SIMD and Vector processing supported by many large registers

CLOUD COMPUTING – HOW DID WE GET HERE?

SUMMARY OF KEY POINTS - 3

Speed-up (S)
S(N) = T(1) / T(N)
Amdahl's law:
S = 1 / ((1-f) + f/N), S = speed-up of the whole task, f = parallel fraction, N = speed-up of the parallel part
Scaled speedup with N processes:
S(N) = N - α(N-1)
α = percent of program that must be sequential
Moore's Law
Symmetric core, Asymmetric core, Dynamic core CPU
Distributed Systems – Non-functional quality attributes
Distributed Systems – Types of Transparency
Types of modularity- Soft, Enforced


TCSS 462: Cloud Computing
TCSS 562: Software Engineering for Cloud Computing
School of Engineering and Technology, UW-Tacoma