



DEMOGRAPHICS SURVEY

Please complete the ONLINE demographics survey:

https://forms.gle/dE4Q7Lt13rAXtahJ9

Linked from course webpage in Canvas:

http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html

TCSS562: Software Engineering for Cloud Computing [Fall 2019]
School of Engineering and Technology, University of Washington - Tacoma


TCSS562 - SOFTWARE ENGINEERING
FOR CLOUD COMPUTING

Syllabus online at:
http://faculty.washington.edu/wlloyd/courses/tcss562/

Grading
Schedule
Assignments


REFERENCES

[1] Cloud Computing: Concepts, Technology and Architecture \*
Thomas Erl, Prentice Hall 2013

[2] Cloud Computing - Theory and Practice
Dan Marinescu, First Edition 2013 \*, Second Edition 2018

[3] Cloud Computing: A Hands-On Approach
Arshdeep Bahga, 2013

REFERENCES - 2

[4] Systems Performance: Enterprise and the Cloud \*
Brendan Gregg, First Edition 2013

[5] AWS Administration - The Definitive Guide \*
Yohan Wadia, First Edition 2016

Research papers


Slides by Wes J. Lloyd L1.1



TCSS562 COURSE WORK

Project Proposal

Project Status Reports / Activities / Quiz

~ 2-4 total items (??)

Variety of formats: in class, online, reading, activity

Midterm

Open book, note, etc.

Class Presentation

Term Project / Paper / Presentation


TCSS562 TERM PROJECT

Project description to be posted
Teams of ~3, self formed, one project leader
Proposal due: Friday October 11, 11:59pm (tentative)

Focus:
Build a native cloud serverless application
Compose multiple FaaS functions (services)
Compare alternate implementation of:
Service compositions
Application flow control - AWS Step Functions, laptop client, etc.
External cloud components (e.g. database, key-value store)
How does application design impact cost and performance?


TCSS562 TERM PROJECT - 2

Deliverables
Demo in class at end of quarter (TBD)
Project report paper (4-6 pgs IEEE format, template provided)
GitHub (project source)
How-To document (via GitHub markdown)


ALTERNATE TERM PROJECT IDEAS

GOAL: propose a cloud development project that serves as a vehicle to compare and contrast the use of alternative cloud services

Examples:
Object/blob storage services
- Amazon S3, Google blobstore, Azure blobstore, vs. self-hosted
Cloud relational database services
- Amazon Relational Database Service (RDS), Aurora, self-hosted DB
Platform-as-a-Service (PaaS) hosting alternatives
- Amazon Elastic Beanstalk, Heroku, others
Function-as-a-Service platforms
- Google Cloud Functions, Azure Functions, IBM Cloud Functions

September 25, 2019




TERM PROJECT: RESEARCH

Alternative: conduct a cloud-related research project on any topic to answer a set of research questions
Can be used to help spur MS Capstone/Thesis work

If you're interested in this option, please talk with the instructor

First step is to identify 1 - 2 research questions

Instructor will help guide projects throughout the quarter

Approval based on team preparedness to execute project


PROJECT SUPPORT

Project cloud infrastructure support:
Sign up for the GitHub Student Developer Pack:
- https://education.github.com/pack
- Includes up to \$150 in Amazon Cloud Credits
- Includes up to \$100 in Microsoft Azure Credits
- AWS credit extensions provided as needed
Microsoft Azure for Students
- \$100 free credit per account valid for 1 year
- https://azure.microsoft.com/en-us/free/students/
- Also: \$200 free credit option for 1 month
Google Cloud
- \$300 free credit for 1 year
- https://cloud.google.com/free/
Chameleon / CloudLab
- Bare metal NSF cloud, free


Projects can lead to papers or posters presented at ACM/IEEE/USENIX conferences, workshops

Networking and travel opportunity
Conference participation (posters, papers) helps differentiate your resume from others

Project can support preliminary work for:
UWT - MS capstone/thesis project proposals
Research projects provide valuable practicum experience with cloud systems analysis, prototyping

Publications are key for building your resume/CV, and are also key if applying to PhD programs


 Cloud Computing: How did we get here?

Parallel and distributed systems
(Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)

Data, thread-level, task-level parallelism

Parallel architectures

SIMD architectures, vector processing, multimedia extensions

Graphics processing units

Speed-up, Amdahl's Law, Scaled Speedup

Properties of distributed systems

Modularity






AMD'S 64-CORE 7NM CPUS

Epyc Rome CPUs
Announced August 2019
EPYC 7H12 requires liquid cooling

AMD EPYC 7002 Processors (2P)

| Processor | Cores / Threads | Base Frequency (GHz) | Max Frequency (GHz) | L3* | TDP | Price |
|-----------|-----------------|----------------------|---------------------|--------|-------|--------|
| EPYC 7H12 | 64 / 128 | 2.60 | 3.30 | 256 MB | 280 W | ? |
| EPYC 7742 | 64 / 128 | 2.25 | 3.40 | 256 MB | 225 W | \$6950 |
| EPYC 7702 | 64 / 128 | 2.00 | 3.35 | 256 MB | 200 W | \$6450 |
| EPYC 7642 | 48 / 96 | 2.30 | 3.20 | 256 MB | 225 W | \$4775 |
| EPYC 7552 | 48 / 96 | 2.20 | 3.30 | 192 MB | 200 W | \$4025 |

HYPER THREADING

Modern CPUs provide multiple instruction pipelines, supporting multiple execution threads (usually 2) that feed instructions to a single CPU core

Two hyper-threads are not equivalent to two (2) CPU cores

i7-4770 and i5-4760: same CPU, with and without HTT

Example: hyperthreads add +32.9%

CLOUD COMPUTING:
HOW DID WE GET HERE? - 2

To make computing faster, we must go "parallel"
Difficult to expose parallelism in scientific applications
Not every problem solution has a parallel algorithm
Chicken and egg problem...
Many commercial efforts promoting pure parallel programming efforts have failed
Enterprise computing world has been skeptical and less involved in parallel programming


CLOUD COMPUTING:
HOW DID WE GET HERE? - 3

Cloud computing provides access to "infinite" scalable compute infrastructure on demand
Infrastructure availability is key to exploiting parallelism

Cloud applications
Based on client-server paradigm
Thin clients leverage compute hosted on the cloud
Applications run many web service instances
Employ load balancing




SMITH WATERMAN USE CASE

Applies dynamic programming to find best local alignment of two protein sequences
Embarrassingly parallel, each task can run in isolation
Use case for GPU acceleration

AWS Lambda Serverless Computing Use Case:
**Goal:** Pair-wise comparison of all unique human protein sequences (20,336)
Python client as scheduler
C Striped Smith-Waterman (SSW) execution engine

From: Zhao M, Lee WP, Garrison EP, Marth GT: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One 2013, 8:e82138
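To make the dynamic programming step concrete, here is a minimal Python sketch of Smith-Waterman local alignment scoring. It is not the SSW library's implementation: the scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are simplifying assumptions, whereas real protein alignment typically uses a substitution matrix and affine gap penalties. The helper `unique_pairs` is a hypothetical name used only to show how many independent tasks the all-pairs comparison produces.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b (simplified scoring)."""
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def unique_pairs(n):
    """Number of unordered pairs among n sequences: n choose 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    print(unique_pairs(20336))  # independent alignment tasks for 20,336 sequences
```

Each pair is scored without reference to any other pair, which is why the workload is embarrassingly parallel and can be split across many function invocations.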



**CLOUD COMPUTING: HOW DID WE GET HERE? - 3**

Compute clouds are large-scale distributed systems
Heterogeneous systems
Homogeneous systems
Autonomous
Self organizing



**PARALLELISM - 2**

Coordination of nodes
Requires message passing or shared memory
Debugging parallel message passing code is easier than parallel shared memory code
- Message passing: all of the interactions are clear
- Coordination via specific programming API (MPI)
- Shared memory: interactions can be implicit - must read the code!!
Processing speed is orders of magnitude faster than communication speed (CPU > memory bus speed)
Avoiding coordination achieves the best speed-up
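The message passing style can be illustrated in a single process with Python's standard `queue` and `threading` modules. This is only an analogy and not MPI: the function names `worker` and `sum_of_squares` are hypothetical, and the point is simply that every interaction between workers is an explicit `put` or `get`, with no shared mutable state to reason about.

```python
import queue
import threading


def worker(tasks, results):
    """Receive numbers as messages, send back their squares."""
    while True:
        item = tasks.get()
        if item is None:          # sentinel message: no more work
            break
        results.put(item * item)


def sum_of_squares(values, n_workers=3):
    values = list(values)
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for v in values:
        tasks.put(v)              # explicit send
    for _ in threads:
        tasks.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    # Exactly one result message was sent per input value.
    return sum(results.get() for _ in values)


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3, 4]))
```

A shared memory version would instead have every thread update a common total under a lock, and the interactions would be visible only by reading the code that touches that variable.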









DATA-LEVEL PARALLELISM

Partition data into big chunks, run separate copies of the program on them with little or no communication
Problems are considered to be embarrassingly parallel
- Also perfectly parallel or pleasingly parallel...
- Little or no effort needed to separate problem into a number of parallel tasks
MapReduce programming model is an example
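A small word count shows the partition, map, and reduce steps in Python. This is a sketch of the MapReduce idea, not the Hadoop API: the names `chunk`, `map_count`, and `word_count` are hypothetical. A thread pool is used only to keep the example self-contained; in CPython, threads do not speed up CPU-bound work, so a real deployment would run each chunk in a separate process, node, or FaaS function.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunk(items, n_chunks):
    """Partition a list into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(items) // n_chunks))   # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_count(words):
    """Map step: count words within one chunk, independent of all others."""
    return Counter(words)


def word_count(words, n_workers=4):
    parts = chunk(words, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_count, parts))
    # Reduce step: merge the per-chunk counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    print(word_count("a b a c b a".split(), n_workers=3))
```

The only communication happens in the final reduce, which is why this style of problem parallelizes with so little effort.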




**DATA FLOW ARCHITECTURE - 2**

Architecture not as popular as control-flow
Modern CPUs have emulated data flow architecture for dynamic instruction scheduling since the 1990s
- Out-of-order execution: reduces CPU idle time by not blocking for instructions requiring data, by defining execution windows
- Execution windows: identify instructions that can be run by data dependency
- Instructions are completed in data dependency order within execution window
- Execution window size typically 32 to 200 instructions
Utility of data flow architectures has been much less than envisioned



**INSTRUCTION-LEVEL PARALLELISM (ILP)**

CPU pipelining architectures enable ILP
CPUs have multi-stage processing pipelines
Pipelining: split instructions into sequence of steps that can execute concurrently on different CPU circuitry
Basic RISC CPU - Each instruction has 5 pipeline stages:
- IF - instruction fetch
- ID - instruction decode
- EX - instruction execution
- MEM - memory access
- WB - write back



**INSTRUCTION LEVEL PARALLELISM - 2**

RISC CPU:
After 5 clock cycles, all 5 stages of an instruction are loaded
Starting with 6th clock cycle, one full instruction completes each cycle
The CPU performs 5 tasks per clock cycle!
- Fetch, decode, execute, memory read, memory write back
Pentium 4 (CISC CPU) - processing pipeline w/ 35 stages!
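The cycle counts above can be checked with a short calculation. The sketch assumes an ideal pipeline with no stalls or hazards and one clock cycle per stage, which real CPUs do not achieve; the function names are hypothetical.

```python
def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles to finish n instructions on an ideal pipeline.

    The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes per cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions, n_stages=5):
    """Cycles if each instruction runs all stages before the next starts."""
    return n_instructions * n_stages


if __name__ == "__main__":
    n = 100
    print(pipelined_cycles(n), unpipelined_cycles(n))
```

For long instruction streams the ratio of the two counts approaches the number of stages, which is the sense in which the CPU performs 5 tasks per clock cycle.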

MICHAEL FLYNN'S COMPUTER ARCHITECTURE TAXONOMY

Michael Flynn proposed a taxonomy of computer architectures based on the number of concurrent instruction streams and data streams (1966)
SISD (Single Instruction, Single Data)
SIMD (Single Instruction, Multiple Data)
MIMD (Multiple Instructions, Multiple Data)
LESS COMMON: MISD (Multiple Instructions, Single Data)
- Pipeline architectures: functional units perform different operations on the same data
- For fault tolerance, may want to execute same instructions redundantly to detect and mask errors - for task replication

## **FLYNN'S TAXONOMY**

SISD (Single Instruction, Single Data)
- Scalar architecture with one processor/core
- Individual cores of modern multicore processors are "SISD"

SIMD (Single Instruction, Multiple Data)
- Supports vector processing
- When SIMD instructions are issued, operations on individual vector components are carried out concurrently
- Two 64-element vectors can be added in parallel
- Vector processing instructions added to modern CPUs
- Example: Intel MMX (multimedia) instructions




FLYNN'S TAXONOMY - 2

MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently
- At any time, different processors/cores may execute different instructions on different data
Multi-core CPUs are MIMD
- Processors share memory via interconnection networks
- Hypercube, 2D torus, 3D torus, omega network, other topologies
MIMD systems have different methods of sharing memory
- Uniform Memory Access (UMA)
- Cache Only Memory Access (COMA)
- Non-Uniform Memory Access (NUMA)

ARITHMETIC INTENSITY

Arithmetic intensity: ratio of work (W) to memory traffic r/w (Q)

$I = \frac{W}{Q}$

Example: # of floating point ops per byte of data read
Characterizes application scalability with SIMD support
- SIMD can perform many fast matrix operations in parallel
High arithmetic intensity:
- Programs with dense matrix operations scale up nicely (many calcs vs memory RW, supports lots of parallelism)
Low arithmetic intensity:
- Programs with sparse matrix operations do not scale well with problem size (memory RW becomes bottleneck, not enough ops!)
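A rough calculation shows how arithmetic intensity separates memory-bound from compute-bound kernels. The two kernels below (vector addition and dense matrix multiply) are illustrative choices, not examples from the slides, and the traffic model is an idealized assumption: 8-byte values, with each operand read once and each result written once. The function names are hypothetical.

```python
def arithmetic_intensity(flops, bytes_moved):
    """I = W / Q: floating point operations per byte of memory traffic."""
    return flops / bytes_moved


def vector_add_intensity(n, bytes_per_elem=8):
    # c[i] = a[i] + b[i]: n additions; read a and b, write c.
    return arithmetic_intensity(n, 3 * n * bytes_per_elem)


def dense_matmul_intensity(n, bytes_per_elem=8):
    # C = A @ B for n x n matrices: about 2 * n^3 operations;
    # idealized traffic reads A and B once and writes C once.
    return arithmetic_intensity(2 * n ** 3, 3 * n * n * bytes_per_elem)


if __name__ == "__main__":
    print(vector_add_intensity(1_000_000))   # constant, independent of n
    print(dense_matmul_intensity(1200))      # grows linearly with n
```

Under these assumptions vector addition stays at a fixed, low intensity no matter how large the input is, while dense matrix multiply gains intensity as the problem grows, which matches the contrast between memory-bound and compute-bound programs drawn above.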


GRAPHICAL PROCESSING UNITS (GPUS)

GPU provides multiple SIMD processors
Typically 7 to 15 SIMD processors, each with 32,768 total registers divided into 16 lanes (2,048 registers each)
GPU programming model: single instruction, multiple thread
Programmed using CUDA, a C-like programming language by NVIDIA for GPUs
CUDA threads: single thread associated with each data element (e.g. vector or matrix)
Thousands of threads run concurrently


PARALLEL COMPUTING

Parallel hardware and software systems allow us to:
Solve problems demanding resources not available on a single system
Reduce the time required to obtain a solution

The speed-up (S) measures the effectiveness of parallelization:

S(N) = T(1) / T(N)

T(1) → execution time of the total sequential computation
T(N) → execution time when performing N parallel computations in parallel
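As a quick numeric illustration of the speed-up definition, the sketch below divides a sequential run time by a parallel run time. The timings are made-up values for illustration only, not measurements, and the function name is hypothetical.

```python
def speedup(t_sequential, t_parallel):
    """S(N) = T(1) / T(N)."""
    if t_parallel <= 0:
        raise ValueError("parallel run time must be positive")
    return t_sequential / t_parallel


if __name__ == "__main__":
    # Hypothetical timings: 120 s sequentially, 30 s when run in parallel.
    print(speedup(120.0, 30.0))
```

A value equal to the number of parallel tasks would be ideal (linear) speed-up; in practice coordination and sequential portions keep measured values below that.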




AMDAHL'S LAW

Portion of computation which cannot be parallelized determines the overall speedup
For an embarrassingly parallel job of fixed size
Assuming no overhead for distributing the work, and a perfectly even work distribution

α: fraction of program run time which can't be parallelized (i.e. must run sequentially)

Maximum speedup is:

S = 1 / α

Example:
Consider a program where 25% cannot be parallelized
Q: What is the maximum possible speedup of the program?
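The example can be worked through in a few lines of Python. The limit S = 1 / α comes from the slide; the finite-processor form S(N) = 1 / (α + (1 - α) / N) is the standard statement of Amdahl's law and is added here as context, under the same assumptions of no distribution overhead and perfectly even work distribution. The function names are hypothetical.

```python
def amdahl_max_speedup(alpha):
    """Upper bound on speed-up with unlimited processors: S = 1 / alpha."""
    return 1.0 / alpha


def amdahl_speedup(alpha, n):
    """Speed-up on n processors: the sequential part alpha is unchanged,
    the parallel part (1 - alpha) is divided evenly across n processors."""
    return 1.0 / (alpha + (1.0 - alpha) / n)


if __name__ == "__main__":
    alpha = 0.25
    print(amdahl_max_speedup(alpha))        # 4.0
    for n in (2, 4, 16, 1024):
        print(n, round(amdahl_speedup(alpha, n), 3))
```

With 25% of the run time sequential, the speed-up can never exceed 4 no matter how many processors are added; the finite-N values approach that bound from below.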


GUSTAFSON'S LAW

Calculates the scaled speed-up using N processors

S(N) = N + (1 - N) α

N: Number of processors
α: fraction of program run time which can't be parallelized
(e.g. must run sequentially)

Example:
Consider a program that is embarrassingly parallel,
but 25% cannot be parallelized. α=.25

QUESTION: If deploying the job on a 2-core CPU, what
scaled speedup is possible assuming the use of two
processes that run in parallel?


GUSTAFSON'S EXAMPLE

QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?
S(N) = N + (1 - N) α
N = 2, α = .25
S(N) = 2 + (1 - 2)(.25)
S(N) = ?

QUESTION: What is the maximum theoretical speed-up on a 4-core CPU?
S(N) = N + (1 - N) α
N = 4, α = .25
S(N) = 4 + (1 - 4)(.25)
S(N) = ?
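The two questions can be checked by evaluating the scaled speed-up formula directly. This is a minimal sketch; the function name `scaled_speedup` is hypothetical.

```python
def scaled_speedup(n, alpha):
    """Gustafson's scaled speed-up: S(N) = N + (1 - N) * alpha."""
    return n + (1 - n) * alpha


if __name__ == "__main__":
    for n in (2, 4):
        print(n, scaled_speedup(n, 0.25))   # 1.75 for N=2, 3.25 for N=4
```

The same formula can be written as S(N) = N - α(N - 1): each added processor contributes (1 - α) to the scaled speed-up, so unlike the fixed-size bound of 1 / α the value keeps growing with N.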


MOORE'S LAW

The number of transistors on a chip doubles approximately every 1.5 years
CPUs now have billions of transistors
Power dissipation issues at faster clock rates lead to heat removal challenges

Transition from increasing clock rates to adding CPU cores

Symmetric core processor: multi-core CPU where all cores have the same computational resources and speed
Asymmetric core processor: multi-core CPU where some cores have more resources and speed
Dynamic core processor: processing resources and speed can be dynamically configured among cores
Observation: asymmetric processors offer a higher speedup

DISTRIBUTED SYSTEMS

Collection of autonomous computers, connected through a network and running distribution software called "middleware" that enables coordination of activities and sharing of resources

Key characteristics:
Users perceive system as a single, integrated computing facility.
Compute nodes are autonomous
Scheduling, resource management, and security implemented by every node
Multiple points of control and failure
Nodes may not be accessible at all times
System can be scaled by adding additional nodes
Availability at low levels of HW/software/network reliability




TRANSPARENCY PROPERTIES OF DISTRIBUTED SYSTEMS

- Access transparency: local and remote objects accessed using identical operations
- Location transparency: objects accessed w/o knowledge of their location.
- Concurrency transparency: several processes run concurrently using shared objects w/o interference among them
- Replication transparency: multiple instances of objects are used to increase reliability; users are unaware if and how the system is replicated
- Failure transparency: concealment of faults
- Migration transparency: objects are moved w/o affecting operations performed on them
- Performance transparency: system can be reconfigured based on load and quality of service requirements
- Scaling transparency: system and applications can scale w/o change in system structure and w/o affecting applications




CLOUD COMPUTING - HOW DID WE GET HERE?
SUMMARY OF KEY POINTS

Multi-core CPU technology and hyper-threading
What is a heterogeneous system? Homogeneous system?
Autonomous or self-organizing system?
Fine grained vs. coarse grained parallelism
Parallel message passing code is easier to debug than shared memory (e.g. p-threads)
Know your application's max/avg Thread Level Parallelism (TLP)
Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, Vector processing & GPUs

CLOUD COMPUTING - HOW DID WE GET HERE?
SUMMARY OF KEY POINTS - 2

Bit-level parallelism
Instruction-level parallelism (CPU pipelining)
Flynn's taxonomy: computer system architecture classification
- SISD - Single Instruction, Single Data (modern core of a CPU)
- SIMD - Single Instruction, Multiple Data (Data parallelism)
- MIMD - Multiple Instruction, Multiple Data
- MISD is RARE; application for fault tolerance...
Arithmetic Intensity: ratio of calculations vs memory RW
Roofline model: Memory bottleneck with low arithmetic intensity
GPUs: ideal for programs with high arithmetic intensity
- SIMD and Vector processing supported by many large registers

CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 3

Speed-up (S): S(N) = T(1) / T(N)
Amdahl's law: S = 1 / α, where α = percent of program that must be sequential
Scaled speedup with N processes: S(N) = N - α(N - 1)
Moore's Law
Symmetric core, Asymmetric core, Dynamic core CPU
Distributed Systems - Non-functional quality attributes
Distributed Systems - Types of Transparency
Types of modularity - Soft, Enforced
