









## HYPER-THREADING - 2

- How do I use hyper-threading?
  - Hyper-threading is automatic
  - Modern CPUs expose each physical CPU core as two CPU cores
  - The `cat /proc/cpuinfo` command lists individual cores
  - Operating system schedules processes & threads to run on a hyper-thread
- On CPUs with hyper-threading, each CPU core has two hyper-threads
- To the operating system they are seen as full-featured independent CPU cores

TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2024]
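To make the logical vs. physical distinction concrete, here is a minimal sketch that counts both from `/proc/cpuinfo`-style text. It assumes the x86 Linux layout, where each entry carries `processor`, `physical id`, and `core id` fields; other architectures may omit some of these. The function name `count_cores` and the `SAMPLE` excerpt are hypothetical, made up for illustration.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts parsed from /proc/cpuinfo text."""
    logical = 0
    physical = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1                        # one entry per logical CPU (hyper-thread)
        elif key == "physical id":
            physical_id = value                 # socket this entry belongs to
        elif key == "core id":
            physical.add((physical_id, value))  # unique (socket, core) pair
    return logical, len(physical)


# Hypothetical excerpt: one socket, two physical cores, two hyper-threads each.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(count_cores(SAMPLE))  # (4, 2)
```

On a Linux machine the same function could be given the contents of `/proc/cpuinfo` directly; a logical count that is twice the physical count suggests hyper-threading is enabled.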



**Hyper-Threading (HT) Technology**

- Provides more satisfactory solution
- Single physical processor is shared as two logical processors
- Each logical processor has its own architecture state
- Single set of execution units are shared between logical processors
- N-logical PUs are supported
- Have the same gain % with only 5% die-size penalty
- HT allows single processor to fetch and execute two separate code streams simultaneously

[Figure 3: processors with Hyper-Threading]




### TCSS 462: Cloud Computing TCSS 562: Software Engineering for Cloud Computing School of Engineering and Technology, UW-Tacoma



**FEEDBACK - 3**

If I use a computer with 8 cores (client) to rent a virtual machine with 128 cores through a cloud provider, the computer with fewer cores won't decrease the performance of the virtual machine with more cores because they are separate?

- CORRECT, the performance will not decrease.
- The 8-core computer (laptop/desktop) is just used to access the remote computer via ssh/graphical desktop
- The laptop/desktop acts as a client computer used to access the powerful remote server
- Any applications / jobs / workloads are run on the remote server, but are launched by the client
  - Through a terminal session (ssh), or remote graphical desktop
  - Or by calling a web service hosted on the powerful server
- You may experience network latency between the client and server for large data transfers

October 3, 2024, TCSS462/562: (Software Engineering for) Cloud Computing [Fall 2024], School of Engineering and Technology, University of Washington - Tacoma


























## OBJECTIVES - 10/3

- Questions from 10/1
- Tutorial 0, Tutorial 1, Tutorial 2
- Cloud Computing - How did we get here? (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
- Class Activity 1 - Implicit vs Explicit Parallelism
- SIMD architectures, vector processing, multimedia extensions
- Graphics processing units
- Speed-up, Amdahl's Law, Scaled Speedup
- Properties of distributed systems
- Modularity

## CLASS ACTIVITY 1

- Form groups of ~3 - in class or with Zoom breakout rooms
- Each group will complete an MS Word DOCX worksheet
- Be sure to add names at top of document as they appear in Canvas
- Activity can be completed in class or after class
- The activity can also be completed individually
- When completed, **one person** should submit a PDF of the document to Canvas
- Instructor will score all group members based on the uploaded PDF file
- To get started:
  - Follow the link (also available in Canvas): https://faculty.washington.edu/wlloyd/courses/tcss562/assignments/tcss462_562_f2024_tps1.docx









## PARALLELISM QUESTIONS - 2

9. An application developer measures the average and peak thread level parallelism (TLP) of an application prior to deployment on AWS EC2. The developer measures an average TLP of 2.3, and a peak TLP of 7.3. The application is to be deployed using a compute-optimized (c-series) EC2 instance. Using resources online, such as the websites below, propose a good virtual machine (EC2 type) that satisfies average TLP, and a second for satisfying peak TLP.

- https://docs.aws.amazon.com/ec2/latest/instancetypes/co.html
- https://instances.vantage.sh/
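One way to reason about question 9 is to round the measured TLP up to a whole number of vCPUs and pick the smallest size that covers it. The sketch below assumes a hypothetical size ladder (`ASSUMED_VCPU_SIZES`) and a hypothetical helper `smallest_fit`; the real vCPU counts for a given instance family should be taken from the sites listed above. On many instance types a vCPU corresponds to a hyper-thread, not a full physical core, so a larger size than this rule suggests may be a reasonable choice.

```python
import math

# Assumed vCPU counts for a hypothetical instance-size ladder; check the
# provider's documentation for the real sizes of a given family.
ASSUMED_VCPU_SIZES = [2, 4, 8, 16, 32]


def smallest_fit(tlp, sizes=ASSUMED_VCPU_SIZES):
    """Return the smallest vCPU count that covers the measured TLP."""
    needed = math.ceil(tlp)  # a fractional TLP still needs a whole vCPU
    for vcpus in sorted(sizes):
        if vcpus >= needed:
            return vcpus
    raise ValueError("no size in the list covers this TLP")


print(smallest_fit(2.3))  # average TLP -> 4
print(smallest_fit(7.3))  # peak TLP -> 8
```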


## (SIMD): VECTOR PROCESSING ADVANTAGES

Exploit data-parallelism: vector operations enable speedups

- Vector architectures provide vector registers that can store entire matrices in a CPU register
- SIMD CPU extensions (e.g. MMX) add support for vector operations on traditional CPUs
- Vector operations reduce the total number of instructions for large vector operations
- Provides higher potential speedup vs. MIMD architecture
- Developers can think sequentially and not worry about parallelism
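The reduction in instruction count can be illustrated with a small model. Pure Python does not issue SIMD instructions, so this sketch only counts how many vector operations would be needed if each one processed `lanes` elements at once. The function `vector_add` and the lane width of 4 are assumptions for illustration.

```python
def vector_add(a, b, lanes):
    """Add two equal-length lists in chunks of `lanes` elements.

    Each chunk stands in for one vector instruction; the function returns
    the element-wise sums and the number of chunks issued.
    """
    out, issued = [], 0
    for i in range(0, len(a), lanes):
        # Conceptually a single instruction applied to every lane at once.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return out, issued


a = list(range(10))
b = [10] * 10
result, issued = vector_add(a, b, lanes=4)
print(result)  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(issued)  # 3 vector operations instead of 10 scalar additions
```

The caller writes what looks like a single sequential operation, which matches the point that developers can think sequentially while the hardware exploits data parallelism.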

































**GUSTAFSON'S LAW** 

**GUSTAFSON'S EXAMPLE**

What is the maximum theoretical speed-up on a 2-core CPU?

$$S(N) = N + (1 - N)\alpha$$

$$N = 2, \quad \alpha = 0.75$$

$$S(N) = 2 + (1 - 2)(0.75)$$

$$S(N) = \; ?$$

What is the maximum theoretical speed-up on a 16-core CPU?

$$S(N) = N + (1 - N)\alpha$$

$$N = 16, \quad \alpha = 0.75$$

$$S(N) = 16 + (1 - 16)(0.75)$$

$$S(N) = \; ?$$
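The two cases can be checked numerically. This is a minimal sketch that evaluates the formula from the slide, taking $\alpha$ as the serial (non-parallelizable) fraction as in the usual statement of Gustafson's law; the function name `gustafson_speedup` is made up for illustration.

```python
def gustafson_speedup(n, alpha):
    """Scaled speed-up S(N) = N + (1 - N) * alpha, with alpha the serial fraction."""
    return n + (1 - n) * alpha


for cores in (2, 16):
    print(cores, gustafson_speedup(cores, 0.75))
# 2 1.25
# 16 4.75
```

With 75% of the work serial, going from 2 to 16 cores raises the scaled speed-up from 1.25 to 4.75.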










# TRANSPARENCY PROPERTIES OF DISTRIBUTED SYSTEMS

- Access transparency: local and remote objects accessed using identical operations
- Location transparency: objects accessed w/o knowledge of their location
- Concurrency transparency: several processes run concurrently using shared objects w/o interference among them
- Replication transparency: multiple instances of objects are used to increase reliability
  - users are unaware if and how the system is replicated
- Failure transparency: concealment of faults
- Migration transparency: objects are moved w/o affecting operations performed on them
- Performance transparency: system can be reconfigured based on load and quality of service requirements
- Scaling transparency: system and applications can scale w/o change in system structure and w/o affecting applications


# TYPES OF MODULARITY

- Soft modularity: TRADITIONAL
  - Divide a program into modules (classes) that call each other and communicate with shared-memory
  - A procedure calling convention is used (or method invocation)
- Enforced modularity: CLOUD COMPUTING
  - Program is divided into modules that communicate only through message passing
  - The ubiquitous client-server paradigm
  - Clients and servers are independent decoupled modules
  - System is more robust if servers are stateless
  - May be scaled and deployed separately
  - May also FAIL separately!
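Enforced modularity can be sketched as a client and a stateless server that exchange only serialized messages. In this minimal illustration an in-process `queue.Queue` pair stands in for the network, and the names `server`, `call`, `inbox`, and `outbox` are hypothetical. A real deployment would place the two modules in separate processes or machines.

```python
import json
import queue
import threading


def server(inbox, outbox):
    """Stateless server: each reply depends only on the message it received."""
    while True:
        raw = inbox.get()
        if raw is None:          # sentinel: shut down
            break
        msg = json.loads(raw)    # messages cross the boundary as serialized text
        reply = {"id": msg["id"], "result": sum(msg["values"])}
        outbox.put(json.dumps(reply))


def call(inbox, outbox, request_id, values):
    """Client side: send a request message and wait for the reply message."""
    inbox.put(json.dumps({"id": request_id, "values": values}))
    return json.loads(outbox.get(timeout=5))


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=server, args=(inbox, outbox))
worker.start()
reply = call(inbox, outbox, 1, [1, 2, 3])
inbox.put(None)                  # ask the server to stop
worker.join()
print(reply)  # {'id': 1, 'result': 6}
```

Because the client never touches the server's memory, either side can be replaced, scaled, or restarted independently. The `timeout` on the client's wait is a small nod to the fact that the two sides may also fail separately.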


### CLOUD COMPUTING - HOW DID WE GET HERE? SUMMARY OF KEY POINTS

- Multi-core CPU technology and hyper-threading
- What is a heterogeneous system? Homogeneous system? Autonomous or self-organizing system?
- Fine grained vs. coarse grained parallelism
- Parallel message passing code is easier to debug than shared memory (e.g. p-threads)
- Know your application's max/avg Thread Level Parallelism (TLP)
- Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, Vector processing & GPUs






