

### FEEDBACK FROM 11/28

- Assignment #3
  - A good starting point is to first iterate over the set of processes in Linux, and print out the process ID and name.
  - This link (Chapter 3, "The Process Family Tree") should be helpful: https://notes.shichao.io/lkd/ch3/

December 3, 2018

TCSS422: Operating Systems [Fall 2018]

School of Engineering and Technology, University of Washington - Tacoma









### **MOTIVATION FOR EXPANDING THE ADDRESS SPACE**

- Can provide illusion of an address space larger than physical RAM
- For a single process
  - Convenience
  - Ease of use
- For multiple processes
  - Large virtual memory space for many concurrent processes


### **LATENCY TIMES**

- Design considerations
  - SSDs 4x the time of DRAM
  - HDDs 80x the time of DRAM

| Action                             | Latency (ns)  | Latency (µs) | Notes                          |
|------------------------------------|---------------|--------------|--------------------------------|
| L1 cache reference                 | 0.5 ns        |              |                                |
| L2 cache reference                 | 7 ns          |              | 14x L1 cache                   |
| Mutex lock/unlock                  | 25 ns         |              |                                |
| Main memory reference              | 100 ns        |              | 20x L2 cache, 200x L1 cache    |
| Read 4K randomly from SSD*         | 150,000 ns    | 150 µs       | ~1 GB/sec SSD                  |
| Read 1 MB sequentially from memory | 250,000 ns    | 250 µs       |                                |
| Read 1 MB sequentially from SSD*   | 1,000,000 ns  | 1,000 µs     | 1 ms; ~1 GB/sec SSD, 4x memory |
| Read 1 MB sequentially from disk   | 20,000,000 ns | 20,000 µs    | 20 ms; 80x memory, 20x SSD     |

- Latency numbers every programmer should know
- From: https://gist.github.com/jboner/2841832#file-latency-txt






### **PAGE FAULT**

- OS steps in to handle the page fault
- Loading page from disk requires a free memory page
- Page-Fault Algorithm

```
PFN = FindFreePhysicalPage()
if (PFN == -1)                  // no free page found
    PFN = EvictPage()           // run replacement algorithm
DiskRead(PTE.DiskAddr, PFN)     // sleep (waiting for I/O)
PTE.present = True              // set PTE bit to present
PTE.PFN = PFN                   // reference newly loaded page
RetryInstruction()              // retry instruction
```


### PAGE REPLACEMENTS

- Page daemon
  - Background thread that monitors swapped pages
- Low watermark (LW)
  - Threshold for when to swap pages to disk
  - Daemon checks: free pages < LW
  - Begin swapping to disk until reaching the high watermark
- High watermark (HW)
  - Target threshold of free memory pages
  - Daemon frees pages until: free pages >= HW
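
The watermark checks can be sketched in a few lines. This is a minimal illustration rather than how any particular kernel does it; `page_daemon_step`, the `evict_one` callback, and the watermark values are hypothetical names and numbers chosen for the example.

```python
def page_daemon_step(free_pages, low_watermark, high_watermark, evict_one):
    """Run one wake-up of a (hypothetical) page daemon.

    If free pages have dropped below the low watermark, evict pages
    until the high watermark is reached. Returns the new free-page count.
    """
    if free_pages >= low_watermark:
        return free_pages          # enough free memory: go back to sleep
    while free_pages < high_watermark:
        evict_one()                # swap one page out to disk (assumed callback)
        free_pages += 1
    return free_pages


# Example values (assumptions): 3 free pages, LW = 5, HW = 10.
evicted = []
free_after = page_daemon_step(3, 5, 10, lambda: evicted.append("page"))
print("free pages:", free_after, "evicted:", len(evicted))
```

Using two thresholds instead of one means the daemon frees pages in batches and then sleeps, rather than waking for every single allocation.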






- Replacement policies apply to "any" cache
- Goal is to minimize the number of misses
- Average memory access time can be estimated:

$$AMAT = (P_{Hit} * T_M) + (P_{Miss} * T_D)$$

| Argument   | Meaning                                                      |
|------------|--------------------------------------------------------------|
| $T_{M}$    | The cost of accessing memory (time)                          |
| $T_D$      | The cost of accessing disk (time)                            |
| $P_{Hit}$  | The probability of finding the data item in the cache (a hit) |
| $P_{Miss}$ | The probability of not finding the data in the cache (a miss) |

- Consider $T_M = 100\text{ ns}$, $T_D = 10\text{ ms}$
- Consider $P_{Hit} = 0.9$ (90%), $P_{Miss} = 0.1$
- Consider $P_{Hit} = 0.999$ (99.9%), $P_{Miss} = 0.001$
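
The two cases can be worked through numerically. The sketch below applies the AMAT formula exactly as written on this slide; the function name `amat_ns` and the choice of nanoseconds as the unit are assumptions made for the example.

```python
def amat_ns(p_hit, t_mem_ns, t_disk_ns):
    """AMAT = (P_Hit * T_M) + (P_Miss * T_D), with P_Miss = 1 - P_Hit."""
    p_miss = 1.0 - p_hit
    return p_hit * t_mem_ns + p_miss * t_disk_ns


T_M = 100.0            # memory access cost: 100 ns
T_D = 10_000_000.0     # disk access cost: 10 ms expressed in ns

for p_hit in (0.9, 0.999):
    print(f"P_Hit = {p_hit}: AMAT = {amat_ns(p_hit, T_M, T_D):,.1f} ns")
```

At a 90% hit rate the average access costs about 1 ms, almost entirely disk time. At 99.9% it drops to roughly 10.1 µs, which is still about 100x slower than a pure memory access.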


### **OPTIMAL REPLACEMENT POLICY**

- What if:
  - We could predict the future (... with a magical oracle)
  - All future page accesses are known
  - Always replace the page in the cache used farthest in the future
- Used for a comparison
- Provides a "best case" replacement policy
- Consider a 3-element empty cache with the following page accesses:

0 1 2 0 1 3 0 3 1 2 1

- What is the hit/miss ratio?
  - 6 hits, 5 misses
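
The hit count can be checked with a small simulation. This is a sketch under the slide's assumptions (empty 3-element cache, all future accesses known); `optimal_hits` is a hypothetical helper name.

```python
def optimal_hits(accesses, cache_size):
    """Count hits under the optimal (farthest-in-future) policy."""
    cache, hits = set(), 0
    for i, page in enumerate(accesses):
        if page in cache:
            hits += 1
            continue
        if len(cache) >= cache_size:
            future = accesses[i + 1:]

            # Distance to the next use of a cached page; pages that are
            # never used again count as infinitely far away.
            def next_use(p):
                return future.index(p) if p in future else float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return hits


trace = [0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1]
hits = optimal_hits(trace, cache_size=3)
print(f"{hits} hits, {len(trace) - hits} misses")
```

Because it needs the full future trace, this policy cannot be implemented in a real system; it only serves as the baseline other policies are compared against.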


### FIFO REPLACEMENT

- Queue based
- Always replace the oldest element at the back of cache
- Simple to implement
- Doesn't consider importance... just arrival ordering
- Consider a 3-element empty cache with the following page accesses:

0 1 2 0 1 3 0 3 1 2 1

- What is the hit/miss ratio?
  - 4 hits, 7 misses
- How is FIFO different than LRU?
  - **LRU** incorporates history
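
A short simulation makes the difference concrete. The sketch below runs both policies on the trace from this slide; `fifo_hits` and `lru_hits` are hypothetical helper names, and the LRU version uses exact recency tracking rather than any approximation.

```python
from collections import OrderedDict, deque


def fifo_hits(accesses, cache_size):
    """Count hits when the page that arrived first is evicted first."""
    queue, hits = deque(), 0
    for page in accesses:
        if page in queue:
            hits += 1              # a hit does NOT change arrival order
        else:
            if len(queue) >= cache_size:
                queue.popleft()    # evict the oldest arrival
            queue.append(page)
    return hits


def lru_hits(accesses, cache_size):
    """Count hits when the least recently used page is evicted first."""
    cache, hits = OrderedDict(), 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # a hit refreshes recency
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return hits


trace = [0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1]
print("FIFO hits:", fifo_hits(trace, 3))
print("LRU hits:", lru_hits(trace, 3))
```

The only difference between the two functions is what happens on a hit: FIFO ignores it, while LRU moves the page to the most-recent position. On this trace that is enough for LRU to keep pages 0 and 1 cached when FIFO evicts them.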












### IMPLEMENTING LRU

- Implementing least recently used (LRU) requires tracking access time for all system memory pages
- Times can be tracked with a list
- For cache eviction, we must scan an entire list
- Consider: 4GB memory system (2<sup>32</sup>), with 4KB pages (2<sup>12</sup>)
- This requires 2<sup>20</sup> comparisons !!!
- Simplification is needed
  - Consider how to approximate the oldest page access

### **IMPLEMENTING LRU - 2**

- Harness the Page Table Entry (PTE) Use Bit
- Hardware sets the bit to 1 when the page is used
- OS sets the bit to 0
- Clock algorithm (approximate LRU)
  - Refer to pages in a circular list
  - Clock hand points to current page
  - Loops around
    - IF USE\_BIT = 1: set USE\_BIT = 0 and advance to the next page
    - IF USE\_BIT = 0: replace page
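
The clock hand logic can be sketched as a small class. This is an illustrative model only: `ClockCache` and its fields are hypothetical names, the use bit is set in software here (real hardware sets it in the PTE), and a newly loaded page is assumed to start with its use bit set.

```python
class ClockCache:
    """Approximate LRU with one use bit per frame and a rotating hand."""

    def __init__(self, num_frames):
        self.pages = [None] * num_frames   # page held by each frame
        self.use_bit = [0] * num_frames    # set on access, cleared by the hand
        self.hand = 0

    def access(self, page):
        """Touch a page; return True on a hit, False on a miss."""
        if page in self.pages:
            self.use_bit[self.pages.index(page)] = 1   # stands in for hardware
            return True
        # Miss: advance the hand, clearing use bits, until a frame with
        # use bit 0 (or an empty frame) is found.
        while self.pages[self.hand] is not None and self.use_bit[self.hand] == 1:
            self.use_bit[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page       # replace the victim
        self.use_bit[self.hand] = 1        # assumption: loaded pages start "used"
        self.hand = (self.hand + 1) % len(self.pages)
        return False


cache = ClockCache(3)
hits = sum(cache.access(p) for p in [0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1])
print("Clock hits:", hits)
```

Eviction never scans timestamps: the hand only inspects one bit per frame, which is the simplification that makes this cheaper than exact LRU.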


### **CLOCK ALGORITHM**

- Not as efficient as LRU, but better than other replacement algorithms that do not consider history

[Figure: The 80-20 Workload. Hit rate (%) vs. cache size (blocks) for LRU, Clock, and FIFO]

### **CLOCK ALGORITHM - 2**

- Consider dirty pages in cache
- If DIRTY (modified) bit is FALSE
  - No cost to evict page from cache
- If DIRTY (modified) bit is TRUE
  - Cache eviction requires updating memory
  - Contents have changed
- Clock algorithm should favor no cost eviction


### WHEN TO LOAD PAGES

- On demand → demand paging
- Prefetching
  - Preload pages based on anticipated demand
  - Prediction based on locality
  - Access page P, suggest page P+1 may be used
- What other techniques might help anticipate required memory pages?
  - Prediction models, historical analysis
  - In general: accuracy vs. effort tradeoff
  - Analysis-heavy techniques struggle to respond in real time


### OTHER SWAPPING POLICIES

- Page swaps / writes
  - Group/cluster pages together
  - Collect pending writes, perform as batch
  - Grouping disk writes helps amortize latency costs
- Thrashing
  - Occurs when system runs many memory intensive processes and is low in memory
  - Everything is constantly swapped to-and-from disk


### OTHER SWAPPING POLICIES - 2

- Working sets
  - The set of pages a process is actively using
  - When thrashing: prevent one or more processes from running so the remaining working sets fit in memory
  - Temporarily reduces memory burden
  - Allows some processes to run, reduces thrashing















## CANONICAL DEVICE: HARDWARE INTERFACE

- Status register
  - Maintains current device status
- Command register
  - Where commands for interaction are sent
- Data register
  - Used to send and receive data to the device
- General concept: The OS interacts and controls device behavior by reading and writing the device registers.








### **INTERRUPTS VS POLLING - 2**

### What is the tradeoff space?

- Interrupts are not always the best solution
  - How long does the device I/O require?
  - What is the cost of context switching?

- If device I/O is fast → polling is better
- If device I/O is slow → interrupts are better


### **INTERRUPTS VS POLLING - 3**

- One solution is a two-phase hybrid approach
  - Initially poll, then sleep and use interrupts
- Livelock problem
  - Common with network I/O
  - Many arriving packets generate many interrupts
  - Overloads the CPU!
  - No time to execute code, just interrupt handlers!
- Livelock optimization
  - Coalesce multiple arriving packets (for different processes) into fewer interrupts
  - Must consider number of interrupts a device could generate


## DEVICE I/O

- To interact with a device we must send/receive **DATA**
- There are two general approaches:
  - Programmed I/O (PIO)
  - Direct memory access (DMA)


**Transfer Modes**

| Mode            | #                 | Maximum transfer rate (MB/s) | Cycle time |
|-----------------|-------------------|------------------------------|------------|
| PIO             | 0                 | 3.3                          | 600 ns     |
|                 | 1                 | 5.2                          | 383 ns     |
|                 | 2                 | 8.3                          | 240 ns     |
|                 | 3                 | 11.1                         | 180 ns     |
|                 | 4                 | 16.7                         | 120 ns     |
| Single-word DMA | 0                 | 2.1                          | 960 ns     |
|                 | 1                 | 4.2                          | 480 ns     |
|                 | 2                 | 8.3                          | 240 ns     |
| Multi-word DMA  | 0                 | 4.2                          | 480 ns     |
|                 | 1                 | 13.3                         | 150 ns     |
|                 | 2                 | 16.7                         | 120 ns     |
|                 | 3                 | 20                           | 100 ns     |
|                 | 4                 | 25                           | 80 ns      |
| Ultra DMA       | 0                 | 16.7                         | 240 ns ÷ 2 |
|                 | 1                 | 25.0                         | 160 ns ÷ 2 |
|                 | 2 (Ultra ATA/33)  | 33.3                         | 120 ns ÷ 2 |
|                 | 3                 | 44.4                         | 90 ns ÷ 2  |
|                 | 4 (Ultra ATA/66)  | 66.7                         | 60 ns ÷ 2  |
|                 | 5 (Ultra ATA/100) | 100                          | 40 ns ÷ 2  |
|                 | 6 (Ultra ATA/133) | 133                          | 30 ns ÷ 2  |
|                 | 7 (Ultra ATA/167) | 167                          | 24 ns ÷ 2  |





### **PROGRAMMED I/O DEVICE (PIO) INTERACTION**

- Two primary PIO methods
  - Port mapped I/O (PMIO)
  - Memory mapped I/O (MMIO)


### PORT MAPPED I/O (PMIO)

- Device-specific CPU I/O instructions
- Follows a CISC model: extra instructions
- x86 / x86-64: "in" and "out" instructions
  - outb, outw, outl
  - 1-, 2-, or 4-byte copy from EAX → device's I/O port


### MEMORY MAPPED I/O (MMIO)

- Device's memory is mapped to CPU memory
- Tenet of RISC CPUs: instructions are eliminated, CPU is simpler
- Old days: 16-bit CPUs didn't have a lot of spare memory space
- Today's CPUs: 32-bit (4GB addr space) & 64-bit (128 TB addr space)
- Regular CPU instructions used to access device: mapped to memory
- Devices monitor CPU address bus and respond to their addresses
- I/O device address areas of memory are <u>reserved</u> for I/O
  - Must not be available for normal memory operations.


### **DIRECT MEMORY ACCESS (DMA)**

- Copy data in memory by offloading to "DMA controller"
- Many devices (including CPUs) integrate DMA controllers
- CPU gives DMA: memory address, size, and copy instruction
- DMA performs I/O independent of the CPU
- DMA controller generates CPU interrupt when I/O completes




### **DIRECT MEMORY ACCESS (DMA) - 2**

- Many devices use DMA
  - HDD/SSD controllers (ISA/PCI)
  - Graphics cards
  - Network cards
  - Sound cards
  - Intra-chip memory transfer for multi-core processors
- DMA allows computation and data transfer to proceed in parallel


### **DEVICE INTERACTION**

- The OS must interact with a variety of devices
- Example: for disk I/O consider the variety of disks:
  - SCSI, IDE, USB flash drive, DVD, etc.
- Device drivers use abstraction to provide general interfaces for vendor-specific hardware
- In Linux: block devices










### **HARD DISK DRIVE (HDD)**

- Primary means of data storage (persistence) for decades
- Consists of a large number of data sectors
- Sector size is 512 bytes
- An n-sector HDD can be addressed as an array of sectors 0..n-1


### **HDD INTERFACE**

- Writing disk sectors is atomic (512 bytes)
  - Sector writes either complete successfully or fail entirely
- Many file systems will read/write 4KB at a time
  - Linux ext3/4 default filesystem block size: 4096 bytes
  - Same as typical memory page size




### **BLOCK SIZE IN LINUX EXT4**

mkfs.ext4 -i bytes-per-inode

Specify the bytes/inode ratio. mke2fs creates an inode for every bytes-per-inode bytes of space on the disk. The larger the bytes-per-inode ratio, the fewer inodes will be created. This value generally shouldn't be smaller than the blocksize of the filesystem, since in that case more inodes would be made than can ever be used. Be warned that it is not possible to expand the number of inodes on a filesystem after it is created, so be careful deciding the correct value for this parameter.


### **EXAMPLE: USDA SOIL EROSION MODEL WEB SERVICE (RUSLE2)**

- Host ~2,000,000 files totaling 9.5 GB on a ~20GB filesystem on a cloud-based Virtual Machine
- With default inode ratio (4096 block size), only ~488,000 files will fit
- Drive less than half full, but files will not fit!
- HDDs support a minimum block size of 512 bytes
- OS filesystems such as ext3/ext4 can support "finer grained" management at the expense of a larger catalog size


### **EXAMPLE: USDA SOIL EROSION MODEL WEB SERVICE (RUSLE2) - 2**

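- Free space in bytes (df)

| Device    | total size | bytes-used | bytes-free | usage | mounted on |
|-----------|------------|------------|------------|-------|------------|
| /dev/vda2 | 13315844   | 9556412    | 3049188    | 76%   | /mnt       |

- Free inodes (df -i) @ 512 bytes / inode

| Device    | total inodes | used    | free    | usage | mounted on |
|-----------|--------------|---------|---------|-------|------------|
| /dev/vda2 | 3552528      | 1999823 | 1552705 | 57%   | /mnt       |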


### **HDD INTERFACE - 2**

- Torn write
  - When OS uses larger block size than HDD
  - Block writes are not atomic: they span multiple HDD sectors
  - Upon power failure only a portion of the OS block is written
- HDD access
  - Sequential reads of sectors are fastest
  - Random sector reads are slow
  - Disk head must continuously jump to different tracks


















### FOUR PHASES OF SEEK

- Acceleration → coasting → deceleration → settling
  - Acceleration: the arm gets moving
  - Coasting: arm moving at full speed
  - Deceleration: arm slows down
  - Settling: head is carefully positioned over track
- Settling time is often high, from 0.5 to 2 ms

### HDD I/O

- Data transfer
  - Final phase of I/O: time to read or write to disk surface
- Complete I/O cycle:
  1. Seek (accelerate, coast, decelerate, settle)
  2. Wait on rotational latency
  3. Data transfer
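
The three phases add up to the total time for one request. The sketch below estimates that total for a single random read. The drive parameters (4 ms average seek, 7,200 RPM, 100 MB/s transfer rate) are hypothetical, and the average rotational wait is assumed to be half a rotation; `io_time_ms` is an illustrative name.

```python
def io_time_ms(seek_ms, rpm, transfer_bytes, transfer_rate_mb_s):
    """T_io = T_seek + T_rotation + T_transfer, all in milliseconds."""
    rotation_ms = 0.5 * (60_000.0 / rpm)      # assumed average: half a rotation
    transfer_ms = transfer_bytes / (transfer_rate_mb_s * 1_000_000) * 1000.0
    return seek_ms + rotation_ms + transfer_ms


# Hypothetical drive: 4 ms average seek, 7,200 RPM, 100 MB/s transfer rate.
random_4k = io_time_ms(seek_ms=4.0, rpm=7200, transfer_bytes=4096,
                       transfer_rate_mb_s=100)
print(f"Random 4 KB read: {random_4k:.2f} ms")
```

With these numbers the transfer itself takes about 0.04 ms, so nearly all of the time goes to seek and rotation. That is why random reads are so much slower than sequential ones.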


### TRACK SKEW

- Sectors are offset across tracks to allow time for the head to reposition during sequential reads
- Without track skew, by the time the head is repositioned the next sector would have already passed






### **HDD CACHE**

- Buffer to support caching reads and writes
- Improves drive response time
- Up to 128 MB; sizes have been growing slowly
- Two styles
  - Writeback cache
    - Reports write complete immediately when data is transferred to HDD cache
    - Dangerous: cached data not yet on disk is lost on power failure
  - Writethrough cache
    - Reports write complete only when write is physically completed on disk






### **MODERN HDD SPECS**

- See sample HDD configurations here:
- https://www.hgst.com/products/hard-drives


### **DISK SCHEDULING**

- Disk scheduler: determines how to order I/O requests
- Multiple levels: OS and HW
  - OS: provides ordering
  - HW: further optimizes using intricate details of physical HDD implementation and state


### SSTF - SHORTEST SEEK TIME FIRST

- Disk scheduling: which I/O request to schedule next?
- Shortest Seek Time First (SSTF)
  - Order queue of I/O requests by nearest track



SSTF: Scheduling Request 21 and 2
Issue the request to 21 → issue the request to 2
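
The ordering in this example can be reproduced with a short sketch. The starting head position (track 30) is an assumption, since the text does not state it, and `sstf_order` is a hypothetical helper name.

```python
def sstf_order(head, requests):
    """Return requests in the order SSTF would service them."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest             # the arm is now at the serviced track
    return order


# Assumed starting position: head over track 30 (not stated in the text).
print(sstf_order(head=30, requests=[2, 21]))
```

The nearest track is re-evaluated after every request because the head has moved, so the order is not simply a sort by distance from the starting position.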


### **SSTF ISSUES**

- Problem 1: HDD abstraction
  - Drive geometry is not available to the OS. Nearest-block-first is a comparable alternative algorithm.
- Problem 2: Starvation
  - A steady stream of requests for local tracks may prevent the arm from traversing to the other side of the platter


### **DISK SCHEDULING ALGORITHMS**

- SWEEP
  - Single repeated passes across disk
  - Issue: if a request arrives for a recently visited track, it will not be revisited until a full cycle completes
- F-SCAN
  - Freeze request queue during sweep
  - Cache arriving requests until later
- Elevator (SCAN)
  - Sweep from outer to inner track and reverse, inner to outer track, etc.
- C-SCAN (circular scan)
  - Sweep in one direction only (e.g., outer to inner), then reset to the outer track and sweep again
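
A back-and-forth sweep can be sketched as follows. This is a simplified model that orders a fixed set of pending requests for one pass in each direction; `sweep_order` and the track numbers are hypothetical, and requests arriving mid-sweep are not modeled.

```python
def sweep_order(head, requests, toward_higher=True):
    """Order pending requests for a sweep in one direction, then the reverse."""
    here = [t for t in requests if t == head]                        # no movement needed
    lower = sorted((t for t in requests if t < head), reverse=True)  # nearest first going down
    upper = sorted(t for t in requests if t > head)                  # nearest first going up
    if toward_higher:
        return here + upper + lower
    return here + lower + upper


# Hypothetical example: head at track 50, four pending requests.
print(sweep_order(50, [10, 55, 60, 40]))
print(sweep_order(50, [10, 55, 60, 40], toward_higher=False))
```

Unlike a nearest-first ordering, every pending request is reached within two passes, which bounds how long a far-away track can wait.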


### SHORTEST POSITIONING TIME FIRST

- Determine next sector to read?
  - On which track?
  - On which sector?

On modern drives, seek and rotation times are roughly equivalent; thus, SPTF (Shortest Positioning Time First) is useful.


### I/O MERGING

- Group temporally adjacent requests
- Reduce overhead
- Read (memory blocks): 33 8 34
  - Requests for adjacent blocks 33 and 34 can be merged into one request
- How long should we wait for I/O?
- When do we know we have waited too long?
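
Merging can be sketched as grouping consecutive block numbers into runs. This is a minimal illustration that assumes all requests are already queued; it does not model the waiting-time question raised above, and `merge_requests` is a hypothetical helper name.

```python
def merge_requests(blocks):
    """Merge requests for consecutive block numbers into (start, count) runs."""
    runs = []
    for block in sorted(set(blocks)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)   # extend the current run
        else:
            runs.append((block, 1))         # start a new run
    return runs


print(merge_requests([33, 8, 34]))
```

For the example request stream, blocks 33 and 34 collapse into one two-block request, so the disk sees two requests instead of three.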






## HDD CAPACITY

- Superparamagnetism limits HDD capacity
- In sufficiently small nanoparticles, magnetization can randomly flip direction under the influence of temperature.
- HDD capacity is limited by the minimum usable size of particles – the superparamagnetic limit.


