

#### **OBJECTIVES**

- Mon 3/11 (4pm): Husky Alumni Visit from T-Mobile Q&A
  - CS work life after graduation, room 206C: Garrett Lahmann ('18), Vlad Kaganyuk ('17)

- Wed 3/13: Prof. Mohamed Ali, UWT CSS Grad Program
- Active Reading Quiz posted: Chapter 19
- Assignment 3
- **Memory Virtualization**
  - Chapter 18 Introduction to Paging
  - Chapter 19 Translation Lookaside Buffer (TLB)
  - Chapter 20 Smaller Page Tables

March 6, 2019 TCSS422: Operating Systems [Winter 2019]
School of Engineering and Technology, University of Washington - Tacoma

# FEEDBACK FROM 3/4

- What is stored in data headers (for malloc) besides the size?
  - See the glibc malloc.c source code, line 1044:
  - https://code.woboq.org/userspace/glibc/malloc/malloc.c.html
  - Also: https://reverseengineering.stackexchange.com/questions/15033/how-does-glibc-malloc-work

```c
struct malloc_chunk {
  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};
```

#### FEEDBACK - 2

- Can internal fragments ever be recovered?
  - No, not without changing how data chunks are provisioned from memory to the programmer (OS change)
  - Internal fragmentation: there is no tracking (data) of the unused portion of a chunk
  - The OS provides the programmer with chunks of memory that are too big
  - The programmer receives a memory chunk that is larger than the original request


#### FEEDBACK - 3

- Could you post solutions to the class activity?
  - Happy to share answers after class, etc.
- How many notes for the final? Will it cover all material?
  - Final is comprehensive, 2 pages of notes, double-sided
- Do you have room for students interested in cloud computing for TCSS 499 Independent Study and TCSS 498 **Directed Readings?** 
  - Yes, here's some quick background








#### **PAGING**

- Split up the address space of a process into <u>fixed-size pieces</u> called pages
- Alternative to <u>variable-size pieces</u> (segmentation), which suffers from significant (external) fragmentation
- Physical memory is split up into an array of fixed-size slots called page frames.
- Each process has a page table which translates virtual addresses to physical addresses


# **ADVANTAGES OF PAGING**

- Flexibility
  - Abstracts the process address space into pages
  - No need to track direction of HEAP / STACK growth
    - Just add more pages...
  - No need to store unused space
    - As with segments...
- Simplicity
  - Pages and page frames are the same size
  - Easy to allocate and keep a free list of pages










# (1) WHERE ARE PAGE TABLES STORED?

- **Example:** 
  - Consider a 32-bit process address space (up to 4GB)
  - With 4 KB pages
  - 20 bits for VPN (2<sup>20</sup> pages)
  - 12 bits for the page offset (2<sup>12</sup> unique bytes in a page)
- Page tables for each process are stored in RAM
  - Support potential storage of 2<sup>20</sup> translations
    - = 1,048,576 pages per process
  - Each page table entry (PTE) is 4 bytes


#### PAGE TABLE EXAMPLE

- With 2<sup>20</sup> slots in our page table for a single process
- Each slot is indexed by a VPN
- Each slot provides a physical frame number (PFN)
- Each slot requires 4 bytes (32 bits)
  - 20 bits for the PFN on a 4GB system with 4KB pages
  - 12 bits remain; the page offset itself is not stored in the PTE, since it is copied unchanged from the virtual address
  - (note: we have no status bits, so this is unrealistically small)

[Figure: the page table as an array of slots VPN<sub>0</sub>, VPN<sub>1</sub>, VPN<sub>2</sub>, ..., VPN<sub>1048575</sub>]

How much memory is needed to store the page table for 1 process?

2<sup>20</sup> × 4 bytes = 4,194,304 bytes (4 MB) to index one process


#### NOW FOR AN ENTIRE OS

- If 4 MB is required to store the page table for one process
- How much memory is required for an entire OS?
  - With, for example, 100 processes...
- Page table memory requirement is now 4MB x 100 = 400MB
- If the computer has 4GB of memory (the maximum for 32-bit addresses), the page tables consume about 10% of memory

400 MB / 4,000 MB ≈ 10%

Is this efficient?


# (2) WHAT'S ACTUALLY IN THE PAGE TABLE

- The page table is a data structure used to map virtual page numbers (VPNs) to physical addresses (physical frame numbers, PFNs)
  - Linear page table → simple array
- Page-table entry (PTE)
  - 32 bits for capturing state






# **PAGE TABLE ENTRY - 2**

- Common flags:
  - Valid bit: indicates whether the particular translation is valid
  - Protection bits: indicate whether the page can be read from, written to, or executed from
  - Present bit: indicates whether this page is in physical memory or on disk (swapped out)
  - Dirty bit: indicates whether the page has been modified since it was brought into memory
  - Reference bit (accessed bit): indicates that the page has been accessed


# (3) HOW BIG ARE PAGE TABLES?

- Page tables are too big to store on the CPU
- Page tables are stored using physical memory
- Paging supports efficiently storing a sparsely populated address space
  - Reduced memory requirement compared to base and bounds, and segments


# (4) DOES PAGING MAKE THE SYSTEM TOO SLOW?

- Translation
- Issue #1: Starting location of the page table is needed
  - HW support: page-table base register (PTBR)
    - Stores the starting (physical) address of the active process's page table
    - Facilitates translation

  - Example page table (stored in RAM): VP0 → PF3, VP1 → PF7, VP2 → PF5, VP3 → PF2
- Issue #2: Each memory address translation for paging requires an extra memory reference
  - HW support: TLBs (Chapter 19)

#### **PAGING MEMORY ACCESS**

```
// Extract the VPN from the virtual address
VPN = (VirtualAddress & VPN_MASK) >> SHIFT

// Form the address of the page-table entry (PTE)
PTEAddr = PTBR + (VPN * sizeof(PTE))

// Fetch the PTE
PTE = AccessMemory(PTEAddr)

// Check if process can access the page
if (PTE.Valid == False)
    RaiseException(SEGMENTATION_FAULT)
else if (CanAccess(PTE.ProtectBits) == False)
    RaiseException(PROTECTION_FAULT)
else
    // Access is OK: form physical address and fetch it
    offset = VirtualAddress & OFFSET_MASK
    PhysAddr = (PTE.PFN << PFN_SHIFT) | offset
    Register = AccessMemory(PhysAddr)
```

#### **COUNTING MEMORY ACCESSES**

Example: this array-initialization loop, `for (i = 0; i < 1000; i++) array[i] = 0;`

Assembly equivalent (`0x03e8` = 1000):

```
0x1024 movl $0x0,(%edi,%eax,4)
0x1028 incl %eax
0x102c cmpl $0x03e8,%eax
0x1030 jne 0x1024
```




#### PAGING SYSTEM EXAMPLE

- Consider a 4GB Computer:
- With a 4096-byte page size (4KB)
- How many pages would fit in physical memory?
- Now consider a page table:
- For the page table entry, how many bits are required for the VPN?
- If we assume the use of 4-byte (32 bit) page table entries, how many bits are available for status bits?


- How much space does this page table require?
  - (Page table entry size) × (number of pages)
- How many page tables (for user processes) would fill the entire 4GB of memory?


Slides by Wes J. Lloyd





## TRANSLATION LOOKASIDE BUFFER

- Legacy name...
- Better name: "address translation cache"
- The TLB is an on-CPU cache of address translations
  - virtual → physical memory


#### TRANSLATION LOOKASIDE BUFFER - 2

- Goal: **reduce access** to the page tables
- Example: 50 RAM accesses for the first 5 for-loop iterations
- Move lookups from RAM to the TLB by caching page table entries

[Figure: memory accesses over time for the loop, including lookups of Page Table[1] and Page Table[39]]









# TLB - ADDRESS TRANSLATION CACHE

- Key detail:
- On a TLB miss, we first access the page table in RAM to populate the TLB... we then re-query the TLB
- All address translations go through the TLB


**TLB EXAMPLE**

`int sum = 0; for (i = 0; i < 10; i++) { sum += a[i]; }`

- **Example:**
- Program address space: 256 bytes
  - Addressable using 8 total bits (2<sup>8</sup>)
  - 4 bits for the VPN (16 total pages)
- Page size: 16 bytes
  - Offset is addressable using 4 bits
- Store an array of 10 4-byte integers


[Figure: the 256-byte address space drawn as 16 pages (VPN = 00 through VPN = 15), showing the pages that hold the array elements (e.g., a[8])]









# **OBJECTIVES**

- Chapter 20
  - Smaller tables
  - Hybrid tables
  - Multi-level page tables

November 26, 2018


#### LINEAR PAGE TABLES

- Consider array-based page tables:
  - Each process has its own page table
  - 32-bit process address space (up to 4GB)
  - With 4 KB pages
  - 20 bits for VPN
  - 12 bits for the page offset


#### **LINEAR PAGE TABLES - 2**

- Page tables stored in RAM
- Support potential storage of 2<sup>20</sup> translations
  - = 1,048,576 pages per process @ 4 bytes per page table entry
- Page table size 4MB / process

Page table size = 
$$\frac{2^{32}}{2^{12}} \times 4\ \text{bytes} = 4\ \text{MB}$$

- Consider 100+ OS processes
  - Requires 400+ MB of RAM to store page tables



Page tables are too big and consume too much memory.

**Need Solutions ...** 


## **PAGING: USE LARGER PAGES**

- <u>Larger pages</u> = 16KB = 2<sup>14</sup>
- 32-bit address space: 2<sup>32</sup>
- $2^{18} = 262,144$  pages

$$\frac{2^{32}}{2^{14}} * 4 = 1MB$$
 per page table

- Memory requirement cut to 1/4
- However, pages are huge
- Internal fragmentation results
- 16KB page(s) allocated for small programs with only a few variables




[Figure: A 16KB address space with 1KB pages, and a page table for the 16KB address space]










#### **MULTI-LEVEL PAGE TABLES - 3**

- Advantages
  - Only allocates page table space in proportion to the address space actually used
  - Can easily grab next free page to expand page table
- Disadvantages
  - Multi-level page tables are an example of a time-space tradeoff
  - Sacrifice address translation time (now 2-level) for space
  - Complexity: multi-level schemes are more complex


#### **EXAMPLE**

- 16KB address space, 64-byte pages
- How large would a one-level page table need to be?
- $2^{14}$ (address space) / $2^{6}$ (page size) = $2^{8}$ = 256 (pages)

[Figure: A 16KB address space with 64-byte pages (heap and free regions); a 14-bit virtual address with an 8-bit VPN (bits 13 to 6) and a 6-bit offset (bits 5 to 0); page table entries numbered 0000 0000 through 1111 1111 (2<sup>8</sup> = 256)]

#### **EXAMPLE - 2**

- 256 total page table entries (4 bytes each)
- 1,024-byte page table, stored using 64-byte pages
  - (1024/64) = 16 pages, tracked by 16 page directory entries (PDEs)
- Each page directory entry (PDE) points to one 64-byte page of the page table, which holds 16 page table entries (PTEs), i.e., 16 lookups
- 16 page directory entries (PDE) x 16 page table entries (PTE) = 256 total PTEs
- Key idea: the page table is stored using pages too!


#### PAGE DIRECTORY INDEX

- Now, let's split the page table into two:
  - 8 bit VPN to map 256 pages
  - 4 bits for page directory index (PDI 1st level page table)
  - 6 bits offset into 64-byte page




#### **PAGE TABLE INDEX**

- 4 bits page directory index (PDI 1st level)
- 4 bits page table index (PTI 2<sup>nd</sup> level)



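- To dereference one 64-byte memory page,
  - We need one page directory entry (PDE), selected by the PDI, to find the right page of the page table
  - One page of the page table holds 16 PTEs, so the page table index (PTI) selects among 16 pages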


#### **EXAMPLE - 3**

- For this example, how much space is required to store a single-level page table, regardless of how many PTEs are in use?
- 16KB address space, 64-byte pages
- 256 pages, 4-byte page table entries
- 1,024 bytes required (single level)
- How much space is required for a two-level page table with only 4 page table entries (PTEs) in use?
- Page directory = 16 entries x 4 bytes (1 x 64-byte page)
- Page table = 4 entries x 4 bytes (1 x 64-byte page)
- 128 bytes required (2 x 64-byte pages)
  - Savings = using just 12.5% of the space !!!


#### 32-BIT EXAMPLE

- Consider: 32-bit address space, 4KB pages, 2<sup>20</sup> pages
- Only 4 mapped pages
- Single level: 4 MB (we've done this before)
- Two level: (old VPN was 20 bits, split in half)
- Page directory = 2<sup>10</sup> entries x 4 bytes = 1 x 4 KB page
- Page table = 4 entries x 4 bytes (stored in one 4KB page)
- 8KB (8,192 bytes) required
- Savings = using just 0.195% of the space (8,192 / 4,194,304) !!!
- 100 sparse processes now require < 1MB for page tables


#### MORE THAN TWO LEVELS

- Consider: page size is 2<sup>9</sup> = 512 bytes
- Page size 512 bytes / page table entry size 4 bytes = 128 entries per page
- VPN is 21 bits (30-bit virtual address, 9-bit offset)












#### **MORE THAN TWO LEVELS - 4**

- We can now address 1GB with "fine grained" 512 byte pages
- Using multiple levels of indirection



- Consider the implications for address translation!
- How much space is required for a virtual address space that uses just one 512-byte page (let's say it holds four 32-bit integers)?
- PD0 1 page, PD1 1 page, PT 1 page = 3 x 512 = 1,536 bytes
- Memory usage = 1,536 bytes (3-level) / 8,388,608 bytes (1-level, 2<sup>21</sup> x 4 bytes) ≈ 0.0183% !!!


#### **ADDRESS TRANSLATION CODE**

```c
// 5-level Linux page table address lookup
//
// Inputs:
// mm    - process's memory map struct (struct mm_struct *)
// vpage - virtual page address
// Fragment of a function body: returns 0 if any level is missing or bad

// Define page struct pointers
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
struct page *page;
phys_addr_t physical_page_addr;

pgd = pgd_offset(mm, vpage);
if (pgd_none(*pgd) || pgd_bad(*pgd))
    return 0;
p4d = p4d_offset(pgd, vpage);
if (p4d_none(*p4d) || p4d_bad(*p4d))
    return 0;
pud = pud_offset(p4d, vpage);
if (pud_none(*pud) || pud_bad(*pud))
    return 0;
pmd = pmd_offset(pud, vpage);
if (pmd_none(*pmd) || pmd_bad(*pmd))
    return 0;
if (!(pte = pte_offset_map(pmd, vpage)))
    return 0;
page = pte_present(*pte) ? pte_page(*pte) : NULL;
pte_unmap(pte);   // release the temporary mapping before any return
if (!page)
    return 0;
physical_page_addr = page_to_phys(page);
return physical_page_addr;   // param to send back
```

#### **ADDRESS TRANSLATION - 2**

#### pgd\_offset():

Takes a vpage address and the mm\_struct for the process, and returns the PGD entry that covers the requested address.

#### p4d/pud/pmd\_offset():

Takes a vpage address and the pgd/p4d/pud entry, and returns the relevant p4d/pud/pmd entry.

#### pte\_unmap()

Releases the temporary kernel mapping for the page table entry.

#### **INVERTED PAGE TABLES**



- Keep a single page table with one entry for each physical page of memory
- Consider 4GB physical memory
- Using 4KB pages (2<sup>20</sup> frames) and 4-byte entries, the page table requires 4MB to map all of RAM
- Page table stores
  - Which process uses each page
  - Which process virtual page (from the process virtual address space) maps to the physical page
- All processes share the same page table for memory mapping; the kernel must isolate all use of the shared structure
- Finding process memory pages requires a search of 2<sup>20</sup> pages
- Hash table: can index the entries and speed up lookups


## **MULTI-LEVEL PAGE TABLE EXAMPLE**

- Consider a 16 MB computer which indexes memory using 4KB pages
- (#1) For a single level page table, how many pages are required to index memory?
- (#2) How many bits are required for the VPN?
- (#3) Assuming each page table entry (PTE) can index any byte on a 4KB page, how many offset bits are required?
- (#4) Assuming there are 8 status bits, how many bytes are required for each page table entry?


#### **MULTI LEVEL PAGE TABLE EXAMPLE - 2**

- (#5) How many bytes (or KB) are required for a single level page table?
- Let's assume a simple HelloWorld.c program.
- HelloWorld.c requires virtual address translation for 4 pages:
  - 1 code page
  - 1 stack page
  - 1 heap page
  - 1 data segment page
- (#6) Assuming a two-level page table scheme, how many bits are required for the Page Directory Index (PDI)?
- (#7) How many bits are required for the Page Table Index (PTI)?


## **MULTI LEVEL PAGE TABLE EXAMPLE - 3**

- Assume each page directory entry (PDE) and page table entry (PTE) requires 4 bytes:
  - 6 bits for the Page Directory Index (PDI)
  - 6 bits for the Page Table Index (PTI)
  - 12 offset bits
  - 8 status bits
- (#8) How much total memory is required to index the HelloWorld.c program using a two-level page table when we only need to translate 4 total pages?
- HINT: we need to allocate one Page Directory and one Page Table...
- HINT: how many entries are in the PD and PT


#### **MULTI LEVEL PAGE TABLE EXAMPLE - 4**

- (#9) Using a single page directory entry (PDE) pointing to a single page table (PT), if all of the slots of the page table (PT) are in use, what is the total amount of memory a two-level page table scheme can address?
- (#10) And finally, for this example, as a percentage (%), how much memory does the 2-level page table scheme consume compared to the 1-level scheme?
- <u>HINT</u>: two-level memory use / one-level memory use


# **ANSWERS**

- #1: 4096 pages
- #2: 12 bits
- #3: 12 bits
- #4: 4 bytes
- #5: 4096 x 4 = 16,384 bytes (16KB)
- #6: 6 bits
- #7: 6 bits
- #8: 256 bytes for the Page Directory (PD) (64 entries x 4 bytes)
  - 256 bytes for the Page Table (PT); TOTAL = 512 bytes
- #9: 64 entries, where each entry maps a 4,096-byte page; with 12 offset bits, can address 262,144 bytes (256 KB)
- #10: 512 / 16,384 = 0.03125 → 3.125%


