# TCSS 422: OPERATING SYSTEMS

Memory Virtualization, Segmentation, Memory Paging

Wes J. Lloyd
School of Engineering and Technology, University of Washington - Tacoma
November 19, 2018
TCSS422: Operating Systems [Fall 2018]

### FEEDBACK FROM 11/14

- How to kill all child threads with a `pthread_cond_broadcast()`?
  - At the end of the program, some threads (producers or consumers) may be asleep waiting on a signal.
  - For consumers, no more matrices are being produced, so there is no further signal for "consumption".
  - The program needs some way to shut down.
  - It can leverage the moment when the producer threads finish their work.
  - The producers' last "signal" can be a "broadcast" that awakens all consumers so they can evaluate a special "end of program" state variable.

### OBJECTIVES

- Memory Virtualization
  - Chapter 14 - The Memory API
  - Chapter 15 - Address Translation
- Segments
  - Chapter 16 - Segmentation
  - Chapter 17 - Free Space Management
- Paging
  - Chapter 18 - Introduction to Paging
  - Chapter 19 - Translation Lookaside Buffer

What will this code do?

```
#include <stdio.h>

int *set_magic_number_a()
{
    int a = 53247;
    return &a;    // returns the address of a local variable
}

void set_magic_number_b()
{
    int b = 11111;
}

int main()
{
    int *x = NULL;
    x = set_magic_number_a();
    printf("The magic number is=%d\n", *x);
    set_magic_number_b();
    printf("The magic number is=%d\n", *x);
    return 0;
}
```

Output:

```
$ ./pointer_error
The magic number is=53247
The magic number is=11111
```

We have not changed `*x`, but the value has changed! Why?

### DANGLING POINTER (1/2)

- Dangling pointers arise when the variable referred to (a) goes "out of scope" and its memory is destroyed/overwritten (by b) without the value of the pointer (x) being modified.
- The pointer still points to the original memory location of the deallocated variable (a), which has now been reclaimed for (b).

### DANGLING POINTER (2/2)

Fortunately, in this case a compiler warning is generated:

```
$ g++ -o pointer_error -std=c++0x pointer_error.cpp
pointer_error.cpp: In function 'int* set_magic_number_a()':
pointer_error.cpp:6:7: warning: address of local variable 'a' returned [enabled by default]
```

This is a common mistake: accidentally referring to addresses that have gone "out of scope".

### CALLOC()

```
#include <stdlib.h>
void *calloc(size_t num, size_t size);
```

- Allocates "C"lear memory on the heap
- `calloc()` wipes (zeroes) memory in advance of use...
- `size_t num`: number of blocks to allocate
- `size_t size`: size of each block (in bytes)
- `calloc()` prevents...
```
char *dest = malloc(20);
printf("dest string=%s\n", dest);
// Example output: "dest string=" followed by garbage bytes,
// because malloc() does not initialize the memory it returns
```

### SYSTEM CALLS

- `brk()`, `sbrk()`
  - Used to change the data segment size (the end of the heap)
  - Don't use these
- `mmap()`, `munmap()`
  - Can be used to create an extra independent "heap" of memory for a user program
  - See the man page

### OBJECTIVES

- Address translation
- Base and bounds
- HW and OS support
- Memory segments
- Memory fragmentation

### MEMORY MANAGEMENT UNIT

- MMU: portion of the CPU dedicated to address translation
- Contains base & bounds registers
- Base & bounds example:
  - Consider address translation
  - 4 KB (4096 bytes) address space, loaded at the 16 KB physical location

| Virtual Address | Physical Address               |
|-----------------|--------------------------------|
| 0               | 16384                          |
| 1024            | 17408                          |
| 3000            | 19384                          |
| 4400            | 20784 (out of bounds: FAULT)   |

### DYNAMIC RELOCATION OF PROGRAMS

Hardware requirements:

| Requirements                                             | HW support                                                               |
|----------------------------------------------------------|--------------------------------------------------------------------------|
| Privileged mode                                          | CPU modes: kernel, user                                                  |
| Base / bounds registers                                  | Registers to support address translation                                 |
| Translate virtual addr; check if in bounds               | Translation circuitry, check limits                                      |
| Privileged instruction(s) to update base / bounds regs   | Instructions for modifying base/bound registers                          |
| Privileged instruction(s) to register exception handlers | Set code pointers to OS code to handle faults                            |
| Ability to raise exceptions                              | For out-of-bounds memory access, or attempts to access privileged instr. |

### OS SUPPORT FOR MEMORY VIRTUALIZATION

- For base and bounds, OS support is required
  - When a process starts running
    - Allocate address space in physical memory
  - When a process is terminated
    - Reclaim memory for reuse
  - When a context switch occurs
    - Save and restore the base-and-bounds pair
  - Exception handlers
    - Function pointers set at OS boot time

### OS: WHEN PROCESS STARTS RUNNING

- OS searches for free space for the new process
  - Free list: data structure that tracks available memory slots

### DYNAMIC RELOCATION

- OS can move process data when the process is not running
  1. OS deschedules the process from the scheduler
  2. OS copies the address space from the current to the new location
  3. OS updates the PCB (base and bounds registers)
  4. OS reschedules the process
- When the process runs, the new base register is restored to the CPU
- The process doesn't know it was even moved!
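
The base-and-bounds check and the relocation step can be sketched in a few lines of C. This is a minimal illustration, not how any particular MMU or kernel implements it: the names `base_bounds_t`, `bb_translate`, and `bb_relocate` are hypothetical, and the boolean return value stands in for the exception real hardware would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process copy of the base and bounds registers,
 * as the OS might save it in the PCB. */
typedef struct {
    uint32_t base;    /* physical address where the address space starts */
    uint32_t bounds;  /* size of the address space in bytes */
} base_bounds_t;

/* What the MMU does on every access: check the bound, then add the base.
 * Returns false where real hardware would raise an out-of-bounds exception. */
bool bb_translate(const base_bounds_t *regs, uint32_t vaddr, uint32_t *paddr)
{
    assert(regs && paddr);
    if (vaddr >= regs->bounds)
        return false;
    *paddr = regs->base + vaddr;
    return true;
}

/* Relocation: after the OS has copied the address space to its new
 * location, only the saved base changes. Virtual addresses stay the same. */
void bb_relocate(base_bounds_t *regs, uint32_t new_base)
{
    assert(regs);
    regs->base = new_base;
}
```

With `base = 16384` and `bounds = 4096`, virtual address 3000 translates to 19384 and virtual address 4400 is rejected. After `bb_relocate(&regs, 32768)`, the same virtual address 1024 resolves to 33792 without the process noticing.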
### SEGMENTATION DEREFERENCE

    // get top 2 bits of 14-bit VA
    Segment = (VirtualAddress & SEG_MASK) >> SEG_SHIFT
    // now get offset
    Offset = VirtualAddress & OFFSET_MASK
    if (Offset >= Bounds[Segment])
        RaiseException(PROTECTION_FAULT)
    else
        PhysAddr = Base[Segment] + Offset
        Register = AccessMemory(PhysAddr)

- VIRTUAL ADDRESS = 01000001101000 (on heap)
- SEG_MASK = 0x3000 (11000000000000)
- SEG_SHIFT = 12; (VA & SEG_MASK) >> SEG_SHIFT = 01 → heap (the mask gives us the segment code)
- OFFSET_MASK = 0xFFF (00111111111111)
- OFFSET = 000001101000 = 104 (isolates the segment offset)
- OFFSET < BOUNDS: 104 < 2048

### SHARED CODE SEGMENTS

- Code sharing: enabled with HW support
  - Supports storing shared libraries in memory only once
  - DLL: dynamic linked library
  - .so: shared object in Linux (under /usr/lib)
  - Many programs can access them
- Protection bits: track permissions to a segment

Segment Register Values (with Protection)

| Segment | Base | Size | Grows Positive? | Protection   |
|---------|------|------|-----------------|--------------|
| Code    | 32K  | 2K   | 1               | Read-Execute |
| Heap    | 34K  | 2K   | 1               | Read-Write   |
| Stack   | 28K  | 2K   | 0               | Read-Write   |

### SEGMENTATION GRANULARITY

- Coarse-grained
  - Manage memory as large, purpose-based segments:
    - Code segment
    - Heap segment
    - Stack segment

### SEGMENTATION GRANULARITY - 2

- Fine-grained
  - Manage memory as a list of segments
  - Code, heap, and stack segments are composed of multiple smaller segments
- Segment table
  - On early systems
  - Stored in memory
  - Tracked a large number of segments

### MEMORY HEADERS - 3

- Size of a memory chunk is:
  - Header size + user malloc size
  - N bytes + sizeof(header)
- Easy to determine the address of the header

```
void free(void *ptr) {
    // The header sits immediately before the pointer handed to the user
    header_t *hptr = (header_t *)((char *)ptr - sizeof(header_t));
    // ... (rest of free() not shown)
}
```

### THE FREE LIST

- Simple free list struct

```
typedef struct __node_t {
    int size;
    struct __node_t *next;
} node_t;
```

- Use mmap to create the free list
  - 4 KB heap, 4 byte header, one contiguous free chunk

```
// mmap() returns a pointer to a chunk of free space
node_t *head = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
                    MAP_ANON|MAP_PRIVATE, -1, 0);
head->size = 4096 - sizeof(node_t);
head->next = NULL;
```

### MEMORY ALLOCATION STRATEGIES

- Best fit
  - Traverse the free list
  - Identify all candidate free chunks
  - Note which is smallest (has best fit)
  - When splitting, "leftover" pieces are small (and potentially less useful: fragmented)
- Worst fit
  - Traverse the free list
  - Identify the largest free chunk
  - Split the largest free chunk, leaving a still-large free chunk

### MEMORY ALLOCATION STRATEGIES - 2

- First fit
  - Start the search at the beginning of the free list
  - Find the first chunk large enough for the request
  - Split the chunk, returning a "fit" chunk and saving the remainder
  - Avoids the full free list traversal of best and worst fit
- Next fit
  - Similar to first fit, but start the search at the last search location
  - Maintain a pointer that "cycles" through the list
  - Helps balance chunk distribution vs. first fit
  - Find the first chunk that is large enough for the request, and split it
  - Avoids full free list traversal

### SEGREGATED LISTS

- For popular-sized requests, e.g. for kernel objects such as locks, inodes, etc.
- Manage as segregated free lists
- Provide object caches: store pre-initialized objects
- How much memory should be dedicated to specialized requests (object caches)?
- If a given cache is low on memory, it can request "slabs" of memory from the general allocator.
- The general allocator will reclaim slabs when they are not used

### PAGING

- Split up the address space of a process into <u>fixed-sized pieces</u> called pages
- Alternative to <u>variable-sized pieces</u> (segmentation), which suffers from significant fragmentation
- Physical memory is split up into an array of fixed-size slots called page frames.
- Each process has a page table which translates virtual addresses to physical addresses

### ADVANTAGES OF PAGING

- Flexibility
  - Abstracts the process address space into pages
  - No need to track the direction of heap / stack growth
    - Just add more pages...
  - No need to store unused space
    - As with segments...
- Simplicity
  - Pages and page frames are the same size
  - Easy to allocate and keep a free list of pages

### PAGING: EXAMPLE

- Consider a 128-byte physical memory with 16-byte pages
- Consider a 64-byte program address space
- Page table: VP0 → PF3, VP1 → PF7, VP2 → PF5, VP3 → PF2

A Simple 64-byte Address Space

| Virtual page | Start address | End address (exclusive) |
|--------------|---------------|-------------------------|
| page 0       | 0             | 16                      |
| page 1       | 16            | 32                      |
| page 2       | 32            | 48                      |
| page 3       | 48            | 64                      |

64-Byte Address Space Placed In Physical Memory

| Page frame | Start address | Contents        |
|------------|---------------|-----------------|
| 0          | 0             | reserved for OS |
| 1          | 16            | (unused)        |
| 2          | 32            | page 3 of AS    |
| 3          | 48            | page 0 of AS    |
| 4          | 64            | (unused)        |
| 5          | 80            | page 2 of AS    |
| 6          | 96            | (unused)        |
| 7          | 112           | page 1 of AS    |

### PAGING DESIGN QUESTIONS

- (1) Where are page tables stored?
- (2) What are the typical contents of the page table?
- (3) How big are page tables?
- (4) Does paging make the system too slow?

### (1) WHERE ARE PAGE TABLES STORED?
- Example:
  - Consider a 32-bit process address space (up to 4 GB)
  - With 4 KB pages
  - 20 bits for the VPN (2<sup>20</sup> pages)
  - 12 bits for the page offset (2<sup>12</sup> unique bytes in a page)
- Page tables for each process are stored in RAM
  - Support potential storage of 2<sup>20</sup> translations
  - = 1,048,576 pages per process
  - Each page has a page table entry size of 4 bytes

### PAGE TABLE EXAMPLE

- With 2<sup>20</sup> slots in our page table for a single process
- Each slot dereferences a VPN (VPN<sub>0</sub>, VPN<sub>1</sub>, VPN<sub>2</sub>, ..., VPN<sub>1048575</sub>)
- Provides the physical frame number
- Each slot requires 4 bytes (32 bits)
  - 20 for the PFN on a 4 GB system with 4 KB pages
  - 12 for the offset, which is preserved
  - (note we have no status bits, so this is unrealistically small)
- How much memory is needed to store the page table for 1 process?
  - 2<sup>20</sup> entries x 4 bytes = 4,194,304 bytes (or 4 MB) to index one process

### NOW FOR AN ENTIRE OS

- If 4 MB is required to store the page table of one process
- Consider how much memory is required for an entire OS
  - With, for example, 100 processes...
- Page table memory requirement is now 4 MB x 100 = 400 MB
- If the computer has 4 GB of memory (maximum for 32 bits), the page tables consume about 10% of memory
  - 400 MB / 4096 MB ≈ 9.8%
- Is this efficient?

### PAGE TABLE ENTRY - 2

- Common flags:
  - Valid Bit: Indicates whether the particular translation is valid.
  - Protection Bit: Indicates whether the page can be read from, written to, or executed from
  - Present Bit: Indicates whether this page is in physical memory or on disk (swapped out)
  - Dirty Bit: Indicates whether the page has been modified since it was brought into memory
  - Reference Bit (Accessed Bit): Indicates that a page has been accessed

### (3) HOW BIG ARE PAGE TABLES?

- Page tables are too big to store on the CPU
- Page tables are stored using physical memory
- Paging supports efficiently storing a sparsely populated address space
  - Reduced memory requirement compared to base and bounds, and segments

### (4) DOES PAGING MAKE THE SYSTEM TOO SLOW?

- Translation
- Issue #1: The starting location of the page table is needed
  - HW support: page-table base register
    - Stores the location of the active process's page table
    - Facilitates translation
  - Page table (stored in RAM): VP0 → PF3, VP1 → PF7, VP2 → PF5, VP3 → PF2
- Issue #2: Each memory address translation for paging requires an extra memory reference
  - HW support: TLBs (Chapter 19)

### PAGING MEMORY ACCESS

```
// Extract the VPN from the virtual address
VPN = (VirtualAddress & VPN_MASK) >> SHIFT

// Form the address of the page-table entry (PTE)
PTEAddr = PTBR + (VPN * sizeof(PTE))

// Fetch the PTE
PTE = AccessMemory(PTEAddr)

// Check if process can access the page
if (PTE.Valid == False)
    RaiseException(SEGMENTATION_FAULT)
else if (CanAccess(PTE.ProtectBits) == False)
    RaiseException(PROTECTION_FAULT)
else
    // Access is OK: form physical address and fetch it
    offset = VirtualAddress & OFFSET_MASK
    PhysAddr = (PTE.PFN << PFN_SHIFT) | offset
    Register = AccessMemory(PhysAddr)
```

### PAGING SYSTEM EXAMPLE

- Consider a 4 GB computer:
  - With a 4096-byte page size (4 KB)
  - How many pages would fit in physical memory?
- Now consider a page table:
  - For the page table entry, how many bits are required for the
  - If we assume the use of 4-byte (32 bit) page table entries, how many bits are available for status bits?
  - How much space does this page table require? Page table entry size x number of pages
  - How many page tables (for user processes) would fill the entire 4 GB of memory?

## CHAPTER 19: TRANSLATION LOOKASIDE BUFFER (TLB)

### OBJECTIVES

- Chapter 19
  - TLB Algorithm
  - TLB Tradeoffs
  - TLB Context Switch

### TRANSLATION LOOKASIDE BUFFER

- Legacy name...
- Better name: "Address Translation Cache"
- The TLB is an on-CPU cache of address translations
  - virtual → physical memory

### TLB - ADDRESS TRANSLATION CACHE

- Key detail: on a TLB miss, we first access the page table in RAM to populate the TLB... we then requery the TLB
- All address translations go through the TLB
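
The miss-then-requery behaviour can be sketched in C as follows. This is only an illustration under stated assumptions: it reuses the 16-byte pages and the four-entry page table from the paging example, while the two-entry direct-mapped TLB, the `tlb_misses` counter, and all function names are hypothetical and chosen for brevity. Real TLBs are hardware structures and are usually associative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  4u    /* 16-byte pages, as in the paging example */
#define OFFSET_MASK 0xFu
#define NUM_PAGES   4u    /* 64-byte virtual address space */
#define TLB_ENTRIES 2u    /* hypothetical, deliberately tiny */

/* Page table from the paging example: VP0->PF3, VP1->PF7, VP2->PF5, VP3->PF2 */
static const uint32_t page_table[NUM_PAGES] = { 3, 7, 5, 2 };

typedef struct {
    bool     valid;
    uint32_t vpn;
    uint32_t pfn;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];  /* zero-initialized: all entries invalid */
static unsigned tlb_misses;           /* counts page-table lookups in "RAM" */

/* Direct-mapped lookup: each VPN has exactly one candidate slot. */
static bool tlb_lookup(uint32_t vpn, uint32_t *pfn)
{
    const tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        *pfn = e->pfn;
        return true;
    }
    return false;
}

/* Every translation goes through the TLB. On a miss, the page table
 * is consulted, the TLB is filled, and the TLB is queried again. */
uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t pfn = 0;
    assert(vpn < NUM_PAGES);          /* real hardware would raise a fault */

    if (!tlb_lookup(vpn, &pfn)) {
        tlb_misses++;
        tlb_entry_t *slot = &tlb[vpn % TLB_ENTRIES];
        slot->valid = true;
        slot->vpn   = vpn;
        slot->pfn   = page_table[vpn];     /* the extra memory reference */
        bool hit = tlb_lookup(vpn, &pfn);  /* requery: hits after the fill */
        assert(hit);
        (void)hit;
    }
    return (pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK);
}
```

A first call `translate(0x15)` misses, fills the TLB, and returns 117 (frame 7, offset 5). A following `translate(0x1A)` touches the same page, so it hits and `tlb_misses` stays at 1.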