
ECE 371 Microprocessors
Chapter 7: Demand-Paged Virtual Memory Management
Herbert G. Mayer, PSU
Status 10/10/2015, for use at CCUT Fall 2015

Syllabus

Introduction
High-Level Steps of Paged VMM
VMM Mapping Steps, Two Levels
History of Paging
Goals and Methods of Paging
An Unrealistic Paging Scheme

Realistic Paging Scheme, 32-bit Architecture
Typical PD and PT Entries
Page Replacement Using FIFO
Appendix
Bibliography

Introduction

With the advent of 64-bit architectures, paged Virtual Memory Management (VMM) experienced a revival in the 1990s. Originally, the force behind VMM systems (paged, segmented, or both on Multics [4]) was the need for more addressable, logical memory than was typically available in physical memory.

Scarceness of physical memory was then caused by the high cost of memory: the early technology of core memories carried a high price tag due to the tedious manual labor involved. The high cost per byte of memory is gone, but the days of insufficient physical memory have returned, with larger data sets enabled by 64-bit addresses.

Paged VMM is based on the idea that some memory areas can be relocated out to disk, while other, reused areas are moved back from disk into memory on demand. Disk space abounds, while main memory is more constrained. This relocation in and out, called

swapping, can be handled transparently, imposing no additional burden on the application programmer. The system must detect situations in which an address references an object that resides on disk, and must then perform the hidden swap-in automatically and transparently, at the expense of additional time.

Paged VMM trades speed for address range. The loss in speed is caused by mapping logical to physical addresses, and by slow disk accesses: disk access is slow compared to memory access. A typical disk access can be thousands to millions of

times more expensive, in number of cycles, than a memory access. However, if virtual memory mapping allows large programs to run, albeit slowly, that previously could not execute due to their high memory demands, then the trade-off is worth the loss in speed. The real trade-off is enabling memory-hungry programs to run slowly vs. not executing at all.

Steps of Two-Level VMM Mapping on a 32-Bit Architecture

[Diagram: Paged VMM, from Wikipedia [6]]

High-Level Steps of Paged VMM

Some instruction references a logical address la. VMM determines whether la maps onto a resident page. If yes, the memory access completes; in a system with a data cache, such an access is fast (see also the special-purpose TLB cache for VMM). If the memory access is a store operation (AKA a write), the fact of data modification must be recorded. If the access does not address a resident page:

Then the missing page is found on disk and made available in memory via swap-in, or else the missing page is created for the first time, generally with initialization for system pages and no initialization for user pages; yet even for user pages a frame must be found. Making a new page available requires finding memory space of page-frame size, aligned on a page boundary.

If such space can be allocated from unused memory (usually early during program execution), a page frame is reserved from that available memory. If no page frame is available, a currently resident page must be swapped out and the freed space reused for the new page. Preferred is swapping out a page of the current process; if needed, a page from another process will be removed. Note the preference for locating unmodified pages, i.e. pages with no writes: no swap-out is needed!

Should the page to be replaced be dirty, it must first be written to disk. Otherwise a copy already exists on disk and the costly swap-out operation can be skipped.

VMM Mapping Steps, Two Levels

Some instruction references a logical address (la). If the processor finds the datum in the L1 data cache, the operation is complete.

Else the work for VMM starts. In our paged VMM discussion we ignore data caches, but do include a special-purpose cache known as the TLB. The processor finds the start address of the Page Directory (PD). The logical address is partitioned into three bit fields: Page Directory Index, Page Table (PT) Index, and Page Offset within the user page. The PD finds the PT, and the PT finds the user page frame (the actual page address). Typical entries are 32 bits long, i.e. 4 bytes on a byte-addressable architecture. The entry in the PD is found by adding the PD Index, left-shifted by 2, to the start address of the PD.

A PD entry yields a PT address, stored without its trailing zeros. The PT Index, left-shifted by 2, is then added to the previously found PT address; this yields a page address. Adding the Page Offset to that page address yields the byte address. Along the way there may have been up to 3 page faults, with swap-outs and swap-ins. During book-keeping (e.g. finding clean pages, locating the LRU page, etc.) many more memory accesses can result.
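The three-way split of a 32-bit logical address described above (10-bit PD index, 10-bit PT index, 12-bit page offset) can be sketched in Python; the sample address is an arbitrary illustration, not taken from the slides:

```python
def split_logical_address(la: int):
    """Partition a 32-bit logical address into PD index, PT index, and page
    offset, assuming 4 kB pages and two 10-bit index fields."""
    pd_index = (la >> 22) & 0x3FF   # top 10 bits select the Page Directory entry
    pt_index = (la >> 12) & 0x3FF   # next 10 bits select the Page Table entry
    offset   = la & 0xFFF           # low 12 bits locate the byte in the 4 kB page
    return pd_index, pt_index, offset

# Example with an arbitrary logical address:
pd, pt, off = split_logical_address(0x12345678)
print(hex(pd), hex(pt), hex(off))  # 0x48 0x345 0x678
```

Note that 10 + 10 + 12 = 32, so the three fields exactly cover the logical address.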

History of Paging

Paging was developed in the 1960s at the University of Manchester for the Atlas computer. It was used commercially in the KDF-9 computer [5] of English Electric Co. In fact, the KDF-9 was one of the major architectural milestones in computer history, aside from von Neumann's and Atanasoff's earlier machines. The KDF-9 incorporated the first cache and VMM, had a hardware display for stack frame addresses, etc. The HW display was necessary and helpful, since the assumed programming language was Algol-60 with nested scopes, different from today's C++.

From the late 1960s to the early 1980s, memory was highly expensive; processors were expensive and getting faster. Programs grew larger. Insufficient memories were common and became one aspect of the growing Software Crisis of the late 1970s. 16-bit minicomputers and 32-bit mainframes became common; 18-bit address architectures (CDC and Cyber) with 60-bit words were also used.

Paging grew increasingly popular: fast execution was traded for a larger address range. By the mid 1980s, memories became cheaper and faster. By the late 1980s, memories had become cheaper yet, the address range remained 32-bit, and large physical memories became possible and available. Supercomputers were designed in the 1980s whose OS provided no virtual memory at all: e.g. Cray systems were built and marketed without VMM, and the Intel Hypercube NX operating system had no virtual memory management.

Just when VMM was falling into disfavor, the addressing limitation of 32 bits started constraining programs. In the 1990s, 64-bit architectures became commonplace rather than an exception. Intermediate steps between evolving architectural generations: the Harris 3-byte system with 24-bit addresses, not making the jump to 32 bits; the Pentium Pro with 36 bits in extended addressing mode, not at all making the jump to 64-bit addresses; early Itanium family processors with 44 physical address bits, not quite growing to 64 bits.

By the 1990s, 64-bit addresses were common, and 64-bit integer arithmetic was generally performed in hardware, no longer via slow library extensions. The Pentium Pro has 4 kB pages by default, or 4 MB pages if the page size extension (pse) bit is set in normal addressing mode. However, if the physical address extension (pae) bit is set together with the pse bit, the large page size changes from 4 MB to 2 MB.

Goals and Methods of Paging

The goal is to make the full logical (virtual) address space available, even if a smaller physical memory is installed, and to perform the mapping transparently. Thus, if at a later

time the target receives a larger physical memory, the same program will run unchanged, except faster. Note that overlays, in contrast, were managed explicitly by the programmer. Map logical onto physical addresses, even at a cost in performance; a special-purpose cache (TLB) helps. Implement the mapping in a way that the overhead is small in relation to total program execution. A necessary requirement is a sufficiently large working set, else thrashing happens. Moreover, this is accomplished by caching

page directory and page table entries, or by placing the complete page directory into a special cache.

Unrealistic Paging Scheme, 32-Bit Architecture

Assume a 32-bit, byte-addressable architecture with 4-byte words and 4 kB page size, and a single-level paging mechanism; later we'll cover multi-level mapping. With 4 kB pages, the rightmost 12 bits of each page address are all 0, i.e. implied. The Page Table thus has 2^20 entries, each entry typically 4 bytes, due to the number of remaining address bits: 32 - 12 = 20.

The page offset of 12 bits identifies each byte within a 4 kB page.

[Diagram: Logical Address = Page Address (20) | Page Offset (12)]

With each Page Table entry consuming 4 bytes, this results in a page table of 4 MB. This is a miserable paging scheme, since the scarce resource, physical memory, already carries a 4 MB overhead. Note that Page Tables should stay resident as long as some entries point to resident user pages.
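The 4 MB figure follows directly from the parameters above; a quick sketch of the arithmetic:

```python
PAGE_SIZE = 4 * 1024        # 4 kB pages
ADDRESS_BITS = 32           # 32-bit logical addresses
ENTRY_BYTES = 4             # one 4-byte entry per page

offset_bits = PAGE_SIZE.bit_length() - 1    # 12 bits of page offset
index_bits = ADDRESS_BITS - offset_bits     # 32 - 12 = 20 bits of page number
entries = 1 << index_bits                   # 2**20 = 1,048,576 entries
table_bytes = entries * ENTRY_BYTES         # 4,194,304 bytes = 4 MB
print(index_bits, entries, table_bytes)     # 20 1048576 4194304
```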

Thus the overhead may exceed the total available resource, which is memory! Worse: most entries are empty; their associated pages do not exist yet, so the pointers are null!

[Diagram: Logical Address = Page Address (20) | Page Offset (12), mapped through a 4 MB Page Table onto 4 kB user pages]

Problem: the one data structure that should be resident is too large, consuming much or most of the physical memory that is so scarce, or was so scarce in the 1970s, when VMM was productized. So: break the table overhead into smaller units! Disadvantage of the additional mapping: more memory accesses. Advantage: avoiding the large 4 MB Page Table. We'll forget this paging scheme on 32-bit architectures. On contemporary 64-bit architectures, multi-MB

page sizes are common again.

Realistic Paging Scheme, 32-Bit Architecture

Next try: assume a 32-bit, byte-addressable architecture with 4-byte words and 4 kB page size. This time use a two-level paging mechanism, consisting of a Page Directory and Page Tables in addition to user pages. With 4 kB page size, again the rightmost 12 bits of any page address are 0, implied. All user pages, page tables, and the page directory fit into identically sized page frames!

Mechanism to determine a physical address: start with a HW register pdbr that points to the Page Directory. Alternatively, implement the Page Directory as a special-purpose cache, or place the PD into an a-priori known memory location. Either way, there must be a way to locate the PD, ideally without a memory access.

Intel x86 pdbr, AKA CR3; note the Intel term for a logical address is linear address.

Design principle: have user pages, Page Tables, and the Page Directory look similar; for example, all should consume one integral page frame each. Every logical address is broken into three parts:

1. two indices, generally 10 bits each on a 32-bit architecture with 4 kB page frames
2. one offset, generally 12 bits, to locate any byte within a 4 kB page
3. the total adds up to 32 bits

The Page Directory Index is indeed an index; to find the actual entry in the PD, first shift it left by 2 (i.e. multiply by 4), then add this number to the start address of the PD; thus a PD entry can be found. Similarly, the Page Table Index is an index; to find the entry in a page table, left-shift it by two (<< 2) and

add the result to the start address of the PT found in the previous step; thus an entry in the PT is found. The PT entry holds the address of the user page, yet the rightmost 12 bits are all implied, all 0, hence they don't need to be stored in the page table. The rightmost 12 bits of the logical address define the page offset within the found user page. Since pages are 4 kB in size, 12 bits suffice to identify any byte in a page. Given that the address of the user page is found, add the offset to its address; the final byte address (physical address) is thus identified.
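The complete two-level lookup can be sketched as follows; the page directory and page tables are modeled as plain Python lists and a dict, a deliberate simplification of the real in-memory structures, and all concrete addresses are illustrative assumptions:

```python
PAGE_SHIFT = 12   # 4 kB pages
INDEX_BITS = 10   # 10-bit PD and PT index fields

def translate(la, page_directory, memory):
    """Two-level translation of a 32-bit logical address.
    page_directory: list of 1024 PT base addresses (or None if absent)
    memory: dict mapping a PT base address to its list of 1024 frame addresses
    """
    pd_index = (la >> (PAGE_SHIFT + INDEX_BITS)) & 0x3FF
    pt_index = (la >> PAGE_SHIFT) & 0x3FF
    offset = la & 0xFFF

    pt_base = page_directory[pd_index]   # first memory access: read the PD entry
    page_table = memory[pt_base]
    frame_base = page_table[pt_index]    # second memory access: read the PT entry
    return frame_base + offset           # third access would read the user datum

# Toy setup: PD entry 0x48 points at a page table whose entry 0x345
# maps to the frame at physical address 0x9A000 (all values made up).
pd = [None] * 1024
pd[0x48] = 0x5000
mem = {0x5000: [None] * 1024}
mem[0x5000][0x345] = 0x9A000
print(hex(translate(0x12345678, pd, mem)))  # 0x9a678
```

This also makes the cost visible: every user-level memory reference implies two extra reads (PD entry, PT entry) unless those entries are cached in the TLB.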

[Diagram: Logical Address = PD Index | PT Index | Page Offset; pdbr locates the Page Directory, which points to Page Tables, which point to User Pages]

Disadvantage: multiple memory accesses, up to 3 in total; it can be worse if the page-replacement algorithm also has to search through memory. In fact, it can become far worse, since any of these

3 accesses could cause page faults, resulting in 3 swap-ins, with disk accesses and many memory references. It is a good design principle never to swap out the page directory, and rarely if ever the page tables! Some of these 3 accesses may also cause a swap-out, if a page frame has to be freed, making matters even worse. The performance loss can be tremendous, i.e. several decimal orders of magnitude slower than a single memory access. Thus some of these VMM-related data structures

should be cached. In all higher-performance architectures, e.g. the Intel Pentium Pro, a Translation Look-Aside Buffer (TLB) serves as a special-purpose cache for PD and PT entries. It is also possible to cache the complete PD, since it is contained in size (4 kB). Conclusion: VMM via paging works only if locality is good. Said another way: paged VMM works well only if the working set is large enough; else there will be thrashing.

Typical PD and PT Entries

A fundamental assumption (design requirement) is that all PD and PT entries be 4 bytes, i.e. 32 bits, long. Entries in the PD and PT need 20 bits for the address; the lower 12 bits are implied zeros, with no need to store them! The left-over 12 bits (beyond the 20) in any PD or PT entry can be used for additional, crucial information, for example: P-Bit, AKA present bit, indicating whether the referenced page is present (AKA resident). Modified bit, AKA dirty bit: has the page experienced a store?

R/W/E bits: can the referenced page be read, written, executed, or all? User/Supervisor bit: OS dependent; is the page reserved for privileged code? Typically the P-Bit is positioned for quick, easy access: the rightmost bit in the 4-byte word. Some information is unique to the PD or PT. For example, a user page may be shared (global bit set in the PT), but a PT may never be shared between multiple processes; hence no global bit is needed in the PD. On systems with varying page sizes, page

size information must be recorded. See the examples below:

Page Directory Entry: Page Table Address (20) | Unused Bits | Accessed Bit | User/Supervisor | Read/Write | Present Bit
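Decoding such an entry is plain bit masking. The sketch below assumes bit positions loosely following the x86 convention (present = bit 0, read/write = bit 1, user/supervisor = bit 2, accessed = bit 5, dirty = bit 6), with the 20-bit address in the top bits; the sample entry value is made up:

```python
def decode_entry(entry: int) -> dict:
    """Extract flag bits and the 20-bit frame address from a 32-bit entry."""
    return {
        "present":    bool(entry & 0x1),    # rightmost bit for quick access
        "read_write": bool(entry & 0x2),
        "user":       bool(entry & 0x4),
        "accessed":   bool(entry & 0x20),
        "dirty":      bool(entry & 0x40),
        "frame_addr": entry & 0xFFFFF000,   # low 12 bits are flags, not address
    }

# A present, writable entry pointing at frame 0x0009A000:
e = decode_entry(0x0009A003)
print(e["present"], e["read_write"], hex(e["frame_addr"]))  # True True 0x9a000
```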

If the operating system exercises a policy to never swap out Page Tables, then the respective PT entries in the PD do not need to track whether a PT was modified (i.e. no Dirty bit). Hence there may be reasons for the data structures of PT and PD entries to exhibit minor differences. But in general, the format and structure of PTs and PDs are very similar, and the sizes of user pages, PTs, and PDs are identical.

Page Table Entry: User Page Address (20) | Dirty Bit | Accessed Bit | User/Supervisor | Read/Write | Present Bit

Page Replacement Using FIFO

When a page needs to be swapped in or created for the first time, it is placed into an empty page frame. How does the VMM find an empty page frame? If no free frame is available, an existing page must be removed from its page frame; this page is referred to as the victim page. The victim page may even be pirated from another process's set of page frames, but this is an OS decision; the user generally has no such control! If the replaced page, the victim page, was modified, it must be swapped out before its frame can be overwritten, to preserve the recent changes. Priority for the VMM is to locate a victim page that is not dirty.

Otherwise, if the page is unmodified, it can be overwritten without a swap-out, as an exact copy already exists on mass storage; yet it must be marked as not present in its PT entry. The policy used to identify a victim page is called the replacement policy; the algorithm is named the replacement algorithm. Typical replacement policies are the FIFO method, the random method, or the LRU policy, similar to replacing cache lines in a data cache. When storing the age, it is sufficient to record relative ages, not necessarily the exact time when a page was created or referenced! Again, like a data cache line.

For example, LRU information may be recorded implicitly by linking user pages in a linked-list fashion, removing the victim at the head and adding a new page at the tail. This MAY cause many memory references, making such a method of repeated lookups in memory impractical!

FIFO Sample, Track Page Faults

In the two examples below, the numbers refer to 5 distinct user pages that shall be accessed. We use the following reference string of these 5 user pages, stolen with pride from [7]: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. We'll track the number of page faults and hits caused by an assumed FIFO replacement algorithm. Strict FIFO age tracking requires an access to page table entries for each page reference! See later some significant, even crude yet effective, simplifications in Unix!

Sample 1: In this first example, we use 3 physical page frames for placing 5 logical pages:

Track the number of page faults, given 3 frames! Clearly, when a page is referenced for the first time, it does not exist, so the memory access causes a page fault automatically. (In the original slides, accesses causing a fault are highlighted in red; here, every listed placement is a fault.)

User pages placed into page frames, in order:
Page frame 0: 1, 4, 5
Page frame 1: 2, 1, 3
Page frame 2: 3, 2, 4

We observe a total of 9 page faults for the 12 page references in Sample 1. Would it not be desirable to have a smaller number of page faults? With fewer faults, performance improves, since less swap-in and swap-out activity is required.

Generally, adding computing resources improves SW performance.

Sample 2: Run the same page-reference example with 4 page frames now; again, every listed placement is a fault: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

User pages placed into page frames, in order:
Page frame 0: 1, 5, 4
Page frame 1: 2, 1, 5
Page frame 2: 3, 2
Page frame 3: 4, 3

A strange thing happened: now the total number of page faults has increased to 10, despite having more resources: we now have one additional page frame!
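The two samples can be checked with a short FIFO simulation:

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults for a reference string under FIFO replacement."""
    frames = deque()            # oldest resident page sits at the left end
    faults = 0
    for page in refs:
        if page not in frames:  # page fault: page is not resident
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()  # evict the oldest resident page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9 faults with 3 frames
print(fifo_faults(refs, 4))  # 10 faults with 4 frames: Belady's anomaly
```

Running the same reference string with more frames yields more faults, reproducing the anomaly discussed above.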

The phenomenon is known as the infamous Belady Anomaly.

Appendix: Some Definitions

Alignment
Attribute of some memory address A, stating that A must be evenly divisible by some power of two.

For example, word-aligned on a 4-byte, 32-bit architecture means an address is divisible by 4, i.e. the rightmost 2 address bits are both 0.

Demand Paging
Policy that allocates a page frame in physical memory only if an address on that page is actually referenced (demanded) by the executing program.

Dirty Bit
Single-bit data structure that tells whether the associated page was written after its last swap-in (or creation).

Global Page
A page that is used in more than one program. Typically found in a multi-programming environment with shared pages.

Logical Address
Address as defined by the architecture; the address as seen by the programmer or compiler. Synonyms: virtual address, and on Intel architectures, linear address. Antonym: physical address.

Overlay

Before the advent of VMM, the programmer had to manually relocate information out, to free memory for the next data. This reuse of the same data area was called "overlaying" memory.

Page
A portion of the logical address space of a particular size and alignment. The start address of a page is an integral multiple of the page size; thus it is also page-size aligned.

A logical page is placed into a physical page frame. Antonym: segment.

Page Directory
A page holding a list of addresses of Page Tables. Typically this directory consumes an integral number of pages as well, ideally exactly one page. In addition to page table addresses, each entry also contains information about

presence, access rights, written-to or not, global, etc., similar to Page Table entries.

Page Directory Base Register (pdbr)
HW resource (typically a machine register) that holds the address of the Page Directory page. Due to alignment constraints, it may be shorter than 32 bits on a 32-bit architecture; bits are saved due to the defined alignment constraint. See Intel x86 CR3, AKA pdbr.

Page Fault
A logical address references a page that is not resident. Consequently, space must be found for the referenced page, and that page must either be created or else swapped in if it existed before. The page is placed into a page frame.

Page Frame
A portion of physical memory that is aligned and fixed in size, able to hold one page; it doesn't always hold a page of information. It starts at a boundary evenly divisible by the page size. Total physical memory should be an integral multiple of the page frame size; else we see fragmentation. Logical memory is an integral multiple of the page size by definition.

Page Table Base Register (ptbr)

Rare! A resource (typically a machine register) that holds the address of the Page Table. Used in a single-level paging scheme; in a dual-level scheme, the pdbr is used instead.

Page Table
A list of addresses of user pages. Typically each page table consumes an integral number of pages, e.g. 1. In addition to page addresses, each entry also contains information about presence,

access rights, dirty, global, etc., similar to Page Directory entries.

Physical Memory
Main memory actually physically available on a processor. Antonym (or related): logical memory. Historically, small physical memories were a driving force behind the development of paged VMM. Since the advent of 64-bit computing, the driving force for VMM is the impracticality

of all memory being physically available.

Present Bit
Single-bit HW data structure that tells whether the associated page is present or not. If not, it may be swapped out on disk, or perhaps must be created for the first time. Synonym: resident.

Resident (adj.)
Attribute of a page (or of any memory object) referenced by executing code: if the object is physically in memory, it is resident; if not in memory, it is non-resident.

Swap-In
Transfer of a page of information from secondary storage to primary storage, fitting into a page frame in memory. A physical move from disk to physical

memory. A necessary requirement is that a page frame exists. Antonym: Swap-Out.

Swap-Out
Transfer of a page of information from primary to secondary storage; from physical memory to disk. Antonym: Swap-In. Caused by the need to evict; eviction is generally caused by a process exceeding its limited

budget of page frames and asking for a new frame.

Thrashing
An excessive amount of swapping. When this happens, performance is severely degraded. It is an indicator that the working set is too small. The cause can be a memory-greedy process or an inappropriate grant of page frames, defined by OS fiat!

Translation Look-Aside Buffer
Special-purpose cache for storing recently used Page Directory and Page Table entries. Physically very small, yet effective. Lovingly referred to as the TLB.

Virtual Contiguity

Memory management policy that separates physical from logical memory. In particular, virtual contiguity creates the impression that two logical addresses n and n+1 are adjacent in logical memory, while in reality they may be an arbitrary number of physical locations apart. Usually, they are an integral number (positive or negative) of page-size bytes apart, plus 1.

Virtual Memory

Memory management policy that separates physical from logical memory. In particular, virtual memory can create the impression that a larger amount of memory is addressable than is really available on the target. It can also create the impression that 2 logical addresses A and A+1 are adjacent to one another, AKA contiguous.

Working Set
The Working Set of page frames is the number of allocated physical frames needed to

guarantee that a program executes without thrashing. The working set is unique to each piece of software; moreover, it can vary with the input data of a particular execution of that piece of SW. The OS is responsible for tracking the number of page faults and, if needed, for increasing the working set.

Bibliography

1. Denning, Peter: 1968, "The Working Set Model for Program Behavior", ACM Symposium on Operating Systems Principles, Vol. 11, No. 5, May 1968, pp. 323-333
2. Organick, Elliott I. (1972). The Multics System: An Examination

of Its Structure. MIT Press
3. Alex Nichol, Virtual Memory in Windows XP
4. Multics history: ttp://
5. KDF-9 development history: tric/EnglishElectric.KDF9.1961.102641284.pdf
6. Wiki paged VMM: %20Management/MM10-paging.html
7. Silberschatz, Abraham and Peter Baer Galvin: 1998, Operating Systems Concepts, Addison Wesley, 5th edition
8. Intel x86 control registers, including one with pdbr function, known as CR3
