This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Memory management services in the Linux operating are performed by a computer hardware component called Memory Management Unit . The memory management unit translates physical memory addresses to linear addresses. MMU's functions also include translation of virtual addresses to physical addresses , cache control, memory protection and bus arbitration. Microprocessors have built in hardware circuits to improve the ease of memory management and to prevent illegal memory accesses. In this paper only Linux versions 2.6.x and above are considered.
Linux supports both paging and segmentation but segmentation is not used frequently.
Linux prefers paging over segmentation because:
a)Memory management is simpler when all processes share the same set of linear addresses.
b)Some architectures do not support segmentation. ex: RISC
2. Memory Addressing :
There are three different address spaces in Linux:
3) Physical address
The Memory Management Unit transforms a logical address into a linear address using the segmentation unit and the linear address into a physical address using the paging unit. Both the units are hardware components.
Microprocessors perform address translation in two different ways:
1) Real mode : exists mostly to maintain processor compatibility with older models and to allow the operating system to bootstrap
2) Protected mode : Used majority of the time
2.1 Segmentation in Hardware:
All processes, either in User Mode or in Kernel Mode, may use the same logical addresses. A logical address consists of two parts: a segment identifier and an offset that specifies the relative address within the segment. The segment identifier is actually a 16-bit field called the Segment Selector . The processor provides segmentation registers to hold Segment Selectors for easy access of Segmentation .
These segmentation registers are :
1) cs : The code segment register, which points to a segment containing program instructions
2)ss : The stack segment register, which points to a segment containing the current program stack
3)ds : The data segment register, which points to a segment containing global and static data
The remaining three segmentation registers are general purpose and may refer to arbitrary data segments.
Each segment is represented by an 8-byte Segment Descriptor that describes the segment characteristics. Segment Descriptors are stored either in the Global Descriptor Table (GDT ) or in the Local Descriptor Table(LDT).
a)Global Descriptor Table : The GDT holds all the segment descriptors and also the Task State Segment descriptors, Local Descriptor Table descriptors and Call Gate descriptors.
b)Local Descriptor Table : It contain memory segments which are private to every specific program
The segmentation unit in the MMU performs the following operations:
a)Examines the TI field (table indicator field that indicates whether the segment is in the LDT or GDT) of the Segment Selector to determine which Descriptor Table stores the Segment Descriptor.
b)Computes the address of the Segment Descriptor from the index field of the Segment Selector. The index field is multiplied by the size of a Segment Descriptor(usually 8), and the result is added to the content of the global descriptor table register or local descriptor register.
c)Adds the offset of the logical address to the Base field of the Segment Descriptor, thus obtaining the linear address.
2.2 Segmentation in the OS :
It is seldom used as paging is preferred to it.
Linux uses four main segments :
a)user code - used when process is running in user mode
b)user data - used when process is running in user mode
c)kernel code - used when process is running in kernel mode
d)kernel data - used when process is running in kernel mode
Linux Global Descriptor Table:
In uniprocessor systems there is only one GDT, while in multiprocessor systems there is one GDT for every CPU in the system. Each GDT includes 18 segment descriptors and 14 null, unused, or reserved entries. Unused entries are inserted on purpose so that Segment Descriptors usually accessed together are kept in the same 32-byte line of the hardware cache. 
They are :
a)Four user and kernel code and data segments .
b)A Task State Segment (TSS), different for each processor in the system.
c)A segment including the default Local Descriptor Table (LDT), usually shared by all processes.
d)Three Thread-Local Storage (TLS) segments: this is a mechanism that allows multithreaded applications to make use of up to three segments containing data local to each thread.
e)Three segments related to Advanced Power Management (APM) .
f)Five segments related to Plug and Play (PnP ) BIOS services.
g)A special TSS segment used by the kernel to handle "Double fault " exceptions.
Linux Local Descriptor Table:
The kernel defines a default LDT to be shared by most processes.
It includes five entries, but only two of them are effectively used by the kernel:
a)Call gate for iBCS executables
b)Call gate for Solaris /x86 executables
Call gates are used to change the privilege level of the CPU when using a pre-defined function.
2.2 Paging in Hardware:
The basic unit of memory under Linux is a page. For a process to run only the user structure and the page tables are required to be in the memory. The size of a page depends on the processor architecture. Usually the page size in Linux is 4KB in most systems. Some systems like SPARC by Sun Microsystems have a page size of 8KB.
The main issue with having a large page size is that we usually won't need the entire size of the page and it is not very convenient as it wastes space. So to tackle this issue, Linux provides several features:
1) A slab allocator can be used to allocate smaller kernel objects in the kernel.
2) Heap management functions in the Standard C library can be used to allocate memory of any size we want in the kernel mode. An alternative to this is to create your own heap manager on top of Linux Kernel System calls.
The paging unit translates linear addresses into physical ones. It checks the requested access type against the access rights of the linear address. If the memory access is not valid, it generates a Page Fault exception. Each page frame contains a page. The length of a page frame coincides with that of a page. A page frame is a constituent of main memory, and hence it is a storage area. The data structures that map linear to physical addresses are called page tables.
The linear address is divided into 3 fields :
a)Directory b)Field c)Offset
The translation of linear addresses is accomplished in two steps, each based on a type of translation table. The first translation table is called the Page Directory, and the second is called the Page Table. The two-level scheme reduces the memory by requiring Page Tables only for those virtual memory regions actually used by a process.
Linux also supports extended paging which is used to translate large contiguous linear address ranges into corresponding physical ones. The kernel can do without intermediate Page Tables while using extended paging and can thus save memory and preserve TLB entries.
Paging in the OS :
Paging is implemented partly by the kernel and partly by a process called the page daemon.
Once it is started if it finds that the number of pages on the list of free memory pages is too low, it initiates action to free up more pages. Text segments and mapped files are paged to their respective files on disk. Everything else is paged to either the paging partition or one of the fixed-length paging files, called the swap area.
Linux paging model used to have three paging levels. Starting with version 2.6.11, a four-level paging model was used.
The four types of page tables are:
a)Page Global Directory
b)Page Upper Directory
c)Page Middle Directory
The Page Global Directory includes the addresses of several Page Upper Directories, which in turn include the addresses of several Page. Middle Directories, which in turn include the addresses of several Page Tables. Each Page Table entry points to a page frame. So a linear address can be split into up to five parts.
The directory fields are used as an index to the appropriate page directory.one for each process. The value found is a pointer to one of the next level directories, which are indexed by a field from the virtual address. The selected entry in the middle page directory points to the final page table, which is indexed by the page field of the virtual address. The entry found here points to the page needed.
3.1 The Page Replacement Algorithm
Linux tries to keep some pages free so they can be claimed as needed. But in some cases pages will have to be reclaimed by the Operating system. This is done using the Page Replacement Algorithm .
Linux distinguishes between four different types of pages: un-reclaimable, swappable, sync-able, and discardable. Un-reclaimable pages(reserved or locked pages, kernel mode stacks, etc.,) may not be paged out. Swappable pages must be written back to the swap area or the paging disk partition before the page can be reclaimed. Sync-able pages must be written back to disk if they have been marked as dirty. Discardable pages can be reclaimed immediately.
At boot time, init starts the kswapd page daemon (one per each memory node) and configures them to run periodically. Each time kswapd awakens, it checks to see if there are enough free pages available, by comparing the low and high watermarks with the current memory usage for each memory zone. If there is enough memory, it goes back to sleep. If the available memory for any of the zones falls below a threshold, kswapd initiates the page frame reclaiming algorithm.
During each run, only a certain target number of pages is reclaimed, typically 32.
The number of reclaimed pages and the total number of scanned pages are configurable parameters. Each time PFRA executes, it first tries to reclaim easy pages, then proceeds with the harder ones.
a)Discardable and unreferenced pages can be reclaimed immediately by moving them onto the zone's free list.
b) Next it looks for pages with backing store which have not been referenced recently, using a clock-like algorithm.
c) Then shared pages that none of the users seem to be using .
d) Ordinary user pages are searched next, and if chosen to be evicted, they must be scheduled for write in the swap area.
e) Finally, if a page is invalid or absent from memory, shared, locked in memory, or
being used for DMA, it is skipped.
There is a loop in the algorithm which scans through each zone's active and inactive lists, trying to reclaim different kinds of pages, with different urgencies.
PFRA maintains two flags per page:
b)referenced or not.
These two flags encode four states. During the first scan of a set of pages, PFRA first clears their reference bits. If during the second run over the page it is determined that it has been referenced, it is advanced to another state from which it is less likely to be reclaimed. Otherwise, the page is moved to a state from where it will more likely to be evicted.
Pages on the inactive list, which have not been referenced since the last time they were inspected, are best candidates for eviction. These correspond to pages with both PG_active and PG_referenced equal to zero However, if necessary, pages may be reclaimed even if they are in some of the other states. These are represented by the refill arrows in the state diagram.
The reason PRFA maintains pages in the inactive list although they might have been referenced, is to prevent certain specific situations like when a process makes periodic accesses to different pages within a 1-hour period. A page accessed since the last loop will have its reference flag set. Since it will not be needed again for the next hour, there is no reason not to consider it as a
candidate for reclamation.
States of the PFRA algorithm : pfra.jpg
4. Dynamic Memory Management :
Dynamic memory management in Linux as handled by the kernel can be split into two main parts namely, how it handles contiguous memory and how it handles non-contiguous memory.
4.1 Handling Contiguous Memory :
There are two main techniques used to handle contiguous memory.
They are :
1) Page Frame Management 2) Memory Area Management
4.1.1 Page Frame Management :
Linux maintains an array of page descriptors, of type page for each physical page frame in the system, called mem_map. Page Descriptors are used to keep track of the current state of each page frame. State information of the page frame is maintained in them. Each page descriptor contains a pointer to the address space it belongs to. They are used to determine if a page frame is free or not and to distinguish between frames used to contain pages that belong to processes and those that have kernel code or kernel data structures.
Linux also supports Non-uniform Memory Access(NUMA) model in which the access time for different memory locations on the CPU may vary. So to tackle this physical memory is partitioned into nodes. Access time within each node is the same but access times across different nodes may be different. The kernel tries to minimize the access to the costlier nodes.
126.96.36.199 Zones :
The memory inside each node is divided into several zones. This is done to deal with the hardware constraints of 80x86 architecture. The constraints are:
a) DMA processor can address only the first 16 MB.
b) The CPU can't directly access all the physical memory because linear address spaces are too small.
There are three major zones :
They are pages that can be used for DMA operations. It contains page frames of memory below 16 MB.
They are pages that are mapped normally. It contains page frames that are greater than or equal to 16 MB and less than 896 MB.
These are pages with high memory addresses that are not permanently mapped. It contains page frames that are at or above 896 MB.
188.8.131.52 Reserved Page Frames :
Memory allocation requests can be satisfied in two different ways. If enough free memory is available, the request can be satisfied immediately. Otherwise, some memory needs to be reclaimed and the kernel control path that made the request is blocked until additional memory has been freed. Some kernel control paths cannot be blocked while requesting memory (ex : when handling an interrupt or when executing code inside a critical region). In these cases, a kernel control path should issue atomic memory allocation requests
An atomic request never blocks, if there are not enough free pages, the allocation simply fails.
Although there is no way to ensure that an atomic memory allocation request never fails, the kernel tries to minimize this. The kernel reserves a pool of page frames for atomic memory allocation requests to be used only on low-on-memory conditions for this purpose.
184.108.40.206 Zone Page Frame Allocator :
"The kernel subsystem that handles the memory allocation requests for groups of contiguous page frames is called the zoned page frame allocator ". 
The zone allocator receives the requests for allocation and de-allocation of dynamic memory. In the case of allocation requests, the component searches a memory zone that includes a group of contiguous page frames that can satisfy the request. Inside each zone, page frames are handled by a component named buddy system. To get better system performance, a small number of page frames are kept in cache to quickly satisfy the allocation requests for single page frames . There are two caches for each memory zone and for each CPU: a hot cache , which stores page frames whose contents are likely to be included in the CPU's hardware cache, and a cold cache.
220.127.116.11 Mapping in High Memory Regions :
The kernel uses three different mechanisms to map page frames in high memory. They are called permanent kernel mapping, temporary kernel mapping and non-contiguous memory allocation.
Permanent kernel mapping allow the kernel to establish long-lasting mappings of high-memory page frames into the kernel address space. They use a dedicated Page Table in the master kernel page tables .
Temporary kernel mappings are simpler to implement than permanent kernel mappings. They can be used inside interrupt handlers and deferrable functions, because requesting a temporary kernel mapping never blocks the current process. Every page frame in high memory can be mapped through a window in the kernel address space. A Page Table entry is reserved for this purpose.
18.104.22.168 Zone Allocator :
"The zone allocator is the frontend of the kernel page frame allocator".  It must locate a memory zone that includes a number of free page frames large enough to satisfy the memory request.
Its main goals are :
1)It should protect the pool of reserved page frames .
2)It should trigger the page frame reclaiming algorithm when memory is scarce and blocking the current process is allowed; once some page frames have been freed, the zone allocator will retry the allocation.
3)It should preserve the small, precious ZONE_DMA memory zone.
22.214.171.124 The Buddy System Algorithm:
Linux uses a different buddy system for each zone. The kernel uses this algorithm for allocating groups of contiguous page frames. The biggest issue that this algorithm needs to handle is external fragmentation . As a result, it may become impossible to allocate a large block of contiguous page frames, even if there are enough free pages to satisfy the request.
Block allocation through this algorithm is done in the following way. All free page frames are grouped into 11 lists of blocks that contain groups of 2^n 1<=n<=10 contiguous page frames, respectively. "The largest request of 1024 page frames corresponds to a chunk of 4 MB of contiguous RAM."
The algorithm checks first to see whether a free block exists in the list specified in the request. If there is no such block, the algorithm looks for the next larger free block. If such a block exists, the kernel allocates the required number of page frames from the bigger block to satisfy the request and inserts the remaining page frames into the list of smaller frame blocks. If even that block size is not enough it looks for the next larger block. If such a block exists, it allocates the required number of frames to satisfy the request, and then inserts the rest after it into the lower size lists in decreasing order of size. (i.e. if 384 is left then first 256 frames go into the list of free 256-page-frame blocks and the remaining 128 go into the list of free 128-page-frame-blocks. Two blocks are considered buddies if both blocks have the same size.
Releasing blocks of page frames is done by the kernel in which it attempts to merge pairs of free buddy blocks of size x together into a single block of size 2x. They are located in contiguous physical addresses. The physical address of the first page frame of the first block is a multiple of 2 * x * 2^12 . The algorithm is iterative. If it succeeds in merging released blocks, it doubles x and tries again so as to create even bigger blocks.
4.2 Memory Area Management :
This part deals with sequences of memory cells having contiguous physical addresses and an arbitrary length. Data structures that describe how small memory areas are allocated within the same page frame are used. However internal fragmentation can be caused by a mismatch between the size of the memory request and the size of the memory area allocated to satisfy the request. To resolve this issue memory areas whose sizes are geometrically distributed and the size depends on a power of 2 rather than on the size of the data to be stored. Internal fragmentation is always smaller than 50 percent by doing this. The kernel creates 13 geometrically distributed lists of free memory areas whose sizes range from 32 to 131, 072 bytes. The buddy system is invoked both to obtain additional page frames needed to store new memory areas and, conversely, to release page frames that no longer contain memory areas. A dynamic list is used to keep track of the free memory areas contained in each page frame.
4.2.1 Slab Allocator :
The slab allocator takes blocks using the buddy algorithm but then carves slabs(smaller units)
from them and manages the smaller units separately. This is done to reduce internal fragmentation.
The kernel relies on object caches to create and destroy objects of a certain type. These caches consist of pointers to one or more slabs which can store a number of objects of the same type. Each of the slabs may be full, partially full, or empty.
When the kernel needs to allocate a new process descriptor/task it looks in the object cache for task structures, and first tries to find a partially full slab, and allocate a new task object there. If no such slab is available, it looks through the list of empty slabs. Finally, if necessary, it
will allocate a new slab, place the new task structure there, and link this slab with the task structure object cache.
Caches divided into two types: general and specific. General caches are used only by the slab allocator for its own purposes, while specific caches are used by the remaining parts of the kernel.
A newly created cache does not contain a slab . New slabs are assigned to a cache only when both of the following are true:
a)A request has been issued to allocate a new object.
b)The cache does not include a free object.
Slabs can be destroyed in two cases:
a)There are too many free objects in the slab cache.
b)A timer function invoked periodically determines that there are fully unused slabs that can be released.
4.2.2 Memory Pools :
Memory pool allows kernel components such as the block device subsystems to allocate some dynamic memory to be used only in low-on-memory emergencies. If dynamic memory becomes so less that all usual memory allocation requests will fail, the kernel component can invoke special memory pool functions that use the reserve and get the memory needed.
4.3 Noncontiguous Memory Area Management:
Linux uses noncontiguous memory areas for many purposes like to allocate data structures for active swap, to allocate space for a module, to allocate buffers to some I/O drivers.
This is used when the requested memory needs to only be contiguous in virtual space, but not in physical memory. One exception where this cannot be used are with devices that do not understand virtual addresses. Its use results in performance degradation, and so is primarily used for allocating large amounts of contiguous virtual address space, such as for dynamically inserting kernel modules. The main advantage is to avoid external fragmentation, while the disadvantage is that it is necessary to work with the kernel Page Tables.
All fragmentation issues have already been handled by Linux . The memory management schemes are pretty perfect as they are. No other known issues need to be handled. Hence all the main aspects of memory management in Linux have been discussed to a reasonable extent.