UNDOCUMENTED CORNER

Understanding Pentium's 4-MB Page Size Extensions

Robert R. Collins

Robert is a design verification manager at Texas Instruments' Microprocessor Design Center. He can be reached via e-mail at rcollins@ti.com.


It's been more than three years since Intel first published the Pentium Family User's Manual. The Manual omitted discussion of some new, advanced programming features. Intel originally planned to release this information in its manuals, but instead, put this information in a document commonly referred to as "Appendix H" (formally known as the Supplement to the Pentium Processor User's Manual) and required recipients to sign a 15-year nondisclosure agreement (NDA). This decision has been the focus of a controversy concerning Intel's right to protect its intellectual property versus the rights of all programmers to have access to information that will benefit their programs. Another point of contention is the NDA itself. Intel claims that anybody needing this information will never be denied it, as long as they sign the NDA. But several stories have circulated regarding programmers being denied because Intel claims they don't need the information. This has spawned a community of programmers dedicated to reverse engineering these features and publishing their findings on Internet newsgroups and the World Wide Web. But is all of this necessary?

Intel has promised that the not-yet-released Pentium Pro Processor Family Developer's Manual will contain information on many of these advanced features, perhaps even a description of 4-MB paging.

Four-MB paging allows the operating system to access very large data structures without constantly referencing the Translation Lookaside Buffer (TLB), which is used by the processor to cache virtual-to-physical address translations for the most recently used pages of memory. This feature is most useful to operating-system developers who want a single page of memory dedicated to the OS kernel or a large data structure, such as a video-frame buffer. Information about 4-MB paging has been publicly documented by Intel--but you need to know where to look to find it. In order to get a complete description of Pentium's 4-MB pages, you need to read both the Pentium Family User's Manual, Volume 3 (P/N 241430) and the i860TM XP Microprocessor Data Book (P/N 240874).

In the Pentium manuals, there are at least nine references to 4-MB pages. This is a good start to reverse engineering 4-MB pages. These references give you the necessary clues to write software that unlocks the secrets of page-size extensions (PSE). However, such an effort is unnecessary. The Intel i860 XP processor documentation claims the i860 XP is page-level compatible with the Intel 386, Intel 486, and Pentium processors. This compatibility is noteworthy because the i860 XP also supports 4-MB pages, and its documentation provides a complete description of the 4-MB paging mechanism (see i860TM XP Microprocessor Data Book, section 2.4). All that's needed to obtain an Appendix H description of 4-MB pages are a few references from the Pentium manuals and the description of 4-MB pages from the i860 XP manual.

A 4-KB Page Backgrounder

When paging is enabled, linear addresses (program-visible addresses) are mapped to physical addresses (bus addresses). Paging makes it possible to execute programs much larger than the computer's available amount of memory. When the microprocessor needs more memory, it generates a page fault to demand that a portion of memory be swapped between the hard disk and main memory. Memory is partitioned into contiguous blocks, called "page frames." Each page frame is 4 KB. The Pentium paging mechanism consists of the following:

The PDBR is CR3, and points to the base of the page directory. Each page-directory entry (PDE) points to the page tables for 4 MB of memory. The PDE contains control information and the pointers to the page tables. Like the PDE, each page-table entry (PTE) contains control information, but points to a 4-KB page frame.

Linear addresses are converted to physical addresses by using a 20-bit pointer in a page table and combining it with the low-order 12 bits of the linear address to form a 32-bit physical address. For purposes of conversion, the linear address is broken into three parts:

The upper 20 bits of the PTE are then combined with the low-order 12 bits of the linear address to form the physical address. There is a direct relationship between the sizes of these three fields and the page size. The lower 12 bits can address 212, or 4 KB of memory. Hence, each PTE controls 4 KB of memory. The amount of memory controlled by each PDE is determined by the number of address bits used as an index into the page table, plus the number of bits used as the page-frame index. The PTE index is 10 bits, and the page-frame index is 12 bits, making 222, or 4 MB of memory controlled by each PDE. This association will be important in understanding the 4-MB paging mechanism.
Figure 1 shows how linear addresses are translated to physical addresses for 4-KB pages.

Making the Jump to 4-MB Pages

With an understanding of the 4-KB paging mechanism, it's not difficult to deduce the 4-MB paging mechanism. Recall that each page-directory entry controls 4 MB of memory. Now imagine how Figure 1 would look if the page-table lookup were eliminated. The page-frame index would increase from 12 bits to 22 bits, thus allowing direct control of a 4-MB page size. The 20-bit pointer in the page directory would be reduced to a 10-bit pointer, pointing directly to the 4-MB page frame of memory. With the page-table lookup eliminated, the page directory points directly to a 4-MB page frame. This describes how 4-MB pages are implemented in the i860 XP (i860TM XP Microprocessor Data Book, section 2.4). But the question remains: Are 4-MB i860 XP pages compatible with 4-MB Pentium pages? To answer that question, we need to compare the i860 and Pentium manuals.

The i860 manual claims that the i860 4-KB paging mechanism is compatible with the x86 implementation. A comparison of page-directory format and page-table format substantiates this claim. The page-size (PS) bit of the i860 page directory shares the same location as the Pentium's PS bit (see i860TM XP Microprocessor Data Book, Figure 2.13). With this information, you can assume they are compatible, and look more closely at the Pentium manual for the mechanics of enabling and using 4-MB pages.

Volume 3 of the Pentium manual describes how CR4.PSE enables PSEs and 4-MB pages, but refers you to Appendix H for more information. Later in the Pentium manual, bit 7 of the PDE is identified as the PS bit. Without CR4.PSE=1, the Pentium will always use Intel 486-compatible (4-KB) paging, regardless of the setting of the PDE.PS bit. Similarly, when CR4.PSE=1, and PDE.PS=0, Pentium still uses Intel 486-compatible 4-KB pages. But when CR4.PSE=1, and PDE.PS=1, Pentium uses an i860 XP-compatible 4-MB paging translation.

The linear address for a 4-MB page is converted to a physical address in much the same manner as 4-KB pages. However, the access to the page table is omitted. The high-order 10 bits form an index into the page directory. The page directory no longer contains a 20-bit pointer to a page table, but instead contains a 10-bit pointer to the 4-MB page frame of memory. This convention mandates that all 4-MB pages reside on 4-MB boundaries. The 10-bit pointer in the page directory then is combined with the low-order 22 bits of the linear address to form the 32-bit physical address.

Figure 2 describes the 4-MB and 4-KB paging translation mechanism. Ironically, Figure 11-16 in Pentium Processor Family Developer's Manual, Volume 3, 1993 edition, contained a virtually identical picture. Intel obviously recognized the significance of this pictorial representation of 4-MB pages. Subsequent editions of the Pentium manual were substantially modified to remove the visual representation of the 4-MB paging mechanism.

Side-effects and Caveats of 4-MB Pages

There are side-effects and caveats to enabling 4-MB pages. Consider the following excerpt from the Pentium Processor Family Developer's Manual, Volume 3, section 23.2.14.1, which discusses compatibility with previous Intel processors:

A Page Fault exception occurs when a 1 is detected in any of the reserved bit positions of a page table entry, page directory entry, or page directory pointer during address translation by the Pentium processor.

In other words, if any reserved bit in the PDE or PTE is 1, a page fault will occur. This does not occur when CR4.PSE=0, but does when PSEs are enabled (CR4.PSE=1). Every bit in CR4 enables a behavioral extension to the Intel 486 processor. In essence, CR4 bits enable/disable incompatibilities with the Intel 486. Therefore, it is a natural extension of enabling 4-MB pages to enable more rigorous type checking of the PDE and PTE. Unfortunately, even then, the aforementioned reference isn't completely accurate. Setting some reserved bits does generate an exception, while setting others does not. This behavior contradicts the Intel documentation. If the Pentium was originally intended to behave as documented, then the documentation didn't get modified to accurately reflect the correct behavior when relaxed type checking for reserved bits was implemented. Table 1 shows all of the Pentium paging structures. All positions in the PDE and PTE marked as reserved will generate a page-fault exception when CR4.PSE=1. All positions in CR3, the PDE, and PTE marked as "0" are reserved, but don't generate a page fault when CR4.PSE=1. Table 2 describes the meaning of all of the fields listed in Table 1.

It might be tempting to believe that the "page-directory pointer" is another name for the CR3 register. This assumption would be incorrect. Actually, the mention of the page-directory pointer is a mistake. This refers to a paging structure for a new paging feature that was to be implemented in the Pentium. This new paging feature was allegedly implemented in beta silicon, but removed before production, and now appears in the Pentium Pro. I'll discuss this in my next column.

The Intel documentation also doesn't tell the whole story of the error code generated by page faults. When CR4.PSE=1, and a 1 is detected in a reserved-bit position of the PDE or PTE, the page-fault error code indicates that an attempt was made to set a reserved bit in a paging structure. This indication is reflected in bit 3 of the page-fault error code. If set to 1, then an attempt was made to set a reserved bit in the PDE or PTE. In Figure 14-7 of the Pentium Processor Family Developer's Manual, Volume 3, 1993 edition, this behavior was correctly documented, but it was removed in subsequent editions. Table 3 shows an accurate representation of the page-fault error code, as shown in the 1993 edition of the Pentium manual.

TLB Translation

According to the 1995 edition of the Pentium user's manual, the Pentium has one code TLB and two data TLBs (Pentium Processor Family Developer's Manual, Volume 1, 1995 edition, section 33.2.1.2). The data TLBs consist of a 64-entry TLB for 4-KB page translations, and an 8-entry TLB for 4-MB page translation. The code TLB is a single 32-entry TLB which is shared by 4-KB and 4-MB page translations. The 4-MB code pages are cached in multiples of 4 KB. When the Pentium caches a 4-MB code page in the TLB, it initially uses only a single TLB entry. A code access beyond the initial 4 KB of memory associated with this TLB accesses the PDE as if it were a 4-KB page, and is given its own TLB entry.

TLB Invalidation

You'd assume that enabling and disabling 4-MB pages (CR4.PSE) would invalidate the TLB, as writing to CR3 does. However, this does not occur. A potentially dangerous situation arises when a user wants to disable 4-MB pages when a 4-MB page is still cached in the TLB. Suppose the PDEs were modified with a different paging translation and point to a different area of physical memory than the 4-MB pages (this would be natural to assume, as it complies with the whole purpose of paging). Once CR4.PSE is cleared, then any 4-MB TLB entries still cached remain in effect until they are evicted or until the TLB is invalidated. (Once CR4.PSE=0, TLB entries for 4-MB data pages will never get evicted, since they have their own dedicated TLB.) Any subsequent memory (or code) accesses while the old 4-MB TLB still is cached would retrieve incorrect data. Therefore, before 4-MB paging can be disabled, all 4-MB PDEs must be modified back to 4-KB PDEs. Once the PDEs are modified, CR4.PSE can be cleared, or the TLB invalidated (which effectively disables 4-MB paging). Some could consider this a bug, but Intel's documentation states that it's the operating-system writer's responsibility to manage the paging mechanism, including invalidating the TLB (Pentium Processor Family Developer's Manual, Volume 3, section 11.3.5).

Testing the Hypothesis

Now that we have an understanding of 4-MB paging, it should be easy to write characterization code that confirms our hypothesis. To detect whether or not 4-MB pages are implemented in Pentium as they are in the i860 XP, you could follow these steps:

The key to this technique is to read from one location in memory if 4-MB pages work or another location if they don't (so you don't page fault). This approach is demonstrated in 4MPAGES.ASM (see Listing One). Auxiliary subroutines for building paging structures on the Pentium and Pentium Pro processor are available electronically at http://www.x86.org (and from DDJ; see "Availability," page 3).

What to Try Next

You could write more characterization code to prove whether or not any other functional extensions are enabled by setting CR4.PSE. The listings available electronically demonstrate the page-faulting behavior of PSE. I've also included a program that detects the TLB size and associativity. Finally, another program demonstrates that writing any values to CR4.PSE will not invalidate the TLB.

Figure 1: Page translation for 4-KB page sizes.

Figure 2: Page translation for 4-MB and 4-KB page sizes.

Table 1: Structures used in Pentium paging translations (*see descriptions in Table 2).

Table 2: Descriptions of paging extension fields. (Bold fields indicate paging extensions available on the Pentium.)

 
Field         Description

RSV           Reserved. If set (RSV=1) may cause a page fault
                when CR4.PSE=1. Setting this bit only causes a
                page fault during page translation. If the
                referenced page entry is in the TLB, then setting
                this bit, and referencing the page will not cause
                a page fault. If the entry is not in the TLB, or
                gets flushed from the TLB, then the next reference
                to this page will cause a page fault. The page
                fault error code on the stack will have the RSV
                bit set (bit3).
AVL           Available for systems programmer use.
PS*           This bit is always set=1. When set=1, then this
                page directory entry points to a 4-MB page.
PS**          This bit is always clear=0. When clear, then
                this page directory entry points to a page table.
D             Dirty.
A             Accessed.
PCD           Page Cache Disable.
PWT           Page Write Through.
U             User.
W             Writable.
P             Present.

Table 3: Page-fault error code.

Listing One

    page    60,132
;-----------------------------------------------------------------------------
; 4MPAGES.ASM   Copyright (c) 1996  Robert Collins
;       You have my permission to copy and distribute this software for
;       non-commercial purposes.  Any commercial use of this software or
;       source code is allowed, so long as the appropriate copyright
;       attributions (to me) are intact, *AND* my email address is properly
;       displayed. Basically, give me credit, where credit is due, and 
;       show my email address.
;-----------------------------------------------------------------------------
;       Robert R. Collins               email:  rcollins@metronet.com
;       7201 Avalon Dr.
;       Plano, TX  75025
;-----------------------------------------------------------------------------
;-----------------------------------------------------------------------------
; Build instructions:
;   Assembled using Microsoft MASM 6.11.
;   To compile without the makefile:
;   ML /c /DINCLUDEDIR=[YOUR FAVORITE INCLUDE DIRECTORY] 4MPAGES.ASM
;   ML /c /DINCLUDEDIR=[YOUR FAVORITE INCLUDE DIRECTORY] PAGEFNS.ASM
;   LINK /NON 4MPAGES.OBJ PAGEFNS.OBJ;
;-----------------------------------------------------------------------------
;-----------------------------------------------------------------------------
; Assembler directives
;-----------------------------------------------------------------------------
    .xlist          ; disable list file
    .586P
    .ALPHA
;-----------------------------------------------------------------------------
; Include file section
;-----------------------------------------------------------------------------
%   Include INCLUDEDIR\\macros.inc  ; Include macros
%   Include INCLUDEDIR\\struct.inc  ; Include structures
;-----------------------------------------------------------------------------
; Public declarations
;-----------------------------------------------------------------------------
    Public  GDT_PTR,    ext_mem_blocks
;-----------------------------------------------------------------------------
; External declarations
;-----------------------------------------------------------------------------
    Extern  Init4M_Pages        :   Near16
    Extern  Check4M_Pages       :   Near16
    Extern  GetLinear4M     :   Near16
    Extern  GetPDBR         :   Near16
    Extern  PDBR            :   DWord
    .list
;-----------------------------------------------------------------------------
; Dummy segments
;-----------------------------------------------------------------------------
        INTSEG  segment at 0
        int0    dd      ?
        INTSEG  ends
    _DATA segment para public use16 'DATA'
;-----------------------------------------------------------------------------
; Data segment
;-----------------------------------------------------------------------------
    GDT_386 label   fword
    GDT_PTR     Descriptor  <>
    SEL_RMCS    equ $-GDT_386
    GDT_RMCS    Descriptor  <-1,,,9bh,0,>   ; DS Descriptor
    SEL_RMDS    equ $-GDT_386
    GDT_RMDS    Descriptor  <-1,,,93h,0,>   ; DS Descriptor
    SEL_4G      equ     $-GDT_386
    GDT_4G      Descriptor  <-1h,0,0h,93h,8fh,0h> ; 4G Descriptor
    GDT_Len     equ ($-GDT_386) - 1
;-----------------------------------------------------------------------------
; All other data
;-----------------------------------------------------------------------------
    Failure1_Msg    db  "4M page translation didn't work.",CRLF$
    Failure2_Msg    db  "Unknown page translation (this should never occur).",CRLF$
    Passed_Msg  db  "4M page translation behaves as expected.",CRLF$
    ext_mem_blocks  dw  0
    align
    OrigPTE     dd  0
    OrigINT0    dd  0
    OrigSentinal    dd  0
    _DATA   ENDS
    _TEXT   segment para public use16 'CODE'
    ASSUME  CS:_TEXT, DS:_DATA, ES:_DATA, SS:STACK
;-----------------------------------------------------------------------------
; Code starts here
;-----------------------------------------------------------------------------
    _4MPAGES    proc    far
    mov ax,seg STACK        ; setup stack segment
    mov ss,ax
    mov sp,sizeof StackPtr
    xor ax,ax           ; clear it
    pushf
    push    ds          ; save far return on stack
    push    ax
;-----------------------------------------------------------------------------
; Set segments to normal data segment
;-----------------------------------------------------------------------------
    mov ax,seg _DATA        ; get original data segment
    mov ds,ax
    mov es,ax
;-----------------------------------------------------------------------------
; Check that this processor supports 4M pages.
;-----------------------------------------------------------------------------
    call    Check4M_Pages       ; does this processor support 4M pages?
    jnc @F          ; yes, continue
@ErrorExit:
    mov ah,9
    int 21h         ; print message
    mov ax,4c01h        ; set error code
    int 21h
    iret                ; go split, just in case
;-----------------------------------------------------------------------------
; Setup descriptor table
;-----------------------------------------------------------------------------
@@: mov eax,ds          ; make pointer to GDT table
    shl eax,4           ; have physical address of segment
    add eax,offset GDT_386  ; now have physical addr of table
    mov GDT_PTR.Seg_limit,GDT_Len   ; set length
    mov GDT_PTR.Base_A15_A00,ax
    shr eax,10h         ; get other address bits
    mov GDT_PTR.Base_A23_A16,al
    mov GDT_PTR.Access_rights,ah
    mov eax,cs          ; get CS
    shl eax,4           ; now have physical address
    mov GDT_RMCS.Base_A15_A00,ax
    shr eax,10h         ; get other address bits
    mov GDT_RMCS.Base_A23_A16,al
    mov GDT_RMCS.Base_A31_A24,ah
    mov eax,ds          ; get DS
    shl eax,4           ; now have physical address
    mov GDT_RMDS.Base_A15_A00,ax
    shr eax,10h         ; get other address bits
    mov GDT_RMDS.Base_A23_A16,al
    mov GDT_RMDS.Base_A31_A24,ah
;-----------------------------------------------------------------------------
; Initialize page mode
;-----------------------------------------------------------------------------
    call    GetPDBR         ; get address of page directory
    jc  @ErrorExit      ; oops
    mov PDBR,edx        ; save it
;-----------------------------------------------------------------------------
; Read CMOS to determine the amount of extended memory.
;-----------------------------------------------------------------------------
    mov al,18h
    out 70h,al
    IO_Delay
    in  al,71h
    mov ah,al
    mov al,17h
    out 70h,al
    IO_Delay
    in  al,71h
    shr ax,6
    mov ext_mem_blocks,ax
;-----------------------------------------------------------------------------
; Enter protected mode.
;-----------------------------------------------------------------------------
    cli
    lgdt    GDT_386
    mov eax,cr0         ; get control register
    or  al,1            ;
    mov cr0,eax
    push    cs          ; push return selector on stack
    push    offset PMRET        ; set return offset
    JMPFAR  @F,SEL_RMCS
@@: mov ax,SEL_RMDS     ; get DS selector
    mov ds,ax
    mov ax,SEL_4G       ; get GS selector
    mov gs,ax
;-----------------------------------------------------------------------------
; Enable page mode
;-----------------------------------------------------------------------------
    call    Init4M_Pages
    mov ebx,PDBR        ; initialize CR3
    mov cr3,ebx
    mov ebx,cr0         ; get 386 control register
    or  ebx,80000000h       ; set PG bit
    mov cr0,ebx         ; now we're in protected mode
    jmp short @F
    Align
;-----------------------------------------------------------------------------
; This is the body of the test.
;-----------------------------------------------------------------------------
; Save a signature in memory so we can see if 4M pages work as expected.
;-----------------------------------------------------------------------------
@@: mov esi,FARCS
    mov edi,gs:Int0[esi]    ; get original memory contents
    mov OrigSentinal,edi    ; save it
    mov gs:Int0[esi],Signature  ; save signature in memory
    mov edi,gs:Int0     ; get original interrupt vector
    mov OrigINT0,edi        ; save it
;-----------------------------------------------------------------------------
; Modify the PDE for a 4 MB page.
;-----------------------------------------------------------------------------
    mov edx,SEL_4G      ; get selector
    lea eax,Int0[esi]       ; get INT0 offset
    call    GetLinear4M     ; get linear address
    mov dword ptr gs:[edx],87h  ; modify to 4M PDE
;-----------------------------------------------------------------------------
; Save original contents of signature location
;-----------------------------------------------------------------------------
    mov edi,gs:[eax]        ; get original PTE
    mov OrigPTE,edi     ; save it
    mov gs:Int0,edi     ; save it
;-----------------------------------------------------------------------------
; Enable 4M paging.
;-----------------------------------------------------------------------------
    mov ecx,cr3         ; get CR3
    mov ebx,cr4         ; get CR4
    or  bl,PSE          ; enable 4M pages
    mov cr4,ebx
    mov cr3,ecx         ; flush TLB
;-----------------------------------------------------------------------------
; The next memory read will read the signature or the PTE depending upon
; whether 4M paging even works.
;-----------------------------------------------------------------------------
    mov ebp,gs:Int0[esi]    ; try to read from 4M
;-----------------------------------------------------------------------------
; Get out of paging
;-----------------------------------------------------------------------------
    mov ecx,cr3         ; clear TLB by loading
    mov ebx,cr4         ; get PSE
    and bl,not PSE      ; turn off PSE
    mov cr4,ebx
    mov cr3,ecx         ;  CR3 with any value
;-----------------------------------------------------------------------------
; Restore original value to signature location.
;-----------------------------------------------------------------------------
    mov edi,OrigSentinal
    mov gs:Int0[esi],edi    ; restore sentinal
    mov edi,OrigINT0
    mov gs:Int0,edi     ; restore original INT0 handler
;-----------------------------------------------------------------------------
; Split from this program
;-----------------------------------------------------------------------------
    mov cx,SEL_RMDS     ; get DS selector
    mov gs,cx
    mov ecx,cr3         ; clear TLB by loading
    mov ebx,cr0         ; get 386 control register
    and ebx,not 80000001h   ; clear paging bit
    mov cr0,ebx         ;   and store in CR0
    mov cr3,ecx         ;  CR3 with any value
    retf
PMRET:
    mov ax,seg _DATA
    mov ds,ax
    mov gs,ax
;-----------------------------------------------------------------------------
; Determine whether or not our test passed.
;-----------------------------------------------------------------------------
    mov dx,offset Failure1_Msg  ;
    cmp ebp,Signature       ; did we get a bogus signature?
    je  @ErrorExit      ; yep
    mov dx,offset Failure2_Msg
    cmp ebp,OrigPTE     ; was our signature our original PDE?
    jne @ErrorExit      ; nope
    mov dx,offset Passed_Msg
    mov ah,9
    int 21h         ; print message
    mov ax,4c00h        ; set error code
    int 21h
    iret                ; go split, just in case
    _4MPAGES    endp
_TEXT   ends
    STACK segment para public 'STACK'
;-----------------------------------------------------------------------------
; Stack segment
;-----------------------------------------------------------------------------
    StackPtr    db  400h dup (?)
    STACK   ends
    _ZSEG   segment para public 'DATA'
    _ZSEG   ends
    end _4MPAGES