CPI file format
There are various descriptions of the CPI file format around the web; this is my attempt at one. The structure names and definitions used are based on those in Andries Brouwer's format documentation. These, in turn, appear to originate from the MS-DOS Programmer's Reference (my copy is for MS-DOS 5: ISBN 1-55615-329-5).
CPI files are used to store fonts allowing devices to display in multiple codepages. They can refer either to screen fonts, or printer fonts. Screen CPI files can hold one or more fonts per codepage - usually, at 8x16, 8x14 and 8x8 sizes. DRDOS screen codepage files also contain an 8x6 font (actually 6x6, but the file headers all say 8x6) which is used by ViewMAX screen drivers.
According to this blog comment by Larry Osterman, one of the developers of MSDOS, NLS functions were ported to PC-DOS by IBM from their mainframe systems. Presumably this included codepages, in which case the CPI file format may be derived from a mainframe file format.
There are three main CPI format variants -- FONT (used by MSDOS, PCDOS and Windows 9x), FONT.NT (used by Windows NT and its successors) and DRFONT (used by DRDOS screen fonts). There is a file format specification in the MSDOS programmer's reference which covers FONT; I know of no formal specification for FONT.NT or DRFONT. Even in the case of FONT, a bit of expansion and clarification wouldn't come amiss in some places.
In this document (on the principle of being conservative in what you generate and liberal in what you accept) emphasized text indicates restrictions on the file format that you should try to follow when generating a CPI file, but which you shouldn't rely on when reading. It is sometimes followed by a footnote [0] saying which utility has this restriction.
Here's one, for instance: CPI files in FONT format should not exceed 64k in size - use FONT.NT or DRFONT if you need to get more codepages in a file than will fit in 64k [1]. If you know that your CPI file will only be parsed by utilities that understand 32-bit file offsets, you can write CPI files bigger than 64k. Just don't try to use them with, in this case, the PC-DOS 3.3 DISPLAY.SYS. And don't assume that all FONT-format CPI files will be 64k or less.
The principal programs which have to parse CPI files - and on which I've based this specification - are:
- DISPLAY.SYS under DOS (FONT format files)
- MODE.COM under DRDOS (DRFONT format files)
- ViewMAX video drivers under DRDOS (DRFONT format files)
- The Windows NT DOS box (FONT and FONT.NT format files)
All numbers are stored in little-endian format. 'short' is 2 bytes, 'long' is 4 bytes.
Overview
FONT or FONT.NT
FontFileHeader
FontInfoHeader
CodePageEntryHeader
  |  |
  |  |    either
  |  +---> CodePageInfoHeader       }
  |  .     ScreenFontHeader         }
  |  .     Screen font bitmaps      }  Code page body
  |  .     ScreenFontHeader         }
  |  .     Screen font bitmaps      }
  |  .     ...
  |  .    or
  |  +---> CodePageInfoHeader       }
  |        PrinterFontHeader        }  Code page body
  |        Printer font data        }
  v
CodePageEntryHeader
  |  |
  |  +---> Code page body
 ...
 ...
DRFONT
FontFileHeader
DRDOSExtendedFontFileHeader
  |  FontInfoHeader
  |  CodePageEntryHeader
  |    |  |
  |    |  +---> CodePageInfoHeader      }
  |    |        ScreenFontHeader        }
  |    |        ScreenFontHeader        }  Code page body
  |    |        ...                     }
  |    |        Character index table   }
  |    v
  |  CodePageEntryHeader
  |    |  |
  |    |  +---> Code page body
  |   ...
  |   ...
  v
Screen font bitmaps
FontFileHeader
A CPI file begins with a fixed header. In theory its size could range from 18 bytes to just over 320k, but in practice its length is always 23 bytes, for two reasons:
- Some utilities hardcode the 23-byte form, and will break if it is not used.
- There are at least two possible ways the header can be expanded beyond 23 bytes - but which one is right?
struct {
    char id0;
    char id[7];
    char reserved[8];
    short pnum;
    char ptyp;
    long fih_offset;
} FontFileHeader;
- id0
- The first byte of the file is 0xFF for FONT and FONT.NT files, and 0x7F for DRFONT files.
- id[]
- This is the file format, space padded: "FONT ", "FONT.NT" or "DRFONT ".
- reserved[]
- The eight reserved bytes are always zero.
- pnum
- This is the number of pointers in this header. In all known CPI files this is 1; the MS-DOS 5 Programmer's Reference says that "for current versions of MS-DOS" it should be 1. With the count of pointers set to 1, the total header size is 23 bytes. A value of 0 here would result in a degenerate 18-byte CPI file consisting only of the FontFileHeader.
- ptyp
- The type of the pointer in the header. In all known CPI files this is 1; the MS-DOS reference says that "for current versions of MS-DOS" it should be 1. Meanings for other values are presumably not defined.
- fih_offset
- The offset in the file of the FontInfoHeader. In FONT and FONT.NT files, this is usually 0x17, pointing to immediately after the FontFileHeader - though files with other values are known to exist [10]. In DRFONT files, it should point to immediately after the DRDOSExtendedFontFileHeader [2], which for a four-font CPI file puts it at 0x2C.
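As a rough sketch of reading this header, the C fragment below checks the signature and extracts fih_offset. It assumes the whole file is already in memory; the names parse_ffh, rd16, rd32 and enum cpi_format are made up for illustration. Because a C compiler may insert padding between struct fields, it decodes the 23 bytes by hand instead of overlaying the struct above, and it assumes id is padded with spaces to exactly 7 bytes. It rejects pnum or ptyp values other than 1, since the layout of any further pointers is not known.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All CPI numbers are little-endian; decode byte by byte so the code
   depends on neither host byte order nor struct padding. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

enum cpi_format { CPI_UNKNOWN, CPI_FONT, CPI_FONT_NT, CPI_DRFONT };

/* Decode the 23-byte single-pointer FontFileHeader at the start of buf.
   On success, returns the format and stores fih_offset. */
static enum cpi_format parse_ffh(const uint8_t *buf, size_t len,
                                 uint32_t *fih_offset)
{
    enum cpi_format fmt;

    if (len < 23)
        return CPI_UNKNOWN;
    if (buf[0] == 0xFF && memcmp(buf + 1, "FONT   ", 7) == 0)
        fmt = CPI_FONT;
    else if (buf[0] == 0xFF && memcmp(buf + 1, "FONT.NT", 7) == 0)
        fmt = CPI_FONT_NT;
    else if (buf[0] == 0x7F && memcmp(buf + 1, "DRFONT ", 7) == 0)
        fmt = CPI_DRFONT;
    else
        return CPI_UNKNOWN;

    /* Bytes 8-15 are reserved; be liberal and don't insist on zeros.
       pnum (bytes 16-17) and ptyp (byte 18) must both be 1. */
    if (rd16(buf + 16) != 1 || buf[18] != 1)
        return CPI_UNKNOWN;

    *fih_offset = rd32(buf + 19);
    return fmt;
}
```

A CPI_DRFONT result tells the caller that a DRDOSExtendedFontFileHeader starts at offset 23, before the FontInfoHeader.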
DRDOSExtendedFontFileHeader
In a DRFONT font, this immediately follows the FontFileHeader.
struct {
    char num_fonts_per_codepage;
    char font_cellsize[N];
    long dfd_offset[N];
} DRDOSExtendedFontFileHeader;
- num_fonts_per_codepage
- The number of fonts defined by each codepage. This is 4 for the codepages distributed with DRDOS. The DRDOS MODE.COM supports values up to 10, and ViewMAX has no limit at all. The length of the DRDOSExtendedFontFileHeader is 1 plus five times the value in this byte.
- font_cellsize
- This array has num_fonts_per_codepage entries. It lists the size of a character in bytes (in all existing DRFONT files this is equal to the character height) for each font in this file. The original DRDOS EGA.CPI has sizes 6, 8, 14 and 16.
- dfd_offset
- This array also has num_fonts_per_codepage entries. Each entry is the offset, from the start of the file, of the first character bitmap in the corresponding size.
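A minimal sketch of decoding this header, assuming the file is in memory and the header starts at offset 23. The names parse_dr_ext and struct dr_ext_header are hypothetical, and the cap of 10 fonts is borrowed from the DRDOS MODE.COM limit mentioned above (ViewMAX has no such limit). The key point is that all N cell sizes come first, followed by all N four-byte offsets, giving a header length of 1 + 5N.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical cap, matching DRDOS MODE.COM's limit of 10 fonts. */
#define DR_MAX_FONTS 10

struct dr_ext_header {
    unsigned num_fonts;                 /* num_fonts_per_codepage */
    uint8_t cellsize[DR_MAX_FONTS];     /* font_cellsize[] */
    uint32_t dfd_offset[DR_MAX_FONTS];  /* absolute file offsets */
};

/* Decode the DRDOSExtendedFontFileHeader that starts at offset 23.
   Returns its length (1 + 5 * num_fonts), or 0 if it is unusable. */
static size_t parse_dr_ext(const uint8_t *buf, size_t len,
                           struct dr_ext_header *h)
{
    const size_t base = 23;  /* immediately after the FontFileHeader */
    size_t n, i, hdr_len;

    if (len < base + 1)
        return 0;
    n = buf[base];
    hdr_len = 1 + 5 * n;
    if (n > DR_MAX_FONTS || len < base + hdr_len)
        return 0;

    h->num_fonts = (unsigned)n;
    for (i = 0; i < n; i++) {
        /* The N cell sizes come first, then N four-byte offsets. */
        h->cellsize[i] = buf[base + 1 + i];
        h->dfd_offset[i] = rd32(buf + base + 1 + n + 4 * i);
    }
    return hdr_len;
}
```

For a four-font file the returned length is 21, so the FontInfoHeader is expected at 23 + 21 = 0x2C; a reader can compare that with fih_offset.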
Notes
- Existing utilities treat font_cellsize interchangeably as character size in bytes and character height. If a DRFONT were created with characters wider than 8 pixels, they would not be able to handle it, because these two values would then differ.
- The order of ScreenFontHeader records in each codepage must match the order of fonts in this header - for example, if the first entry in this header is for a font of height 6, each codepage must start with a height 6 font.
- The original DRDOS CPI files have the smallest font first.
FontInfoHeader
struct {
    short num_codepages;
} FontInfoHeader;
- num_codepages
- This contains a count of codepages in the file. A value of 0 is possible but very uninteresting.
This should immediately follow the FontFileHeader or DRDOSExtendedFontFileHeader [2].
CodePageEntryHeader
The FontInfoHeader is immediately followed by the first CodePageEntryHeader; these form a linked list of codepages that the CPI file implements.
struct {
    short cpeh_size;
    long next_cpeh_offset;
    short device_type;
    char device_name[8];
    short codepage;
    char reserved[6];
    long cpih_offset;
} CodePageEntryHeader;
- cpeh_size
- This is the size of the CodePageEntryHeader structure, i.e. 0x1C bytes. Some CPI files have other values here, most often 0x1A. Some utilities ignore this field and always load 0x1C bytes; others believe it.
- next_cpeh_offset
- This is the offset of the next CodePageEntryHeader in the file. In FONT and DRFONT files, the address is relative to the start of the file; in FONT.NT files, it is relative to the start of this CodePageEntryHeader. At least one pathological CPI file is known to exist where values above 64k are stored as segment:offset rather than as a 32-bit pointer (eg: 0x1000abcd rather than 0x0001abcd). The file EGA.ICE [10] is even worse - all its pointers, even those below 64k, are stored as apparently arbitrary segment:offset combinations.
  In the last CodePageEntryHeader, the value of this field has no meaning. Some files set it to 0, some to -1, and some to point at where the next CodePageEntryHeader would be. The MS-DOS 5 Programmer's Reference says it should be 0.
- device_type
- 1 for screen, 2 for printer. Some printer CPI files from early DRDOS versions have device_type=1; a suggested workaround is to treat a codepage as a printer codepage if its device name is one of:
- "4201 "
- "4208 "
- "5202 "
- "1050 "
- device_name
- The ASCII device name, padded with spaces to 8 characters. For screens, it refers to the display hardware ("EGA " for EGA/VGA and "LCD " for the IBM Convertible LCD). For printers, it is usually one of:
- "4201 "
- "4208 "
- "5202 "
- "1050 "
- "EPS "
- "PPDS "
- codepage
- This is the number of the codepage this header describes. Traditionally, DOS codepages had 3-digit IDs (1-999) but the number can range from 1-65533 - see the "Code Page Global Identifier" section in IBM's Character Data Representation Architecture. IDs 65280-65533 are 'reserved for customer use' - ie, this is the range to use for user-defined codepages.
- reserved
- The reserved bytes are always zero.
- cpih_offset
- The offset of the CodePageInfoHeader for this codepage. In FONT and DRFONT files, it is relative to the start of the file; in FONT.NT files it is relative to the start of this CodePageEntryHeader. As with next_cpeh_offset, the field is normally treated as a 32-bit pointer but some programs may instead populate it with segment:offset values.
The CodePageInfoHeader for a codepage should immediately follow the CodePageEntryHeader - rather than, for example, all the CodePageEntryHeaders together at the start and then all the CodePageInfoHeaders with their fonts [3]. This is particularly important in a DRFONT file [4].
The fields next_cpeh_offset and cpih_offset should not point to addresses earlier in the file than this CodePageEntryHeader, for the same reason.
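Putting the CodePageEntryHeader rules together, here is a hedged sketch of walking the list. Everything named here (walk_cpeh, struct cp_entry, is_printer_name) is hypothetical. It assumes device names are padded with spaces to 8 bytes, trusts num_codepages rather than the last next_cpeh_offset, ignores cpeh_size and always reads 0x1C bytes, and applies the early-DRDOS printer workaround. It does not attempt to detect segment:offset pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct cp_entry {
    uint16_t device_type;  /* 1 = screen, 2 = printer (after workaround) */
    char device_name[9];   /* 8 space-padded bytes plus a NUL */
    uint16_t codepage;
    uint32_t cpih_pos;     /* absolute offset of the CodePageInfoHeader */
};

/* Early DRDOS printer files claim device_type 1 for these devices. */
static int is_printer_name(const char *name)
{
    static const char *const names[] = { "4201    ", "4208    ",
                                         "5202    ", "1050    " };
    size_t i;
    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (memcmp(name, names[i], 8) == 0)
            return 1;
    return 0;
}

/* Walk the CodePageEntryHeaders that follow the FontInfoHeader at
   fih_pos. is_nt selects FONT.NT relative pointers. Returns the number
   of entries stored (at most max), or -1 if a header runs off the end. */
static int walk_cpeh(const uint8_t *buf, size_t len, uint32_t fih_pos,
                     int is_nt, struct cp_entry *out, int max)
{
    uint32_t pos, next, cpih;
    int i, n;

    if ((size_t)fih_pos + 2 > len)
        return -1;
    n = rd16(buf + fih_pos);  /* num_codepages */
    pos = fih_pos + 2;        /* first CodePageEntryHeader */

    for (i = 0; i < n && i < max; i++) {
        const uint8_t *e;
        if ((size_t)pos + 0x1C > len)
            return -1;
        e = buf + pos;
        /* cpeh_size (e[0..1]) is ignored; always read 0x1C bytes. */
        next = rd32(e + 2);
        cpih = rd32(e + 0x18);
        if (is_nt) {
            /* Relative to this header; unsigned wraparound also gives
               the right answer for negative (backwards) offsets. */
            next += pos;
            cpih += pos;
        }
        out[i].device_type = rd16(e + 6);
        memcpy(out[i].device_name, e + 8, 8);
        out[i].device_name[8] = '\0';
        out[i].codepage = rd16(e + 16);
        out[i].cpih_pos = cpih;
        if (out[i].device_type == 1 && is_printer_name(out[i].device_name))
            out[i].device_type = 2;
        pos = next;  /* meaningless after the last entry, but unused */
    }
    return i;
}
```

Relying on the codepage count means the varying conventions for the final next_cpeh_offset (0, -1, or a dangling pointer) never matter.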
CodePageInfoHeader
At the start of the data block for each codepage is a CodePageInfoHeader:
struct {
    short version;
    short num_fonts;
    short size;
} CodePageInfoHeader;
- version
- This is 1 if the following codepage is in FONT format, 2 if it is in DRFONT format. Putting a DRFONT codepage in a FONT-format file will not work. You shouldn't put a FONT codepage in a DRFONT-format file either [5].
- num_fonts
- If this is a screen font, it gives the number of font records that follow. For printer fonts, it should be assumed to be 1; some DRDOS printer CPI files have it wrongly set to 2.
- size
- This is the number of bytes that follow up to the end of this codepage (if version is 1) or up to the character index table (if version is 2).
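The sketch below shows one way to turn these three fields into positions, assuming size counts bytes after the 6-byte CodePageInfoHeader itself. The names parse_cpih and struct cp_info are hypothetical, and the caller is assumed to already know (for example from device_type) whether this is a printer codepage, in which case num_fonts is forced to 1.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

struct cp_info {
    unsigned version;    /* 1 = FONT codepage, 2 = DRFONT codepage */
    unsigned num_fonts;  /* forced to 1 for printer codepages */
    uint32_t data_pos;   /* first byte after the CodePageInfoHeader */
    uint32_t end_pos;    /* end of codepage (v1) or index table start (v2) */
};

/* Decode the CodePageInfoHeader at pos. Returns 0, or -1 if the
   version is unknown or size runs past the end of the buffer. */
static int parse_cpih(const uint8_t *buf, size_t len, uint32_t pos,
                      int is_printer, struct cp_info *ci)
{
    if ((size_t)pos + 6 > len)
        return -1;
    ci->version = rd16(buf + pos);
    /* Some DRDOS printer files wrongly store 2 here. */
    ci->num_fonts = is_printer ? 1u : rd16(buf + pos + 2);
    ci->data_pos = pos + 6;
    ci->end_pos = ci->data_pos + rd16(buf + pos + 4);
    if (ci->version != 1 && ci->version != 2)
        return -1;
    if ((size_t)ci->end_pos > len)
        return -1;
    return 0;
}
```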
Printer Fonts
If the CPI is for a printer, the CodePageInfoHeader is followed by:
struct {
    short printer_type;
    short escape_length;
} PrinterFontHeader;
- printer_type
- This is 1 if the character set is downloaded to the printer, 2 if the printer already has the character set and selects it with escape codes.
- escape_length
- The number of bytes in the escape sequences that follow.
This structure is in turn followed by the printer data. If printer_type is 1, there are two escape sequences; if printer_type is 2, there is one. The first escape sequence selects the built-in codepage; the second selects the downloaded codepage. An escape sequence is stored as a Pascal string (the first byte is the length). After the escape sequence(s), any remaining data up to the size given in CodePageInfoHeader are the definition of the font, to be downloaded to the printer.
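As an illustration, here is a sketch that splits a printer codepage body into its escape sequences and download data. It assumes body points just past the CodePageInfoHeader and body_len is that header's size field; parse_printer and struct printer_font are hypothetical names. It walks the Pascal strings directly rather than relying on escape_length, since the text does not say whether that count includes the length bytes.

```c
#include <stddef.h>
#include <stdint.h>

struct printer_font {
    unsigned printer_type;     /* 1 = downloaded, 2 = built-in */
    int num_esc;               /* 2 if printer_type is 1, else 1 */
    const uint8_t *esc[2];     /* escape bytes, without the length byte */
    uint8_t esc_len[2];
    const uint8_t *font_data;  /* remaining bytes to download (may be empty) */
    size_t font_len;
};

/* body starts with the PrinterFontHeader; body_len is the size field
   from the CodePageInfoHeader. Returns 0, or -1 if data is truncated. */
static int parse_printer(const uint8_t *body, size_t body_len,
                         struct printer_font *pf)
{
    size_t off = 4;  /* skip printer_type and escape_length */
    int i;

    if (body_len < 4)
        return -1;
    pf->printer_type = (unsigned)(body[0] | (body[1] << 8));
    pf->num_esc = (pf->printer_type == 1) ? 2 : 1;

    /* Each escape sequence is a length byte followed by that many bytes. */
    for (i = 0; i < pf->num_esc; i++) {
        if (off >= body_len || off + 1 + body[off] > body_len)
            return -1;
        pf->esc_len[i] = body[off];
        pf->esc[i] = body + off + 1;
        off += 1 + (size_t)body[off];
    }
    pf->font_data = body + off;
    pf->font_len = body_len - off;
    return 0;
}
```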
Screen fonts
If the CPI is for the screen, the CodePageInfoHeader is followed by screen font definitions for each size. In a FONT or FONT.NT file, each entry consists of a ScreenFontHeader followed by the font bitmap; in a DRFONT, just the ScreenFontHeader is provided.
struct {
    char height;
    char width;
    char yaspect;
    char xaspect;
    short num_chars;
} ScreenFontHeader;
- height
- This is the character height in pixels.
- width
- This is the character width in pixels; in all known CPI files it is 8. Values other than 8 can cause trouble in any font format [6], but particularly in DRFONT fonts [7] and FONT.NT fonts [8].
- yaspect
- Vertical aspect ratio. In all known CPI files this is unused and set to zero.
- xaspect
- Horizontal aspect ratio. In all known CPI files this is unused and set to zero.
- num_chars
- Number of characters in the font. In known CPI files this is always 256. Some utilities may assume that it is 256, and malfunction if it is not.
Except in DRFONT fonts, the bitmap follows the ScreenFontHeader; its length is num_chars * height * ((width+7)/8), and it contains glyphs for each character in increasing order. Some loaders calculate the size simply as height * num_chars, and so will miscalculate if the width is greater than 8.
Character index table
In a DRFONT, after the ScreenFontHeaders, there follows a table describing where the character bitmaps come from:
struct {
    short FontIndex[256];
} CharacterIndexTable;
The DRDOS utilities assume that there are always 256 entries in this table; so the character count in a DRFONT ScreenFontHeader should always be 256 [9].
Each entry in FontIndex describes the number of the bitmap for the corresponding character in the bitmap tables pointed to by the DRDOSExtendedFontFileHeader. To find the bitmap for a particular letter, take the FontIndex entry, multiply it by the character length in bytes, and add the dfd_offset for the size in question.
To determine the number of characters in the bitmap tables of a DRFONT, a program therefore has to walk all FontIndex entries in the file and take the highest value plus one (since bitmaps are numbered from 0).
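The lookup and the bitmap count can be sketched as follows. This assumes each CharacterIndexTable is available as 512 raw bytes and reads FontIndex entries as unsigned 16-bit values; cellsize and dfd_offset are taken from the DRDOSExtendedFontFileHeader entry for the font size in question. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* File offset of the bitmap for character c, given a 512-byte
   CharacterIndexTable and one font's entry from the extended header. */
static uint32_t dr_glyph_offset(const uint8_t *index_table, unsigned c,
                                uint8_t cellsize, uint32_t dfd_offset)
{
    return dfd_offset + (uint32_t)rd16(index_table + 2 * c) * cellsize;
}

/* Bitmaps per size = highest FontIndex across all codepages, plus one
   (indices count from 0). */
static uint32_t dr_bitmap_count(const uint8_t *const *tables, size_t ntables)
{
    uint32_t max = 0;
    size_t t;
    unsigned c;

    if (ntables == 0)
        return 0;
    for (t = 0; t < ntables; t++)
        for (c = 0; c < 256; c++) {
            uint16_t v = rd16(tables[t] + 2 * c);
            if (v > max)
                max = v;
        }
    return max + 1;
}
```

Because all codepages share the same bitmap tables, the count has to be taken over every codepage's index table, not just one.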
Trailing data
Some CPI files don't end immediately after the last font. Usually, what follows is a copyright message (possibly terminated by 0x1A) and/or some zero bytes. The MS-DOS 5 Programmer's Reference says that a CPI file 'always ends with a copyright notice' and that this is at most 0x150 bytes long.
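If a tool wants to show this trailing text, a liberal reader might do something like the following sketch. It assumes the caller already knows where the last font ends; skipping leading zero bytes and stopping at 0x1A or a zero byte are assumptions drawn from the description above, not from any particular DOS utility. The name trailing_text is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Locate trailing text after the last font. Leading zero bytes are
   skipped; the text ends at 0x1A, a zero byte, or the end of the file.
   Stores the start offset in *start and returns the text length. */
static size_t trailing_text(const uint8_t *buf, size_t len,
                            size_t end_of_fonts, size_t *start)
{
    size_t s = end_of_fonts, e;

    while (s < len && buf[s] == 0)
        s++;
    for (e = s; e < len && buf[e] != 0x1A && buf[e] != 0; e++)
        ;
    *start = s;
    return e - s;
}
```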
Ambiguities
Among the things that the format seems to support but some or all utilities do not, we find:
FontFileHeader: Multiple pointers
If pnum were to be greater than 1, there are two possibilities for how the extra data would be stored:
Layout 1 (all the types, then all the pointers):
struct {
    char id0;
    char id[7];
    char reserved[8];
    short pnum;
    char ptyp[N];
    long fih_offset[N];
} FontFileHeader;
Layout 2 (types and pointers alternate):
struct {
    char id0;
    char id[7];
    char reserved[8];
    short pnum;
    struct {
        char ptyp;
        long fih_offset;
    } pointers[N];
} FontFileHeader;
That is, either all the types come first and then all the pointers, or types and pointers alternate. The second is backward-compatible, in that programs which only understood the 1-pointer format would be able to follow the first pointer as usual.
FontFileHeader: Pointer types other than 1
ptyp is always 1. What might other values mean?
Codepages for multiple devices
Technically, there's no reason why a CPI file shouldn't hold codepages for multiple devices (eg, each codepage appears three times: once for "EGA", once for "LCD", and once for the "4201" printer). How would utilities handle this?
Backwards pointers
Even if a CPI file can't be streamed because of the order of the records, all the pointers in it will almost certainly point forwards - that is, to bytes further from the start of the file than where the pointer is. What happens if the blocks are so perversely arranged that this is not the case?
In this situation, a FONT.NT file would actually have negative values in its offset fields, and this might cause trouble on systems that treated them as unsigned.
Repetition
How should utilities handle the case of the same codepage appearing multiple times for the same device, or the same font size appearing multiple times within a codepage?
Aspect ratio
What was the aspect ratio intended for? Can the same font size appear multiple times in a codepage if the aspect ratio is different?
Footnotes
These explain the reasons for particular recommendations.
- [0]
- Example footnote
- [1]
- PC-DOS 3.3 DISPLAY.SYS does not seem to be able to handle CPI files larger than 64k.
- [2]
- ViewMAX display drivers and DRDOS MODE both assume that the FontInfoHeader immediately follows the DRDOSExtendedFontFileHeader.
- [3]
- FONT-format CPI files are passed to DISPLAY.SYS using a streaming interface that can seek forward but not back, and therefore objects in a CPI file should be in the order that DISPLAY.SYS would process them.
- [4]
- DRDOS MODE assumes that the CodePageInfoHeader immediately follows the CodePageEntryHeader.
- [5]
- ViewMAX display drivers assume that all fonts in a DRFONT-format file will be DRFONT fonts.
- [6]
- Fonts with a width greater than 8 cause problems with utilities that assume characters are 1 byte wide. Values less than 8 may also cause problems, because it isn't clear whether characters should be left- or right-aligned in the 8-pixel wide character cell. This may be why the 6-pixel fonts in DRDOS describe themselves as 8x6 even though they are actually only 6x6.
- [7]
- ViewMAX display drivers and DRDOS MODE both assume that the height of a character is equal to the number of bytes in its bitmap. This will not be true for characters wider than 8 pixels.
- [8]
- The codepage loader in the Windows NT DOS box rejects fonts whose character width is not 8.
- [9]
- DRDOS MODE assumes that the character index table has 256 entries, but should correctly handle a font with fewer characters. The ViewMAX screen drivers also assume that there are 256 entries, and always try to copy 256 characters. A possible workaround for fonts with fewer than 256 characters is to write the table with 256 entries and set the unused ones to 0.
- [10]
- EGA.ICE (which I found on a Compaq Concerto laptop, and which was apparently distributed with MS-DOS 6.0) is an unusual codepage file in several ways:
- The copyright message is not at the end of the file, but the beginning; it is located between the FontFileHeader and the FontInfoHeader.
- The copyright message reads:
    EXEC-NW.CPI Version E3
    437 850 860 861 865
    Copyright (c) 1991, AST Europe Ltd. All rights reserved.
  Therefore "EGA.ICE" is not the original filename, and the file was not created by Microsoft.
- All pointers in the CodePageEntryHeader are stored as segment:offset values, whether or not they are below 64k. By way of example, the first CodePageInfoHeader is at file offset B6h, but its pointer is 00090026h (ie, 0009:0026).
- There are four font sizes: 8x8, 8x14, 8x16 and 8x19. The latter is presumably intended to produce a 25-line display in a 640x480 video mode.
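The segment:offset encoding used by EGA.ICE and the other pathological file treats the high 16 bits as a segment and the low 16 bits as an offset, so the linear position is segment * 16 + offset. The sketch below converts such values; resolve_ptr is a hypothetical heuristic of my own (not taken from any DOS utility) that only falls back to segment:offset when the plain value lies outside the file. It can still be fooled by a segment:offset value that happens to fall inside a large file.

```c
#include <stdint.h>

/* Read a 32-bit field as segment:offset: high word = segment,
   low word = offset, linear = segment * 16 + offset. */
static uint32_t segoff_to_linear(uint32_t v)
{
    return ((v >> 16) << 4) + (v & 0xFFFFu);
}

/* Heuristic: use the value as a plain file offset if it lies inside
   the file; otherwise reinterpret it as segment:offset. */
static uint32_t resolve_ptr(uint32_t v, uint32_t file_len)
{
    return (v < file_len) ? v : segoff_to_linear(v);
}
```

With the EGA.ICE example, 00090026h becomes 9 * 16 + 0x26 = 0xB6, and 0x1000abcd becomes 0x1abcd.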
John Elliott 2006-10-14