Act Sirius 1 User Group (UK)

MS-DOS NOTES

4.1 MS-DOS PROGRAM LOAD

The operating system core provides no direct means to run user programs. Instead, to run a program held in a disc file, the file must be opened and read into memory using the normal system functions. These functions are requested by the user program that is currently running.

The first user program to run is the initialisation routine that follows a system boot, which normally loads and executes the file COMMAND.COM. This is a user program that accepts commands from the console and translates them into system function calls. COMMAND includes the capability to load and execute other program files; when these other programs terminate, COMMAND regains control. Thus COMMAND is responsible for the initial conditions that are present when a program is executed.

A standard set of initial conditions is provided by COMMAND on entry to another program. It is possible for programs other than COMMAND to load and execute program files, and they must also provide the same initial conditions so that a consistent interface may be assumed by the newly executing program.

4.1.1 MS-DOS Base Page Structure (see also Section 4.4)

The MS-DOS Base Page (sometimes called the Program Segment Prefix or PSP) is created when you enter an external command. COMMAND.COM allocates a memory region to the external program and inserts the Base Page immediately before the origin of this program.

COMMAND.COM places the Base Page at the start of the memory segment into which the program is to be loaded, then loads the program at an offset of 100H and hands over control to it. Once its function is complete, the external program hands control back to the operating system by a far JUMP or far RETURN to location zero within the Base Page; the instruction at this location is an INT 20, which returns control to MS-DOS. This stage must be executed to allow MS-DOS to recover memory correctly (see Appendix I).

When an external program is loaded, the following conditions are true:

The file control blocks at Base Page locations 5CH and 6CH are created from the first two parameters entered on the command line.

The command line at Base Page location 80H is created from the command line entered AFTER the program filename. The byte at location 80H contains the command line character count; the following bytes contain the raw command line as entered at the keyboard.

The word at offset 6 in the Base Page contains the number of bytes available in the segment.

The contents of register AX are established to reflect the validity of the drive(s) on the command line. Thus the following may be found:

AL = FFH when the first drive letter on the command line was not recognised by MS-DOS.

AH = FFH when the second drive letter on the command line was not recognised by MS-DOS.

The above applies equally to both .EXE and .COM type files. .EXE and .COM files do differ in how they load, and the differences are described more fully below.

When .EXE files load:

The contents of register DS and register ES are pointing at the Base Page segment address.

The registers CS, IP, SS and SP are initialised to those values passed by the linker.

When .COM files load:

The contents of registers CS, DS, ES and SS are pointing to the Base Page segment address.

The register IP is set at 100H.

The register SP is set to the high address in the program segment, or to the base of the transient portion of COMMAND.COM, whichever is the lower. The contents of the word at Base Page offset 6 are decremented by 100H to allow for a stack of that size.

A word of zeros is placed at the top of the stack.

All four segment registers have the same value, and the corresponding absolute memory address is the base of a "program segment". The program is loaded and begins execution at location 100 hex in the program segment. Other assignments in the program segment are:

00-01 Termination point. Contains an interrupt type 20 hex, which returns control to the originating program. Thus a JMP 0 or INT 20H are the normal ways to terminate a program.

02-03 Memory size in paragraphs. Contains the first segment number after the end of memory, that is, after the end of the current allocation block.

05-09 Far CALL to MS-DOS function dispatcher.

0A-0D Program terminate address as IP and CS.

0E-11 CTRL-C exit address as IP and CS.

22-5B Default stack. The stack pointer is initially 5A hex, with a word of zeros on the top. Thus executing a "return" instruction will cause a transfer to location 0 and the program will terminate normally. This stack may be used as-is, or a new one may be set up. Remember that 32 bytes of stack space are required to perform system calls.

5C-67 File Control Block #1, formatted as normal unopened FCB.

6C-77 File Control Block #2, formatted as normal unopened FCB.

80-FF Unformatted parameters. Count of characters on command line; followed by command line entered.
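The assignments listed above can be read back out of a raw 256-byte copy of the program segment's first page. The following is a minimal Python sketch, not part of MS-DOS itself: the function name parse_base_page and the dictionary keys are hypothetical, and it assumes that the terminate and CTRL-C exit addresses are each stored as a 16-bit offset (IP) followed by a 16-bit segment (CS), least significant byte first.

```python
import struct

def parse_base_page(page: bytes) -> dict:
    """Decode the documented fields of a 256-byte Base Page image."""
    if len(page) < 0x100:
        raise ValueError("a Base Page is 256 bytes long")
    # Assumption: far addresses are stored offset (IP) first, then segment (CS).
    term_ip, term_cs = struct.unpack_from("<HH", page, 0x0A)
    ctrlc_ip, ctrlc_cs = struct.unpack_from("<HH", page, 0x0E)
    count = page[0x80]                      # command line character count
    return {
        "exit_opcode": bytes(page[0:2]),    # CD 20 is the encoding of INT 20H
        "memory_top_segment": struct.unpack_from("<H", page, 0x02)[0],
        "bytes_in_segment": struct.unpack_from("<H", page, 0x06)[0],
        "terminate": (term_cs, term_ip),
        "ctrl_c": (ctrlc_cs, ctrlc_ip),
        "fcb1": bytes(page[0x5C:0x68]),     # drive byte + 11-byte name
        "fcb2": bytes(page[0x6C:0x78]),
        "command_tail": bytes(page[0x81:0x81 + count]).decode("ascii", "replace"),
    }
```

The word at offset 6 overlaps the far CALL at 05-09, which is why both can be listed for the same bytes.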

COMMAND prepares the parameter areas from the console input line that specified the program to be executed. For example, if COMMAND sees a line of the form:

<progname> <file1> <file2>

this is a request to execute the file <progname>.COM. <file1> and <file2> each may or may not include a disc specifier or a file name extension, but in any case they appear in the formatted parameters at 5C hex and 6C hex. In addition, the entire input line after the last letter of <progname> appears in the unformatted parameter area beginning at 81 hex, with the number of characters placed at 80 hex.

Suppose the input line is:

COPY T.BAK B:TEST.ASM

The formatted parameter at 5C hex will contain:

00 "T       BAK"

at 6C hex will be:

02 "TEST    ASM"

and at 80 hex will be:

17 " T.BAK B:TEST.ASM"

where the 17 is decimal.
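The way the COPY example is split into formatted and unformatted parameters can be sketched in Python. This is an illustration only: the function names are hypothetical, wildcard characters are not handled, and it assumes the drive byte is 0 for the default drive, 1 for A, 2 for B and so on, with the name padded with blanks to 8 characters and the extension to 3.

```python
def format_fcb_name(param: str) -> tuple:
    """Return (drive byte, 11-character blank-padded name) for one parameter."""
    drive = 0                                   # 0 means the default drive
    if len(param) >= 2 and param[1] == ":":
        drive = ord(param[0].upper()) - ord("A") + 1
        param = param[2:]
    name, _, ext = param.upper().partition(".")
    return drive, name[:8].ljust(8) + ext[:3].ljust(3)

def command_tail(line: str) -> tuple:
    """Return (character count, raw text) as placed at 80 hex and 81 hex."""
    _progname, separator, rest = line.partition(" ")
    tail = separator + rest                     # everything after the program name
    return len(tail), tail
```

Calling format_fcb_name("B:TEST.ASM") gives drive byte 2 and the eleven characters TEST followed by four blanks and ASM, matching the contents shown for 6C hex.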

Below is a sample "base" page for a type .EXE file for MS-DOS, which executes a program starting at "MAIN". Note the fix-up required for DS: when the program starts, DS and ES point to the DOS-related Base Page segment. (This is not required for .COM files; for them DS, ES and CS are already correct and IP is at 100h.)

                 NAME    BASEPAGE
CGROUP           GROUP   CODE
DGROUP           GROUP   DATA
code             SEGMENT PUBLIC 'CODE'
                 ASSUME  CS:CGROUP,DS:DGROUP
                 EXTRN   MAIN:NEAR
                 PUBLIC  End_of_program
Start_of_program PROC    NEAR
                 MOV     BX,DS                       ;Hold base page segment
                 MOV     AX,DGROUP
                 MOV     DS,AX                       ;Fix DS to point to data group
                 MOV     WORD PTR Base_page_ptr+2,BX ;Save base page seg.
                 CALL    MAIN                        ;Execute program
End_of_program:                                      ;Can jump here to end, if unable to
                                                     ;RET
                 JMP     DWORD PTR [Base_page_ptr]
Start_of_program ENDP
code             ENDS
data             SEGMENT PUBLIC 'DATA'
                 PUBLIC  Base_page_ptr
Base_page_ptr    DD      0
data             ENDS
                 end     START_OF_PROGRAM

4.2 The Command Processor

4.2.1 Introduction

The command processor supplied with MS-DOS (file COMMAND.COM) consists of three distinctly separate parts:

A resident portion resides in memory immediately below the BIOS (see Section 1.6.1). This portion contains routines to process interrupt types 22H (end address), 23H (CTRL-C handler), 24H (critical error handling) and 27H (end but stay resident), as well as a routine to reload the transient portion if needed. (When a program ends, a checksum determines if the program had caused the transient portion to be overlaid. If so, it is reloaded). Note that all standard MS-DOS disc error handling is done within this portion of COMMAND. This includes displaying error messages and interpreting the reply of Abort, Retry, or Ignore.

An initialisation portion is given control during startup. This section contains the AUTOEXEC file processor setup and also the date prompt routine (used if no AUTOEXEC file is found). The initialisation portion determines the segment address at which programs can be loaded. It is overlaid by the first program COMMAND loads because it is no longer needed.

A transient portion is loaded below the resident portion. This is the command processor itself, containing all of the internal command processors, the batch file processor, and a routine to load and execute external commands (files with filename extensions of .COM or .EXE). This portion of COMMAND produces the system prompt (such as A>), reads the command from the keyboard (or batch file) and causes it to be executed. For external commands, it builds a Program Segment Prefix control block, loads the program named in the command into the segment just created, sets the end and CTRL-C exit addresses (interrupt vectors 22H and 23H) to point to the resident portion of COMMAND, then gives control to the loaded program.

Note: Files with an extension of .EXE which are designated to load into high memory are loaded immediately below the transient portion of COMMAND to prevent the loading process from overlaying COMMAND itself.

Section 4.4 contains information describing the conditions in effect when a program is given control by COMMAND.

4.2.2 Replacing the Command Processor

Though the command processor is an important part of MS-DOS, its functions may not be needed in certain environments. Therefore, it has been designed as a user program to allow its replacement. If you decide to replace it with your own command processor:

Name your program file COMMAND.COM.

The entry conditions are the same as for all .COM programs.

Be sure to set the end and CTRL-C exit addresses in the interrupt vectors and in your own Program Segment Prefix to transfer control to your own code.

You must provide code to handle (and set the interrupt vectors for) interrupt types 22H (end address), 23H (CTRL-C handler), 24H (critical error handling) and if needed 27H (end but stay resident). Your COMMAND.COM is also responsible for reading commands from the keyboard and loading and executing programs, if needed.

4.2.3 Available MS-DOS Functions

MS-DOS provides a number of functions to user programs, all available through issuance of a set of interrupt codes. There are routines for keyboard input (with and without echo and CTRL-C detection), console and printer output, constructing file control blocks, memory management, date and time functions, and a variety of diskette and file handling functions. See MS-DOS Interrupts and Function Calls in Programmer's Toolkit for detailed information.

4.2.4 Diskette/File Management Notes

Through the INT 21H (function call) mechanism, MS-DOS provides methods to create, read, write, rename and erase files. Files are not necessarily written sequentially on diskette - space is allocated one sector at a time as it is needed, and the first sector available is allocated as the next sector of a file being written. Therefore, if considerable file creation and erasure activity has taken place, newly created files will probably not be written in sequential sectors.

However, due to the mapping (chaining) of file sectors via the File Allocation Table, and the fields defined in the File Control Block, any file can be used in either a sequential or random manner. By using the current block and current record fields of the FCB and the sequential disc read or write functions, you can make the file appear sequential - MS-DOS will do the calculations necessary to locate the proper sectors on the diskette. On the other hand, by using the random record field and random disc functions, you can cause any record in the file to be accessed directly - again, MS-DOS will locate the correct sectors on the diskette for you. Among the most powerful functions are the random block read and write functions, which allow reading or writing a large amount of data with one function call - this is how MS-DOS loads programs. As above, MS-DOS will handle locating the correct sectors on diskette to provide the image of sequential processing - you need not be concerned about the physical location of data on diskette.
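The relationship between the sequential fields (current block and current record) and the random record field can be sketched as simple arithmetic. The Python below is an illustration under one stated assumption that this section does not spell out: a block is taken to be 128 records, so that the current record field runs from 0 to 127 within each block. The function names are hypothetical.

```python
RECORDS_PER_BLOCK = 128     # assumption: one FCB block holds 128 records

def to_random_record(current_block: int, current_record: int) -> int:
    """Combine the sequential FCB fields into a random record number."""
    return current_block * RECORDS_PER_BLOCK + current_record

def to_block_and_record(random_record: int) -> tuple:
    """Split a random record number into (current block, current record)."""
    return divmod(random_record, RECORDS_PER_BLOCK)

def byte_offset(random_record: int, record_size: int = 128) -> int:
    """Byte position in the file at which the given record starts."""
    return random_record * record_size
```

With the default record size of 128 bytes, record 5 of block 2 is random record 261 and starts 33408 bytes into the file.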

4.2.5 The Disc Transfer Area (DTA)

The Disc Transfer Area (also commonly called "buffer") is the memory area MS-DOS will use to contain the data for all disc reads and writes. This area can be at any location within memory, and should be set by your program. (See function call 1AH).

Only one DTA can be in effect at a time, so it is the program's responsibility to inform MS-DOS what memory location to use before using any disc read or write functions. Once set, MS-DOS continues to use that area for all disc operations until another function call 1AH is issued to define a new DTA. When a program is given control by COMMAND, a default DTA large enough to hold 128 bytes has already been established at 80H in the program's Program Segment Prefix.

4.2.6 Error Trapping

MS-DOS provides a method by which a program can receive control whenever a disc read/write error occurs, or when a bad memory image of the file allocation table is detected. When these events occur, MS-DOS executes an INT 24H to pass control to the error handler. The default error handler resides in COMMAND.COM but any program can establish its own by setting the INT 24H vector to point to the new error handler. MS-DOS provides error information via the registers and provides Abort, Retry or Ignore support via return codes. (See MS-DOS Interrupts and Function Calls in the Programmer's Toolkit).

Unlike the end and CTRL-C exit addresses, MS-DOS does not preserve the original contents of the critical error exit address when a program is given control. It is your program's responsibility to preserve the original contents (two words) of the INT 24H vector prior to setting this vector, and to restore the original contents before ending.
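The save-and-restore discipline described above can be modelled in Python against a copy of the 8086 interrupt vector table, in which vector N occupies the four bytes at address 4 x N as an offset word followed by a segment word. This is a sketch of the bookkeeping only; a real program would read and set the vector through the MS-DOS function calls rather than a byte array, and all names here are hypothetical.

```python
import struct
from contextlib import contextmanager

def get_vector(memory: bytearray, number: int) -> tuple:
    """Return (segment, offset) stored in interrupt vector 'number'."""
    offset, segment = struct.unpack_from("<HH", memory, number * 4)
    return segment, offset

def set_vector(memory: bytearray, number: int, segment: int, offset: int) -> None:
    """Store a far address in interrupt vector 'number', offset word first."""
    struct.pack_into("<HH", memory, number * 4, offset, segment)

@contextmanager
def critical_error_handler(memory: bytearray, segment: int, offset: int):
    """Install a new INT 24H handler and restore the old two words on exit."""
    saved = get_vector(memory, 0x24)            # preserve the original contents
    set_vector(memory, 0x24, segment, offset)
    try:
        yield
    finally:
        set_vector(memory, 0x24, *saved)        # restore before ending
```

Wrapping the body of the program in the context manager guarantees the restore step runs even if the body fails part way through.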

4.2.7 General Guidelines

The following guidelines and tips should assist in developing applications using the MS-DOS disc read and write functions.

All disc operations require a properly constructed FCB that the program must supply.

Remember to set the Disk Transfer Area address (function 1AH) before performing any reads or writes to a file.

All files must be opened (or created, in the case of a new file) before being read from or written to. Files which have been written to must also be closed to ensure accurate directory information.

A program may define its own logical record size by placing the desired size into the FCB. MS-DOS then uses that value to determine a record's location within the file. If using the "file size" function call, this field must be set by the calling program prior to the function call. If using the disc read and write routines, the field should be set after opening (or creating) the file but before any read or write functions are used. (Open function sets the field to a default value of 128 bytes).

New files must be created (function call 16H) before they can be written to. This call creates a new directory entry and opens the file.

If the amount of data being transferred is less than one sector (512 bytes), MS-DOS will "buffer" the data for the requesting program in an internal buffer within BIOS. Because there is only one disc buffer, performing less-than-sector-size operations in a random manner or against multiple files concurrently causes MS-DOS to frequently change the contents of the buffer. If such operations are in output mode, this forces MS-DOS to write a partially full sector to make the buffer available for any other diskette operation. Subsequently, the partially full sector would have to be re-read before further data could be written to the file. This is called "thrashing" and can be very time consuming. To remedy this situation, use of the Random block read and write routines is recommended, with a data transfer size as large as possible. (An entire file can be read this way, provided enough memory exists.) This method bypasses the "buffering" described above, by reading or writing directly to or from the DTA for as much of the data as possible. If the file size is not a multiple of 512 bytes, only the last portion of the file (the portion past the last 512-byte multiple) is buffered by MS-DOS.
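The split between data moved directly and data that passes through the internal buffer is plain arithmetic on the 512-byte sector size. The Python sketch below assumes the transfer starts on a sector boundary, as it does when a whole file is read or written from its beginning; the function name is hypothetical.

```python
SECTOR_SIZE = 512

def split_transfer(byte_count: int) -> tuple:
    """Return (bytes moved straight to or from the DTA, bytes buffered).

    Assumes the transfer begins at the start of a sector.
    """
    buffered = byte_count % SECTOR_SIZE         # the part past the last 512-byte multiple
    return byte_count - buffered, buffered
```

For a 1300-byte file, 1024 bytes go directly and only the final 276 bytes are buffered, whereas a 100-byte transfer is buffered in full.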

4.2.8 Examples of Using MS-DOS Functions

This example illustrates the steps necessary for a program named TEST.COM to:

Create a new file named FILE1.

Load and execute a second program named PGM1.COM from the diskette in drive B.

The program is in a file named TEST.COM and was invoked from the keyboard by the command TEST FILE1 B:PGM1.COM.

When the program (TEST) receives control, the Program Segment Prefix has been set up as shown in section 4.4. The end and CTRL-C exit addresses in the Program Segment Prefix are the ones which the host (calling program) had established and should not be modified - they are restored to interrupt 22H and 23H vectors when this program ends. The FCBs at 5CH and 6CH are formatted to contain file names of FILE1 and PGM1.COM respectively - the first FCB reflects the default drive and the second drive B. The default DTA is set to 80H into the segment (the unformatted parameter area of the Program Segment Prefix).

4.2.9 To Create File FILE1

The FCB at 6CH is needed in a subsequent step to load and execute the program whose name it contains, so it must be preserved: opening the FCB at 5CH would cause it to be overlaid. The program should:

Copy the FCB at 6CH to an area within itself.

Using the FCB at 5CH call function 11H to be sure FILE1 does not already exist - if it did exist, it would be overwritten by this program.

Assuming it did not exist, create the file (function call 16H) - the file is now open.

Set the FCB current record and random record fields to zero, and the record size field to the desired size.

Build the memory image of the file's data.

Set the DTA to point to the memory image (function call 1AH).

Use the sequential write (15H), random write (22H), or random block write (28H) calls to write the file, ensuring the FCB fields and DTA are set properly for each call. In the case of call 28H (the preferred method) the entire file can be written with one call by setting CX to the number of records to be written (in terms of the FCB record size field).

Close the FCB at 5CH - the directory and file allocation table are updated and any partial data in MS-DOS's disc buffer (if it were performing blocking) are written to disc.

To Load and Execute Program PGM1.COM from drive B.

Assume that the current program (TEST) wishes to control the action taken if CTRL-C is entered. (Until now, the CTRL-C address still pointed to COMMAND.COM, which would end program TEST if CTRL-C were pressed).

TEST should:

1. Set the end and CTRL-C exit vectors (call 25H) to point to code within itself (the end address is where the program to be loaded will return when it ends).

2. Determine where PGM1.COM should reside in memory and set up a segment for it, including a Program Segment Prefix (call 26H). This copies the end and CTRL-C exit addresses just set into the new segment's Program Segment Prefix.

3. Set the DTA to offset 100H into the just-created segment. (Be sure the DS register contains the correct segment address). This is the offset at which PGM1.COM will be loaded.

4. Open the FCB that had been copied earlier (for PGM1.COM). The FCB record size field will be filled in by open to a default value of 128 bytes.

5. Set the FCB record size field to the desired size. (Setting it to 1 is very useful in this case).

6. Set the CX register to the number of records (based on the record size field) to read. If the record size was set to 1, then the number of records to read does not have to be computed - it can be obtained directly from the FCB file size field. In any case, if the product of the record size field and the contents of the CX register is equal to or greater than the file size, then the entire file is read in the following step.

7. Read the file using the Random Block Read function (call 27H) into the new segment at offset 100H. (See step 3 above). There is no need to close the file since it was not written to.

8. Prepare the DS, ES, SS and SP registers for the loaded program and push a word of zeros on the top of its stack.

9. Set the DTA to offset 80H into the new segment.

10. Give control to the loaded program. (An intersegment jump is ideal, since it does not use stack space). When the called program ends via INT 20H, MS-DOS restores interrupt vectors 22H and 23H from the values in the ending program's Program Segment Prefix (the values established in step 1) and passes control to the end exit address. TEST is now back in control, and can itself issue an INT 20H, which will cause its caller (COMMAND.COM) to regain control.

Note: The example above was simplified by not discussing the checking of return codes from the function calls. Nearly all function calls do return exception or error indications, which should be checked by the calling program.
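The memory picture that the loading steps above produce can be modelled in Python with a byte array standing in for the new segment. This is a sketch of the resulting layout, not of the function calls themselves: it assumes a full 64K segment is free, fills in only the INT 20H at offset 0 of the Program Segment Prefix, and uses hypothetical function names.

```python
import struct

SEGMENT_SIZE = 0x10000          # assumption: a whole 64K segment is available

def records_to_read(file_size: int, record_size: int = 1) -> int:
    """Value for CX: enough records to cover the whole file."""
    return -(-file_size // record_size)         # ceiling division

def load_com_image(program: bytes) -> tuple:
    """Return (segment image, initial IP, initial SP) for a .COM program."""
    if len(program) > SEGMENT_SIZE - 0x100 - 2:
        raise ValueError("program does not fit in one segment")
    segment = bytearray(SEGMENT_SIZE)
    segment[0:2] = b"\xCD\x20"                  # INT 20H at offset 0
    segment[0x100:0x100 + len(program)] = program
    sp = SEGMENT_SIZE - 2                       # stack starts at the top of the segment
    struct.pack_into("<H", segment, sp, 0)      # word of zeros on top of the stack
    return segment, 0x100, sp
```

A one-byte program consisting of a near RET (C3 hex) shows why the word of zeros matters: the RET pops 0 into IP, so execution continues at offset 0 and reaches the INT 20H.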

4.3 MS-DOS Diskette Directory

FORMAT builds the directory for each diskette on track 0 sectors 3-10, a total of 4096 bytes. The directory has room for 128 entries, each 32 bytes long. Each directory entry is formatted as follows. (Byte offsets are in decimal).

0-7 Filename. (E5H in byte 0 means this directory entry is not used.)

8-10 Filename extension.

11 File attribute. Contents can be 02H for a hidden file and 04H for a system file. (Both files are excluded from all directory searches unless an extended FCB with the appropriate attribute byte is used). For all other files this byte contains 00H. A file can be designated as hidden when it is created.

12-21 Reserved

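22-23 Time the file was created or last updated. The hh:mm:ss are mapped in the bits as follows:

<         23          > <         22          >
15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 h  h  h  h  h  m  m  m  m  m  m  s  s  s  s  s

where:
hh is 0-23
mm is 0-59
ss is 0-29 (two-second increments)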

24-25 Date the file was created or last updated. The mm/dd/yy are mapped in the bits as follows:

<         25          > <         24          >
15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 y  y  y  y  y  y  y  m  m  m  m  d  d  d  d  d

where:
yy is 0-119 (1980-2099)
mm is 1-12
dd is 1-31
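The time and date words at bytes 22-25 can be unpacked with shifts and masks that follow the bit maps above. The Python below is a minimal sketch with hypothetical function names; it assumes the five-bit seconds field counts two-second units, which is the only way a 0-59 range fits in five bits.

```python
def decode_time(word: int) -> tuple:
    """Return (hours, minutes, seconds) from the word at bytes 22-23."""
    hours = (word >> 11) & 0x1F
    minutes = (word >> 5) & 0x3F
    seconds = (word & 0x1F) * 2                 # assumption: two-second units
    return hours, minutes, seconds

def decode_date(word: int) -> tuple:
    """Return (year, month, day) from the word at bytes 24-25."""
    year = 1980 + ((word >> 9) & 0x7F)          # yy counts from 1980
    month = (word >> 5) & 0x0F
    day = word & 0x1F
    return year, month, day
```

Both words are stored with the least significant byte first, so they must be assembled into 16-bit values before being passed to these functions.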

26-27 Starting sector: the relative sector number of the first block in the file. (For file allocation purposes only, relative sector numbers start at 000 with track 0 sector 6. This is in contrast with DEBUG and the absolute disc read write routines, interrupts 25H and 26H which number relative sectors from the beginning of the diskette.)

The relative sector number is stored with the least significant byte first.

28-31 File size in bytes. The first word contains the low-order part of the size. Both words are stored with the least significant byte first.
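The 32-byte entry described above maps directly onto a fixed little-endian record. The Python sketch below unpacks one entry with the standard struct module; the function name and dictionary keys are hypothetical, and the time and date words are returned undecoded.

```python
import struct

# name(8) ext(3) attribute(1) reserved(10) time(2) date(2) start(2) size(4)
ENTRY = struct.Struct("<8s3sB10sHHHI")

def parse_entry(raw: bytes):
    """Decode one 32-byte directory entry, or return None if it is unused."""
    if len(raw) != ENTRY.size:
        raise ValueError("a directory entry is 32 bytes long")
    name, ext, attr, _reserved, time, date, start, size = ENTRY.unpack(raw)
    if name[0] == 0xE5:                         # E5H in byte 0: entry not used
        return None
    return {
        "name": name.decode("ascii", "replace").rstrip(),
        "ext": ext.decode("ascii", "replace").rstrip(),
        "hidden": bool(attr & 0x02),
        "system": bool(attr & 0x04),
        "time": time,
        "date": date,
        "start_sector": start,
        "size": size,
    }
```

Since ENTRY.size is 32, the 4096-byte directory area holds 4096 // 32 = 128 such entries, which agrees with the figure given for FORMAT.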

4.4 MS-DOS Program Segment

When you enter an external command, the COMMAND processor (see also Section 4.1.1) determines the lowest available address (immediately after the character fonts, see Section 1.6.1) to use as the start of available memory for the program invoked by the external command. This area is called the Program Segment.

At offset 0 within the Program Segment, COMMAND builds the Program Segment Prefix control block. (See section 4.1.) COMMAND loads the program at offset 100H and gives it control. (.EXE files can be loaded into high memory just below the transient portion of COMMAND.COM, but the Program Segment Prefix will still be in low memory.)

The program returns to COMMAND by a jump to offset 0 in the Program Segment Prefix (the instruction INT 20 is the first item in the control block), by issuing an INT 20, or by issuing an INT 21 with register AH=0.

Note: It is the responsibility of all programs to ensure that the CS register contains the segment address of the Program Segment Prefix when ending via any of these methods.

All three methods result in an INT 20 being issued, which transfers control to the resident portion of COMMAND.COM. It restores interrupt vectors 22H and 23H (end and CTRL-C exit addresses) from the values saved in the Program Segment Prefix of the ending program. Control is then given to the end address. (If this is a program returning to COMMAND, control transfers to its transient portion.) If a batch file was in process, it is continued: otherwise, COMMAND issues the system prompt and waits for the next command to be entered from the keyboard.

When a program receives control, the following conditions are in effect.

For all programs:

Disk transfer address (DTA) is set to 80H (default DTA in the Program Segment Prefix).

File control blocks at 5CH and 6CH are formatted from the first two parameters entered when the command was invoked.

Unformatted parameter area at 81H contains all the characters entered after the command name (including leading and embedded delimiters), with 80H set to the number of characters.

Offset 6 (one word) contains the number of bytes available in the segment. If the resident portion of COMMAND.COM lies within the segment, this value is reduced by the size of that portion.

Register AX reflects the validity of drive specifiers entered with the first two parameters as follows:
- AL=FF if the first parameter contained an invalid drive specifier (otherwise AL=00).
- AH=FF if the second parameter contained an invalid drive specifier (otherwise AH=00).
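A program can test the two halves of AX separately to find out which drive specifier, if any, was rejected. The Python sketch below shows the split of the 16-bit value into AL and AH; the function name is hypothetical.

```python
def drive_validity(ax: int) -> tuple:
    """Return (first drive valid, second drive valid) from the entry value of AX."""
    al = ax & 0xFF                  # low byte: first parameter
    ah = (ax >> 8) & 0xFF           # high byte: second parameter
    return al != 0xFF, ah != 0xFF
```

An entry value of 00FFH therefore means the first parameter named an unrecognised drive while the second was acceptable.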

For .COM programs:

All four segment registers contain the segment address of the Program Segment Prefix control block.

The Instruction Pointer (IP) is set to 100H.

SP register is set to the end of the program's segment or the bottom of the transient portion of COMMAND.COM, whichever is lower. The segment size at offset 6 is reduced by 100H to allow for a stack of that size.

A word of zeros is placed on the top of the stack.

For .EXE programs:

DS and ES registers are set to point to the Program Segment Prefix. (See below).

CS, IP, SS and SP registers are set to the values passed by the linker.

Last update: 02/03/2007
