Victor R. Volkman received a BS in Computer Science from Michigan Technological University. He has been a frequent contributor to The C Users Journal since 1987. He is currently employed as Senior Analyst at H.C.I.A. of Ann Arbor, Michigan. He can be reached by dial-in at the HAL 9000 BBS (313) 663-4173 or by Usenet mail to sysop@hal9k.com.
This article is abstracted from original documentation by Thomas Wolff, wolff@inf.fu-berlin.de, Freie Universität Berlin, Institut für Informatik, D-14195 Berlin, Germany, expressly for reprint in The C Users Journal.
The MINED editor, originally by Michiel Huisjes and extensively modified by Achim Müller and Thomas Wolff, is a powerful multi-platform text editor. MINED runs on most UNIX platforms as well as MS-DOS and DEC VAX-11/VMS. MINED uses TERMCAP under UNIX and can edit text in any 8-bit character set. One of its more interesting features is full regular expression pattern matching for search and replace operations. This abstract provides some notes on implementing MINED. MINED is available from the CUG Library as volume #399.
MINED is a screen editor designed for the MINIX operating system (MINix + EDitor = MINED). MINED is designed for files smaller than 50K and it tends to be quite fast. When MINED starts up, it reads the entire file into memory, thus minimizing disk access. (Only a few save, write, and copy commands require access to the disks.)
Like Emacs and Jove, MINED is a modeless editor. Each character has its own entry in a 256-entry pointer-to-function array. When you type a character, MINED calls the corresponding function. Printable characters are connected with a function which inserts that character at the current location in the file. Other characters invoke other functions.
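The dispatch scheme described above can be sketched in a few lines of C. This is only an illustration, not MINED's source: the names key_map, handle_key, insert_char, move_left, and move_right, the fixed-size one-line buffer, and the choice of ^B and ^F as movement keys are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_MAX 128

/* Hypothetical stand-in for the edit buffer: one fixed-size line. */
static char buffer[BUF_MAX];
static size_t cursor = 0;

typedef void (*key_func)(int c);

/* Insert the typed character at the cursor position. */
static void insert_char(int c)
{
    size_t len = strlen(buffer);
    if (len + 1 >= BUF_MAX)
        return;                         /* buffer full: ignore the key */
    /* Shift the tail (including the '\0') one place to the right. */
    memmove(buffer + cursor + 1, buffer + cursor, len - cursor + 1);
    buffer[cursor++] = (char)c;
}

static void move_left(int c)  { (void)c; if (cursor > 0) cursor--; }
static void move_right(int c) { (void)c; if (buffer[cursor] != '\0') cursor++; }
static void ignore_key(int c) { (void)c; }

/* One entry per possible 8-bit character. */
static key_func key_map[256];

static void init_key_map(void)
{
    int c;
    for (c = 0; c < 256; c++)
        key_map[c] = ignore_key;
    for (c = ' '; c <= '~'; c++)        /* printable ASCII inserts itself */
        key_map[c] = insert_char;
    key_map[2] = move_left;             /* ^B, an arbitrary choice */
    key_map[6] = move_right;            /* ^F, an arbitrary choice */
}

/* The main loop would call this once per character read. */
static void handle_key(int c)
{
    assert(c >= 0 && c < 256);
    key_map[c](c);
}
```

After init_key_map(), feeding the characters 'a', 'c', ^B, 'b' through handle_key leaves "abc" in the buffer. Because there is no mode to test, the main loop reduces to a single indexed call per keystroke.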
The display consists of SCREENMAX + 1 lines and XMAX + 1 characters. When a line becomes larger than XBREAK characters, the line is either shifted SHIFT_SIZE characters to the left (which means that the first SHIFT_SIZE characters are not printed) or the end of the line is marked with the SHIFT_MARK character and the rest of the line is not printed. A line can never exceed MAX_CHARS characters. MINED will try to keep the cursor on the same line and same (relative) x-coordinate unless you alter the text. If you scroll up or down, the cursor attempts to stay in the same position with respect to the text. If the cursor position leaves the field of view, the cursor will move to the nearest visible line.
Every character on the line is available for editing including the linefeed at the end of the line. When the linefeed is deleted, the current line and the next line are joined. The last character of the file (which is always a linefeed) can never be deleted.
The bottom line (as indicated by YMAX + 1) is used as a status line during editing. This line is usually blank, but sometimes contains a request for information MINED needs during editing. Commands and requests on this line are displayed in reverse video.
The terminal modes have changed completely since the last version. All signals like start/stop, interrupt etc. are unset. The only signal that remains set is the quit signal. The quit signal (^\) is the general abort signal for MINED. Typing a ^\ during searching or when MINED is asking for filenames, etc. will abort the function and MINED will return to the main loop.
Sending a quit signal during the main loop will abort the session (after confirmation). If the file has been modified, MINED will ask if you want to save the file. If there isn't enough space left on the disk for a save, MINED will give an error message and continue.
MINED attempts to minimize the number of system calls. (This also tends to make it run fast.) I/O is done in SCREEN_SIZE reads/writes, and accumulated output is flushed after each typed character has been processed.
Regular Expressions
MINED has a built-in regular expression matcher for search and replace routines. This allows for very efficient "wildcard" search requests. A regular expression can contain any number of normal characters and also these special characters:
1. A . ("dot," matching any character)
2. A ^ (matching the beginning of a line)
3. A $ (as the last character of the pattern) matching the end of a line
4. A \<character> matching <character>
5. A number of characters enclosed in [] pairs (matching any of the enclosed characters). You can indicate a range of characters with a hyphen. For example, [a-z] matches any lowercase letter of the alphabet. If the first character after the [ is a ^ then MINED will match anything but the characters in the set.
6. A * (matching a sequence of zero or more occurrences of the previous expression)
Putting \ in front of any of the special characters negates its special meaning. Therefore a \ must be represented by \\.
MINED performs regular expression searches in two phases. In the first phase the expression is compiled into a more comprehensible form. In the second phase the actual matching is performed. For more details see "Search and Replace Routines" below.
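To make the special characters concrete, here is a minimal matcher for a subset of the syntax above: ., ^, $, \<character>, and *. It is not MINED's code. It interprets the pattern text directly instead of compiling it first, and it omits [] sets, so it is a swapped-in simplification in the style of the well-known Kernighan and Pike matcher. The names re_match, match_here, match_star, and atom_matches are chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Does pattern atom c match text character ch?  A literal atom came from \c. */
static int atom_matches(char c, int literal, char ch)
{
    return (!literal && c == '.') || c == ch;
}

/* Match c* followed by re: take as many c's as possible, then back off. */
static int match_star(char c, int literal, const char *re, const char *text)
{
    const char *t = text;               /* text is the "marked" position */
    while (*t != '\0' && atom_matches(c, literal, *t))
        t++;
    for (;;) {
        if (match_here(re, t))
            return 1;
        if (t == text)
            return 0;
        t--;
    }
}

/* Match re at exactly this position in text. */
static int match_here(const char *re, const char *text)
{
    int literal = 0;
    if (re[0] == '\0')
        return 1;
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (re[0] == '\\' && re[1] != '\0') {
        literal = 1;                    /* \c matches c itself */
        re++;
    }
    if (re[1] == '*')
        return match_star(re[0], literal, re + 2, text);
    if (*text != '\0' && atom_matches(re[0], literal, *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
static int re_match(const char *re, const char *text)
{
    assert(re != NULL && text != NULL);
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For example, re_match("^ab*c$", "abbbc") succeeds, while re_match("a\\.c", "abc") fails because the escaped dot only matches a literal period.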
MINED Data Structures
In MINED, the whole data file is kept in a double-linked list of lines. The LINE structure looks like this:
    typedef struct Line {
        struct Line *next;
        struct Line *prev;
        char *text;
        unsigned char shift_count;
    } LINE;
Each line entry contains a pointer to the next line, a pointer to the previous line and a pointer to the text of that line. A special field called shift_count contains the number of shifts (in units of SHIFT_SIZE) performed on that line. The total size of the structure is seven bytes per line so a file consisting of 1,000 empty lines will waste a lot of memory. A LINE structure is allocated for each line in the file. After that, MINED determines the number of characters in a line and allocates sufficient space to store them (including a linefeed and a '\0'). The resulting address is assigned to the text field in the structure.
A special structure is allocated and its address is assigned to the variable header as well as the variable tail. The text field of this structure is set to NIL_PTR. The tail->prev of this structure points to the last line of the file and the header->next to the first line. Other LINE * variables are top_line and bot_line, which point to the first line and last line on the screen respectively.
Two other variables are also important: the LINE * cur_line, which points to the LINE currently in use, and the char * cur_text, which points to the character at which the cursor stands.
Whenever an ASCII character is typed, a new line is built with this character inserted. Then the old data space (pointed to by cur_line->text) is freed and space for the new line is allocated and assigned to cur_line->text.
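The list layout and the rebuild-on-insert step might look roughly like the following. This is a sketch under stated assumptions, not the original code: make_line, link_after, and insert_char_at are hypothetical helper names, malloc stands in for MINED's own allocation routines, and a single sentinel node plays the role of both header and tail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Line {
    struct Line *next;
    struct Line *prev;
    char *text;
    unsigned char shift_count;
} LINE;

/* Allocate a node holding a private copy of text (NULL for the sentinel). */
static LINE *make_line(const char *text)
{
    LINE *line = malloc(sizeof *line);
    if (line == NULL)
        return NULL;
    line->text = NULL;
    if (text != NULL) {
        line->text = malloc(strlen(text) + 1);
        if (line->text == NULL) {
            free(line);
            return NULL;
        }
        strcpy(line->text, text);
    }
    line->next = line->prev = line;     /* a lone node points at itself */
    line->shift_count = 0;
    return line;
}

/* Link line into the list directly after the node `after`. */
static void link_after(LINE *after, LINE *line)
{
    line->prev = after;
    line->next = after->next;
    after->next->prev = line;
    after->next = line;
}

/* Build a new text with c inserted at offset pos, then free the old one.
   Returns 0 on success, -1 on a bad offset or allocation failure. */
static int insert_char_at(LINE *line, size_t pos, char c)
{
    size_t len;
    char *fresh;

    assert(line != NULL && line->text != NULL);
    len = strlen(line->text);
    if (pos > len)
        return -1;
    fresh = malloc(len + 2);            /* old text + new char + '\0' */
    if (fresh == NULL)
        return -1;
    memcpy(fresh, line->text, pos);
    fresh[pos] = c;
    memcpy(fresh + pos + 1, line->text + pos, len - pos + 1);
    free(line->text);
    line->text = fresh;
    return 0;
}
```

With a sentinel created by make_line(NULL) and a line "helo\n" linked after it, insert_char_at(line, 3, 'l') leaves "hello\n" in line->text. Reallocating on every keystroke sounds expensive, but lines are short and the cost is bounded by the line length.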
Two global variables called x and y represent the x and y-coordinates of the cursor. The global variable nlines contains the number of lines in the file. The variable last_y indicates the maximum y-coordinate of the screen (which is usually SCREENMAX).
You must initialize a few strings by hand before compiling MINED. These strings are enter_string, which is printed upon entering MINED, rev_video (turn on reverse video), normal_video, rev_scroll (perform a reverse scroll) and pos_string. pos_string positions the cursor. The #defines X_PLUS and Y_PLUS contain the characters that MINED will add to the coordinates x and y (both starting at 0) to finish cursor positioning.
Starting Up
You can call MINED with or without a filename argument. If you specify a filename, the function load_file checks if the file exists, if it can be read, and if it is writable (setting the writable flag accordingly). If the file can be read, load_file reads a line from the file and stores this line into a structure by calling install_line and line_insert. line_insert installs the line into the double linked list. load_file then reads another line and repeats the process until it reaches the end of the file.
Lines are read by the function get_line, which buffers the reading in blocks of SCREEN_SIZE. load_file also initializes the LINE * variables described above.
Moving Around
MINED has several commands for moving through the file. You can move up (UP), down (DN), left (LF), and right (RT) with the arrow keys. Moving one line below the screen scrolls the screen one line up. Moving one line above the screen scrolls the screen one line down. The functions forward_scroll and reverse_scroll scroll the screen.
There are several other move functions: beginning of line (BL), end of line (EL), top of screen (HIGH), bottom of screen (LOW), top of file (HO), end of file (EF), scroll one page down (PD), scroll one page up (PU), scroll one line down (SD), scroll one line up (SU), and move to a certain line number (GOTO).
Two functions called MN and MP move one word forward or backward. A word is a sequence of non-blank characters delimited by a space, a tab, or a linefeed.
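The word definition above translates into a very small scanning routine. The following is only a sketch of forward word motion within a single string. The names is_blank and next_word are assumptions, and the real MN command also has to cross line boundaries and update the screen.

```c
#include <assert.h>
#include <stddef.h>

/* A blank, by the definition above, is a space, a tab, or a linefeed. */
static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Return a pointer to the start of the next word after p,
   or to the terminating '\0' if there is none. */
static const char *next_word(const char *p)
{
    assert(p != NULL);
    while (*p != '\0' && !is_blank(*p))  /* skip rest of the current word */
        p++;
    while (is_blank(*p))                 /* skip the blanks that follow */
        p++;
    return p;
}
```

Backward motion is the mirror image: skip blanks to the left, then skip non-blanks to the left, stopping at the start of the text.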
Modifying Text
The modifying commands are built around the two functions, insert and delete. insert must be told where to insert the text. It doesn't make any difference whether this text contains linefeeds or not. delete must be given a pointer to the start line, a pointer to where deleting should start on that line, and the same information about the end position. The last character of the file will never be deleted. delete will make the necessary changes to the screen after deleting, but insert won't.
The functions for modifying text are: insert one char (S), insert a file (file_insert(fd)), insert a linefeed and put cursor back to end of line (LIB), delete character under the cursor (DCC), delete before cursor (even linefeed) (DPC), delete next word (DNW), delete previous word (DPW), and delete to end of line (if the cursor is at a linefeed delete line) (DLN).
Yanking
MINED provides a few utilities for yanking pieces of text. The function MA marks the current position in the file. This is done by setting LINE * mark_line and char * mark_text to the current position. Yanking of text can occur in either of two modes. The first mode just copies the text from the mark to the current position (or vice versa) into a buffer (YA) and the second also deletes the text (DT). Both functions call the function set_up (YA with the delete flag off; DT with it on). set_up checks if the marked position is still valid (by using check_mark and legal), then calls the function yank, sending yank the start and end positions in the file. yank copies the text into a scratch file as indicated by the variable yank_file. At the end of copying, yank will (if necessary) delete the text. A global flag called yank_status keeps track of the buffer (or file) status. yank_status is initialized to NOT_VALID and set to EMPTY (by set_up) or VALID (by yank). Several things can be done with the buffer. It can be inserted somewhere else in the file (PT) or it can be copied into another file (WB).
Search and Replace Routines
The string search and replace routines use regular expressions (see above). For any expression, the function compile is called with the expression as an argument. compile returns a pointer to a structure that looks like this:
    typedef struct regex {
        union {
            char *err_mess;
            int *expression;
        } result;
        char status;
        char *start_ptr;
        char *end_ptr;
    } REGEX;
If something goes wrong during compiling (e.g. an invalid expression), the function reg_error is called, which sets the status field to REG_ERROR and the err_mess field to the error message. If the match must be anchored at the beginning of the line (or end of line), the status field is set to BEGIN_LINE (or END_LINE). If none of these special cases apply, the field is set to 0 and the function finished is called. finished allocates space to hold the compiled expression and copies this expression into the expression field of the union (bcopy).
Matching is performed with the routines match and line_check. match takes as arguments the REGEX * program, a pointer to the start position on the current line, and a flag indicating forward or reverse search. match checks the whole file until a match is found. If a match is found, it returns a pointer to the line in which the match was found. Otherwise it returns NIL_LINE. line_check takes the same arguments, but returns either MATCH or NO_MATCH.
During checking, the start_ptr and end_ptr fields of the REGEX structure are assigned to the start and end of the match. Both functions try to find a match by walking through the line character by character. For each possibility, the function check_string is called with the REGEX * program and the string as arguments. check_string starts walking through the expression until either the end of the expression or the end of the string is reached. Whenever check_string encounters a *, the position of the string is marked, the maximum number of matches are performed, and the function star is called to find the longest possible match. star takes as arguments the REGEX program, the current position of the string, the marked position, and the current position of the expression. star walks from the current position of the string back to the marked position, and calls check_string in order to find a match. (star returns MATCH or NO_MATCH, just as check_string does.)
Searching is now easy. Both search routines (forward search (SF) and backward search (SR)) call search with an appropriate message and a flag indicating forward or reverse search. search will get an expression from the user by calling get_expression. get_expression prompts for the expression and returns a pointer to a REGEX structure (or NIL_REG if it encounters an error). If no expression is given, the previous expression is used. After that, search will call match, and if a match is found, we can move to that place in the file by the functions find_x and find_y, which will find and display the match on the screen.
Replacement can occur in either of two ways: a global replace (GR) or a line replace (LR). Both functions call change with a message and a flag indicating global or line replacement. change will prompt for the expression and for a replacement string. Every & in the replacement string means "substitute the search string." (You can escape this special meaning of & by preceding it with \. See "Regular Expressions," above.) When a match is found, the function substitute will perform the substitution.
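The handling of & in the replacement string can be illustrated as follows. This is a sketch, not the substitute function itself: expand_replacement is a hypothetical name, the two match pointers are assumed to play the role of the start_ptr and end_ptr fields of the REGEX structure, and & is taken to stand for the text between them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand repl into out.  An unescaped & is replaced by the matched text
   [start, end); \& and \\ produce a literal & and \.  Returns the length
   of the result, or -1 if it does not fit in out_size bytes. */
static int expand_replacement(const char *repl,
                              const char *start, const char *end,
                              char *out, size_t out_size)
{
    size_t n = 0;
    size_t match_len;

    assert(repl != NULL && start != NULL && end != NULL && out != NULL);
    assert(start <= end);
    if (out_size == 0)
        return -1;
    match_len = (size_t)(end - start);

    while (*repl != '\0') {
        if (*repl == '&') {
            if (n + match_len >= out_size)
                return -1;
            memcpy(out + n, start, match_len);
            n += match_len;
            repl++;
        } else {
            if (*repl == '\\' && repl[1] != '\0')
                repl++;                 /* take the next character literally */
            if (n + 1 >= out_size)
                return -1;
            out[n++] = *repl++;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

If the match is "cat" inside "the cat sat", the replacement string [&] \& & expands to "[cat] & cat". The caller would then splice this expansion into the line in place of the matched text.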
Miscellaneous Commands
Here are some other important commands: redraw the screen (RD), fork a shell (SH), print file status (FS), write file to disk (WT), insert a file at current position (IF), leave editor (XT), and visit another file (VI). The last two functions will check if the file has been modified. If it has, they will ask if you want to save the file by calling ask_save.
The function REPT will repeat a command n times. It will prompt for the number. (You can abort the loop by sending the ^\ signal.)
Utility Functions
MINED has several functions for internal use. The allocation routines alloc(bytes) and newline will return a pointer to free data space. If there is no more memory available, the function panic is called.
The only signal that can be sent to MINED is the SIGQUIT signal. This signal functions as a general abort command. MINED will abort if the signal is given during the main loop. The function abort_mined causes MINED to abort.
panic takes an error message as an argument. It will print the message and the error number set by the kernel (errno), then ask if the file must be saved. It resets the terminal (raw_mode) and exits.
MINED string handling routines include copy_string(to, from), length_of(string), and build_string(buffer, format, arg1, arg2, ... ). build_string takes a description of the string from the format field and puts the result in the buffer. The functions status_line(string1, string2), error(string1, string2), clear_status, and bottom_line all print information on the status line.
The function get_string(message, buffer) reads a string and getchar reads one character from the terminal.
The function num_out((long) number) prints the argument number into an 11-digit field without leading zeros. It returns a pointer to the resulting string. The function file_status prints all file information on the status line. The function set_cursor(x, y) prints the string that will position the cursor at coordinates x and y.
MINED has four output functions: writeline(fd, string), clear_buffer, write_char(fd, c), and flush_buffer(fd). Normally MINED writes to file descriptor STD_OUT (the terminal). It provides three functions for this purpose: string_print(string), putchar(c) and flush. All these functions use the global I/O buffer screen and the global index out_count. Thus, I/O is buffered, so that reads or writes can be done in blocks of SCREEN_SIZE size.
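The buffering scheme can be sketched as below. The function and variable names follow the text, but the bodies are assumptions rather than the original code: the value of SCREEN_SIZE is made up, the functions return a status code for convenience, and the POSIX write call is assumed to be available.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define SCREEN_SIZE 1024                /* assumed value for this sketch */

static char screen[SCREEN_SIZE];        /* global I/O buffer */
static int out_count = 0;               /* number of bytes waiting in it */

/* Write out everything accumulated so far.  Returns 0 on success;
   on a write error the pending bytes are dropped and -1 is returned. */
static int flush_buffer(int fd)
{
    int done = 0;
    while (done < out_count) {
        ssize_t n = write(fd, screen + done, (size_t)(out_count - done));
        if (n <= 0) {
            out_count = 0;
            return -1;
        }
        done += (int)n;
    }
    out_count = 0;
    return 0;
}

/* Queue one character; only a full buffer triggers a system call. */
static int write_char(int fd, char c)
{
    assert(out_count >= 0 && out_count < SCREEN_SIZE);
    screen[out_count++] = c;
    if (out_count == SCREEN_SIZE)
        return flush_buffer(fd);
    return 0;
}

/* Queue a whole string, character by character. */
static int writeline(int fd, const char *text)
{
    while (*text != '\0')
        if (write_char(fd, *text++) != 0)
            return -1;
    return 0;
}
```

Calling writeline(fd, "hello") only fills the buffer, and nothing reaches the descriptor until flush_buffer(fd) runs or the buffer fills up. The terminal variants would presumably be thin wrappers that pass the descriptor for the terminal.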
The following functions handle internal line maintenance. The function proceed(start_line, count) returns the count-th line after start_line. If count is negative, the count-th line before start_line is returned. If the header or tail is encountered, then the header or tail is returned. The function display(x, y, start_line, count) displays count lines starting at coordinates (x, y) and beginning at start_line. If the header or tail is encountered, empty lines are displayed instead. The function reset(head_line, ny) resets top_line, last_y, bot_line, cur_line, and the y-coordinate. This is not a neat way to do the line maintenance, but it saves a lot of code. It is usually used in combination with display.
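As an illustration of the behavior described for proceed, here is one way it could be written. Treat it as a sketch based on the description above, not a copy of the source. The LINE type is repeated so that the fragment stands alone, and header and tail are assumed to be globals pointing at the sentinel node.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Line {
    struct Line *next;
    struct Line *prev;
    char *text;
    unsigned char shift_count;
} LINE;

/* Both are assumed to point at the one special sentinel structure. */
static LINE *header, *tail;

/* Step count lines forward (count > 0) or backward (count < 0) from
   line, stopping early if the header or tail is reached. */
static LINE *proceed(LINE *line, int count)
{
    assert(line != NULL);
    if (count < 0) {
        while (count++ < 0 && line != header)
            line = line->prev;
    } else {
        while (count-- > 0 && line != tail)
            line = line->next;
    }
    return line;
}
```

Because the walk stops at the sentinel, callers can ask for "a screenful of lines from here" without first checking how many lines remain in the file.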
The function put_line(line, offset, clear_line) prints a line (skipping characters according to the line->shift_count field) until XBREAK - offset characters are printed or a '\n' is encountered. If clear_line is TRUE, spaces are printed until XBREAK - offset characters have been output. line_print(line) is a #define for put_line(line, 0, TRUE). Moving is done with the functions move_to(x, y), move_address(address), and move(x, address, y).
Platform Notes
To compile MINED on UNIX, try make with the makefile provided in the package. Depending on your system, you may have to change the screen handling method by selecting SCREEN = -DSGTTY instead of SCREEN = -DTERMIO in the makefile. If neither works, you should try the curses variant. This is the last choice for two reasons: curses screen output is clumsier than direct terminal control, and many UNIX curses implementations still obstruct the use of 8-bit character sets. (The curses option was built in for a quick port to VMS, where it is automatically selected.)
On some systems, you may have to define the sysV variable (see makefile). Optimization can be turned on or off in makefile for generation/development.
On VMS, just compile the source files and link them with an extra library using the script linkmined.com.
On MS-DOS, you may have the files in their UNIX form (i.e. with linefeed-only newlines) so you will have to change the newlines of all text files (except mined.hlp which is already dedicated) to an MS-DOS compatible form. You can temporarily use the enclosed executable, mined.exe, to edit the MINED source and makefiles prior to compiling a customized version. (In some cases, you may choose not to compile MINED at all and can simply use the executable as is.) Configuration files for Turbo C are included. The COMPACT memory model should be used to allow the largest possible file size.