Boston University’s RAX Library

(circa 1973-8)

Boston University (BU) developed its own timesharing system in the 1970s for its IBM 360 and 370 mainframes. The system was based on the batch-oriented Remote Access Computing System (RACS) developed by IBM. McGill University also participated in RAX development, but their version was renamed “McGill University System for Interactive Computing” (MUSIC). Although many of the details are lost in the mists of time, both systems used some text processing tools developed at BU.

At the time, IBM had developed a few timesharing systems, but they were generally expensive and slow. IBM’s standard operating systems for the 360 series had a file system; files were referred to as data sets. To put matters as charitably as possible, IBM’s data set support was not suited to the dynamic nature of file access in timesharing environments. Frankly, it was a beast. So RAX really needed its own file system.

In accordance with the traditions of IBM data processing, a RAX file looked more-or-less like a deck of punched cards. Files consisted of “records” that carried individual lines of text. Unlike punched cards, the records omitted trailing blanks, so individual records (lines) could vary in length. More significantly, files were either read sequentially in a single pass, or written sequentially in a single pass. There wasn’t any notion of random access or of modifying the middle of a file without rewriting the whole thing. While RAX did support random access to hard drive files, the function was limited to specially allocated files (standard IBM data sets, actually) and used special operations that were only available to assembly language programmers.
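The relationship between card images and RAX records can be illustrated with a minimal sketch. The 80-column card width is standard for punched cards; the function name and the use of Python strings for records are purely illustrative and say nothing about how RAX stored records internally.

```python
CARD_WIDTH = 80  # columns on a standard punched card


def cards_to_records(cards):
    """Turn fixed-width card images into variable-length records
    by dropping trailing blanks, as the text describes."""
    return [card.rstrip(" ") for card in cards]


# Three card images, each padded with blanks to the full card width.
cards = [
    "HELLO".ljust(CARD_WIDTH),
    " " * CARD_WIDTH,              # a blank card becomes an empty record
    "  X = 1".ljust(CARD_WIDTH),   # leading blanks are kept
]
records = cards_to_records(cards)

# Files were processed in a single sequential pass, first record to last.
for record in records:
    print(repr(record))
```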

Each file had a unique name and was ‘owned’ by the user that created it. Users could modify the permissions on files to share them with other users.

The RAX system’s timesharing hours were generally limited to daytime and evenings. Overnight, the CPU was rebooted with IBM’s OS/360 or OS/VS1 to run batch jobs. Thus, the RAX hard drives had to be compatible with IBM’s native file system, such as it was. The RAX library was implemented inside a collection of IBM data sets, each data set serving as a pool of disk blocks to use in library files. These disk blocks were called space sets and contained 512 bytes each.

A complete RAX library file name contained two parts: an 8-character index name and an 8-character file name. While this gave the illusion of there being a hierarchical file system, there was no true ‘root’ directory. All files not used by the RAX system programming staff resided in the “userlib” index; if no index name was given, RAX searched in userlib. The directory arrangement apparently worked as follows:

There were a small number of IBM data sets that served as library directories (indexes). A file’s index name selected the appropriate data set to search for that file’s directory entry. These index files were apparently set up using IBM’s Indexed Sequential Access Method (ISAM). Such files were specially formatted to use a feature of the IBM disk hardware. Each data block in the file contained a key field along with space for a library file’s directory entry. The “key” part contained the file name. The IBM disk hardware could be told to scan the data set until it found the record whose key contained that name, and then it would retrieve the corresponding data. This put the burden of directory searching on the hard drive, and freed up the CPU to work on other tasks.
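The lookup described above can be sketched roughly as follows. This is a toy model, not the RAX implementation: each index data set is represented by a Python dict keyed on the padded file name, standing in for the hardware key search, and the names `DirEntry`, `normalize`, and `lookup` are hypothetical. Only the 8-character name lengths and the "userlib" default come from the text.

```python
from dataclasses import dataclass

DEFAULT_INDEX = "userlib"  # from the text: searched when no index name is given
NAME_LEN = 8               # from the text: 8-character index and file names


@dataclass
class DirEntry:
    owner: str
    size: int
    first_space_set: int   # address of the first space set in the file


def normalize(name):
    """Pad a name with blanks to the fixed 8-character key width."""
    if not 0 < len(name) <= NAME_LEN:
        raise ValueError(f"name must be 1 to {NAME_LEN} characters: {name!r}")
    return name.ljust(NAME_LEN)


def lookup(indexes, file_name, index_name=None):
    """Pick the index data set by index name, then search it by key.

    The dict lookup stands in for the disk's key search; it returns
    None when no record carries the requested key.
    """
    index = indexes[normalize(index_name or DEFAULT_INDEX)]
    return index.get(normalize(file_name))


# A one-index library holding a single (made-up) file.
indexes = {
    normalize("userlib"): {normalize("notes"): DirEntry("smith", 1016, 7)},
}
entry = lookup(indexes, "notes")  # no index name, so userlib is searched
```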

The directory entry contained the usual timestamps (date created, accessed, modified, etc.), ownership information, access permissions, size, and a pointer to the first space set in the file.

Once the system knew the location of the file’s first space set, it could retrieve the file’s contents sequentially. A space set address was a 32-bit number formatted in 2 fields:

[Figure: RAX space set ID]

Remember that the library consisted of numerous data sets that served as pools of data blocks. These pools were called lib files, and were numbered sequentially. The data blocks, or space sets, were numbered sequentially inside each lib file.
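A two-field, 32-bit address of this kind might be packed and unpacked as in the sketch below. The text does not give the field widths, so the 8-bit lib file number and 24-bit space set number used here are assumptions chosen only to make the example concrete, as is the choice to put the lib file number in the high-order bits.

```python
LIB_BITS = 8    # ASSUMPTION: width of the lib file number field
SET_BITS = 24   # ASSUMPTION: width of the space set number field
assert LIB_BITS + SET_BITS == 32


def pack_address(lib_file, space_set):
    """Combine a lib file number and a space set number into one 32-bit value."""
    if not 0 <= lib_file < (1 << LIB_BITS):
        raise ValueError("lib file number out of range")
    if not 0 <= space_set < (1 << SET_BITS):
        raise ValueError("space set number out of range")
    return (lib_file << SET_BITS) | space_set


def unpack_address(address):
    """Split a 32-bit address back into (lib file number, space set number)."""
    return address >> SET_BITS, address & ((1 << SET_BITS) - 1)


address = pack_address(3, 42)
```

Whatever the real split was, the two fields together still name one space set out of at most 2^32.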

Files within the RAX library were implemented as a list of linked space sets. The first four bytes of each space set carried the pointer to the next one in the file. The pointer bytes were managed automatically by the system’s read and write operations; they were invisible to user programs. The net result was that user programs perceived space sets as containing only 508 bytes, since 4 bytes were used for the link pointer.
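The linked structure can be modelled in a few lines. In this sketch the disk is a dict from space set address to a 512-byte block, and a zero link marks the end of the chain; both are assumptions for illustration, as are the function names. The 4-byte link and 508-byte payload come from the text, and the link is stored big-endian to match IBM mainframe byte order.

```python
import struct

SPACE_SET_SIZE = 512
LINK_SIZE = 4
PAYLOAD_SIZE = SPACE_SET_SIZE - LINK_SIZE  # 508 bytes visible to programs
END_OF_CHAIN = 0  # ASSUMPTION: a zero link marks the last space set


def write_file(disk, addresses, data):
    """Write data into the given space sets, linking each to the next.

    Returns the address of the first space set, which is what the
    directory entry would record.
    """
    chunks = [data[i:i + PAYLOAD_SIZE]
              for i in range(0, len(data), PAYLOAD_SIZE)] or [b""]
    if len(chunks) > len(addresses):
        raise ValueError("not enough space sets for this file")
    used = addresses[:len(chunks)]
    for i, chunk in enumerate(chunks):
        next_addr = used[i + 1] if i + 1 < len(used) else END_OF_CHAIN
        disk[used[i]] = (struct.pack(">I", next_addr)
                         + chunk.ljust(PAYLOAD_SIZE, b"\x00"))
    return used[0]


def read_file(disk, first, size):
    """Follow the chain from the first space set, hiding the link bytes."""
    data = bytearray()
    addr = first
    while addr != END_OF_CHAIN:
        block = disk[addr]
        (addr,) = struct.unpack(">I", block[:LINK_SIZE])
        data += block[LINK_SIZE:]
    return bytes(data[:size])  # size would come from the directory entry


disk = {}
contents = bytes(range(256)) * 5                   # 1280 bytes, needs 3 space sets
first = write_file(disk, [9, 4, 17, 2], contents)  # scattered addresses
```

Note that reading the last space set requires reading every one before it, which is why this layout only suits sequential access.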

A single library file could contain space sets from many different lib files. Since each lib file tended to represent a contiguous set of disk space, file retrieval was most efficient when all space sets came from the same lib file. In practice, however, a file would incorporate space sets from whichever lib file had the most free space sets available.

Free space was managed within individual lib files. Each lib file kept a linked list of free space sets. Space sets from deleted files were added back to the free list in the appropriate lib file.
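A rough model of this per-lib-file bookkeeping follows. The real free lists were linked lists of space sets on disk; here each one is simplified to a Python list used as a stack, and the class and function names are hypothetical. The allocation policy of drawing from the lib file with the most free space sets follows the description above.

```python
class LibFile:
    """One pool of space sets together with its own free list."""

    def __init__(self, number, count):
        self.number = number
        self.free = list(range(count))  # space set numbers currently free


def allocate(lib_files):
    """Take a space set from whichever lib file has the most free ones."""
    best = max(lib_files, key=lambda lf: len(lf.free))
    if not best.free:
        raise RuntimeError("library is full")
    return best.number, best.free.pop()


def release(lib_files, lib_number, space_set):
    """Return a space set to the free list of the lib file that owns it."""
    lib_files[lib_number].free.append(space_set)


# Two lib files, indexed by their sequential numbers.
libs = [LibFile(0, 2), LibFile(1, 5)]
lib_number, space_set = allocate(libs)  # comes from lib file 1, the fuller pool
```

Because any free space set can join any file, there is no need to find a contiguous run of blocks, at the cost of files being scattered across lib files.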

Here is a review of the issues listed above:

  • File data structure – variable length records that more or less corresponded to lines of text.
  • File block structure – uses a linked list to organize randomly located disk blocks into a sequential file.
  • Directories – effectively a single level directory structure with user permissions and timestamps.
  • Free space – managed in arbitrary, locally maintained lists. Any block can be in any file, eliminating fragmentation problems.
  • Easy to implement – built atop a rich IBM-oriented I/O mechanism – simple to implement in that environment, but hard to replicate in non-IBM environments.
  • Speed – Directory lookup is very fast. File data access is rarely optimized, though.
  • Sequential vs. direct – system really only supports sequential access.
  • Storage sizes – can combine data sets on multiple drives to store perhaps 2TB of data, assuming 512 byte space sets. Individual files are probably limited to 4GB by the size field in the directory entry.
  • Robustness – Links are brittle. System crashes could cause link inconsistencies, including the risk of a file’s link pointing to a space set on the free list. However, when a file was replaced, the old file and its chain of space sets were not eliminated until the new file’s chain had been completely built. System crashes during such updates would usually leave the previous file intact.
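The storage size estimates in the list above can be checked with a little arithmetic. The 32-bit space set address and 512-byte space sets come from the text; the 32-bit size field is an assumption inferred from the 4GB figure, which the text itself only calls probable.

```python
ADDRESS_BITS = 32      # from the text: space set addresses are 32 bits
SPACE_SET_SIZE = 512   # from the text: bytes per space set
LINK_SIZE = 4          # from the text: bytes used for the link pointer
SIZE_FIELD_BITS = 32   # ASSUMPTION: width of the directory entry's size field

space_sets = 2 ** ADDRESS_BITS
raw_capacity = space_sets * SPACE_SET_SIZE               # bytes on disk
user_capacity = space_sets * (SPACE_SET_SIZE - LINK_SIZE)  # bytes seen by programs
max_file_size = 2 ** SIZE_FIELD_BITS

print(raw_capacity // 2 ** 40, "TiB raw")        # 2 TiB raw
print(max_file_size // 2 ** 30, "GiB per file")  # 4 GiB per file
```

The usable capacity is slightly lower than the raw figure because every space set gives up 4 of its 512 bytes to the link pointer.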