Decoded: od (coreutils) – MaiZure's Projects

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of od command (coreutils)

Summary

od - write files in octal or other formats

[Source] [Code Walkthrough]

Lines of code: 1983
Principal syscall: write()
Support syscalls: open(), close()
Options: 38 (27 short, 11 long, does not include legacy digits for field skip)

Descended from od introduced in Version 1 UNIX (1971)
Added to Textutils in November 1992 [First version]
Number of revisions: 254

Helpers:

check_and_close() - Tests a stream for errors and closes it
decode_format_string() - Decodes the modern format string
decode_one_format() - Creates the spec format by reading the -t option argument
dump() - Top level od procedure: read, format, write
dump_hexl_mode_trailer() - Procedure for single character view ('z' trailer)
dump_strings() - The top level procedure for od -S
format_address_label() - Formats pseudo-address label in legacy mode
format_address_none() - Applies no address format
format_address_paren() - Applies parentheses to a pseudo-address (label)
format_address_std() - Applies the standard address formats for all bases
get_lcm() - Gets the least common multiple of spec sizes
open_next_file() - Opens the subsequent file from the processing list
parse_old_offset() - Parses legacy offset input
print_(TYPE) - A family of ten functions to print a block of standard C types
print_ascii() - Outputs a block of escaped ASCII text
print_named_ascii() - Prints in named-character mode (7-bit)
read_block() - Reads a block of bytes into a given buffer
read_char() - Reads a single byte into a given position
simple_strtoul() - Converts a string to an unsigned long value
skip() - Skips headers by repositioning read pointer
write_block() - Writes the now-formatted block to STDOUT
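
The format_address_*() family can be pictured as interchangeable functions, one of which is selected from the -A option and then called for every output line. The following is a minimal Python sketch of that idea, not the od.c source: the function names mirror the helpers above, but select_address_formatter(), make_format_address_std() and the (base, width) table are assumptions made for illustration.

```python
DIGITS = "0123456789abcdef"


def format_address_none(address, suffix=""):
    # -A n: the address column is suppressed entirely
    return ""


def make_format_address_std(base, width):
    # Build a formatter that prints zero-padded addresses in the given base.
    def format_address_std(address, suffix=""):
        out = ""
        n = address
        while True:
            out = DIGITS[n % base] + out
            n //= base
            if n == 0:
                break
        return out.rjust(width, "0") + suffix
    return format_address_std


def format_address_paren(inner, address, suffix=""):
    # Wrap a pseudo-address in parentheses, reusing the standard formatter.
    return "(" + inner(address) + ")" + suffix


def select_address_formatter(radix):
    # Assumed (base, pad width) pairs for illustration only.
    table = {"o": (8, 7), "d": (10, 7), "x": (16, 6)}
    if radix == "n":
        return format_address_none
    if radix not in table:
        raise ValueError("invalid output address radix %r" % radix)
    base, width = table[radix]
    return make_format_address_std(base, width)
```

Selecting the function once keeps the per-line work down to a single indirect call, regardless of which address style the user picked.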

External non-standard helpers:

die() - Exit with mandatory non-zero error and message to stderr
error() - Outputs error message to standard error with possible process termination

Setup

The od utility defines an important structure, struct tspec, built from user input and applied to data before output. Each format contains one of the printing functions as well as size and padding information.
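
As a rough picture of what such a format specification carries, here is a Python sketch. The class name FormatSpec, its field names and print_octal_bytes() are hypothetical stand-ins; the real struct tspec is a C structure and holds more than is shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FormatSpec:
    size: int          # bytes consumed per printed field
    field_width: int   # characters occupied by each printed value
    pad_width: int     # extra padding used to align with wider specs
    print_function: Callable[[bytes, "FormatSpec"], str]


def print_octal_bytes(block, spec):
    # One space-separated, zero-padded octal field per input byte.
    fields = []
    for b in block:
        fields.append(" " * (spec.pad_width + 1)
                      + format(b, "o").rjust(spec.field_width, "0"))
    return "".join(fields)


# A spec roughly corresponding to single-byte octal output
octal_bytes = FormatSpec(size=1, field_width=3, pad_width=0,
                         print_function=print_octal_bytes)
```

Calling octal_bytes.print_function(b"ab", octal_bytes) yields " 141 142", which shows how the print function and the width information travel together in one record.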

od uses many global variables and flags to support the conversion operations, including:

abbreviate_duplicated_blocks - Flag to convert duplicate blocks to asterisks
address_base - The number base in which addresses are displayed (-A)
bytes_per_block - The number of input bytes formatted per output line
bytes_to_hex_digits[] - Array to map hex widths, indexed by byte width
bytes_to_oct_digits[] - Array to map octal widths, indexed by byte width
bytes_to_signed_dec_digits[] - Array to map signed decimal widths, indexed by byte width
bytes_to_unsigned_dec_digits[] - Array to map unsigned decimal widths, indexed by byte width
charname[][] - The names of non-printable characters indexed by value
*default_file_list[] - The file list used if none are provided (STDIN)
end_offset - The first byte after the last byte formatted
**file_list - The list of input files on the command line
flag_dump_strings - Flag to dump strings with -S
flag_pseudo_start - Flag if a legacy pseudo-address was used
have_read_stdin - Flag set if input was read from STDIN
*in_stream - The input file stream after opening
*input_filename - A reference to the command line file names
input_swap - Flag to swap input bytes when the requested endianness differs from the native one
limit_bytes_to_format - Flag set if only certain bytes are formatted (-N)
max_bytes_to_format - Limit the input reading to this many bytes (-N)
n_bytes_to_skip - Number of input bytes to skip (-j)
n_specs - The number of specifications defined
n_specs_allocated - The number of specifications allocated (possibly some unused)
string_min - The minimum length of strings requested (-S)
traditional - Flag to support legacy arguments
*spec - Global array of format specifications
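
The bytes_to_*_digits[] arrays answer one question: how many characters are needed to print the largest value of a given byte width in a given base? od keeps these as lookup tables; the Python sketch below instead computes the same kind of widths, as an illustration only. The helper names digits_needed() and signed_dec_digits() are assumptions, and only the common widths 1, 2, 4 and 8 are shown.

```python
def digits_needed(n_bytes, base):
    # Digits required for the largest unsigned value that fits in n_bytes.
    largest = (1 << (8 * n_bytes)) - 1
    count = 0
    while largest > 0:
        largest //= base
        count += 1
    return count


def signed_dec_digits(n_bytes):
    # Characters required for the most negative value, sign included.
    most_negative = -(1 << (8 * n_bytes - 1))
    return len(str(most_negative))


WIDTHS = (1, 2, 4, 8)
bytes_to_oct_digits = {n: digits_needed(n, 8) for n in WIDTHS}
bytes_to_hex_digits = {n: digits_needed(n, 16) for n in WIDTHS}
bytes_to_unsigned_dec_digits = {n: digits_needed(n, 10) for n in WIDTHS}
bytes_to_signed_dec_digits = {n: signed_dec_digits(n) for n in WIDTHS}
```

For example, a 2-byte value needs 6 octal digits (177777) but only 4 hex digits (ffff), which is why each base needs its own table.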

main() introduces a few local variables:

desired_width - The width requested by the user (-w)
i - Generic iterator used in several places
l_c_m - Holds the least common multiple of format specs
modern - Flag set if we're using modern od syntax
multipliers[] - Array of magnitude suffixes
n_files - The number of input files specified on the command line
ok - The final return status
width_per_block - Holds the minimum block width across all format specs
width_specified - Flag set if the user requested a specific width (-w)
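
The interplay of l_c_m, desired_width and bytes_per_block can be sketched as follows. This is a Python approximation under stated assumptions: the default of 16 bytes per line and the fallback to the least common multiple on an unusable -w value reflect my reading of the behaviour, and choose_bytes_per_block() is a hypothetical name, not a function in od.c.

```python
from functools import reduce
from math import gcd

DEFAULT_BYTES_PER_BLOCK = 16  # assumed default line size


def get_lcm(spec_sizes):
    # Least common multiple of every spec's element size in bytes.
    return reduce(lambda a, b: a * b // gcd(a, b), spec_sizes, 1)


def choose_bytes_per_block(spec_sizes, desired_width=None):
    l_c_m = get_lcm(spec_sizes)
    if desired_width is not None:
        # A requested width is usable only if every spec divides it evenly.
        if desired_width != 0 and desired_width % l_c_m == 0:
            return desired_width
        return l_c_m  # fall back to the smallest workable width
    if l_c_m < DEFAULT_BYTES_PER_BLOCK:
        # Largest multiple of the lcm that does not exceed the default.
        return l_c_m * (DEFAULT_BYTES_PER_BLOCK // l_c_m)
    return l_c_m
```

Using the least common multiple guarantees that every format spec sees a whole number of elements in each block, so no value is ever split across two output lines.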

Parsing

Parsing answers the following questions to define the execution parameters:

How should the addresses be displayed?
Should we skip some bytes (header)?
What formatting should be applied to the data?

Construction of the tspec format structures occurs during parsing, as each format option is encountered.
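
To make the decoding step concrete, here is a simplified Python sketch of how a -t argument such as x2o1c could be split into individual specs. It borrows the names decode_format_string() and decode_one_format() from the helper list, but the bodies are illustrative assumptions: the size letters assume a typical 64-bit platform, long double is left out, and the real code also validates that each size is supported.

```python
import re

# Assumed sizes for a typical LP64 platform (illustration only)
INT_SIZE_LETTERS = {"C": 1, "S": 2, "I": 4, "L": 8}
FLOAT_SIZE_LETTERS = {"F": 4, "D": 8}
DECIMAL = re.compile(r"\d+")


def decode_one_format(s, pos):
    # Decode one spec starting at s[pos]; return (spec, next position).
    kind = s[pos]
    pos += 1
    if kind in ("a", "c"):
        size = 1  # character formats always consume one byte
    elif kind in ("d", "o", "u", "x", "f"):
        letters = FLOAT_SIZE_LETTERS if kind == "f" else INT_SIZE_LETTERS
        default = 8 if kind == "f" else 4
        m = DECIMAL.match(s, pos)
        if m:
            size = int(m.group())
            pos = m.end()
        elif pos < len(s) and s[pos] in letters:
            size = letters[s[pos]]
            pos += 1
        else:
            size = default
    else:
        raise ValueError("invalid character %r in type string %r" % (kind, s))
    trailer = pos < len(s) and s[pos] == "z"  # 'z' requests the text trailer
    if trailer:
        pos += 1
    return (kind, size, trailer), pos


def decode_format_string(s):
    # Repeatedly decode specs until the whole string is consumed.
    specs = []
    pos = 0
    while pos < len(s):
        spec, pos = decode_one_format(s, pos)
        specs.append(spec)
    return specs
```

Each decoded tuple corresponds to one entry appended to the global spec array, which is why a single -t argument can produce several output lines per block.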

Parsing failures

These failure cases are explicitly checked:

Trying to process many files in legacy mode
Choosing an unknown address format
Suggesting string lengths larger than SIZE_MAX
Applying format types to string dumps
Unknown option used

Parsing failures caused by user input result in a short error message followed by the usage instructions. Access-related parsing errors die with an error message.

Execution

od operates as you would expect: Read input, apply format, write to output. A closer look at the procedure shows:

Compute address format
Gather input file list
Open first input file and skip header
Compute block length and padding for block alignment
Execution branches to either dump() or dump_strings() based on the user option -S
Regular dump:
- Read the next block -- EOF opens the next file and retries read
- Test the block against the previous one and handle abbreviation
- Print the address
- Call the print function of each tspec format on the block
- Repeat sequence until the last block of the last file is read
String dump:
- Read the next character -- EOF opens the next file and retries read
- Process possible escape strings
- Write the processed character
- Repeat sequence until all characters of all files are processed
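
The regular dump loop above can be sketched in Python as follows. This is a simplified model, not the od.c implementation: it hard-codes single-byte octal output with octal addresses, and the names read_blocks() and dump() along with their parameters are chosen for illustration.

```python
import io


def read_blocks(streams, bytes_per_block):
    # Yield blocks that may span stream boundaries; the last may be short.
    buffer = b""
    for stream in streams:  # EOF on one stream moves on to the next
        while True:
            chunk = stream.read(bytes_per_block - len(buffer))
            if not chunk:
                break
            buffer += chunk
            if len(buffer) == bytes_per_block:
                yield buffer
                buffer = b""
    if buffer:
        yield buffer


def dump(streams, bytes_per_block=16, abbreviate=True):
    lines = []
    offset = 0
    previous = None
    starred = False
    for block in read_blocks(streams, bytes_per_block):
        full = len(block) == bytes_per_block
        if abbreviate and full and block == previous:
            # Duplicate of the previous block: emit one "*" per run
            if not starred:
                lines.append("*")
                starred = True
        else:
            starred = False
            fields = "".join(" %03o" % b for b in block)
            lines.append("%07o%s" % (offset, fields))
        previous = block
        offset += len(block)
    lines.append("%07o" % offset)  # final line shows the end offset
    return lines


example = dump([io.BytesIO(b"ab"), io.BytesIO(b"cd")], bytes_per_block=4)
```

Here example holds the lines "0000000 141 142 143 144" and "0000004": the block is filled across the file boundary, so the two inputs are indistinguishable in the output.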

Failure cases:

Reading or writing failures
Skip increment too large
Invalid string character encountered
Unable to close input files or standard input

All failures at this stage output an error message to STDERR and return without displaying usage help.
