Decoded: pr (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of pr command (coreutils)

Summary

pr - convert text files for printing

[Source] [Code Walkthrough]

Lines of code: 2848
Principal syscalls: write() (via putchar())
Support syscall: None
Options: 62 (36 short, 26 long)

Descended from pr introduced in Version 1 UNIX (1971)
Added to Textutils in November 1992 [First version]
Number of revisions: 241

The pr utility includes excellent documentation at the top of the source file.


Setup

pr defines many globals and a struct used throughout the code. Some key ideas are:

Structs:

  • struct COLUMN - Manages data for a single column on the page. The struct resembles an object in OOP: it holds state variables plus function pointers that can be assigned custom behaviors (much like methods). All output runs through the COLUMN structs.
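The object-like pattern can be sketched as follows. This is a simplified stand-in, not the struct from pr.c: only print_func is a name taken from the description above, while the other fields and the two example "methods" are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for pr's COLUMN struct. Only print_func is named
   in the text above; the other fields are illustrative. */
struct column {
    int lines_to_print;                   /* lines left in this column */
    int lines_printed;                    /* bookkeeping for the demo */
    bool from_buffer;                     /* records which "method" ran */
    bool (*print_func)(struct column *);  /* behavior chosen at run time */
};

/* "Method" for a column that is read straight from its input file. */
bool print_direct(struct column *col)
{
    col->from_buffer = false;
    col->lines_to_print--;
    col->lines_printed++;
    return true;
}

/* "Method" for a column whose lines were buffered earlier. */
bool print_buffered(struct column *col)
{
    col->from_buffer = true;
    col->lines_to_print--;
    col->lines_printed++;
    return true;
}

/* Callers never need to know which behavior is installed. */
bool column_print(struct column *col)
{
    assert(col != NULL && col->print_func != NULL);
    if (col->lines_to_print <= 0)
        return false;                     /* nothing left to print */
    return col->print_func(col);
}
```

Swapping the pointer stored in print_func changes what a column does without touching the code that calls column_print().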

Globals:

  • align_empty_cols - Flag indicating empty columns in line
  • *buff - The buffer to store column data
  • buff_current - The index to buff
  • buff_allocated - The size of buff
  • *column_vector - All of the columns we need to output
  • *end_vector - Holds the horizontal position of the end of each line
  • explicit_columns - Flag set if the number of columns is given
  • FF_only - Flag to indicate a form feed detected in column
  • have_read_stdin - Flag indicating that we used STDIN
  • join_lines - Flag set for line merger (blends options -w and -s)
  • *line_vector - Start index of each line in buff
  • parallel_files - Flag for printing multiple files in parallel (default no)
  • print_a_header - Flag indicating the time to print a page header
  • storing_columns - Flag set if we're printing a single file in columns (must buffer)
  • truncate_lines - Flag to clip lines longer than page width
  • use_form_feed - Flag to use form feed in place of newlines (\f vs \n)
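To illustrate how buff, buff_current, and line_vector relate, here is a minimal sketch of a flat character buffer indexed by line start offsets. The fixed sizes, the struct wrapper, and the function names are assumptions for the example; pr itself keeps these as separately allocated globals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUFF_SIZE = 256, MAX_LINES = 16 };   /* arbitrary demo limits */

/* Hypothetical wrapper around the buffer-related globals listed above. */
struct line_store {
    char buff[BUFF_SIZE];                /* all stored characters, back to back */
    size_t buff_current;                 /* next free index in buff */
    size_t line_vector[MAX_LINES + 1];   /* start index of each stored line */
    size_t n_lines;
};

void store_init(struct line_store *s)
{
    s->buff_current = 0;
    s->n_lines = 0;
    s->line_vector[0] = 0;
}

/* Append one line; returns 1 on success, 0 if the store is full. */
int store_line(struct line_store *s, const char *text)
{
    size_t len = strlen(text);
    if (s->n_lines >= MAX_LINES || len > BUFF_SIZE - s->buff_current)
        return 0;
    s->line_vector[s->n_lines] = s->buff_current;
    memcpy(s->buff + s->buff_current, text, len);
    s->buff_current += len;
    s->n_lines++;
    /* The start of the next line doubles as the end of this one. */
    s->line_vector[s->n_lines] = s->buff_current;
    return 1;
}

/* Point *start at line i (not NUL-terminated) and return its length. */
size_t stored_line(const struct line_store *s, size_t i, const char **start)
{
    assert(i < s->n_lines);
    *start = s->buff + s->line_vector[i];
    return s->line_vector[i + 1] - s->line_vector[i];
}
```

Because every line is addressed by index, the lines of a single file can be read once and then printed in column order later.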

Other important globals hold the computed page geometry. Their names are self-explanatory: lines_per_page, lines_per_header, lines_per_body, lines_per_footer, chars_per_line, chars_per_column, and chars_per_output_tab.

main() begins by initializing a few more variables:

  • n_files - The number of files we're processing (index for file_names)
  • old_options - Flag indicating that we're using old options (-w or -s)
  • old_w - Flag set if we're using the old page width option
  • old_s - Flag set if we're using the old separator option
  • file_names - Array holding the input file names
  • column_count_string - Holds the column count as a string
  • n_digits - Length of the column count
  • n_alloc - Allocated length of the column count

Parsing

Parsing the command-line input answers these questions:

  • What is the page format? (length, columns, spacing, header, etc)
  • What is the input source? (files, stdin)

After parsing options, some legacy choices, including -s and -w, are translated to their newer equivalents.

Finally, we copy the file names passed on the command line into a file_names[] array. If there are no file names, we assume that the remainder of standard input is to be processed as a file (usually redirected in).
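The file-name step can be sketched as below. The function name, the first_operand parameter, and the use of "-" as a placeholder entry for standard input are assumptions made for this example, not the code in pr.c.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the operands argv[first_operand..argc-1] into file_names.
   If none were given, records "-" to stand for standard input
   (a placeholder convention chosen for this sketch).
   Returns the number of entries stored. */
size_t collect_file_names(int argc, char **argv, int first_operand,
                          const char **file_names, size_t capacity)
{
    size_t n_files = 0;

    assert(file_names != NULL && capacity > 0);
    for (int i = first_operand; i < argc && n_files < capacity; i++)
        file_names[n_files++] = argv[i];

    if (n_files == 0)
        file_names[n_files++] = "-";   /* fall back to stdin */

    return n_files;
}
```

Here first_operand plays the role of the index left behind once option parsing has finished.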

Parsing failures

These failure cases are explicitly checked:

  • Nonsensical page ranges, line numbers, or offsets
  • Missing option operands
  • Unusual page widths
  • Combining column count and parallel printing
  • Combining printing across and parallel printing
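A few of these checks can be sketched as a single validation pass over the parsed options. The struct, its field layout, the convention that a last page of 0 means "no limit", and the message wording are all illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the parsed options relevant to the checks. */
struct parsed_options {
    bool explicit_columns;   /* a column count was given */
    bool parallel_files;     /* print files side by side */
    bool print_across;       /* fill columns across rather than down */
    int first_page;
    int last_page;           /* 0 means "no upper limit" in this sketch */
};

/* Returns NULL if the options are consistent, otherwise a short
   description of the first problem found. */
const char *validate_options(const struct parsed_options *opt)
{
    assert(opt != NULL);

    if (opt->first_page < 1)
        return "invalid page range";
    if (opt->last_page != 0 && opt->last_page < opt->first_page)
        return "invalid page range";
    if (opt->parallel_files && opt->explicit_columns)
        return "column count cannot be combined with parallel printing";
    if (opt->parallel_files && opt->print_across)
        return "printing across cannot be combined with parallel printing";

    return NULL;
}
```

Running every check before any input is opened means a bad combination is reported without producing partial output.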

Execution

Now we're ready for input processing through several layers: File, Page, Line, and Column. Each layer performs tasks and checks before calling to the next lower layer. Actual output printing happens at the Column layer using the COLUMN struct's print function.

Files

First we process all the input files within the function print_files(). Since this is the highest level, we start by preparing the execution environment by computing global parameters from the parsed options. These include line sizes, separators, tabs, join behavior, and truncation.

The remainder of the file processing includes:

  • Initializing column buffers
  • Skipping pages if requested by the user
  • Determining the final output functions for COLUMNs
  • Passing control to the page layer (print_page())

Pages

As with files, print_page() begins with page initialization, which sets the source for each column to be printed (either a stored buffer or the file itself).

Other paging flags are the header and the vertical padding/spacing.

Then we begin the loop through each line on the page.

Lines

The line loop is contained within the print_page() function.

At the beginning of a line we must reset counters including output position, spaces skipped, separators skipped, padding status, and a few other flags. Then we perform the actual output by looping through each column (see next section). Afterwards, we complete the line by vertically padding and double spacing (if necessary).

Columns

The heart of this procedure is the call to COLUMN->print_func(). Naturally, this only happens if the column still has data left to print. Subsequently, we may need to print column separators or add alignment padding if the column was empty.
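The page, line, and column layers can be sketched as nested loops around the print_func call. This is a simplified model under stated assumptions: the in-memory output sink, the "|" separator, the fixed column width, and all names other than print_func are hypothetical, and unlike pr the sketch pads every column, so it can leave trailing blanks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Output sink for the sketch; pr itself writes with putchar(). */
struct out {
    char text[256];
    size_t len;
};

/* Simplified column: print_func mirrors the COLUMN field named above,
   the other fields are illustrative. */
struct column {
    const char **lines;     /* next line to print */
    int lines_to_print;     /* lines left in this column */
    bool (*print_func)(struct column *, struct out *, int width);
};

/* Append s to the sink, padded or clipped to exactly width characters. */
void emit(struct out *o, const char *s, int width)
{
    size_t room = sizeof o->text - o->len;
    int n = snprintf(o->text + o->len, room, "%-*.*s", width, width, s);
    if (n > 0)
        o->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Column layer: print one line of a column and advance it. */
bool print_cell(struct column *col, struct out *o, int width)
{
    emit(o, *col->lines++, width);
    col->lines_to_print--;
    return true;
}

/* Count the columns that still have data. */
int cols_ready(const struct column *cols, int n_cols)
{
    int ready = 0;
    for (int j = 0; j < n_cols; j++)
        if (cols[j].lines_to_print > 0)
            ready++;
    return ready;
}

/* Page layer loops over lines; the inner loop is the column layer.
   Returns the number of lines printed. */
int print_page(struct column *cols, int n_cols, int width,
               int lines_per_body, struct out *o)
{
    int printed = 0;

    assert(cols != NULL && o != NULL);
    while (printed < lines_per_body && cols_ready(cols, n_cols) > 0) {
        for (int j = 0; j < n_cols; j++) {
            if (j > 0)
                emit(o, "|", 1);                  /* column separator */
            if (cols[j].lines_to_print > 0)
                cols[j].print_func(&cols[j], o, width);
            else
                emit(o, "", width);               /* pad an empty column */
        }
        emit(o, "\n", 1);
        printed++;
    }
    return printed;
}
```

The line loop stops either when the body is full or when no column has data left, which corresponds to the two ways a page can end.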

File processing can fail in the following ways:

  • Too many pages (overflow)
  • Unable to close a file

Helpers

  • add_line_number() - Prints the current line number
  • align_column() - Pads column if necessary and prints the separator
  • balance() - Computes balanced lines per page (as in SysV)
  • char_to_clump() - Converts a character into a clump in the clump buffer and returns the clump's true size
  • cleanup() - Frees global buffers
  • close_file() - Closes an input file and updates COLUMN data
  • cols_ready_to_print() - Returns number of columns with input ready
  • first_last_page() - Sets the first and last page number
  • getoptarg() - Parse option groups
  • getoptnum() - Parse number arguments
  • hold_file() - Suppress file updates for the rest of the page
  • init_fps() - Initializes input files for processing
  • init_funcs() - Sets up the printing/reading functions for COLUMNs
  • init_header() - Constructs the page header
  • init_page() - Gathers column status for the page and sets globals
  • init_parameters() - Sets up the page geometry data from input
  • init_store_cols() - Allocate column data
  • integer_overflow() - Fail procedure for integer overflow
  • open_file() - Accesses an input file and sets COLUMN data as needed
  • pad_across_to() - Pad line to a position
  • pad_down() - Pad the rest of the page (\f or \n as needed)
  • print_char() - Prints a character, escaping input as needed
  • print_clump() - Prints a character group
  • parse_column_count() - Parses a column count string
  • print_files() - Process all input files
  • print_header() - Outputs the page header
  • print_page() - Output a page
  • print_sep_string() - Counts separator usage and prints appropriate character
  • print_stored() - Prints a line from the buffer
  • print_white_space() - Prints blanks as needed for space or tab
  • read_line() - Reads an entire line, clumping characters until \n, \f, or EOF
  • read_rest_of_line() - Read the remainder of a line after a break
  • reset_status() - Resume files on hold
  • separator_string() - Updates separator string length
  • skip_read() - Reads and counts lines from a column, discarding the characters
  • skip_to_page() - Skips lines until a specific page
  • store_char() - Handles a character by storing in a buffer
  • store_columns() - Buffers leading columns in a line
