Decoded: unexpand (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of unexpand command (coreutils)

Summary

unexpand - convert spaces to tabs

[Source] [Code Walkthrough]

Lines of code: 327
Principal syscalls: write() (via putchar())
Support syscall: None
Options: 7 (2 short, 5 long)

Descended from unexpand in 3BSD (1979)
Added to Textutils in November 1992 [First version]
Number of revisions: 140

The unexpand utility has a sister utility, expand. Most of the functions common to both have been factored out into a shared file, expand-common.c, and the two utilities share many design concepts.

Helpers:
  • unexpand() - The worker function that converts runs of blanks to tabs and prints the result
External non-standard helpers:
  • add_tab_stop() - Adds a new tab from user input to the list
  • cleanup_file_list_stdin() - Closes STDIN file list
  • die() - Exit with mandatory non-zero error and message to stderr
  • finalize_tab_stops() - Sets the final tab stop values
  • get_next_tab_column() - Returns the position of the next tab stop
  • parse_tab_stops() - Adds more tab stops to the existing list
  • set_file_list() - Assigns the file list
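To illustrate what a helper such as get_next_tab_column() has to decide, here is a minimal sketch in C. It is not the coreutils code: the function name next_tab_column, its parameters, and the past_last flag are assumptions made for this example. It assumes either one fixed tab size or an ascending list of explicit stops.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for get_next_tab_column().
   If tab_size is nonzero, tab stops sit at every multiple of it.
   Otherwise tab_list holds n_tabs explicit stops in ascending order.
   Sets *past_last when no stop lies beyond the given column. */
static size_t
next_tab_column(size_t column, size_t tab_size,
                const size_t *tab_list, size_t n_tabs, bool *past_last)
{
    *past_last = false;

    /* Single tab size: round up to the next multiple. */
    if (tab_size != 0)
        return column + (tab_size - column % tab_size);

    /* Explicit list: first stop strictly beyond the current column. */
    for (size_t i = 0; i < n_tabs; i++)
        if (column < tab_list[i])
            return tab_list[i];

    /* No stop left; the caller must stop converting blanks. */
    *past_last = true;
    return column;
}
```

With a tab size of 8, column 7 maps to 8 and column 8 maps to 16. With the explicit stops 4 and 10, column 10 has no further stop and the flag is set.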

Setup

The unexpand utility has a deceptively simple setup. All of the globals are defined in expand-common.c. One global used directly in unexpand.c is:

  • convert_entire_line - Flag to indicate processing the entire line (default off, leading blanks only)

Several variables are declared in main():

  • c - holds the next option for processing
  • convert_first_only - Flag to only process leading blanks
  • have_tabval - Flag set if user provided tab values
  • tabval - The user provided tab value
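How the flags combine into the final convert_entire_line value can be sketched as below. This is an illustrative reduction, assuming that both the "all" option and an explicit tab list request whole-line conversion and that the first-only option overrides them. The function and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: derive the final convert_entire_line value.
   opt_all        - the "all blanks" option was given
   opt_tabs       - an explicit tab list was given
   opt_first_only - the "first only" option was given */
static bool
resolve_convert_entire_line(bool opt_all, bool opt_tabs, bool opt_first_only)
{
    bool convert_entire_line = opt_all || opt_tabs;

    /* Assumed precedence: first-only wins over everything else. */
    if (opt_first_only)
        convert_entire_line = false;

    return convert_entire_line;
}
```

Resolving the override once, after all options have been read, keeps the result independent of the order in which the options appear on the command line.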

Parsing

Parsing for unexpand answers these questions:

  • Do we convert only leading blanks, or every run of two or more blanks on the line?
  • Where are the tab stops, and how far apart are they (default: every 8 columns)?

A legacy syntax is also supported, in which the tab stops are given directly as numeric options (for example, -4).
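One way to handle such numeric options is to treat each digit as its own option character and accumulate a value until a comma or the end of the options. The sketch below shows that idea without getopt. The struct, the function names, and the fixed-size array are assumptions for illustration; the field names have_tabval and tabval simply mirror the variables listed in Setup. The overflow guard is likewise part of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_STOPS = 16 };   /* arbitrary limit for this sketch */

/* Hypothetical accumulator state for the legacy numeric syntax. */
struct legacy_state {
    bool have_tabval;            /* a number is currently being built */
    uintmax_t tabval;            /* the number built so far */
    uintmax_t stops[MAX_STOPS];  /* completed tab stops */
    size_t n_stops;
};

/* Store the number in progress, if any. Returns false if the list is full. */
static bool
legacy_flush(struct legacy_state *st)
{
    if (!st->have_tabval)
        return true;
    if (st->n_stops == MAX_STOPS)
        return false;
    st->stops[st->n_stops++] = st->tabval;
    st->have_tabval = false;
    return true;
}

/* Feed one option character: a digit extends the current number,
   a comma completes it. Returns false on bad input or overflow. */
static bool
legacy_feed(struct legacy_state *st, char c)
{
    if (c == ',')
        return legacy_flush(st);
    if (c < '0' || c > '9')
        return false;

    if (!st->have_tabval) {
        st->tabval = 0;
        st->have_tabval = true;
    }

    uintmax_t digit = (uintmax_t)(c - '0');
    if (st->tabval > (UINTMAX_MAX - digit) / 10)
        return false;            /* value would not fit */
    st->tabval = st->tabval * 10 + digit;
    return true;
}
```

Feeding the characters 4, comma, 1, 2 and then calling legacy_flush() once more leaves two stops in the list, 4 and 12. The final flush matters because the last number is not followed by a comma.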

Parsing failures

Parsing fails if the user provides an unknown option. Invalid or oversized tab stop values are also rejected with an error.


Execution

The unexpand procedures look like this:

  • Using the parsed information, set the final tab stop positions
  • Prepare the input file list
  • Process all files, all lines
    • Handle each double space and backspace (partial or full line as requested)
    • Output normal characters as usual
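The per-line processing can be sketched as a small state machine. This is a deliberately simplified illustration, not the coreutils implementation: it assumes a single nonzero tab size, works on one in-memory line instead of a file stream, and the function name and parameters are hypothetical. The real utility also handles explicit tab lists and multiple input files.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: convert blanks in one line to tabs.
   in          - NUL-terminated input line
   out         - output buffer, at least as large as the input
   tab_size    - distance between tab stops, must be nonzero
   entire_line - convert all blank runs, not just leading ones */
static void
unexpand_line(const char *in, char *out, size_t tab_size, bool entire_line)
{
    size_t column = 0;    /* column reached after the characters read so far */
    size_t pending = 0;   /* spaces read since the last stop, not yet written */
    size_t o = 0;         /* write position in out */
    bool convert = true;  /* false once conversion is off for this line */

    for (; *in != '\0'; in++) {
        char c = *in;

        if (convert && c == ' ') {
            column++;
            pending++;
            if (column % tab_size == 0) {
                /* The run ends on a tab stop: two or more spaces become
                   one tab, a lone space stays a space. */
                out[o++] = (pending > 1) ? '\t' : ' ';
                pending = 0;
            }
        } else if (convert && c == '\t') {
            /* An input tab swallows any spaces pending before it. */
            column += tab_size - column % tab_size;
            pending = 0;
            out[o++] = '\t';
        } else {
            /* Normal character: write out spaces that never reached a stop. */
            for (; pending > 0; pending--)
                out[o++] = ' ';
            if (c == '\b')
                column -= (column > 0);   /* backspace moves one column left */
            else
                column++;
            out[o++] = c;
            if (!entire_line)
                convert = false;          /* only leading blanks qualify */
        }
    }

    for (; pending > 0; pending--)
        out[o++] = ' ';
    out[o] = '\0';
}
```

With a tab size of 8, eight leading spaces followed by a letter become one tab and the letter. A letter followed by seven spaces and a second letter is left alone when entire_line is false, and the spaces collapse to a single tab when it is true.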

Failure cases:

  • Unable to output character
  • The input line is too long
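Both failure cases come down to small checks around the main loop. The sketch below shows one way to express them. The helper names are hypothetical, and the sketch reports failure through a return value, whereas the utility exits with an error message through die().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: advance the column counter by `by`.
   Returns false, leaving *column unchanged, if the counter would
   overflow, which corresponds to "the input line is too long". */
static bool
advance_column(uintmax_t *column, uintmax_t by)
{
    if (*column > UINTMAX_MAX - by)
        return false;
    *column += by;
    return true;
}

/* Hypothetical helper: write one character and report whether it worked,
   which corresponds to "unable to output character". */
static bool
put_checked(int c, FILE *stream)
{
    return fputc(c, stream) != EOF;
}
```

Checking for overflow before adding avoids relying on wraparound, at the cost of one extra comparison per character.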
