Decoded: tr (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of tr command (coreutils)

Summary

tr - translate, squeeze, and/or delete characters

[Source] [Code Walkthrough]

Lines of code: 1915
Principal syscall: write()
Support syscalls: close(), fadvise()
Options: 12 (6 short, 6 long)

Descended from tr introduced in Version 5 UNIX (1974)
Added to Shellutils in November 1992 [First version]
Number of revisions: 204

Helpers:
  • append_char_class() - Adds a character class (alpha, digit) to the Spec_list
  • append_equiv_class() - Adds an equivalence class ([=c=]) to the Spec_list
  • append_normal_char() - Adds a regular character to the Spec_list
  • append_range() - Adds a range (a-z) to the Spec_list
  • append_repeated_char() - Adds a repeated character ([c*] or [c*n]) to the Spec_list
  • build_spec_list() - Creates a Spec_list in two passes (unescaping and construction)
  • card_of_complement() - Computes the cardinality of the complement of a Spec_list's set
  • es_free() - Frees any allocation within an E_string
  • find_bracketed_repeat() - Handles bracketed repeats '[c*3]' while building a Spec_list
  • find_closing_delim() - Finds the matching closing delimiter (i.e. ":]" or "=]")
  • get_next() - Returns the next character from the processed Spec_list
  • get_spec_stats() - Updates the given Spec_list statistics
  • get_s1_spec_stats() - Updates Spec_list statistics for string1
  • get_s2_spec_stats() - Updates Spec_list statistics for string2
  • homogeneous_spec_list() - Tests a Spec_list for one type of entry
  • is_char_class_member() - Tests a character against a List_element char_class
  • is_equiv_class_member() - Tests a character against a List_element equiv_code
  • look_up_char_class() - Finds a character in the global class name array
  • make_printable_char() - Allocates a character in printable form (escaped if necessary)
  • make_printable_str() - Allocates a new string with all characters printable
  • parse_str() - Processes escapes and builds the Spec_list
  • plain_read() - Reads from STDIN with error reporting
  • read_and_delete() - The top-level delete procedure for tr
  • read_and_xlate() - The top-level translate procedure for tr
  • set_initialize() - Initializes membership sets (squeeze set, delete set)
  • skip_construct() - Skip the current construct in the Spec_list
  • spec_init() - Initializes an empty Spec_list
  • squeeze_filter() - The top-level squeeze procedure for tr
  • star_digits_closebracket() - Tests the target character for a final digit in a repeat
  • string2_extend() - Grows the string2 Spec_list to match string1's list
  • unquote() - Converts escaped characters to their true form (unescaped)
  • validate() - Tests string1 and string2 for compatibility with execution mode
  • validate_case_classes() - Tests matching case classes for Spec_list 1 and 2
External non-standard helpers:
  • die() - Exit with mandatory non-zero error and message to stderr
  • error() - Outputs error message to standard error with possible process termination
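The unescaping step performed by unquote() can be illustrated with a minimal sketch. This is not the coreutils code: the function name unescape(), its signature, and the subset of escapes handled (\n, \t, \\, and up to three octal digits) are assumptions made for illustration. It records which output characters came from an escape, which is the kind of information an E_string carries so that an escaped '-' or '[' is not later mistaken for range or class syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the real unquote()): unescape src into dst.
   dst and escaped must each hold at least strlen(src) entries.
   escaped[i] is true when dst[i] was produced by a backslash escape.
   Returns the number of characters written. */
static size_t unescape(const char *src, unsigned char *dst, bool *escaped)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL && escaped != NULL);

    for (size_t i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char) src[i];
        bool esc = false;

        /* A trailing lone backslash is kept as a literal in this sketch. */
        if (c == '\\' && src[i + 1] != '\0') {
            char e = src[++i];
            esc = true;
            if (e >= '0' && e <= '7') {
                /* Up to three octal digits; overflow is simply truncated here. */
                unsigned int v = 0;
                int digits = 0;
                while (digits < 3 && src[i] >= '0' && src[i] <= '7') {
                    v = v * 8 + (unsigned int) (src[i] - '0');
                    i++;
                    digits++;
                }
                i--; /* the for loop's i++ moves to the next unread char */
                c = (unsigned char) v;
            } else {
                switch (e) {
                case 'n':  c = '\n'; break;
                case 't':  c = '\t'; break;
                case '\\': c = '\\'; break;
                default:   c = (unsigned char) e; break; /* e.g. \- or \[ */
                }
            }
        }
        dst[out] = c;
        escaped[out] = esc;
        out++;
    }
    return out;
}
```

For the input a\n\101- this sketch yields the four characters 'a', newline, 'A', '-' with only the middle two flagged as escaped.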

Setup

Throughout the source code and this page, we refer to two strings, string1 and string2. These are the user-provided argument strings.

There are a few important structures to know in tr:

  • E_string - A separate escaped representation of input argument strings
  • List_element - One element of a Spec_list (for example a single character, a range, a class, or a repeat)
  • Spec_list - Describes individual elements of an argument string
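To make the relationship between these structures concrete, here is a simplified sketch of a linked list of elements with a running length. The type and function names (spec_list, list_element, append_char, append_char_range) are hypothetical, and the layout is an assumption that covers only two element kinds; the real definitions in tr.c carry more fields and more element types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced set of element kinds. */
enum elem_type { ELEM_NORMAL_CHAR, ELEM_RANGE };

struct list_element {
    enum elem_type type;
    struct list_element *next;
    union {
        unsigned char normal_char;
        struct { unsigned char first, last; } range;
    } u;
};

struct spec_list {
    struct list_element *head;
    struct list_element *tail;
    size_t length; /* number of characters the list expands to */
};

/* Copy proto onto the heap and link it at the tail. */
static bool append_element(struct spec_list *s, struct list_element proto,
                           size_t expands_to)
{
    assert(s != NULL);
    struct list_element *e = malloc(sizeof *e);
    if (e == NULL)
        return false;
    *e = proto;
    e->next = NULL;
    if (s->tail != NULL)
        s->tail->next = e;
    else
        s->head = e;
    s->tail = e;
    s->length += expands_to;
    return true;
}

static bool append_char(struct spec_list *s, unsigned char c)
{
    struct list_element e = { .type = ELEM_NORMAL_CHAR };
    e.u.normal_char = c;
    return append_element(s, e, 1);
}

static bool append_char_range(struct spec_list *s, unsigned char first,
                              unsigned char last)
{
    if (first > last)
        return false; /* reversed ranges such as z-a are rejected */
    struct list_element e = { .type = ELEM_RANGE };
    e.u.range.first = first;
    e.u.range.last = last;
    return append_element(s, e, (size_t) (last - first) + 1);
}

static void spec_list_free(struct spec_list *s)
{
    struct list_element *e = s->head;
    while (e != NULL) {
        struct list_element *next = e->next;
        free(e);
        e = next;
    }
    s->head = s->tail = NULL;
    s->length = 0;
}
```

Storing a range as one element, rather than expanding it immediately, keeps the list small and lets the length be computed arithmetically.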

tr keeps five important flags as globals to denote the operating mode:

  • complement - Flag to consider the complement of set1 (all characters not in the given string1)
  • delete - Flag to remove characters identified by the delete set (-d)
  • squeeze_repeats - Flag to suppress repeated characters (-s)
  • translating - Flag set if we're translating (not deleting and two string sets provided)
  • truncate_set1 - Flag set if excess characters in string1 should be ignored (-t)

main() introduces a few local variables:

  • buf1 - Buffer for Spec_list of string1
  • buf2 - Buffer for Spec_list of string2
  • c - The character for the next option to process
  • max_operands - The maximum number of user input strings
  • min_operands - The minimum number of user input strings
  • non_option_args - The number of arguments (strings) provided by the user
  • s1 - The Spec_list created from string1
  • s2 - The Spec_list created from string2

Parsing

Parsing answers the following questions to define the execution parameters:

  • What behaviors do we want? Translation? Deletion? Squeezing?
  • What are the matching character sets for the specified operation?

Parsing failures

These failure cases are explicitly checked:

  • Not providing two strings when both deleting and squeezing, or when translating
  • Providing two strings when deleting without squeezing
  • Providing any other unexpected number of non-option arguments
  • Unknown option used

User-specified parsing failures result in a short error message followed by the usage instructions. Access-related parsing errors die with an error message.
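The operand-count rules above can be expressed compactly in terms of min_operands and max_operands. The sketch below is an illustration under assumptions: the function check_operands() and its return codes are hypothetical, and the two arithmetic expressions are one way to encode the rules rather than a quotation of the source.

```c
#include <assert.h>
#include <stdbool.h>

enum operand_check { OPERANDS_OK, OPERANDS_MISSING, OPERANDS_EXTRA };

/* Hypothetical helper: classify the operand count for a given mode.
     delete only        -> exactly 1 string
     squeeze only       -> 1 or 2 strings
     delete and squeeze -> exactly 2 strings
     neither (translate)-> exactly 2 strings */
static enum operand_check check_operands(bool delete_mode, bool squeeze,
                                         int non_option_args)
{
    assert(non_option_args >= 0);

    /* Two strings are required when the flags agree (both or neither). */
    int min_operands = 1 + (delete_mode == squeeze);
    /* A second string is allowed unless deleting without squeezing. */
    int max_operands = 1 + (delete_mode <= squeeze);

    if (non_option_args < min_operands)
        return OPERANDS_MISSING;
    if (non_option_args > max_operands)
        return OPERANDS_EXTRA;
    return OPERANDS_OK;
}
```

For example, check_operands(true, false, 2) reports an extra operand, which corresponds to providing two strings when deleting without squeezing.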


Execution

tr execution branches among three procedures: Translating (read_and_xlate()), Squeezing (squeeze_filter()), and Deleting (read_and_delete()). Before any of the procedures are invoked, the user input strings need to be parsed to build corresponding Spec_lists. Depending on the operation, delete and squeeze sets may be initialized, possibly based on set complements. The overall procedure looks like this:

  • Parse the first user input string (string1) to create a Spec_list. This will always exist
  • Parse the second user input string (string2) to create a Spec_list. This is usually needed except in delete, non-squeeze operations
  • Validate the Spec_lists
  • Initialize required sets and perform the desired procedure, depending on the mode:
    • Squeeze only:
      • Initialize squeeze set
      • Perform squeeze_filter() with plain_read()
    • Delete only:
      • Initialize delete set
      • Perform read_and_delete()
    • Squeeze and delete:
      • Initialize delete set
      • Initialize squeeze set
      • Perform squeeze_filter() with read_and_delete()
    • Translate complement:
      • Initialize complement set
      • Perform simple mapping with translation array, xlate[]
    • Translate and squeeze:
      • Perform simple mapping with translation array, xlate[]
      • Initialize squeeze set
      • Perform squeeze_filter() with read_and_xlate()
  • Close standard input
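The table-driven steps in this procedure can be sketched on an in-memory buffer. This is a simplified illustration, not the coreutils implementation: the function names (build_xlate, translate_buf, delete_buf, squeeze_buf) are hypothetical, the sets are assumed to be already expanded into plain byte arrays, and the sketch works on one buffer, whereas a streaming filter would also have to remember the last character across reads.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define N_CHARS (UCHAR_MAX + 1)

/* Start from the identity mapping, then map set1[i] to set2[i]. */
static void build_xlate(unsigned char xlate[N_CHARS],
                        const unsigned char *set1,
                        const unsigned char *set2, size_t n)
{
    for (size_t i = 0; i < N_CHARS; i++)
        xlate[i] = (unsigned char) i;
    for (size_t i = 0; i < n; i++)
        xlate[set1[i]] = set2[i];
}

/* Translate every byte of buf through the table, in place. */
static void translate_buf(unsigned char *buf, size_t len,
                          const unsigned char xlate[N_CHARS])
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = xlate[buf[i]];
}

/* Drop bytes that are members of the delete set; returns the new length. */
static size_t delete_buf(unsigned char *buf, size_t len,
                         const bool in_delete_set[N_CHARS])
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++)
        if (!in_delete_set[buf[i]])
            buf[out++] = buf[i];
    return out;
}

/* Collapse runs of a byte in the squeeze set to one byte; returns new length. */
static size_t squeeze_buf(unsigned char *buf, size_t len,
                          const bool in_squeeze_set[N_CHARS])
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = buf[i];
        if (out > 0 && buf[out - 1] == ch && in_squeeze_set[ch])
            continue; /* repeat of a squeezable character */
        buf[out++] = ch;
    }
    return out;
}
```

Translating and then squeezing, as in the "Translate and squeeze" case, amounts to calling translate_buf() followed by squeeze_buf() with a squeeze set built from string2. Membership arrays indexed by byte value make each per-character test a single lookup.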

Failure cases:

  • Specified strings have too many characters (validation error)
  • Unable to read from STDIN
  • Unable to write to STDOUT

All failures at this stage output an error message to STDERR and return without displaying usage help.

