Decoded: tac (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [No POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of tac command (coreutils)

Summary

tac - concatenate and write files in reverse

[Source] [Code Walkthrough]

Lines of code: 714
Principal syscall: write()
Support syscalls: open(), close() (both via the standard fstream functions)
Options: 2 (3 short, 5 long)

Lineage unclear -- not part of Research UNIX or System V
Added to Fileutils in November 1992 [First version]
Number of revisions: 191

The tac utility is functionally similar to cat, but the implementation strategy is necessarily different, in much the same way that head differs from tail. Output cannot begin until the end of the input has been found, so the file contents must be read and buffered first. Non-seekable data sources (such as ttys) complicate the problem further.
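
The split between seekable and non-seekable sources can be illustrated with a small check. This is a minimal sketch, not the code tac itself uses: fd_is_seekable() and report_strategy() are hypothetical names, and the test (lseek() succeeds and the descriptor is not a tty) is an assumption about what "seekable" should mean here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true when fd supports seeking and is not a terminal.
   Pipes, sockets and FIFOs make lseek() fail with ESPIPE. */
static bool fd_is_seekable(int fd)
{
    assert(fd >= 0);
    if (isatty(fd))
        return false;
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Hypothetical helper: report which procedure a tac-like tool would pick. */
static void report_strategy(int fd, FILE *out)
{
    fprintf(out, "fd %d: %s procedure\n", fd,
            fd_is_seekable(fd) ? "seekable" : "non-seekable");
}
```

Calling report_strategy(STDIN_FILENO, stderr) would report the non-seekable procedure when input is piped and the seekable one when it is redirected from a regular file.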

Helpers:
  • copy_to_temp() - Streams data to a temporary file
  • output() - Writes buffer data to output
  • record_or_unlink_tempfile() - Unlinks the temporary file immediately, or records it for removal at exit
  • tac_file() - Top level tac procedure
  • tac_nonseekable() - tac procedure for nonseekable source
  • tac_seekable() - tac procedure for seekable sources (normal files)
  • temp_stream() - Creates a temporary stream file
  • unlink_tempfile() - Closes and deletes a file
External non-standard helpers:
  • die() - Exit with mandatory non-zero error and message to stderr
  • error() - Outputs error message to standard error with possible process termination

Setup

tac uses several flags and variables as globals for managing execution:

  • G_buffer - The input buffer
  • G_buffer_size - The size of the input buffer (generally 2 full reads, plus sentinel, plus 2)
  • have_read_stdin - Flag set if data is read from STDIN
  • match_length - The length of a successful match
  • read_size - The size of the read buffer
  • sentinel_length - The length of a separator
  • separator - The record/line separating string
  • separator_ends_record - Flag set if the separator ends a record (otherwise it begins the subsequent record)

tac may use regular expressions to match a separator. This feature relies on the POSIX regular expression library, so we bring in some structures and variables to support it:

  • compiled_separator - The compiled pattern buffer
  • compiled_separator_fastmap - The fastmap for lookups
  • regs - The match information structure
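
As an illustration of how a compiled separator pattern yields a match position and a match length, here is a minimal sketch built on the POSIX regcomp()/regexec() interface. It is an assumption that this interface is a fair stand-in: tac may use different entry points and searches backward, whereas this sketch scans forward and keeps the last hit. last_separator_match() is a hypothetical name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: find the last match of pattern in text.
   On success stores the match offset and length and returns true.
   Returns false if the pattern does not compile or never matches. */
static bool last_separator_match(const char *pattern, const char *text,
                                 size_t *start, size_t *length)
{
    assert(pattern && text && start && length);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;               /* invalid regular expression */

    bool found = false;
    size_t offset = 0;
    regmatch_t m;

    /* Scan forward, remembering the most recent match. */
    while (regexec(&re, text + offset, 1, &m, offset ? REG_NOTBOL : 0) == 0) {
        found = true;
        *start = offset + (size_t)m.rm_so;
        *length = (size_t)(m.rm_eo - m.rm_so);

        size_t match_end = offset + (size_t)m.rm_eo;
        if (text[match_end] == '\0')
            break;                  /* match reaches the end of the text */

        /* Step past the match; step one extra byte after an empty match. */
        offset = (m.rm_eo > m.rm_so) ? match_end : match_end + 1;
    }

    regfree(&re);
    return found;
}
```

The stored length plays the role that match_length plays in the list of globals above: with a regex separator, the separator's width is only known once a match succeeds.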

main() introduces several local variables for processing:

  • *file - The name of the next file to process
  • *error_message - The regex compiler message
  • default_file_list[] - The default file list (standard input), used when no files are named
  • half_buffer_size - Half the size of the input buffer
  • ok - The final return status
  • optc - The character for the next option to process
Parsing

Parsing tac is fairly simple given how few options it has. The information provided by the user concerns separators and targets:

  • Should separators apply to the beginning or end of a record (line)?
  • What is the separator?
  • Should we match separators based on a regular expression?

Parsing failures

These failure cases are explicitly checked:

  • Not defining a separator if using a regex
  • Using an unknown option

User-specified parsing failures result in a short error message followed by the usage instructions. Access-related parsing errors die with an error message.
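
The parsing questions and the two explicit failure checks can be sketched with getopt_long(). This is a minimal sketch rather than tac's own main(): struct tac_opts, parse_tac_options() and the return codes are hypothetical, and --help and --version are left out. The option names (-b/--before, -r/--regex, -s/--separator) are the ones tac documents.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the parsed settings. */
struct tac_opts {
    bool before;            /* separator begins a record instead of ending it */
    bool regex;             /* separator is a regular expression */
    const char *separator;  /* defaults to a newline */
    int first_operand;      /* index of the first file name in argv */
};

/* Returns 0 on success, -1 for an unknown option or a missing argument,
   -2 when a regex is requested with an empty separator. */
static int parse_tac_options(int argc, char **argv, struct tac_opts *o)
{
    static const struct option longopts[] = {
        {"before",    no_argument,       NULL, 'b'},
        {"regex",     no_argument,       NULL, 'r'},
        {"separator", required_argument, NULL, 's'},
        {NULL, 0, NULL, 0}
    };

    assert(argv != NULL && o != NULL);
    o->before = false;
    o->regex = false;
    o->separator = "\n";
    o->first_operand = argc;

    optind = 1;   /* restart scanning (enough for this sketch) */
    opterr = 0;   /* the caller reports errors and prints usage */

    int c;
    while ((c = getopt_long(argc, argv, "brs:", longopts, NULL)) != -1) {
        switch (c) {
        case 'b': o->before = true;      break;
        case 'r': o->regex = true;       break;
        case 's': o->separator = optarg; break;
        default:  return -1;             /* unknown option */
        }
    }

    if (o->regex && o->separator[0] == '\0')
        return -2;                       /* regex with no separator */

    o->first_operand = optind;
    return 0;
}
```

A caller would print a short message plus the usage text for either negative return value, then process argv[first_operand] onward, falling back to standard input when no operands remain.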


Execution

If a non-seekable data source is used (e.g. ttys, pipes, sockets, FIFOs), then tac begins by creating a temporary file and streaming all of the source contents to that file. Now we can apply the seekable procedure to all cases:

  • Find the end of the file and align the partial first read to the buffer size
  • Scan backwards to match the end-of-record/line separator (regex and normal match cases)
  • Copy the record to the buffer
  • Repeat the backward scan until the buffer is full, then output the buffer
  • When the beginning of the file is found, output the current buffer
  • Close the file
  • Repeat the entire process with any subsequent files requested
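
The backward scan at the heart of these steps can be shown on data that is already in memory. This is a simplified sketch under stated assumptions: the whole input fits in one buffer, the separator is a single byte, and it ends each record. The real procedure works a buffer at a time with seeks and also handles string and regex separators. reverse_records() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the records of in[0..len) to out in reverse
   order. Each record keeps the separator byte that ends it.
   out must have room for len bytes. Returns the number of bytes written. */
static size_t reverse_records(const char *in, size_t len, char sep, char *out)
{
    assert(in != NULL && out != NULL);

    size_t written = 0;
    size_t end = len;   /* one past the last byte not yet copied */
    size_t i = len;

    /* A separator in the final position ends the last record,
       so it must not be treated as a boundary of its own. */
    if (i > 0 && in[i - 1] == sep)
        i--;

    /* Scan backward; a separator at i-1 means a record starts at i. */
    while (i > 0) {
        if (in[i - 1] == sep) {
            memcpy(out + written, in + i, end - i);
            written += end - i;
            end = i;
        }
        i--;
    }

    /* Beginning of input reached: emit the first record. */
    memcpy(out + written, in, end);
    written += end;
    return written;
}
```

Given the 6 bytes "a\nb\nc\n", the sketch writes "c\nb\na\n". If the final newline is missing ("a\nb\nc"), the last record has no separator of its own and the result is "cb\na\n".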

The logical diagram of tac shown above doesn't capture the many ways the utility could fail during processing.

Failure cases:

  • Unable to open or close data source
  • Unable to create temporary file for non-seekable sources
  • Unable to read from data source/temp file at any point
  • Invalid regular expression separator defined
  • A single line/record is too large to fit in the buffer
  • Seek fails on a seekable source
  • Write fails while streaming non-seekable source to temp file
  • Output write fails

All failures at this stage output an error message to STDERR and return without displaying usage help.
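
Several of these failures arise while spooling a non-seekable source to a temporary file. The sketch below shows where each check would sit. It is not tac's copy_to_temp(): spool_to_temp() is a hypothetical name, and tmpfile() is used as a simpler stand-in for however tac creates its temporary file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from fd into an anonymous
   temporary file. Returns the stream rewound to its start and stores the
   byte count in *size_out, or returns NULL on any failure. */
static FILE *spool_to_temp(int fd, long *size_out)
{
    assert(fd >= 0 && size_out != NULL);

    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return NULL;                    /* cannot create temporary file */

    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            fclose(tmp);
            return NULL;                /* read failure */
        }
        if (fwrite(buf, 1, (size_t)n, tmp) != (size_t)n) {
            fclose(tmp);
            return NULL;                /* write to temp file failed */
        }
        total += (long)n;
    }

    if (fflush(tmp) != 0 || fseek(tmp, 0L, SEEK_SET) != 0) {
        fclose(tmp);
        return NULL;                    /* flush or seek failure */
    }

    *size_out = total;
    return tmp;
}
```

A caller would report a NULL result on standard error and move on to the next file, matching the behaviour described above: an error message and no usage help.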

