Decoded: paste (coreutils)


Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of paste command (coreutils)

Summary

paste - merge lines of files

[Source] [Code Walkthrough]

Lines of code: 531
Principal syscall: write()
Support syscalls: open(), close(), fadvise()
Options: 8 (3 short, 5 long)

Originated with or shortly before the release of System III (1982)
Added to Textutils in November 1992 [First version]
Number of revisions: 128

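The paste utility merges lines from its input files and writes them to standard output, either side by side (one line from each file in turn) or serially (all lines of one file, then the next), as requested by the user.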

Helpers:
  • collapse_escapes() - Translates escape sequences in the user-supplied delimiter list and initializes the global delimiter list
  • paste_serial() - The serial merge procedure
  • paste_parallel() - The parallel merge procedure
  • write_error() - Exits with a write error
  • xputchar() - Writes a single character and handles errors
External non-standard helpers:
  • die() - Prints a message to stderr and exits with a mandatory non-zero status
  • error() - Outputs error message to standard error with possible process termination
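The two output helpers can be illustrated with a short sketch. It is not the coreutils source: the xputchar_to() name and its FILE * parameter are hypothetical (paste itself writes to stdout), and perror() plus exit() stand in for the gnulib die() helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for write_error(): report the failure and exit non-zero.
   The real helper uses die(); perror()/exit() are substitutes here. */
static void write_error(void)
{
  perror("paste: write error");
  exit(EXIT_FAILURE);
}

/* Hypothetical variant of xputchar() that takes the output stream as a
   parameter so it can be exercised without stdout. */
static void xputchar_to(FILE *out, char c)
{
  assert(out != NULL);
  /* fputc() returns EOF on failure; any failed write is fatal. */
  if (fputc((unsigned char) c, out) == EOF)
    write_error();
}
```

Because every single-byte write checks its own result and never returns on failure, the merge loops that call it do not need any error handling of their own for output.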

Setup

paste defines five globals to manage execution behavior:

  • *delim_end - Pointer marking the end of the delimiter list
  • *delims - The list of delimiter characters
  • have_read_stdin - Flag set if standard input was read during processing
  • line_delim - The end-of-line character ('\n', or '\0' with -z)
  • serial_merge - Flag to control merge behavior (serial or parallel)
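Because delims holds a list rather than a single character, rotating through it needs a known end. A minimal sketch of that rotation, assuming delims and delim_end are char pointers into one buffer; the next_delim() helper is hypothetical. One plausible reason for an explicit end pointer instead of a terminating NUL is that POSIX lets \0 in the delimiter list stand for an empty delimiter, so an implementation may store a NUL byte inside the list as a marker.

```c
#include <assert.h>

/* Names mirror the globals described above; the types are assumptions. */
static char *delims;      /* first delimiter in the list */
static char *delim_end;   /* one past the last delimiter */

/* Hypothetical helper: return the delimiter at *cursor and advance the
   cursor, wrapping back to the start of the list at delim_end. */
static char next_delim(char **cursor)
{
  assert(delims < delim_end);   /* the list is never empty */
  char d = **cursor;
  if (++*cursor == delim_end)
    *cursor = delims;
  return d;
}
```

With the list ",;" the cursor yields ',', ';', ',' and so on, which is the "rotate to the next delimiter" step used by both merge procedures.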

main() initializes the following:

  • delim_arg - The user-supplied delimiter list for output columns (a single TAB by default)
  • optc - The option character most recently returned by getopt_long()

Parsing

Parsing for paste determines the requested behavior by answering two questions:

  • Should input files be read serially or in parallel? (one file after another, or one line from each)
  • Which characters should separate columns and lines?
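The parsing loop can be sketched with getopt_long(), whose return value is what optc holds. The option letters d, s and z and the long names --delimiters, --serial and --zero-terminated are those of GNU paste; --help and --version are omitted, the helper name parse_options() is hypothetical, and the variable types are assumptions.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Globals and defaults as described above; types are assumptions. */
static bool serial_merge = false;
static char line_delim = '\n';
static char const *delim_arg = "\t";   /* TAB between columns by default */

static struct option const longopts[] = {
  {"delimiters", required_argument, NULL, 'd'},
  {"serial", no_argument, NULL, 's'},
  {"zero-terminated", no_argument, NULL, 'z'},
  {NULL, 0, NULL, 0}
};

/* Hypothetical helper: parse options, returning the index of the first
   file operand, or -1 if an unknown option was seen. */
static int parse_options(int argc, char **argv)
{
  int optc;
  assert(argc >= 1);
  while ((optc = getopt_long(argc, argv, "d:sz", longopts, NULL)) != -1)
    {
      switch (optc)
        {
        case 'd':
          delim_arg = optarg;    /* escapes are translated later */
          break;
        case 's':
          serial_merge = true;
          break;
        case 'z':
          line_delim = '\0';     /* NUL-terminated lines */
          break;
        default:                 /* unknown option: getopt_long() returns '?' */
          return -1;             /* caller would print usage and exit */
        }
    }
  return optind;
}
```

Returning optind lets the caller treat the remaining arguments as file operands; with none left, GNU paste reads standard input.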

Parsing failures

Two failure cases are explicitly checked:

  • The delimiter list ends with an unescaped backslash
  • An unknown option is used

Each failure produces a short error message on standard error; for an unknown option it is followed by a hint to consult the usage instructions (paste --help).


Execution

The first step is to translate any escape sequences in the user-supplied delimiter list (such as \t for a tab or \n for a newline) into the characters they represent.
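The translation can be sketched as a single pass over the argument. The function name, the output-buffer interface and the EMPTY_DELIM marker are assumptions made for illustration. The sketch recognizes the POSIX escapes \n, \t, \\ and \0 (empty delimiter), copies any other escaped character literally (also an assumption), and reports the trailing-backslash error that parsing checks for.

```c
#include <assert.h>
#include <stddef.h>

/* Marker stored for "\0", meaning "no delimiter"; the name is assumed. */
#define EMPTY_DELIM '\0'

/* Hypothetical sketch of collapse_escapes(): translate the escapes in
   ARG into OUT (which must hold at least strlen(ARG) bytes) and store the
   number of delimiters in *LEN. Returns -1 if ARG ends with an
   unescaped backslash, else 0. */
static int collapse_escapes_sketch(char const *arg, char *out, size_t *len)
{
  size_t n = 0;
  assert(arg != NULL && out != NULL && len != NULL);
  while (*arg)
    {
      if (*arg != '\\')
        {
          out[n++] = *arg++;            /* ordinary character: copy as-is */
          continue;
        }
      arg++;                            /* skip the backslash */
      switch (*arg)
        {
        case '\0': return -1;           /* trailing unescaped backslash */
        case 'n':  out[n++] = '\n'; break;
        case 't':  out[n++] = '\t'; break;
        case '0':  out[n++] = EMPTY_DELIM; break;
        default:   out[n++] = *arg; break;   /* "\\" and others: literal */
        }
      arg++;
    }
  *len = n;
  return 0;
}
```

Returning a count rather than relying on a terminating NUL matters here, because the translated list can legitimately contain the EMPTY_DELIM byte.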

Afterwards, execution follows two distinct paths for serial or parallel processing.

Serial execution:

  • Open the next input file
  • Read the next line from the file
  • Write the line, without its line terminator, to STDOUT
  • Unless this was the file's last line, write the current delimiter and rotate to the next one
  • Repeat until EOF, then end the merged output line with line_delim
  • Open the next file and repeat the entire process
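The serial steps can be sketched for a single input file that has already been read into memory. This is a simplification: the paste_serial_sketch() name is hypothetical, the real procedure reads a stream character by character, and the sketch assumes line_delim is '\n'. A '\0' entry in the delimiter list is treated as an empty delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of serial mode for ONE in-memory file. Every line
   terminator except the final one is replaced by the next delimiter from
   the circular list DELIMS[0..NDELIMS). OUT needs room for
   strlen(TEXT) + 2 bytes and is NUL-terminated. */
static void paste_serial_sketch(char const *text, char const *delims,
                                size_t ndelims, char *out)
{
  size_t len = strlen(text);
  size_t d = 0;                          /* index into the delimiter list */
  assert(ndelims > 0);
  for (size_t i = 0; i < len; i++)
    {
      if (text[i] != '\n')
        *out++ = text[i];                /* ordinary character */
      else if (i + 1 < len)              /* not the final terminator */
        {
          if (delims[d] != '\0')         /* '\0' marks an empty delimiter */
            *out++ = delims[d];
          d = (d + 1) % ndelims;         /* rotate to the next delimiter */
        }
    }
  *out++ = '\n';                         /* end the merged output line */
  *out = '\0';
}
```

For example, the text "a\nb\nc\n" with the delimiter list ",;" becomes the single line "a,b;c\n", and a final line without a terminator still receives one.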

Parallel execution:

  • Open all target files
  • For each output line, read the next line from each file in turn
    • A file that has already reached EOF contributes an empty field
  • Write each line to STDOUT, followed by the next delimiter, or by line_delim after the last file
  • Repeat until every file has reached EOF
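The parallel steps can be sketched with in-memory inputs. The struct input type and paste_parallel_sketch() are hypothetical; the sketch restarts the delimiter list on every output line, writes an empty field for any file that has reached EOF, treats a '\0' delimiter as empty, and assumes line_delim is '\n'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one open input file. */
struct input {
  char const **lines;    /* the file's lines, without terminators */
  size_t nlines;
};

/* Emit one output line per row while any input still has a line.
   OUT must be large enough for the result; it is NUL-terminated. */
static void paste_parallel_sketch(struct input const *in, size_t nfiles,
                                  char const *delims, size_t ndelims,
                                  char *out)
{
  assert(nfiles > 0 && ndelims > 0);
  for (size_t row = 0;; row++)
    {
      int any = 0;                       /* does any file have this row? */
      for (size_t i = 0; i < nfiles; i++)
        if (row < in[i].nlines)
          any = 1;
      if (!any)
        break;                           /* every file has reached EOF */

      size_t d = 0;                      /* restart the delimiter list */
      for (size_t i = 0; i < nfiles; i++)
        {
          if (row < in[i].nlines)
            {
              size_t len = strlen(in[i].lines[row]);
              memcpy(out, in[i].lines[row], len);   /* this file's field */
              out += len;
            }
          if (i + 1 < nfiles)
            {
              if (delims[d] != '\0')     /* '\0' marks an empty delimiter */
                *out++ = delims[d];
              d = (d + 1) % ndelims;     /* rotate for the next column */
            }
        }
      *out++ = '\n';                     /* line_delim assumed to be '\n' */
    }
  *out = '\0';
}
```

With inputs {a1, a2} and {b1} and a TAB delimiter, this produces a1<TAB>b1 on the first line and a2<TAB> on the second: delimiters are still written for exhausted files so that later columns stay aligned.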

After all files are processed, paste frees the allocated delimiter string and, if standard input was read, closes it.
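The standard-input handling behind have_read_stdin can be sketched as follows; open_input(), close_input() and finish() are hypothetical helper names. The sketch treats "-" as standard input, as POSIX paste does, and defers closing standard input to the final cleanup step so that it is closed at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool have_read_stdin = false;   /* global described above */

/* Hypothetical helper: "-" names standard input. */
static FILE *open_input(char const *name)
{
  assert(name != NULL);
  if (strcmp(name, "-") == 0)
    {
      have_read_stdin = true;
      return stdin;
    }
  return fopen(name, "r");             /* NULL with errno set on failure */
}

/* Hypothetical helper: close one input but leave stdin for finish().
   Returns 0 on success, EOF on failure. */
static int close_input(FILE *fp)
{
  return fp == stdin ? 0 : fclose(fp);
}

/* Final cleanup: close stdin only if it was actually used. */
static int finish(void)
{
  if (have_read_stdin && fclose(stdin) == EOF)
    {
      perror("paste: -");
      return -1;
    }
  return 0;
}
```

Keeping stdin open until the end means that naming "-" more than once on the command line never reads from an already closed stream.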

Failure cases:

  • Unable to open or close an input stream
  • Read error on an input stream
  • Write error on standard output (reported through write_error())

Failures at this stage write an error message to STDERR and cause paste to exit with a non-zero status.

