Decoded: wc (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of wc command (coreutils)

Summary

wc - print newline, word, and byte counts

[Source] [Code Walkthrough]

Lines of code: 875
Principal syscall: write()
Support syscalls: open(), close()
Options: 13 (5 short, 8 long)

Descended from wc introduced in Version 1 UNIX (1971)
Added to Textutils in November 1992 [First version]
Number of revisions: 186

Helpers:
  • compute_number_width() - Computes an optimal output print width for counters
  • get_input_fstatus() - stat() input files and retain results
  • wc() - Top level counting procedure for all files
  • wc_file() - Counts a single file
  • write_counts() - Print provided file count results
External non-standard helpers:
  • argv_iter_init_argv() - Creates an argument iterator for argv (from gnulib)
  • die() - Exit with mandatory non-zero error and message to stderr
  • error() - Outputs error message to standard error with possible process termination
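
The width computation attributed to compute_number_width() above can be illustrated with a minimal sketch. It assumes the width is simply the number of decimal digits in the sum of all known file sizes; the real helper may apply further rules (for example, for inputs whose size is not known in advance). The names decimal_width() and sketch_number_width() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of decimal digits needed to print n. */
static int decimal_width(uintmax_t n)
{
    int width = 1;
    for (; n >= 10; n /= 10)
        width++;
    return width;
}

/* Sketch only: choose one column width that fits the largest counter we
   expect to print, assumed here to be the sum of all file sizes.
   Overflow of the sum is not handled in this sketch. */
int sketch_number_width(const uintmax_t *sizes, size_t nfiles)
{
    assert(sizes != NULL || nfiles == 0);

    uintmax_t total = 0;
    for (size_t i = 0; i < nfiles; i++)
        total += sizes[i];

    return decimal_width(total);
}
```

Sizing the column from the grand total means the per-file rows and the final totals row line up without a second pass over the output.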

Setup

wc keeps track of the counts and other state in global variables, including:

  • have_read_stdin - Flag set if input is read from STDIN
  • max_line_length - Tracks the longest line seen
  • number_width - The width of the output display for counts
  • page_size - The number of bytes in a memory page (a la getpagesize())
  • print_bytes - Flag set to display byte counts
  • print_chars - Flag set to display character counts
  • print_linelength - Flag set to display line length
  • print_lines - Flag set to display line counts
  • print_words - Flag set to display word counts
  • total_bytes - The total number of bytes across all files
  • total_chars - The total number of characters across all files
  • total_lines - The total number of lines across all files
  • total_words - The total number of words across all files

main() introduces a few local variables:

  • **files - The input file names
  • *files_from - The name of the reference file holding all the target file names
  • fstatus - A custom structure holding stat() info and execution status
  • nfiles - The number of files to count
  • ok - The final return status
  • optc - The character for the next option to process
  • tok - Token structure based on obstacks for input file list

Parsing

Parsing user options defines the execution parameters, specifically:

  • Are we counting by bytes, characters, words, or lines?
  • Should the maximum line length be reported?
  • Is there a separate file which lists the target files to analyze?

The only parsing error caught is when the user supplies file operands on the command line and also names a list file (--files0-from). The result is a short error message followed by the usage instructions.
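
A minimal sketch of this parsing step, using getopt_long(), might look like the following. It is not the coreutils source: struct wc_opts and sketch_parse() are hypothetical (the utility itself stores these flags in globals), --help and --version are omitted, and the error text is only illustrative. The fallback to lines, words, and bytes when no selection flag is given reflects wc's documented default.

```c
#include <assert.h>
#include <getopt.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the flags; wc itself keeps them as globals. */
struct wc_opts {
    bool print_lines, print_words, print_chars, print_bytes, print_linelength;
    const char *files_from;   /* NULL unless --files0-from was given */
};

/* Code for a long-only option, chosen outside the range of any char. */
enum { FILES0_FROM_OPTION = CHAR_MAX + 1 };

/* Returns 0 on success, -1 on an unknown option or on the single conflict
   described above (file operands combined with a list file). */
int sketch_parse(int argc, char **argv, struct wc_opts *o)
{
    static const struct option longopts[] = {
        {"bytes",           no_argument,       NULL, 'c'},
        {"chars",           no_argument,       NULL, 'm'},
        {"lines",           no_argument,       NULL, 'l'},
        {"words",           no_argument,       NULL, 'w'},
        {"max-line-length", no_argument,       NULL, 'L'},
        {"files0-from",     required_argument, NULL, FILES0_FROM_OPTION},
        {NULL, 0, NULL, 0}
    };

    assert(o != NULL);
    *o = (struct wc_opts){0};
    optind = 1;   /* restart scanning so the sketch can be called again */

    int optc;
    while ((optc = getopt_long(argc, argv, "clLmw", longopts, NULL)) != -1) {
        switch (optc) {
        case 'c': o->print_bytes = true; break;
        case 'm': o->print_chars = true; break;
        case 'l': o->print_lines = true; break;
        case 'w': o->print_words = true; break;
        case 'L': o->print_linelength = true; break;
        case FILES0_FROM_OPTION: o->files_from = optarg; break;
        default: return -1;   /* unknown option */
        }
    }

    /* The one conflict: a list file plus operands left on the command line. */
    if (o->files_from != NULL && optind < argc) {
        fprintf(stderr, "file operands cannot be combined with --files0-from\n");
        return -1;
    }

    /* No selection flag given: fall back to lines, words, and bytes. */
    if (!(o->print_lines || o->print_words || o->print_chars
          || o->print_bytes || o->print_linelength))
        o->print_lines = o->print_words = o->print_bytes = true;

    return 0;
}
```

After the loop, optind indexes the first remaining operand, which is why comparing it with argc is enough to detect leftover file names.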


Execution

The most complicated aspect of wc is that it reads files using one of several procedures, each optimized for the kind of count the user requested. This is reflected in the wc() function, which accounts for more than 550 lines (over 60%) of the source file. The high-level procedure for the utility is:

  • Access the file list (source file or command line arguments)
  • Tokenize the file list (if using a file list)
  • Get the status of each file using (f)stat() and define fstatus structures
  • Compute optimal number widths based on all files
  • While there are still files to process:
    • Verify file accessibility (exists? name? openable?)
    • Read each element from the file (byte, word, line)
    • Check for appropriate delimiters to count each element appropriately
    • At the end of a file, print the file counts and add them to the total counts
    • Repeat above for all input files
  • Print the totals
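
The core of the loop above (read elements, check delimiters, accumulate per-file and total counts) can be sketched as follows. This is a simplified illustration, not the optimized code paths the utility uses: it assumes single-byte characters, treats any isspace() byte as a word delimiter, and works on in-memory buffers. struct counts, sketch_count(), and sketch_add_totals() are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file (or total) counters. */
struct counts {
    uintmax_t lines, words, bytes;
};

/* Add the contents of one buffer to the running counts.
   in_word carries the "inside a word" state across buffer boundaries,
   so a word split between two reads is counted only once. */
void sketch_count(const char *buf, size_t len, struct counts *c, bool *in_word)
{
    assert(buf != NULL || len == 0);
    assert(c != NULL && in_word != NULL);

    c->bytes += len;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char) buf[i];
        if (ch == '\n')
            c->lines++;              /* a line is delimited by a newline */
        if (isspace(ch)) {
            *in_word = false;        /* whitespace ends the current word */
        } else if (!*in_word) {
            *in_word = true;         /* first non-space byte starts a word */
            c->words++;
        }
    }
}

/* At the end of a file, fold its counts into the grand totals. */
void sketch_add_totals(struct counts *total, const struct counts *file)
{
    total->lines += file->lines;
    total->words += file->words;
    total->bytes += file->bytes;
}
```

Keeping the word state outside the function is what lets a caller feed the file in fixed-size chunks without miscounting words that straddle a chunk boundary.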

Failure cases:

  • Unable to open or close input files
  • Unable to read from input source
  • Unable to allocate memory for reading input
  • Unable to close standard input after reading

All failures at this stage output an error message to STDERR and return without displaying usage help.
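
The reporting pattern for these failures can be sketched with plain POSIX calls. This is an assumption-laden illustration rather than the utility's code: it counts bytes only, the "wc-sketch" message prefix and the function names are hypothetical, and it assumes that a failure on one file is recorded in the return status while the remaining files are still processed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Count the bytes in one file. On any failure, print a message to
   stderr and return false; no usage text is shown. */
bool sketch_count_file(const char *name, uintmax_t *bytes)
{
    assert(name != NULL && bytes != NULL);

    int fd = open(name, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "wc-sketch: %s: %s\n", name, strerror(errno));
        return false;
    }

    bool ok = true;
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry the read */
            fprintf(stderr, "wc-sketch: %s: %s\n", name, strerror(errno));
            ok = false;
            break;
        }
        *bytes += (uintmax_t) n;
    }

    if (close(fd) != 0) {
        fprintf(stderr, "wc-sketch: %s: %s\n", name, strerror(errno));
        ok = false;
    }
    return ok;
}

/* Process every name, continuing past failures; the return value is
   false if any single file failed. */
bool sketch_count_all(char **names, int nfiles, uintmax_t *total)
{
    bool ok = true;
    for (int i = 0; i < nfiles; i++) {
        if (!sketch_count_file(names[i], total))
            ok = false;
    }
    return ok;
}
```

A caller would typically map the boolean result to the process exit status, so a single unreadable file yields a non-zero exit without suppressing the counts for the files that could be read.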

