Decoded: head (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of head command (coreutils)

Summary

head - output the first part of files

[Source] [Code Walkthrough]

Lines of code: 1096
Principal syscall: fwrite()
Support syscalls: open(), fstat()
Options: 14 (5 short, 9 long, not including number options)

Descended from head included in System V (1985)
Added to Textutils in November 1992 [First version]
Number of revisions: 166

Helpers:
  • copy_fd() - Copies data from file to buffer then writes to STDOUT
  • diagnose_copy_fd_failures() - outputs the appropriate message for an error code
  • elide_tail_bytes_file() - Omit lines from the end, counting by bytes
  • elide_tail_bytes_pipe() - Buffers bytes from fd and omits as specified
  • elide_tail_lines_file() - Omit lines from the end, counting by lines
  • elide_tail_lines_pipe() - Buffers lines from fd and omits as specified
  • elide_tail_lines_seekable() - Seek and omit as needed
  • elseek() - Moves position within a file and checks errors
  • head() - Primary processing function for an input file
  • head_bytes() - Output from the beginning, counting by lines
  • head_file() - Opens a file for processing by head()
  • head_lines() - Output from the beginning, counting by bytes
  • string_to_integer() - Convert string to int (Wrapper for __xdectoint()
  • write_header() - Adds a header to the output if needed
  • xwrite_stdout() - Writes data from buffer and checks for errors
External non-standard helpers:
  • die() - Exit with mandatory non-zero error and message to stderr
  • error() - Outputs error message to standard error with possible process termination
  • xset_binary_mode() - Set the file descriptor access mode

Setup

Several variables are defined at global scope, including:

  • presume_input_pipe is a flag for processing pipes (skip file check)
  • The print_headers flag allows headers to be printed
  • The line_end character holds the line delimiter

main() initializes the following:

  • header_mode - Defines when to print he aders
  • ok - Holds the return status of the utility
  • c - The character value of the next option
  • i - Generic iterator index for file array
  • n_units - The number of lines/bytes to process
  • count_lines - Flag for line or byte mode
  • elide_from_end - Flag for include or exclude mode
  • default_file_list[] - The input files to process if none are provide (STDIN)
  • file_list - The user provided viles to process

Parsing

Parsing begins differently than most utilities. We start by looking explicity for legacy syntax which begins with a line/byte count and options appended (i.e head -6v if we want to see the first 6 lines without a header). This routine ends by resetting the option/argument pointers

Now we parse the options again with getopt to the same effect for the new syntax

We should know the following:

  • Are we processing lines or bytes?
  • Are the lines/byte inclusive from the beginning or exclusive from the end
  • Do we output headers?
  • Do we collapse consecutive spaces?

Parsing failures

Two failure cases:

  • Unknown or invalid options
  • Invalid number of bytes to process

Execution

Implementing head would be self-explanatory if it weren't for the omit case on sequential stream (pipes). Let's consider the basic cases and expand the problem

Output from the beginning of a file
Set the counter for the number of lines/bytes we need to output. Then read one 'unit' of input, decrement the counter, output the data and repeat until the counter is 0. That's it!

Output until a given point before the end of a file
This situation is more subtle. We don't necessarily know where the point is from the beginning of a file, but since the data stream is persistant, we have random access. We can start at the end of the file and move backward with the counter to find the point to exclude. Then output from the beginning up to the point. A little more work, but not as problematic as the file case.

Output until a given point before the end of a sequential stream
We can only read the data sequentually from beginning to end, and we can only read it once. We don't know how large the input is thus we cannot tell how far from the end any byte of data might be. We must buffer the data.

Read the input in to a buffer that's as least as large as the part of the input we want to exclude. When the buffer fills, then we know that each subsequent input read should pair with an output write. This is the general solution to an arbitrarily large input of unknown size. The actual coreutils implementation follows this strategy as well as a (possibly) better-performaning solution involving double-buffered windows.

Failure cases:

  • Unable to fstat() input file
  • Unable to seek an input file
  • Failure to read an input file
  • Unexpected end of file
  • Failure to write to output
  • Trying to process and unreasonably large file
  • Unable to open/close input

All failures at this stage output an error message to STDERR and return without displaying usage help


[Back to Project Main Page]