Decoded: sum (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [No POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of sum command (coreutils)

Summary

sum - print checksum and block counts

[Source] [Code Walkthrough]

Lines of code: 274
Principal syscall: write() via printf()
Support syscalls: open(), read(), close()
Options: 5 (2 short, 3 long)

Descended from V3 UNIX (1973) through System V sum -r (1983) and 4.3BSD sum (1986)
Added to Textutils in November 1992 [First version]
Number of revisions: 120 [Code Evolution]

The sum utility is deprecated in favor of the standardized cksum utility, which uses a CRC and detects errors more reliably. Neither checksum is cryptographic. sum is maintained for compatibility with scripts written for legacy BSD and System V machines.

Helpers:
  • bsd_sum_file() - BSD checksum operation (1K blocks)
  • sysv_sum_file() - SysV checksum operation (512-byte blocks)
External non-standard helpers:
  • die() - Exit with mandatory non-zero error and message to stderr
  • error() - Outputs error message to standard error with possible process termination
  • human_readable() - Converts a raw integer into a printable string scaled to a given block size
  • safe_read() - Reads with retry on interrupt

Setup

The sum utility performs limited setup in main(), declaring the following local variables:

  • files_given - The number of files to check
  • ok - The utility return status
  • optc - The next command line option to parse
  • sum_func - Function pointer for the sum function to use (BSD or SysV)

Parsing

Parsing sum only checks for two options beyond the default help and version. Both switches select the sum function: -r for the BSD variant (the default) and -s (or --sysv) for the System V variant.

Parsing failures

Parsing may fail if the user provides an unknown option. The result is a short error message followed by the usage instructions.
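
The parsing described above, including the failure path, can be sketched roughly as follows. This is an illustration rather than the utility's actual code: the enum sum_kind and the function parse_sum_options() are hypothetical names (the real main() stores a function pointer in sum_func), and the help and version options are left out.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical names for this sketch; the real utility keeps a
   function pointer (sum_func) instead of an enum. */
enum sum_kind { SUM_INVALID = -1, SUM_BSD, SUM_SYSV };

static const struct option long_options[] = {
  {"sysv", no_argument, NULL, 's'},
  {NULL, 0, NULL, 0}
};

/* Returns the selected variant, or SUM_INVALID for an unknown option.
   By default getopt_long() itself prints a short message to stderr. */
enum sum_kind
parse_sum_options (int argc, char **argv)
{
  enum sum_kind kind = SUM_BSD;   /* BSD is the default variant */
  int optc;

  while ((optc = getopt_long (argc, argv, "rs", long_options, NULL)) != -1)
    {
      switch (optc)
        {
        case 'r':
          kind = SUM_BSD;
          break;
        case 's':
          kind = SUM_SYSV;
          break;
        default:
          /* Unknown option: the caller would print usage and exit non-zero. */
          return SUM_INVALID;
        }
    }
  return kind;
}
```

A caller would then pick the checksum routine from the returned value and treat SUM_INVALID as the cue to print the usage text.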


Execution

Execution is simple: loop through each input file (or stdin), compute the checksum, and count the blocks. Then print the results.
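
The block count follows directly from the byte total. A minimal sketch, assuming a partial block is counted as a whole one and using the block sizes listed under Helpers (1024 bytes for BSD, 512 bytes for SysV); blocks_ceil() is a hypothetical helper, not a function from the source:

```c
#include <stdint.h>

/* Hypothetical helper: number of blocks needed to hold total_bytes,
   rounding any partial block up. block_size must be non-zero. */
uintmax_t
blocks_ceil (uintmax_t total_bytes, uintmax_t block_size)
{
  return total_bytes / block_size + (total_bytes % block_size != 0);
}
```

Under these assumptions a 1025-byte file counts as 2 blocks in BSD mode and 3 blocks in SysV mode.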

To make this more interesting, let's tear apart the two checksum functions.

BSD sum

The BSD checksum adds the value of each byte to a 16-bit running total, rotating the total right by one bit before each addition:

while ((ch = getc (fp)) != EOF)
{
  total_bytes++;
  checksum = (checksum >> 1) + ((checksum & 1) << 15);
  checksum += ch;
  checksum &= 0xffff;
}
  • First, we count each byte read in total_bytes
  • Next, we rotate the 16-bit checksum right by one bit: shift all bits right and move the old LSB into the MSB position
  • Add the new byte to the checksum
  • Mask to the lower 16 bits to keep the value in range
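
To try the rotate-and-add steps outside the utility, here is a small self-contained sketch that applies the same loop body to an in-memory buffer instead of a FILE stream. The function name bsd_sum_buffer() is hypothetical and exists only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-based variant of the loop shown above. */
uint16_t
bsd_sum_buffer (const unsigned char *data, size_t len)
{
  unsigned int checksum = 0;

  for (size_t i = 0; i < len; i++)
    {
      /* Rotate right by one bit within 16 bits. */
      checksum = (checksum >> 1) + ((checksum & 1) << 15);
      /* Add the next byte, then keep only the low 16 bits. */
      checksum += data[i];
      checksum &= 0xffff;
    }
  return (uint16_t) checksum;
}
```

Worked by hand for the two bytes "ab": after 'a' the checksum is 97; rotating 97 right gives 48 + 32768 = 32816; adding 'b' (98) gives 32914.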

System V sum

The System V checksum also relies on adding bytes, but without the bit rotation. The following snippet has been edited for length (original source):

while (1)
{
  size_t bytes_read = safe_read (fd, buf, sizeof buf);

  if (bytes_read == 0)   /* end of file */
    break;
  /* (read-error handling omitted) */

  for (size_t i = 0; i < bytes_read; i++)
    s += buf[i];

  total_bytes += bytes_read;
}

r = (s & 0xffff) + ((s & 0xffffffff) >> 16);
checksum = (r & 0xffff) + (r >> 16);
  • We read a block of up to 8192 bytes (sizeof buf) into buf
  • Then loop through the buffer, adding each byte to s
  • Track the number of bytes read as total_bytes
  • Finally, we reduce the 32-bit sum to a 16-bit checksum in 2 steps
    • Add the high-order 16 bits to the low-order 16 bits. Expect a possible carry into bit 16 of the result
    • Repeat the fold, this time with no possible carry, leaving a 16-bit result
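
The two-step reduction is easier to see in isolation. The sketch below is an illustration with hypothetical names (sysv_fold() and sysv_sum_buffer()); it works on an in-memory buffer and assumes the running sum is held in a 32-bit unsigned integer, which makes the 0xffffffff mask from the snippet unnecessary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a 32-bit sum down to 16 bits in two steps. */
uint16_t
sysv_fold (uint32_t s)
{
  /* Step 1: add the high half to the low half; this may carry into bit 16. */
  uint32_t r = (s & 0xffff) + (s >> 16);
  /* Step 2: fold the possible carry back in; the result fits in 16 bits. */
  return (uint16_t) ((r & 0xffff) + (r >> 16));
}

/* Hypothetical buffer-based variant: plain byte sum followed by the fold. */
uint16_t
sysv_sum_buffer (const unsigned char *data, size_t len)
{
  uint32_t s = 0;

  for (size_t i = 0; i < len; i++)
    s += data[i];

  return sysv_fold (s);
}
```

For the bytes "abc" the sum is 97 + 98 + 99 = 294, which is already below 65536, so both folds leave it unchanged. A sum of 0x0001ffff shows the carry: step 1 gives 0xffff + 0x1 = 0x10000, and step 2 folds that to 1.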

For a more advanced checksum, refer to the cksum utility.

Failure case:

  • Unable to open, read, or close an input file
