Decoded: cksum (coreutils)


Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of cksum command (coreutils)

Summary

cksum - print CRC checksum and byte counts

[Source] [Code Walkthrough]

Lines of code: 319
Principal syscall: fread()
Support syscalls: None
Options: 2 (0 short, 2 long - default help/version)

The GNU cksum may be the earliest implementation (unverified).
Added to Textutils in November 1992 [First version]
Number of revisions: 119 [Code Evolution]

The cksum source has two entry points (two main() functions) with separate execution paths: one for normal operation and one that regenerates the CRC table. The table-generating entry point is only compiled in when the utility is rebuilt with the -DCRCTAB option. We focus primarily on normal use.

Helpers:
  • cksum() - The core cksum computation for a single input file
  • crc_remainder() - Computes the remainder for a single message (CRC table mode)
  • fill_r() - Fills an 8-entry helper array with the remainders for the single-bit byte values (CRC table mode)
External non-standard helpers:
  • die() - Print a message to stderr and exit with a non-zero status
  • error() - Print an error message to stderr, optionally terminating the process
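In CRC table mode, fill_r() and crc_remainder() rebuild crctab. The sketch below shows one way those two steps can fit together; it is not the coreutils code. It assumes the polynomial bits 0x04C11DB7 (with the x^32 term implicit), that the eight-entry helper array holds the remainders for the single-bit byte values 1, 2, 4, ..., 128, and that any other byte's remainder is the XOR of its set bits' remainders (CRC is linear over GF(2)). The names fill_r_sketch(), crc_remainder_sketch(), and build_crctab() are hypothetical.

```c
#include <stdint.h>

/* Generator polynomial G(x) without its implicit x^32 term. */
#define GEN UINT32_C(0x04C11DB7)

/* r[i] = x^(32+i) mod G(x): the remainder contributed by bit i of a
   table index (hypothetical helper array). */
static uint32_t r[8];

static void
fill_r_sketch (void)
{
  r[0] = GEN;                               /* x^32 mod G(x) */
  for (int i = 1; i < 8; i++)
    /* Multiply by x: shift left, and reduce by G(x) if x^31 fell off. */
    r[i] = (r[i - 1] << 1)
           ^ ((r[i - 1] & UINT32_C(0x80000000)) ? GEN : 0);
}

/* Remainder for the one-byte message m: XOR the remainders of its set bits. */
static uint32_t
crc_remainder_sketch (int m)
{
  uint32_t rem = 0;
  for (int i = 0; i < 8; i++)
    if (m & (1 << i))
      rem ^= r[i];
  return rem;
}

/* Build all 256 lookup-table entries. */
static void
build_crctab (uint32_t crctab[256])
{
  fill_r_sketch ();
  for (int i = 0; i < 256; i++)
    crctab[i] = crc_remainder_sketch (i);
}
```

Calling build_crctab() on a 256-entry array reproduces the familiar table: entry 1 is 0x04C11DB7 and entry 128 is 0x690CE0EE.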

Setup

The key part of the cksum setup is the precomputed polynomial lookup table, crctab[256]. This table is provided by the POSIX standard (adopted as-is from the ISO CRC-32 standard). If you're interested in the mathematics behind the computations, check out Ross Williams's guide, A Painless Guide to CRC Error Detection Algorithms.
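In polynomial terms, following the POSIX description of cksum, the generator, the table entries, and the final CRC can be written as below. Here $i(x)$ is the table index byte read as a polynomial of degree at most 7, $M(x)$ is the message (the file's bytes followed by its length bytes), and the overline denotes the ones' complement of the 32 remainder bits.

```latex
\begin{aligned}
G(x) &= x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10}
        + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1 \\
\mathrm{crctab}[i] &= \bigl(i(x)\, x^{32}\bigr) \bmod G(x), \qquad 0 \le i \le 255 \\
R(x) &= \bigl(M(x)\, x^{32}\bigr) \bmod G(x), \qquad \mathrm{CRC} = \overline{R(x)}
\end{aligned}
```

For example, $\mathrm{crctab}[1] = x^{32} \bmod G(x) = G(x) - x^{32}$, whose coefficients are exactly the hex constant 0x04C11DB7.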

In main(), we define a couple of locals:

  • i - Generic iterator over the input files to process (argv)
  • ok - The utility's return status

Parsing

Parsing only checks for the default help and version options. Any other option results in a short error message followed by usage instructions.
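A minimal sketch of this kind of parsing with getopt_long(), which is declared in <getopt.h> on glibc, musl, and the BSDs. It is not the coreutils code: parse_args() and enum parse_result are hypothetical, and the caller is assumed to print help text, version text, or a usage hint and exit accordingly.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical result codes for the caller to act on. */
enum parse_result { PARSE_OK, PARSE_HELP, PARSE_VERSION, PARSE_ERROR };

/* Accept only --help and --version; reject every other option. */
static enum parse_result
parse_args (int argc, char **argv)
{
  static struct option const long_options[] = {
    {"help", no_argument, NULL, 'h'},
    {"version", no_argument, NULL, 'v'},
    {NULL, 0, NULL, 0}
  };

  /* glibc, musl, and the BSDs treat optind = 0 as "restart the scan",
     so the function can safely be called more than once. */
  optind = 0;

  int c;
  while ((c = getopt_long (argc, argv, "", long_options, NULL)) != -1)
    {
      switch (c)
        {
        case 'h':
          return PARSE_HELP;      /* caller prints usage, exits 0 */
        case 'v':
          return PARSE_VERSION;   /* caller prints version, exits 0 */
        default:
          return PARSE_ERROR;     /* getopt_long already printed a short
                                     diagnostic to stderr */
        }
    }
  return PARSE_OK;                /* operands start at argv[optind] */
}
```

With an empty short-option string, any short option such as -x also lands in the default case, which matches the "everything else is an error" rule described above.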


Execution

Conceptually, this standardized checksum works the same way as sum: loop through each input file (or stdin), compute the checksum, count the bytes, and print the results.
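The driver loop around the per-file work can be sketched as follows, using the locals i and ok described earlier. This is a hypothetical stand-in, not the coreutils code: count_one() only counts bytes (the CRC itself is left out), and process_operands(), count_one(), and the rule of printing a name only for explicit operands are assumptions of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the per-file work: open, count bytes, print, close. */
static bool
count_one (const char *file, bool print_name)
{
  bool is_stdin = strcmp (file, "-") == 0;
  FILE *fp = is_stdin ? stdin : fopen (file, "rb");
  if (fp == NULL)
    {
      fprintf (stderr, "%s: %s\n", file, strerror (errno));
      return false;
    }

  unsigned char buf[4096];
  uintmax_t length = 0;
  size_t n;
  while ((n = fread (buf, 1, sizeof buf, fp)) > 0)
    length += n;

  bool ok = !ferror (fp);
  if (!is_stdin && fclose (fp) != 0)
    ok = false;
  if (!ok)
    {
      fprintf (stderr, "%s: read error\n", file);
      return false;
    }

  printf ("%ju", length);
  if (print_name)
    printf (" %s", file);
  putchar ('\n');
  return true;
}

/* Driver: i walks the operands, ok accumulates the overall status. */
static bool
process_operands (int argc, char **argv, int first,
                  bool (*per_file) (const char *, bool))
{
  if (first == argc)
    return per_file ("-", false);   /* no operands: stdin, no name */

  bool ok = true;
  for (int i = first; i < argc; i++)
    ok &= per_file (argv[i], true);  /* keep going after a failure */
  return ok;
}
```

Using ok &= rather than returning early means one unreadable file does not stop the remaining files from being processed; it only makes the final exit status non-zero.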

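The key difference between cksum and sum is the checksum algorithm. CRC-32 is more robust at detecting file changes: it catches every single-bit error and every burst error of 32 bits or less, and unlike a purely additive sum it is sensitive to byte order.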
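To make the robustness point concrete, the sketch below contrasts a simplified System V style sum (the algorithm selected by `sum -s`) with a bit-at-a-time CRC-32 using the POSIX polynomial. The function names are hypothetical, and this CRC omits the length bytes that cksum appends.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified System V style sum: add up the bytes, then fold the
   32-bit total into 16 bits. Byte order does not affect the result. */
static unsigned
sysv_sum (const unsigned char *p, size_t n)
{
  uint32_t s = 0;
  for (size_t i = 0; i < n; i++)
    s += p[i];
  uint32_t r = (s & 0xFFFF) + (s >> 16);
  return (r & 0xFFFF) + (r >> 16);
}

/* Bit-at-a-time CRC-32 with the POSIX polynomial: MSB first, zero
   initial value, final complement, no length bytes appended. */
static uint32_t
crc32_bitwise (const unsigned char *p, size_t n)
{
  uint32_t crc = 0;
  for (size_t i = 0; i < n; i++)
    {
      crc ^= (uint32_t) p[i] << 24;   /* bring the next byte into the top */
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
  return ~crc;
}
```

For the bytes "ab" and "ba", sysv_sum() returns 195 both times, while the two CRCs differ. For the nine bytes 123456789, crc32_bitwise() returns 0x765E7680, the published check value for this CRC variant (CRC-32/CKSUM), which leaves out cksum's length bytes.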

CRC-32 checksum

Here is the underlying code, slightly condensed for space (original source)

while ((bytes_read = fread (buf, 1, BUFLEN, fp)) > 0)
{
  unsigned char *cp = buf;

  length += bytes_read;
  
  while (bytes_read--)
    crc = (crc << 8) ^ crctab[((crc >> 24) ^ *cp++) & 0xFF];
}

for (; length; length >>= 8)
    crc = (crc << 8) ^ crctab[((crc >> 24) ^ length) & 0xFF];

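crc = ~crc & 0xFFFFFFFF;

  • Read up to BUFLEN bytes (64 KiB) from the input into buf
  • Point cp to the beginning of the buffer and add the bytes read to the running length
  • For each byte in the buffer:
    • Shift crc left one byte and XOR it with the 32-bit crctab entry indexed by the XOR of the byte shifted out of the top of crc and the next input byte
    • See Williams (section 10) for a very detailed description of this operation
  • After the last read, feed the file length into the CRC one byte at a time, least significant byte first, using only as many bytes as the length needs (POSIX requires the length to be part of the checksummed message)
  • Complement the bits and mask to 32 bits to produce the final CRC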
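Putting these steps together, here is a self-contained sketch that computes the cksum CRC of an in-memory buffer. It builds the table at startup with a bit-by-bit loop instead of using the static POSIX table, works on a whole buffer rather than streaming through fread(), and init_crctab() and cksum_buffer() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define CKSUM_POLY 0x04C11DB7u   /* POSIX generator polynomial, x^32 implicit */

static uint32_t crctab[256];

/* Fill crctab[i] = (i(x) * x^32) mod G(x), computed bit by bit. */
static void
init_crctab (void)
{
  for (uint32_t i = 0; i < 256; i++)
    {
      uint32_t c = i << 24;
      for (int bit = 0; bit < 8; bit++)
        c = (c & 0x80000000u) ? (c << 1) ^ CKSUM_POLY : c << 1;
      crctab[i] = c;
    }
}

/* cksum-style CRC of an in-memory buffer. */
static uint32_t
cksum_buffer (const unsigned char *buf, size_t n)
{
  uint32_t crc = 0;
  uintmax_t length = n;

  /* Main loop: one table lookup per input byte. */
  for (size_t i = 0; i < n; i++)
    crc = (crc << 8) ^ crctab[((crc >> 24) ^ buf[i]) & 0xFF];

  /* Append the length, least significant byte first, using only as
     many bytes as needed (none for an empty buffer). */
  for (; length; length >>= 8)
    crc = (crc << 8) ^ crctab[((crc >> 24) ^ length) & 0xFF];

  return ~crc;   /* ones' complement gives the final CRC */
}
```

After init_crctab(), cksum_buffer() returns 930766865 for the nine bytes 123456789 (the same CRC that `printf 123456789 | cksum` prints alongside the byte count 9) and 4294967295 for an empty buffer.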

Failure cases:

  • Unable to open, read, or close an input file
  • Input file is too long (its byte count would overflow the length counter)