Decoded: md5sum (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [No POSIX requirement] [Linux man] [No FreeBSD requirement]

Logical flow of md5sum command (coreutils)

Summary

md5sum - print or check md5 digests

[Source] [Code Walkthrough]

Lines of code: 1111
Principal syscall: None
Support syscalls: open(), close(), fadvise()
Options: 18 (5 short, 13 long)

This utility originated with GNU as secure hash usage grew in the 1990s
Added to Textutils in June 1995 [First version]
Number of revisions: 219

The md5sum utility handles user requirements and organizes input files before calling into gnulib to perform the actual md5 computation as specified in RFC 1321. Note that the same source wraps several other algorithms (BLAKE2, SHA-1, SHA-256, SHA-384 and SHA-512), although we're only concerned with md5 here.

Helpers:
  • bsd_split_3() - Splits an input into two parts (digest and escaped file name)
  • digest_check() - Opens a list of files/digests and verifies matches
  • digest_file() - Prepares an input file for digest computation via gnulib
  • filename_unescape() - Converts escaped file names to actual characters (sed: s/\\n/\n/g;s/\\\\/\\/g)
  • hex_digits() - Tests that an input string is a valid hex sequence
  • print_filename() - Prints a file name either escaped or unescaped
  • split_3() - Splits an input string into the three md5 components (hash, format, file name)
External non-standard helpers:
  • die() - Exit with mandatory non-zero error and message to stderr
  • DIGEST_STREAM() - Function-like macro for the selected algorithm (in this case md5_stream() in gnulib)
  • error() - Outputs error message to standard error with possible process termination
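
The two smallest helpers can be illustrated with a minimal sketch. This is not the coreutils source: the names sketch_hex_digits() and sketch_filename_unescape() are hypothetical. The code assumes that only the two escapes shown in the sed expression above (\n and \\) are recognized and that anything else is rejected.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for hex_digits(): true if s is exactly n
   hexadecimal digits followed by the terminating NUL. */
static bool
sketch_hex_digits (const char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!isxdigit ((unsigned char) s[i]))
      return false;          /* also stops at an early NUL */
  return s[n] == '\0';
}

/* Hypothetical stand-in for filename_unescape(): rewrite s in place,
   turning "\n" (backslash, n) into a newline and "\\" into a single
   backslash.  Returns s, or NULL for any other escape (assumption). */
static char *
sketch_filename_unescape (char *s)
{
  char *dst = s;
  for (const char *src = s; *src != '\0'; src++)
    {
      if (*src != '\\')
        {
          *dst++ = *src;     /* ordinary character, copy as-is */
          continue;
        }
      src++;                 /* inspect the character after the backslash */
      if (*src == 'n')
        *dst++ = '\n';
      else if (*src == '\\')
        *dst++ = '\\';
      else
        return NULL;         /* unknown escape or trailing backslash */
    }
  *dst = '\0';
  return s;
}
```

Unescaping in place works because the output is never longer than the input, so no extra allocation is needed.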

Setup

md5sum uses several flags and variables as globals, including:

  • bsd_reversed - Flag set if we're using BSD reverse checksums
  • digest_hex_bytes - The length of the output digest in hexadecimal characters (md5 is 16 bytes, printed as 32 hex characters)
  • have_read_stdin - Flag set if the utility has read input from STDIN
  • ignore_missing - Flag to continue processing on missing files (--ignore-missing)
  • min_digest_line_length - The minimum length of a valid checksum
  • quiet - Flag to prevent display of normal feedback
  • status_only - Flag to prevent display of normal output (--status)
  • strict - Flag to force abort on any abnormalities (--strict)
  • warn - Flag to display warnings

main() introduces a few local variables:

  • *bin_buffer - A memory-aligned pointer to the data buffer
  • bin_buffer_unaligned - The data buffer holding the digest value of the current file
  • binary - Flag set if the file should be read in binary mode (-b)
  • do_check - Flag set if we're verifying a hash list (-c)
  • opt - The character for the next option to process
  • ok - The final return status
  • prefix_tag - Flag set for using BSD style checksums (--tag)
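
The relationship between the unaligned buffer and the aligned pointer can be sketched as follows. The constants and the function sketch_ptr_align() are hypothetical, and the 4-byte alignment is an assumption chosen for illustration. The idea is to over-allocate by the alignment and round the start address up.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration: md5 produces 16 bytes; the
   alignment of 4 is an assumption, not taken from the source. */
enum { SKETCH_DIGEST_BIN_BYTES = 16, SKETCH_DIGEST_ALIGN = 4 };

/* Return the first address at or after p that is a multiple of align.
   At most align - 1 bytes are skipped. */
static unsigned char *
sketch_ptr_align (unsigned char *p, size_t align)
{
  uintptr_t addr = (uintptr_t) p;
  size_t pad = (align - addr % align) % align;
  return p + pad;
}
```

A caller would declare unsigned char raw[SKETCH_DIGEST_BIN_BYTES + SKETCH_DIGEST_ALIGN] and pass raw to sketch_ptr_align(); the extra SKETCH_DIGEST_ALIGN bytes guarantee that a full digest still fits after the padding.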

Parsing

Parsing answers the following questions to define the execution parameters:

  • Are we dealing with text or binary files?
  • Are we checking md5 checksums or producing them?
  • Should display output be limited in any way?
  • Should checksum mismatches reflect in the exit status?
  • Should output be newline or NUL terminated?

Parsing failures

These failure cases are explicitly checked:

  • Combining --tag and --text modes
  • Verifying checksums while using the --zero, --tag, --binary, or --text options
  • Using --status, --warn, or --strict while producing checksums
  • Unknown option used

User-specified parsing failures result in a short error message followed by the usage instructions. Access-related parsing errors die with an error message.
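
The option conflicts listed above could be checked along these lines once all options have been read. This is a rough sketch: the struct, its field names, and the message wording are hypothetical and do not reproduce the coreutils text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of which options the user supplied. */
struct sketch_opts
{
  bool do_check;      /* -c / --check */
  bool binary_given;  /* --binary */
  bool text_given;    /* --text */
  bool prefix_tag;    /* --tag */
  bool zero;          /* --zero */
  bool status_only;   /* --status */
  bool warn;          /* --warn */
  bool strict;        /* --strict */
};

/* Return NULL if the combination is acceptable, otherwise an
   illustrative (paraphrased) message describing the conflict. */
static const char *
sketch_validate (const struct sketch_opts *o)
{
  if (o->prefix_tag && o->text_given)
    return "--tag cannot be combined with --text";
  if (o->do_check
      && (o->zero || o->prefix_tag || o->binary_given || o->text_given))
    return "option not applicable when verifying checksums";
  if (!o->do_check && (o->status_only || o->warn || o->strict))
    return "option only applicable when verifying checksums";
  return NULL;
}
```

On a non-NULL result, a caller would print the message and then the usage instructions, matching the behavior described for user-specified parsing failures.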


Execution

md5sum has two execution strategies: one generates md5 digests, and the other verifies a list of files and hashes. Both rely on gnulib to compute the digest inside the digest_file() function. The processes look like this:

Generate md5 digests:

  • Retrieve the next file name
  • Open file (or STDIN)
  • Call into gnulib via DIGEST_STREAM(); the resulting md5 digest is stored in bin_result
  • Verify no error occurred
  • Close the file
  • Repeat above while there are still files to process
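
The per-file steps above can be sketched as a single function that takes the stream digester as a parameter. All sketch_ names are hypothetical. The function-pointer type mirrors the shape of gnulib's md5_stream(), which takes a FILE * and a result block and returns 0 on success. To keep the example self-contained, sketch_toy_stream() is a toy stand-in that is NOT md5.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same shape as gnulib's md5_stream(): 0 on success, nonzero on error. */
typedef int (*sketch_stream_fn) (FILE *stream, void *resblock);

/* Toy stand-in, NOT md5: XOR-folds the input into a 16-byte block. */
static int
sketch_toy_stream (FILE *stream, void *resblock)
{
  unsigned char *res = resblock;
  size_t i = 0;
  int c;
  memset (res, 0, 16);
  while ((c = getc (stream)) != EOF)
    res[i++ % 16] ^= (unsigned char) c;
  return ferror (stream) ? 1 : 0;
}

/* Open the file (or use stdin for "-"), run the digester, verify that
   no error occurred, and close the file.  Returns true on success. */
static bool
sketch_digest_file (const char *filename, unsigned char *bin_result,
                    sketch_stream_fn digest_stream)
{
  bool is_stdin = strcmp (filename, "-") == 0;
  FILE *fp = is_stdin ? stdin : fopen (filename, "rb");
  if (fp == NULL)
    {
      perror (filename);
      return false;
    }

  int err = digest_stream (fp, bin_result);
  if (err)
    fprintf (stderr, "%s: read error\n", filename);

  if (!is_stdin && fclose (fp) != 0)
    {
      perror (filename);
      return false;
    }
  return err == 0;
}

/* Write n digest bytes as 2 * n lowercase hex characters plus a NUL. */
static void
sketch_hex_encode (const unsigned char *bin, size_t n, char *out)
{
  static const char digits[] = "0123456789abcdef";
  for (size_t i = 0; i < n; i++)
    {
      out[2 * i] = digits[bin[i] >> 4];
      out[2 * i + 1] = digits[bin[i] & 0x0f];
    }
  out[2 * n] = '\0';
}
```

A driver would call sketch_digest_file() once per file name, hex-encode the result, and print it alongside the name, accumulating a failure flag for the exit status.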

Check md5 digests:

  • Retrieve the list of files/digests to check
  • Open the list file
  • For each file entry:
    • Split the file entry into the 3 expected components (file name, digest, access mode)
    • Pass the file to the digest generator (above process) to generate a comparison
    • Check for failures
    • Compare the generated digest with the provided digest
    • Repeat process for all files
  • Report results
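
The split and compare steps of the check process can be illustrated with a simplified sketch. It assumes only the default line layout: 32 hex digits, a space, a mode character (' ' for text, '*' for binary), then the file name. Escaped names and BSD-style lines are not handled, and the sketch_ names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { SKETCH_HEX_LEN = 32 };   /* md5: 16 bytes, 32 hex characters */

/* Split line in place into digest, mode flag, and file name.
   Returns false if the line does not have the assumed layout. */
static bool
sketch_split_3 (char *line, char **digest, bool *binary, char **file_name)
{
  /* digest + separator + mode character + at least one name character */
  if (strlen (line) < SKETCH_HEX_LEN + 2 + 1)
    return false;
  for (size_t i = 0; i < SKETCH_HEX_LEN; i++)
    if (!isxdigit ((unsigned char) line[i]))
      return false;
  if (line[SKETCH_HEX_LEN] != ' ')
    return false;
  char mode = line[SKETCH_HEX_LEN + 1];
  if (mode != ' ' && mode != '*')
    return false;

  line[SKETCH_HEX_LEN] = '\0';          /* terminate the digest */
  *digest = line;
  *binary = (mode == '*');
  *file_name = line + SKETCH_HEX_LEN + 2;
  return true;
}

/* Compare an expected hex digest (either case) with n computed bytes. */
static bool
sketch_digest_matches (const char *hex, const unsigned char *bin, size_t n)
{
  static const char digits[] = "0123456789abcdef";
  for (size_t i = 0; i < n; i++)
    if (tolower ((unsigned char) hex[2 * i]) != digits[bin[i] >> 4]
        || tolower ((unsigned char) hex[2 * i + 1]) != digits[bin[i] & 0x0f])
      return false;
  return hex[2 * n] == '\0';
}
```

Comparing against the binary digest one nibble at a time avoids building a second hex string for every entry in the list.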

Failure cases:

  • Unable to open or close input files
  • User-provided checksum format mismatches
  • Unable to read from input source
  • md5 digest generator in gnulib fails for any reason
  • Check list could not be parsed
  • Check list checksums do not match

All failures at this stage output an error message to STDERR and return without displaying usage help.

