Decoded: tsort (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of tsort command (coreutils)

Summary

tsort - topological sort

[Source] [Code Walkthrough]

Lines of code: 574
Principal syscalls: write() via puts()
Support syscalls: open(), close(), fadvise()
Options: 2 (--help and --version)

Descended from tsort introduced in Version 7 UNIX (1979)
Added to Textutils in January 1999 [First version]
Number of revisions: 93 [Code Evolution]

tsort is an ancient utility with hand-crafted procedures using custom data structures. It hasn't seen a functional change in a decade(s?) and thus uses virtually no outside assistance from gnulib.

Helpers:
  • count_items() - Tree function to Count the number of strings
  • detect_loops() - Checks for cycles in successor relationships
  • new_item() - Creates a new item structure
  • record_relation() - Creates a new relationship between items
  • recurse_tree() - Performs a given recursive operation on item sub-tree
  • scan_zeros() - Tree function to add an item to global zeros list
  • search_item() - Binary tree search for a string in a tree of items
  • tsort() - Performs a topological sort on a single input file
  • walk_tree() - Moves to a tree node (right node)
External non-standard helpers:
  • die() - Exit with mandatory non-zero error and message to stderr
  • error() - Outputs error message to standard error with possible process termination

Setup

tsort uses two data structures to organize input and build the output:

  • struct item - A node in a binary search tree with parameters including value, left and right child nodes, balance parameter, and a count of successors
  • struct successor - A list of item node relationships

The utility also uses several global pointers to retain state:

  • *head - The beginning of the output list
  • *loop - The loop test point
  • n_strings - The number of strings to sort
  • *zeros - The end of a list of nodes with no ancestors (start node candidates)

The important local variables to know are actually declared in tsort() rather than main():

  • is_stdin - Flag if we're reading input from STDIN
  • *j - one node in a relation
  • *k - The other node in a relation
  • ok - The return status of the utility
  • *root - The top of our node tree
  • tokenbuffer - A buffer to hold input relations

Parsing

Parsing tsort is different from most utilities in that there are only the default options to consider (help and version).

The utility does parse relationships input from either a file or STDIN, but I consider those part of the execution phase

Using any other option or providing no input data results in a short error message followed by the usage instructions.


Execution

The general strategy constructs a tree from the input data and repeatdly walks the tree collecting data and making decisions. This is much like the early stages of a compiler working on an abstract syntax tree.

The process looks like this:

  • Initialize a tokenbuffer to for tokenizing input data
  • Read each line of input and construct an item tree node
  • Walk the tree to count the number of strings (the amount of work to do)
  • Walk the tree again to find unconstrained nodes (no ancestors). The first found will be the head of the tree and candidate to output
  • Print the head node, remove it, update tree structure, and repeat previous step tree walk
  • Detect cycles -- any remaining nodes in the tree are part of a cycle, find it and print a solution to break the cycle
  • Close the input file

Failure cases:

  • Unable to open or close input stream
  • Dangling constraints / mismatched relations
  • Loop detected in node relationships (no topological ordering is possible)

Failures at this stage output an error message to STDERR unless quiet mode was enabled


[Back to Project Main Page]