Decoded: tsort (coreutils) – MaiZure's Projects

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of tsort command (coreutils)

Summary

tsort - topological sort

[Source] [Code Walkthrough]

Lines of code: 574
Principal syscalls: write() via puts()
Support syscalls: open(), close(), fadvise()
Options: 2 (--help and --version)

Descended from tsort introduced in Version 7 UNIX (1979)
Added to Textutils in January 1999 [First version]
Number of revisions: 93 [Code Evolution]

tsort is an ancient utility with hand-crafted procedures using custom data structures. It hasn't seen a functional change in a decade(s?) and thus uses virtually no outside assistance from gnulib.

Helpers:

count_items() - Tree function to Count the number of strings
detect_loops() - Checks for cycles in successor relationships
new_item() - Creates a new item structure
record_relation() - Creates a new relationship between items
recurse_tree() - Performs a given recursive operation on item sub-tree
scan_zeros() - Tree function to add an item to global zeros list
search_item() - Binary tree search for a string in a tree of items
tsort() - Performs a topological sort on a single input file
walk_tree() - Moves to a tree node (right node)

External non-standard helpers:

die() - Exit with mandatory non-zero error and message to stderr
error() - Outputs error message to standard error with possible process termination

Setup

tsort uses two data structures to organize input and build the output:

struct item - A node in a binary search tree with parameters including value, left and right child nodes, balance parameter, and a count of successors
struct successor - A list of item node relationships

The utility also uses several global pointers to retain state:

*head - The beginning of the output list
*loop - The loop test point
n_strings - The number of strings to sort
*zeros - The end of a list of nodes with no ancestors (start node candidates)

The important local variables to know are actually declared in tsort() rather than main():

is_stdin - Flag if we're reading input from STDIN
*j - one node in a relation
*k - The other node in a relation
ok - The return status of the utility
*root - The top of our node tree
tokenbuffer - A buffer to hold input relations

Parsing

Parsing tsort is different from most utilities in that there are only the default options to consider (help and version).

The utility does parse relationships input from either a file or STDIN, but I consider those part of the execution phase

Using any other option or providing no input data results in a short error message followed by the usage instructions.

Execution

The general strategy constructs a tree from the input data and repeatdly walks the tree collecting data and making decisions. This is much like the early stages of a compiler working on an abstract syntax tree.

The process looks like this:

Initialize a tokenbuffer to for tokenizing input data
Read each line of input and construct an item tree node
Walk the tree to count the number of strings (the amount of work to do)
Walk the tree again to find unconstrained nodes (no ancestors). The first found will be the head of the tree and candidate to output
Print the head node, remove it, update tree structure, and repeat previous step tree walk
Detect cycles -- any remaining nodes in the tree are part of a cycle, find it and print a solution to break the cycle
Close the input file

Failure cases:

Unable to open or close input stream
Dangling constraints / mismatched relations
Loop detected in node relationships (no topological ordering is possible)

Failures at this stage output an error message to STDERR unless quiet mode was enabled

[Back to Project Main Page]