Decoded: ls (coreutils)

[Back to Project Main Page]

Note: This page explores the design of command-line utilities. It is not a user guide.
[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]

Logical flow of ls command (coreutils)

Summary

ls - list directory contents

[Source] [Code Walkthrough]

Lines of code: 5309
Principal syscalls: opendir(), readdir()
Support syscall: closedir(), fstat()
Options: 83 (40 short, 43 long)

Spiritually linked with LISTF from CTSS (1963) -- designed in 1962
Added to Shellutils in October 1992 [First version]
Number of revisions: 653

The ls command is the most complex (read: over-engineered) utility in coreutils. Rich features, such as format control, color support, pattern filtering, and sorting intertwine with signal handlers, hash tables, obstacks, interface modes, caching, and access control to deliver a versatile tool.


Setup

The first 1000 lines of ls defines the globals, structs, and enums used throughout the code. Some key players are:

Structs:

  • struct bin_str - Holds data for a color indicator.
  • struct color_ext_type - An extended color style definition
  • struct column_info - Metadata for columns
  • struct fileinfo - Holds data for a single file entry.
  • struct ignore_pattern - The first of a list of file name matches to ignore
  • struct pending - The first of a list of pending directories to process

Enums:

  • enum acl_type - The 3 ACL rules
  • enum Dereference_symlink - The four strategies to handle symlinks
  • enum filetype - The 10 types of files
  • enum format - Five types of format rules
  • enum ignore_mode - The three ways to ignore special files
  • enum indicator_style - The four color indicator styles
  • enum sort_type - The seven sorting categories
  • enum time_style - Four timestamp styles
  • enum time_types - Three time types (modified, create, access)

Entry

ls has three entry points depending if the user invokes via ls, dir, or vdir. Each of these files sets ls_mode to value used later to manage default options

In all cases, ls:

  • Assumes we always print the directory name
  • Clears all pending directories
  • Gets the current time for upcoming consistency checks
  • Initializes flags to a default value
  • Sets a default quoting style
  • Pulls column and tab size from the environment

Parsing

The largest helper function is decode_switches() at 579 lines. The function goes through each of the command line arguments to set flags used later during execution. Parsing switches answers these general questions:

  • What is the overall output format?
  • What are the specific formats? (time, inode, blocks, etc)
  • Are colors used?
  • How is the output sorted and ordered?
  • Is the search recursive?
  • Does the search follow links?

Switch decoding finishes by checking for consistency among options. Note that many options simply override others rather than failing (i.e. -l overrides -1).

Parsing failures

These failure cases are explicitly checked:

  • An invalid line or tab size
  • Invalid time style format

Extra Comments

Colors

Color sequences generally follow ANSI in-band escape formats, which include a LEFT, CODING, and RIGHT sequence to trigger output colors changes in a supported terminal. The LS_COLORS environment variable defines a set of color codes for each file type. Just in case LS_COLORS isn't defined, this ls implementation also defines a set. Colors are changed during the table printing functions, usually as a call to set_normal_color(), restore_default_color(), or directly with put_indicator().

During initialization, ls parses the LS_COLORS variable. A typical entry looks like di=1:, which says that directories will be bolded. This label has four components: A label, an '=', an encoding, and a separator. parse_ls_color() retrieves the string of color encodings and uses a state machine to read and handle an entry a character at a time:

parse_ls_color state machine

State Action Next State
PS_START Check for the first label letter, separator, or * for extentions PS_2, PS_4, PS_FAIL
PS_2 Get the second character of a label PS_3
PS_3 Check '=' and process subsequent encoding PS_START, PS_FAIL
PS_4 Check '=' after extension and process encoding PS_START, PS_FAIL
PS_DONE Parse succeeded Terminal State
PS_FAIL Something went wrong Terminal state

Execution

After parsing, the next step sets up the colors via the environment variables LS_COLORS or COLORTERM in that order.

We then set search behaviors. This means finalizing symlink handling and preparing for a recusrive search (if used). Some smaller details include current time zone, sort format, dired mode stacks, and hyperlinks.

The final step before processing is to set up the table data structure that we populate during file system traversal. The involves allocating the table and ensuring the sort order vector is clear. The initial directory (either user specified or cwd) is added to the processing queue and we're ready to go.

Processing directories

If only a single file is specified, that file is added to the file table and printed with print_current_files(). However, the typical case is that an entire directory is queued. This target directory (and eventually children) live in the pending_dirs list. These list members are checked for loops and then processed one at a time via print_dir(). Preparing directory output looks like this:

  • Attempt to open directory with the opendir() syscall
  • Check if the directory has been visited to avoid cycles
  • Clear the file table to prep for the new directory
  • Print the directory name, if appropriate
  • Repeatedly make readdir() syscall until all files processed
    • Check that file type isn't ignored
    • Add the file to the table (more below)
    • Handle any signals that may have arrived
  • Close the current directory with closedir() syscall
  • Sort the file table
  • Revisit the table entries and add directories to the pending list
  • Print the output of the file table (the cwd)

Printing output

Getting the files listed on the screen is the final step. This is more complicated than a call to printf() due to the variety possible output formats including a list of columns, a single list, many wrapped horizonal entries, and more. The most significant printing function is print_long_format(). In general, we walk file table and print the entries, which are already processed during add (gobble_file() - See below for details). The most in-depth printing call sequence worth exploring is:

  • print_current_files()
    • print_long_format()
      • print_name_with_quoting()
        • quote_name() - ends with fwrite() to STDOUT
          • print_color_indicator()
            • put_indicator() - The fwrite() here comes first

There are several ways that file processing could fail:

Failure cases:

  • Unable to open directory (doesn't exist, no permissions, etc)
  • Unable to stat file (doesn't exist, no permissions, bad path, etc)
  • Improper hyperlink formatting
  • Error reading directory
  • Unable to close directory
  • Unable to read a followed symlink

Extra comments

Adding a file to the file table

A file is added with a call to gobble_file(), the 2nd largest helper function in this utility at 332 lines. While printing all the files simply walks the table, outputting all of the lines, most of the work in ls happens as each file is collected and stored. To add a file we perform a subset of the following operations, depending on format required:

  • Ensure that the file table is large enough
  • Set the file reference in the file table
  • Verify quoting requirements
  • Construct the full name of the file
  • stat or lstat (if symlink) the file
  • Check capabilities
  • Check security context and access list
  • Construct the link name (if link)
  • Get widths: owner, group, author, security context, major/minor device, size, and inode
  • Return the total size in blocks

Helpers

The complexity of ls naturally leads to a very long list of helper functions

  • abformat_init() - Initialize the month abbreviation format
  • abmon_init() - Initialize the month abbreviation structure
  • add_ignore_pattern() - Adds a file pattern to the ignore list
  • align_nstrftime() - Returns the size of the time format
  • attach() - Constructs an absolute file name from path and name
  • basename_is_dot_or_dotdot() - Checks if the input name is . or ..
  • calculate_columns() - Returns a column count based on screen size
  • clear_files() - Remove all files from the sorted array
  • cmp_atime() - Compare function for access times for two files
  • cmp_ctime() - Compare function for creation times for two files
  • cmp_extension() - Compare function for two file extensions
  • cmp_mtime() - Compare function for modified times for two files
  • cmp_name() - Compare function for names for two files
  • cmp_size() - Compare function for sizes for two files
  • cmp_version() - Compare function for version for two files
  • decode_switches() - Checks all provided switches and sets flags/globals
  • dev_ino_compare() - Checks if table entries match
  • dev_ino_free() - Frees a table entry
  • dev_ino_hash() - Computes the table position based on inode number
  • dev_ino_pop() - Pop dev_ino from dedicated stack
  • dev_ino_push() - Push dev_ino on to dedicated stack
  • dired_dump_obstack() - Outputs a prefix followed by obstack values
  • errno_unsupported() - Tests if an error value implies unsupported function usage
  • extract_dirs_from_files() - Removed directories from a file listing
  • file_escape() - Enforces RFC3986 contraints on strings
  • file_escape_init() - Init the RFC3986 character map (letters/numbers/some symbols)
  • file_failure() - Outputs a file access failure message
  • file_has_acl_cache() - Caches results of recent file access control attempts
  • file_ignored() - Determines if a file should be ignored
  • first_percent_b() - Finds the first %b time format (abbreviated month)
  • format_group() - Wrapper for printing the group format
  • format_group_width() - Wrapper to get the width of the group format
  • format_inode() - Formats the inode number for display
  • format_user() - Wrapper for printing the user format
  • format_user_or_group() - Handles actual printing of user/group formats
  • format_user_or_group_width() - Returns the width of user or group formats
  • format_user_width() - Wrapper to get the user format width
  • free_ent() - Frees file info for a directory entry
  • free_pending_ent() - Frees a pending directory entry
  • get_color_indicator() - Returns a color string based on file type
  • get_funky_string() - Parses an LS_COLORS string and returns validity
  • get_link_name() - Puts the name of a linked file in to a buffer
  • get_type_indicator() - Gets the file type indicator (for -F)
  • getenv_quoting_style() - Checks environment for quoting style then sets for ls
  • getfilecon_cache() - Checks getfilecon failure cache and returns attribute context size
  • gobble_file() - Adds a file to the file table
  • has_capability() - Checks if the file has capabilities
  • has_capability_cache() - Caches the capability check failure result
  • indent() - Adds indents, using tabs when possible
  • init_column_info() - Allocates memory for files and columns data
  • initialize_ordering_vector() - Initialize sorted_file[] to counting order
  • is_colored() - Checks if the current active color matches the input
  • is_directory() - Checks if a file entry is a directory
  • known_term_type() - Checks if the TERM env is known
  • length_of_file_name_and_frills() - Returns the total length of file and and extras
  • long_time_expected_width() - Returns the number of columns in a timestamp
  • make_link_name() - Returns the path prepended to an input name
  • needs_quoting() - Cehcks if a file name needs quoting
  • parse_ls_color() - Initializes the configured color scheme
  • patterns_match() - Checks a file against all patterns for a match
  • prep_non_filename_text() - Outputs color indicates as needed
  • print_color_indicator() - Attempts to output color indicates and returns success
  • print_current_files() - Display all files in the current file table
  • print_dir() - Output a list of files in a directory
  • print_file_name_and_frills() - Prints file name and extras (inode, block size, etc)
  • print_horizontal() - Output data transposed (for -x)
  • print_long_format() - Outputs extra information per entry (for -l)
  • print_many_per_line() - Outputs multiple entries per row (for -C)
  • print_name_with_quoting() - Outputs a name with quotes
  • print_type_indicator() - Outputs the file type indicator (for -F)
  • print_with_separator() - Outputs filename followed by a separator and space
  • process_signals() - Dispatches waiting signals, if any
  • put_indicator() - Outputs a color indicator
  • queue_directory() - Adds a directory to the pending queue
  • quote_name() - Adds a quote to a name and returns size used (may include a pad)
  • quote_name_buf() - Stores a quoted name in an input buffer
  • quote_name_width() - Returns the size of a quoted name
  • restore_default_color() - Removes coloring scheme
  • set_exit_status() - Updates exit status to reflect other than complete success
  • set_line_length() - Sets the width of a line (for -w)
  • set_normal_color() - Puts indicator sequence for normal (nonfile) text colors
  • sighandler() - Handler for all caught signals except SIGTSTP
  • signal_init() - Initialize signal handlers - wrapper for signal_setup()
  • signal_restore() - Restores default signal handlers - wrapper for signal_setup()
  • signal_setup() - Initialize/Restore signal handlers
  • sort_files() - Sorts the files in the file table using the sort_type
  • stophandler() - Handler for SIGTSTP
  • unsigned_file_size() - Converts a negative size to positive (assume signed wrap)
  • visit_dir() - Active the directory and warn if already active
  • xstrcoll() - Extended strcoll(), which checks errno

[Back to Project Main Page]