How to list directories in C recursively (UNIX)

In UNIX systems (Mac, Linux, WSL), it’s easier to list directories with the C programming language due to how the file system is structured. A directory is a file that contains information about its contents and they also have an absolute path to distinguish them.

This task is done with low level syscalls such as open and close, but we’re going to use a header called dirent.h, which contains convenience functions and structs to make this process easier.

If you just want the code right away I saved it this Gist but if you’re reading this to actually learn how it’s done, then stick with me.

Table of Contents

Preparing the environment

We really don’t need much for this, no need for any fancy build tool or CMakeLists.txt file. First let’s start by creating a C file in any directory; for testing purposes, preferably in a directory that also contains subdirectories and many other files so we can see the program in action right away.

Let’s imagine our current working directory looks like this (I ran tree -i -f on my terminal, if you don’t have it can to install it):

.
./file1.txt
./file2.txt
./subdir1
./subdir1/report.csv
./subdir1/subsubdir1
./subdir1/subsubdir1/logo.png
./subdir1/subsubdir1/photo.jpg
./subdir2
./subdir2/one.txt
./subdir2/two.txt
./tree.c
./tree-tutorial

3 directories, 10 files

Notice ./tree-tutorial, that’s the name we’ll be giving the executable once compiled, and running it will yield this output:

.
|- tree.c
|- tree-tutorial
|- ./subdir1
|  |- report.csv
|  |- ./subdir1/subsubdir1
|  |  |- logo.png
|  |  |- photo.jpg
|- file1.txt
|- file2.txt
|- ./subdir2
|  |- two.txt
|  |- one.txt

Very similar to what we’d get with the tree command utility. The goal is indeed to build a very minimalistic version of that. But first, let’s start small by creating a tree.c file and filling it with some includes and declarations:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define CURRENT_DIR "."
#define PARENT_DIR ".."

int main(void) {
  return 0;
}

The headers that need to be included are:

  1. dirent.h: Contains functions for opening, reading, and closing directories; it also contains the DIR stream object and the dirent structure that contains information about each entry in a directory (we’re only concerned with their type and name).
  2. stdio.h: For printing output to the screen and snprintf, with which we’ll assemble the full path of subdirectories without the risk of buffer overflows.
  3. string.h: From here we just need strcmp to compare strings, and strlen to retrieve their length.

Listing directory entries

Starting small, here’s the simplest code possible to print a list (similar to ls but with each entry in its own line) of all the entries in the current directory where the executable is being ran.

// ... #includes and #defines

int main(void) {
  DIR *directory;
  if ((directory = opendir(CURRENT_DIR)) == NULL) {
    fprintf(stderr, "error opening %s\n", CURRENT_DIR);
    return 1;
  }

  const struct dirent *entry;
  while ((entry = readdir(directory)) != NULL) {
    puts(entry->d_name);
  }

  if (closedir(directory) == -1) {
    fprintf(stderr, "listdir: error closing %s\n", CURRENT_DIR);
    return 2;
  }
  
  return 0;
}

The first 5 lines inside the main function open the current directory (through opendir passing it the name of the directory) and creates a pointer to the stream object of type DIR in a variable called directory. This variable is used later to read from the directory and lastly, to close it.

Just like how we can read file streams line by line until EOF is returned, reading the entries of a directory stream is done one by one with readdir until NULL is returned. Each call to readdir (it accepts the DIR pointer as its only argument) is supposed to return a pointer to the next entry in the directory, which is then assigned to the entry variable (declared as a pointer to a struct dirent, aka, directory entry). Inside the loop we print each entry by accessing the d_name member and printing it in its own line with puts (same as printf("%s\n", char *)).

Both opendir and closedir can fail, that’s why we handle such cases by checking their respective return value (NULL for opendir, and -1 for closedir).

Running the first iteration

Our goal hasn’t yet been reached but we have something to work with now. Let’s compile the tree.c file with your compiler of choice (could be cc, gcc, clang or something else). Mine is gcc so let’s do gcc -o tree-tutorial tree.c; this will produce a compiled executable file called tree-tutorial that we can run by typing ./tree-tutorial into the terminal prompt.

Running the code we have now will yield the following output:

tree.c
subdir1
.
tree2.c
..
file1.txt
tree-tutorial
file2.txt
subdir2

As you can see, it prints entries in no particular order. It also prints everything just the same regardless of type (symbolic link, directory, binary file, etc). Notice how there are two very special entries: . (current directory), and .. (parent directory).

Making it recursive

If we wish to make a recursive version that walks inside every subdirectory, ignoring the current and parent directories becomes very important. Another thing we have to look out for is absolute directory paths; if we call opendir with "subsubdir1", it will not work because there is no such directory in the current working directory (where the executable is being ran)… it clearly needs to be called with "./subdir/subsubdir1" instead.

Here’s how we refactor tree.c to allow for recursive directory walking:

// .. #includes and #defines
#define MAX_PATH 1024

void listdir(char *dirname);

int main(void) {
  listdir(CURRENT_DIR);

  return 0;
}

void listdir(char *dirname) {
  DIR *directory;
  if ((directory = opendir(dirname)) == NULL) {
    fprintf(stderr, "listdir: error opening %s\n", dirname);
    return;
  }

  const struct dirent *entry;
  while ((entry = readdir(directory)) != NULL) {
    if (strcmp(entry->d_name, CURRENT_DIR) == 0 ||
        strcmp(entry->d_name, PARENT_DIR) == 0)
      continue;

    if (entry->d_type == DT_DIR) {
      char path[MAX_PATH];

      snprintf(path, MAX_PATH, "%s/%s", dirname, entry->d_name);
      printf("%s [DIR]\n", path);
      listdir(path);
    } else {
      puts(entry->d_name);
    }
  }

  if (closedir(directory) == -1)
    fprintf(stderr, "listdir: error closing %s\n", dirname);
}

Once we compile it and run it we get:

tree.c
./subdir1 [DIR]
report.csv
./subdir1/subsubdir1 [DIR]
logo.png
photo.jpg
file1.txt
tree-tutorial
file2.txt
./subdir2 [DIR]
two.txt
one.txt

Just because it looks flat doesn’t mean it’s not recursive. For testing purposes I added a little [DIR] suffix to each directory entry’s name so you can see wehat’s going on. Every time the function detects that an entry is a directory (by checking the d_type member against the DT_DIR enum constant), it reserves enough space for an absolute path string of maximum 1024 characters (including null terminator); then, through snprintf, we write the concatenation of the directory name in the dirname parameter (initially ".") and the name of the current entry (obtained inside the loop’s condition with readdir).

This means that for level-1 subdirectories, path will hold a string like "./subdir1", which will in turn be sent to the recursive call and then contatenated again to make a new path string like "./subdir1/subsubdir1" (a level-2 subdirectory); just to name an example.

Lastly, see that big if statement right at the start of the loop’s body? That’s just how we check to see if the current entry’s name happens to be "." or "..", names that I already explained why should be skipped (you’d just get a stack overflow).

A nicer formatting

Even though the application works perfectly now, it’s still far from looking like an actual directory tree; it looks quite flat. So what do we need to fix that? Indentation: Subdirectories (and their corresponding entries) must be properly aligned according to their depth level. We achieve this by repeating a string "| " n times where n is the current depth level (we start from 0).

For this to work we need to update the listdir function signature (both in the declaration and the definition) to accept an extra parameter called level of type int. We also need to move the printing of the current dirname value (the parameter that first gets "." and then, on recursive calls, a full path) to outside of the loop, right before it.

Lastly, we need the function that will print a string n times. Let’s call it printn (I really wanted to call it print_times).

// ... #includes and #defines

void listdir(char *dirname, int level);
void printn(char *s, int n);

int main(void) {
  listdir(CURRENT_DIR, 0);

  return 0;
}

void listdir(char *dirname, const int level) {
  // .. opendir

  printn("|  ", level - 1);
  if (level > 0) printf("|- ");
  puts(dirname);

  const struct dirent *entry;
  while ((entry = readdir(directory)) != NULL) {
    if (strcmp(entry->d_name, CURRENT_DIR) == 0 ||
        strcmp(entry->d_name, PARENT_DIR) == 0)
      continue;

    if (entry->d_type == DT_DIR) {
      char path[MAX_PATH];

      snprintf(path, MAX_PATH, "%s/%s", dirname, entry->d_name);
      listdir(path, level + 1);
    } else {
      printn("|  ", level);
      printf("|- %s\n", entry->d_name);
    }
  }

  // ... closedir
}

void printn(char *s, const int n) {
  for (int i = 0; i < n; i++)
    printf("%s", s);
}

The printn function is really not that hard to comprehend, it’s just a for loop, could even be inlined or macro’d honestly. What we really need to pay attention here is the fact that now the directory path is printed outside the loop, and it only prints the padding for depth levels greater than or equal to 2. Additionally, we only want to print the |- string for subdirectories, not the current directory.

Of course we also need to update the way we print non-directory entries by also calling printn to add the appropriate padding according to the current depth level. The recursive call needs to account for the 2nd parameter; this means, incrementing the level by 1 in listdir(path, level + 1);.

Improvement opportunities

The code is now complete, what you just saw was the final version. however, it doesn’t mean this is where the learning ends for you!

First we could ensure that there is absolutely 0 chance for a buffer overflow (even though snprintf is already pretty safe) and on top of that, signal an error when a path name is too long:

if (strlen(dirname) + strlen(entry->d_name) + 2 > MAX_PATH) {
  fprintf(stderr, "listdir: path name too long\n");
  return;
}

This snippet can go right before the call to snprintf; it adds the lengths of the current path, the directory (entry) name, '/', and the null terminator (hence the + 2), then compares it against MAX_PATH.

What if we wanted handle long absolute paths? Easily achievable with a dynamically allocated array (malloc(max_path)), where max_path replaces MAX_PATH as a static (file-private) external int variable that changes value (could be with max_path *= 2) every time the current directory entry’s path’s length exceeds it. Obviously we need to free(path) after the recursive call, and change path to be a char * pointer rather than an array.

Going even deeper, why if instead of allocating a new block of memory everytime we go one level deeper, we move path to the outside. This is very complex, requires some pointer arithmetics to keep track of where we are but if memory is a big concern for you, give it a try.

Homework

This is it, you reached the end, all there is left for you to do is add more functionality by doing your own research:

  • Add some command line arguments to enable options such as -L {{NUM}} (specify maximum depth), -i (remove indentation lines), or -s (show the size of entries).
  • Try to colorize the directory names, and add diferent emojis before the entry names, corresponding to their respective d_type.
  • Instead of showing the full path for the subdirectories, show just the subdirectory name; the indentation is sometimes enough.
  • Show some data about each file (creation date, size, etc) using the struct stat data structure and the fstat function from the unistd.h and sys/stat.h headers.
  • Sort the entries by any criteria. You may need to store the entries inside a recursive data structure before attempting such thing.

If you find any mistake in the code don’t forget to hit me up.

Subscribe to my feed

All the latest posts directly in your RSS reader. By adding the RSS feed link to your favourite reader you can avoid having to visit this site to get updates whenever I post something new. Try Feedly, I recommend it.

Subscribe