How to list directories in C recursively (UNIX)
In UNIX systems (Mac, Linux, WSL), it’s easier to list directories with the C programming language due to how the file system is structured. A directory is a file that contains information about its contents and they also have an absolute path to distinguish them.
This task is done with low level syscalls
such as open
and close
, but we’re going to use a header called dirent.h
, which contains convenience functions and struct
s to make this process easier.
If you just want the code right away I saved it this Gist but if you’re reading this to actually learn how it’s done, then stick with me.
Table of Contents
- Preparing the environment
- Listing directory entries
- Making it recursive
- A nicer formatting
- Improvement opportunities
- Homework
Preparing the environment
We really don’t need much for this, no need for any fancy build tool or CMakeLists.txt file. First let’s start by creating a C file in any directory; for testing purposes, preferably in a directory that also contains subdirectories and many other files so we can see the program in action right away.
Let’s imagine our current working directory looks like this (I ran tree -i -f
on my terminal, if you don’t have it can to install it):
.
./file1.txt
./file2.txt
./subdir1
./subdir1/report.csv
./subdir1/subsubdir1
./subdir1/subsubdir1/logo.png
./subdir1/subsubdir1/photo.jpg
./subdir2
./subdir2/one.txt
./subdir2/two.txt
./tree.c
./tree-tutorial
3 directories, 10 files
Notice ./tree-tutorial
, that’s the name we’ll be giving the executable once compiled, and running it will yield this output:
.
|- tree.c
|- tree-tutorial
|- ./subdir1
| |- report.csv
| |- ./subdir1/subsubdir1
| | |- logo.png
| | |- photo.jpg
|- file1.txt
|- file2.txt
|- ./subdir2
| |- two.txt
| |- one.txt
Very similar to what we’d get with the tree
command utility.
The goal is indeed to build a very minimalistic version of that.
But first, let’s start small by creating a tree.c
file and filling it with some includes and declarations:
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#define CURRENT_DIR "."
#define PARENT_DIR ".."
int main(void) {
return 0;
}
The headers that need to be included are:
dirent.h
: Contains functions for opening, reading, and closing directories; it also contains theDIR
stream object and thedirent
structure that contains information about each entry in a directory (we’re only concerned with their type and name).stdio.h
: For printing output to the screen andsnprintf
, with which we’ll assemble the full path of subdirectories without the risk of buffer overflows.string.h
: From here we just needstrcmp
to compare strings, andstrlen
to retrieve their length.
Listing directory entries
Starting small, here’s the simplest code possible to print a list (similar to ls
but with each entry in its own line) of all the entries in the current directory where the executable is being ran.
// ... #includes and #defines
int main(void) {
DIR *directory;
if ((directory = opendir(CURRENT_DIR)) == NULL) {
fprintf(stderr, "error opening %s\n", CURRENT_DIR);
return 1;
}
const struct dirent *entry;
while ((entry = readdir(directory)) != NULL) {
puts(entry->d_name);
}
if (closedir(directory) == -1) {
fprintf(stderr, "listdir: error closing %s\n", CURRENT_DIR);
return 2;
}
return 0;
}
The first 5 lines inside the main
function open the current directory (through opendir
passing it the name of the directory) and creates a pointer to the stream object of type DIR
in a variable called directory
.
This variable is used later to read from the directory and lastly, to close it.
Just like how we can read file streams line by line until EOF
is returned, reading the entries of a directory stream is done one by one with readdir
until NULL
is returned.
Each call to readdir
(it accepts the DIR
pointer as its only argument) is supposed to return a pointer to the next entry in the directory, which is then assigned to the entry
variable (declared as a pointer to a struct dirent
, aka, directory entry). Inside the loop we print each entry by accessing the d_name
member and printing it in its own line with puts
(same as printf("%s\n", char *)
).
Both opendir
and closedir
can fail, that’s why we handle such cases by checking their respective return value (NULL
for opendir
, and -1
for closedir
).
Running the first iteration
Our goal hasn’t yet been reached but we have something to work with now.
Let’s compile the tree.c
file with your compiler of choice (could be cc
, gcc
, clang
or something else).
Mine is gcc
so let’s do gcc -o tree-tutorial tree.c
;
this will produce a compiled executable file called tree-tutorial
that we can run by typing ./tree-tutorial
into the terminal prompt.
Running the code we have now will yield the following output:
tree.c
subdir1
.
tree2.c
..
file1.txt
tree-tutorial
file2.txt
subdir2
As you can see, it prints entries in no particular order.
It also prints everything just the same regardless of type (symbolic link, directory, binary file, etc).
Notice how there are two very special entries: .
(current directory), and ..
(parent directory).
Making it recursive
If we wish to make a recursive version that walks inside every subdirectory, ignoring the current and parent directories becomes very important.
Another thing we have to look out for is absolute directory paths;
if we call opendir
with "subsubdir1"
, it will not work because there is no such directory in the current working directory (where the executable is being ran)…
it clearly needs to be called with "./subdir/subsubdir1"
instead.
Here’s how we refactor tree.c
to allow for recursive directory walking:
// .. #includes and #defines
#define MAX_PATH 1024
void listdir(char *dirname);
int main(void) {
listdir(CURRENT_DIR);
return 0;
}
void listdir(char *dirname) {
DIR *directory;
if ((directory = opendir(dirname)) == NULL) {
fprintf(stderr, "listdir: error opening %s\n", dirname);
return;
}
const struct dirent *entry;
while ((entry = readdir(directory)) != NULL) {
if (strcmp(entry->d_name, CURRENT_DIR) == 0 ||
strcmp(entry->d_name, PARENT_DIR) == 0)
continue;
if (entry->d_type == DT_DIR) {
char path[MAX_PATH];
snprintf(path, MAX_PATH, "%s/%s", dirname, entry->d_name);
printf("%s [DIR]\n", path);
listdir(path);
} else {
puts(entry->d_name);
}
}
if (closedir(directory) == -1)
fprintf(stderr, "listdir: error closing %s\n", dirname);
}
Once we compile it and run it we get:
tree.c
./subdir1 [DIR]
report.csv
./subdir1/subsubdir1 [DIR]
logo.png
photo.jpg
file1.txt
tree-tutorial
file2.txt
./subdir2 [DIR]
two.txt
one.txt
Just because it looks flat doesn’t mean it’s not recursive.
For testing purposes I added a little [DIR]
suffix to each directory entry’s name so you can see wehat’s going on.
Every time the function detects that an entry is a directory (by checking the d_type
member against the DT_DIR
enum constant), it reserves enough space for an absolute path string of maximum 1024
characters (including null terminator);
then, through snprintf
, we write the concatenation of the directory name in the dirname
parameter (initially "."
) and the name of the current entry (obtained inside the loop’s condition with readdir
).
This means that for level-1 subdirectories, path
will hold a string like "./subdir1"
, which will in turn be sent to the recursive call and then contatenated again to make a new path
string like "./subdir1/subsubdir1"
(a level-2 subdirectory); just to name an example.
Lastly, see that big if
statement right at the start of the loop’s body?
That’s just how we check to see if the current entry’s name happens to be "."
or ".."
, names that I already explained why should be skipped (you’d just get a stack overflow).
A nicer formatting
Even though the application works perfectly now, it’s still far from looking like an actual directory tree; it looks quite flat.
So what do we need to fix that? Indentation: Subdirectories (and their corresponding entries) must be properly aligned according to their depth level. We achieve this by repeating a string "| "
n
times where n
is the current depth level (we start from 0
).
For this to work we need to update the listdir
function signature (both in the declaration and the definition) to accept an extra parameter called level
of type int
.
We also need to move the printing of the current dirname
value (the parameter that first gets "."
and then, on recursive calls, a full path) to outside of the loop, right before it.
Lastly, we need the function that will print a string n
times.
Let’s call it printn
(I really wanted to call it print_times
).
// ... #includes and #defines
void listdir(char *dirname, int level);
void printn(char *s, int n);
int main(void) {
listdir(CURRENT_DIR, 0);
return 0;
}
void listdir(char *dirname, const int level) {
// .. opendir
printn("| ", level - 1);
if (level > 0) printf("|- ");
puts(dirname);
const struct dirent *entry;
while ((entry = readdir(directory)) != NULL) {
if (strcmp(entry->d_name, CURRENT_DIR) == 0 ||
strcmp(entry->d_name, PARENT_DIR) == 0)
continue;
if (entry->d_type == DT_DIR) {
char path[MAX_PATH];
snprintf(path, MAX_PATH, "%s/%s", dirname, entry->d_name);
listdir(path, level + 1);
} else {
printn("| ", level);
printf("|- %s\n", entry->d_name);
}
}
// ... closedir
}
void printn(char *s, const int n) {
for (int i = 0; i < n; i++)
printf("%s", s);
}
The printn
function is really not that hard to comprehend, it’s just a for loop, could even be inlined or macro’d honestly.
What we really need to pay attention here is the fact that now the directory path is printed outside the loop, and it only prints the padding for depth levels greater than or equal to 2
.
Additionally, we only want to print the |-
string for subdirectories, not the current directory.
Of course we also need to update the way we print non-directory entries by also calling printn
to add the appropriate padding according to the current depth level.
The recursive call needs to account for the 2nd parameter;
this means, incrementing the level by 1 in listdir(path, level + 1);
.
Improvement opportunities
The code is now complete, what you just saw was the final version. however, it doesn’t mean this is where the learning ends for you!
First we could ensure that there is absolutely 0 chance for a buffer overflow (even though snprintf
is already pretty safe) and on top of that, signal an error when a path name is too long:
if (strlen(dirname) + strlen(entry->d_name) + 2 > MAX_PATH) {
fprintf(stderr, "listdir: path name too long\n");
return;
}
This snippet can go right before the call to snprintf
;
it adds the lengths of the current path, the directory (entry) name, '/'
, and the null terminator (hence the + 2
), then compares it against MAX_PATH
.
What if we wanted handle long absolute paths?
Easily achievable with a dynamically allocated array (malloc(max_path)
), where max_path
replaces MAX_PATH
as a static (file-private) external int
variable that changes value (could be with max_path *= 2
) every time the current directory entry’s path’s length exceeds it.
Obviously we need to free(path)
after the recursive call, and change path
to be a char *
pointer rather than an array.
Going even deeper, why if instead of allocating a new block of memory everytime we go one level deeper, we move path
to the outside.
This is very complex, requires some pointer arithmetics to keep track of where we are but if memory is a big concern for you, give it a try.
Homework
This is it, you reached the end, all there is left for you to do is add more functionality by doing your own research:
- Add some command line arguments to enable options such as
-L {{NUM}}
(specify maximum depth),-i
(remove indentation lines), or-s
(show the size of entries). - Try to colorize the directory names, and add diferent emojis before the entry names, corresponding to their respective
d_type
. - Instead of showing the full path for the subdirectories, show just the subdirectory name; the indentation is sometimes enough.
- Show some data about each file (creation date, size, etc) using the
struct stat
data structure and thefstat
function from theunistd.h
andsys/stat.h
headers. - Sort the entries by any criteria. You may need to store the entries inside a recursive data structure before attempting such thing.
If you find any mistake in the code don’t forget to hit me up.