Chapter 9 File system tree

This section deviates from our main task, programming, and discusses computer file systems and how to access your files from R. We need it mainly to access data files, but it also helps to understand better how knitting and rmarkdown work (see Section 10.4). Understanding the file system also helps you to prepare your code to run on different computers (needed by, e.g., shiny, Section E).

Understanding the file system is also valuable knowledge in itself, and helps you to avoid many problems later, both with coding and otherwise.

9.1 File system tree and working directory

Before loading our first data files from disk, we discuss what the file system tree and the working directory are. These two concepts are extremely important for all kinds of text-based navigation, and your code expects the file names to be provided as text. Many common problems that novice programmers encounter when attempting to load data boil down to misunderstanding these concepts.

9.1.1 File System Tree

Modern computers store their files and folders like a tree. You can imagine folders being branches that may branch further (contain other folders), and files being leaves, the “final ends” of branches. Let us explain this through an example.

Imagine there is a student Yucun. He is taking different courses, and he stores his schoolwork in a folder UW that is inside of his Documents folder.

When we look at the content of his Documents in the file manager, we see a picture like the one shown here. You can see that he has two folders in there, Applications and UW, and a number of files.

He is using the folder UW to store his school stuff. He sorts it by the class he is taking, and hence the content of UW may look like this:

One of the classes he is taking is info201. Inside the info201 folder he sorts the files into exercises, homework and stuff; there is also a lonely file cheatsheet.pdf.

This is a fairly typical layout of files and folders: there are folders, and folders inside folders, and files inside folders, and so on.

Instead of displaying a single folder’s content at a time, we can also draw such a nested structure as a tree. The figure below does just that. To begin with, Documents is just one folder among his other stuff. This is normally called the home folder, not “stuff”, but we call it “Yucun’s stuff” for now (see more in Section 9.1.3 below). Inside of Documents there are two folders, UW and Applications; for simplicity we do not mark the plethora of files in the latter.

Yucun’s folders displayed as an upside-down tree. He has four main folders: Documents, Pictures, Music and Downloads; those four folders in turn contain other folders and files.

Further, UW contains other folders, including info201 which, in turn, contains exercises, homework, stuff, and cheatsheet.pdf. Obviously, there are more files and folders in his computer, some of which are denoted here as dots ....

This picture resembles an upside-down tree. The root of the tree is marked as “Yucun’s stuff” (more about it below in Section 9.1.3).14 It has four branches: Documents, Pictures, Music and Downloads. The Pictures folder contains two pictures, fractal and Ross Lake. We can imagine that pictures, and other files like cheatsheet.pdf, are leaves of the tree, as those do not branch any further.

Exercise 9.1 Draw a similar file system tree for your computer. Include in it a) your documents folder; b) your folder where you keep info201 stuff; c) your pictures folder; d) a few images in your picture folder. Include also a few other folders and files. Mark some additional folders and files with dots.

See my version

Exercise 9.2 Draw the tree for your Pictures folder (or another folder where you keep your pics). Mark at least 5 files on this tree.

See my version

9.1.2 Working directory and relative path

Now that you know what the file system tree is, we can discuss how computer programs (including the programs you will write) “see” the file system on your computer.

Namely, programs “think” that they are “located” in a place in the file system tree. This place is called working directory. You may imagine that all open apps are located “somewhere” in this tree. Working directory is important because when a program wants to load files from disk, it always starts from the working directory.

In some sense, the working directory is like the file manager window: if you have it open, it displays the content of a particular folder. We can call that folder “the working directory” for the file manager. In a similar fashion as you can click on files and icons in the file manager, programs can find files or move to a different folder. But unlike the file manager, which can have multiple windows open, a program has only a single one open at a time: it has only one working directory.

For us, perhaps the most important difference between file manager window and the working directory of a program is that the programs rarely display the content of a folder (but some do, e.g. your image viewer may show the content of the image folder).

Another major difference is how the programs access files. They do not click on icons–they use file names. So in order to open a file from inside a program, you need to know its name.

For instance, Yucun may run R in the exercises folder inside info201. That is the working directory for that R app–it “thinks” that it is located in that folder. This is marked with the red “R” inside of exercises.

R running inside exercises

R app “thinks” it is inside of exercises folder on Yucun’s computer.

But if the app wants to access files and folders that are somewhere else in the file system tree, not in its working directory, then it must navigate through the tree to find those files. For instance, if Yucun’s R app (working directory exercises) wants to access the image fractal.png, then it must go (marked with thin red arrows on the figure):

  1. up (into info201)
  2. up (into UW)
  3. up (into Documents)
  4. up (into all of Yucun’s stuff)
  5. into Pictures
  6. grab fractal.png from there.

We can tell the computer just to go “up”, as each folder has only a single parent folder. But when we go down, into a subfolder, we need to give its full name, like “into Pictures”.

Note that it should not start by moving up to exercises: it already is inside exercises. So it starts by moving up to info201 instead.

Such a list of steps resembles the directions suggested by navigation apps, such as Google Maps. On a computer, it is known as a relative path. It explains how to get somewhere relative to the current position. In a way we are telling the program something like “if you start here, then you need to go four blocks north and one block east”. We call it the relative path in long form.

The list above is intuitive, but in the “computer language” we need to write the instructions in a more compact form. This requires two changes in how the directions are written:

  1. replace “up” with two dots “..”
  2. add a slash “/” between the separate instructions instead of a line break.

So instead of the Google-style list above, your computer wants to see the relative path as

"../../../../Pictures/fractal.png"

We call this the relative path in the short form or in the computer form.15
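In R, the same short form can also be assembled from the individual steps with file.path(), which joins its arguments with a slash. This is only a small sketch that uses Yucun's folder names from the example above; the variable name fractal_path is made up for illustration.

```r
# Each argument is one step of the long-form directions:
# four times "up" (".."), then into Pictures, then the file itself.
fractal_path <- file.path("..", "..", "..", "..", "Pictures", "fractal.png")
print(fractal_path)
## [1] "../../../../Pictures/fractal.png"
```

Writing the path as a single quoted string works just as well; file.path() merely makes the individual steps easier to count.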

Exercise 9.3 Now imagine that Yucun runs a java program inside cse142. How can java access the cheatsheet.pdf file in info201? Write out a similar long form list, and short form list as above.

See the solution

Exercise 9.4 Yucun downloaded a file matrix.dat into Downloads. How can his MATLAB code in amath352 access that file?

See the solution

Exercise 9.5

Draw the file system tree of your own computer.
  • Mark there a) the folder where you keep your info201 materials; b) a few images in your Pictures folder.
  • Write the directions for accessing one of the images in your Pictures folder from your info201 folder.

See my solution

As computers access files by name, not by icon, it is critical to ensure that the file name, including its extension, is correct!

Some file managers may not show the file extensions by default. You need to figure out either how to display those, or how to use other tools to find the full file names. Such other tools include R (Section 9.2.2), RStudio’s tab-completion, or the command line (Section B.3.3).

9.1.3 Home folder, file system, and the absolute path

But there is more on Yucun’s computer than what was shown above. Besides the user files, all computers contain system files, the apps and data needed to get the computer up and running. Computers may also have more than a single user, and they may have external drives, such as USB disks or network systems, connected to them. All those are included in the file system tree, but outside of “Yucun’s stuff”. Yucun’s files are just one part of the whole tree. The figure below shows the file system tree of Yucun’s computer, beginning from the computer’s root. There are a few slight differences, depending on whether he is using a Mac or a PC. We discuss the Mac file system first.

The files and folders on Yucun’s Mac. All files and drives form a similar tree, and Yucun’s home folder, typically a folder named “yucun” or similar, is just one branch of that tree.

If he is using a Mac, all of Yucun’s stuff is located in a folder called “yucun”. This is called his home folder or home directory, the place for all his documents, pictures, music and other stuff. The images above in Section 9.1.1 depicted just his home folder. The home folder, in turn, is located in the folder “Users”.16 The “Users” folder, in turn, is directly in the root folder, here labeled as “root /”. However, its actual name is just “/”; the word root is added here only for clarity.17 The root folder is the “mother of all folders”, the root of the file system. All other folders are either directly or indirectly located inside the root folder. For instance, Yucun’s home folder is inside “Users”, which in turn is inside the root “/”.

The files and folders on Yucun’s PC. The layout is fairly similar to that of a Mac, except that the root is called “This PC” and the first branches are the drive letters “C:” and “D:”.

If Yucun is using Windows, the tree will look slightly different. First, Windows labels the file system root not as “/” but as “This PC”. And second, before we get into actual folders, we have to walk through the drive letters, such as “C:”. But the idea is similar: Yucun’s home folder is located in the root folder “This PC”, always indirectly through one of the drive letters.

One can write “Google-style” directions about how to get to a given folder when starting from the root. For instance, to grab Yucun’s “cheatsheet.pdf”, one needs to (assuming he uses a Mac):

  1. Start at root “/”
  2. into “Users”
  3. into “yucun”
  4. into “Documents”
  5. into “UW”
  6. into “info201”
  7. grab “cheatsheet.pdf” from there

Or in a more compact fashion:

"/Users/yucun/Documents/UW/info201/cheatsheet.pdf"

If he uses Windows, the navigation rules are fairly similar:
  1. Start at root “This PC”
  2. into “C:”
  3. into “Users”
  4. into “yucun”
  5. into “Documents”
  6. into “UW”
  7. into “info201”
  8. grab “cheatsheet.pdf” from there

The short form on Windows is slightly different: you should not write “This PC” in the path there. This leaves just:

"C:/Users/yucun/Documents/UW/info201/cheatsheet.pdf"

See more in Section 9.2.

Such navigation rules are called the absolute path. The advantage of the absolute path is that it always tells where the file is located, independent of the current working directory. However, if Yucun decides to rename his “info201” folder, or move it elsewhere, then the absolute path is no longer valid.

So there are two ways to navigate the file system: relative path and absolute path. Relative path starts with the current working directory, and one often has to start by moving “up” on the tree, before descending “down” in other folders. Absolute path starts with the root folder, the topmost folder, and there is no need to go “up” again after descending into a directory.
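The two kinds of path can be tried out side by side in R. The sketch below does not touch Yucun's real files: as an assumption for illustration, it builds a tiny copy of his tree inside R's temporary folder, moves into exercises, and checks that the same file can be found both ways. The variable names are made up.

```r
# Build a small copy of Yucun's tree in a temporary folder (illustration only).
root <- file.path(tempdir(), "yucun")
exercises <- file.path(root, "Documents", "UW", "info201", "exercises")
dir.create(exercises, recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "Pictures"), showWarnings = FALSE)
invisible(file.create(file.path(root, "Pictures", "fractal.png")))

# Pretend that R runs inside "exercises"
old_wd <- setwd(exercises)

# Relative path: four times up, then into Pictures
rel_ok <- file.exists("../../../../Pictures/fractal.png")

# normalizePath() turns the relative path into the corresponding absolute path
abs_path <- normalizePath("../../../../Pictures/fractal.png")
abs_ok <- file.exists(abs_path)

setwd(old_wd)  # go back to where we started
print(c(relative = rel_ok, absolute = abs_ok))
```

Both checks should give TRUE. The absolute path stored in abs_path keeps working after setwd(old_wd), while the relative one is only valid as long as R stays inside exercises.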

Exercise 9.6 Yucun again runs a java program inside cse142. How can java access the Ross Lake.jpg file in Pictures using absolute path? Write the absolute path both in long-form, and in short form. Use either mac or PC way, depending on what kind of computer you have.

See the solution

Exercise 9.7 Pick one image in your Pictures folder.

  • Write down the absolute path, as a long form set of directions, for this image.
  • Write the same absolute path in short form.

See the solution

9.1.4 Absolute path in file manager

How can you see the file system outside of your home folder? This may be a bit tricky with a file manager. For instance, the Windows file explorer may not show the drive letters and directories by default, but you can still see them if you click on its navigation bar (the exact behavior depends on the configuration). Mac, however, may even refuse to display files outside of the home folder. File managers’ behavior can normally be adjusted through options, but this may be somewhat hard for beginners.

But file managers typically allow you to copy the absolute paths of files and folders.

In the Mac file manager, ⌘ + ⌥ + P lets you see the path of the current directory. If you right-click the folder and choose “Get Info”, you can copy the path after “Where:”.

windows file explorer with the nav bar highlighted

Windows file explorer after a click on the navigation bar. It displays the location (absolute path) of the folder, here “Z:”. Note also it uses backslash \ instead of forward slash /.

In the Windows file manager, right-clicking the top bar where the file path is and clicking “copy path” allows you to copy the path.

Windows normally uses backslash \, not forward slash / to separate individual moves on the file system tree. This will not work in R without additional adjustments, because R interprets backslashes \ as special string symbols.

See Section 17.1 for more details.

Exercise 9.8 What is the absolute path of your home folder? Draw this as a file system tree.

See the solution

9.1.5 When to use absolute and relative path

Both absolute and relative paths are similar in the sense that both allow you to point to individual files and directories on your computer. The only difference is that the relative path starts its directions from the current working directory while absolute path starts from the root folder.

projects and data in different places in Yucun’s file system tree

Data and project files, located in the different places in Yucun’s file system tree.

Both of these approaches have their merits. Consider Yucun’s computer again, where we now have added a project, Project, inside amath352 folder (in blue); and a dedicated data folder under Documents (green). The project contains some code (code.R) and a data file (data.csv) in a separate data-folder. How can code.R access data.csv inside its data folder using relative path? The directions are easy: descend into data and grab data.csv from there: "data/data.csv".

But what happens if Yucun decides to move his Project into Applications, instead of amath352? The result is shown in gray. How can the new, gray, code.R access the new gray data.csv? The answer is simple: exactly the same way. It still has to descend into data and grab data.csv from there.

This is one of the major advantages of relative path: it does not change if we move our project around! More specifically, only local relative paths, those that refer to files and folders within the project, will remain unchanged. And there is more: even if Yucun moves the project over to a different kind of computer, for instance a UNIX server or a docker container, the relative path will still be the same. This is one of the main reasons why relative paths are so popular.

But what happens if Yucun has data not within the project’s data folder, but in a dedicated place, in the green data folder within Documents? First, how can his original (blue) code.R access that file using relative path? It needs to go:
  1. up (into amath352)
  2. up (into UW)
  3. up (into Documents)
  4. into data
  5. grab data.csv from there.

However, this is not how the project in the new (gray) location can access the green data.csv. The relative path is broken! If you access files outside of your project, then relative path may not be what you want.
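This behavior can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: it re-creates the blue, gray and green folders inside R's temporary folder (as copies, instead of actually moving the project), and the helper exists_from() is a made-up name, not an R function.

```r
# Re-create the relevant part of Yucun's tree in a temporary folder.
docs <- file.path(tempdir(), "yucun-demo", "Documents")
blue <- file.path(docs, "UW", "amath352", "Project")   # original location
gray <- file.path(docs, "Applications", "Project")     # location after the move
green <- docs                                          # holds the dedicated data folder
for (folder in c(blue, gray, green)) {
  dir.create(file.path(folder, "data"), recursive = TRUE, showWarnings = FALSE)
  invisible(file.create(file.path(folder, "data", "data.csv")))
}

# Hypothetical helper: does 'path' exist when the working directory is 'wd'?
exists_from <- function(wd, path) {
  old_wd <- setwd(wd)
  on.exit(setwd(old_wd))  # always move back afterwards
  file.exists(path)
}

local_blue <- exists_from(blue, "data/data.csv")             # inside the project
local_gray <- exists_from(gray, "data/data.csv")             # same path, new place
outside_blue <- exists_from(blue, "../../../data/data.csv")  # green data, from blue
outside_gray <- exists_from(gray, "../../../data/data.csv")  # same path, from gray
print(c(local_blue, local_gray, outside_blue, outside_gray))
## [1]  TRUE  TRUE  TRUE FALSE
```

The path that stays within the project works from both locations. The path that leaves the project works only from the original place: from the gray project, three steps up end above Documents, where there is no data folder.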

Exercise 9.9

Imagine that Yucun is working on his blue project within amath352. His code.R uses the green data.csv, located in data subfolder within Documents.
  1. He decides to move the project into Applications, marked as gray. Does he have to change his code if he was using absolute path?
  2. He decides to move the project to a different computer. Can he still use the same absolute path? What about relative path?

See the solution

9.1.6 Limitations

The file system tree is the main way to think about files and folders on a computer. But it is not the best tool for all tasks.

One feature that does not fit well with the tree is recent files. The recent files are recent not because they are located in a special folder, but because they were accessed recently. But in most cases, you do not want to load data in your code because it was recent, but because it is a data file that contains what you need, recent or not. So recent files are not something you normally need when programming!

Another feature that makes the tree somewhat messy is links (shortcuts). These allow a file or folder to be located inside two (or more) other folders. As a result, the tree is not only branching, but sometimes the branches may also merge.

9.2 Accessing files and directories from R

The main reason why we need to understand the concept of file system tree and path is to be able to load data files into R. We look at data–csv files–a little later, here we just display images. But exactly as data, images are also stored in files, and so we need to use similar paths to access those.

9.2.1 R’s working directory

To begin with, all R processes have their working directory–they “think” that they are located somewhere in the file system tree. You can find the name of the working directory as

getwd()  # GET Working Directory
## [1] "/home/siim/tyyq/info201-book"

Exercise 9.10 Is the example here, “/home/siim/tyyq/info201-book”, relative path or absolute path?

See the solution

Note that each instance of R has its own working directory. This means that when you run R in multiple windows at the same time, all of these may have different working directories. If the working directory in your RStudio console is /Users/yucun/Desktop, that does not mean that another window, one that is compiling your homework, has the same working directory!

Exercise 9.11 What is the working directory of your R console inside your RStudio?

See the solution

You can change the working directory using setwd(). However, this is rarely needed, as you should start R in the right folder to begin with, and access files using their path.
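If you do need to change the working directory, it is good practice to remember the old one and move back afterwards. Below is a minimal sketch that uses R's temporary folder, tempdir(), as a destination that is guaranteed to exist; the variable names are only illustrative.

```r
old_wd <- getwd()   # remember where we started
setwd(tempdir())    # move into R's temporary folder
moved_wd <- getwd() # now reports the temporary folder
setwd(old_wd)       # move back
back_wd <- getwd()  # the same folder as in the beginning
```

Note that the folder reported by getwd() may be spelled slightly differently from what you passed to setwd(), for instance if the path goes through a link (see Section 9.1.6).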

9.2.2 Accessing files in R

For a refresher: this is what Yucun’s info201 content looks like. See Section 9.1.1.

You can see the files in the current working directory using list.files(). For instance, if Yucun runs R inside of his info201 directory, he will see:

list.files()

## [1] "cheatsheet.pdf" "exercises"      "homework"       "stuff"         

This is R’s “view” of the same folder. Instead of icons, we see file names. We stress here that R shows the file names in their complete form, including the complete extension (like .pdf) and all eventual spaces and other symbols in the names. The graphical viewers may or may not display the complete names, depending on their configuration. The files may also be displayed in a different order: in the example here, the graphical viewer puts all directories first, but R orders everything alphabetically.
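Because list.files() prints folders and files in the same way, it may not be obvious which is which. One way to tell them apart is dir.exists(). The sketch below assumes a made-up copy of the info201 folder inside R's temporary folder, so you can run it without having Yucun's files.

```r
# A made-up copy of Yucun's info201 folder (illustration only)
info201 <- file.path(tempdir(), "info201-demo")
dir.create(info201, showWarnings = FALSE)
for (folder in c("exercises", "homework", "stuff")) {
  dir.create(file.path(info201, folder), showWarnings = FALSE)
}
invisible(file.create(file.path(info201, "cheatsheet.pdf")))

# list.files() also accepts the folder to look into
entries <- list.files(info201)
print(entries)
## [1] "cheatsheet.pdf" "exercises"      "homework"       "stuff"

# TRUE for folders, FALSE for ordinary files
is_folder <- dir.exists(file.path(info201, entries))
names(is_folder) <- entries
print(is_folder)
```

Here only cheatsheet.pdf is an ordinary file; the other three entries are folders.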

Exercise 9.12

  • Find the working directory of your current R instance (R Console inside RStudio).
  • Print all file names there using list.files().
  • Open the same folder using your graphical file manager.
  • Show that it contains the same files that you saw in R!
  • Does your graphical file manager show the file names in the same form, or do you see simplified versions of the names?

See the solution

After you have figured out what files you have in the working directory, you can access those easily. Obviously, what you do with the files will depend on the file type.

TBD: file.info()

For instance, pdf-s and images can be displayed as

library(magick)  # you need to install the package first
cheatsheet <- image_read_pdf("cheatsheet.pdf")
plot(cheatsheet)

(You first need to install the package magick, see Section 3.6).

So in order to access files in your current working directory, you just need the file name. No directions needed here, no folder name or anything else.

Exercise 9.13

  • Install the “magick” package
  • Put an image into your current R working directory
  • Use list.files() to ensure the image is there
  • Use image_read() and plot() to display the image on screen.

To access files in R:

  • getwd() prints the current working directory
  • list.files() prints the file names in the current working directory
  • file names must be quoted!
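Before loading a file, you can check with file.exists() whether R can find it under the name you typed. The sketch below uses a made-up folder with an empty file named Ross Lake.jpg inside R's temporary folder; it only demonstrates the name lookup, not image loading.

```r
# A made-up folder with one (empty) file, for illustration only
demo_dir <- file.path(tempdir(), "pictures-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "Ross Lake.jpg")))

old_wd <- setwd(demo_dir)                   # pretend that R runs in that folder
with_ext <- file.exists("Ross Lake.jpg")    # complete name, quoted, space included
without_ext <- file.exists("Ross Lake")     # name as a file manager might show it
setwd(old_wd)

print(c(with_ext, without_ext))
## [1]  TRUE FALSE
```

Only the complete name, including the extension, is found.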

But what if Yucun is not interested in the cheatsheet, but wants to access the photos in his Pictures folder instead? This can be done either by using absolute or relative path. Next, we discuss how to do that.

9.2.2.1 Accessing files through relative path

Normally, relative path is what you want to use, as this allows you to move your project to a different folder and to collaborate with others more easily. It also indicates the folder layout of your project, which is much harder to understand from the absolute path.

We should begin by writing the “Directions” to get into the Pictures folder from info201 (see Section 9.1.1):

  1. up (into UW)
  2. up (into Documents)
  3. up (into Yucun’s stuff)
  4. into Pictures

Or in a shorter form “up - up - up - into Pictures”.

But these directions are made for humans, not for the computer. We need to translate it to computer language as this: First, take the short form of the directions. And now:

  • replace up by two dots ..
  • replace the dash “-” with slash / (not backslash \!)
  • ensure that everything is enclosed in quotes.

So the relative path of Pictures from info201 will look like

"../../../Pictures"

The list.files() command we used above accepts a relative path as an argument, so Yucun can check his images as

list.files("../../../Pictures")

## [1] "fractal.png"   "Ross Lake.jpg"

A note about Windows. Mac and Linux consistently use slash / as the path separator, but Windows uses backslash \ by default. That will work too, but you need to use double backslashes \\ instead of a single one. This is because backslash is a special character inside of character strings, and you need to escape it with another backslash to actually be able to insert one into the string.

In this course we consistently use the forward slash / instead.
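If you copy a Windows path that contains backslashes, you can either double them by hand or convert them to forward slashes. Here is a small sketch using the folder from Yucun's example; the variable names are only for illustration.

```r
# Typed with doubled backslashes; the string itself contains single ones
win_path <- "C:\\Users\\yucun\\Pictures"
cat(win_path, "\n")
## C:\Users\yucun\Pictures

# Replace every backslash with a forward slash.
# fixed = TRUE tells gsub() to treat the pattern as plain text.
slash_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(slash_path)
## [1] "C:/Users/yucun/Pictures"
```

Both versions point to the same folder on Windows, but the forward-slash form is easier to type and to read.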

A very similar approach also helps to display an image there. For instance, in order to display Ross Lake.jpg, you need directions:

  1. up (into UW)
  2. up (into Documents)
  3. up (into Yucun’s stuff)
  4. into Pictures
  5. grab Ross Lake.jpg

Photo of Ross Lake, as plotted in R using the magick library.

And the corresponding short directions are

“up - up - up - into Pictures - grab Ross Lake.jpg”.

In the “computer language” this translates to

"../../../Pictures/Ross Lake.jpg"

Yucun can display it as

pic <- image_read("../../../Pictures/Ross Lake.jpg")
plot(pic)

Exercise 9.14

  1. What is the working directory of your current R instance? (In the RStudio console.)
  2. Pick an image in your Pictures folder
  3. Draw the file system tree that includes both the current R working directory, and that image (in pictures folder).
  4. Write down the directions to get to the image from R’s working directory.
  5. What is the relative path of the image, from the current R working directory, using R notation?
  6. Use image_read() with relative path, and plot() to show the image on screen!

9.2.2.2 Accessing files through absolute path

Alternatively, we can access the files through absolute path. For a small project that only runs on your computer, the relative and absolute path will work equally well. Absolute path has two distinct advantages:

  • if you need to access data or files that are not connected to the current project, you may need absolute path
  • Many graphical file managers have an option to display and copy the absolute path of files. This is very helpful for beginners.

However, it is harder to use when your code also has to run on your team-mates’ computers, and it is virtually impossible to use if the code also has to run on cloud servers. Absolute path is also somewhat different for Unix (Mac and Linux) and for Windows.

The absolute path directions to the image folder (if Yucun uses Mac) are (see Section 9.1.3):

  1. Start at root “/”
  2. into Users
  3. into yucun
  4. into Pictures

or “/ - Users - yucun - Pictures”. This translates in exactly the same way as the relative path into

"/Users/yucun/Pictures"

In a similar fashion, Yucun can display the Ross Lake picture as

pic <- image_read("/Users/yucun/Pictures/Ross Lake.jpg")
plot(pic)
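The short form and the long-form directions contain the same information, and R can split one into the other. The sketch below only manipulates the text of the path, so it runs even if the file does not exist on your computer.

```r
path <- "/Users/yucun/Pictures/Ross Lake.jpg"

# Split the short form at the slashes to get the individual steps back.
# The first element is empty: it stands for the root "/" before the first slash.
steps <- strsplit(path, "/", fixed = TRUE)[[1]]
print(steps)
## [1] ""              "Users"         "yucun"         "Pictures"      "Ross Lake.jpg"

dirname(path)   # the folder part of the path
## [1] "/Users/yucun/Pictures"
basename(path)  # the file name, the last step
## [1] "Ross Lake.jpg"
```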

If Yucun uses Windows, the absolute path looks like

  1. Start at root “This PC”
  2. into drive “C:”
  3. into Users
  4. into yucun
  5. into Pictures

Or “start at root This PC - C: - Users - yucun - Pictures”. The difference is that on Windows, the root directory “This PC” is not marked, so the absolute path is

"C:/Users/yucun/Pictures"

Yucun can now display the Ross Lake picture as

pic <- image_read("C:/Users/yucun/Pictures/Ross Lake.jpg")
plot(pic)

file explorer and file access on Windows

Accessing and plotting files on Windows. The file explorer window does not display the file extension: the image is just displayed as “Ross lake” (of type “JPG”). However, R lists it as “Ross lake.jpg”. The latter name is needed to load and plot it.

Exercise 9.15

  1. Pick a pdf document in your Documents folder (or wherever you keep your documents)
  2. Draw the file system tree that includes the document (in Documents folder).
  3. Write down the directions about how to get to the document from the root folder.
  4. What is the absolute path of the document using the R notation?
  5. Use image_read() with absolute path, and plot() to show the pdf on screen!

TBD: tilde as home folder marker (Documents on win)

9.2.3 RStudio’s file name completion

TBD: tab completion

9.3 Summary

Paths are directions about how to find a file.

  • Relative path starts from the current location. For instance “up / into Documents / grab cheatsheet.pdf”, or in computer form ../Documents/cheatsheet.pdf is a relative path.
    “into Applications / into Oct-2023 / grab gradschool-essay”, in computer form Applications/Oct-2023/gradschool-essay, is a relative path.

  • Absolute path starts from the root directory. For instance “root - into Users - into yucun - into Documents - grab cheatsheet.pdf”, in computer form /Users/yucun/Documents/cheatsheet.pdf, is an absolute path. “root - C: - Users - yucun - Desktop - Applications - Oct-2023 - grab gradschool-essay”, in computer form C:/Users/yucun/Desktop/Applications/Oct-2023/gradschool-essay, is an absolute path.
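The distinction can also be written down as a simple rule of thumb in R. The function is_absolute() below is a made-up helper, not part of R, and it rests on a simplifying assumption: a path counts as absolute if it starts with a slash or with a drive letter followed by a colon and a slash. It does not cover every case (for instance Windows paths written with backslashes).

```r
# Hypothetical helper: TRUE if the path starts at the root ("/" or e.g. "C:/")
is_absolute <- function(path) {
  grepl("^(/|[A-Za-z]:/)", path)
}

paths <- c(
  "../Documents/cheatsheet.pdf",
  "Applications/Oct-2023/gradschool-essay",
  "/Users/yucun/Documents/cheatsheet.pdf",
  "C:/Users/yucun/Desktop/Applications/Oct-2023/gradschool-essay"
)
result <- is_absolute(paths)
print(result)
## [1] FALSE FALSE  TRUE  TRUE
```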

TBD: more summary stuff


  14. The file system tree is traditionally depicted upside down, so the root is at the top, and all branches and leaves are down. You can do it differently, but a few common expressions like “up to the parent folder” and “descend to the folder” assume such upside-down structure.

  15. While relative path is a standard concept, long form, short form and computer form are not. These are just used in this book.

  16. Linux file system is fairly similar to that of Mac, just using “home” instead of “Users”.

  17. Some programs may also call it “Macintosh HD”