Chapter 2 Find your files

First things first. Contemporary programming, in particular data programming, requires working with a large number of files: your code, your written reports, documents, homework and labs, data files and so on. This section discusses how to organize the files, and how to think about the organization as the file system tree. Although thinking about your computer files as a tree may not be what you normally do, this is how computers and programming languages think about your files. You need a basic overview of the file system layout for your code to be able to access the files.

Another reason this book introduces the file system as the first topic is to help you organize your files. In this course you will mainly work with rmarkdown documents that you create yourself, and data files that you download from the internet. These files should be placed in a meaningful location on your computer, you should know where they are, and if needed, you must be able to access them both through R and through other tools. This will help you avoid many problems later.

In this course you’ll need to understand the file system mainly to access the data files, but it also helps to understand how knitting and rmarkdown work (see Section 3.4). It also helps you prepare your code to run on different computers (needed by, e.g., shiny, see Section E).

2.1 File system tree and working directory

Before loading our first files from disk, we discuss what the file system tree and the working directory are. These two concepts are extremely important for all kinds of text-based navigation, and your code expects the file names to be provided as text. Many common problems that novice programmers encounter when attempting to load data boil down to misunderstanding these concepts.

2.1.1 File System Tree

Modern computers store their files and folders like a tree. You can imagine folders being branches that may branch further (contain other folders), and files being leaves, the “final ends” of branches. Let us explain this through an example.

TBD: here something about how people “normally” arrange and access files?

TBD: Apple photos

Imagine there is a student Yucun. He is taking different courses, and he stores his schoolwork in a folder UW that is inside of his Documents folder.

When you look at the content of his Documents in the file manager, you can see a picture like the one shown here. You can see that he has two folders in there, Applications and UW, and a number of files.

He is using the folder UW to store his school stuff. He sorts it by the class he is taking, and hence the content of UW may look like this:

One of the classes he is taking is info201. Inside the info201 folder he sorts the files into exercises, homework and stuff; and there is also a lonely file cheatsheet.pdf.

This is a fairly typical layout of files and folders: there are folders, and folders inside folders, and files inside folders, and so on.

Instead of displaying a single folder’s content at a time, we can also draw such a nested structure as a tree. The figure below does just that. To begin with, Documents is just one folder among his other stuff. This is normally called the home folder, not “stuff”, but we call it “Yucun’s stuff” for now (see more in Section 2.4.1 below). Inside of Documents there are two folders, UW and Applications; they are placed underneath the Documents folder and connected to it with dotted lines. The files, “add-code.txt”, “SSRN-id3265…” and others, are marked inside of the folder (well, inside of the extended gray box next to it). There are more files; those are denoted with three dots ....

Yucun’s folders displayed as an upside-down tree. He has four main folders: Documents, Pictures, Music and Downloads; those four folders in turn contain other folders and files.

Further, UW contains other folders, including info201 which, in turn, contains exercises, homework, stuff, and cheatsheet.pdf. Again, the subfolders are marked underneath info201, but cheatsheet.pdf, the file inside info201, is placed in the gray box to the left of it. Obviously, there are more files and folders on his computer, some of which are denoted here with dots ....

You can imagine the whole tree as a big building where folders are rooms, and files are furniture in that room. Rooms (folders) contain some furniture (files), but there are also stairs, leading down to other rooms (subfolders).

But the picture is usually compared to an upside-down tree. The root of the tree is marked as “Yucun’s stuff” (more about it below in Section 2.4.1).1 It has four branches: Documents, Pictures, Music and Downloads. The Pictures folder contains two pictures, fractal and Ross Lake. We can imagine that pictures, and other files like cheatsheet.pdf, are leaves of the tree as those do not branch any further.
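The same nested structure can also be seen from inside R. The sketch below is only an illustration: it builds a small throwaway copy of a part of Yucun's tree in a temporary folder (the folder and file names are borrowed from the example; the temporary location and the name "yucun-stuff" are our assumptions) and then lists the branches and the leaves.

```r
# Build a toy copy of a part of Yucun's tree in a temporary folder.
# Nothing here touches your real files.
root <- file.path(tempfile("tree-demo"), "yucun-stuff")
dir.create(file.path(root, "Documents", "UW", "info201", "exercises"),
           recursive = TRUE)
dir.create(file.path(root, "Pictures"))
invisible(file.create(
  file.path(root, "Documents", "UW", "info201", "cheatsheet.pdf"),
  file.path(root, "Pictures", "fractal.png")
))

# Folders are the branches, files are the leaves.
# Both are listed here relative to the root of the toy tree.
folders <- list.dirs(root, full.names = FALSE, recursive = TRUE)
files <- list.files(root, recursive = TRUE)
print(folders)
print(files)
```

Each printed entry spells out the branches one has to follow, separated by slashes, to reach that folder or file.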

2.2 How to organize your computer

Now that you have a basic idea about the file system, let’s talk about something that may sound obvious. How should you arrange the files on your computer? More specifically, we discuss how to arrange files for this course. While it is mostly your own private matter and we do not care much about the details, there is one requirement: understand what you do. This course requires creating folders and loading and saving files, and it is a frequent problem that students set up the computer in an inefficient manner, or are not even aware that they have set up something. Here we address some of the more common problems.

  1. Make a dedicated folder for the course materials. Give the folder a meaningful name, e.g. “info201”. Do not spread the markdown files, code and datasets all over your computer. This easily leads to you accidentally deleting some of those, not being able to find them, and not understanding what files belong to what project. If you want, you can also make separate subfolders, e.g. for homework, in-class material, datasets, and so on.

  2. Decide a place where the course folder goes. It should be close to your other schoolwork. It may be on your desktop. It may be in your Documents. It may be in a dedicated Informatics folder in your OneDrive. There are many options. But not all options are equally good, and some of them are rather bad. Do not put it in your home folder (see Section 2.4.1), unless you have good reasons. Do not put it in your Downloads folder–that one gets quickly overloaded with too much stuff.

  3. Ensure that there is only one folder with that name. That is, unless you are able both to explain why you need another folder with the same name, and to locate files in one or the other as needed.

Exercise 2.1

Create a folder for the info201 course.
  • Find it using the file manager.
  • Create a new file inside that folder.
  • Find the folder using the system search tools. Ensure that it contains the file you just created!

TBD: may not have pics in pic folder

Exercise 2.2 Draw a similar file system tree for your computer. Include in it a) your documents folder; b) your folder where you keep info201 stuff; c) your pictures folder; d) a few images in your picture folder. Include also a few other folders and files. Mark some additional folders and files with dots.

See my version

Exercise 2.3 Draw the tree for your Pictures folder (or another folder where you keep your pics). Mark at least 5 files on this tree.

See my version

2.3 Working directory and relative path

Now that you know what the file system tree is, we can discuss how computer programs (including the programs you will write) “see” the file system on your computer.

First, the programs “think” that they are “located” in a place in the file system tree. This place is called the working directory. You may imagine that all open apps are located “somewhere” in this tree: when you open an app, it will be “inside” of one of the folders in the tree. The working directory is a very important concept because when a program wants to load files from disk, it always starts from the working directory.

In some sense, the working directory is like the file manager window: if you keep it open, it will display the content of a particular folder. We can call that folder “the working directory” for the file manager. In a similar fashion as you can click on files and icons in the file manager, the programs can find files and move to a different folder. But unlike the file manager, which can have multiple windows open, a program has only a single one open at a time: it has only one working directory.

For us, perhaps the most important difference between the file manager window and the working directory of a program is that programs rarely display the content of a folder (but some do, e.g. your image viewer may show the content of the image folder). Another major difference is how the programs access files. They do not click on icons; they use file names. So in order to open a file from inside a program, you need to know its name.
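In R you can simply ask the program where it “thinks” it is. The short sketch below uses the base R functions getwd() and list.files(); what it prints depends entirely on your own computer, so no output is shown here.

```r
# Where does R "think" it is located right now?
wd <- getwd()
print(wd)

# What can R see in that folder? These are the names a program
# can use directly, without any further directions.
print(list.files(wd))
```

If the files you expected are not in the list, then R is working in a different folder than you assumed.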

For instance, Yucun may have an rmarkdown file (see Section 3) called lab1.md in the exercises folder inside info201. That folder will be the working directory when he compiles the file into html (see Section 3.2.1). This means that the app “thinks” that it is located in that folder. The markdown file is written in red inside of exercises.


R app “thinks” it is inside of exercises folder on Yucun’s computer.

But now if the markdown file needs to access files and folders that are somewhere else on the computer, not in its working directory exercises, then it must navigate through the tree to find those files. For instance, if Yucun wants to include the image fractal.png, then the program must walk (marked with thin red arrows on the figure):

  1. up (into info201)
  2. up (into UW)
  3. up (into Documents)
  4. up (into Yucun’s stuff)
  5. into Pictures
  6. grab fractal.png from there.

The folder name, when going up, is in parentheses because we can tell the computer just to go “up”, as there is only a single way up. But when we go down, into a subfolder, then we need to give its name, like “into Pictures”.

Note that the first step is not “up to exercises” because it is already inside exercises. So it starts by moving up to info201 instead.

Such a list of steps resembles the directions suggested by navigation apps, such as Google Maps. On a computer, it is known as a relative path. It explains how to get somewhere from the current position, that is, relative to the current position. In a way we are telling the program something like “if you start here, then you need to go four blocks north and one block east”. We call it the relative path in a long form.

The list above is intuitive, but in the “computer language” we need to write the instructions in a more compact form. This requires a few simple changes in how the directions are written:

  1. replace “up” with two dots “..”. Also, remove the parent folder name in parentheses.
  2. add a slash “/” between the separate instructions instead of a line break.
  3. finally, remove “grab” and “from there” and leave just the file name.

So instead of the google-style list above, your computer wants to see the relative path as

../../../../Pictures/fractal.png

We call this the relative path in a short form or in a computer form.2
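The conversion from the long form to the short form is mechanical enough that it can be written as a few lines of R. This is only a sketch to illustrate the rules: the function short_form() is a hypothetical helper of our own, not a part of R, and the steps are those from the walk from exercises to the image.

```r
# Long-form directions from exercises to the image, one step per element.
# "up" means "go to the parent folder".
steps <- c("up", "up", "up", "up", "Pictures", "fractal.png")

# Hypothetical helper: replace every "up" with ".." and
# join the steps with "/" instead of line breaks
short_form <- function(steps) {
  steps[steps == "up"] <- ".."
  paste(steps, collapse = "/")
}

path <- short_form(steps)
print(path)
# [1] "../../../../Pictures/fractal.png"
```

The base R function file.path() does the same kind of joining: file.path("..", "Pictures") gives "../Pictures".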

Exercise 2.4 Now imagine that Yucun runs a java program inside cse142. How can java access the cheatsheet.pdf file in info201? Write out a similar long form list, and short form list as above.

See the solution

Exercise 2.5 Yucun downloaded a file matrix.dat into Downloads. How can his MATLAB code in amath352 access that file?

See the solution

Exercise 2.6

Draw the file system tree of your own computer.
  • Mark there a) the folder where you keep your info201 materials; b) a few images in your Pictures folder.
  • Write the directions for accessing one of the images in your Pictures folder from your info201 folder.

See my solution

As computers access files by name, not by icon, it is critical to ensure that the file name, including its extension, is correct!

Some file managers may not show the file extensions by default. You need to figure out either how to display those, or how to use other tools to find the full file names. Such other tools include R (Section 11.2), RStudio’s tab completion, or the command line (Section B.3.3).
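As a small illustration of using R for this, the sketch below creates two toy files in a temporary folder (the folder and the file names are made up for the example) whose names might look the same in a file manager that hides extensions. The base R function list.files() always shows the complete names.

```r
# A toy folder with two files that differ only in their last extension
folder <- tempfile("extensions-demo")
dir.create(folder)
invisible(file.create(file.path(folder,
                                c("cheatsheet.pdf", "cheatsheet.pdf.txt"))))

# list.files() shows the complete names, extensions included
found <- list.files(folder)
print(found)

# tools::file_ext() picks out the extension, the part after the last dot
ext <- tools::file_ext(found)
print(ext)

# file.exists() tells whether the name you typed matches a real file
with_ext <- file.exists(file.path(folder, "cheatsheet.pdf"))
without_ext <- file.exists(file.path(folder, "cheatsheet"))
print(c(with_ext, without_ext))
# [1]  TRUE FALSE
```

The name without the extension does not match any file, even though a file manager might display it that way.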

TBD: more exercises, examples

TBD: a game with a large number of nested folders

TBD: troubleshooting: do you have a single info201 folder? Do you know where it is located? Search may give you a wrong one…

2.4 Absolute path

But there is more on Yucun’s computer than what was shown above. Besides the user files, all computers contain system files, the apps and data needed to get the computer up and running. Computers may also have more than a single user, and they may have external drives, such as USB disks or network systems, connected to them. Here we discuss where those files are located in the file system tree, and how to access them.

2.4.1 Home folder, file system, and the absolute path

The complete file system tree includes all files and drives, including those that are located outside of “Yucun’s stuff”. Yucun’s files are just a branch of the complete tree.

The figure below shows the file system tree of Yucun’s computer, beginning from the computer’s root directory. There are a few small differences, depending on whether he is using a Mac or a PC. We discuss the Mac file system first.

The files and folders on Yucun’s Mac. All files and drives form a similar tree, and Yucun’s home folder, typically a folder named “yucun”, is just a branch of that tree.

If he is using a Mac, all of Yucun’s stuff is located in a folder called yucun. This is called his home folder or home directory, the place for all his documents, pictures, music and other stuff. This is what we called “Yucun’s stuff” above in Section 2.1.1. The home folder, in turn, is located in the folder Users.3 The Users folder, in turn, is directly in the root folder, here labeled as “root /”. However, its actual name is just “/”; the word root is added here only for clarity.4 The root folder is the “mother of all folders”, the root of the file system. All other folders are either directly or indirectly located inside the root folder. For instance, Yucun’s home folder is first inside “Users”, which in turn is inside the root “/”.

The files and folders on Yucun’s PC. The layout is fairly similar to that of a Mac, just the root is called “This PC” and the first branch is into the drive letters “C:” and “D:”.

If Yucun is using Windows, the tree will look slightly different. First, Windows labels the file system root not as “/” but as “This PC”. And second, before we get into actual folders, we have to walk through the drive letters, such as “C:”. But the idea is similar: Yucun’s home folder is located in the root folder “This PC”, always indirectly through one of the drive letters.
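You can ask R where your own home folder is. The sketch below relies on the shorthand "~" that the base R function path.expand() understands; the result depends on your computer, so no output is shown. One caveat, stated as our understanding rather than a guarantee: on Windows, R may report your Documents folder instead of the folder that contains it.

```r
# "~" is a common shorthand for the home folder;
# path.expand() replaces it with the actual location
home <- path.expand("~")
print(home)

# dirname() moves one step up: the folder that contains the home folder
print(dirname(home))
```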

One can write “google-style” directions about how to get to a given folder when starting from the root. For instance, if you want to grab Yucun’s “cheatsheet.pdf” (see Section 2.1.1), you need to (assuming he uses a Mac):

  1. Start at root “/”
  2. into “Users”
  3. into “yucun”
  4. into “Documents”
  5. into “UW”
  6. into “info201”
  7. grab “cheatsheet.pdf” from there

Or in a more compact fashion:

/Users/yucun/Documents/UW/info201/cheatsheet.pdf

If he uses Windows, the navigation rules are fairly similar:

  1. Start at root “This PC”
  2. into “C:”
  3. into “Users”
  4. into “yucun”
  5. into “Documents”
  6. into “UW”
  7. into “info201”
  8. grab “cheatsheet.pdf” from there

The short form on Windows is slightly different: you should not write “This PC” in the path. This leaves just:

C:/Users/yucun/Documents/UW/info201/cheatsheet.pdf

See more in Section 11.

Such navigation rules are called an absolute path. The advantage of an absolute path is that it always tells where the file is located, independent of the current working directory. However, if Yucun decides to rename his “info201” folder, or move it elsewhere, then the absolute path is not valid any more.

So there are two ways to navigate the file system: relative path and absolute path. Relative path starts with the current working directory, and the first step is often moving “up” in the tree, before descending “down” in other folders. Absolute path starts with the root folder, the topmost folder, and there is no need to go “up” again after descending into a directory.
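The difference between the two kinds of paths is visible already in how they begin. The sketch below first uses the base R function normalizePath() to show which absolute path a relative path currently points to, and then defines is_absolute(), a hypothetical helper of our own (not a part of R) that only looks at the beginning of a short-form path written with forward slashes.

```r
# "." is the relative path of the working directory itself;
# normalizePath() shows the absolute path it currently points to
abs_wd <- normalizePath(".", winslash = "/")
print(abs_wd)

# Hypothetical helper: does a short-form path start from the root?
# It checks for a leading "/" (Mac, Linux) or a drive letter like "C:/".
is_absolute <- function(path) {
  grepl("^(/|[A-Za-z]:/)", path)
}

mac_path <- "/Users/yucun/Documents/UW/info201/cheatsheet.pdf"
win_path <- "C:/Users/yucun/Documents/UW/info201/cheatsheet.pdf"
rel_path <- "../../../../Pictures/fractal.png"
print(is_absolute(c(mac_path, win_path, rel_path)))
# [1]  TRUE  TRUE FALSE
```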

Exercise 2.7 Consider Yucun’s files as in Section 2.1.1. Yucun runs a Java program inside cse142. How can Java access the Ross Lake.jpg file in Pictures using an absolute path? Write the absolute path both in long form and in short form. Use either the Mac or the PC way, depending on what kind of computer you have.

See the solution

Exercise 2.8 Pick one image in your Pictures folder.

  • Write down the absolute path, as a long form set of directions, for this image.
  • Write the same absolute path in short form.

See the solution

2.4.2 Absolute path in file manager

How can you see the file system outside of your home folder? This may be a bit tricky with a file manager. For instance, the Windows file explorer may not show the drive letters and directories by default, but you can still see them if you click on its navigation bar (the exact behavior depends on the configuration). The Mac file manager, however, may even refuse to display files outside of the home folder. File managers’ behavior can normally be adjusted through options, but this may be somewhat hard for beginners.

But file managers typically allow you to copy the absolute paths of files and folders.

In the Mac file manager, ⌘ + ⌥ + P lets you see the path of the current directory. If you right-click the folder and choose “Get Info”, then you can copy the path after “Where:”.


Windows file explorer after a click on the navigation bar. It displays the location (absolute path) of the folder, here “Z:”. Note also that it uses backslash \ instead of forward slash /.

In the Windows file manager, you can copy the path by right-clicking the top bar that shows the file path and choosing “copy path”.

Windows normally uses backslash \, not forward slash /, to separate individual moves on the file system tree. This will not work in R without additional adjustments, because inside R strings the backslash \ is an escape character: it marks the start of a special symbol instead of standing for itself.

See Section 18.1 for more details.
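A brief sketch of the two usual adjustments, using the Windows path of cheatsheet.pdf from above: either write the path with forward slashes, or double every backslash. The variable names p1, p2 and p3 are ours, chosen only for this example.

```r
# Forward slashes work in R on Windows too, so this needs no adjustment
p1 <- "C:/Users/yucun/Documents/UW/info201/cheatsheet.pdf"

# A path copied from Windows uses backslashes. In an R string each of
# them must be doubled, because a single backslash starts an escape.
p2 <- "C:\\Users\\yucun\\Documents\\UW\\info201\\cheatsheet.pdf"

# cat() shows the string as R stores it, with single backslashes
cat(p2, "\n")

# Replacing the backslashes with forward slashes gives the same text as p1
p3 <- gsub("\\", "/", p2, fixed = TRUE)
print(p3 == p1)
# [1] TRUE
```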

TBD: how to see files outside home on mac

Exercise 2.9 What is the absolute path of your home folder? Draw this as a file system tree.

See the solution

2.4.3 When to use absolute and relative path

Both absolute and relative paths are similar in the sense that both allow you to point to individual files and directories on your computer. The only difference is that the relative path starts its directions from the current working directory while absolute path starts from the root folder.


Data and project files, located in different places in Yucun’s file system tree.

Both of these approaches have their merits. Consider Yucun’s computer again, where we now have added a project, Project, inside amath352 folder (in blue); and a dedicated data folder under Documents (green). The project contains some code (code.R) and a data file (data.csv) in a separate data-folder. How can code.R access data.csv inside its data folder using relative path? The directions are easy: descend into data and grab data.csv from there: "data/data.csv".

But what happens if Yucun decides to move his Project into Applications, instead of amath352? The result is shown in gray. How can the new, gray, code.R access the new gray data.csv? The answer is simple: exactly the same way. It still has to descend into data and grab data.csv from there.

This is one of the major advantages of relative paths: they do not change if we move our project around! More specifically, only local relative paths, those that refer to files and folders within the project, will remain unchanged. And even more: even if Yucun moves the project over to a different kind of computer, for instance a UNIX server or a Docker container, the relative path will still be the same. This is one of the main reasons why relative paths are so popular.
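You can try this out without touching your real files. The sketch below recreates the blue project inside a temporary folder (the temporary location is our assumption; the folder and file names follow the example), checks the local relative path, moves the whole project to the gray location, and checks the very same path again.

```r
# Toy version of the example: a project with code.R and data/data.csv
docs <- file.path(tempfile("move-demo"), "Documents")
blue <- file.path(docs, "UW", "amath352", "Project")
dir.create(file.path(blue, "data"), recursive = TRUE)
invisible(file.create(file.path(blue, "code.R"),
                      file.path(blue, "data", "data.csv")))

old_wd <- setwd(blue)                   # work from inside the project
before <- file.exists("data/data.csv")  # the local relative path
setwd(old_wd)

# Move the whole project into Applications (the "gray" location)
gray <- file.path(docs, "Applications", "Project")
dir.create(dirname(gray))
moved <- file.rename(blue, gray)

setwd(gray)                             # work from the new location
after <- file.exists("data/data.csv")   # exactly the same relative path
setwd(old_wd)                           # restore the working directory

print(c(before = before, after = after))
# before  after 
#   TRUE   TRUE 
```

The string "data/data.csv" did not change at all; only the working directory did.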

But what happens if Yucun has data not within the project’s data folder, but in a dedicated place, in the green data folder within Documents? First, how can his original (blue) code.R access that file using a relative path? It needs to go:

  1. up (into amath352)
  2. up (into UW)
  3. up (into Documents)
  4. into data
  5. grab data.csv from there.

However, this is not how the project in the new (gray) location can access the green data.csv. The relative path is broken! If you access files outside of your project, then relative path may not be what you want.
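The broken path can be demonstrated in the same way. In this sketch both the blue and the gray project folders exist side by side in a temporary toy tree (an assumption made to keep the example short, instead of actually moving the project), and reachable_from() is a hypothetical helper of our own that checks whether a path leads to an existing file when starting from a given working directory.

```r
# Toy tree: the green data folder in Documents, plus both project locations
docs <- file.path(tempfile("outside-demo"), "Documents")
blue <- file.path(docs, "UW", "amath352", "Project")
gray <- file.path(docs, "Applications", "Project")
dir.create(blue, recursive = TRUE)
dir.create(gray, recursive = TRUE)
dir.create(file.path(docs, "data"))
invisible(file.create(file.path(docs, "data", "data.csv")))

# The directions from the list above, in short form
rel <- "../../../data/data.csv"

# Hypothetical helper: does `path` lead to a file when working in `wd`?
reachable_from <- function(wd, path) {
  old_wd <- setwd(wd)
  on.exit(setwd(old_wd))   # always restore the working directory
  file.exists(path)
}

from_blue <- reachable_from(blue, rel)
from_gray <- reachable_from(gray, rel)
print(c(blue = from_blue, gray = from_gray))
#  blue  gray 
#  TRUE FALSE 
```

From the gray location, three steps up lead out of Documents altogether, so there is no data folder to descend into.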

Exercise 2.10

Imagine that Yucun is working on his blue project within amath352. His code.R uses the green data.csv, located in data subfolder within Documents.
  1. He decides to move the project into Applications, marked as gray. Does he have to change his code if he was using absolute path?
  2. He decides to move the project to a different computer. Can he still use the same absolute path? What about relative path?

See the solution

2.4.4 Limitations

The file system tree is the main way to think about files and folders on a computer. But it is not the best tool for all tasks.

One feature that does not fit well with the tree is recent files. The recent files are recent not because they are located in a special folder, but because they were accessed recently. But in most cases, you do not want to load data in your code because it was recent, but because it is a data file that contains what you need, recent or not. So recent files are not something you normally need when programming!

Another feature that makes the tree somewhat messy is links (shortcuts). These allow a file or folder to be located inside two (or more) other folders. As a result, the tree is not only branching, but sometimes the branches may also merge.

2.4.5 RStudio’s file name completion

TBD: tab completion

2.5 Summary

Paths are directions about how to find a file.

  • Relative path starts from the current location. For instance “up / into Documents / grab cheatsheet.pdf”, or in computer form ../Documents/cheatsheet.pdf is a relative path.
    “into Applications / into Oct-2023 / grab gradschool-essay”, in computer form Applications/Oct-2023/gradschool-essay, is a relative path.

  • Absolute path starts from the root directory. For instance “root - into Users - into yucun - into Documents - grab cheatsheet.pdf”, in computer form /Users/yucun/Documents/cheatsheet.pdf, is an absolute path.
    “root - C: - Users - yucun - Desktop - Applications - Oct-2023 - grab gradschool-essay”, in computer form C:/Users/yucun/Desktop/Applications/Oct-2023/gradschool-essay, is an absolute path.

TBD: more summary stuff


  1. The file system tree is traditionally depicted upside down, so the root is at the top, and all branches and leaves are down. You can do it differently, but a few common expressions like “up to the parent folder” and “descend to the folder” assume such upside-down structure.↩︎

  2. While relative path is a standard concept, long form, short form and computer form are not. These are just used in this book.↩︎

  3. Linux file system is fairly similar to that of Mac, just using “home” instead of “Users”.↩︎

  4. Some programs may also call it “Macintosh HD”↩︎