Chapter 9 File system tree
This section deviates from our main task, programming, and discusses computer file systems and how to access your files from R and from the command line. We need it below mainly to be able to access data files, but also to better understand how knitting and rmarkdown work (see Section 10.4), and how to prepare your code to run on a different computer (needed by, e.g., shiny, Section E).
Understanding the file system is also valuable knowledge in itself, and it helps you to avoid many problems later, both with coding and otherwise.
9.1 File system tree and working directory
Before diving deeper into the commands that one can use through the terminal, we have to explain what the file system tree and the working directory are. These two concepts are extremely important for all kinds of text-based navigation. Moreover, they are central to how programs run and access the file system, something that is necessary later when we load data from disk. Many common problems that novice programmers encounter when attempting to load data boil down to misunderstanding these concepts.
9.1.1 File System Tree
Modern computers store their files and folders like a tree. You can imagine folders being branches that may branch further (contain other folders), and files being leaves, the “final ends” of branches. Let us explain this through an example.
Imagine there is a student Yucun. He is taking different courses, and he stores his schoolwork in a folder UW that is inside of his Documents folder. When we look at the content of his Documents folder in the file manager, we may see a picture like what is displayed here. You can see that he has two folders in there, Applications and UW, and a number of files.
He is using the folder UW to store his school stuff. He sorts it by the class he is taking, and hence the content of UW may look like what is shown here.
One of the classes he is taking is info201. Inside the info201 folder he sorts the files into exercises, homework and stuff; there is also a lonely file cheatsheet.pdf.
This is a fairly typical layout of files and folders: there are folders, and folders inside folders, and files inside folders, and so on.
Instead of displaying a single folder’s content at a time, we can also draw such a nested structure as a tree. The figure below does just that. To begin with, Documents is just one folder among his other stuff. Inside of Documents there are two folders, UW and Applications; for simplicity we do not mark the plethora of files in the latter. Further, UW contains other folders, including info201 which, in turn, contains exercises, homework, stuff, and cheatsheet.pdf. Obviously, there are more files and folders on his computer, some of which are denoted here as dots (...).
The root of the tree is marked as “Yucun’s stuff” (more about it below). Traditionally, the tree is depicted upside down, so the root is at the top, and all branches and leaves are below it. The root has four branches: Documents, Pictures, Music and Downloads. The Pictures folder contains two pictures, fractal and Ross Lake. We can imagine that pictures and other files, like cheatsheet.pdf, are leaves of the tree, as they do not branch any further.
Exercise 9.1 Draw a similar file system tree for your computer. Include in it a) your documents folder; b) your folder where you keep info201 stuff; c) your pictures folder; d) a few images in your picture folder. Include also a few other folders and files. Mark some additional ones with dots.
9.1.2 Working directory and relative path
Understanding the file system tree is extremely important in order to be able to navigate the folders and load files. Namely, programs treat one of the folders as their working directory, the place in the file system where they “are located”. The working directory is like a file manager window: if you have it open, it displays the content of a particular folder. This folder is about the same thing as the working directory for the file manager. In a similar fashion as you can click on files and icons in the file manager, programs can find files or move to a different folder. But unlike the file manager, which can have multiple windows open, programs have only a single one open at a time: they have only one working directory.
But perhaps the main difference between a file manager window and the working directory of another program is that those other programs typically do not display anything like the content of a folder. You may also imagine that all open apps are located “somewhere” in this tree. For instance, Yucun may run R in the exercises folder inside info201. Now the running R app “thinks” that it is located in that folder. That is the working directory for that R app (for that running instance of R).
Another major difference is how programs access files. They do not click on icons, they use file names. So in order to open a file from inside a program, you need to know its exact name, including its extension! Some file managers may not show the file extensions by default. You need to figure out either how to display those, or use other tools to find the full file names.
But when a program wants to access files and folders that are somewhere else in the file system tree, not in its working directory, then it must navigate through the tree to find the file. For instance, if Yucun’s R app (working directory exercises) wants to access the image fractal.png, then it must move:
- up (into info201)
- up (into UW)
- up (into Documents)
- up (into all the Yucun’s stuff)
- into Pictures
- grab fractal.png from there.
Such a recipe is known as a relative path. It explains how to get somewhere relative to the current position. In a way, we are telling the program something like “go four blocks north and one block east”.
Exercise 9.2 Now imagine that Yucun runs a java program inside cse142. How can java access the cheatsheet.pdf file in info201? Write out a similar navigation list as above.
See the solution
Exercise 9.3 Draw the file system tree of your own computer.
- Mark there a) the folder where you keep your info201 materials; b) a few images in your Pictures folder.
- Write long-form directions for accessing one of the images in your Picture folder from your info201 folder.
- Write the same directions in short form.
9.1.3 Home folder, file system, and the absolute path
But there is more on Yucun’s computer than what was shown above. Besides the user files, all computers also contain system files, the apps and data needed to get the computer up and running. Computers may also have more than a single user, and there may also be external drives, such as USB disks or network systems, connected to them. All those are included in the file system tree, but outside of “Yucun’s stuff”. Instead, Yucun’s files are just a part of the whole tree. The figure below shows the file system tree of Yucun’s computer, beginning from its root. There are a few slight differences, depending on whether he is using a Mac or a PC. We discuss the Mac file system first.
If he is using a Mac, all of Yucun’s stuff is located in a folder called “yucun”. This is called his home folder or home directory, and that is the place where all his documents, pictures, music and downloads are located. The images above in Section 9.1.1 depicted just his home folder. The home folder, in turn, is located in the folder “Users”.[14] The “Users” folder, in turn, is directly in the root folder, here labeled as “root /”. However, its actual name is just “/”; the word root is added here only for clarity. The root folder is the “mother of all folders”, the root of the file system. All other folders are either directly or indirectly located inside the root folder, just like Yucun’s home folder is first inside “Users”, which in turn is inside the root “/”.
If Yucun is using Windows, the tree will look slightly different. First, Windows labels the file system root not as “/” but as “This PC”. And second, before we get into actual folders, we have to walk through the drive letters, such as “C:”. But the idea is similar: Yucun’s home folder is located in the root folder “This PC” (always indirectly in case of Windows).
The file explorer may not show the drive letters and directories by default, but they become visible with a click on its navigation bar.
One can write similar directions about how to get to a given folder using the complete file system tree. For instance, to get to Yucun’s “cheatsheet.pdf”, one needs to (assuming he uses a Mac):
- Start at root “/”
- into “Users”
- into “yucun”
- into “Documents”
- into “UW”
- into “info201”
- grab “cheatsheet.pdf” from there
Or in a more compact fashion: “/ - Users - yucun - Documents - UW - info201 - cheatsheet.pdf”
Such navigation rules are called an absolute path. The advantage of an absolute path is that it always tells where the file is located, independent of the current working directory. However, if Yucun decides to move or rename his “info201” folder, then the absolute path is not valid any more.
So there are two ways to navigate the file system: relative path and absolute path. A relative path starts from the current working directory, and one often has to move “up” the tree before descending “down” into another branch. An absolute path starts from the root folder, and there is no need to go “up” again after descending into a directory.
Paths are directions about how to find a file.
- Relative path starts from the current location. For instance “up - into Documents - grab cheatsheet.pdf” is a relative path. “into Applications - into Oct-2023 - grab gradschool-essay” is a relative path.
- Absolute path starts from the root directory. For instance “root - into Users - into yucun - into Documents - grab cheatsheet.pdf” is an absolute path. “root - C: - Users - yucun - Desktop - Applications - Oct-2023 - grab gradschool-essay” is an absolute path.
Exercise 9.4 Yucun again runs a java program inside cse142. How can java access the Ross Lake.jpg file in Pictures using absolute path? Write out a similar navigation list as above.
See the solution
Exercise 9.5 What is the absolute path of your home folder? Draw this as a file system tree.
Exercise 9.6 Pick one image in your Pictures folder.
- Write down the absolute path, as a long form set of directions, for this image.
- Write the same absolute path in short form.
9.2 Accessing files and directories from R
The main reason why we need to understand the concepts of file system tree and path is to be able to load data files into R. We look at data (csv files) a little later; here we just display images.
9.2.1 R’s working directory
To begin with, all R processes have their working directory: they “think” that they are located somewhere in the file system tree. You can ask for the name of the working directory as
getwd()
## [1] "/home/siim/tyyq/info201-book"
Exercise 9.7 Is the example here, “/home/siim/tyyq/info201-book”, relative path or absolute path?
See the solution
Note that each instance of R has its own working directory. This means that when you run R in multiple windows at the same time, all of these may have different working directories. If the working directory in your RStudio console is /Users/yucun/Desktop, then that does not mean that another window that is compiling your homework will have the same working directory!
You can change the working directory using setwd(). However, this is rarely needed, as you should start R in the right folder to begin with, and access files using their path.
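As a minimal sketch of how the two functions work together, the lines below query the working directory, move temporarily into another folder, and then move back. The temporary folder returned by tempdir() is only a stand-in chosen for illustration, because it exists on every computer; it is not a folder from Yucun's tree.

```r
# Remember the current working directory so that it can be restored later.
old_wd <- getwd()
print(old_wd)

# Move into R's temporary folder (an assumption made only for this
# illustration; any existing folder would do).
setwd(tempdir())
print(getwd())

# Go back to where we started.
setwd(old_wd)
print(getwd())
```

The path printed in the middle may look slightly different from what tempdir() reports, as some systems store the temporary folder behind a shortcut (a symbolic link).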
Exercise 9.8 What is the working directory of your R console inside your RStudio?
9.2.2 Accessing files in R
You can see the files in the current working directory using list.files(). For instance, if Yucun runs R inside of his info201 directory, he would see the names cheatsheet.pdf, exercises, homework and stuff.
This is R’s “view” of the same folder. Instead of icons, we see file names. We stress here that R shows the file names in their complete form, including the complete extension (like .pdf) and all spaces and other symbols in the names. The graphical viewers may or may not display the complete names, depending on their configuration.
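To try this without touching any real files, one can build a small throw-away copy of the info201 folder and list its content. This is only a sketch: the folder name info201-demo and the empty placeholder file are assumptions made for the illustration.

```r
# Build a small throw-away copy of the info201 folder inside R's
# temporary folder (hypothetical, for illustration only).
demo <- file.path(tempdir(), "info201-demo")
dir.create(file.path(demo, "exercises"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "homework"), showWarnings = FALSE)
dir.create(file.path(demo, "stuff"), showWarnings = FALSE)
file.create(file.path(demo, "cheatsheet.pdf"))  # an empty placeholder file

# List the content of the folder: both files and folders are shown,
# and file names come with their complete extension.
files <- list.files(demo)
print(files)
```

Here list.files() is given the folder as an argument; called without arguments, it lists the current working directory instead.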
Exercise 9.9
- Find the working directory of your current R instance (R Console inside RStudio).
- Print all file names there using list.files().
- Open the same folder using your graphical file manager.
- Show that it contains the same files that you saw in R!
- Does your graphical file manager show the file names in the same form, or do you see simplified versions of the names?
After figuring out what files are in the working directory, one can easily access those files. Obviously, what you do with the files will depend on the file type. For instance, PDFs and images can be displayed as
library(magick) # you need to install the package first
cheatsheet <- image_read("cheatsheet.pdf")
plot(cheatsheet)
(He first needs to install the package magick though, see Section 3.6).
So in order to access files in your current working directory, you just need the file name. No directions needed here, no folder name or anything else.
But what if Yucun does not want to look at the cheatsheet but wants to access the files in his Pictures folder instead? This can be done by using either an absolute or a relative path.
Exercise 9.10
- Install the “magick” package
- Put an image into your current R working directory
- Use list.files() to ensure the image is there
- Use image_read() and plot() to display the image on screen.
9.2.2.1 Accessing files through relative path
Normally, a relative path is what you want to use, as this allows you to move your project to a different folder and to collaborate with others more easily. It also indicates the folder layout of your project, which is much harder to understand from an absolute path.
We should begin by writing the “Directions” to get into the Pictures folder from info201 (see Section 9.1.1):
- up (into UW)
- up (into Documents)
- up (into Yucun’s stuff)
- into Pictures
Or in a shorter form “up - up - up - into Pictures”.
But these directions are made for humans, not for the computer. We need to translate them to computer language like this: first, take the short form of the directions. And now:
- replace up by two dots ..
- replace the dash “-” with a slash / (not a backslash \!)
- ensure that everything is enclosed in quotes.
So the relative path of Pictures from info201 will look like
"../../../Pictures"
The list.files() command we used above accepts a relative path as an argument, so Yucun can check his images as
list.files("../../../Pictures")
A note about Windows.
Mac and Linux consistently use the slash / as the path separator, but Windows uses the backslash \ by default. That will work too, but you need to use double backslashes \\ instead of a single one. This is because the backslash is a special character inside of character strings, and you need to escape it with another backslash to actually be able to insert one into the string.
In this course we consistently use the forward slash / instead.
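The escaping rule is easiest to see by writing the same path in both ways and comparing the results. The sketch below uses a hypothetical Windows path for Yucun's Pictures folder; nothing is read from disk, so it runs on any computer.

```r
# The same (hypothetical) Windows path written in two ways.
with_slashes     <- "C:/Users/yucun/Pictures"
with_backslashes <- "C:\\Users\\yucun\\Pictures"

# cat() prints the string as it is stored: each "\\" in the code
# is a single backslash in the actual string.
cat(with_backslashes, "\n")

# Hence both strings contain the same number of characters.
same_length <- nchar(with_slashes) == nchar(with_backslashes)
print(same_length)

# Replacing every backslash with a slash gives the first form.
converted <- gsub("\\", "/", with_backslashes, fixed = TRUE)
print(converted)
```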
A very similar approach also helps to display an image there. For instance, in order to display Ross Lake.jpg, you need directions:
- up (into UW)
- up (into Documents)
- up (into Yucun’s stuff)
- into Pictures
- grab Ross Lake.jpg
And the corresponding short directions are “up - up - up - into Pictures - grab Ross Lake.jpg”. In the “computer language” this translates to
"../../../Pictures/Ross Lake.jpg"
Yucun can display it as
lake <- image_read("../../../Pictures/Ross Lake.jpg")
plot(lake)
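The relative path can be tried out on a small copy of Yucun's tree. The sketch below re-creates a part of the tree inside R's temporary folder; the folder name yucun-demo and the empty placeholder files are assumptions for illustration, not real images, so the code checks that the file is found with file.exists() instead of displaying it.

```r
# Re-create a small part of Yucun's tree in a temporary folder
# (hypothetical layout, for illustration only).
root <- file.path(tempdir(), "yucun-demo")
dir.create(file.path(root, "Documents", "UW", "info201"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "Pictures"), showWarnings = FALSE)
file.create(file.path(root, "Pictures", "Ross Lake.jpg"))  # empty placeholder
file.create(file.path(root, "Pictures", "fractal.png"))    # empty placeholder

# Pretend that R was started in info201; setwd() returns the previous
# working directory, which we keep in order to restore it later.
old_wd <- setwd(file.path(root, "Documents", "UW", "info201"))

# "up - up - up - into Pictures"
pictures <- list.files("../../../Pictures")
print(pictures)

# "up - up - up - into Pictures - grab Ross Lake.jpg"
found <- file.exists("../../../Pictures/Ross Lake.jpg")
print(found)

# Restore the original working directory.
setwd(old_wd)
```

Note that the space in the file name needs no special treatment, as long as the whole path is enclosed in quotes.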
Exercise 9.11
- What is the working directory of your current R instance? (In the RStudio console.)
- Pick an image in your Pictures folder
- Draw the file system tree that includes both the current R working directory and that image (in the Pictures folder).
- Write down the directions to get to the image from R’s working directory.
- What is the relative path of the image, from the current R working directory, using R notation?
- Use image_read() with the relative path, and plot() to show the image on screen!
9.2.2.2 Accessing files through absolute path
Alternatively, we can access the files through an absolute path. For a small project that only runs on your computer, the relative and absolute path will work equally well. The absolute path has two distinct advantages:
- If you need to access data or files that are not connected to the current project, you may need an absolute path.
- Many graphical file managers have an option to display and copy the absolute path of files. This is very helpful for beginners.
However, an absolute path is harder to use when your code also has to run on your teammates’ computers, and it is virtually impossible to use if the code also has to run on cloud servers. The absolute path is also somewhat different for unix (Mac and Linux) and for Windows.
The absolute path directions to the image folder (if Yucun uses Mac) are (see Section 9.1.3):
- Start at root “/”
- into Users
- into yucun
- into Pictures
or “/ - Users - yucun - Pictures”. This translates in exactly the same way as the relative path into
"/Users/yucun/Pictures"
In a similar fashion, Yucun can display the Ross Lake picture as
lake <- image_read("/Users/yucun/Pictures/Ross Lake.jpg")
plot(lake)
If Yucun uses Windows, the absolute path looks like
- Start at root “This PC”
- into drive “C:”
- into Users
- into yucun
- into Pictures
Or “start at root This PC - C: - Users - yucun - Pictures”. The difference is that on Windows, the root directory “This PC” is not marked, so the absolute path is
"C:/Users/yucun/Pictures"
Yucun can now display the Ross Lake picture as
lake <- image_read("C:/Users/yucun/Pictures/Ross Lake.jpg")
plot(lake)
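If you are unsure what the absolute path of a file is, R can compute it from a relative path with normalizePath(). The sketch below demonstrates this on a throw-away folder; the folder name yucun-abs-demo and the empty placeholder file are assumptions for illustration only.

```r
# A throw-away folder with a Pictures folder and a placeholder file
# (hypothetical, for illustration only).
root <- file.path(tempdir(), "yucun-abs-demo")
dir.create(file.path(root, "Pictures"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "Pictures", "Ross Lake.jpg"))

# Move into the folder and convert a relative path to an absolute one.
# winslash = "/" asks for forward slashes on Windows as well.
old_wd <- setwd(root)
abs_path <- normalizePath("Pictures/Ross Lake.jpg", winslash = "/")
print(abs_path)

# Move back: the absolute path is still valid from the other
# working directory, unlike the relative path used above.
setwd(old_wd)
still_found <- file.exists(abs_path)
print(still_found)
```

The printed path depends on the computer the code runs on, which is exactly why absolute paths are hard to share with teammates.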
Exercise 9.12
- Pick a pdf document in your Documents folder (or wherever you keep your documents)
- Draw the file system tree that includes the document (in the Documents folder).
- Write down the directions about how to get to the document from the root folder.
- What is the absolute path of the document using the R notation?
- Use image_read() with the absolute path, and plot() to show the pdf on screen!
[14] The Linux file system is fairly similar to that of the Mac, just with “home” instead of “Users”.