R Configuration for Convenient Computing
John Miyamoto                                                                                                   November 7, 2010
Psychology, Box 351525, University of Washington, Seattle, WA 98195
Email:  jmiyamot "at symbol" uw.edu


Configuring R for Convenient Computing

R is a beautiful programming environment for data analysis, statistical modeling, and general purpose computing.  For those who are in the early stages of learning R, however, there are aspects of R that may seem awkward or inconvenient. 

·   For one thing, how do you keep track of all of the different objects that are created in the course of an R analysis? 

·   What is a good way to structure the collections of objects that are created for different, but related projects? 

·   If you write general purpose functions that are useful in many different projects, how can you make them easily accessible when working on different projects? 

Actually, there are easy, although somewhat non-obvious, solutions to these questions and other similar questions.  The basic solution is to configure R so that it automatically creates a convenient configuration and file structure whenever you run R.  The purpose of this document is to describe how to create a convenient default configuration for R. 

 

The discussion below will be easier to follow if you are running the R program, so start R by double clicking on an R icon.

 

Table of Contents

Section   Topic

1    The search path

2    The relationship between .RData and .GlobalEnv

3    The basic strategy for organizing files of R objects

4    Creating files of R functions or objects:  The save function

5    Putting R object files on the search path:  The load and attach functions

6    Taking files off of the search path

7    How to move objects from .GlobalEnv to other files on the search path.  How to delete objects from files that are on the search path.

8    What happens when you start up R?

9    What is the sequence of functions that run automatically when R starts up?

10   Rprofile.site - the global R configuration file

11   .First function - the startup configuration function

12   Glossary of Terms

 

1.   The search path:  How R finds the functions and objects (e.g., variables or data sets) that are referenced in the code for an analysis. 

Whenever you enter an R-expression at the R-prompt, R looks for functions or objects that correspond to the functions or objects that are referred to in the expression.  For example, suppose you type the following at the R-prompt (the initial ">" is the R-prompt):

 

>X <- Y + Z

 

R will look for objects called "Y" and "Z", and try to assign to "X" the sum of the values assigned to "Y" and "Z".  To find Y and Z, R looks through a series of files for objects with these names.  This series of files is called the search path for R.  You can see the current search path by giving the R command, search().  For example, on my computer, search() produces the information shown in Table 1.  This output tells us that .GlobalEnv is in position 1 (the top position on the search path).  Positions 2 - 10 are occupied by other files that contain useful functions.  Position 10 contains the base package, which constitutes the default basic set of functions that define the R programming language.  The specific search path will differ on different computers, or even on the same computer in different R-sessions; later I will explain how the user can control which files will be placed on the search path.   

 

Table 1

> search()

 [1] ".GlobalEnv"        "file:data.rda"  "package:stats"     "package:graphics"

 [5] "package:grDevices" "package:utils"  "package:datasets"  "package:methods" 

 [9] "Autoloads"         "package:base"  

 

 

If the expression, "X <- Y + Z", is entered at the R-prompt, R looks for objects corresponding to "Y" and "Z".  R looks first in .GlobalEnv, second in file:data.rda, third in package:stats, fourth in package:graphics, etc.  R assigns to each name the first object with a matching name that it finds in this search.  For example, suppose that there is an object named "Y" in data.rda where "Y" has the value 4, and there is another object named "Y" in package:datasets where it has the value 10, and there is no object named "Y" in any other file along the search path.  Y will be assigned the value 4 because this is the value of Y in the earliest file on the search path.  Suppose that there is an object named "Z" in .GlobalEnv where it has the value 2, and there is another object named "Z" in package:stats where it has the value 15, and there is no object named "Z" in any other file along the search path.  Then Z will be assigned the value 2 because this is the value of Z in the earliest file on the search path.  The expression "X <- Y + Z" results in assigning the number 6 to "X" (because 6 = 4 + 2).  Furthermore, the object called X will be placed in .GlobalEnv (if there previously existed an object called X in .GlobalEnv, that object would be replaced by the X whose value is 6). 

In general, whenever an R-expression is entered at the R-prompt, R looks for functions and objects that correspond to the names of functions and objects that are referenced in the expression.  R will always interpret such references according to the first function or object of the same name[1] that occurs along the search path. 

Summary:  When you run R, R creates a search path which is a sequence of files in which it searches for functions and objects that are named in any R expressions. 
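The lookup rule described above can be tried out directly.  The following is a minimal sketch rather than part of the original example:  it uses two small lists, attached under the made-up names "demo.far" and "demo.near", as stand-ins for files on the search path, and it assumes that there is no object called Y in .GlobalEnv when it is run.

```r
# Stand-ins for two files on the search path (hypothetical names).
# Each attach() uses position 2 by default and pushes earlier entries down,
# so the database attached last ends up closest to .GlobalEnv.
attach(list(Y = 10), name = "demo.far")    # ends up in position 3
attach(list(Y = 4),  name = "demo.near")   # ends up in position 2

Z <- 2          # created at the prompt, so it lives in .GlobalEnv (position 1)

search()[1:3]   # the first three positions on the search path

X <- Y + Z      # Y is taken from demo.near, the earliest match
X               # 6

# The other Y is still there, but it has to be asked for explicitly
get("Y", envir = as.environment("demo.far"))

# Clean up: take the two demonstration databases off the search path
detach("demo.near", character.only = TRUE)
detach("demo.far", character.only = TRUE)
```

When the second list is attached, R may print a message noting that Y is masked; this is simply R's way of pointing out that two objects on the search path share a name.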

The remainder of this document discusses a number of topics that are related to the search path:

·   How to put files onto the search path or take them off of the search path.

·   How to move R functions and objects from one file on the search path to another file on the search path. 


2.   The relationship between .RData and .GlobalEnv

Each R icon has a specific working directory associated with it.  To find out the directory that is associated with an R icon, right mouse click on the R icon, and choose the Shortcut tab from the Properties dialog box.  The "Start in" field indicates the directory which is the default working directory that is associated with the R icon. 

Suppose that we double click on an R icon that I will call R.Icon.A, and suppose its working directory happens to be C:\mydata.  There are two cases to consider:

Case 1:  Suppose that when you double click on R.Icon.A, there does not already exist a file called .RData in C:\mydata.  If this is so, then R creates an empty .GlobalEnv, i.e., the initial .GlobalEnv will have no objects in it.  Later, when you exit from the R program you will be given the opportunity to save all objects that are currently in .GlobalEnv.  If you choose to save these objects by answering "yes" to the query, R will create a file called .RData in C:\mydata and will save all objects in .GlobalEnv to this file. 

Case 2:  Suppose that when you double click on R.Icon.A, there already exists a file called .RData in C:\mydata.  If this is so, then R will load the objects that are in .RData into .GlobalEnv.  Thus, the initial contents of .GlobalEnv will be the objects that were saved in a previous R session that ran in this same working directory.  Just as in Case 1, when you exit from the R program you will be given the opportunity to save all objects that are currently in .GlobalEnv.  If you choose to save these objects by answering "yes" to the query, R will replace the previous version of .RData with a new version that contains all of the objects that are currently in .GlobalEnv.

I should mention that in addition to creating a new .RData file when you exit from R, you can also save all objects in .GlobalEnv to .RData in the middle of an R session by using the save.image function. 

Summary:  .GlobalEnv contains all of the current objects in an R session.  At the beginning of an R session, .GlobalEnv contains all objects that were saved to a .RData file at the end of a previous R session in this working directory, or if no such .RData file exists, then .GlobalEnv is empty.  At the end of an R session, the user gets to decide whether or not to create a new .RData file that contains all of the objects in .GlobalEnv. 
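The two cases can be checked from inside a running R session.  The lines below are a small sketch of how one might do this; the file name demo.RData is made up for the illustration, and the image is written to R's temporary directory so that no real .RData file is overwritten.

```r
getwd()                  # the working directory of the current R session
file.exists(".RData")    # TRUE corresponds to Case 2, FALSE to Case 1
ls()                     # names of the objects currently in .GlobalEnv

# save.image() with no arguments writes .RData in the working directory.
# Here the image goes to a temporary file instead (hypothetical name),
# so that an existing .RData file is left untouched.
demo.image <- file.path(tempdir(), "demo.RData")
save.image(file = demo.image)
file.exists(demo.image)
```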


3.   The basic strategy for organizing files of R objects

Most users of R are working on a number of different projects.  They have data objects (vectors, matrices, dataframes) that are relevant to one project and other data objects that are relevant to other projects.  How should the user keep track of the objects that are relevant to these different projects?  I will first describe an easy, but bad strategy for storing R data objects and functions that pertain to different projects.  Next I will describe a somewhat better, but still dubious strategy for this purpose.  Finally, I will describe what I regard as the basic strategy for organizing files of R objects for different projects. 

A bad strategy:  Suppose you always double click on the same R icon to start R.  Unless you resort to other extra measures to be described below, what this implies is that all of your R objects will be stored in the same .RData file (the .RData file that is in the working directory of your R icon).  If you are working on multiple projects, then all objects and functions for these projects will be mixed together.  Ultimately, this will cause an unbearable headache for the data analyst.  

A good idea:   One can create multiple R icons for different projects.  For example, if you already have an R icon called R.icon.A, right click on this icon and choose "Send to" from the menu that appears.  Choose "Desktop" as the location for the new icon.  This will create a copy of R.icon.A on the Desktop which I will call R.icon.A.copy.  Right click on R.icon.A.copy, and choose "Properties" from the menu.  You can change the name of this icon on the "General" tab and you can change the working directory of this icon on the "Shortcut" tab.  I will assume that you have changed the name of the new icon from R.icon.A.copy to R.icon.B.  Indeed you will probably want to create R icons that start in the working directories of each of your projects, and to give names to the R icons that are reminders for these projects.  If each project has a different working directory, e.g., C:\project.1, C:\project.2, C:\project.3, etc., and if you have R icons, R.icon.1, R.icon.2, R.icon.3, that start in these respective directories, then there will be a separate .RData file in each of these directories.  This will help you keep track of the R objects for different projects.  But this idea by itself is insufficient (see next paragraph). 

A dubious strategy:  Although one can create a different R icon for each project, and assign a different, appropriate working directory to each R icon, it is still not a good idea to use the .RData file to store all data objects and functions for a given project.  The reason is that typical analyses require the creation of many objects that have some temporary value in the context of the analysis, even if they have no permanent value to the user.  For example, a given analysis might find it useful to create several vectors, x1, x2, x3, and two matrices, m1 and m2, even though these objects have no lasting value once the analysis is completed.  If one does not delete these objects, then they will remain in .RData, and the user is left to wonder at some future date what these objects are and whether they can be replaced by new objects with the same names.  Alternatively, one can end each R session with some housekeeping activity, carefully going through the current contents of .GlobalEnv and deleting objects that have no future value, but this housekeeping activity is tedious and somewhat prone to error (one can delete the wrong objects, or fail to delete objects). 

The basic strategy:  I propose that it is a better strategy to create files of objects that have permanent value with respect to a project, to attach these files to the search path, and to move objects that have permanent value for a project to these files.  For example, suppose that files pertaining to Project 1 are stored in a directory, C:\project1.  I propose that we should create a file called project1.rda that is stored in C:\project1 that will be used to store R objects that have permanent value for Project 1.  To implement this strategy we will need to know how to do several things:

(i)    how to create a file of R objects like project1.rda (see Section 4).

(ii)   how to put a file like project1.rda on the search path (see Section 5) so that R objects in project1.rda become accessible in the current R session.

(iii)  how to move objects that are in .GlobalEnv to other files that are on the search path, and how to delete objects that are in these other files (see Section 7). 

In other words, if we have a file called project1.rda, we would like to put it on the search path as shown in Table 2. 

Table 2

> search()

 [1] ".GlobalEnv"              "file:c:/project1/project1.rda"

 [3] "package:stats"           "package:graphics"            

 [5] "package:grDevices"       "package:utils"                

 [7] "package:datasets"        "package:methods"             

 [9] "Autoloads"               "package:base"    

 

Any objects that are in project1.rda will be accessible for use in the current data analysis because they are on the search path.  If we create new functions, vectors, matrices and dataframes that have permanent value for Project 1, we can move them from .GlobalEnv (where they are located at the time they are first created) to file:c:/project1/project1.rda (the R name for project1.rda after it has been attached to the search path).  This strategy has several advantages:

·   R objects that have only temporary value remain in .GlobalEnv; they do not become confused with objects that have permanent value.  If you create an R object that has permanent value for a project, then you can move it to the project file, project1.rda in this example.

·   Whenever you want to clear away all old objects that have temporary value, you simply have to delete all objects in .GlobalEnv (this is easy to do).  You don't have to pick out the temporary objects from the permanent objects in .RData. 

·   You don't have to worry about destroying an object that has permanent value.  For example, suppose you have a dataframe called summary.dat that summarizes the results of an important analysis.  If summary.dat were in .RData and hence were loaded into .GlobalEnv at the start of a future session, you could accidentally give the name "summary.dat" to a new R object, thereby destroying the old version of summary.dat.  By keeping summary.dat in project1.rda, you reduce the chances that you will inadvertently destroy it.  (You can still make this mistake, but not so easily.) 


4.   Creating files of R functions or objects:  The save function

The save function is used to save R functions or objects to a file on your hard drive.  To illustrate the use of this function, we should first create a few R objects.  Assuming that you are currently running R, give the commands:

 

  xx = c(1, 3, 5)

  yy = c(2, 4, 6)

  mm = rbind(xx, yy)

  sq.fn = function(X) {Y = X^2; return(Y)}

We can check that xx and yy are vectors, mm is a matrix, and sq.fn is a function, by giving the following commands:

 

  xx
  yy
  mm

  sq.fn(5)
  sq.fn(xx)
  sq.fn(mm)

 

Now we will create a file on your hard drive that contains these objects and the function (henceforth, I will simply say "R objects" when I mean "R functions and objects" since a function counts as an object and the former term is shorter). 

 

  save(list = c("xx", "yy", "mm", "sq.fn"), file = "c:/temp/temp1.rda")      #1

 

The list argument of the save function names the objects to be saved, and the file argument indicates the file to which the objects are to be saved including the path to this file.  Two comments about the save function:

(i)    In Windows, the \ is used as the separator between successive directories; thus, c:\temp\temp1.rda would be the standard Windows syntax for designating the temp1.rda file in the c:\temp directory.  But in R, the successive directories must be separated by either / or \\ because a single backslash, \, has a different meaning in R (it begins an escape sequence inside a character string);  thus, c:/temp/temp1.rda and c:\\temp\\temp1.rda are legitimate ways to designate this file in R.  By the way, it is assumed in this context that there already exists a directory called c:\temp on your hard drive; if it doesn't already exist, you will have to create a c:\temp directory before trying to use R to save files to this directory.    

(ii)   WARNING:  If there already exists a file called c:\temp\temp1.rda on your computer, then the save command #1 will overwrite this file, replacing it with a new file that contains xx, yy, mm, and sq.fn as its only objects.  So if there already exists a file with this name and you don't want to lose it, then you need to change its name or change the name of the to-be-created file before issuing the save command.  
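The two comments above can be illustrated with a short sketch.  It is only one possible way to guard against the problems described, and it makes some assumptions that are not in the text:  the file is written to R's temporary directory rather than to c:\temp, and the names bad.path, out.file, check.env and saved.names are invented for the example.

```r
# The objects from the example above
xx <- c(1, 3, 5)
yy <- c(2, 4, 6)
mm <- rbind(xx, yy)
sq.fn <- function(X) {Y <- X^2; return(Y)}

# A single backslash starts an escape sequence inside an R string:
# "\t" is a tab character, so this string is not the path it appears to be.
bad.path <- "c:\temp\temp1.rda"
grepl("\t", bad.path, fixed = TRUE)    # TRUE: the string contains tabs

# Hypothetical target file in R's temporary directory (instead of c:/temp)
out.file <- file.path(tempdir(), "temp1.rda")

# Save only if no file with this name exists, so nothing is overwritten
if (file.exists(out.file)) {
  message("Not saving: ", out.file, " already exists")
} else {
  save(list = c("xx", "yy", "mm", "sq.fn"), file = out.file)
}

# Check what the file contains by loading it into a scratch environment;
# load() returns the names of the objects it restored.
check.env <- new.env()
saved.names <- load(out.file, envir = check.env)
sort(saved.names)
```

Loading into a scratch environment is a convenient way to inspect a file of R objects without disturbing anything in .GlobalEnv.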


5.   Putting R object files on the search path:  The load and attach functions

Suppose that c:\temp\temp1.rda is a file of R objects that was previously created with a save command (as explained in Section 4).  To put this file on the search path, use either of the following commands:

Table 3

 attach("c:/temp/temp1.rda", pos = 2)    #2

Puts c:\temp\temp1.rda on the search path in position 2. 

 attach("c:/temp/temp1.rda")             #3

#2 and #3 have exactly the same effect because pos = 2 is the default position for attaching a file.

 

Either #2 or #3 will produce a search path like the one shown in Table 2, with file:c:/temp/temp1.rda in position 2.  As noted in Section 3, attaching c:\temp\temp1.rda to the search path makes all objects in temp1.rda available for R computing. 

The load function provides an alternative way to access the R objects in a file.  Basically, load loads the objects in a file into .GlobalEnv (replacing any objects that have the same name as objects in the loaded file), whereas attach puts a file on the search path without changing .GlobalEnv.  For example, recall that the objects in temp1.rda are:  mm, sq.fn, xx, and yy and mm is a matrix as shown below:

 

> mm

   [,1] [,2] [,3]

xx    1    3    5

yy    2    4    6

 

Let's create a different version of mm:

 

  mm = "This is a new mm"

  mm

  [1] "This is a new mm"

 

Now load temp1.rda and see what is the value of mm. 

 

  load("c:/temp/temp1.rda")

  mm

 

> mm

   [,1] [,2] [,3]

xx    1    3    5

yy    2    4    6

 

The output shows that loading c:\temp\temp1.rda has the effect of replacing the current value of mm ("This is a new mm") with the matrix version of mm. 

Summary:  If C:\path\rfile.rda is a file of R objects (created by the save function), then the attach function can be used to put this file on the search path (makes all objects in C:\path\rfile.rda available for use in R), and load can be used to put all objects in C:\path\rfile.rda in .GlobalEnv, replacing any objects with the same names. 
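The difference between attach and load can be seen side by side in the following sketch.  It assumes a made-up file name, attach_demo.rda, in R's temporary directory, so that it can be run without a c:\temp directory; otherwise it follows the mm example above.

```r
# Hypothetical demonstration file in R's temporary directory
demo.file <- file.path(tempdir(), "attach_demo.rda")

mm <- rbind(xx = c(1, 3, 5), yy = c(2, 4, 6))   # the matrix version of mm
save(list = "mm", file = demo.file)

mm <- "This is a new mm"       # a different mm in .GlobalEnv

# attach puts the file on the search path but leaves .GlobalEnv alone
attach(demo.file)
is.character(mm)               # TRUE: the .GlobalEnv version is found first
is.matrix(get("mm", pos = 2))  # TRUE: the matrix sits in position 2

# load copies the objects into .GlobalEnv, replacing the string version
load(demo.file)
is.matrix(mm)                  # TRUE

detach(pos = 2)                # take the demonstration file off the search path
```

When the file is attached, R may print a message that mm is masked by .GlobalEnv; this is the search path rule from Section 1 at work.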


6.   Taking files off of the search path

The detach function is used to detach a file from the search path. 
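As a brief illustration, the sketch below attaches a small list under the made-up name "demo.db" and then detaches it again.  A file that was attached as in Section 5 could be detached in the same way, either by the name that search() reports for it or by its position.

```r
# Attach a small list under a hypothetical name, as a stand-in for a file
attach(list(demo.value = 42), name = "demo.db")
"demo.db" %in% search()       # TRUE: the database is on the search path
demo.value                    # 42: its objects are accessible

# Detach by name; character.only = TRUE says the name is given as a string
detach("demo.db", character.only = TRUE)
"demo.db" %in% search()       # FALSE: no longer on the search path
exists("demo.value")          # FALSE: its objects are no longer accessible

# Detaching by position is also possible, e.g. detach(pos = 2)
```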

 


7.   How to move objects from .GlobalEnv to other files on the search path.  How to delete objects from files that are on the search path.

 

Table 4

> search()

 [1] ".GlobalEnv"             "file:c:/path/rfile.rda" "package:stats"        

 [4] "package:graphics"       "package:grDevices"      "package:utils"        

 [7] "package:datasets"       "package:methods"        "Autoloads"            

[10] "package:base" 
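Table 4 shows a search path with a file of R objects attached in position 2.  What follows is one possible way, using only standard R functions, to move an object from .GlobalEnv to such a file and to delete an object from it; it is a sketch under stated assumptions and not necessarily the procedure intended for this section.  The file is created in R's temporary directory, and the names proj.file, old.obj and summary.dat are invented for the illustration.

```r
# Set up: a hypothetical project file containing one object, attached at position 2
proj.file <- file.path(tempdir(), "rfile.rda")
old.obj <- 1:3
save(list = "old.obj", file = proj.file)
rm(old.obj)
attach(proj.file, pos = 2)

# A new object is created in .GlobalEnv, as usual
summary.dat <- data.frame(cond = c("a", "b"), mean = c(1.5, 2.5))

# Move it: copy it to position 2, then remove the copy in .GlobalEnv
assign("summary.dat", summary.dat, pos = 2)
rm(summary.dat)

# Delete an object from the attached file
rm(list = "old.obj", pos = 2)

# The attached copy lives in memory only; write it back to disk to keep the changes
save(list = ls(pos = 2, all.names = TRUE), file = proj.file,
     envir = as.environment(2))

detach(pos = 2)
```

The final save is the step that is easy to forget:  attach works on a copy of the file's contents, so changes made at position 2 are lost at the end of the session unless they are saved to the file again.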

 

 

 


8.   What happens when you start up R?


To change the default working directory that is associated with an R icon, right mouse click on the R icon, choose Properties, select the Shortcut tab, and change the "Start in" field to the directory of choice. 

 


 

 

9.   What is the sequence of functions that run automatically when R starts up?

 


10.  Rprofile.site - the global R configuration file

* C:\Program Files\R\R-2.10.1\etc\Rprofile.site
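The path above is specific to one R version and one installation.  As a small sketch, the location of this file for the R installation that is currently running can be found from within R; the name site.file is made up for the example.

```r
# R.home("etc") gives the "etc" directory of the running R installation
site.file <- file.path(R.home("etc"), "Rprofile.site")
site.file                # full path to where Rprofile.site is expected
file.exists(site.file)   # FALSE if this installation has no such file
```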

 


11.  .First function - the startup configuration function

* .First runs automatically whenever R starts up.  The .First function runs after any actions that are contained in Rprofile.site have been carried out (and after any saved .RData file has been loaded). 
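As an illustration of the kind of startup configuration that .First makes possible, here is a minimal sketch of a .First function that could be placed in Rprofile.site.  It is not a definitive configuration:  the project file c:/project1/project1.rda is the hypothetical file from Section 3, and the function does nothing with it if the file does not exist.

```r
.First <- function() {
  # Hypothetical project file from Section 3; change this to suit your project
  project.file <- "c:/project1/project1.rda"

  # Put the project file on the search path in position 2, if it exists
  if (file.exists(project.file)) {
    attach(project.file, pos = 2)
  }

  # Report where this R session is working
  cat("Working directory:", getwd(), "\n")
}
```

With a function like this in place, every R session would start with the project file already attached, which is the arrangement shown in Table 2.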

 

 


12.  Glossary of Terms

.GlobalEnv

.GlobalEnv is the current working environment as referenced within an R session.  As you create objects during your current R session, they are stored by default in .GlobalEnv.  If you save your session as you exit R, the objects in .GlobalEnv will be saved permanently in a file called .RData in the working directory associated with the R icon. 

.RData

.RData is a file on a computer's hard drive that contains the R functions and objects that were created during an R session.  See Section


 

 

 

 



[1]     Recent versions of R allow functions and data objects to have the same name without ambiguity.  Thus, when R encounters new.name(x = 12), R knows that new.name refers to a function named "new.name" because the parentheses could only follow a function.  When R encounters, y <- x + new.name, R knows that "new.name" refers to a data object because something is added to it.  R permits a function and data object to have the same name because it is always clear from context whether the name designates a function or data object.