Chapter 6 Lists

This chapter covers an additional R data type, the list. Lists are in many ways similar to atomic vectors (they are “generalized vectors”!), but they can store more types of data and more details about that data, at some cost in efficiency. Lists are also a way to create R’s version of a Map data structure, a common and extremely useful way of organizing data in a computer program. Most importantly, lists are used to create data frames, the primary data storage type for working with sets of real data in R. This chapter covers how to create lists and access their elements, as well as how to apply functions to lists or vectors.

Despite being in many ways similar to atomic vectors, list syntax is less intuitive and may be quite confusing for beginners. In particular, the single-bracket and double-bracket notation does not feel intuitive, and the way R prints lists is harder to read as well.

6.1 What is a List?

A list is a lot like an atomic vector: it is also a one-dimensional, ordered collection of data. Exactly as with atomic vectors, list elements preserve their order and have a well-defined position in the list. However, lists differ from vectors in a few major ways:

  1. Unlike a vector, a list can store elements of different types: a single list can contain numeric data, character strings, functions, and even other lists, all at the same time. A particularly useful feature of lists is that a component can itself be a vector, i.e. not just a single number or word, but a sequence of numbers or words.

  2. Because lists can contain any type of data, they are much less efficient than vectors. The vectorized operations that handle atomic vectors on the fly will either fail completely on lists or, if they work, may be substantially slower. Hence one should prefer atomic vectors over lists where possible.

  3. Elements in a list can be named just like the elements of a vector, but unlike vectors, lists offer a convenient shorthand, the $-construct, to extract named elements.
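The first two differences can be checked directly in the console. The following is only an illustrative sketch; the variable names (mixed_vec, mixed_list, doubled, list_result) are made up for this example. It shows that c() coerces mixed values to a common type, that list() keeps each value's own type, and that arithmetic on a list raises an error.

```r
# c() coerces everything to a common type (here: character)
mixed_vec <- c(1, "two", TRUE)
class(mixed_vec)

# list() keeps every element's own type
mixed_list <- list(1, "two", TRUE)
sapply(mixed_list, class)

# vectorized arithmetic works on atomic vectors ...
doubled <- c(1, 2, 3) * 2
doubled

# ... but fails on lists; tryCatch() captures the error instead of stopping
list_result <- tryCatch(list(1, 2, 3) * 2,
                        error = function(e) "error")
list_result
```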

Lists are extremely useful for organizing data. They allow you to group data in a logical manner: for example, you can collect a person’s name (character), job title (character), salary (number), and union membership (logical) into a single list. You don’t even have to remember whether the person’s name or title was the first element! In this sense lists can be used as a quick alternative to formal classes, i.e. objects that store heterogeneous data in a consistent way. This is one of the primary uses of lists.

R prints lists in a distinct way. Below are three examples: a vector, an unnamed list, and a named list. All contain only three elements, 1, 2, and 3.

## vector
c(1, 2, 3)
## [1] 1 2 3
## unnamed list
list(1, 2, 3)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## named list 
list(a=1, b=2, c=3)
## $a
## [1] 1
## 
## $b
## [1] 2
## 
## $c
## [1] 3

As one can see, the vector printout is rather easy to understand, but lists require a more careful look. R prints lists in this somewhat unintuitive way because each element may be a different kind of object, including a vector or a function. Unlike the elements of a vector, such objects cannot easily be printed on a single line.

Exercise 6.1 Create a vector from the three arguments 1, 2:4 and 5 using the combine function c(). Create a list from the same three arguments using the list() function. Print both and explain the difference.

See the solution

6.2 Creating Lists

An intuitive and handy way to create lists is to use the list() function and pass it any number of arguments (separated by commas) that you want to put into that list—this is similar to the c() function for creating vectors.

However, if your list contains heterogeneous elements, it is usually a good idea to specify a name (or tag) for each element, in the same way you can name vector elements in c(): write the name tag (which is like a variable name), followed by an equals sign (=), followed by the value you want to store under that tag. For example, you may put an employee’s data into a list as

person <- list(name = "Ada", job = "Programmer", salary = 78000,
               union = TRUE)
person
## $name
## [1] "Ada"
## 
## $job
## [1] "Programmer"
## 
## $salary
## [1] 78000
## 
## $union
## [1] TRUE

This creates a list of 4 elements: a character string "Ada" named name, character string "Programmer" named job, a number 78000 named salary, and logical TRUE named union. The output lists all component names following the dollar sign $ (more about it below), and prints the components themselves right after the names.

  • Note that you can have vectors as elements of a list. In fact, each of these scalar values is really a vector (of length 1), as indicated by the [1] preceding its value!

  • The use of the = symbol here is an example of assigning a value to a specific named argument. You can actually use this syntax for any function (i.e., rather than listing arguments in order, you can explicitly “assign” a value to each argument), but it is more common to just rely on the normal order of the arguments if there are only a few.
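As a small illustration of named arguments outside of list(), the sketch below calls the base R function seq() in three ways. The variable names (by_position, by_name, reordered) are hypothetical and only serve the example.

```r
# positional arguments: values are matched by their order
by_position <- seq(1, 10, 3)

# named arguments: the same call, but each value is tied to an argument name
by_name <- seq(from = 1, to = 10, by = 3)

# with names, the order of the arguments no longer matters
reordered <- seq(by = 3, to = 10, from = 1)

by_position
by_name
reordered
```

All three calls produce the same sequence; only the way the values are matched to the arguments differs.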

If needed, you can see the names of the list elements with the names() function:

names(person)
## [1] "name"   "job"    "salary" "union"

This is often a very useful way to understand the structure of an unfamiliar list, as the names quite often suggest the meaning of the components.

It is possible to create a list without tagging the elements. In this case the printout is a little less intuitive, and one has to keep track of which element is at which position:

person2 <- list("Ji", 123000, FALSE)
person2
## [[1]]
## [1] "Ji"
## 
## [[2]]
## [1] 123000
## 
## [[3]]
## [1] FALSE

Note that instead of the $name-tags we see element indices in double brackets like [[1]] (more about it below).

Unnamed lists are common when all elements are similar, e.g. we can create a list of all employees:

employees <- list(person, person2)

Here there is little need to keep track of names, as all elements are employees and contain their personal data.

Exercise 6.2 Create and print out the resulting list employees. What does the output look like? Can you explain why it looks the way it does?

See the solution

If needed, one can assign names to an existing list using the names(list) <- construct:

names(person2) <- c("name", "income", "membership")
person2
## $name
## [1] "Ji"
## 
## $income
## [1] 123000
## 
## $membership
## [1] FALSE

Creating a list without names and assigning the names later is usually the more error-prone and harder way to make lists manually, but it may be a very good option when your code creates lists automatically.

Finally, empty lists of a given length can be created using the general vector() function. For instance, vector("list", 5) creates a list of five NULL elements. This is a good approach if you just want an empty list to be filled in a loop later.
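A minimal sketch of this pattern follows. The names n, squares and all_null_before are chosen for the example only, and what the loop stores (a number and its square) is arbitrary.

```r
n <- 3
squares <- vector("list", n)   # a list of n NULL elements

# check that every element is NULL before the loop
all_null_before <- all(vapply(squares, is.null, logical(1)))
all_null_before

for (i in seq_len(n)) {
  # double brackets assign to the element itself
  squares[[i]] <- c(value = i, square = i^2)
}

length(squares)
squares[[3]]
```

Creating the list at its final length first avoids growing it one element at a time inside the loop.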

6.3 Accessing List Elements

There are four ways to access elements in lists. Three of these mirror atomic vector indexing, although with important differences; the fourth, the $-construct, is unique to lists.

6.3.1 Indexing by position

You can always access list elements by their position. This is in many ways similar to indexing atomic vectors, with one major caveat: indexing with single brackets extracts not the components themselves but a sublist that contains just those components:

# note: this list is not an atomic vector, even though its elements have the same type
animals <- list("Aardvark", "Baboon", "Camel")
animals[c(1,3)]

## [[1]]
## [1] "Aardvark"
## 
## [[2]]
## [1] "Camel"

You can see that the result is a list with two components, “Aardvark” and “Camel”, picked from positions 1 and 3 of the original list.

The fact that single brackets return a list when applied to a list is actually a smart design choice. First, they cannot return a vector in general: the requested components may be of different types and simply not fit into an atomic vector. Second, single-bracket indexing of a vector also returns a subvector; we just tend to overlook that a “scalar” is actually a length-1 vector. But however smart this design decision may be, people tend to learn it the hard way. When confronted with weird errors, check that what you think should be a vector is in fact a vector and not a list.

The good news is that there is an easy way to extract components. A single element, and not just a length-one sublist, is extracted with double brackets. For instance,

animals[[2]]

## [1] "Baboon"

returns a length-1 character vector.
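The difference between the two bracket types is easy to overlook, so here is a small sketch of the kind of error it can cause. The list scores and the other variable names are hypothetical example data. The example relies on the fact that sum() does not accept a list as its argument.

```r
scores <- list(10, 20, 30)    # hypothetical example data

one_item <- scores[1]         # single brackets: a list of length 1
is.list(one_item)
is.numeric(one_item)

# sum() expects numbers, not a list, so this call fails;
# tryCatch() captures the error instead of stopping
attempt <- tryCatch(sum(scores[1:2]), error = function(e) "error")
attempt

# extracting the components with double brackets avoids the problem
total <- sum(scores[[1]], scores[[2]])
total
```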

Unfortunately, the good news ends here. You can extract individual elements in this way, but you cannot get a vector of several list components: animals[[1:2]] will give you a subscript out of bounds error, because a vector inside double brackets is interpreted as recursive indexing, i.e. as animals[[1]][[2]]. As above, this is a design choice: as list components may be of different types, you may not be able to mold them into a single vector.

There are ways to merge components into a vector, provided they are of the same type. For instance, Reduce(c, animals) will convert animals into a vector of a suitable type, and so will as.character(animals).
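The sketch below puts these points together. Note that unlist() is not mentioned in the text above; it is the base R function commonly used for flattening a list and is added here for comparison. The nested list and the variable names are made up for the example.

```r
animals <- list("Aardvark", "Baboon", "Camel")

# a vector inside double brackets means recursive indexing:
# animals[[1:2]] asks for element 2 of element 1, which does not exist
res <- tryCatch(animals[[1:2]], error = function(e) "error")
res

# with a nested list, the same syntax works
nested <- list(list("a", "b"), "c")
inner <- nested[[c(1, 2)]]    # same as nested[[1]][[2]]
inner

# three ways to collect same-typed components into a character vector
v1 <- unlist(animals)
v2 <- Reduce(c, animals)
v3 <- as.character(animals)
identical(v1, v2) && identical(v2, v3)
```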

6.3.2 Indexing by Name

If the list is named, one can use a character vector to extract its components, in exactly the same way as we used the numeric positions above. For instance

person <- list(first_name = "Bob", last_name = "Wong", salary = 77000, in_union = TRUE)

person[c("first_name", "salary")]
## $first_name
## [1] "Bob"
## 
## $salary
## [1] 77000
person[["first_name"]]
## [1] "Bob"
person[["salary"]]
## [1] 77000

As in case of positional indexing, single brackets return a sublist while double brackets return the corresponding component itself.

A common reason to use indexing by name over dollar notation (see Section 6.3.4) is to use indirect variable names. These are names stored in another variable: we store not the value but the name of a list component in a variable. Here is an example:

var <- "last_name"  # want to see 'last_name'
person[[var]]  # the component whose name is stored in 'var'
## [1] "Wong"

The first line specifies which component of the person list we want to get, currently “last_name”, and stores that name in var. The second line displays the component whose name is stored in var.

Why is such an approach, an indirect variable name, preferable to the simpler person[["last_name"]]? Because sometimes you do not know in advance which component name you need. This happens quite often in code: the component name may differ from dataset to dataset, it may have to be asked from the user, or you may be writing a function that performs similar operations on different columns.
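As a sketch of the last case, consider a small function that receives the component name as an argument. The function describe_field() is hypothetical and written only for this illustration.

```r
# hypothetical helper: describe one component, chosen when the function is called
describe_field <- function(record, field) {
  paste0(field, ": ", record[[field]])
}

person <- list(first_name = "Bob", last_name = "Wong", salary = 77000)
last <- describe_field(person, "last_name")
pay <- describe_field(person, "salary")
last
pay

# the same helper can be applied to every component name in turn
for (field in names(person)) {
  print(describe_field(person, field))
}
```

Inside the function the name is only known as the value of field, so double brackets are the way to reach the component.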

6.3.3 Indexing by Logical Vector

As in case of atomic vectors, we can use logical indices with lists too. There are a few differences though:

  • One can only extract sublists, not individual components. person[c(TRUE, TRUE, FALSE, FALSE)] will give you a sublist with the first and last name, while person[[c(TRUE, FALSE, FALSE, FALSE)]] will fail.
  • Many operators are vectorized but they are not “listified”. You cannot do math like * or + with lists. Hence the powerful logical indexing operations like x[x > 0] are in general not possible with lists. This substantially reduces the potential use cases of logical indexing.

Still, logical indexing is useful in some cases. For instance, we can extract all components whose names start with a given string:

# speed is the cruise speed in Mach
planes <- list("Airbus 380"=c(seats=575, speed=0.85),
               "Boeing 787"=c(seats=290, speed=0.85),
               "Airbus 350"=c(seats=325, speed=0.85))
# extract the components whose names start with "Airbus"
planes[startsWith(names(planes), "Airbus")]
## $`Airbus 380`
##  seats  speed 
## 575.00   0.85 
## 
## $`Airbus 350`
##  seats  speed 
## 325.00   0.85 

However, certain vectorized operations, such as > or ==, also work with lists that contain single numeric values as their elements. It seems hard to come up with general rules, so we recommend not relying on this behaviour in code.
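One way to avoid relying on such behaviour is to compute the logical vector explicitly by applying a function to each component. The sketch below uses the base R function vapply() for this. The 300-seat threshold and the names is_large and large_planes are arbitrary choices made for the example.

```r
planes <- list("Airbus 380" = c(seats = 575, speed = 0.85),
               "Boeing 787" = c(seats = 290, speed = 0.85),
               "Airbus 350" = c(seats = 325, speed = 0.85))

# one TRUE/FALSE per component: does the plane have more than 300 seats?
is_large <- vapply(planes, function(p) p[["seats"]] > 300, logical(1))
is_large

# the logical vector then selects a sublist, just as with atomic vectors
large_planes <- planes[is_large]
names(large_planes)
```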

6.3.4 Extracting named elements with $

Finally, there is a very convenient $-shortcut alternative for extracting individual components. If you printed out one of the named lists above, for instance person, you would see the following:

person <- list(name = "Ada", job = "Programmer")
print(person)
## $name
## [1] "Ada"
## 
## $job
## [1] "Programmer"

Notice that the output lists each name tag prefixed with a dollar sign ($), and on the following line the vector that is the element itself. You can retrieve individual components in a similar fashion: dollar notation is one of the easiest ways of accessing list elements. You refer to a particular element by writing the name of the list, followed by a $, followed by the element’s tag:

person$name  # [1] "Ada"
person$job  # [1] "Programmer"

Obviously, this only works for named lists. There is no dollar-notation analogue for atomic vectors, not even for named vectors. The $ extractor only exists for lists (and for data structures that are derived from lists, like data frames).

You can almost read the dollar sign like an “apostrophe s” (possessive) in English: person$salary would mean “the person list’s salary value”.

Dollar notation allows list elements to be treated almost as variables in their own right. For example, you specify that you’re talking about the salary variable in the person list, rather than the salary variable in some other list (or not in a list at all).

person <- list(first_name = "Ada", job = "Programmer", salary = 78000, in_union = TRUE)

# use elements as function or operation arguments
paste(person$job, person$first_name)   # [1] "Programmer Ada"

# assign values to list element
person$job <- "Senior Programmer"  # a promotion!
print(person$job)  # [1] "Senior Programmer"

# assign value to list element from itself
person$salary <- person$salary * 1.15  # a 15% raise!
print(person$salary)  # [1] 89700

Dollar notation is a drop-in replacement for double-bracket extraction, provided you know the name of the component. If you do not, as is often the case when programming, you have to rely on the double-bracket approach. You cannot use dollar notation with indirect variable names: an attempt like the example from Section 6.3.2 gives NULL, as the list has no component called var.

var <- "first_name"
person$var
## NULL

6.3.5 Single vs. Double Brackets vs. Dollar

List indexing may be confusing: we have single and double brackets, indexing by position and by name, and finally the dollar notation. Which is the right thing to do? As is so often the case, it depends.

  • Single brackets extract sublists. This is the most powerful and universal way of indexing lists, and it works in a very similar fashion to vector indexing. The main caveat is that it returns a sublist, not a vector. This is usually not what you want if you are after a single component, but it is all you can do if you are interested in multiple components in a single go. It allows indexing by position, by name, and by logical vector.

    In some sense it filters by whatever vector is inside the brackets (which may have just a single element). In R, single brackets always mean filtering a collection, where the collection may be either an atomic vector or a list. So if you put single brackets after a collection, you get a filtered version of the same collection, containing the desired elements. The type of the collection stays the same: if it was a list, you get a list; if it was an atomic vector, you get an atomic vector.

Watch out: for vectors, single-bracket notation returns a vector; for lists, single-bracket notation returns a list!

  • Dollar notation is the quickest and easiest way to extract a single named component when you know its name. It does not allow for indirect variable names.

  • Double brackets are a more verbose alternative to dollar notation. They return a single component, exactly as dollar notation does.

    Double brackets allow one to use indirect variable names, i.e. to decide later which components to extract, see Section 6.3.2. (This is terribly useful in programs!) For instance, we can decide whether we want to use someone’s first or last name:

person <- list(first_name = "Bob", last_name = "Wong", salary = 77000)
name_to_use <- "last_name"  # choose name (e.g., based on formality)
person[[name_to_use]]
## [1] "Wong"
name_to_use <- "first_name"  # change name to use
person[[name_to_use]]
## [1] "Bob"

Note: you will often hear that double brackets return a vector. This is only true if the corresponding element is a vector. But they always return the element!
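To illustrate that double brackets return the element whatever its type, here is a sketch with a deliberately mixed list. The list toolbox and its component names are invented for this example.

```r
# hypothetical list with components of different kinds
toolbox <- list(numbers = c(1, 2, 3),
                shout = toupper,
                owner = list(name = "Ada", union = TRUE))

class(toolbox[["numbers"]])   # a numeric vector
class(toolbox[["shout"]])     # a function
class(toolbox[["owner"]])     # another list

# an extracted function can be called right away
shouted <- toolbox[["shout"]]("hello")
shouted

# extraction can be chained to reach into a nested list
toolbox[["owner"]][["name"]]
toolbox$owner$name
```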

We recap this section with an example:

animal <- list(class='A', count=201, endangered=TRUE, species='rhinoceros')

## SINGLE brackets return a list
animal[1]
## $class
## [1] "A"
## any vector can be used as the index inside single brackets, just as with vectors;
## the result is a list
animal[c("species", "endangered")]
## $species
## [1] "rhinoceros"
## 
## $endangered
## [1] TRUE
## DOUBLE brackets return the element (here it's a vector)!
animal[[1]]
## [1] "A"
## Indirect variable names only work with brackets
## (both single and double brackets)
var <- "endangered"
animal[[var]]
## [1] TRUE
## Dollar notation is equivalent to double brackets
animal$class
## [1] "A"

Finally, all these methods can also be used for assignment. Just put any of these constructs on the left side of the assignment operator <-. See Section 6.4.
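A brief sketch of assignment with each notation follows; the replacement values are arbitrary. Note that with single brackets the right-hand side is itself a list, with one value per selected component.

```r
animal <- list(class = "A", count = 201, endangered = TRUE)

animal$count <- 150                # dollar notation
animal[["endangered"]] <- FALSE    # double brackets, by name
animal[[1]] <- "B"                 # double brackets, by position

# single brackets assign to several components at once
animal[c("class", "count")] <- list("C", 99)
str(animal)
```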

6.4 Modifying Lists

As is the case with atomic vectors, you can assign new values to existing elements. Lists also have a dedicated syntax for removing elements. (Remember, you can always “unselect” an element of a vector, including a list, by using a negative positional index.)

You can add new elements to a list simply by assigning a value to a component name (or position):

person <- list(name = "Ada", job = "Programmer",
               salary = 78000, union = TRUE)
person$age  # no 'age' element
## NULL
person$age <- 40  # assign age
person$age
## [1] 40
## new element at a given position:
person[[7]] <- "7th component"  # element 6 will be NULL

This closely parallels the behavior of atomic vectors.

You can also use double-bracket notation:

person[["zip"]] <- 98195  # add zip
var <- "name"
person[[var]] <- "Ada Huang"  # modify existing component via indirect name
person
## $name
## [1] "Ada Huang"
## 
## $job
## [1] "Programmer"
## 
## $salary
## [1] 78000
## 
## $union
## [1] TRUE
## 
## $age
## [1] 40
## 
## [[6]]
## NULL
## 
## [[7]]
## [1] "7th component"
## 
## $zip
## [1] 98195

Note that we used indirect access to modify name. The indirect approach works here in exactly the same way as when extracting elements.

You can remove elements by assigning the special value NULL to their tag or index:

l <- list("A", 201, TRUE)
l[[2]] <- NULL  # remove element #2
l  # '201' is gone
## [[1]]
## [1] "A"
## 
## [[2]]
## [1] TRUE

There is no analogue to this for atomic vectors.
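The two ways of removing elements mentioned in this section can be compared side by side. The sketch below is illustrative only, with made-up variable names. It also shows how to store an actual NULL in a list by assigning list(NULL) with single brackets.

```r
l <- list("A", 201, TRUE)

# a negative index returns a copy without that element; l itself is unchanged
without_second <- l[-2]
length(without_second)
length(l)

# assigning NULL removes a named component as well
person <- list(name = "Ada", age = 40)
person$age <- NULL
names(person)

# to keep the position and store NULL as the value, assign list(NULL)
l[2] <- list(NULL)
length(l)
is.null(l[[2]])
```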