C R Cheatsheet

This section gives a very brief overview of the most useful functions, with links to the sections where they are discussed in more depth.

C.1 RStudio keyboard shortcuts

Note: some of these are system-dependent; on macOS, Cmd usually takes the place of Ctrl and Option the place of Alt. See the complete list on Posit’s page, also available from the menu Help -> Keyboard Shortcuts Help.

  • <-, the assignment operator: Alt + - (Alt-minus).
  • %>%, the pipe operator: Ctrl + Shift + M
  • Open new script: Ctrl + Shift + N
  • zooming/focusing (see the View -> Panes menu)
    • switch to source: Ctrl + 1
    • zoom to source: Ctrl + Shift + 1
    • switch to console: Ctrl + 2
    • zoom to console: Ctrl + Shift + 2
    • bring back the default 4-pane view: Ctrl + Shift + 0
  • Work with code
    • reindent lines: Ctrl + I
    • comment/uncomment: Ctrl + Shift + C
  • run code (hover the mouse over the Run and Source buttons to see their shortcuts):
    • source with echo: Ctrl + Shift + Enter
    • source without echo: Ctrl + Shift + S
    • run selection/current statement: Ctrl + Enter
  • save current file: Ctrl + S (asks for a file name when saving for the first time).
  • R Markdown
    • insert code chunk: Ctrl + Alt + I
    • knit: Ctrl + Shift + K

C.2 Handling packages

  • install.packages, e.g. install.packages("lubridate"): install a package (library). Do this once on each computer. See 2.7.
  • library (library(tidyverse)): load an installed package. Do this once per R session. See 2.7.
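The "once per computer" versus "once per session" distinction can be written directly into a script. The following is a minimal sketch of one common pattern, not something the book requires: install the package only if it is missing, then load it. The package name dplyr is just an example choice, and the install step needs internet access to CRAN.

```r
# Example package name (hypothetical choice); replace with the one you need
pkg <- "dplyr"

# install.packages(): needed only once per computer, so run it only if missing
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# library(): needed once in every new R session
library(pkg, character.only = TRUE)
```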

C.3 Loading and creating data

  • read_delim, e.g. titanic <- read_delim("titanic.csv"): load a csv (or other delimited) file; the delimiter is detected automatically. See 3.7.
  • c, e.g. grades <- c(4.0, 3.5, 2.4): create a new variable (vector) that contains multiple values.
  • data.frame, e.g. students <- data.frame(name=c("Ah", "Yuval", "Shaykh"), grades=c(4.0, 3.5, 3.0)): create a new data frame.
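The three functions above can be tried without the real titanic.csv. This sketch writes a tiny made-up comma-separated file to a temporary location and reads it back; the file contents are hypothetical, and the automatic delimiter guessing assumes readr 2.0 or newer.

```r
library(readr)

# Write a tiny hypothetical comma-separated file to a temporary location,
# so the example does not depend on the real titanic.csv
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,fare",
             "A,22,7.25",
             "B,38,71.28",
             "C,NA,8.05"), path)

# No `delim` given: read_delim() guesses it (here a comma); "NA" is read as missing
toy <- read_delim(path, show_col_types = FALSE)

# c() combines values into one vector
grades <- c(4.0, 3.5, 2.4)

# data.frame() puts equal-length vectors side by side as columns
students <- data.frame(name = c("Ah", "Yuval", "Shaykh"),
                       grades = c(4.0, 3.5, 3.0))
```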

C.4 Describing data

  • nrow (nrow(titanic)): number of rows (observations) in the dataset. See 6.2.1.
  • ncol (ncol(titanic)): number of columns (variables) in the dataset. See 6.2.1.
  • names (names(titanic)): the names of the variables in the dataset. See 6.2.1.
  • head (titanic %>% head(3)): show the first few observations of the dataset. See 6.2.1.
  • tail (titanic %>% tail(3)): show the last few observations of the dataset. See 6.2.1.
  • summary (titanic %>% pull(age) %>% summary()): provide a brief description of the variables (or dataset). See 6.4.
  • counting missing values (titanic %>% pull(age) %>% is.na() %>% sum()): tests which values are missing and counts them. See 6.4.
  • is.na: tests whether each value is missing, giving TRUE or FALSE for every value (but does not count them). See 6.4.
  • table (titanic %>% pull(sex) %>% table()): computes a frequency table of a variable (or a cross table of two variables). See 6.4.
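Here are the description functions above applied to a small stand-in for the titanic data. The values are made up for illustration; only the column names mimic the real dataset.

```r
library(dplyr)

# Small hypothetical stand-in for the titanic data (made-up values)
titanic <- data.frame(
  sex  = c("male", "female", "female", "male", "male"),
  age  = c(22, NA, 35, NA, 54),
  fare = c(7.25, 71.28, 53.10, 8.05, 51.86)
)

nrow(titanic)                           # 5 rows
ncol(titanic)                           # 3 columns
names(titanic)                          # "sex" "age" "fare"
titanic %>% head(3)                     # first 3 rows
titanic %>% tail(3)                     # last 3 rows
titanic %>% pull(age) %>% summary()     # quartiles, mean, and the number of NA-s

# is.na() gives TRUE/FALSE per value; sum() counts the TRUEs
n_missing <- titanic %>% pull(age) %>% is.na() %>% sum()   # 2

# One-way frequency table of a single variable
sex_counts <- titanic %>% pull(sex) %>% table()            # female 2, male 3
```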

C.5 Selecting observations

  • sample_n (titanic %>% sample_n(3)): show a random subset of the dataset. See 6.2.1.
  • pull (titanic %>% pull(fare)): extract a single variable from the data frame. See 6.2.2.
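The key difference between the two functions is what they return: sample_n keeps a data frame, while pull gives a plain vector. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data, not the real titanic values
titanic <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  fare = c(7.25, 71.28, 53.10, 8.05, 51.86)
)

set.seed(42)                               # make the random draw reproducible
random_rows <- titanic %>% sample_n(3)     # 3 random rows, still a data frame

fares <- titanic %>% pull(fare)            # a plain numeric vector, not a data frame
```

In recent dplyr versions, sample_n is superseded by slice_sample(n = 3), which does the same job; sample_n still works.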

C.6 Computing

  • sum (titanic %>% pull(fare) %>% sum(); titanic %>% pull(age) %>% is.na() %>% sum()): compute a sum, or count the cases where a condition holds. See 6.4.
  • min, max, range (titanic %>% pull(fare) %>% min(); titanic %>% pull(age) %>% range(na.rm=TRUE)): compute the minimum, the maximum, or both. The extra argument na.rm=TRUE makes these functions ignore missing (NA) values; without it, a single NA makes the result NA. See 6.4.
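The effect of na.rm=TRUE is easiest to see on a column with missing values. This sketch uses hypothetical toy data with two missing ages. It also shows counting with a comparison instead of is.na.

```r
library(dplyr)

# Hypothetical toy data with two missing ages
titanic <- data.frame(
  age  = c(22, NA, 35, NA, 54),
  fare = c(7.25, 71.28, 53.10, 8.05, 51.86)
)

total_fare <- titanic %>% pull(fare) %>% sum()        # 191.54
cheapest   <- titanic %>% pull(fare) %>% min()        # 7.25

ages <- titanic %>% pull(age)
range(ages)                                 # NA NA: a missing value makes the result missing
age_range <- range(ages, na.rm = TRUE)      # 22 54

# Comparisons give TRUE/FALSE and sum() counts the TRUEs;
# na.rm = TRUE skips the comparisons that are NA
n_over_30 <- sum(ages > 30, na.rm = TRUE)   # 2
```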

C.7 dplyr

C.7.1 Main functions

See Section 5.3.

  • select: select desired columns (variables). Useful to avoid unnecessary clutter in your data.
  • filter: filter (keep) only the desired rows (observations). In Section 5.3 this is how the example analysis was narrowed down to male survivors.
  • arrange: sort observations in ascending or descending order of some variable, e.g. by age.
  • mutate: compute new variables, or overwrite existing ones.
  • summarize: collapse data down to a small number of summary statistics.
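The five main verbs are usually chained with the pipe. Here is a minimal sketch on hypothetical toy passengers (made-up values, column names only mimic the titanic data):

```r
library(dplyr)

# Hypothetical toy passengers (made-up values)
titanic <- data.frame(
  name     = c("A", "B", "C", "D", "E"),
  sex      = c("male", "female", "male", "male", "female"),
  age      = c(22, 38, 35, 54, 4),
  survived = c(0, 1, 1, 0, 1)
)

adults <- titanic %>%
  select(sex, age, survived) %>%      # drop the name column
  filter(age >= 18) %>%               # keep adults only
  arrange(desc(age)) %>%              # oldest first; arrange(age) sorts ascending
  mutate(age_months = age * 12)       # add a column computed from an existing one

adult_stats <- adults %>%
  summarize(n = n(), survival_rate = mean(survived))   # one row: n = 4, survival_rate = 0.5
```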

C.7.2 Comparison operators for filtering

See Section

  • ==: equality, as filter(survived == 1)
  • >: greater than, as filter(age > 60)
  • >=: greater than or equal, e.g. filter(age >= 60).
  • <: less than, as filter(age < 60)
  • <=: less than or equal, e.g. filter(age <= 60).
  • !=: not equal. For instance, filter(embarked != "S").
  • %in% c(...): only keep observations where the value is in a given list of values, as filter(embarked %in% c("C", "Q")).
  • &: logical “AND”, as filter(sex == "male" & age >= 20) for adult males.
  • |: logical “OR”, as filter(sex == "female" | age < 15) for women or children.
  • !: logical “NOT”, as filter(!(sex == "female" | age < 15)) for those who are neither women nor children. Note the double parentheses.
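The operators can be tried out on a tiny made-up data frame. The columns sex, age and embarked only mimic the titanic variables; the values are hypothetical. The comments list which rows each filter keeps.

```r
library(dplyr)

# Hypothetical toy data
titanic <- data.frame(
  sex      = c("male", "female", "male", "female", "male"),
  age      = c(40, 30, 10, 8, 65),
  embarked = c("S", "C", "Q", "S", "C")
)

adult_men      <- titanic %>% filter(sex == "male" & age >= 20)       # rows 1, 5
women_children <- titanic %>% filter(sex == "female" | age < 15)      # rows 2, 3, 4
neither        <- titanic %>% filter(!(sex == "female" | age < 15))   # rows 1, 5
from_c_or_q    <- titanic %>% filter(embarked %in% c("C", "Q"))       # rows 2, 3, 5
not_s          <- titanic %>% filter(embarked != "S")                 # rows 2, 3, 5
```

One detail worth knowing: filter() drops rows where the condition is NA, so a passenger with a missing age would be kept by neither age < 15 nor !(age < 15).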

C.8 Data cleaning and processing

C.8.1 Converting into different formats

  • as.numeric (harden %>% mutate(FG = as.numeric(FG))): converts a text (character) column to numbers. Values that cannot be read as numbers become NA, with a warning. If the column is already numeric, nothing changes.
  • as.character (harden %>% mutate(FG = as.character(FG))): converts a numeric (or factor) column into a character column. If it is already a text column, nothing changes.
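A round trip between the two formats, on a hypothetical toy version of the harden data with a text FG column. The unparseable value "DNP" is made up to show what happens to text that is not a number.

```r
library(dplyr)

# Hypothetical toy data: field goals stored as text, as can happen after import
harden <- data.frame(FG = c("10", "8", "12"))

harden_num <- harden %>% mutate(FG = as.numeric(FG))        # text -> numbers
harden_chr <- harden_num %>% mutate(FG = as.character(FG))  # numbers -> text

# Text that is not a number becomes NA; as.numeric() would also warn
# "NAs introduced by coercion", which suppressWarnings() hides here
parsed <- suppressWarnings(as.numeric(c("7", "DNP")))        # 7 NA
```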

C.8.2 Other

  • separate (harden %>% separate(MP, into=c("min", "sec"), convert=TRUE)): splits a text column into multiple columns at a separator (by default, any run of characters that are neither letters nor digits). convert=TRUE converts the new columns to numbers where possible. See Section 11.5.
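As an illustration, the sketch below splits a "minutes:seconds" column and then combines the parts into decimal minutes. The MP values are hypothetical; the colon is caught by separate's default separator, so no sep argument is needed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: minutes played stored as "minutes:seconds" text
harden <- data.frame(MP = c("35:12", "28:47"))

harden_sep <- harden %>%
  separate(MP, into = c("min", "sec"), convert = TRUE) %>%  # ":" matches the default separator
  mutate(minutes = min + sec / 60)                           # e.g. 35 + 12/60 = 35.2
```

In tidyr 1.3.0 and later, separate() is marked as superseded in favor of separate_wider_delim(), but it still works.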