C R Cheatsheet
This section gives a very brief overview of the most useful functions, and links to sections where those are discussed more in depth.
C.1 RStudio Keyboard shortcuts
Note: some of these are system-dependent. See the complete list at Posit’s page, also available from the menu Help -> Keyboard Shortcuts.
- <-, the assignment operator: Alt + - (alt-minus).
- %>%, the pipe operator: Ctrl + Shift + M
- Open new script: Ctrl + Shift + N
- zooming/focusing (see the View -> Panes menu)
  - switch to source: Ctrl + 1
  - zoom to source: Ctrl + Shift + 1
  - switch to console: Ctrl + 2
  - zoom to console: Ctrl + Shift + 2
  - bring back the default 4-pane view: Ctrl + Shift + 0
- Work with code
  - indent: Ctrl + I
  - comment/uncomment: Ctrl + Shift + C
- run program (hover mouse over the buttons):
  - source with echo: Ctrl + Shift + Enter
  - source without echo: Ctrl + Shift + S
  - run selection/statement: Ctrl + Enter
  - save current file: Ctrl + S (asks for name if saving for the first time).
- Rmarkdown
  - insert code chunk: Ctrl + Alt + I
  - knit: Ctrl + Shift + K
C.3 Loading and creating data
- read_delim, e.g. titanic <- read_delim("titanic.csv"): load a csv (delimited) file, automatically detect the delimiter. See 3.7.
- c, e.g. grades <- c(4.0, 3.5, 2.4): create a new variable (vector) that contains multiple values.
- data.frame, e.g. students <- data.frame(name=c("Ah", "Yuval", "Shaykh"), grades=c(4.0, 3.5, 3.0)): create a new data frame.
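A minimal sketch that puts the three functions together. It assumes the readr package (version 2.0 or later, where read_delim guesses the delimiter) is installed. The temporary file and the variable name loaded are made up for this illustration.

```r
library(readr)

# Create a vector and a small data frame
grades <- c(4.0, 3.5, 2.4)
students <- data.frame(name = c("Ah", "Yuval", "Shaykh"),
                       grades = c(4.0, 3.5, 3.0))

# Write the data frame to a temporary csv file (hypothetical path),
# then load it back; read_delim() guesses that the delimiter is a comma
path <- tempfile(fileext = ".csv")
write_csv(students, path)
loaded <- read_delim(path, show_col_types = FALSE)
```

Here loaded should contain the same three rows and two columns as students.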
C.4 Describing data
- nrow (nrow(titanic)): number of rows (observations) in the dataset. See 6.2.1.
- ncol (ncol(titanic)): number of columns (variables) in the dataset. See 6.2.1.
- names (names(titanic)): list of variable names in the dataset. See 6.2.1.
- head (titanic %>% head(3)): show the first few observations of the dataset. See 6.2.1.
- tail (titanic %>% tail(3)): show the last few observations of the dataset. See 6.2.1.
- summary (titanic %>% pull(age) %>% summary()): provide a brief description of a variable (or of the whole dataset). See 6.4.
- counting missings: titanic %>% pull(age) %>% is.na() %>% sum() tests if the values are missing, and counts all that are. See 6.4.
- is.na: tests if certain values are missing (but does not count them). See 6.4.
- table (titanic %>% pull(sex) %>% table()): compute frequency table of a variable (or pivot table of two variables). See 6.4.
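The functions above can be tried on a small made-up data frame. This is only a sketch: passengers is a hypothetical stand-in for the titanic data, and it assumes the dplyr package is installed.

```r
library(dplyr)

# A small made-up stand-in for the titanic data
passengers <- data.frame(
  sex = c("male", "female", "female", "male"),
  age = c(22, NA, 35, NA)
)

nrow(passengers)         # number of rows
ncol(passengers)         # number of columns
names(passengers)        # variable names
passengers %>% head(2)   # first two rows

# is.na() gives one TRUE/FALSE per value; sum() counts the TRUEs
missing_age <- passengers %>% pull(age) %>% is.na() %>% sum()

# Frequency table of a single variable
sex_counts <- passengers %>% pull(sex) %>% table()
```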
C.6 Computing
- sum (titanic %>% pull(fare) %>% sum(); titanic %>% pull(age) %>% is.na() %>% sum()): compute sum, or count cases where a condition holds. See 6.4.
- min, max, range (titanic %>% pull(fare) %>% min(); titanic %>% pull(age) %>% range(na.rm=TRUE)): compute minimum, maximum, or both. The extra argument na.rm=TRUE makes these functions ignore NA-s. See 6.4.
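A short sketch of how missing values affect these functions. The vector fare is made up and stands in for something like titanic %>% pull(fare).

```r
# Made-up fares with one missing value
fare <- c(7.25, 71.28, NA, 8.05)

sum(fare)                                # NA: one missing value makes the sum missing
total <- sum(fare, na.rm = TRUE)         # sum of the non-missing values
fare_range <- range(fare, na.rm = TRUE)  # minimum and maximum together

# Counting cases where a condition holds: TRUE counts as 1, FALSE as 0
n_cheap <- sum(fare < 10, na.rm = TRUE)
```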
C.7 dplyr
C.7.1 main functions
See Section 5.3.
- select: select desired columns (variables). Useful to avoid unnecessary clutter in your data.
- filter: keep only the desired rows (observations). This is how the example analysis was narrowed down to just male survivors.
- arrange: order observations in ascending or descending order of some value, e.g. by age.
- mutate: compute new variables, or overwrite existing ones.
- summarize: collapse data down to a small number of summary statistics.
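The five functions are often chained into one pipeline. The sketch below assumes the dplyr package is installed. The data frame passengers and the column age_months are made up for illustration and are not the book's titanic data.

```r
library(dplyr)

# A small made-up dataset
passengers <- data.frame(
  name = c("A", "B", "C", "D"),
  sex = c("male", "female", "male", "male"),
  age = c(40, 30, 20, 60),
  survived = c(1, 1, 0, 1)
)

male_survivors <- passengers %>%
  select(sex, age, survived) %>%            # keep only these columns
  filter(sex == "male", survived == 1) %>%  # keep only male survivors
  arrange(desc(age)) %>%                    # oldest first
  mutate(age_months = age * 12)             # compute a new variable

# Collapse the remaining rows into summary statistics
result <- male_survivors %>%
  summarize(n = n(), mean_age = mean(age))
```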
C.7.2 Comparison operators for filtering
See Section 5.3.2.2
- ==: equality, as filter(survived == 1)
- >: greater than, as filter(age > 60)
- >=: greater than or equal, e.g. filter(age >= 60).
- <: less than, as filter(age < 60)
- <=: less than or equal, e.g. filter(age <= 60).
- !=: not equal. For instance, filter(embarked != "S").
- %in% c(...): only keep observations where the value is in a given list of values, as filter(embarked %in% c("C", "Q")).
- &: logical “AND”, as filter(sex == "male" & age >= 20) for adult males.
- |: logical “OR”, as filter(sex == "female" | age < 15) for a woman or a child.
- !: logical “NOT”, as filter(!(sex == "female" | age < 15)) for someone who is neither a woman nor a child. Note the double parentheses.
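To see how the logical operators combine, here is a sketch on a tiny made-up data frame. It assumes dplyr is installed, and all values are hypothetical.

```r
library(dplyr)

# Four made-up passengers
passengers <- data.frame(
  sex = c("male", "female", "male", "female"),
  age = c(30, 40, 10, 12),
  embarked = c("S", "C", "Q", "S")
)

# AND: both conditions must hold
adult_men <- passengers %>% filter(sex == "male" & age >= 20)

# OR: at least one condition must hold
women_or_children <- passengers %>% filter(sex == "female" | age < 15)

# NOT: the inner parentheses group the OR before it is negated
neither <- passengers %>% filter(!(sex == "female" | age < 15))

# %in%: the value must be one of the listed values
from_c_or_q <- passengers %>% filter(embarked %in% c("C", "Q"))
```

In this made-up data the OR filter keeps three rows, and the NOT filter keeps the one row that the OR filter dropped.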
C.8 Data cleaning and processing
C.8.1 Converting into different formats
- as.numeric (harden %>% mutate(FG = as.numeric(FG))): converts a text (character) variable to numbers. If the column is already numeric then it does nothing.
- as.character (harden %>% mutate(FG = as.character(FG))): converts a numeric (or factor) column into a character column. If it is already a text column then nothing happens.
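A brief sketch of both conversions, assuming dplyr is installed. The data frame games is a made-up stand-in for the harden data, and the text entry "Did Not Play" is hypothetical. Text that cannot be read as a number turns into NA, and R gives a warning (suppressed here).

```r
library(dplyr)

# Made-up stand-in for the harden data: FG is stored as text
games <- data.frame(FG = c("9", "12", "Did Not Play"))

# Text to numbers; the non-numeric entry becomes NA
games_num <- games %>%
  mutate(FG = suppressWarnings(as.numeric(FG)))

# Numbers back to text; the NA stays NA
games_chr <- games_num %>%
  mutate(FG = as.character(FG))
```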
C.8.2 Other
- separate (harden %>% separate(MP, into=c("min", "sec"), convert=TRUE)): separates a text column into multiple columns based on a separator (default: a non-letter/non-numeric character). See Section 11.5.
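As an illustration, the sketch below splits a made-up "minutes:seconds" column. It assumes the tidyr and dplyr packages are installed. The data frame games and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up minutes played, stored as text
games <- data.frame(MP = c("34:12", "29:05"))

# ":" is neither a letter nor a digit, so the default separator splits on it;
# convert = TRUE turns the resulting text pieces into numbers
split_games <- games %>%
  separate(MP, into = c("min", "sec"), convert = TRUE)
```

After the split, the MP column is replaced by the two new numeric columns min and sec.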