Machine learning using R
1
Introduction
2
(PART) R language
3
R: Programming Language and a Statistical System
3.1
Base language
3.1.1
Variables and assignment
3.1.2
Atomic variables are vectors
3.1.3
Mathematical, logical and other operators
3.1.4
Control structures
3.1.5
Strings and printing
3.1.6
Functions
3.1.7
Categorical variables
3.2
Indexing and Named Vectors
3.2.1
Integer indexing
3.2.2
Logical indexing
3.2.3
Named vectors and indexing by name
3.3
Lists
3.4
Data frames
3.4.1
What is a data frame
3.4.2
Workspace variables and data variables
3.4.3
Extracting and assigning individual variables in data frames
3.4.4
Loading data
4
Rmarkdown: literate programming with R
4.1
Creating and using rmarkdown files
4.1.1
Creating rmarkdown files
4.1.2
Knitting (compiling) rmarkdown
4.1.3
R Notebooks
4.2
Rmarkdown syntax
4.2.1
Markdown syntax
4.2.2
Code chunks
4.2.3
Code chunk options
4.3
How to debug rmarkdown documents
4.3.1
Common errors
4.3.2
Debugging strategies
5
Pipes and dplyr: the easy way of data manipulation
5.1
R pipes
5.1.1
A motivational story: how to make pancakes
5.1.2
Magrittr pipes
5.1.3
Base-R pipes
5.1.4
Advantages of the pipe-based approach
5.2
Basics of the dplyr package
5.2.1
Basic functions (verbs)
5.2.2
group_by: grouped operations
5.2.3
Combining dplyr functions into a single pipeline
5.3
More advanced dplyr usage
5.4
Limitations of dplyr
6
First steps with data: descriptive analysis
6.1
Basic data description
7
Cleaning and preparing data
7.1
Missing values
7.1.1
Counting and identifying missings
7.1.2
Removing missings
7.1.3
Recoding missings
7.2
Recoding variables
7.2.1
Creating groups of continuous variables
7.2.2
Rearranging existing groups
8
Visualizations: Plotting Data
8.1
Base R plotting
8.2
Plotting data–the ggplot way
8.2.1
The main ideas: aesthetics, variables and geometry
8.3
geoms: different kinds of plots
8.4
Displaying distributions
8.4.1
Histogram
8.4.2
Density plot
8.4.3
Violin plot
8.5
Scales: how aesthetics and values are related
9
String Operations
9.1
The basic functions
9.1.1
grep
9.1.2
sub and gsub
9.1.3
strsplit: tokenizing text
9.1.4
Using the string functions with pipes
9.2
Regular expressions
9.2.1
A few basic classes of regexps
9.3
Examples
9.3.1
Find word context
10
Manipulating data: merging and reshaping
10.1
Merging dataframes line-by-line and column-by-column
10.2
Merging by key (database joins)
10.2.1
What is a key
10.2.2
Inner and outer joins
10.3
Reshaping
10.3.1
Wide form and long form data
10.3.2
Reshaping between wide and long form
11
Web Scraping
11.1
Before you begin
11.2
HTML Basics
11.2.1
Tags, elements and attributes
11.2.2
Overall structure
11.2.3
Important tags
11.2.4
HTML tables
11.3
Web scraping in R and the rvest package
11.3.1
Example HTML file
11.3.2
First part: download the webpage
11.3.3
The hard part: navigating the page and extracting data
11.4
Finding elements on a webpage
12
Regression Models
12.1
Linear regression
12.1.1
Working With Iris Virginica Data
12.1.2
Manually Compute and Minimize \(SSE\)
12.1.3
Using R modeling tools
12.1.4
Categorical variables and factor
12.1.5
Using functions in formulas
12.1.6
Polynomial regression
12.2
Summary
13
Prediction and Predictive Modeling
13.1
Manually predicting regression outcomes
13.1.1
Linear regression
13.1.2
Logistic regression
13.2
Base R modeling tools
13.2.1
Linear regression
13.2.2
Logistic regression
13.3
Using tidymodels
13.3.1
Basic modeling
13.3.2
Resources
14
Assessing model goodness
14.1
Regression model performance measures
14.1.1
Base-R tools
14.1.2
Tidymodels tools
14.2
Confusion matrix
14.2.1
Base-R tools to display confusion matrix
14.2.2
tidymodels approach
15
Machine Learning Workflow
15.1
Overfitting: random numbers improve results!
15.2
Training-validation split
15.3
tidymodels workflow
16
Dataset Description
16.1
Hubble
16.2
Iris
16.3
Orange trees
16.4
Titanic
16.5
Treatment
17
Solutions
17.1
R Language
17.1.1
Operators
17.1.2
Strings
17.1.3
Collections
17.2
Pipes and dplyr
17.2.1
Pipes
17.2.2
dplyr
17.3
Cleaning data
17.3.1
Missing values
17.4
ggplot
17.5
Regression models
17.5.1
Linear Regression
17.6
Assessing model goodness
17.6.1
Confusion matrix
Machine learning in R
Chapter 2
(PART) R language