STAT220: Midterm review (Weeks 1,2,3,4)
1. Design of studies
Controlled experiments
- Study units, population, sample, variable, response, factors
- treatment, treatment group, control group, placebo.
- blind and double-blind studies, randomized study, stratified experiment
Observational Studies
- Controlling for variables, confounding factors, stratified study
- Distinguishing between Controlled Experiments and Observational Studies
Sample Surveys
- Sampling unit, population, parameter, sample, statistic
- Sources of bias: selection bias, nonresponse bias, response bias
- A large sample does not protect against selection bias
- Sampling procedures: simple random sample, stratified sample, quota sample
--- describe and identify each; know its strengths and weaknesses
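As a rough illustration of how a simple random sample differs from a stratified sample, here is a minimal Python sketch. The population, the "urban"/"rural" strata, the sample size, and the function names are all hypothetical, made up for this example; proportional allocation is only one way to stratify.

```python
import random

# Hypothetical population: 1000 units, 700 "urban" and 300 "rural".
population = [
    {"id": i, "stratum": "urban" if i < 700 else "rural"} for i in range(1000)
]

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, n, rng, key="stratum"):
    """Draw from each stratum in proportion to its share of the population."""
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(220)  # fixed seed so the sketch is reproducible
srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, 100, rng)

# The stratified sample matches the population's 70/30 split by construction;
# the simple random sample matches it only approximately.
urban_in_strat = sum(u["stratum"] == "urban" for u in strat)
urban_in_srs = sum(u["stratum"] == "urban" for u in srs)
print(urban_in_strat, urban_in_srs)
```

A quota sample would also fix the 70/30 split, but the units inside each group would be picked by the interviewer rather than by chance, which is where selection bias can enter.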
2. Displaying data
Types of variables
- qualitative vs. quantitative, discrete vs. continuous
- Measurement scales: nominal, ordinal (ordered categorical), interval
Histograms
- Understanding histograms: the shape tells us the
relative counts/proportions in different intervals
--- Area represents the percent or proportion
--- The scale on the vertical axis is percent (or proportion) per horizontal-axis unit.
- Forms of histogram display:
bars of equal/different widths, back-to-back, stacked
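To make the "percent per horizontal-axis unit" scale concrete, the sketch below computes bar heights for intervals of different widths. The data values, the interval endpoints, and the function name `density_heights` are invented for illustration, and the endpoint convention (left endpoint included, right excluded, except in the last interval) is an assumption.

```python
def density_heights(data, edges):
    """Height of each bar in percent per horizontal-axis unit.

    Intervals are [left, right), except the last, which also
    includes its right endpoint.
    """
    n = len(data)
    heights = []
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        last = i == len(edges) - 2
        count = sum(left <= x < right or (last and x == right) for x in data)
        percent = 100 * count / n                # area of the bar
        heights.append(percent / (right - left))  # height = area / width
    return heights

data = [1, 2, 2, 3, 5, 6, 8, 12, 15, 19]  # hypothetical values
edges = [0, 5, 10, 20]                    # unequal interval widths

heights = density_heights(data, edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(heights)     # [8.0, 6.0, 3.0]
print(sum(areas))  # the areas add up to 100 percent
```

The last two intervals each hold 30% of the data, but the wider one gets a bar half as tall: the percent is carried by the area, not the height.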
Pie charts, Bar charts, Stem-and-leaf diagrams
3. Summary statistics, and the Normal curve
Numerical summaries
- Measures of center and location: median, average (mean), percentiles
- Measures of spread: standard deviation (SD), range, interquartile range (IQR)
- How to define them, find their values, and "guesstimate" them from histograms.
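The summaries listed above can be checked on a small data set with Python's standard `statistics` module. This is a sketch only: the data are made up, the SD is assumed to be the version that divides by the number of observations (`pstdev`), and quartile conventions differ between textbooks and software, so the IQR here reflects just one of them.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

average = statistics.mean(data)     # sum of the values / number of values
median = statistics.median(data)    # middle of the sorted values
sd = statistics.pstdev(data)        # root-mean-square deviation from the average
data_range = max(data) - min(data)  # largest value minus smallest value

# Quartiles: the "inclusive" method is one convention among several.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(average, median, sd, data_range, iqr)
```

Here the average is 5, the median is 4.5, the SD is 2, and the range is 7.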
The normal curve
- Standard units (SD units), z-scores
- Normal approximation for data
- Normal curve arithmetic:
Finding z-scores, percentiles, percentages
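Normal curve arithmetic can be checked with `statistics.NormalDist` from the Python standard library instead of a printed table. The average of 70, SD of 3, and value of 73 below are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

avg, sd = 70, 3  # hypothetical average and SD (say, heights in inches)
value = 73

# Convert to standard units: how many SDs above or below the average.
z = (value - avg) / sd

standard = NormalDist()  # the standard normal curve: mean 0, SD 1

# Percentage of the area under the curve to the left of z.
pct_below = 100 * standard.cdf(z)

# Percentage within 1 SD of the average (about 68%).
pct_within_1sd = 100 * (standard.cdf(1) - standard.cdf(-1))

# Going the other way: the z-score of the 90th percentile,
# then back to original units.
z90 = standard.inv_cdf(0.90)
value_at_90th = avg + z90 * sd

print(z, round(pct_below, 1), round(pct_within_1sd, 1), round(z90, 2))
```

These results describe the data only to the extent that the histogram of the data follows the normal curve.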
4. Correlation and Association
Scatter diagrams
- Make, read, interpret:
- describe basic shapes, identify outliers, non-linearity
- Identify independent and dependent variables in a problem and
on scatter diagrams
Association and correlation
- Associations may be positive, negative, weak, strong
- Correlation coefficient, r: a number between -1 and +1
- Interpretation of r: it measures how tightly the points of the scatter
diagram cluster about the SD line
- Sensitivity of the correlation coefficient to outliers and to non-linearity
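One common way to compute r is to convert both variables to standard units and average the products. The sketch below does this and shows how one outlier, or a curved pattern, can change r. The data sets and the function name `correlation` are made up for illustration, and the SD is assumed to divide by the number of points.

```python
import statistics

def correlation(xs, ys):
    """r = average of the products of x and y, each in standard units."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / len(products)

# Points exactly on a line with positive slope: r is 1 (up to rounding).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_line = correlation(xs, ys)

# The same points plus one outlier at (6, 0): r drops to about 0.14.
r_outlier = correlation(xs + [6], ys + [0])

# A perfect but non-linear (U-shaped) relationship: r is 0.
xs_sym = [-2, -1, 0, 1, 2]
ys_sq = [x * x for x in xs_sym]
r_curve = correlation(xs_sym, ys_sq)

print(round(r_line, 3), round(r_outlier, 3), round(r_curve, 3))
```

The last case is a reminder that r measures linear association only: y is completely determined by x, yet r is 0.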
Five summary statistics
- For data on 2 variables: 2 means, 2 SDs, and r
- Association is not causation! -- confounding factors, time trends, etc.