Bivariate relationships: Categorical data
- relationship = association
- contingency table = crosstabulation = cross-classification
- intersecting frequency distributions
- rows, columns
- cells
- marginals - row and column totals
- r x c designation of a table - # rows x # columns
- computing row or column percentages (conditional percentages)
- goal of bivariate analysis: measure association or relationship
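As a minimal sketch of the vocabulary above (cells, marginals, row percentages), the following computes row and column totals and row percentages for a 2 x 2 table. The counts are the Perry Preschool counts given later in these notes; the function names and the dictionary layout are illustrative assumptions, not part of the course text.

```python
# Counts from the Perry Preschool table later in these notes.
# Rows = categories of the i.v. (group); columns = categories of the d.v.
table = {
    "preschool": {"never arrested": 25, "arrested 1+ times": 33},
    "control": {"never arrested": 20, "arrested 1+ times": 45},
}


def row_totals(table):
    """Row marginals: total number of cases in each row."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Column marginals: total number of cases in each column."""
    totals = {}
    for cells in table.values():
        for col, count in cells.items():
            totals[col] = totals.get(col, 0) + count
    return totals


def row_percentages(table):
    """Conditional percentages: each cell as a % of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100.0 * count / total for col, count in cells.items()}
    return result


row_marginals = row_totals(table)
col_marginals = column_totals(table)
row_pcts = row_percentages(table)

for row, pcts in row_pcts.items():
    rounded = {col: round(p) for col, p in pcts.items()}
    print(row, rounded, "n =", row_marginals[row])
```

Row percentages are used here because the row variable is the i.v.; percentages are computed within categories of the i.v. and then compared across them.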
Assessing relationships in bivariate data
1) is there a relationship?
statistical independence/dependence
- detecting relationship by comparing percentage distributions for one variable across different categories of a second
- almost always some relationship in sample data (percentage distributions are rarely exactly identical)
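One way to make statistical independence concrete: under independence, the percentage distribution of one variable is the same in every category of the other, which is equivalent to each cell count equaling (row total x column total) / n. The sketch below computes those expected counts for comparison with the observed counts. It uses the Perry Preschool counts from later in these notes; `expected_counts` is an illustrative name, and this comparison is offered as a standard illustration rather than a method stated in the notes.

```python
# Observed counts (rows: preschool, control;
# columns: never arrested, arrested 1+ times).
observed = [
    [25, 33],
    [20, 45],
]


def expected_counts(observed):
    """Counts each cell would have if the two variables were independent."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


expected = expected_counts(observed)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
```

The further the observed counts sit from the expected counts, the further the table is from independence.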
2) strength of relationship?
- simple assessment: compare percentage difference across the different categories of the independent variable (i.v.)
3) direction of relationship?
- as cases' values on one variable go up, values on other variable go up (positive) or down (negative)
- e.g., positive = height and weight
- e.g., negative = size of vehicle and gas mileage
- can be defined for ordinal and interval variables only
2 x 2 tables
binary/dichotomous variable (yes/no re reference value; two values)
measures of association
relative risk (RR) = % in reference category of d.v. for one category of i.v. divided by % in reference category of d.v. for other category of i.v.
odds ratio (OR) = odds of falling in reference category of d.v. (vs. not falling in it) for one category of i.v., divided by the same odds for other category of i.v.
if row variable is i.v. and column variable is d.v., and right column is reference category of d.v. (this format is different from text), then with cells a, b in the first row and c, d in the second row:
RR = (b / (a + b)) / (d / (c + d))
increased/decreased risk = (RR - 1) x 100%
OR = (b/a) / (d/c)
Perry Preschool (Schweinhart, Barnes, Weikart)
randomized experiment: preschool vs. control (no preschool)
outcome at age 27:
|           | never arrested | arrested 1+ times | total |
| preschool | 25 (43%)       | 33 (57%)          | 58    |
| control   | 20 (31%)       | 45 (69%)          | 65    |
| total     | 45             | 78                | 123   |
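The RR and OR formulas can be applied to this table as a worked check. This is a minimal sketch: the function names are illustrative, and it assumes the cell labeling implied by the formulas (a, b in the first row, c, d in the second row, right column "arrested 1+ times" as the reference category of the d.v.).

```python
def relative_risk(a, b, c, d):
    """RR = (b / (a + b)) / (d / (c + d)); right column is the reference category."""
    return (b / (a + b)) / (d / (c + d))


def percent_change_in_risk(rr):
    """Increased (positive) or decreased (negative) risk, in percent."""
    return (rr - 1) * 100


def odds_ratio(a, b, c, d):
    """OR = (b / a) / (d / c); undefined if a, c, or d is zero."""
    return (b / a) / (d / c)


# Perry Preschool counts: first row = preschool, second row = control.
a, b = 25, 33  # never arrested, arrested 1+ times
c, d = 20, 45

rr = relative_risk(a, b, c, d)
change = percent_change_in_risk(rr)
oratio = odds_ratio(a, b, c, d)

print(f"RR = {rr:.2f}")
print(f"change in risk = {change:.1f}%")
print(f"OR = {oratio:.2f}")
```

An RR below 1 means the preschool group had a lower risk of arrest than the control group. The OR is further from 1 than the RR here, which is typical when the outcome is common.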
Association can be interpreted as causation in experimental studies
be careful with percentage increases: also important to consider the absolute percentages to interpret fully (issue of baseline risk)
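The baseline-risk issue can be illustrated with a small sketch. The risks below are hypothetical numbers chosen only for illustration (they are not from the Perry Preschool study), and `describe` is an illustrative helper name. Both scenarios have the same relative change but very different absolute changes.

```python
def describe(risk_group, risk_baseline):
    """Return (RR, relative change in %, absolute change in percentage points)."""
    rr = risk_group / risk_baseline
    relative_change_pct = (rr - 1) * 100
    absolute_change_points = (risk_group - risk_baseline) * 100
    return rr, relative_change_pct, absolute_change_points


# Hypothetical risks, for illustration only.
rare = describe(0.01, 0.02)    # baseline risk of 2%
common = describe(0.20, 0.40)  # baseline risk of 40%

for label, (rr, rel, absolute) in [("rare outcome", rare), ("common outcome", common)]:
    print(f"{label}: RR = {rr:.2f}, relative change = {rel:.0f}%, "
          f"absolute change = {absolute:.0f} percentage points")
```

A "50% decrease" describes both cases, yet it corresponds to 1 percentage point in one and 20 in the other, which is why the absolute percentages need to be reported alongside the relative change.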
Pitfalls in analyzing associations
Simpson's paradox
- direction or strength of relationship changes when aggregating across natural groups that should be treated individually
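A small numerical sketch may help show how the direction can reverse on aggregation. All counts below are hypothetical, invented purely for illustration; the group and program names are placeholders.

```python
# Hypothetical (successes, cases) for two programs within two natural groups.
groups = {
    "group 1": {"X": (9, 10), "Y": (80, 100)},
    "group 2": {"X": (30, 100), "Y": (2, 10)},
}


def pct(successes, cases):
    """Success percentage."""
    return 100.0 * successes / cases


# Success percentages within each group (the groups treated individually).
within = {
    group: {program: pct(*counts) for program, counts in programs.items()}
    for group, programs in groups.items()
}


def aggregate(groups):
    """Success percentages after pooling the groups together."""
    totals = {}
    for programs in groups.values():
        for program, (s, n) in programs.items():
            s0, n0 = totals.get(program, (0, 0))
            totals[program] = (s0 + s, n0 + n)
    return {program: pct(s, n) for program, (s, n) in totals.items()}


overall = aggregate(groups)

for group, pcts in within.items():
    print(group, {k: round(v, 1) for k, v in pcts.items()})
print("pooled", {k: round(v, 1) for k, v in overall.items()})
```

In this made-up example X does better than Y within each group, yet Y looks better in the pooled table, because most of X's cases come from the group where success is rare.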
Ignoring denominators/Incomplete crosstabulations
- "More pedestrians are killed crossing with the signal than against it."
- "More people get killed crossing streets with crosswalks than those without."
- "In 1993, juveniles arrested in Orange County, CA, were tested for drug use (urinalysis). Forty-three percent (43%) tested positive, and a newspaper stated that 'the results seemed to bolster beliefs that drug use and juvenile crime are related.'"
- in each case the comparison group is missing (e.g., how many pedestrians cross with vs. against the signal; % testing positive among juveniles not arrested)
Multiple interpretations of associations in observational studies
- "spurious" relationship (confounding) - two variables are related only through their common association with a third variable that causes each of them; the two original variables are not causally linked, even indirectly
- e.g., ice cream sales and murder - positive association (weather = key)
- intervening relationship - third variable links the independent variable to the dependent variable
- e.g., race and traffic tickets
- conditional/interactive relationship - strength and/or direction of relationship between two variables differs for different values on third variable
- e.g., lightning strikes and forest fires - positive when dry, weakly positive or none when wet
- causal direction - can go either/both directions
- e.g., friends' behavior x own behavior
Elaboration/partial tables
- examine third variables that could account for association (either as spurious or intervening relationship) or explore conditional relationships
- create tables between two variables for cases that fall into particular categories of a third variable
Prosecution of child sexual abuse cases (Brewer, Rowe, Brewer)
|                  | rejected | prosecuted | total |
| medical evidence | 19 (44%) | 24 (56%)   | 43    |
| no med. evidence | 43 (61%) | 28 (39%)   | 71    |
| total            | 62       | 52         | 114   |
elaboration: partial tables re seriousness of abuse (conditional relationship?)
less serious abuse (e.g., fondling)
|                  | rejected | prosecuted | total |
| medical evidence | 8 (100%) | 0 (0%)     | 8     |
| no med. evidence | 20 (67%) | 10 (33%)   | 30    |
| total            | 28       | 10         | 38    |
more serious abuse (e.g., penetration)
|                  | rejected | prosecuted | total |
| medical evidence | 11 (31%) | 24 (69%)   | 35    |
| no med. evidence | 23 (56%) | 18 (44%)   | 41    |
| total            | 34       | 42         | 76    |
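The elaboration can be summarized by computing the percentage difference in prosecution (medical evidence minus no medical evidence) in the full table and in each partial table. This is a minimal sketch using the counts from the three tables above; the dictionary layout and function names are illustrative assumptions.

```python
# (rejected, prosecuted) counts from the full table and the two partial tables.
tables = {
    "all cases": {"medical evidence": (19, 24), "no med. evidence": (43, 28)},
    "less serious abuse": {"medical evidence": (8, 0), "no med. evidence": (20, 10)},
    "more serious abuse": {"medical evidence": (11, 24), "no med. evidence": (23, 18)},
}


def pct_prosecuted(rejected, prosecuted):
    """Percentage of cases in a row that were prosecuted."""
    return 100.0 * prosecuted / (rejected + prosecuted)


def percentage_difference(table):
    """% prosecuted with medical evidence minus % prosecuted without it."""
    return (pct_prosecuted(*table["medical evidence"])
            - pct_prosecuted(*table["no med. evidence"]))


diffs = {name: percentage_difference(t) for name, t in tables.items()}

for name, diff in diffs.items():
    print(f"{name}: {diff:+.1f} percentage points")
```

The difference is positive in the full table and among more serious cases but negative among less serious cases, which is the pattern a conditional relationship would produce. The less serious table has only 8 cases with medical evidence, so that partial result rests on very few cases.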
Be careful about three criteria for causality: association, time order, and nonspuriousness (no confounding)