Bivariate relationships: Categorical data
-
goal of bivariate analysis: measure the association (relationship) between two variables
-
crosstabulation = cross-classification
= contingency table
-
intersecting frequency distributions
-
rows, columns
-
cells
-
marginals - row and column totals
-
r x c designation of a table - number of rows x number of columns
-
computing row or column percentages - each cell count divided by its row or column total
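The pieces of a crosstabulation listed above (cells, marginals, r x c size, row and column percentages) can be sketched in a few lines of Python. This is only an illustration: the cases and the category labels ("urban", "rural", "yes", "no") are invented for the example and do not come from any dataset in these notes.

```python
from collections import Counter

# Hypothetical cases: one (row category, column category) pair per case.
cases = [
    ("urban", "yes"), ("urban", "yes"), ("urban", "yes"), ("urban", "no"),
    ("urban", "no"), ("rural", "yes"), ("rural", "no"), ("rural", "no"),
    ("rural", "no"), ("rural", "no"),
]

cells = Counter(cases)                      # joint frequencies (the cells)
row_totals = Counter(r for r, _ in cases)   # row marginals
col_totals = Counter(c for _, c in cases)   # column marginals
rows, cols = sorted(row_totals), sorted(col_totals)

# Row percentages: each cell divided by its row total (each row sums to 100).
row_pct = {(r, c): 100.0 * cells[(r, c)] / row_totals[r]
           for r in rows for c in cols}
# Column percentages: each cell divided by its column total.
col_pct = {(r, c): 100.0 * cells[(r, c)] / col_totals[c]
           for r in rows for c in cols}

print(f"{len(rows)} x {len(cols)} table, N = {len(cases)}")
for r in rows:
    line = "  ".join(f"{c}: {cells[(r, c)]} ({row_pct[(r, c)]:.0f}%)" for c in cols)
    print(f"{r:6s} {line}  total: {row_totals[r]}")
```

Which set of percentages to read depends on which variable is treated as independent: percentages are usually computed within categories of the independent variable and then compared across those categories.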
Assessing relationships in bivariate
data
1) is there a relationship?
-
statistical independence/dependence
-
detecting relationship by comparing
percentage distributions for one variable across different categories of
a second
-
in sample data there is almost always some relationship, since the percentages rarely match exactly
2) strength of relationship?
-
crude assessment: compare the percentage difference across the categories of the independent variable (i.v.)
3) direction of relationship?
-
if cases' values are high on one variable, are their values on the other variable high (positive) or low (negative)?
-
e.g., positive = height and weight
-
e.g., negative = size of vehicle and
gas mileage
-
can be defined for ordinal and interval
variables only
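The three questions above can be illustrated on a small table. The sketch below assumes a hypothetical 2 x 2 table with invented counts, rows as categories of the independent variable and columns as categories of the dependent variable. It computes the percentage difference (the crude strength measure) and the cell counts that would be expected under statistical independence (row total times column total, divided by N).

```python
# Hypothetical 2 x 2 table; all counts are invented for illustration.
# Rows: categories of the independent variable (ordered low < high).
# Columns: categories of the dependent variable.
table = {
    "low":  {"no": 40, "yes": 10},
    "high": {"no": 20, "yes": 30},
}

def row_percent(table, row, col):
    """Percent of the cases in `row` that fall in column `col`."""
    return 100.0 * table[row][col] / sum(table[row].values())

def percentage_difference(table, row_a, row_b, col):
    """Difference in row percentages for `col` between two rows."""
    return row_percent(table, row_a, col) - row_percent(table, row_b, col)

def expected_counts(table):
    """Cell counts expected if the two variables were independent."""
    row_totals = {r: sum(cols.values()) for r, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand = sum(row_totals.values())
    return {r: {c: row_totals[r] * col_totals[c] / grand for c in col_totals}
            for r in table}

diff = percentage_difference(table, "high", "low", "yes")
expected = expected_counts(table)
print(f"percentage difference in 'yes': {diff:.1f} points")
print("expected under independence:", expected)
```

If the observed counts equaled the expected counts, the row percentages would be identical and the difference would be zero. The sign of the difference says something about direction only because the row categories here are ordered.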
Multiple interpretations of associations
in observational studies
-
"spurious" relationship - two variables are related only through their
common association with a third variable that causes each of them; the
two original variables are not causally linked, even indirectly
-
e.g., ice cream sales and murders - positive association, but hot
weather drives both
-
intervening relationship - third
variable links the independent variable to the dependent variable
-
e.g., sex and job status among executives
and managers
-
conditional relationship - strength
and/or direction of relationship between two variables differs for different
values on third variable
-
e.g., lightning strikes and forest
fires - positive when dry, weakly positive or none when wet
-
statistical interaction or interactive
relationship
-
causal direction - causation can run in either direction, or in both
-
e.g., friends' behavior and own behavior
Association can be interpreted as causation
in experimental studies
Elaboration/partial tables
-
examine third variables that could
account for association (either as spurious or intervening relationship)
or explore conditional relationships
-
create tables between two variables
for cases that fall into particular categories of a third variable
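A minimal sketch of elaboration with partial tables follows. The counts are hypothetical and were chosen so that the association between x and y in the full table disappears once cases are split by the third variable z. The names x, y, z and the labels "hot" and "cold" are assumptions made for the illustration.

```python
# Hypothetical counts keyed by (z, x, y); z is the third (control) variable.
counts = {
    ("hot", "high", "high"): 32, ("hot", "high", "low"): 8,
    ("hot", "low", "high"): 8,   ("hot", "low", "low"): 2,
    ("cold", "high", "high"): 2, ("cold", "high", "low"): 8,
    ("cold", "low", "high"): 8,  ("cold", "low", "low"): 32,
}

def percent_y_high(counts, x_value, z_value=None):
    """Percent of cases with y == 'high' among cases with x == x_value.

    If z_value is given, only cases in that category of z are used,
    which is exactly one partial table.
    """
    selected = {k: n for k, n in counts.items()
                if k[1] == x_value and (z_value is None or k[0] == z_value)}
    total = sum(selected.values())
    high = sum(n for (z, x, y), n in selected.items() if y == "high")
    return 100.0 * high / total

def percentage_difference(counts, z_value=None):
    """Row-percentage difference (x high minus x low) in y == 'high'."""
    return (percent_y_high(counts, "high", z_value)
            - percent_y_high(counts, "low", z_value))

full = percentage_difference(counts)
partials = {z: percentage_difference(counts, z) for z in ("hot", "cold")}
print(f"full table: {full:.1f} points")
for z, d in partials.items():
    print(f"partial table, z = {z}: {d:.1f} points")
```

The same vanishing pattern appears for both spurious and intervening relationships, so the partial tables alone cannot tell the two apart; that depends on where the third variable sits in the causal and time order. If the partial tables instead showed differences of unequal size or sign, that would point to a conditional relationship.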
Be careful about three criteria for
causality - association, time order, and nonspuriousness
Pitfalls in analyzing associations
Simpson's paradox
-
direction or strength of a relationship changes when data are aggregated
across natural groups that should be analyzed separately
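A small numeric sketch can make the reversal concrete. The counts below are hypothetical: two treatments, X and Y, are compared within two natural groups ("easy" and "hard" cases), and then again after the groups are pooled.

```python
# Hypothetical (successes, cases) for treatments X and Y in two groups.
data = {
    "easy": {"X": (9, 10),  "Y": (80, 90)},
    "hard": {"X": (30, 90), "Y": (3, 10)},
}

def rate(successes, cases):
    """Success rate as a percentage."""
    return 100.0 * successes / cases

def aggregate(data, treatment):
    """Pool successes and cases for one treatment across all groups."""
    successes = sum(group[treatment][0] for group in data.values())
    cases = sum(group[treatment][1] for group in data.values())
    return successes, cases

# Success rates within each group, and after aggregating over groups.
within = {g: {t: rate(*sc) for t, sc in treatments.items()}
          for g, treatments in data.items()}
overall = {t: rate(*aggregate(data, t)) for t in ("X", "Y")}

for g, rates in within.items():
    print(f"{g}: X = {rates['X']:.1f}%, Y = {rates['Y']:.1f}%")
print(f"pooled: X = {overall['X']:.1f}%, Y = {overall['Y']:.1f}%")
```

In this invented example X does better than Y within each group, yet Y looks far better in the pooled table, because X was used mostly on hard cases and Y mostly on easy ones.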
Ignoring denominators/Incomplete crosstabulations
-
"More pedestrians are killed crossing
with the signal than against it."
-
"More people get killed crossing streets
with crosswalks than those without."
-
"In 1993, juveniles arrested in Orange
County, CA, were tested for drug use (urinalysis). Forty-three percent
(43%) tested positive, and a newspaper stated that 'the results seemed
to bolster beliefs that drug use and juvenile crime are related.'"
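Each of the statements above reports a numerator without its denominator, or one row of a crosstabulation without the row it should be compared to. The sketch below uses hypothetical counts for the first statement to show how a comparison of raw counts can point the opposite way from a comparison of rates. The numbers of deaths and crossings are invented.

```python
# Hypothetical counts. Deaths are the numerators a headline would quote;
# crossings are the denominators it leaves out.
deaths = {"with signal": 60, "against signal": 40}
crossings = {"with signal": 9_000_000, "against signal": 1_000_000}

def rate_per_million(events, exposure):
    """Events per one million units of exposure."""
    return 1_000_000 * events / exposure

rates = {k: rate_per_million(deaths[k], crossings[k]) for k in deaths}

for k in deaths:
    print(f"{k}: {deaths[k]} deaths, {rates[k]:.1f} per million crossings")
```

With these invented numbers more people die crossing with the signal, but only because far more people cross that way; the risk per crossing is lower. The drug-testing example has the same gap: the 43% among arrested juveniles says little without the corresponding percentage among juveniles who were not arrested.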