Regression and Correlation
 
  • bivariate relationships between interval scale variables

    goals:
     

  • display associations graphically

  • predict the dependent variable (d.v.) from the independent variable (i.v.)

  • measure strength and direction of association

  • use scatterplot to display bivariate interval data

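  • X-axis = i.v., Y-axis = d.v.
  • using X to predict Y

  • simple linear regression equation: Y = a + bX, where a is the intercept (predicted Y when X = 0) and b is the slope (predicted change in Y for a one-unit increase in X)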
  • Issues to consider:
     

  • influence of outliers - outliers can suppress or drive relationships

  • nonlinear relationships - linear regression based on assumption of linearity

  • extrapolation - predicting y based on regression equation and x outside of observed range of x values - be careful!

  • heteroscedasticity - the spread of the errors in prediction varies over the range of X values
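
A minimal sketch of fitting Y = a + bX, assuming ordinary least squares (the notes do not name the fitting method). The names `fit_line`, `predict`, and `in_observed_range`, and the hours/scores data, are made up for illustration. The range check is one simple way to flag the extrapolation issue listed above.

```python
def fit_line(xs, ys):
    """Return (a, b) for Y = a + bX using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = co-variation of X and Y divided by variation in X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # the fitted line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted Y for a given X."""
    return a + b * x


def in_observed_range(x, xs):
    """False means a prediction at x would be an extrapolation."""
    return min(xs) <= x <= max(xs)


# Hypothetical data: hours studied (i.v.) and exam score (d.v.)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

a, b = fit_line(hours, scores)
print(f"Y = {a:.1f} + {b:.1f}X")
print(predict(a, b, 3), in_observed_range(3, hours))
print(predict(a, b, 10), in_observed_range(10, hours))
```

A prediction at X = 10 is still computed, but the range check returns False, which is the "be careful!" case: nothing in the observed data says the line continues that far.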

     

    Correlation
     
     

  • strength of association - degree of clustering about line

  • r² as a PRE measure - strength of association

    Proportional Reduction in Error (PRE)
     
     

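  • PRE = (E1 - E2) / E1, where E1 = prediction error without using the i.v. and E2 = prediction error using the i.v.

    r²:

  • E1: error in predicting values of Y from Y-bar (the mean of Y), measured as the sum of squared deviations from the mean

  • E2: error in predicting Y from X using the regression line, measured as the sum of squared deviations from the line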
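
The PRE logic for r² can be sketched as follows, assuming E1 and E2 are sums of squared prediction errors and the line is fit by least squares. The function name `r_squared_as_pre` and the data are hypothetical.

```python
def r_squared_as_pre(xs, ys):
    """r-squared computed as (E1 - E2) / E1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # least-squares slope and intercept for Y = a + bX
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x

    # E1: squared error when every Y is predicted by the mean of Y
    e1 = sum((y - mean_y) ** 2 for y in ys)
    # E2: squared error when Y is predicted from X with the regression line
    e2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    return (e1 - e2) / e1


# Hypothetical data: hours studied (i.v.) and exam score (d.v.)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

r2 = r_squared_as_pre(hours, scores)
print(round(r2, 3))
```

With these made-up numbers E1 = 242 and E2 = 1.9, so knowing X removes almost all of the error made by guessing the mean.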


    Correlation: r
     
     

  • Pearson correlation coefficient or product-moment coefficient

  • indicates how closely observed values cluster around the regression line, and the direction of the association

  • r = square root of r², and takes the sign of the slope

  • ranges between -1 and 1
  • strength of r

  • which is stronger: -.2 or +.1? -.5 or +.75?

  • absolute value of r indicates strength; the sign indicates only direction (so -.2 is stronger than +.1, and +.75 is stronger than -.5)

  • general guide for interpreting strength of r (absolute value)


    0 - .2 = weak, slight

    .2 - .4 = mild/modest

    .4 - .6 = moderate

    .6 - .8 = moderately strong

    .8 - 1.0 = strong
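
The guide above can be written as a small lookup. This is a sketch only: the function name `strength_label` is hypothetical, and because the ranges in the guide share their endpoints, the code assumes a boundary value (such as exactly .2) belongs to the higher category.

```python
def strength_label(r):
    """Verbal label for the strength of r, based on its absolute value."""
    size = abs(r)  # the sign only gives direction, so it is dropped
    if size > 1:
        raise ValueError("r must be between -1 and 1")
    # assumption: a boundary value falls into the higher category
    if size < 0.2:
        return "weak, slight"
    if size < 0.4:
        return "mild/modest"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "moderately strong"
    return "strong"


for r in (-0.2, 0.1, -0.5, 0.75):
    print(r, strength_label(r))
```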


  • r standardizes the degree of association, regardless of units of measurement

  • one approach to computing r:
  • standardize each case's value on X and Y: z = (value - mean) / standard deviation
  • for each case, multiply the standardized X value by the standardized Y value, sum the products, and divide by n - 1
  • r is appropriate for describing linear relationships only
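
The z-score approach to computing r can be sketched as below, assuming the standard deviation is the sample version (divisor n - 1), which matches the n - 1 used for the sum of products. The function names and data are hypothetical. Converting hours to minutes illustrates that r does not depend on the units of measurement.

```python
import math


def z_scores(values):
    """Standardize values: (value - mean) / sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]
minutes = [h * 60 for h in hours]  # same variable, different units

r_hours = pearson_r(hours, scores)
r_minutes = pearson_r(minutes, scores)
print(round(r_hours, 3), round(r_minutes, 3))
```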

  • restricted range on one or both variables attenuates correlation

  • outliers influence correlation, too
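
Both points can be illustrated with a small made-up data set. This is a sketch, not a general result: the data, the cutoffs 4 to 7, and the outlier at (20, 0) are all chosen for illustration, and `pearson_r` here uses the equivalent covariance form of the formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: co-variation divided by the product of the variations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a strong positive linear trend
x = list(range(1, 11))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x, y)

# Restricted range: keep only the cases with 4 <= X <= 7
kept = [(xi, yi) for xi, yi in zip(x, y) if 4 <= xi <= 7]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

# Outlier: add one case far from the trend
r_outlier = pearson_r(x + [20], y + [0])

print(round(r_full, 3), round(r_restricted, 3), round(r_outlier, 3))
```

In this example the restricted-range r is lower than the full-range r, and a single outlier pulls r close to zero. An outlier that lies along the trend would push r upward instead.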


    Ecological correlation
     

  • correlation between rates or averages

  • units of analysis = some kind of aggregate (e.g., neighborhoods, companies, states, countries)

  • interpret carefully; a correlation between aggregates may inflate the degree of association between the underlying conceptual variables
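
A sketch of how aggregation can inflate a correlation, using made-up data for three hypothetical neighborhoods with four residents each. The individual-level r uses all twelve residents; the ecological r uses only the three pairs of neighborhood averages.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: co-variation divided by the product of the variations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: (X values, Y values) for the residents of each neighborhood
neighborhoods = {
    "A": ([2, 4, 6, 8], [6, 2, 8, 4]),
    "B": ([5, 7, 9, 11], [9, 5, 11, 7]),
    "C": ([8, 10, 12, 14], [12, 8, 14, 10]),
}

# Individual level: every resident is a case
all_x = [x for xs, _ in neighborhoods.values() for x in xs]
all_y = [y for _, ys in neighborhoods.values() for y in ys]
r_individual = pearson_r(all_x, all_y)

# Ecological level: every neighborhood average is a case
avg_x = [mean(xs) for xs, _ in neighborhoods.values()]
avg_y = [mean(ys) for _, ys in neighborhoods.values()]
r_ecological = pearson_r(avg_x, avg_y)

print(round(r_individual, 3), round(r_ecological, 3))
```

Averaging removes the variation among residents within each neighborhood, so the three averages fall exactly on a line even though the residents themselves do not.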


    Nonlinear relationships and transformations
     

  • one response: transform the data on one or both variables to make relationship more linear

  • many different transformations depending on the circumstances - logarithms, square roots, squaring, raising to higher powers, etc.
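
As one illustration, a logarithm can straighten a relationship in which Y grows by a constant multiple for each unit of X. The data below are made up so that Y doubles at every step; which transformation fits real data depends on the circumstances.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: co-variation divided by the product of the variations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y doubles with each one-unit increase in X
x = [1, 2, 3, 4, 5]
y = [2, 4, 8, 16, 32]

r_raw = pearson_r(x, y)

# Transform Y with the natural log; log(Y) is then a straight-line function of X
log_y = [math.log(v) for v in y]
r_log = pearson_r(x, log_y)

print(round(r_raw, 3), round(r_log, 3))
```

The correlation with the transformed variable is higher because r measures only linear association, and the log scale makes this relationship linear.
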

    Investigating spurious, intervening, & conditional relationships with correlation
     

  • correlation of interest: r_AB (the correlation between variables A and B)

  • more sophisticated techniques exist for evaluating these interpretations; these are simple approaches