Another science project on El Niño

A project that requires some programming and math knowledge would be to get temperature and precipitation data for your local area, and make a graph showing the correlation between your area and an index of El Niño. This requires the ability to use a graphics program to make plots, and the ability to do calculations with data (can be done in Basic or in a spreadsheet). I also assume that you know how to calculate a standard deviation (7th or 8th grade math).

Getting the data
Get your local data from NOAA's Climate Diagnostic Center in Boulder, CO: http://www.cdc.noaa.gov/USclimate/USclimdivs.html

(By the way you can make some cool plots from this page).

Click "New webpage: get timeseries from climate division dataset"
Click "US climate division temp, precip and PDSI", then click "Go to selection options"
Select your state, then under the "Division" pulldown menu, pick "State area averaged"
Enter the years you want in the boxes. Select "Mean" (that means the mean value of temp or precip for each month). Click "Create timeseries".
There you have it! A complete table of your state's monthly-average temp or precip for the years you select. (See the bottom of the table for the months and title.) Now you can input this stuff into your spreadsheet. Alternatively, each state is divided into several climate divisions, each covering a smaller area. You could pick those separately if you want. In that case, click "Map of climate divisions", pick your state from the menu, and you can then choose the division where you live, and use that instead of the whole state average.

Get the Southern Oscillation Index (SOI) showing El Niño (see question 16 on my El Niño FAQ page for a description of this).
The SOI monthly values can be downloaded from http://www.cpc.ncep.noaa.gov/data/indices (follow the link to "Southern Oscillation Index"; you have to get 1876-1950 and 1951-1999 separately).

What the plot will look like:
A plot showing the relation between the El Niño/La Niña variations and Seattle-area winter temperature and precipitation.

The plot is a scatter diagram. A scatter diagram means putting SOI on the x-axis, and your winter-average temperature or precipitation on the y-axis. For each year you have a pair of values. Use each pair to define the (x,y) position of a dot for that year. It will look like a cloud of dots, one per year (50 dots if you have 50 years). This is exactly what I did in my plot for Puget Sound. For this purpose, you want to get a single value of the SOI, temperature and precipitation for each winter season (November-March). You need at least 30 years to get a reasonable result.

The reason for the winter season averaging is that you only want to include the winter conditions in US locations, since the summer conditions have little to do with El Niño. For example, suppose one August was really rainy. That would have nothing to do with El Niño (probably). But if you included that in your averaging, then that particular year would seem rainy, even though the winter might not have been, and winter is what you are really trying to analyze. So when you did the correlation (described below), or made the kind of dot diagram I show in my figure, the extra rain would be misleading.

In general, the kind of correlations you are doing always have a lot of scatter (since real weather is affected by many things). The more scatter, the harder it is to see the effects of what you are trying to analyze. So if you know in advance that some part of the variability is not connected to what you are studying, then it is greatly to your advantage to remove it first. For that reason, average the temp, precip and SOI only over the winter season. Note: if you live in Hawaii, then you are too lucky!
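If you are programming this rather than using a spreadsheet, the winter averaging might be sketched like this in Python. The function name, the dictionary layout, and the choice to label each winter by the year in which it ends are all assumptions made for illustration, and the temperatures are made up.

```python
def winter_means(monthly, first_year, last_year):
    """Average November-March values into one number per winter.

    `monthly` maps (year, month) -> value. A winter is labelled by the year
    in which it ends (an assumed convention): winter 1983 is Nov-Dec 1982
    plus Jan-Mar 1983. Winters with any missing month are skipped.
    """
    result = {}
    for year in range(first_year, last_year + 1):
        keys = [(year - 1, 11), (year - 1, 12), (year, 1), (year, 2), (year, 3)]
        if all(k in monthly for k in keys):
            result[year] = sum(monthly[k] for k in keys) / len(keys)
    return result


# Made-up monthly temperatures, for illustration only.
temps = {
    (1982, 11): 44.0, (1982, 12): 40.0,
    (1983, 1): 42.0, (1983, 2): 45.0, (1983, 3): 49.0,
    (1983, 8): 66.0,  # a summer month: never used by the winter average
}
print(winter_means(temps, 1983, 1984))  # {1983: 44.0}
```

Winter 1984 does not appear in the result because its months are missing from the made-up data. You would run the same function on the SOI so that both lists cover the same winters.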

A few things to note (when you look at the plot):
The numbers "r" on the plots are the correlation values (see below). Note that a positive or a negative correlation is equally important; both show a useful signal. It is the near-zero values that don't mean much. Correlations closer to 1 or -1 mean the dots fall closer to the regression line (the straight line that approximates the scatter of the dots, described below).
You can see by these plots that the more values you have the more meaningful it gets. Not every El Niño or La Niña behaves the same way. You have to have at least a few of them to see the pattern.

Correlation
Do you know about correlation? Correlation is the basic statistical measure of the relation between two variables. Think of "co" meaning "joint", as in "cooperate", "coexist", "copilot", "coeducation", combined with "relation". The "correlation", also called the "correlation coefficient", is a measure of how similar two time series are. The correlation is a number between -1 and 1. A correlation of 1 means the relation is perfect, a correlation of 0.5 means a partial relation, a correlation of 0 means no relation, while a correlation of -1 means one thing varies exactly oppositely to another.

How about some examples? There is a very high correlation (near 1) between sea surface temperature (SST) in the eastern Pacific and rainfall in Ecuador. When SST is high (namely during El Niño), then almost always there is also high rainfall in Ecuador. Such a correlation would be about 0.9 (in the real world, one rarely finds perfect correlations). Conversely, the Ceara region of NE Brazil typically has drought during El Niños. The correlation between east Pacific SST and Ceara rainfall would be strongly negative (meaning when SST is high, Ceara rainfall is low). Where I live (in Seattle), El Niño usually brings a moderate drought, but we are far from the equatorial Pacific so the effect is only partial. We find a weakly negative correlation (about -0.3), which indicates that El Niño has some influence but it is not the whole story. If you took two unrelated things, say, the number of student absences in a high school in Seattle compared to the number of computers sold in New York, the correlation would be near 0. Remember, a correlation of -1 is just as meaningful as a correlation of 1; it is the near-zero values that do not indicate a connection.

In case you haven't covered correlation coefficients in math, I can explain it. How about standard deviation? I hope so. Otherwise I think you'll have to ask your math teacher for a book or formula. But if you know standard deviation then I can explain the rest.

First, you have to have two time series. The requirement is that they both have values for all the same times. That is, both must have the same start and end dates and the same time step (both monthly, or both daily, or whatever). If they don't, for example you have monthly SOI but daily rainfall, then you have to make them the same, say by turning the rainfall into monthly totals. If the SOI goes from 1876 to 1999, but the rainfall goes from 1932 to 1998, then you're limited by the rainfall and can only do the calculation for the period they have in common (1932 to 1998).
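As a rough sketch of those two steps in Python (the function names and the way the data is stored are assumptions for illustration, and the rainfall numbers are made up):

```python
def monthly_totals(daily):
    """Add up daily values into monthly totals.

    `daily` maps (year, month, day) -> value; the result maps (year, month) -> total.
    """
    totals = {}
    for (year, month, _day), value in daily.items():
        totals[(year, month)] = totals.get((year, month), 0.0) + value
    return totals


def common_years(years_a, years_b):
    """Return the first and last year that both records cover."""
    start = max(min(years_a), min(years_b))
    end = min(max(years_a), max(years_b))
    return start, end


# Made-up daily rainfall, for illustration only.
rain = {(1998, 1, 1): 0.25, (1998, 1, 2): 0.5, (1998, 2, 1): 1.0}
print(monthly_totals(rain))  # {(1998, 1): 0.75, (1998, 2): 1.0}

# SOI covers 1876-1999 and rainfall covers 1932-1998, as in the example above.
print(common_years(range(1876, 2000), range(1932, 1999)))  # (1932, 1998)
```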

So, you have two time series. Call one A, so its values are A1, A2, A3, etc. Call the other one B: B1, B2, B3, etc. Suppose there are N values of each, at all the same times. That is, A16 refers to the same month as B16, etc. If there are missing values you can leave those out, but you must leave out the corresponding ones in both. For example, if A16 is missing, then you must leave out B16. You must end up with two lists of values, with both lists having the same length.
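Leaving out missing values in pairs could be sketched like this in Python. Marking a missing value with `None` is an assumption; a downloaded table may use some special number instead, which you would need to convert first. The function name is also just for illustration.

```python
def drop_missing_pairs(a, b):
    """Keep only the positions where both A and B have a value.

    Missing values are assumed to be marked with None.
    """
    if len(a) != len(b):
        raise ValueError("A and B must cover the same times")
    kept = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    a_clean = [x for x, _ in kept]
    b_clean = [y for _, y in kept]
    return a_clean, b_clean


# Made-up values: A is missing its 2nd value, B is missing its 3rd.
a = [1.0, None, 3.0, 4.0]
b = [10.0, 20.0, None, 40.0]
print(drop_missing_pairs(a, b))  # ([1.0, 4.0], [10.0, 40.0])
```

Both lists come out with the same length, and each remaining A value still lines up with the B value from the same time.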

Now, the correlation between A and B, denoted R (the same number that is called "r" on the plots and in the regression formulas), is:

R(A,B) = (Sum(A*B)/N - Average(A)*Average(B)) / (SD(A)*SD(B))

where:

Sum(A*B)/N means for each month you find the product A*B and add all those up over the whole time series, then divide by N.
Average(A) and Average(B) are the ordinary averages of A and B.
SD(A) and SD(B) are the "standard deviations" of A and B. For this formula to come out exactly right, use the version of the standard deviation that divides by N (not N-1).

Most scientific calculators have these functions, but since you will have hundreds of numbers that isn't practical. You can also do these calculations in a spreadsheet program on a PC, which can read in the values you have downloaded from those web sites and then make the plots. A scientific graphics program would be better if one is available (check at your school about that), but if not then an ordinary spreadsheet (like Excel) can do the basic stuff. It should be able to do correlations without you having to worry about the details.
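If you would rather program the formula yourself, here is a minimal sketch in Python that follows R(A,B) term by term. The function names are just for illustration, the test numbers are made up, and the standard deviation is the divide-by-N kind that the formula needs.

```python
from math import sqrt


def mean(values):
    """Ordinary average."""
    return sum(values) / len(values)


def sd(values):
    """Standard deviation, dividing by N (not N-1)."""
    avg = mean(values)
    return sqrt(sum((v - avg) ** 2 for v in values) / len(values))


def correlation(a, b):
    """R(A,B) = (Sum(A*B)/N - Average(A)*Average(B)) / (SD(A)*SD(B))."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same length")
    mean_ab = sum(x * y for x, y in zip(a, b)) / len(a)
    return (mean_ab - mean(a) * mean(b)) / (sd(a) * sd(b))


# Made-up numbers: B rises exactly in step with A, and C falls exactly in step.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
c = [8.0, 6.0, 4.0, 2.0]
print(correlation(a, b))  # very close to 1
print(correlation(a, c))  # very close to -1
```

Real data will never line up this neatly, so expect values somewhere between these two extremes.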

The regression line:
This is the line in my plot that is a (crude) approximation of the relation between Seattle conditions and the SOI. OK, you have a bunch of pairs (yearly values) of the SOI and your local winter temps. Make the scatter diagram from the yearly winter values as described above.

The regression line is the "best-fit" line to that cloud of dots, and the true meaning of the correlation coefficient is the measure of how close the dots fall to the line. For a correlation of 1, all the dots would be on the line. For a correlation of zero, the cloud would be shapeless and no line would be a good representation of the dot positions. So how do you calculate the regression line? Easy (now that you have the correlation):

The "slope-intercept" formula for a line is y = mx + b (I assume you've had this formula in math class). "m" is the "slope" and "b" is the "y-intercept". You know this, right? OK, for the regression line, the slope is:

m = r*s2/s1

where r is the correlation and s2 and s1 are the standard deviations of variables 2 and 1 respectively. Here variable 2 is your local temperatures (to be predicted) and variable 1 is the SOI (the predictor). Keep track of which is 1 and which is 2, or you'll get messed up!

The y-intercept is:

b = a2 - m*a1 = a2 - (r*s2/s1)*a1

where a2 and a1 are the average values of your temperature and the SOI, respectively. (For the second equation for b I just substituted the equation for m from above).

Easy, right? Once you have the correlation. So now you have the formula for the regression line. Just plot that on top of your scatter diagram (like I did in my plot).
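The slope and intercept formulas above can be sketched in Python as follows. The function names are assumptions for illustration, the numbers are made up (chosen so that every dot lies on the line y = 2x + 1), and the standard deviation divides by N.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation, dividing by N."""
    avg = mean(values)
    return sqrt(sum((v - avg) ** 2 for v in values) / len(values))


def correlation(a, b):
    mean_ab = sum(x * y for x, y in zip(a, b)) / len(a)
    return (mean_ab - mean(a) * mean(b)) / (sd(a) * sd(b))


def regression_line(var1, var2):
    """Return (m, b) for y = m*x + b.

    var1 is the predictor (the SOI), var2 is the thing to be predicted.
    """
    r = correlation(var1, var2)
    m = r * sd(var2) / sd(var1)          # m = r*s2/s1
    b = mean(var2) - m * mean(var1)      # b = a2 - m*a1
    return m, b


# Made-up values lying exactly on y = 2x + 1.
soi = [1.0, 2.0, 3.0, 4.0]
temp = [3.0, 5.0, 7.0, 9.0]
m, b = regression_line(soi, temp)
print(m, b)  # approximately 2.0 and 1.0
```

Note the order of the arguments: swapping the two lists gives a different line, which is the "keep track of which is 1 and which is 2" warning in code form.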

Making a prediction from the regression line:
OK, finally you want to make a prediction. The best prediction, given the available data, is the regression line. That is, if you think you know what the SOI will be for next winter, just make a dot along the line at that value of the SOI. Then read across to the y-axis to see what local temperature that corresponds to! So if your line slopes upward (a positive correlation) and you think the SOI will be high this winter, you would predict that your temperatures will be high too, and the regression line shows you exactly how high. If your line slopes downward (a negative correlation), a high SOI means lower temperatures instead. Piece of cake.
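In code, reading a prediction off the line is just plugging the SOI into y = mx + b. In this small Python sketch the slope and intercept are made-up numbers, not values from any real data set; you would use the ones from your own regression line.

```python
def predict(soi, m, b):
    """Read the predicted value off the regression line y = m*x + b."""
    return m * soi + b


# Hypothetical slope and intercept, for illustration only.
m = 0.5
b = 41.0

# Try a few guesses for next winter's SOI to see how much the answer moves.
for soi_guess in (-1.0, 0.0, 1.0):
    print(soi_guess, predict(soi_guess, m, b))
```

Trying several SOI guesses is a simple way to see how much your prediction depends on not knowing the SOI exactly.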

Getting the range of temperatures predicted (that is, how confident you are of your prediction) is harder, and requires some more advanced statistical theory and a book of mathematical tables. Remember, you have uncertainty in what you think the SOI will be, so you don't know exactly where to put that dot, and then you also have the uncertainty due to the fact that the dots don't all fall exactly on the line. But in general, the correlation measures how confident you can be. Going back to the example of a correlation of zero, which would be found in a round cloud of dots for which no line would be "best", you would have zero confidence. For a correlation of 1 (or -1, remember a perfect negative correlation is just as good in the sense of the dots falling exactly along a line), the confidence would be high.
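One simple number you can get without the tables is the typical (root-mean-square) distance of the dots above or below the regression line, which works out to s2*sqrt(1 - r*r) when s2 is the divide-by-N standard deviation of the thing you are predicting. This is only a rough measure of the scatter, not a proper confidence range, and it ignores the uncertainty in the SOI itself. The function name and the numbers in this Python sketch are made up for illustration.

```python
from math import sqrt


def scatter_about_line(r, s2):
    """Root-mean-square distance of the dots from the regression line.

    r is the correlation and s2 is the standard deviation (dividing by N)
    of the predicted variable.
    """
    return s2 * sqrt(1.0 - r * r)


s2 = 2.0  # hypothetical standard deviation of winter temperature
for r in (0.0, -0.3, 0.9, 1.0):
    print(r, scatter_about_line(r, s2))
```

With r = 0 the scatter about the line is as big as the standard deviation itself (the line tells you nothing), and with r = 1 or -1 it drops to zero.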

Good luck!
