
Loading data files

Your data can be stored on disk in two different forms. Files with the extension .RAW contain the numbers in plain text (ASCII) form. These can be created using any text editor or produced by most statistics packages. Stata can also produce files with a .DTA extension, which contain the observations, labels and other information in a format specific to Stata.

Initially your data will be in the .RAW form and can be read using the infile command. Copy the file esoph.raw (oesophageal cancer data from Ille-et-Vilaine) into your temporary folder. Now, select the Command window in Stata and type

(where you would replace thomas with your own temporary folder name). You will see the message

(975 observations read)
in the Results window. Stata has read variables called age, alcohol, tobacco and case for 975 people from the file. Type describe to see a description of the data set.
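
As a rough sketch, an infile command for this file might look like the lines below. The folder c:\temp\thomas is a hypothetical path used only for illustration, so substitute the location of your own temporary folder. The variable names are listed in the order in which the columns are assumed to appear in the file.

 * c:\temp\thomas is a hypothetical folder; use your own temporary folder
 infile age alcohol tobacco case using "c:\temp\thomas\esoph.raw"
 * show the variable names and storage types that were read
 describe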

Click on the Browse button to view the data in a spreadsheet form. The data are all numeric, with no indication of what the values mean. We want to attach some labels to the values and variables to make things clearer. Type the following:

 label variable age "Age Group"
 label variable alcohol "Average alcohol consumption"
 label variable tobacco "Average tobacco consumption"
 label variable case "Case/Control Status"

If you now type describe again the variable labels will be displayed. They will also be used in the output of most statistical commands.

We can also label the values of the variables. Type:

 label define agegroups 1 "25-34" 2 "35-44" 3 "45-54" 4 "55-64" 5 "65-74" 6 "75+"
 label define alcgps 1 "<40g/day" 2 "40-79g/day" 3 "80-119g/day" 4 "120+g/day"
 label define tobgps 1 "0-9g/day" 2 "10-19g/day" 3 "20-29g/day" 4 "30+g/day"
 label define status 1 "Case" 0 "Control"
 label values age agegroups
 label values alcohol alcgps
 label values tobacco tobgps
 label values case status

To show that this has worked, display the first few rows of the data set by typing list. There will be a prompt --more-- at the bottom of the screen. Type q to tell Stata that you don't want to see any more of the data. You can still see the original numbers by typing list, nolabel.
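
If you only want to check a handful of rows, one option is to restrict list to a range of observations, which avoids the --more-- prompt for short ranges. The sketch below assumes the labels above have been defined and attached; the choice of the first five observations is arbitrary.

 * first five observations, shown with value labels
 list in 1/5
 * the same observations, shown as the underlying numbers
 list in 1/5, nolabel
 * print the codes and text stored in one of the value labels
 label list agegroups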



Thomas Lumley
Wed Jan 22 09:16:46 PST 1997