*** Assignment 1 -- Stata .do file ***

pause on

***I first saved the dataset as a .txt file (macs.txt) on my hard drive***
infile id months age cd4 cd8 vload0 aidscase vtime sctime atime dtime ideath using macs.txt
replace cd4=. if cd4==-999
replace vload0=. if vload0==-999
replace aidscase=. if aidscase==-999
replace atime=. if atime==-999
save macs, replace


***Number 1***

***Checking how many individual subjects are in the dataset***
codebook id
pause [ type q to continue ]

***Summarizing viral load over the full data set,
*** i.e. multiple data points per subject***
summarize vload0
pause [ type q to continue ]

***Collapsing the data to give me 1 row per subject
*** by taking the mean of each of the multiple data points per subject***
collapse (mean) months age cd4 cd8 vload0 aidscase vtime sctime atime dtime ideath, by(id)

***Summarizing viral load in the collapsed dataset***
summarize vload0
pause [ type q to continue ]


***Number 2***
clear
use macs

***Creating cd4 by years since seroconversion categories
*** (year 1 = months 0-12, year 2 = 13-24, year 3 = 25-36, year 4 = 37-48)***
gen y1 = cd4
recode y1 (min/max=.) if months >12
recode y1 (min/max=.) if months <0
gen y2 = cd4
recode y2 (min/max=.) if months >24
recode y2 (min/max=.) if months <13
gen y3 = cd4
recode y3 (min/max=.) if months >36
recode y3 (min/max=.) if months <25
gen y4 = cd4
recode y4 (min/max=.) if months >48
recode y4 (min/max=.) if months <37

***Summarizing cd4 distribution in years 1, 2, 3, 4 in the full data set***
summarize y1 y2 y3 y4

***Collapsing the data to give me 1 row per subject
*** by taking the mean of each of the multiple data points per subject***
collapse (mean) months age cd4 cd8 vload0 aidscase vtime sctime atime dtime ideath y1 y2 y3 y4, by(id)

***Summarizing cd4 distribution in years 1, 2, 3, 4 in the collapsed data set***
summarize y1 y2 y3 y4
pause [ type q to continue ]


***Number 3***
clear
use macs

***Creating cd4 by years since seroconversion categories***
gen y1 = cd4
recode y1 (min/max=.) if months >12
recode y1 (min/max=.) if months <0
gen y2 = cd4
recode y2 (min/max=.) if months >24
recode y2 (min/max=.) if months <13
gen y3 = cd4
recode y3 (min/max=.) if months >36
recode y3 (min/max=.) if months <25
gen y4 = cd4
recode y4 (min/max=.) if months >48
recode y4 (min/max=.) if months <37

***Collapsing the data to give me 1 row per subject
*** by taking the mean of each of the multiple data points per subject***
collapse (mean) months age cd4 cd8 vload0 aidscase vtime sctime atime dtime ideath y1 y2 y3 y4, by(id)

***Creating viral load categories as per VFHL paper, chapter 1, page 8, example 2
*** (Stata treats missing vload0 as >= 46000, so it is reset to missing last)***
gen load = 0
replace load = 1 if vload0 < 15000
replace load = 2 if (vload0 >= 15000 & vload0 < 46000)
replace load = 3 if vload0 >= 46000
replace load =. if vload0==.

***Creating labels for the categories***
label define vload 1 "Low (<15000)" 2 "Medium (15000 - 46000)" 3 "High (>=46000)"
label values load vload
tab load

***CD4 levels for years 1, 2, 3, 4 by viral load levels***
sort load
by load: summarize y1 y2 y3 y4
pause [ type q to continue ]

***Saving the collapsed dataset to use later***
sort id
save collapsed, replace


***Number 4***
clear
use macs

***Plotting individual series of longitudinal observations
*** for a selection of subjects (id < 1756)***
graph twoway (lowess cd4 months) (scatter cd4 months) if id < 1756, by(id)
pause [ type q to continue ]


***Number 5***

***Creating cd4 by years since seroconversion categories***
gen y1 = cd4
recode y1 (min/max=.) if months >12
recode y1 (min/max=.) if months <0
gen y2 = cd4
recode y2 (min/max=.) if months >24
recode y2 (min/max=.) if months <13
gen y3 = cd4
recode y3 (min/max=.) if months >36
recode y3 (min/max=.) if months <25
gen y4 = cd4
recode y4 (min/max=.) if months >48
recode y4 (min/max=.) if months <37

***Collapsing the data to 1 row per subject (mean of each variable)***
collapse (mean) months age cd4 cd8 vload0 aidscase vtime sctime atime dtime ideath y1 y2 y3 y4, by(id)

***Getting correlation matrix***
corr y1 y2 y3 y4
pause [ type q to continue ]


***Number 6***
clear
use macs

***Removing rows with missing cd4, then any subjects with < 3 datapoints***
drop if missing(cd4, months)
sort id
by id: drop if _N < 3

***Regressing CD4 on months since seroconversion, separately for each subject
*** (statsby stores the coefficients as _b_months and _b_cons)***
statsby _b, by(id) clear: regress cd4 months

***Listing the results for the first 10 subjects***
list in 1/10
pause [ type q to continue ]


***Number 7***
clear
use macs

***Removing rows with missing cd4, then any subjects with < 3 datapoints***
drop if missing(cd4, months)
sort id
by id: drop if _N < 3

***Regressing CD4 on months since seroconversion, separately for each subject***
statsby _b, by(id) clear: regress cd4 months

***Merging the per-subject slopes and intercepts with the collapsed
*** (1 row per subject) dataset saved in Number 3***
merge 1:1 id using collapsed

***Generating a variable for the natural log of the baseline viral load***
gen lnvload0 = ln(vload0)

***Plotting slopes versus ln baseline viral load as a scatterplot with a lowess curve***
graph twoway (lowess _b_months lnvload0) (scatter _b_months lnvload0)
pause [ type q to continue ]


***Number 8***

**********
********** Now for some regression models!
**********
clear
use macs

**** create categories of vload0 (again) ***
gen load = -999
replace load = 1 if vload0 <= 15000
replace load = 2 if (vload0 > 15000 & vload0 <= 46000)
replace load = 3 if vload0 > 46000
replace load =. if vload0==.
replace load =. if vload0== -999

***Indicator and interaction terms, kept missing (not 0) when load is missing***
gen load2 = (load==2) if !missing(load)
gen load3 = (load==3) if !missing(load)
gen monthXload2 = months * load2
gen monthXload3 = months * load3

***Fitting mixed models: random intercept only, then random intercept and random slope on months***
xtmixed cd4 months load2 load3 monthXload2 monthXload3 || id: , mle
pause [ type q to continue ]
xtmixed cd4 months load2 load3 monthXload2 monthXload3 || id: months , mle
pause [ type q to continue ]


***Number 10***

***GEE with independence, then exchangeable, working correlation and robust standard errors***
xtset id
xtgee cd4 months load2 load3 monthXload2 monthXload3, ///
    corr(independent) robust
xtgee cd4 months load2 load3 monthXload2 monthXload3, ///
    corr(exchangeable) robust
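
* ------------------------------------------------------------------
* Optional sketch, not part of the original assignment: the gen/recode
* blocks that build y1-y4 in Numbers 2, 3 and 5 could be wrapped in a
* single helper. The program name -mkyears- is hypothetical. The bands
* are taken from those recode blocks: year k keeps cd4 when months lies
* in [0, 12] for k = 1 and in [12*(k-1)+1, 12*k] for k = 2, 3, 4, and
* is missing otherwise (including when months is missing).
* Defining the program does not change the data. It only runs when
* -mkyears- is called.
* ------------------------------------------------------------------
capture program drop mkyears
program define mkyears
    forvalues k = 1/4 {
        * lower bound of the band: 0 for year 1, else 12*(k-1)+1
        local lo = cond(`k' == 1, 0, 12*(`k' - 1) + 1)
        * upper bound of the band: 12*k
        local hi = 12*`k'
        * inrange() returns 0 when months is missing, so y`k' is missing there too
        gen y`k' = cd4 if inrange(months, `lo', `hi')
    }
end

* Example use (assumes the uncollapsed macs data are in memory):
*     use macs, clear
*     mkyears
*     summarize y1 y2 y3 y4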