LINKS:

Codeml program

PAML website

MHC data

Additional programs

Chi-square table

Description of PAML

Codeml output files from exercise

mapping of adaptive sites

BBEdit


selecton

A web interface to the type of analysis below: Datamonkey



Tests for positive selection by dN/dS analysis.

PAML Exercise: In PAML, all parameters are changed by editing the control file (codeml.ctl). The program reads this control file and writes the output files: the main one has the name you specify, and another is called "rst". Remember that the dN/dS rate ratio is abbreviated as omega (w). Download the “Codeml program” folder to the Desktop; it contains the codeml program, aligned sequences, treefile and control file in a folder named mhcexample.

Open the Terminal application to work at the command line. To get to the folder, type:


cd Desktop/mhcexample


then, to execute the codeml program, type:


./codeml codeml.ctl


where codeml.ctl is the control file you have edited.
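
The control file can also be edited from a script rather than a text editor. The following is a minimal Python sketch, assuming the control file consists of `name = value` lines in which `*` starts a comment. The helper name `set_ctl_options` and the three-line fragment are hypothetical and only for illustration; they are not part of PAML or of the control file shipped with the exercise.

```python
import re

def set_ctl_options(ctl_text, options):
    """Return ctl_text with the named `name = value` settings replaced."""
    remaining = dict(options)
    lines = []
    for line in ctl_text.splitlines():
        # Everything after '*' is treated as a comment and kept unchanged.
        body, star, comment = line.partition("*")
        match = re.match(r"\s*(\w+)\s*=", body)
        if match and match.group(1) in remaining:
            name = match.group(1)
            line = f"{name} = {remaining.pop(name)}"
            if star:
                line += f"  *{comment}"
        lines.append(line)
    # Settings missing from the file are appended at the end.
    lines.extend(f"{name} = {value}" for name, value in remaining.items())
    return "\n".join(lines) + "\n"

# Illustrative fragment only, not the full control file from the exercise.
example_ctl = "seqfile = mhc.phy\nrunmode = 2  * 0: user tree\nNSsites = 0\n"
updated = set_ctl_options(example_ctl, {"runmode": 0, "NSsites": "0 1 2 7 8"})
print(updated)
```

Writing the returned text back to codeml.ctl and then running ./codeml codeml.ctl as above would complete the cycle; keeping a copy of the original control file is a sensible precaution.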


Perform an analysis for variation in the dN/dS between sites using the mhc.phy sequence file and the mhc.tre treefile. Set runmode = 0 and NSsites = 0 1 2 7 8. This will run the models 0, 1, 2, 7, and 8. The parameter ncatG should be set automatically. Be sure to check convergence by repeating the analysis with different starting omega values (you will probably not have time for this). Compare the likelihoods of the different models (M1 vs. M2, M7 vs. M8).


a. How many degrees of freedom for the comparison of M1 vs M2 and M7 vs M8?

b. Which model fits the data better, M1 or M2? M7 or M8?

c. What are the parameter estimates for the models with the highest likelihood?

d. From this analysis would you conclude the genes have been subjected to selection?

e. Which sites, if any, have been subjected to positive selection in the comparison M1 vs M2?

f. Which sites, if any, have been subjected to positive selection in the comparison M7 vs M8?
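
The model comparisons above are likelihood ratio tests: twice the difference in log-likelihood between two nested models is compared with a chi-square critical value, with degrees of freedom equal to the difference in the number of free parameters. The Python sketch below illustrates the arithmetic. It assumes the main output file reports each model's fit on a line of the form `lnL(ntime: ..  np: ..):  value`, which is how codeml output typically looks but should be checked against your own file. The function names and the two example lines are made up for illustration.

```python
import re

# 5% critical values of the chi-square distribution, indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def parse_lnl_line(line):
    """Extract (np, lnL) from a line like 'lnL(ntime: 11  np: 14):  -1000.5  +0.0'."""
    match = re.search(r"lnL\(ntime:\s*\d+\s+np:\s*(\d+)\):\s*(-?\d+(?:\.\d+)?)", line)
    if match is None:
        raise ValueError(f"no lnL found in: {line!r}")
    return int(match.group(1)), float(match.group(2))

def likelihood_ratio_test(null_line, alt_line):
    """Return (statistic, df, significant at 5%) for two nested models."""
    np_null, lnl_null = parse_lnl_line(null_line)
    np_alt, lnl_alt = parse_lnl_line(alt_line)
    statistic = 2.0 * (lnl_alt - lnl_null)
    df = np_alt - np_null
    return statistic, df, statistic > CHI2_CRIT_05[df]

# Made-up lines for illustration; substitute the lnL lines from your own output.
m1 = "lnL(ntime: 11  np: 14):  -1000.000000      +0.000000"
m2 = "lnL(ntime: 11  np: 16):   -990.000000      +0.000000"
lrt_result = likelihood_ratio_test(m1, m2)
print(lrt_result)
```

The critical values are the standard 5% entries of a chi-square table; for other significance levels, replace them with the corresponding column of the table.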


Try the web-based versions, Selecton and Datamonkey. Selecton M8 Results, M7 Results and M8a Results.

Optional Analysis not discussed in lecture

Perform an analysis for variation in the dN/dS between lineages using the mhc.phy sequence file and the mhc.tre treefile. Set NSsites = 0. Run PAML with model = 1. This will estimate a different dN/dS ratio for each lineage. Compare the likelihood to that of model 0 above (one dN/dS ratio for all lineages, no variation between sites).

a. How many degrees of freedom between these models?

b. Which model fits the data better?

c. From this analysis, would you conclude the gene has been subject to positive selection?
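
For the lineage comparison, the number of extra parameters depends on the number of branches in the tree. The Python sketch below shows one way to count them and to turn the test statistic into a p-value. It assumes an unrooted, fully bifurcating tree, so that n sequences give 2n - 3 branches; under that assumption the degrees of freedom are always even, which allows an exact closed form for the chi-square tail. The function names, the sequence count and the log-likelihoods are hypothetical.

```python
import math

def free_ratio_df(n_sequences):
    """Extra omega parameters in the free-ratio model versus the one-ratio model."""
    n_branches = 2 * n_sequences - 3   # unrooted, fully bifurcating tree
    return n_branches - 1              # one omega per branch minus the single shared omega

def chi2_sf_even(statistic, df):
    """Upper-tail chi-square probability, exact closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = statistic / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term            # adds half**i / i!
        term *= half / (i + 1)
    return math.exp(-half) * total

# Hypothetical values: 6 sequences and made-up log-likelihoods.
df = free_ratio_df(6)
p_value = chi2_sf_even(2.0 * (-990.0 - (-1000.0)), df)
print(df, p_value)
```

If the output file reports the number of parameters (np) for each model, the difference in np is a direct check on the count computed here.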


Perform a pairwise comparison of sequences in the mhc.phy file to get ML estimates of dN and dS. Do this by changing the codeml.ctl file to "runmode = -2". To assess significance, calculate the likelihood from the same data file with the omega ratio fixed at 1.


a. What are the estimates of dN, dS and dN/dS (give only a few comparisons)?

b. How many degrees of freedom between these two models?

c. Which model fits the data better, and why?

d. From this analysis would you conclude the genes have been subject to positive selection? Why or why not?
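
The pairwise test can be sketched the same way. In codeml the constrained run is usually obtained by setting fix_omega = 1 and omega = 1 in the control file. The Python sketch below assumes the two runs differ by a single parameter, so the statistic is compared with a chi-square distribution with one degree of freedom. It also makes explicit that a significant test only points to positive selection when the estimated omega is above 1. The function name and the numbers are made up for illustration.

```python
import math

def pairwise_omega_test(omega_hat, lnl_free, lnl_fixed):
    """LRT of estimated omega against omega fixed at 1 (one degree of freedom)."""
    statistic = max(0.0, 2.0 * (lnl_free - lnl_fixed))
    # For 1 df the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    if p_value >= 0.05:
        verdict = "omega not significantly different from 1"
    elif omega_hat > 1.0:
        verdict = "omega significantly greater than 1"
    else:
        verdict = "omega significantly less than 1"
    return statistic, p_value, verdict

# Made-up numbers for one sequence pair, for illustration only.
pair_result = pairwise_omega_test(0.4, -500.0, -503.0)
print(pair_result)
```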



Send mail to: wswanson@gs.washington.edu