Shadowing / Footprinting lab
Links:
Comparative Genomics at Lawrence Livermore
Download all data (StuffIt Expander)
Wikipedia Phylogenetic Footprinting
The goal of this exercise is to gain familiarity with methods to identify conserved regions using comparative genomic data.
Phylogenetic Shadowing
1. Log on to the Comparative Genomics Center at Lawrence Livermore National Laboratory (http://www.dcode.org/) and go to the eShadow program (http://eshadow.dcode.org/). Submit the sequences in all.fasta. Be sure to check the Clustal alignment that is generated: never trust an alignment without checking it by eye.
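Before inspecting the alignment by eye, a quick scripted sanity check can catch obvious problems. The sketch below is not part of eShadow; it assumes you have saved the alignment as aligned FASTA text, and the function names parse_fasta and alignment_report are made up for this illustration. It confirms that all aligned sequences have the same length and reports the fraction of gap characters in each, so that unusually gappy sequences stand out.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_report(aligned):
    """Return (all_same_length, {name: gap_fraction}) for aligned sequences."""
    lengths = {len(s) for s in aligned.values()}
    gap_fraction = {
        n: (s.count("-") / len(s) if s else 0.0) for n, s in aligned.items()
    }
    return len(lengths) == 1, gap_fraction


# Toy alignment (hypothetical data, not taken from all.fasta).
example = """>human
ACGT-ACGT
>chimp
ACGTTACGT
"""
seqs = parse_fasta(example)
same_length, gaps = alignment_report(seqs)
print("All sequences same length:", same_length)
for seq_name, frac in gaps.items():
    print(seq_name, "gap fraction:", round(frac, 3))
```

A check like this does not replace looking at the alignment; it only flags sequences that deserve a closer look.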
Assume you know that 1300 - 1350 is a functional region. Use that region for feature annotation and select mapping conserved regions using HMM Islands. Which regions are defined as conserved? What are the estimated HMM parameters?
Next, assume 548-814 is functional. Which regions are defined as conserved?
Try using the default values from their previous analyses (do not check parameter optimization). The values should be eS = 0.85, eF = 0.77, and T = 0.1.
Try several different options for the analysis. Try changing the sliding window size. The values eS and eF are the initial values used in the optimization, so try starting from different initial values.
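To get a feel for how three such parameters can drive a two-state hidden Markov model, here is a minimal Viterbi sketch. It rests on assumptions that are not taken from eShadow: eS and eF are treated as the probability that an alignment column is identical across all sequences in the slow (conserved) and fast states, and T as the probability of switching state between adjacent columns. eShadow's own model may define these quantities differently, and the function name viterbi_two_state is hypothetical.

```python
import math


def viterbi_two_state(columns, e_s=0.85, e_f=0.77, t=0.1):
    """Label each column 'S' (slow) or 'F' (fast) by Viterbi decoding.

    columns: sequence of 1 (column identical in all sequences) or 0 (varies).
    e_s, e_f: ASSUMED probability of an identical column in each state.
    t: ASSUMED probability of switching state between adjacent columns.
    All three parameters must lie strictly between 0 and 1.
    """
    if not columns:
        return ""
    states = "SF"
    emit = {"S": e_s, "F": e_f}

    def log_emit(state, obs):
        p = emit[state] if obs == 1 else 1.0 - emit[state]
        return math.log(p)

    def log_trans(a, b):
        return math.log(1.0 - t if a == b else t)

    # Start with equal prior probability for the two states.
    score = {s: math.log(0.5) + log_emit(s, columns[0]) for s in states}
    back = []
    for obs in columns[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(s, obs)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Toy input: a conserved stretch followed by a variable stretch.
toy = [1] * 12 + [0] * 12
print(viterbi_two_state(toy))
```

Because eS and eF are close together under these defaults, a single variable column carries little weight on its own; rerunning the sketch with more widely separated values shows how sensitive the state labels are to the parameters.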
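To see why the window size matters, it can help to compute a sliding-window variation profile yourself. The sketch below is a simplified stand-in, not eShadow's actual calculation: it marks an alignment column as variable if the sequences are not all identical at that position (gaps are treated as ordinary characters, which is an assumption), then reports the fraction of variable columns in each window. The function names are hypothetical.

```python
def variable_columns(aligned):
    """Return 0/1 flags: 1 where the aligned sequences are not all identical."""
    return [0 if len(set(col)) == 1 else 1 for col in zip(*aligned)]


def sliding_variation(aligned, window):
    """Fraction of variable columns in each full window along the alignment."""
    flags = variable_columns(aligned)
    if window <= 0 or window > len(flags):
        return []
    total = sum(flags[:window])
    profile = [total / window]
    for i in range(window, len(flags)):
        # Slide by one column: add the new flag, drop the oldest.
        total += flags[i] - flags[i - window]
        profile.append(total / window)
    return profile


# Toy alignment (hypothetical data).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
for size in (2, 4):
    print(size, sliding_variation(toy_alignment, size))
```

Small windows give a noisy profile that reacts to single substitutions, while large windows smooth the profile and can blur the boundaries of short conserved regions.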
Start the analysis over, but try with a subset of the sequences provided. For example, try 6 primates of varying divergence.
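If you prefer to build the subset file with a script rather than by hand, a sketch along these lines would work. It assumes the first word of each FASTA header is the sequence ID; the IDs used here (human, chimp, lemur) are placeholders, not necessarily the names in all.fasta, and subset_fasta is a hypothetical helper.

```python
def subset_fasta(text, keep):
    """Return FASTA text containing only records whose ID is in `keep`."""
    keep = set(keep)
    out = []
    writing = False
    for line in text.splitlines():
        if line.startswith(">"):
            # Decide per record whether its lines should be copied.
            fields = line[1:].split()
            writing = bool(fields) and fields[0] in keep
        if writing:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")


# Toy input (hypothetical IDs and sequences).
fasta_text = ">human\nACGT\nAC\n>lemur\nTTTT\n>chimp\nACGA\n"
subset = subset_fasta(fasta_text, ["human", "chimp"])
print(subset, end="")
```

To use it on the lab data, read all.fasta into a string, pass the IDs of the primates you want to keep, and write the result to a new file for submission.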
From the above analyses, make a prediction about the location of conserved domains. What can you conclude from these predictions?
2. Try using the VISTA tools to identify conserved regions (http://gsd.lbl.gov/vista/index.shtml). First use mVISTA and submit each of the sequences used for phylogenetic shadowing to compute VISTA plots. Which regions are identified as conserved?
From the above analyses, make a prediction about the location of conserved domains. What can you conclude from these predictions?
3. Use GenomeVISTA to submit one sequence to compare to multiple completed genomes. Add several additional plots including chicken. Which ones are informative?
4. Log into the ECR Browser (http://ecrbrowser.dcode.org/) and view VISTA plots for the region. The gene name is NR1H3.
Write a short report describing the different analyses used to identify conserved domains. Compare the results of analyzing the same primate dataset with eShadow and VISTA (#1 and #2 above), including how efficiently each detects conserved domains. How did adding additional divergent taxa help in the identification of conserved domains (#3 above)? The dataset we used was primarily looking at exons; what would be most useful for the identification of other conserved regions?
Send mail to: wswanson@gs.washington.edu