Zonal Phylogeny Software (ZPS)

Developed by Sujay Chattopadhyay, Evgeni V. Sokurenko (Microbiology, Univ. of Washington, USA) and Daniel E. Dykhuizen (Biology, Univ. of Louisville, USA) 

Links to Other Required Software


Perl
ClustalX
PAUP*
TreeView
HyperTree

A Visualization Tool for Recent Adaptive Evolution of Proteins through SNPs

What is Zonal Phylogeny (ZP)?

A maximum-likelihood-based phylogeny that separates the structural variants of a protein into two categories, or zones, on the DNA tree: variants encoded by multiple haplotypes (i.e. haplotypes that differ from each other only by silent SNPs) are assigned to the Primary zone, and variants encoded by a single haplotype form the External zone. Details of the tool and its use on microbial datasets are available in Sokurenko et al. (Mol. Biol. Evol., 2004, 21: 1373-1383) and Chattopadhyay et al. (J. Mol. Evol., 2007, 64: 204-214).
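The zone rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ZPS implementation: it takes already-translated protein sequences keyed by haplotype name (each key assumed to be a distinct haplotype), it ignores the tree topology, and the function name "assign_zones" is hypothetical.

```python
from collections import defaultdict

def assign_zones(proteins):
    """Assign each haplotype to the Primary ('P') or External ('E') zone.

    proteins maps a haplotype name to its translated protein sequence; every
    key is assumed to be a distinct haplotype (a unique DNA sequence).
    """
    haplotypes_by_variant = defaultdict(list)
    for name, protein in proteins.items():
        haplotypes_by_variant[protein].append(name)
    zones = {}
    for names in haplotypes_by_variant.values():
        # A variant encoded by two or more haplotypes (silent SNP differences) is Primary
        zone = "P" if len(names) > 1 else "E"
        for name in names:
            zones[name] = zone
    return zones
```

For example, assign_zones({"seq1n3": "MKA", "seq2": "MKA", "seq3": "MKD"}) returns {"seq1n3": "P", "seq2": "P", "seq3": "E"}: the MKA variant is encoded by two haplotypes, the MKD variant by only one.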

What is ZPS?

An application that maps synonymous (silent) and non-synonymous (replacement) single nucleotide polymorphisms (SNPs) onto the branches of a given DNA tree of a protein-encoding gene locus. This allows recent adaptive evolution, driven by selection for advantageous structural mutations, to be visualized even when the advantage of such mutations is relatively short-lived.
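To illustrate how the SNPs on one branch can be split into synonymous and non-synonymous changes, here is a hypothetical Python sketch. It assumes that the DNA sequences at both ends of the branch are known, aligned, gap-free and in frame, and it uses the standard genetic code. For codons with more than one difference, it classifies each site by introducing that change alone into the parent codon; ZPS may treat such codons differently, and "classify_snps" is an illustrative name, not part of ZPS.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_snps(parent, child):
    """Return (synonymous count, non-synonymous count, amino acid changes) for
    two aligned coding sequences; changes are labelled like 'A77D'."""
    if len(parent) != len(child) or len(parent) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    changes = []
    for i in range(0, len(parent), 3):
        p_codon, c_codon = parent[i:i + 3].upper(), child[i:i + 3].upper()
        if p_codon == c_codon:
            continue
        # Classify each differing site by introducing it alone into the parent codon
        for j in range(3):
            if p_codon[j] != c_codon[j]:
                mutant = p_codon[:j] + c_codon[j] + p_codon[j + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[p_codon]:
                    syn += 1
                else:
                    nonsyn += 1
        p_aa, c_aa = CODON_TABLE[p_codon], CODON_TABLE[c_codon]
        if p_aa != c_aa:
            # 1-based codon (amino acid) position, as in 'A77D'
            changes.append(f"{p_aa}{i // 3 + 1}{c_aa}")
    return syn, nonsyn, changes
```

For example, classify_snps("GCTGCA", "GACGCG") returns (2, 1, ["A1D"]), which would be written as "2S/1N" with the change "A1D" on that branch.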

What is the use of ZP in evolution?

1.   Separates ‘newer’ External zone variants from ‘older’ Primary zone ones to visualize

  • the consensus structural variant for the gene in Primary zone
  • the extent of recent emergence of structural variants
  • any External zone structural variant(s) with high representation, suggesting a recent or short-term selective structural/functional advantage of the variant(s)

2.   Provides zone-based statistics to

  • compare the diversity in frequency and size of haplotypes within and between genes
  • compute structural hot-spot frequencies, i.e. how concentrated the structural mutations are at hot-spot positions
  • analyze the niche-based concentration of haplotypes/structural variants to gain deeper insight into complex evolutionary dynamics such as the source-sink model (Sokurenko et al., Nat. Rev. Microbiol., 2006, 4: 548-555; Chattopadhyay et al., J. Mol. Evol., 2007, 64: 204-214)

Using ZPS

Inputs

  • A PAUP*-generated maximum-likelihood DNA tree topology file (<filename>.ml.tre)
  • A DNA alignment file in FASTA format (<filename>.fasta)
  • [Tips to save processing time for the alignment and PAUP* runs, and to help ZPS perform the haplotype size- and frequency-based analysis: ZPS expects only unique haplotypes (or alleles) in the input files. For singletons (haplotypes represented by a single member in the dataset), only the sequence name is needed. For other haplotypes, select one member as the representative and name it with that sequence name followed by the letter ‘n’ and the frequency of the haplotype. For instance, if the dataset contains three identical sequences seq1, seq2 and seq3, include only one of them, say seq1, when generating the alignment file and tree topology, and name it ‘seq1n3’ to allow for the haplotype size/frequency-based analysis. Important: sequence names may contain only alphanumeric characters (letters and decimal digits).]
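The naming tip above can be automated before alignment. The Python sketch below is hypothetical ("collapse_haplotypes" and "haplotype_frequency" are illustrative names, not part of ZPS): it groups records whose sequence strings are identical, keeps the first name as the representative, appends ‘n’ and the group size for multi-member haplotypes, and rejects non-alphanumeric names. It assumes the sequences already cover the same region. Note that a singleton whose own name already ends in ‘n’ plus digits would be read back ambiguously.

```python
import re

NAME_RE = re.compile(r"[A-Za-z0-9]+")             # ZPS allows letters and digits only
COUNTED_RE = re.compile(r"([A-Za-z0-9]+)n(\d+)")  # '<name>n<frequency>', e.g. 'seq1n3'

def collapse_haplotypes(records):
    """Collapse (name, sequence) pairs into unique haplotypes.

    Returns {representative name: sequence}. Multi-member haplotypes are
    named '<first name>n<count>'; singletons keep their original name.
    """
    groups = {}  # sequence -> member names, in input order
    for name, seq in records:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"non-alphanumeric sequence name: {name!r}")
        groups.setdefault(seq.upper(), []).append(name)
    collapsed = {}
    for seq, names in groups.items():
        rep = names[0] if len(names) == 1 else f"{names[0]}n{len(names)}"
        collapsed[rep] = seq
    return collapsed

def haplotype_frequency(name):
    """Read a name such as 'seq1n3' back as ('seq1', 3); other names count as 1."""
    m = COUNTED_RE.fullmatch(name)
    return (m.group(1), int(m.group(2))) if m else (name, 1)
```

With the example from the tip, three identical sequences seq1, seq2 and seq3 plus a distinct seq4 collapse to {"seq1n3": ..., "seq4": ...}, ready to be written out as the FASTA input.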

Outputs

  • The tree output: "zp_tree.dnd", where each node name (for example, ‘E4-seqA-n3-2S/1N-A77D’) gives (i) the assignment of the haplotype to either the External (‘E’) or the Primary (‘P’) zone, with intermediate hypothetical (unresolved) nodes marked as ‘H’; (ii) an arbitrary number assigned to the protein variant encoded by the haplotype (e.g. ‘E4’); (iii) the original name of the representative haplotype and the user-defined number of haplotypes identical to it in the dataset (e.g. ‘seqA-n3’), with ZPS automatically adding ‘-n1’ to haplotypes with a single representative; (iv) the number of synonymous (S)/non-synonymous (N) SNPs along the connecting branch (e.g. ‘2S/1N’); and (v) the amino acid changes caused by the non-synonymous SNPs (e.g. ‘A77D’). If there is more than one multi-haplotype structural variant in the Primary zone, they are denoted by ‘P1’, ‘P2’, and so on in the node name.
  • The "pairwise-variation.txt" file details the positions and specific changes, both synonymous and non-synonymous, along each branch in the tree.
  • The "analysis-results.txt" file reports the Primary and External zone representatives; the haplotype ratio (the number of External zone haplotypes divided by the total number of haplotypes in the dataset); position-wise structural mutation information; the overall and zone-wise structural hot-spot frequencies (the number of hot-spot structural mutations divided by the total number of structural mutations); and α and Simpson's diversity statistics (Chattopadhyay et al., J. Mol. Evol., 2007, 64: 204-214).
  • Two other output files, "seq_comp-out1.txt" and "seq_comp-out1.txt", are intermediate files that ZPS reads as inputs in subsequent steps.
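Some of the statistics in "analysis-results.txt" follow directly from the definitions above. The Python sketch below recomputes a haplotype ratio, a hot-spot frequency and Simpson's diversity under assumptions this page does not spell out: haplotypes are counted as unique haplotypes, a hot-spot position is taken to be one hit by structural mutations on two or more different branches, and Simpson's index uses the unbiased form D = 1 - Σ n_i(n_i - 1) / (N(N - 1)) over haplotype frequencies n_i. ZPS's own formulas may differ, the other diversity statistic is not reproduced, and all function names are hypothetical.

```python
from collections import Counter

def haplotype_ratio(zones):
    """External zone haplotypes / all haplotypes; zones maps name -> 'P' or 'E'."""
    return sum(zone == "E" for zone in zones.values()) / len(zones)

def hot_spot_frequency(mutations):
    """Share of structural mutations that fall at hot-spot positions.

    mutations: iterable of (branch_id, amino_acid_position) pairs. ASSUMPTION:
    a hot spot is a position mutated on two or more different branches.
    """
    unique = set(mutations)
    branches_per_position = Counter(pos for _, pos in unique)
    hot = sum(1 for _, pos in unique if branches_per_position[pos] >= 2)
    return hot / len(unique) if unique else 0.0

def simpson_diversity(frequencies):
    """Simpson's index D = 1 - sum(n_i(n_i - 1)) / (N(N - 1)) over haplotype counts n_i."""
    total = sum(frequencies)
    if total < 2:
        return 0.0
    return 1 - sum(n * (n - 1) for n in frequencies) / (total * (total - 1))
```

For instance, structural mutations at position 77 on two branches and at position 120 on one branch give a hot-spot frequency of 2/3, and haplotype frequencies [3, 1, 1] (e.g. read from names such as ‘seqA-n3’) give a Simpson's diversity of 0.7.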

Viewing Tree Outputs

  • Any program that reads trees in Newick format, e.g. TreeView or HyperTree, should be able to display the ZPS output trees.
  • For HyperTree, ZPS also generates a color-code file for the output tree that colors the Primary and External zone representatives: blue for all Primary zone haplotypes that show silent variability within the same protein variant, and red for all External zone representatives. To view "zp_tree.dnd" in color in HyperTree (see the HyperTree view in "zp_tree.pdf"), use ‘import colors’ and select the "color-zp_tree.txt" file.
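Because "zp_tree.dnd" is a plain Newick file, its node names can also be read programmatically, for example to tally nodes per zone. This Python sketch is hypothetical: the Newick reader handles only unquoted labels, and the label pattern is inferred from the single example ‘E4-seqA-n3-2S/1N-A77D’, so labels that do not fit it (such as hypothetical ‘H’ nodes) are returned as None. How multiple amino acid changes are separated is not documented here, so they are kept as raw text.

```python
import re
from collections import Counter

# ASSUMED label layout, inferred from the example 'E4-seqA-n3-2S/1N-A77D'
NODE_RE = re.compile(
    r"(?P<zone>[PE])(?P<variant>\d*)"           # zone letter, optional variant number
    r"-(?P<name>[A-Za-z0-9]+)-n(?P<count>\d+)"  # representative name and haplotype count
    r"(?:-(?P<syn>\d+)S/(?P<nonsyn>\d+)N)?"     # SNPs on the connecting branch
    r"(?:-(?P<changes>.+))?"                    # amino acid change(s), kept as raw text
)

def newick_labels(newick):
    """Return all node labels in a Newick string, skipping branch lengths.
    Assumes unquoted labels that contain none of the characters ( ) , : ;"""
    labels, after_colon = [], False
    for token in re.findall(r"[(),:;]|[^(),:;]+", newick):
        if token == ":":
            after_colon = True
        elif token in ("(", ")", ",", ";"):
            after_colon = False
        else:
            if not after_colon and token.strip():
                labels.append(token.strip())
            after_colon = False
    return labels

def parse_node_label(label):
    """Split a node label into its parts; None if it does not fit the assumed layout."""
    m = NODE_RE.fullmatch(label)
    if m is None:
        return None  # e.g. hypothetical 'H' nodes
    d = m.groupdict()
    return {
        "zone": d["zone"],
        "variant": int(d["variant"]) if d["variant"] else None,
        "name": d["name"],
        "count": int(d["count"]),
        "syn": int(d["syn"] or 0),
        "nonsyn": int(d["nonsyn"] or 0),
        "changes": d["changes"],
    }

def zone_counts(newick):
    """Count the parsed nodes per zone letter ('P' or 'E')."""
    nodes = (parse_node_label(label) for label in newick_labels(newick))
    return Counter(node["zone"] for node in nodes if node)
```

For a real run, pass the text of the output file, e.g. zone_counts(open("zp_tree.dnd").read()).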

Implementation

  • Written in Perl 5.8.8
  • Currently available for Windows

Requirements

  • Perl 5.8.8 or above
  • PAUP* and a DNA alignment program such as ClustalX, to generate the input files for ZPS
  • A tree-viewing program such as HyperTree (which can display the ZPS-generated color-coded nodes) or TreeView

Reporting Bugs

If you encounter a bug, it would be very helpful if you could send us the input files you were using, so that we can try to reproduce the bug and fix it.

Download ZPS

Running the program

  • Save zps.pl in the folder where the input files are stored.
  • Run the program from the Command Prompt by typing "perl zps.pl".
  • Enter the tree topology file (<filename>.ml.tre) and the FASTA file (<filename>.fasta) when prompted by the program.
  • The output files are stored in the same folder.
  • The output files are overwritten on each run, so rename or move any that you want to keep.
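Because each run overwrites the outputs, repeated runs can be scripted so that every dataset's results are copied to their own folder. This Python sketch assumes that zps.pl reads the two file names from standard input in the order described above (tree file first, then FASTA file) and that the output files are the ones named on this page; "run_zps" and the "<dataset>-zps" folder name are hypothetical. The "command" parameter allows a different Perl path to be used.

```python
import shutil
import subprocess
from pathlib import Path

# Output names listed on this page; the seq_comp files are intermediates and are not copied.
OUTPUTS = ("zp_tree.dnd", "color-zp_tree.txt", "pairwise-variation.txt", "analysis-results.txt")

def run_zps(dataset, workdir=".", command=("perl", "zps.pl")):
    """Run ZPS on <dataset>.ml.tre and <dataset>.fasta, then copy the outputs
    into <workdir>/<dataset>-zps/ so that the next run cannot overwrite them."""
    work = Path(workdir)
    # ASSUMPTION: zps.pl reads the tree file name first, then the FASTA file name.
    answers = f"{dataset}.ml.tre\n{dataset}.fasta\n"
    subprocess.run(list(command), input=answers, text=True, cwd=work, check=True)
    dest = work / f"{dataset}-zps"
    dest.mkdir(exist_ok=True)
    for name in OUTPUTS:
        if (work / name).exists():
            shutil.copy2(work / name, dest / name)
    return dest
```

For example, run_zps("mygene") would process mygene.ml.tre and mygene.fasta in the current folder and leave copies of the results in "mygene-zps".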