Inferring Phylogeny and Introgression using RADseq Data: An Example from Flowering Plants (Pedicularis: Orobanchaceae)

Eaton, D. A., & Ree, R. H. (2013). Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Systematic Biology, 62(5), 689-706.

In this paper, Eaton and Ree explore the use of RADseq data for inferring the phylogeny of recently diverged clades suspected of current or past hybridization. To that end, they developed a new sequence alignment pipeline for RAD data (pyRAD – we love it), described a novel extension of the ABBA/BABA test (AKA “Patterson’s D” or the D-test), and tested both with simulated and empirical data. Our discussion was preceded by a brief review of the use of ABBA/BABA in Neandertals and humans (Green et al. 2010), and most of the conversation focused on the use and interpretation of this statistic in Eaton and Ree’s paper.
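To make concrete what the ABBA/BABA test counts, here is a minimal Python sketch (ours, not pyRAD’s code) for one sequence per taxon. It assumes the four-taxon tree (((P1,P2),P3),O), treats the outgroup base at each site as ancestral (“A”) and any other base as derived (“B”), and skips gaps, ambiguous bases, and sites that are not biallelic. D = (nABBA - nBABA)/(nABBA + nBABA), so a positive value means P2 shares more derived alleles with P3 than P1 does. The function names are hypothetical, and no significance test is included.

```python
from typing import Optional, Sequence


def site_pattern(p1: str, p2: str, p3: str, out: str) -> Optional[str]:
    """Classify one aligned site as 'ABBA', 'BABA', or None (uninformative).

    The outgroup base is taken as ancestral ('A'); any other base is derived ('B').
    Gaps, ambiguity codes, and sites that are not biallelic are skipped.
    """
    bases = (p1, p2, p3, out)
    if any(b not in "ACGT" for b in bases) or len(set(bases)) != 2:
        return None
    code = "".join("A" if b == out else "B" for b in bases)
    return code if code in ("ABBA", "BABA") else None


def patterson_d(p1: Sequence[str], p2: Sequence[str],
                p3: Sequence[str], out: Sequence[str]) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA) over aligned sequences of equal length."""
    abba = baba = 0
    for site in zip(p1, p2, p3, out):
        pattern = site_pattern(*site)
        if pattern == "ABBA":    # P2 and P3 share the derived allele
            abba += 1
        elif pattern == "BABA":  # P1 and P3 share the derived allele
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites in the alignment")
    return (abba - baba) / (abba + baba)


# Toy alignment: sites 0 and 3 are ABBA, site 2 is BABA, site 1 is uninformative.
print(patterson_d("AGTA", "CGAG", "CTTG", "AGAA"))  # 0.3333333333333333
```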

The authors rightly point out that inferring a bifurcating phylogeny while also testing for introgression is kind of a paradox, since hybridization between non-sister taxa masks the signal of the “true” phylogeny in the genome via introgression. The resulting complications caused some frustration when we tried to make sense of the empirical results, and suggested to some participants that the D statistic is not well suited to systems for which we have no strong prior hypothesis of the phylogeny. The authors’ final empirical results, in which they infer the biogeographic history of a clade of flowering plants from China from ABBA/BABA and censored SH test results, were not overwhelmingly persuasive. This was at least partly because assessing the argument required combing through D-test tables that we found difficult to parse, given the limited description in the text and the heavily abbreviated taxon names, which forced repeated flipping back and forth to Table 1. In addition, the use of BUCKy with RAD loci was greeted with much skepticism: nearly all participants thought that the resolution offered by these loci seemed insufficient for concordance analysis.

The results of the simulated-data analysis were, as one would expect, more convincing, and provided a useful demonstration of how the D-test is susceptible to false positives when interpreted as a test for introgression rather than as a test for shared derived alleles, which is what it actually counts. The “partitioned D-statistic” adds another taxon branching from the P3 lineage of the standard D-test. Although the ability to determine the direction of gene flow is a real advantage, both that and the reported lack of false positives in the partitioned test seemed to depend on having more and better sampling rather than on any innate property of the partitioned test itself. Some phyloseminarians also suggested that these data might have been better suited to coalescent analyses in programs like migrate-N or IMA.
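A similar sketch of the partitioned version, assuming taxa ordered P1, P2, P3_1, P3_2, O on the tree (((P1,P2),(P3_1,P3_2)),O). The site-pattern contrasts are our reading of the paper’s definitions and should be treated as an assumption: D1 compares ABBAA with BABAA (derived alleles shared with P3_1 only), D2 compares ABABA with BAABA (shared with P3_2 only), and D12 compares ABBBA with BABBA (shared with both). All names are hypothetical, and as before there is no significance test.

```python
from typing import Dict, Optional, Sequence

# ASSUMPTION: our reading of the partitioned-D contrasts. Each entry maps a
# statistic to (pattern where P2 shares derived alleles, pattern where P1 does),
# with taxa ordered P1, P2, P3_1, P3_2, O.
CONTRASTS = {
    "D1": ("ABBAA", "BABAA"),   # derived allele shared with P3_1 only
    "D2": ("ABABA", "BAABA"),   # derived allele shared with P3_2 only
    "D12": ("ABBBA", "BABBA"),  # derived allele shared with both P3 lineages
}


def encode(site: Sequence[str]) -> Optional[str]:
    """Encode a biallelic site as A (matches outgroup) / B (derived), else None."""
    out = site[-1]
    if any(b not in "ACGT" for b in site) or len(set(site)) != 2:
        return None
    return "".join("A" if b == out else "B" for b in site)


def partitioned_d(*taxa: Sequence[str]) -> Dict[str, float]:
    """Return D1, D2, D12 for five aligned sequences (P1, P2, P3_1, P3_2, O)."""
    if len(taxa) != 5:
        raise ValueError("expected sequences for P1, P2, P3_1, P3_2, O")
    counts = {p: 0 for pair in CONTRASTS.values() for p in pair}
    for site in zip(*taxa):
        code = encode(site)
        if code in counts:  # None and all other patterns are ignored
            counts[code] += 1
    result = {}
    for name, (toward_p2, toward_p1) in CONTRASTS.items():
        n = counts[toward_p2] + counts[toward_p1]
        # NaN signals that no informative sites were found for this contrast
        result[name] = (counts[toward_p2] - counts[toward_p1]) / n if n else float("nan")
    return result


print(partitioned_d("AGCATA", "CTAGAA", "CTCGAA", "AGAGTA", "AGAAAA"))
# {'D1': 0.3333333333333333, 'D2': -1.0, 'D12': 1.0}
```

In this toy alignment, two ABBAA sites against one BABAA site give D1 = 1/3, while single BAABA and ABBBA sites drive D2 and D12 to their extremes; with so few sites these values mean nothing biologically, which echoes the point above about sampling.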

The authors also employed ML methods to infer phylogenies and used censored SH tests to compare topologies, but we didn’t have time to get to everything! Overall, we found this paper interesting, well written (bonus points for use of the phrase “systematically recalcitrant”), and extremely relevant to our own work. [UW Phylogenetics Seminar, 4/3/14; CJ Battey]
