I recently completed an assembly of the UW PacBio sequencing data using Racon and wanted some assembly stats, as well as a way to compare this assembly to the assemblies Sean had completed.

Additionally, Steven recently performed an assembly comparison and I noticed he got some odd results. Specifically, of the three assemblies he compared (PacBio x 1, Illumina x 2), both of the Illumina assemblies had a large quantity of “Ns” in the assemblies. This didn’t seem right and the comparison program he used (QUAST) spit out a message indicating that it seemed like scaffolds were used, instead of contigs. So, I thought I’d give it a shot and see if I could track down non-scaffolded assemblies produced by Sean.

Jupyter notebook (GitHub): 20171003_docker_oly_assembly_comparisons.ipynb

First, I compared the following six assemblies (FASTA files) using QUAST:

Sean’s Assemblies:

Sam’s Assembly:

QUAST output directory: http://ift.tt/2xZNoli

Here’s the assembly comparison of all assemblies (click on image for larger view):

Interactive version of that graphic is here: http://ift.tt/2xZNoSk

The first thing that jumps out to me is the fact that two of the Illumina assemblies, which used different assemblers(!!) have the EXACT same assembly stats. This occurrence seems extremely unlikely. I’ve double-checked my Jupyter notebook to make sure that I didn’t assign the same file by accident (see Input #6)

Very strange!

I also noticed that the first Redundans assembly of Sean’s has a ton of “Ns”, suggesting that it’s actually a scaffolded assembly. As with Steven’s QUAST run, QUAST spits out the messages suggesting to use the “–scaffold” option for this file.

The other thing I noticed is the two PacBio assemblies have a huge difference in the total number of bp (~13,000,000)! I ran a QUAST assembly comparison between just those two for easier viewing/comparison (http://ift.tt/2xYbA7r):

Interactive version of that graphic is here: http://ift.tt/2xZNppm

The fact that there is such a large discrepancy in the total number of bps between these two assemblies really leaves me to believe that I am missing a FASTQ file from my assembly. I’m going to go back and see if that is indeed the case or if this difference in the assemblies is real.

Here’s an embedded version of my Jupyter notebook:

                <script>
                jQuery(function($){
                    var target_height = $(window).height();
                    $("iframe.iframe-class").height(target_height);
                });
                </script>

from Sam’s Notebook http://ift.tt/2hKVaJI

Assembly Comparisons – Olympia oyster genome assemblies
Tagged on: