We have an upcoming meeting with Illumina to discuss how the geoduck genome project is coming along and to decide how we want to proceed.
I used the following genome assemblies as mapping references:
- sn_ph_01 : SuperNova assembly of 10x Genomics data
- sparse_03 : SparseAssembler assembly of BGI and Illumina project data
- pga_02 : Hi-C assembly of Phase Genomics data
The analysis is documented in a Jupyter Notebook.
Jupyter Notebook (GitHub):
NOTE: Due to the large amount of stdout from the first genome index command, the notebook does not render well on GitHub. I recommend downloading the notebook and opening it in a locally installed version of Jupyter.
Here’s a brief overview of the process:
- Generate Bowtie2 indexes for each of the genome assemblies.
- Map 1,000,000 reads from the following Illumina NovaSeq FastQ files:
Bowtie2 Genome Indexes:
Bowtie2 sn_ph_01 alignment folder:
Bowtie2 sparse_03 alignment folder:
Bowtie2 pga_02 alignment folder:
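For reference, the indexing and mapping steps from the overview can be sketched roughly as below. This is a minimal sketch, not the exact commands from the notebook: the FASTA and FastQ file names are hypothetical placeholders, the reads are assumed to be handled as unpaired (`-U`), and limiting the run to 1,000,000 reads with Bowtie2's `-u` option is an assumption about how the read cap could be applied.

```python
import subprocess

# Assembly IDs come from the list above; the FASTA file names are hypothetical.
ASSEMBLIES = {
    "sn_ph_01": "sn_ph_01.fasta",
    "sparse_03": "sparse_03.fasta",
    "pga_02": "pga_02.fasta",
}


def bowtie2_build_cmd(fasta, index_base, threads=4):
    """Build the command that generates a Bowtie2 index from a FASTA file."""
    return ["bowtie2-build", "--threads", str(threads), fasta, index_base]


def bowtie2_map_cmd(index_base, fastq, sam_out, max_reads=1_000_000, threads=4):
    """Build the command that maps the first `max_reads` reads of a FastQ file."""
    return [
        "bowtie2",
        "-x", index_base,      # index basename produced by bowtie2-build
        "-U", fastq,           # assumption: reads treated as unpaired
        "-u", str(max_reads),  # stop after this many reads
        "-p", str(threads),
        "-S", sam_out,
    ]


def run_all(fastq):
    """Index each assembly, map reads to it, and save stderr to <id>.err."""
    for seq_id, fasta in ASSEMBLIES.items():
        subprocess.run(bowtie2_build_cmd(fasta, seq_id), check=True)
        # Bowtie2 writes its alignment summary to stderr, so capture it in a file.
        with open(f"{seq_id}.err", "w") as err:
            subprocess.run(
                bowtie2_map_cmd(seq_id, fastq, f"{seq_id}.sam"),
                stderr=err,
                check=True,
            )
```

Calling `run_all("reads.fastq.gz")` (hypothetical file name) would require `bowtie2` and `bowtie2-build` to be on the system path.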
MAPPING SUMMARY TABLE
All mapping data was pulled from the *.err file in each respective Bowtie2 alignment folder.
| sequence_ID | Assembler | Alignment Rate (%) |
|---|---|---|
| pga_02 | Hi-C (Phase Genomics) | 79.90 |
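Pulling the alignment rate out of each *.err file could be scripted along these lines. This is a small sketch that assumes the files hold Bowtie2's standard stderr summary, which ends with a line of the form `NN.NN% overall alignment rate`; the function names are hypothetical and not from the notebook.

```python
import re
from pathlib import Path

# Matches the final line of Bowtie2's alignment summary.
RATE_RE = re.compile(r"^\s*([\d.]+)% overall alignment rate", re.MULTILINE)


def parse_alignment_rate(text):
    """Return the overall alignment rate (percent) as a float, or None if absent."""
    match = RATE_RE.search(text)
    return float(match.group(1)) if match else None


def rate_from_file(path):
    """Read a Bowtie2 stderr capture (*.err) and extract the alignment rate."""
    return parse_alignment_rate(Path(path).read_text())


# Illustrative input only: a single summary line using the pga_02 value from the table.
sample = "79.90% overall alignment rate\n"
print(parse_alignment_rate(sample))
```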
Mapping efficiency is similar for all assemblies. After speaking with Steven, we’ve decided we’ll begin exploring genome annotation pipelines.
from Sam’s Notebook https://ift.tt/2IbDHSL