FileNewTemplate - University of Connecticut

Statistical Mitogenome Assembly with Repeats
Fahad Alqahtani & Ion Măndoiu
10-19-2018

Outline
Background
SMART pipeline
Results
Conclusions and future work

Mitochondria: the powerhouse of the cell
Cellular organelles within eukaryotic cells
Convert chemical energy from food into adenosine triphosphate (ATP)
The popular term "powerhouse of the cell" was coined by Philip Siekevitz in 1957
The second genome
Source: https://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/july1999/dnalist.htm/dnaf1.htm

Why sequence the mitogenome?
Important role in disease (Tuppen, Helen AL, et al. "Mitochondrial DNA mutations and human disease." Biochimica et Biophysica Acta (BBA)-Bioenergetics 1797.2 (2010): 113-128.)
Tracing maternal ancestry (Source: http://www.norwaydna.no/mtdna_en/)
Inferring human population migrations (https://blog.23andme.com/ancestry/haplogroups-explained/)
Species tree reconstruction (Kurabayashi, Atsushi, and Masayuki Sumida. "Afrobatrachian mitochondrial genomes: genome reorganization, gene rearrangement mechanisms, and evolutionary trends of duplicated and rearranged genes." BMC Genomics 14.1 (2013): 633.)

Mitogenome assembly
Most existing pipelines rely on a reference genome or on the mitogenome of a related species
Off-the-shelf de novo assemblers are poorly suited for assembling mtDNA from WGS reads
  Mitochondrial reads are often discarded because mtDNA is sequenced to much higher depth than gDNA
  They do not handle circular genomes and repeats well

SMART
Statistical Mitogenome Assembly with RepeaTs
Input: paired-end WGS reads and a seed sequence (COI gene)
Output: complete/circular mitogenome (or the largest scaffold)

SMART workflow

Adapter trimming
Automatic detection of adapters and trimming using Perl/C++ modules from the IRFinder package
PE overlap allows very precise (single-base resolution) adapter trimming
Middleton, Robert, et al. "IRFinder: assessing the impact of intron retention on mammalian gene expression." Genome Biology 18.1 (2017): 51.

Seed (COI) sequences
A ~648 bp region of the cytochrome c oxidase subunit 1 (COI) gene has been selected as a DNA barcode for taxonomic classification
The Barcode of Life Data System (BOLD) has >6M barcodes from 194K animal species, 67K plant species, and 21K fungi & other species
http://www.boldsystems.org/

Coverage-based filter
Reads with 1 error OK

Preliminary assembly
Reads passing the coverage filter are assembled using the Velvet de Bruijn graph assembler
https://en.wikipedia.org/wiki/Velvet_assembler

Preliminary contig filtering
Contigs aligned against eukaryotic mitogenomes using BLAST
Keep contigs with significant hits only

Read alignment
Using HISAT2, a fast and sensitive aligner for NGS reads
Pulls out additional mitochondrial reads missed by the coverage filter

Secondary assembly
Using SPAdes
Based on a multisized de Bruijn graph
Robust to non-uniformities in read coverage

Read alignment and SPAdes assembly repeated
Until the simplified contig graph is Eulerian, or the maximum number of iterations is reached

Max-likelihood search
Eulerian paths evaluated using the likelihood model implemented in ALE [Clark et al 2013]
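The "repeat until Eulerian" stopping rule can be illustrated on a toy graph. This is a minimal sketch, assuming a simplified directed contig graph given as hypothetical (u, v) link pairs in which a repeat contig carries one incoming and one outgoing link per genomic copy; it is not SMART's graph representation. It checks that every node has equal in- and out-degree and then uses Hierholzer's algorithm to return one Eulerian circuit. SMART instead scores candidate Eulerian paths with the ALE likelihood, which this sketch does not attempt.

```python
from collections import defaultdict


def eulerian_circuit(edges):
    """Return one Eulerian circuit (list of nodes, first == last) of a
    directed multigraph given as (u, v) edge pairs, or None if none exists."""
    if not edges:
        return None
    remaining = defaultdict(list)   # unused outgoing edges per node
    balance = defaultdict(int)      # out-degree minus in-degree
    for u, v in edges:
        remaining[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # A closed walk using every edge once needs in-degree == out-degree everywhere.
    if any(b != 0 for b in balance.values()):
        return None
    # Hierholzer's algorithm: follow unused edges, emitting nodes on backtrack.
    stack, circuit = [edges[0][0]], []
    while stack:
        node = stack[-1]
        if remaining[node]:
            stack.append(remaining[node].pop())
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # Edges in a different connected component were never reached.
    if len(circuit) != len(edges) + 1:
        return None
    return circuit


if __name__ == "__main__":
    # Toy circular genome A -> R -> B -> R -> A, where contig R is a repeat.
    links = [("A", "R"), ("R", "B"), ("B", "R"), ("R", "A")]
    print(eulerian_circuit(links))
```

A graph with several Eulerian circuits (for example, two repeats whose copies can be traversed in different orders) is exactly the ambiguity that the likelihood search has to resolve; Hierholzer's algorithm returns only one of them.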

ALE likelihood
Placement scoring: how well read sequences agree with the assembly
Insert scoring: how well PE insert lengths match those we would expect
Depth scoring: how well the depth at each location agrees with the depth expected after GC-bias correction

K-mer scoring: how well the k-mer counts of each contig match the multinomial distribution estimated from the entire assembly
https://academic.oup.com/bioinformatics/article/29/4/435/199222

Bootstrapping & clustering
Process repeated for n=10 bootstrap samples
Rotation-invariant pairwise distances computed using fitting alignment
ML sequences clustered using hierarchical clustering
Consensus computed for each cluster
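Bootstrap replicates can return the same circular mitogenome starting at different positions, so their distance must ignore rotation. This is a minimal sketch of one way to get such a distance, assuming both sequences are on the same strand. It fits one sequence into the other written twice, so every rotation appears as a substring, using a fitting alignment with unit edit costs. It then cuts a single-linkage hierarchical clustering at a distance threshold. The function names, unit costs, the choice to keep the larger of the two directions, and single linkage are our assumptions; the slides do not give SMART's scoring scheme, linkage, or threshold, and the per-cluster consensus step is not shown. The pure-Python dynamic program is quadratic and meant for toy inputs, not 16 kb mitogenomes.

```python
def fitting_distance(pattern, text):
    """Edit distance between all of `pattern` and its best-matching
    substring of `text` (fitting alignment: free start/end gaps in text)."""
    prev = [0] * (len(text) + 1)            # row 0: pattern may start anywhere
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)         # column 0: pattern prefix vs empty text
        for j in range(1, len(text) + 1):
            sub = prev[j - 1] + (pattern[i - 1] != text[j - 1])
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)                        # pattern may end anywhere in text


def rotation_invariant_distance(a, b):
    """Distance between two circular sequences: fit each into the other
    doubled (so every rotation is a substring) and keep the larger cost."""
    return max(fitting_distance(b, a + a), fitting_distance(a, b + b))


def single_linkage_clusters(seqs, threshold):
    """Cut a single-linkage hierarchical clustering at `threshold`: sequences
    share a cluster if a chain of distances <= threshold links them."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if rotation_invariant_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Three rotations of one toy "genome" plus one unrelated sequence.
    reps = ["ACGTTGCA", "TTGCAACG", "GCAACGTT", "AAAAAAAA"]
    print(single_linkage_clusters(reps, threshold=1))
```

Single linkage cut at a threshold is equivalent to taking connected components of the "distance at most threshold" graph, which is why a union-find structure is enough here.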

[Figure: MITOS annotation]

Galaxy interface @ neo.engr.uconn.edu/?toolid=SMART

Coverage filter accuracy
2.5M reads; ground truth determined by bowtie2 alignment to the known reference

Species      Sample ID   TPR    PPV    F-score
Human        HG00501     0.750  0.443  0.557
Human        HG00524     0.454  0.147  0.222
Human        HG00581     0.779  0.516  0.620
Human        HG00635     0.771  0.240  0.366
Chimpanzee   SRR490082   0.715  0.207  0.321
Goat         ERR219544   0.875  0.220  0.352

TPR: true positive rate (sensitivity); PPV: positive predictive value (precision).
Human samples: 1KGP human datasets
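As a consistency check on the coverage-filter table, the F-score is the harmonic mean of TPR and PPV, F = 2 * TPR * PPV / (TPR + PPV). The sketch below recomputes it from the transcribed values. Because TPR and PPV are themselves rounded to three decimals, it compares with a tolerance of 0.001 rather than exactly; for HG00581, for instance, the recomputed value is about 0.621 against the reported 0.620. The helper names are hypothetical, and `rates` only shows how TPR and PPV would follow from confusion counts, which the slides do not report.

```python
# Rows transcribed from the coverage-filter table: (TPR, PPV, reported F-score).
TABLE = {
    "HG00501": (0.750, 0.443, 0.557),
    "HG00524": (0.454, 0.147, 0.222),
    "HG00581": (0.779, 0.516, 0.620),
    "HG00635": (0.771, 0.240, 0.366),
    "SRR490082": (0.715, 0.207, 0.321),
    "ERR219544": (0.875, 0.220, 0.352),
}


def rates(tp, fp, fn):
    """Sensitivity (TPR) and precision (PPV) from confusion-matrix counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return tpr, ppv


def f_score(tpr, ppv):
    """F1 score: the harmonic mean of TPR and PPV."""
    return 2 * tpr * ppv / (tpr + ppv) if tpr + ppv else 0.0


def check_table(table, tol=1e-3):
    """Sample IDs whose reported F-score differs from the value recomputed
    from the (already rounded) TPR and PPV by more than `tol`."""
    return [sample for sample, (tpr, ppv, reported) in table.items()
            if abs(f_score(tpr, ppv) - reported) > tol]


if __name__ == "__main__":
    for sample, (tpr, ppv, reported) in TABLE.items():
        print(f"{sample}: recomputed {f_score(tpr, ppv):.3f}, reported {reported:.3f}")
    print("mismatches:", check_table(TABLE))
```

The low PPV values relative to TPR show that, on these samples, the coverage filter lets through many reads that bowtie2 does not place on the mitogenome, which is consistent with the later BLAST contig filter and HISAT2 read-alignment steps.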

Birds and frog datasets (LASTZ, MUSCLE, ClustalW, and MAFFT columns give pairwise % identity)

Sample                mtDNA length (bp)  LASTZ  MUSCLE  ClustalW  MAFFT
Balearica regulorum   16,742             98.0   98.3    98.3      98.3
Grus japonensis       16,615             98.4   97.8    97.8      97.8
Xenopus laevis        17,922             98.0   95.9    96.1      95.7

Other datasets (LASTZ, MUSCLE, ClustalW, and MAFFT columns give pairwise % identity)

Sample                mtDNA length (bp)  LASTZ  MUSCLE  ClustalW  MAFFT
Pan troglodytes       16,085             97.5   94.7    94.7      94.7
Mus musculus          15,802             99.97  96.9    96.7      96.9
Canis lupus           16,580             97.1   96.7    96.7      96.7
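The identity values in these tables depend both on the alignment each tool produces and on how gapped columns are counted; the slides do not state which convention was used. To illustrate two common conventions, the following sketch (a hypothetical helper, not part of LASTZ, MUSCLE, ClustalW, or MAFFT) computes percent identity from an already-gapped pairwise alignment, either over all alignment columns or over gap-free columns only. It assumes "-" is the only gap character.

```python
def percent_identity(aln_a, aln_b, count_gaps=True):
    """Percent identity of two equal-length, gapped aligned strings.

    count_gaps=True:  matches / all alignment columns
    count_gaps=False: matches / columns where neither sequence has a gap
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aln_a, aln_b))
    if not count_gaps:
        columns = [(x, y) for x, y in columns if x != "-" and y != "-"]
    if not columns:
        return 0.0
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    a, b = "ACGT-ACGT", "ACGTTACCT"
    print(percent_identity(a, b), percent_identity(a, b, count_gaps=False))
```

On the same alignment the two conventions give 7/9 and 7/8 of the columns, so a single indel can move the reported identity by several percentage points on short sequences; on a 16 kb mitogenome the effect per indel is far smaller.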

Conclusions
SMART is an automated pipeline for de novo mitogenome assembly from WGS reads
Based on a statistical framework:
  Probabilistic read classifier based on coverage
  Likelihood maximization for resolving ambiguities in the assembly graph
  Assembly confidence estimated by bootstrapping
Produces complete/circular assemblies even in the presence of repeats
Available via a Galaxy interface at neo.engr.uconn.edu/?toolid=SMART

Ongoing work
Large-scale pipeline validation: 47 frog species from [Zhang et al 2013]
Reconstruction of plant mitochondrial and chloroplast genomes
Extension to long-read sequencing technologies (PacBio, Nanopore)

Thank you for your attention! Any questions?
