
14 - Genomics

from Part V - Applications

Published online by Cambridge University Press:  05 May 2015

Veli Mäkinen
Affiliation:
University of Helsinki
Djamal Belazzougui
Affiliation:
University of Helsinki
Fabio Cunial
Affiliation:
University of Helsinki
Alexandru I. Tomescu
Affiliation:
University of Helsinki

Summary

We shall now explore how the techniques from the previous chapters can be used in studying the genome of a species. We assume here that we have available the so-called finalized genome assembly of a species. Ideally, this means the complete sequence of each chromosome of the species. In practice, however, it consists of chromosome sequences containing large unknown substrings, together with some unmapped contigs and scaffolds.

To fix our mindset, we assume that the species under consideration is a diploid organism (like a human). We also assume that the assembled genome is concatenated into one long sequence T with some necessary markers added to separate its contents, and with some auxiliary bookkeeping data structures helpful for mapping back to the chromosome representation.
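To make the bookkeeping concrete, the following is a minimal sketch of one way to build such a concatenation and map a position of T back to chromosome coordinates. The separator character `#`, the function names `concatenate` and `to_chromosome`, and the use of a sorted list of start offsets with binary search are all assumptions made for illustration; the chapter does not prescribe a particular representation.

```python
from bisect import bisect_right

def concatenate(chromosomes, sep="#"):
    """Join (name, sequence) pairs into one text T.

    Returns T together with the chromosome names and the start offset
    of each chromosome inside T (the hypothetical bookkeeping arrays).
    """
    names, starts, parts = [], [], []
    pos = 0
    for name, seq in chromosomes:
        names.append(name)
        starts.append(pos)
        parts.append(seq)
        pos += len(seq) + len(sep)  # next chromosome begins after the separator
    return sep.join(parts), names, starts

def to_chromosome(pos, names, starts):
    """Map a 0-based position in T to (chromosome name, 0-based offset).

    A position that falls on a separator is reported with an offset equal
    to the length of the preceding chromosome; callers should treat that
    as "not inside any chromosome".
    """
    i = bisect_right(starts, pos) - 1  # last chromosome starting at or before pos
    return names[i], pos - starts[i]

T, names, starts = concatenate([("chr1", "ACGT"), ("chr2", "GG")])
print(T, to_chromosome(5, names, starts))
```

With k chromosomes the lookup takes O(log k) time, which is negligible compared with the cost of the alignment that produced the position.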

We start with a peak detection problem in Insight 14.1 that gives a direct connection to the segmentation problem motivating our study of hidden Markov models in Chapter 7. Then we proceed to variation calling, where we mostly formulate new problems assuming read alignments as input. The results of variation calling and the read alignments are then given as the input to haplotype assembly.

Insight 14.1 Peak detection and HMMs

The coverage profile of the genome is an array storing for each position the number of reads aligned to cover that position. In ChIP-sequencing and bisulfite sequencing only some parts of the genome should be covered by reads, so that clear peak areas should be noticeable in the coverage profile. In targeted resequencing, the areas of interest are known beforehand, so automatic detection of peak areas is not that relevant (although usually the targeting is not quite that accurate). Peak detection from a signal is a classical signal processing task, so there are many existing general-purpose peak detectors available. In order to use the machinery we developed earlier, let us consider how HMMs could be used for our peak detection task.
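As a rough illustration of these two ingredients, the sketch below first computes a coverage profile from read alignments and then segments it with a two-state HMM decoded by the Viterbi algorithm. It is not the model developed in this Insight: the half-open interval format of the alignments, the two states (background and peak), the Poisson emission rates `lam_bg` and `lam_peak`, and the self-transition probability `stay` are all assumptions chosen only to make the example small and runnable.

```python
import math

def coverage_profile(genome_length, alignments):
    """Number of reads covering each position.

    alignments: iterable of 0-based half-open intervals (start, end).
    Uses a difference array, so the time is O(genome_length + #reads).
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    profile, depth = [], 0
    for delta in diff[:genome_length]:
        depth += delta
        profile.append(depth)
    return profile

def poisson_logpmf(k, lam):
    """Natural log of the Poisson probability of observing k with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_peaks(profile, lam_bg=1.0, lam_peak=10.0, stay=0.9):
    """Most likely state path: 0 = background, 1 = peak (assumed model)."""
    if not profile:
        return []
    rates = (lam_bg, lam_peak)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform initial distribution over the two states.
    score = [math.log(0.5) + poisson_logpmf(profile[0], r) for r in rates]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for k in profile[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            from_same = score[s] + log_stay
            from_other = score[1 - s] + log_switch
            if from_same >= from_other:
                best, prev = from_same, s
            else:
                best, prev = from_other, 1 - s
            new_score.append(best + poisson_logpmf(k, rates[s]))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

profile = coverage_profile(6, [(0, 3), (2, 5), (2, 4)])
print(profile, viterbi_peaks([0, 1, 0, 9, 11, 10, 1, 0]))
```

Maximal runs of state 1 in the returned path are the reported peak areas. In a real setting the emission and transition parameters would be estimated from data rather than fixed by hand.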

[…]

Type: Chapter
Information: Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing, pp. 307-324
Publisher: Cambridge University Press
Print publication year: 2015
