Algorithms for Next Generation Sequencing 1st edition by Wing Kin Sung – Ebook PDF Instant Download/Delivery
ISBN: 1498752985, 9781498752985
Full download of Algorithms for Next Generation Sequencing 1st edition after payment
Product details:
ISBN-10 : 1498752985
ISBN-13 : 9781498752985
Author: Wing Kin Sung
Advances in sequencing technology have allowed scientists to study the human genome in greater depth and on a larger scale than ever before: sequencers can now produce as many as hundreds of millions of short reads in the course of a few days. But what are the best ways to deal with this flood of data? Algorithms for Next-Generation Sequencing is an invaluable tool for students and researchers in bioinformatics and computational biology and for biologists seeking to process and manage the data generated by next-generation sequencing. It can also serve as a textbook or a self-study resource. In addition to offering an in-depth description of the algorithms for processing sequencing data, the book presents useful case studies describing the applications of this technology.
Algorithms for Next Generation Sequencing 1st Edition Table of Contents:
1 Introduction
1.1 DNA, RNA, protein and cells
1.2 Sequencing technologies
1.3 First-generation sequencing
1.4 Second-generation sequencing
1.4.1 Template preparation
1.4.2 Base calling
1.4.3 Polymerase-mediated methods based on reversible terminator nucleotides
1.4.4 Polymerase-mediated methods based on unmodified nucleotides
1.4.5 Ligase-mediated method
1.5 Third-generation sequencing
1.5.1 Single-molecule real-time sequencing
1.5.2 Nanopore sequencing method
1.5.3 Direct imaging of DNA using electron microscopy
1.6 Comparison of the three generations of sequencing
1.7 Applications of sequencing
1.8 Summary and further reading
1.9 Exercises
2 NGS file formats
2.1 Introduction
2.2 Raw data files: fasta and fastq
2.3 Alignment files: SAM and BAM
2.3.1 FLAG
2.3.2 CIGAR string
2.4 Bed format
2.5 Variant Call Format (VCF)
2.6 Format for representing density data
2.7 Exercises
3 Related algorithms and data structures
3.1 Introduction
3.2 Recursion and dynamic programming
3.2.1 Key searching problem
3.2.2 Edit-distance problem
3.3 Parameter estimation
3.3.1 Maximum likelihood
3.3.2 Unobserved variable and EM algorithm
3.4 Hash data structures
3.4.1 Maintain an associative array by simple hashing
3.4.2 Maintain a set using a Bloom filter
3.4.3 Maintain a multiset using a counting Bloom filter
3.4.4 Estimating the similarity of two sets using minHash
3.5 Full-text index
3.5.1 Suffix trie and suffix tree
3.5.2 Suffix array
3.5.3 FM-index
3.5.3.1 Inverting the BWT B to the original text T
3.5.3.2 Simulate a suffix array using the FM-index
3.5.3.3 Pattern matching
3.5.4 Simulate a suffix trie using the FM-index
3.5.5 Bi-directional BWT
3.6 Data compression techniques
3.6.1 Data compression and entropy
3.6.2 Unary, gamma, and delta coding
3.6.3 Golomb code
3.6.4 Huffman coding
3.6.5 Arithmetic code
3.6.6 Order-k Markov Chain
3.6.7 Run-length encoding
3.7 Exercises
4 NGS read mapping
4.1 Introduction
4.2 Overview of the read mapping problem
4.2.1 Mapping reads with no quality score
4.2.2 Mapping reads with a quality score
4.2.3 Brute-force solution
4.2.4 Mapping quality
4.2.5 Challenges
4.3 Align reads allowing a small number of mismatches
4.3.1 Mismatch seed hashing approach
4.3.2 Read hashing with a spaced seed
4.3.3 Reference hashing approach
4.3.4 Suffix trie-based approaches
4.3.4.1 Estimating the lower bound of the number of mismatches
4.3.4.2 Divide and conquer with the enhanced pigeon-hole principle
4.3.4.3 Aligning a set of reads together
4.3.4.4 Speed up utilizing the quality score
4.4 Aligning reads allowing a small number of mismatches and indels
4.4.1 q-mer approach
4.4.2 Computing alignment using a suffix trie
4.4.2.1 Computing the edit distance using a suffix trie
4.4.2.2 Local alignment using a suffix trie
4.5 Aligning reads in general
4.5.1 Seed-and-extension approach
4.5.1.1 BWA-SW
4.5.1.2 Bowtie 2
4.5.1.3 BatAlign
4.5.1.4 Cushaw2
4.5.1.5 BWA-MEM
4.5.1.6 LAST
4.5.2 Filter-based approach
4.6 Paired-end alignment
4.7 Further reading
4.8 Exercises
5 Genome assembly
5.1 Introduction
5.2 Whole genome shotgun sequencing
5.2.1 Whole genome sequencing
5.2.2 Mate-pair sequencing
5.3 De novo genome assembly for short reads
5.3.1 Read error correction
5.3.1.1 Spectral alignment problem (SAP)
5.3.1.2 k-mer counting
5.3.2 Base-by-base extension approach
5.3.3 De Bruijn graph approach
5.3.3.1 De Bruijn assembler (no sequencing error)
5.3.3.2 De Bruijn assembler (with sequencing errors)
5.3.3.3 How to select k
5.3.3.4 Additional issues of the de Bruijn graph approach
5.3.4 Scaffolding
5.3.5 Gap filling
5.4 Genome assembly for long reads
5.4.1 Assemble long reads assuming long reads have a low sequencing error rate
5.4.2 Hybrid approach
5.4.2.1 Use mate-pair reads and long reads to improve the assembly from short reads
5.4.2.2 Use short reads to correct errors in long reads
5.4.3 Long read approach
5.4.3.1 MinHash for all-versus-all pairwise alignment
5.4.3.2 Computing consensus using Falcon Sense
5.4.3.3 Quiver consensus algorithm
5.5 How to evaluate the goodness of an assembly
5.6 Discussion and further reading
5.7 Exercises
6 Single nucleotide variation (SNV) calling
6.1 Introduction
6.1.1 What are SNVs and small indels?
6.1.2 Somatic and germline mutations
6.2 Determine variations by resequencing
6.2.1 Exome/targeted sequencing
6.2.2 Detection of somatic and germline variations
6.3 Single locus SNV calling
6.3.1 Identifying SNVs by counting alleles
6.3.2 Identify SNVs by binomial distribution
6.3.3 Identify SNVs by Poisson-binomial distribution
6.3.4 Identifying SNVs by the Bayesian approach
6.4 Single locus somatic SNV calling
6.4.1 Identify somatic SNVs by the Fisher exact test
6.4.2 Identify somatic SNVs by verifying that the SNVs appear in the tumor only
6.4.2.1 Identify SNVs in the tumor sample by posterior odds ratio
6.4.2.2 Verify if an SNV is somatic by the posterior odds ratio
6.5 General pipeline for calling SNVs
6.6 Local realignment
6.7 Duplicate read marking
6.8 Base quality score recalibration
6.9 Rule-based filtering
6.10 Computational methods to identify small indels
6.10.1 Split-read approach
6.10.2 Span distribution-based clustering approach
6.10.3 Local assembly approach
6.11 Correctness of existing SNV and indel callers
6.12 Further reading
6.13 Exercises
7 Structural variation calling
7.1 Introduction
7.2 Formation of SVs
7.3 Clinical effects of structural variations
7.4 Methods for determining structural variations
7.5 CNV calling
7.5.1 Computing the raw read count
7.5.2 Normalize the read counts
7.5.3 Segmentation
7.6 SV calling pipeline
7.6.1 Insert size estimation
7.7 Classifying the paired-end read alignments
7.8 Identifying candidate SVs from paired-end reads
7.8.1 Clustering approach
7.8.1.1 Clique-finding approach
7.8.1.2 Confidence interval overlapping approach
7.8.1.3 Set cover approach
7.8.1.4 Performance of the clustering approach
7.8.2 Split-mapping approach
7.8.3 Assembly approach
7.8.4 Hybrid approach
7.9 Verify the SVs
7.10 Further reading
7.11 Exercises
8 RNA-seq
8.1 Introduction
8.2 High-throughput methods to study the transcriptome
8.3 Application of RNA-seq
8.4 Computational Problems of RNA-seq
8.5 RNA-seq read mapping
8.5.1 Features used in RNA-seq read mapping
8.5.1.1 Transcript model
8.5.1.2 Splice junction signals
8.5.2 Exon-first approach
8.5.3 Seed-and-extend approach
8.6 Construction of isoforms
8.7 Estimating expression level of each transcript
8.7.1 Estimating transcript abundances when every read maps to exactly one transcript
8.7.2 Estimating transcript abundances when a read maps to multiple isoforms
8.7.3 Estimating gene abundance
8.8 Summary and further reading
8.9 Exercises
9 Peak calling methods
9.1 Introduction
9.2 Techniques that generate density-based datasets
9.2.1 Protein DNA interaction
9.2.2 Epigenetics of our genome
9.2.3 Open chromatin
9.3 Peak calling methods
9.3.1 Model fragment length
9.3.2 Modeling noise using a control library
9.3.3 Noise in the sample library
9.3.4 Determination if a peak is significant
9.3.5 Unannotated high copy number regions
9.3.6 Constructing a signal profile by Kernel methods
9.4 Sequencing depth of the ChIP-seq libraries
9.5 Further reading
9.6 Exercises
10 Data compression techniques used in NGS files
10.1 Introduction
10.2 Strategies for compressing fasta/fastq files
10.3 Techniques to compress identifiers
10.4 Techniques to compress DNA bases
10.4.1 Statistical-based approach
10.4.2 BWT-based approach
10.4.3 Reference-based approach
10.4.4 Assembly-based approach
10.5 Quality score compression methods
10.5.1 Lossless compression
10.5.2 Lossy compression
10.6 Compression of other NGS data
10.7 Exercises