Useful links

Under construction.

Genome Browsers

Statistical Analysis of high-throughput data

Machine Learning

Next Generation Sequencing Tools

  • Novoalign http://www.novocraft.com Reported to be a highly accurate aligner for single-end and paired-end reads from the Illumina Genome Analyzer and for 454 paired-end reads.
  • BWA http://bio-bwa.sourceforge.net/ Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
  • The Genome Analysis Toolkit http://www.broadinstitute.org/gatk/ The Genome Analysis Toolkit, or GATK, is a software package developed at the Broad Institute to analyse next-generation resequencing data.
  • samtools http://samtools.sourceforge.net/ SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
  • vcftools http://vcftools.sourceforge.net/ A program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. VCFtools aims to provide methods for validating, merging and comparing VCF files, and for calculating some basic population genetic statistics.
  • BEDTools http://code.google.com/p/bedtools/ A flexible suite of utilities for comparing genomic features. The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by “streaming” several BEDTools utilities together.
  • FASTX-Toolkit http://hannonlab.cshl.edu/fastx_toolkit/ The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
  • FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ A quality control tool for high-throughput sequence data.
  • Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv/ A high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
  • DELLY http://www.embl.de/~rausch/delly.html An integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-end and split-read information to sensitively and accurately delineate genomic rearrangements throughout the genome.
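
The "finding feature overlaps" task mentioned in the BEDTools entry can be illustrated with a minimal Python sketch. This is not how BEDTools is implemented; the function name `find_overlaps` and the example intervals are hypothetical, and the code assumes BED-style coordinates (0-based start, exclusive end).

```python
def find_overlaps(features_a, features_b):
    """Return (a, b, overlap_bp) for every overlapping pair of intervals.

    Each interval is a (chrom, start, end) tuple in BED convention:
    0-based start, end exclusive. Simple O(n*m) scan per chromosome,
    intended only as an illustration.
    """
    # Group the B intervals by chromosome so that only
    # same-chromosome pairs are ever compared.
    b_by_chrom = {}
    for chrom, start, end in features_b:
        b_by_chrom.setdefault(chrom, []).append((start, end))

    hits = []
    for chrom, a_start, a_end in features_a:
        for b_start, b_end in b_by_chrom.get(chrom, []):
            # Length of the shared region; zero or negative means no overlap.
            overlap = min(a_end, b_end) - max(a_start, b_start)
            if overlap > 0:
                hits.append(((chrom, a_start, a_end),
                             (chrom, b_start, b_end),
                             overlap))
    return hits


# Hypothetical example data: two "genes" and two "peaks".
genes = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks = [("chr1", 150, 250), ("chr2", 150, 250)]
for a, b, bp in find_overlaps(genes, peaks):
    print(a, b, bp)
```

For real data sets, a dedicated tool is the better choice, since a pairwise scan like this scales poorly with the number of features.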

Variant Annotation for NGS data

Genetic Analysis Software

  • PLINK http://pngu.mgh.harvard.edu/~purcell/plink/ is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
  • SNPTEST https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html A program for frequentist and Bayesian tests of SNP association with binary (case-control) and quantitative phenotypes that takes genotype uncertainty into account.
  • QUICKTEST http://toby.freeshell.org/software/quicktest.shtml Implements the statistical methods for association testing with uncertain (imputed) genotypes that were published in the article “Methods for testing association between uncertain genotypes and quantitative traits”. Use this for quantitative traits!
  • GenABEL http://www.genabel.org A set of R packages for the analysis of genetic data. Includes tools for data management, file conversions (e.g. IMPUTE to MaCH format), efficient storage, analysis of genotyped and imputed (e.g. dosage) data, meta-analysis, prediction and more.
  • IMPUTE2 http://mathgen.stats.ox.ac.uk/impute/impute_v2.html A program for genotype imputation and phasing in genome-wide association studies and fine-mapping.
  • SHAPEIT http://www.shapeit.fr/ A program for accurate and efficient phasing of genetic datasets.
  • BEAGLE http://faculty.washington.edu/browning/beagle/beagle.html A state-of-the-art software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can:
    1. phase genotype data (i.e. infer haplotypes) for unrelated individuals, parent-offspring pairs, and parent-offspring trios.
    2. infer sporadic missing genotype data.
    3. impute ungenotyped markers that have been genotyped in a reference panel.
    4. perform single marker and haplotypic association analysis.
    5. detect genetic regions that are homozygous-by-descent in an individual or identical-by-descent in pairs of individuals.
  • BEAGLE Utilities http://faculty.washington.edu/browning/beagle_utilities/utilities.html Simple utility programs for manipulating text files. If you are performing analyses using BEAGLE, you may find some of these programs useful for preparing input files and for working with output files. The BEAGLE utilities are written in Java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).
  • EIGENSTRAT http://genepath.med.harvard.edu/~reich/Software.htm Detects and corrects for population stratification in genome-wide association studies. The method, based on principal components analysis, explicitly models ancestry differences between cases and controls along continuous axes of variation. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The approach is powerful as well as fast, and can easily be applied to disease studies with hundreds of thousands of markers.
  • GWAPOWER http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html An R package for assessing the power of genome-wide association studies using commercially available genotyping chips. The package encapsulates extensive simulation results generated by the authors' program HAPGEN and described fully in their paper.
  • META https://mathgen.stats.ox.ac.uk/genetics_software/meta/meta.html A program to carry out meta-analysis of genetic studies.
  • GWAMA http://www.well.ox.ac.uk/gwama/ Genome-Wide Association Meta Analysis software to perform meta-analysis of the results of GWA studies of binary or quantitative phenotypes.
  • METAL http://www.sph.umich.edu/csg/abecasis/metal/index.html The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory-efficient manner.
  • INRICH http://atgu.mgh.harvard.edu/inrich/ Interval-based Enrichment Analysis Tool for Genome Wide Association Studies.
  • FORGE https://github.com/inti/FORGE/wiki FORGE is a tool for performing gene-based genome-wide association studies. It combines information from different genetic variants into a single statistic. We have shown that it provides additional power to detect true disease loci and that it is useful for pathway or network analyses (Pedroso et al., submitted).
  • Mike Weale’s GWAS tools (Mike Weale is a statistical geneticist at King’s College London)
    1. Bayes Factors
    2. GWAS code
    3. EIGENSOFTplus
    4. Manhattan plots
    5. QQ plots
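
Several of the tools listed above (META, GWAMA, METAL) combine association results across studies. As a rough illustration of the idea only, the sketch below applies fixed-effect inverse-variance weighting, one common way to pool per-study effect estimates. It is a generic textbook calculation and is not taken from any of these programs; the function name `fixed_effect_meta` and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Returns (pooled_beta, pooled_se, z, two_sided_p), where the p-value
    comes from a standard normal approximation.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need equal-length, non-empty inputs")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se

    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-study effect sizes and standard errors for one SNP.
beta, se, z, p = fixed_effect_meta([0.12, 0.08, 0.15], [0.05, 0.04, 0.07])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

This sketch ignores between-study heterogeneity, strand and allele alignment, and genomic control, all of which matter in practice and are handled by dedicated software.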

MISC

MISC R

  • CIT algorithm (R script) CITtest.r Disentangling molecular relationships with a causal inference test (Joshua Millstein, Bin Zhang, Jun Zhu, Eric E. Schadt)
