Genome Browsers
- UCSC Genome Browser http://genome.ucsc.edu/cgi-bin/hgGateway
- Ensembl Genome Browser http://www.ensembl.org/index.html
- HapMap http://hapmap.ncbi.nlm.nih.gov/
- 1000 Genomes http://www.1000genomes.org/
- VISTA Genome Browser http://pipeline.lbl.gov/cgi-bin/gateway2
- VISTA Tools for Comparative Genomics http://genome.lbl.gov/vista/index.shtml
Statistical Analysis of high-throughput data
- R http://www.r-project.org/ The R Project for Statistical Computing: R is a free software environment for statistical computing and graphics. See also Quick-R: accessing the power of R.
- Bioconductor http://www.bioconductor.org/ Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
- TM4 Microarray Software Suite http://www.tm4.org/
- GALAXY https://main.g2.bx.psu.edu/ Galaxy is an open, web-based platform for data-intensive biomedical research.
- DAVID Functional Annotation Bioinformatics Microarray Analysis http://david.abcc.ncifcrf.gov/
- GeneMania http://www.genemania.org/ The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function
- Gene Set Enrichment Analysis http://www.broadinstitute.org/gsea/index.jsp (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states.
- Cytoscape http://www.cytoscape.org/ is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. Many plugins are available for various problem domains, including bioinformatics, social network analysis, and semantic web.
- BiNGO http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html A Cytoscape plug-in for visualising over-representation of GO categories.
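The over-representation analysis mentioned for the GO tools above can be illustrated with a one-sided hypergeometric test. This is a minimal sketch, not the implementation of any tool listed here: the function name and arguments are hypothetical, and real tools may use other statistics and add multiple-testing correction.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n
    genes drawn without replacement from N genes, K of which are annotated.
    (Hypothetical helper for illustration only.)"""
    upper = min(n, K)          # cannot observe more hits than n or K
    total = comb(N, n)         # number of possible gene lists of size n
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Example: 10 genes in the background, 3 carry the GO term,
# and all 3 genes in the study list carry it.
p_value = hypergeom_upper_tail(k=3, n=3, K=3, N=10)
print(p_value)
```

A small p-value suggests the category is seen more often in the gene list than expected by chance under this model.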
Machine Learning
- Weka 3: Data Mining Software in Java http://www.cs.waikato.ac.nz/ml/weka/
- The SHOGUN Machine Learning Tool Box http://www.shogun-toolbox.org/
Next Generation Sequencing Tools
- Novoalign http://www.novocraft.com Described by its developers as a highly accurate aligner for single-end and paired-end reads from the Illumina Genome Analyzer and for 454 paired-end reads.
- BWA http://bio-bwa.sourceforge.net/ Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome
- The Genome Analysis Toolkit http://www.broadinstitute.org/gatk/ The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data
- samtools http://samtools.sourceforge.net/ SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
- vcftools http://vcftools.sourceforge.net/ a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. VCFtools provides methods for validating, merging, and comparing VCF files and for calculating some basic population genetic statistics.
- BEDTools http://code.google.com/p/bedtools/ a flexible suite of utilities for comparing genomic features: The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by “streaming” several BEDTools together.
- FASTX-Toolkit http://hannonlab.cshl.edu/fastx_toolkit/ The FASTX-Toolkit is a collection of command line tools for preprocessing short-read FASTA/FASTQ files.
- FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ A quality control tool for high throughput sequence data.
- Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv/ is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations
- DELLY http://www.embl.de/~rausch/delly.html is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome
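The feature-overlap task described for BEDTools in the list above can be sketched in a few lines. This is an illustration of the idea only, assuming BED-style coordinates (0-based start, exclusive end); the function names are hypothetical and the naive pairwise loop is not how BEDTools itself works.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) features share at least one base.
    Coordinates are assumed to be BED-style: 0-based, half-open."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def intersect(features_a, features_b):
    """Report every overlapping pair; O(len(a) * len(b)), fine for small inputs."""
    return [(a, b) for a in features_a for b in features_b if overlaps(a, b)]

genes = [("chr1", 100, 200), ("chr1", 300, 400)]
peaks = [("chr1", 150, 160), ("chr1", 200, 300), ("chr2", 100, 200)]
print(intersect(genes, peaks))
```

Because the intervals are half-open, a feature ending at 200 and one starting at 200 are adjacent rather than overlapping.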
Variant Annotation for NGS data
- ANNOVAR http://www.openbioinformatics.org/annovar/
- Variant Effect Predictor http://www.ensembl.org/info/docs/variation/vep/index.html
- MutationTaster http://www.mutationtaster.org/
- snpEff http://snpeff.sourceforge.net/
Genetic Analysis Software
- PLINK http://pngu.mgh.harvard.edu/~purcell/plink/ is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
- SNPTEST https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html a program for Frequentist and Bayesian tests of SNP association with binary (case-control) and quantitative phenotypes that takes genotype uncertainty into account.
- QUICKTEST http://toby.freeshell.org/software/quicktest.shtml The software implements the statistical methods for uncertain (imputed) genotype association testing that were published in the article "Methods for testing association between uncertain genotypes and quantitative traits". Recommended for quantitative traits.
- GenABEL http://www.genabel.org a set of R packages for the analysis of genetic data. Includes tools for data management, file conversions (e.g. IMPUTE to MACH format), efficient storage, analysis of genotyped and imputed (e.g. dosage) data, meta-analysis, prediction and more.
- IMPUTE2 http://mathgen.stats.ox.ac.uk/impute/impute_v2.html a program for genotype imputation and phasing in genome-wide association studies and fine-mapping
- SHAPEIT http://www.shapeit.fr/ a program for accurate and efficient phasing of genetic datasets
- BEAGLE http://faculty.washington.edu/browning/beagle/beagle.html a state-of-the-art software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can:
  - phase genotype data (i.e. infer haplotypes) for unrelated individuals, parent-offspring pairs, and parent-offspring trios.
  - infer sporadic missing genotype data.
  - impute ungenotyped markers that have been genotyped in a reference panel.
  - perform single marker and haplotypic association analysis.
  - detect genetic regions that are homozygous-by-descent in an individual or identical-by-descent in pairs of individuals.
- BEAGLE Utilities http://faculty.washington.edu/browning/beagle_utilities/utilities.html This page includes simple utility programs for manipulating text files. If you are performing analyses using BEAGLE, you may find some of these programs to be useful for preparing input files and for working with output files. The BEAGLE utilities are written in java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).
- EIGENSTRAT http://genepath.med.harvard.edu/~reich/Software.htm detects and corrects for population stratification in genome-wide association studies. The method, based on principal components analysis, explicitly models ancestry differences between cases and controls along continuous axes of variation. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The approach is powerful as well as fast, and can easily be applied to disease studies with hundreds of thousands of markers.
- GWAPOWER http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html an R package for assessing the power of genome-wide association studies using commercially available genotyping chips. The package encapsulates extensive simulation results generated by our program HAPGEN and described fully in the paper
- META https://mathgen.stats.ox.ac.uk/genetics_software/meta/meta.html a program to carry out meta-analysis of genetic studies
- GWAMA http://www.well.ox.ac.uk/gwama/ Genome-Wide Association Meta Analysis software to perform meta-analysis of the results of GWA studies of binary or quantitative phenotypes
- METAL http://www.sph.umich.edu/csg/abecasis/metal/index.html The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.
- INRICH: Interval-based Enrichment Analysis Tool for Genome Wide Association Studies http://atgu.mgh.harvard.edu/inrich/
- FORGE https://github.com/inti/FORGE/wiki FORGE is a tool to perform gene-based genome-wide association studies. It combines information from different genetic variants into a single statistic. We have shown that it provides additional power to detect true disease loci and that it is useful for pathway or network analyses (Pedroso et al., submitted)
- Mike Weale’s GWAS tools (Mike Weale is a statistical geneticist at King’s College London)
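The meta-analysis programs listed above (META, GWAMA, METAL) combine per-study association results. One common scheme is fixed-effect inverse-variance weighting, sketched below for a single variant. This is an assumption-laden illustration: the function name is hypothetical, and each program has its own input formats, options, and additional methods.

```python
from math import erfc, sqrt

def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis of one variant across studies.
    betas: per-study effect estimates; ses: their standard errors.
    (Hypothetical helper for illustration only.)"""
    weights = [1.0 / se ** 2 for se in ses]     # precise studies count more
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = sqrt(1.0 / total_w)                    # pooled standard error
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))                # two-sided normal p-value
    return beta, se, z, p

# Example: two studies reporting the same effect with equal precision.
beta, se, z, p = inverse_variance_meta([0.1, 0.1], [0.1, 0.1])
print(beta, se, z, p)
```

Pooling two equally precise studies leaves the effect estimate unchanged and shrinks the standard error by a factor of sqrt(2).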
MISC
- Unix & Perl Primer for Biologists http://korflab.ucdavis.edu/unix_and_Perl/
- Harvester Portal http://harvester.kit.edu/HarvesterPortal cross-links hundreds of search engines, 6500 scientific sites, and 800 million documents
MISC R
- CIT algorithm (R script) CITtest.r: "Disentangling molecular relationships with a causal inference test" (Joshua Millstein, Bin Zhang, Jun Zhu, Eric E. Schadt)