Useful links


Genome Browsers

UCSC Genome Browser http://genome.ucsc.edu/cgi-bin/hgGateway
Ensembl Genome Browser http://www.ensembl.org/index.html
HAPMAP http://hapmap.ncbi.nlm.nih.gov/
1000 Genomes http://www.1000genomes.org/
VISTA Genome Browser http://pipeline.lbl.gov/cgi-bin/gateway2
VISTA Tools for Comparative Genomics http://genome.lbl.gov/vista/index.shtml

Statistical Analysis of high-throughput data

R http://www.r-project.org/ The R Project for Statistical Computing: R is a free software environment for statistical computing and graphics. See also Quick-R: accessing the power of R.
Bioconductor http://www.bioconductor.org/ Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
TM4 Microarray Software Suite http://www.tm4.org/
GALAXY https://main.g2.bx.psu.edu/ Galaxy is an open, web-based platform for data intensive biomedical research
DAVID Functional Annotation Bioinformatics Microarray Analysis http://david.abcc.ncifcrf.gov/
GeneMania http://www.genemania.org/ The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function
Gene Set Enrichment Analysis http://www.broadinstitute.org/gsea/index.jsp (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states.
Cytoscape http://www.cytoscape.org/ is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. Many plugins are available for various problem domains, including bioinformatics, social network analysis, and semantic web.
BiNGO http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html Visualisation of over-representation of GO categories; a plug-in for Cytoscape.

Machine Learning

Weka 3: Data Mining Software in Java http://www.cs.waikato.ac.nz/ml/weka/
The SHOGUN Machine Learning Tool Box http://www.shogun-toolbox.org/

Next Generation Sequencing Tools

Novoalign http://www.novocraft.com An aligner for single-ended and paired-end reads from the Illumina Genome Analyser and 454 paired-end reads, described as the most accurate aligner to date.
BWA http://bio-bwa.sourceforge.net/ Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome
The Genome Analysis Toolkit http://www.broadinstitute.org/gatk/ The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data
samtools http://samtools.sourceforge.net/ SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
vcftools http://vcftools.sourceforge.net/ a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide methods for working with VCF files: validating, merging, comparing and calculating some basic population genetic statistics.
BEDTools http://code.google.com/p/bedtools/ a flexible suite of utilities for comparing genomic features: The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by “streaming” several BEDTools together.
FASTX-Toolkit http://hannonlab.cshl.edu/fastx_toolkit/ The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing
FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ A quality control tool for high throughput sequence data.
Integrative Genomics Viewer (IGV) http://www.broadinstitute.org/igv/ is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations
DELLY http://www.embl.de/~rausch/delly.html is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome
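
Several of the tools above (BEDTools in particular) centre on finding overlaps between genomic features. As a rough illustration of that idea only, and not of how BEDTools itself is implemented, the Python sketch below intersects two small lists of BED-style intervals. The function names and the (chrom, start, end) tuple layout are assumptions made for this example; coordinates are taken to be 0-based and half-open, as in the BED format.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if a and b share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(features_a: List[Interval], features_b: List[Interval]) -> List[Interval]:
    """Return the overlapping segment for every overlapping pair (a, b).

    This is a simple all-pairs comparison, fine for small inputs; real tools
    use sorted input or index structures to avoid the quadratic cost.
    """
    result = []
    for a in features_a:
        for b in features_b:
            if overlaps(a, b):
                result.append((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return result


if __name__ == "__main__":
    genes = [("chr1", 100, 200)]
    peaks = [("chr1", 150, 250), ("chr2", 150, 250), ("chr1", 200, 300)]
    print(intersect(genes, peaks))
```

Note that the interval starting at position 200 does not overlap a feature ending at 200, because the end coordinate is exclusive.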

Variant Annotation for NGS data

ANNOVAR http://www.openbioinformatics.org/annovar/
Variant Effect Predictor http://www.ensembl.org/info/docs/variation/vep/index.html
MutationTaster http://www.mutationtaster.org/
snpEff http://snpeff.sourceforge.net/

Genetic Analysis Software

PLINK http://pngu.mgh.harvard.edu/~purcell/plink/ is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
SNPTEST https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html a program for Frequentist and Bayesian tests of SNP association with binary (case-control) and quantitative phenotypes that takes genotype uncertainty into account.
QUICKTEST http://toby.freeshell.org/software/quicktest.shtml The software implements the statistical methods for uncertain (imputed) genotype association testing that were published in the article "Methods for testing association between uncertain genotypes and quantitative traits". USE THIS FOR QUANTITATIVE TRAITS!
GenABEL http://www.genabel.org a set of R packages for the analysis of genetic data. Includes tools for data management, file conversions (e.g. impute to mach format), efficient storage, analysis of genotyped and imputed (e.g. dosage) data, meta-analysis, prediction and more.
IMPUTE2 http://mathgen.stats.ox.ac.uk/impute/impute_v2.html a program for genotype imputation and phasing in genome-wide association studies and fine-mapping
SHAPEIT http://www.shapeit.fr/ a program for accurate and efficient phasing of genetic datasets
BEAGLE http://faculty.washington.edu/browning/beagle/beagle.html a state-of-the-art software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can:
1. phase genotype data (i.e. infer haplotypes) for unrelated individuals, parent-offspring pairs, and parent-offspring trios.
2. infer sporadic missing genotype data.
3. impute ungenotyped markers that have been genotyped in a reference panel.
4. perform single marker and haplotypic association analysis.
5. detect genetic regions that are homozygous-by-descent in an individual or identical-by-descent in pairs of individuals.
BEAGLE Utilities http://faculty.washington.edu/browning/beagle_utilities/utilities.html This page includes simple utility programs for manipulating text files. If you are performing analyses using BEAGLE, you may find some of these programs useful for preparing input files and for working with output files. The BEAGLE utilities are written in Java and run on all common computing platforms (e.g. Windows, Unix, Linux, Solaris, Mac).
EIGENSTRAT http://genepath.med.harvard.edu/~reich/Software.htm detects and corrects for population stratification in genome-wide association studies. The method, based on principal components analysis, explicitly models ancestry differences between cases and controls along continuous axes of variation. The resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The approach is powerful as well as fast, and can easily be applied to disease studies with hundreds of thousands of markers.
GWAPOWER http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html an R package for assessing the power of genome-wide association studies using commercially available genotyping chips. The package encapsulates extensive simulation results generated by our program HAPGEN and described fully in the paper
META https://mathgen.stats.ox.ac.uk/genetics_software/meta/meta.html a program to carry out meta-analysis of genetic studies
GWAMA http://www.well.ox.ac.uk/gwama/ Genome-Wide Association Meta Analysis software to perform meta-analysis of the results of GWA studies of binary or quantitative phenotypes
METAL http://www.sph.umich.edu/csg/abecasis/metal/index.html The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.
INRICH: Interval-based Enrichment Analysis Tool for Genome Wide Association Studies http://atgu.mgh.harvard.edu/inrich/
FORGE https://github.com/inti/FORGE/wiki FORGE is a tool for performing gene-based genome-wide association studies. It combines information from different genetic variants into a single statistic. We have shown that it provides additional power to detect true disease loci and that it is useful for pathway or network analyses (Pedroso et al., submitted).
Mike Weale’s GWAS tools (Mike Weale is a statistical geneticist at King’s College London)
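
The meta-analysis programs listed above (META, GWAMA, METAL) combine per-study association results into a single estimate. One common scheme for doing so is fixed-effect inverse-variance weighting, sketched below in Python for a single variant. The function name and its inputs are hypothetical and chosen for this example; the sketch is not taken from any of those programs, which handle far more than this.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Combine per-study effect sizes by inverse-variance weighting.

    betas: effect estimates (e.g. log odds ratios), one per study
    ses:   their standard errors, in the same order
    Returns (pooled beta, pooled standard error, z-score).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the reciprocal of its variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


if __name__ == "__main__":
    # Two made-up studies with equal precision.
    beta, se, z = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
    print(f"beta={beta:.3f} se={se:.4f} z={z:.3f}")
```

Studies with smaller standard errors receive more weight, so a large, precise study dominates the pooled estimate.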

MISC

Unix & Perl Primer for Biologists http://korflab.ucdavis.edu/unix_and_Perl/
Harvester Portal http://harvester.kit.edu/HarvesterPortal crosslinking hundreds of search engines, 6500 scientific sites, and 800 million documents

MISC R

CIT algorithm (R script) CITtest.r Disentangling molecular relationships with a causal inference test (Joshua Millstein, Bin Zhang, Jun Zhu, Eric E. Schadt)

Bioinformatics and Biostatistics at the NIHR Maudsley BRC
