Bioinformatics pipelines often use large numbers of components, and deploying them incurs a substantial configuration and maintenance burden, which is a significant barrier to reproducible research. Our aim is to define a new paradigm and best practices for developing, distributing and running pipelines encapsulated in Docker containers (lightweight virtualization), with a focus on Next Generation Sequencing (NGS) workflows. This approach provides several advantages, namely efficiency, portability, versioning and reproducibility. Users and developers often need to run pipelines in heterogeneous environments (e.g. different operating systems, workstations, clusters, clouds); with the containerized model, they can quickly deploy any pipeline version in any environment. While this might also be achieved with a virtual machine (VM), VMs lack portability, have substantial overhead (disk, CPU, RAM), and require resources to be provisioned statically. Docker, to a large extent, solves these issues.

Git Repo & Docs:


The following archives can be downloaded from our FTP server:
├── gatk_resources.tar.gz
└── reference_genomes_b37.tar.gz

A public user with read-only access to the ‘Public’ folder is available:

user: compbio-public
pwd: compbio-public

