Home

Welcome to the official page for OpenCGA documentation.

OpenCGA is an advanced big data genomic analysis platform. It is an open-source project that implements a high-performance, scalable and secure platform for genomic data analysis and visualisation.

OpenCGA provides an advanced and complete genomic data platform. Its performance, scalability and large number of features make OpenCGA a unique full-stack solution. It takes care of security and implements a high-performance query engine and analysis frameworks for big data analysis and visualisation in current genomics. OpenCGA has been designed and implemented to scale to hundreds of thousands of genomes, accounting for petabytes of variant data. It is built on top of three main components: Catalog Metadata Database, Variant Storage Engine and Analysis Framework.

Main Features

  • Authenticated and secure platform to query and visualise data. An advanced permission system has been implemented to ensure data privacy.

  • A metadata database to keep track of registered users, projects, studies, files, samples, families, jobs and other entities.

  • An advanced clinical data database in which users can define their own data models for samples, patients or families.

  • Alignment storage that lets you index BAM/CRAM files and query alignment data and coverage.

  • An advanced, high-performance and scalable Variant Storage Engine. It can normalise, load, index, aggregate, annotate and precompute variant stats for hundreds of thousands of whole genomes.

  • An Analysis Framework implemented on top of the variant and alignment storage engines. OpenCGA comes with many analyses already implemented, such as GWAS. Users can easily extend OpenCGA functionality by implementing a plugin or connecting to an external binary.

  • Real big data analytics support: you can use different computing frameworks such as MapReduce or Spark on top of HBase or Parquet files.

  • A full clinical analysis solution: you can create cases and run different clinical interpretation algorithms from your scripts or from a web application.

  • Rich and comprehensive RESTful Web Services API with more than 160 endpoints to manage, query and analyse metadata, variants, alignments and clinical data.

  • Easy programmatic access and pipeline integration thanks to four client libraries, developed in Java, Python, R and JavaScript.

  • An interactive web-based application to query, analyse and visualise variants, alignments and clinical data.

Zetta Genomics is a start-up launched in 2019 to offer official support for, and customisation of, your OpenCB applications.

Zetta offers advanced data management systems for precision medicine based on the OpenCB applications. Find more information about this initiative at https://zettagenomics.com/
