OpenCGA
v2.2
Alignment and Coverage


Last updated 3 years ago


The OpenCGA Alignment Engine provides a solution to store and process sequence alignment data from Next-Generation Sequencing (NGS) projects. The Alignment Engine supports the most common alignment file formats, i.e., SAM, BAM and CRAM, and takes the alignment data model specification from GA4GH and the implementation from OpenCB GA4GH. See a full description at Alignment Data Model.

We do not define or endorse any dedicated unaligned sequence data format. Instead we recommend storing such data in one of the alignment formats (SAM, BAM, or CRAM) with the unmapped flag set.
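As an illustration of the recommendation above, the following minimal sketch builds a SAM record for a read that has no alignment and checks its unmapped flag. It relies only on the SAM specification (FLAG bit 0x4 marks an unmapped segment); the helper names `unaligned_sam_record` and `is_unmapped` are hypothetical and are not part of OpenCGA.

```python
# SAM FLAG bit 0x4 means "segment unmapped" (SAM specification).
UNMAPPED = 0x4


def unaligned_sam_record(name: str, sequence: str, qualities: str) -> str:
    """Build one tab-separated SAM line for a read with no alignment.

    Hypothetical helper for illustration only.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    # Mandatory fields, in order:
    # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
    fields = [name, str(UNMAPPED), "*", "0", "0", "*", "*", "0", "0",
              sequence, qualities]
    return "\t".join(fields)


def is_unmapped(sam_line: str) -> bool:
    """Return True when the unmapped bit is set in the FLAG field."""
    flag = int(sam_line.split("\t")[1])
    return bool(flag & UNMAPPED)


record = unaligned_sam_record("read1", "ACGT", "IIII")
print(record)
print(is_unmapped(record))
```

In practice such records would be written by a tool such as Samtools rather than by hand; the sketch only shows which field carries the unmapped information.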

The OpenCGA Alignment Engine provides the following analyses:

  • Index analysis: indexes a coordinate-sorted alignment file (BAM or CRAM) for fast random access. This index is needed when region parameters are used to limit the query analysis to particular regions of interest.

  • Query analysis: outputs the alignments that match the specified filters, such as minimum mapping quality, maximum insert size, maximum number of mismatches in the alignment, or properly paired alignments. In addition, users may specify one or more comma-separated regions to restrict the output to those alignments that overlap the specified region(s). Note that region specifications require a coordinate-sorted and indexed input file (in BAM or CRAM format).

  • Coverage analysis: takes a coordinate-sorted and indexed alignment file (in BAM or CRAM format) as input and generates a coverage file (in BigWig format). The coverage is calculated as the number of reads per window of a user-defined size; if the window size is equal to 1, the coverage is the number of reads per position. Once the coverage is computed, the read coverage over multiple genomic regions can be fetched quickly.

  • Statistics analysis: OpenCGA computes statistics for a given alignment file by using the samtools stats command. Alignment statistics are indexed to allow users to query for alignment files according to those statistics.
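The query and coverage analyses can be illustrated with a small, self-contained sketch. It is not OpenCGA code: the `Read` type and the `filter_reads` and `window_coverage` functions are hypothetical names, coordinates are assumed to be 0-based and half-open, and "reads per window" is interpreted here as the number of reads overlapping each window, which may differ from what the real implementation computes.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Simplified aligned read (hypothetical type, not an OpenCGA model)."""
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int   # mapping quality


def filter_reads(reads: List[Read], region_start: int, region_end: int,
                 min_mapq: int = 0) -> List[Read]:
    """Keep reads that overlap the region and pass the mapping-quality filter."""
    return [r for r in reads
            if r.mapq >= min_mapq
            and r.start < region_end and r.end > region_start]


def window_coverage(reads: List[Read], region_start: int, region_end: int,
                    window_size: int) -> List[int]:
    """Count the reads overlapping each window of the region.

    Assumption: a read is counted once in every window it overlaps.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    span = region_end - region_start
    n_windows = -(-span // window_size)  # ceiling division
    counts = [0] * n_windows
    for r in reads:
        lo = max(r.start, region_start)
        hi = min(r.end, region_end)
        if lo >= hi:
            continue  # read lies outside the region
        first = (lo - region_start) // window_size
        last = (hi - 1 - region_start) // window_size
        for w in range(first, last + 1):
            counts[w] += 1
    return counts


reads = [Read(0, 5, 60), Read(3, 8, 10), Read(6, 10, 60)]

# Window size 1 gives the number of reads per position.
per_base = window_coverage(reads, 0, 10, window_size=1)
print(per_base)

# Query-style filtering followed by coverage over two windows of size 5.
confident = filter_reads(reads, 0, 10, min_mapq=20)
print(window_coverage(confident, 0, 10, window_size=5))
```

The overlap test in `filter_reads` mirrors the region restriction described for the query analysis; on real BAM or CRAM files that restriction is what makes the coordinate-sorted index necessary.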

In addition, OpenCGA provides wrappers to the following third-party alignment software packages:

  • FastQC: a quality control tool for high-throughput sequence data.

  • BWA: a software package for mapping low-divergent sequences against a large reference genome.

  • Samtools: a program for interacting with high-throughput sequencing data in SAM, BAM and CRAM formats.

  • deepTools: a suite of Python tools developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.

OpenCGA Alignment User Interfaces

OpenCGA provides two interfaces that allow users to execute the alignment tools and analyses:

  • Command line interface

  • RESTful web services interface

OpenCGA command line interface

The OpenCGA command line interface to manage alignment data is accessible through the script opencga.sh using the command alignments:

OpenCGA RESTful web services interface

The following image shows the OpenCGA RESTful web services to manage alignment data:

The Working with Alignment Data tutorial shows how to use the OpenCGA alignment command line.
