Data Models

Last updated 3 years ago

Overview

This section describes the most relevant entities of OpenCGA Catalog. For more detail about the data models, such as the Java source code, examples, or the JSON Schemas, see the official code on GitHub.

A schematic diagram of the relations between the OpenCGA Catalog data models is shown below:

Catalog Entities

The most relevant entities in OpenCGA Catalog are:

  • User: Users represent the people who are granted access to the database. This entity contains the data related to the user account.

  • Project: Projects represent the first physical separation of the data in OpenCGA. A Project entity contains information about a project and covers as many related studies as necessary.

  • Study: Studies represent the main working environment. The Study is the parent of all entities except Project and User, and most entities are defined at the study level (see the diagram of Catalog entities).

  • File: Files represent the metadata about the files uploaded or linked to OpenCGA. This entity contains information about a submitted or generated file.

  • Sample: Contains information about the sample. Closely related to the File entity.

  • Individual: Contains information about the individual from whom the sample was taken.

  • Cohort: Groups sets of samples that share one or more common features.

  • Disease panel: Defines a disease panel containing the variants, genes and/or regions of interest.

  • Job: An analysis job launched using any of the files or samples.
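
The containment described in this list can be summarized as a small lookup table. The following Python sketch is illustrative only: the `PARENT` mapping and the `path_to_root` helper are hypothetical names, not part of OpenCGA or its client libraries, and the entity types are written as plain strings taken from the list.

```python
from typing import Dict, List

# Hypothetical sketch: each entity type mapped to its parent entity type,
# following the descriptions in this section (the Study is the parent of
# every entity except Project and User; a Project covers its Studies).
PARENT: Dict[str, str] = {
    "study": "project",
    "file": "study",
    "sample": "study",
    "individual": "study",
    "cohort": "study",
    "disease panel": "study",
    "job": "study",
}


def path_to_root(entity: str) -> List[str]:
    """Return the chain of entity types from the top-level ancestor down to `entity`."""
    chain = [entity]
    # Walk upwards until we reach a type that has no recorded parent.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return list(reversed(chain))


print(path_to_root("sample"))  # ['project', 'study', 'sample']
print(path_to_root("user"))    # ['user']
```

Project and User have no entry in the mapping because the text does not place them under another entity.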

Design Principles

All OpenCGA Data Models have been designed to follow a list of principles. These principles are agnostic (not entity-dependent) and thus apply to all the entities. Knowing these principles allows you to understand the mechanism used by Catalog to represent real-world clinical metadata and to infer the structure of the data associated with any entity:

  1. Parent-Child List Relationship

  2. Child-Parent Reference

  3. Annotation Sets: Catalog offers the option to define custom, user-defined annotation sets at any entity level.
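
This page names the principles without spelling them out, so the following Python sketch is one possible reading, not OpenCGA's actual data model. It assumes that a parent entity holds a list of its children (principle 1), that each child stores an identifier pointing back to its parent (principle 2), and that an annotation set is a named group of user-defined key-value pairs that can be attached to any entity (principle 3). All class and field names (`Entity`, `Study`, `Sample`, `study_id`, `annotation_sets`) and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entity:
    """Hypothetical base class: any entity can carry annotation sets."""
    id: str
    # Principle 3 (assumed shape): annotation set name -> custom key/value pairs.
    annotation_sets: Dict[str, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class Sample(Entity):
    # Principle 2 (assumed): the child keeps a reference to its parent.
    study_id: str = ""


@dataclass
class Study(Entity):
    # Principle 1 (assumed): the parent keeps a list of its children.
    samples: List[Sample] = field(default_factory=list)

    def add_sample(self, sample_id: str) -> Sample:
        """Create a sample that is linked to this study in both directions."""
        sample = Sample(id=sample_id, study_id=self.id)
        self.samples.append(sample)
        return sample


# Made-up identifiers and annotation values, for illustration only.
study = Study(id="study1")
sample = study.add_sample("sample1")
sample.annotation_sets["collection"] = {"tissue": "blood", "year": 2020}
```

Because both directions are stored, a reader of the data can navigate from a study to its samples or from a sample back to its study without an extra lookup table, which is one plausible motivation for stating the two relationship principles separately.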

Figure: OpenCGA Catalog Data Models. Catalog entities and their relations.