OpenCGA
v2.1

Sample

Overview

The Sample data model stores information about any biological material, normally extracted from an Individual, that is used for a particular analysis. This is the main data model; it stores the most basic and important information.

Summary

Field   create   update   unique   required

id
processing
collection
qualityControl
description
somatic
phenotypes
individualId
fileIds
status
attributes
uuid
release
version
creationDate
modificationDate
internal

Data Model

Sample

Field
Description

id String
Sample ID in the study. It must be unique within the study but can be repeated in different studies. This is a mandatory parameter when creating a new sample, and the ID cannot currently be changed.
Tags: required, immutable, unique

uuid String
Globally unique ID across the whole OpenCGA installation. It is created automatically when the sample is created and cannot be changed.
Tags: internal, unique, immutable

processing SampleProcessing
Describes how the sample was processed in the lab.

collection SampleCollection (since: 2.1)
Describes how the sample was collected.
Note: The sample collection is a list of samples

qualityControl SampleQualityControl (since: 2.1)
Contains different metrics to evaluate the quality of the sample.
Note: The sample collection is a list of samples
More info at: ZetaGenomics

release int
An integer describing the current data release.
Tags: internal

version int
An integer describing the current version.
Tags: internal

creationDate String
String representing when the sample was created. This is set automatically by OpenCGA.
Tags: internal

modificationDate String
String representing when the sample was last modified. This is set automatically by OpenCGA.
Tags: internal

description String
A string describing the properties of the sample.

somatic boolean
Indicates whether the sample is somatic or germline (default).

phenotypes List<Phenotype>
A list of related phenotypes.

individualId String
A reference to the Individual containing this sample. Note that samples can exist without an Individual ID; this field is not mandatory.
More info at: ZetaGenomics

fileIds List<String> Deprecated
List of File IDs containing this sample, e.g. BAM, VCF, QC images, ...

status CustomStatus
An object describing the status of the Sample.

internal SampleInternal
An object describing the internal information of the Sample. This is managed by OpenCGA.
Tags: internal

attributes Map<String,Object>
You can use this field to store any other information. Keep in mind that it is not indexed, so you cannot search by attributes.
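
The fields above can be mirrored on the client side when preparing or inspecting sample records. The following is only a minimal sketch, not part of OpenCGA or pyopencga: the class `SampleRecord`, its `to_payload` method, the helper `filter_by_attribute`, and the sample `S2` with its `batch` attribute are hypothetical names made up for illustration. It covers a subset of the fields, enforces that `id` is present, keeps germline as the default, and filters by `attributes` locally, since attributes are not indexed on the server.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SampleRecord:
    """Hypothetical client-side mirror of a subset of the Sample fields."""

    id: str                              # required, unique within a study
    individual_id: Optional[str] = None  # optional: a sample may have no Individual
    description: str = ""
    somatic: bool = False                # germline is the documented default
    phenotypes: List[Dict[str, Any]] = field(default_factory=list)
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The table marks `id` as required, so reject empty values early.
        if not self.id:
            raise ValueError("Sample id is required")

    def to_payload(self) -> Dict[str, Any]:
        """Return a dict keyed by the camelCase field names used in the table."""
        payload: Dict[str, Any] = {
            "id": self.id,
            "description": self.description,
            "somatic": self.somatic,
            "phenotypes": self.phenotypes,
            "attributes": self.attributes,
        }
        if self.individual_id is not None:
            payload["individualId"] = self.individual_id
        return payload


def filter_by_attribute(
    samples: List[SampleRecord], key: str, value: Any
) -> List[SampleRecord]:
    """Filter locally, because attributes cannot be searched on the server."""
    return [s for s in samples if s.attributes.get(key) == value]


samples = [
    SampleRecord(id="ISDBM322015", individual_id="ISDBM322015"),
    SampleRecord(id="S2", attributes={"batch": "B7"}),  # made-up sample
]
print([s.id for s in filter_by_attribute(samples, "batch", "B7")])  # ['S2']
```

Server-managed fields such as `uuid`, `release`, `version`, `creationDate`, `modificationDate` and `internal` are left out of the sketch on purpose, because the table tags them as internal.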

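SampleInternal

Field
Description

status Status

rga RgaIndex

ProjectInternal

Field
Description

datastores Datastores

cellbase CellBaseConfiguration

status Status

CohortInternal

Field
Description

status CohortStatus

StudyInternal

Field
Description

status Status

configuration StudyConfiguration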

Example

This is a full JSON example. Nested values shown as {13 items} or [2 items] are collapsed in this view:

{
  id: "ISDBM322015",
  uuid: "eba13afe-0172-0004-0001-d4c92fd95e0a",
  individualId: "ISDBM322015",
  fileIds: [
    "data:quartet.variants.annotated.vcf.gz",
    "SonsAlignedBamFile.bam"
  ],
  annotationSets: [],
  description: "",
  somatic: false,
  qualityControl: {
    fileIds: [],
    comments: [],
    alignmentMetrics: [
      {
        bamFileId: "SonsAlignedBamFile.bam",
        fastQc: {13 items},
        samtoolsFlagstats: {14 items},
        geneCoverageStats: [2 items]
      }
    ],
    variantMetrics: {
      variantStats: [1 item],
      signatures: [],
      vcfFileIds: []
    }
  },
  release: 1,
  version: 5,
  creationDate: "20200625131831",
  modificationDate: "20200709003738",
  phenotypes: [
    {
      id: "HP:0000545",
      name: "Myopia",
      source: "HPO"
    }
  ],
  status: {
    name: "",
    description: "",
    date: ""
  },
  internal: {
    status: {
      name: "READY",
      date: "20200625131831",
      description: ""
    }
  },
  attributes: {
    OPENCGA_INDIVIDUAL: {
      id: "ISDBM322015",
      name: "ISDBM322015",
      uuid: "eba13738-0172-0006-0001-283471b7ae69",
      father: {4 items},
      mother: {4 items},
      location: {},
      qualityControl: {4 items},
      sex: "MALE",
      karyotypicSex: "XY",
      ethnicity: "",
      population: {},
      release: 1,
      version: 6,
      creationDate: "20200625131830",
      modificationDate: "20201027004616",
      lifeStatus: "ALIVE",
      phenotypes: [2 items],
      disorders: [1 item],
      parentalConsanguinity: false,
      status: {3 items},
      internal: {1 item},
      attributes: {}
    }
  }
}
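
The `creationDate` and `modificationDate` values in the example look like compact `yyyyMMddHHmmss` timestamps. Assuming that reading of the format is right (it is inferred from the example values, not stated in this page), they can be turned into Python `datetime` objects as sketched below. The constant `OPENCGA_DATE_FORMAT` and the function `parse_opencga_date` are hypothetical names used only for illustration.

```python
from datetime import datetime

# Assumption: dates are encoded as yyyyMMddHHmmss, as the example values suggest.
OPENCGA_DATE_FORMAT = "%Y%m%d%H%M%S"


def parse_opencga_date(value: str) -> datetime:
    """Parse a compact timestamp such as '20200625131831'."""
    return datetime.strptime(value, OPENCGA_DATE_FORMAT)


created = parse_opencga_date("20200625131831")   # creationDate from the example
modified = parse_opencga_date("20200709003738")  # modificationDate from the example

print(created.isoformat())        # 2020-06-25T13:18:31
print((modified - created).days)  # 13
```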

Last updated 3 years ago
