Templates / Manifest
Since OpenCGA v2.1.0, users with administration roles can work with templates. A template is a set of files that follow a defined specification and let the user perform a series of operations related to the ingestion of metadata into OpenCGA, e.g. defining samples, individuals, permission groups, etc. For more information on how OpenCGA stores metadata, see the Catalog documentation.
Templates are defined at the study level and can be provided in different formats according to the user's needs. The file formats and some common use cases are described in the following sections.
Remember that OpenCGA is highly configurable. Templates are a useful resource for reducing common sources of error during metadata ingestion, but you can always use the OpenCGA clients (Client Libraries), the command line (Command Line) or the REST Web Service API to perform operations in OpenCGA.
How it Works
Templates provide an easy way to ingest metadata into OpenCGA. A template consists of the following:
Manifest: Only one file is required to use the template-related operations: a JSON or YAML file named manifest.{json|yaml} containing the specific configuration applied to the template. This file defines the root, i.e. the study where the operation will be performed. An example is provided below.
Metadata and Clinical Data: You can provide one file per entity, where each entity corresponds to one of the data models supported by OpenCGA Catalog (individuals, samples, files, families, cohorts, clinical_analysis). Each file contains the entity-related information that you want to ingest into Catalog. For usability, two main file specifications are accepted; the accepted file structures are described below.
NOTE: The fields of each entity that users can modify are clearly stated in the documentation of that entity's data model. Please refer to Data Models.
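As an illustration of the layout described above, the following minimal sketch checks a local folder for the required manifest and lists which entity files are present. It assumes the template files sit together in a single folder. The function name inspect_template_dir is hypothetical, and treating .yml and .txt as valid extensions is an assumption drawn from this page rather than a confirmed rule of the loader.

```python
from pathlib import Path
import tempfile

# Names taken from this page. The exact set of accepted extensions is an
# assumption (the text mentions json, yml/yaml and TAB-separated .txt files).
MANIFEST_NAMES = ("manifest.json", "manifest.yaml", "manifest.yml")
ENTITY_STEMS = ("individuals", "samples", "files", "families",
                "cohorts", "clinical_analysis")
ENTITY_EXTENSIONS = (".json", ".yaml", ".yml", ".txt")


def inspect_template_dir(directory):
    """Return (manifest_path_or_None, {entity_name: path}) for a folder.

    Hypothetical helper: it only looks at file names and does not
    validate the content of any file.
    """
    root = Path(directory)
    # The manifest is the only required file; pick the first name that exists.
    manifest = next(
        (root / name for name in MANIFEST_NAMES if (root / name).is_file()),
        None,
    )
    entities = {}
    for path in sorted(root.iterdir()):
        if (path.is_file()
                and path.stem in ENTITY_STEMS
                and path.suffix in ENTITY_EXTENSIONS):
            entities[path.stem] = path
    return manifest, entities
```

A call such as inspect_template_dir("my_template") returns None as the first element when no manifest is found, which is the one case where the template cannot be used at all; entity files are optional and simply appear in the returned dictionary when present.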
JSON/YAML Files
You can provide a single JSON or YAML file per entity. With JSON, write one JSON object per line; with YAML, concatenate the documents, separating them with '---'.
The following entities are supported.
For Individual:
individuals.{json|yaml}
For Sample:
samples.{json|yaml}
For File:
files.{json|yaml}
For Family:
families.{json|yaml}
For Cohort:
cohorts.{json|yaml}
For Clinical Analysis:
clinical_analysis.{json|yaml}
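The "one JSON per line" and "separated by '---'" conventions can be sketched as follows, using only the Python standard library. The helper names are hypothetical, and the id and name fields in the sample records are illustrative assumptions; the fields actually accepted for each entity are listed in the Data Models documentation.

```python
import json


def to_json_lines(records):
    """Serialize each record as one compact JSON object per line."""
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


def from_json_lines(text):
    """Parse text holding one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def join_yaml_documents(documents):
    """Concatenate already-serialized YAML documents with '---' separators."""
    return "\n---\n".join(doc.strip("\n") for doc in documents) + "\n"


# Hypothetical records; field names are assumptions for illustration only.
individuals = [
    {"id": "individual_1", "name": "Individual 1"},
    {"id": "individual_2", "name": "Individual 2"},
]

individuals_json = to_json_lines(individuals)   # content for individuals.json
individuals_yaml = join_yaml_documents(         # content for individuals.yaml
    ["id: individual_1\nname: Individual 1\n",
     "id: individual_2\nname: Individual 2\n"]
)
```

Writing the JSON file this way keeps every entity on its own line, so a reader can process the file record by record without parsing it as a single array.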
TAB Text Files
You can load data for the entities using TAB-separated .txt files. A few simple construction rules must be followed for the ingestion to succeed:
The first line starts with the # symbol and contains the exact name of the corresponding data model.
The column names must correspond to the fields defined in the entity data model. Refer to each entity's data model documentation to check the accepted fields.
The order of the columns is not relevant.
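The rules above can be sketched in a few lines of Python. This sketch reads the first rule as a single header line that starts with # and lists the column names, which is an assumption; check the loader's expectations before relying on it. The helper names and the id and name columns are hypothetical.

```python
def to_tab_text(columns, rows):
    """Build TAB-separated text with a '#'-prefixed header line.

    Assumption: the header is '#' followed by the TAB-separated column
    names. Missing values are written as empty cells.
    """
    lines = ["#" + "\t".join(columns)]
    for row in rows:
        values = [str(row.get(column, "")) for column in columns]
        if any("\t" in value or "\n" in value for value in values):
            raise ValueError("values must not contain TAB or newline characters")
        lines.append("\t".join(values))
    return "\n".join(lines) + "\n"


def parse_tab_text(text):
    """Parse text produced by to_tab_text into a list of dictionaries."""
    header, *body = text.splitlines()
    if not header.startswith("#"):
        raise ValueError("first line must start with '#'")
    columns = header[1:].split("\t")
    return [dict(zip(columns, line.split("\t"))) for line in body if line]


# Hypothetical sample rows; column names are assumptions for illustration.
samples = [
    {"id": "sample_1", "name": "Sample 1"},
    {"id": "sample_2", "name": "Sample 2"},
]
samples_txt = to_tab_text(["id", "name"], samples)
```

Because each value is matched to its column through the header, parse_tab_text(to_tab_text(["id", "name"], samples)) and parse_tab_text(to_tab_text(["name", "id"], samples)) yield the same records, which mirrors the rule that column order is not relevant.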