# Roadmap

This section lists only the main top-level features planned for major releases. For a more detailed list, see GitHub Issues at <https://github.com/opencb/opencga/issues>.

## OpenCGA 2.x Releases

{% hint style="info" %}
From OpenCGA version 2.0.0 we follow **time-based releases**: two minor releases a year will be scheduled, in April and October.
{% endhint %}

### 2.1.0 (Apr 2021)

You can track GitHub issues at [GitHub Issues 2.1.0](https://github.com/opencb/opencga/issues?q=is%3Aopen+is%3Aissue+milestone%3Av2.1.0). You can follow the development at [GitHub Projects](https://github.com/opencb/opencga/projects).

#### General

* Implement a **Centralised Log** analytics solution; we are planning to use Kibana

#### Catalog

* Implement a new **Action** system: Catalog will publish notifications to a message queue *(RabbitMQ, Apache Kafka)*, which will allow other applications to know what is going on
* Improve **RESTful** web services by adding standardised **error codes** to the response, which will make debugging easier

#### Variant Storage Engine

* Extend the consequence type and population frequency filters in the sample genotype index
* Improve the sample genotype index for clinical and cancer analysis by filtering by COSMIC or VAF
* Allow indexing of custom INFO or FORMAT fields
* Implement a new **Cache** functionality: some sample and family-based variant queries and analyses can take up to a few seconds, and since this data is read-only the results could easily be cached
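
The caching idea in the last item can be illustrated with a minimal sketch. This is not OpenCGA code: `run_sample_query` and `cached_sample_query` are hypothetical names, and the sketch only assumes that results for identical query arguments never change because the underlying data is read-only.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) expensive query really runs.
CALLS = {"count": 0}


def run_sample_query(sample_id, region):
    """Hypothetical stand-in for a slow, read-only variant query."""
    CALLS["count"] += 1
    return (sample_id, region)


@lru_cache(maxsize=1024)
def cached_sample_query(sample_id, region):
    """Memoise results by query arguments; safe only because the data is read-only."""
    return run_sample_query(sample_id, region)


first = cached_sample_query("NA12878", "1:1000-2000")
second = cached_sample_query("NA12878", "1:1000-2000")  # served from the cache
```

In this sketch the second call never reaches `run_sample_query`. A real cache would also need an invalidation strategy for when new data is loaded.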

#### Clinical

#### Analysis Framework

#### Others

* Implement the **FHIR Genomics** API, which will allow FHIR applications to query genomic variants in OpenCGA

### 2.0.0 (Oct 2020)

You can track GitHub issues at [GitHub Issues 2.0.0](https://github.com/opencb/opencga/issues?q=is%3Aopen+is%3Aissue+milestone%3Av2.0.0). You can follow the development at [GitHub Projects](https://github.com/opencb/opencga/projects).

#### General

* Improve **Docker** images; stable versions for the different variant storage engines are now pushed to Docker Hub
* Upgrade **dependencies**: MongoDB 4.2, Solr 8.1.1, JUnit 5.5.1, ...
* **Clean up** and **remove** deprecated code and APIs

#### Catalog

* Add **ACID Transactions** to all database operations
* Improve **Audit**: extend the audit data model and ensure all actions are now audited. Also, make audit *queryable*.
* Implement a new **Task** system, which will be used internally by OpenCGA to schedule some jobs; this new functionality can also be used by external applications
* Improve **RESTful** web services response and **warning/error** notifications
* Prepare OpenCGA for supporting **Federation** in next releases
* Improve **performance** and **test coverage**

#### Storage Engines

**Alignment**

* Support CRAM files

**Variant**

* Implement **structural variant imprecise** queries
* Implement new **Variant Score** to store results from analysis such as GWAS, this can be used when filtering
* Remove any **blocking variant operation**, any variant operation should be able to run at any time in a consistent way
* Improve **HBase sample index**, this will improve the **performance** of some **queries and** **analysis**
* Implement HBase-based **aggregations**
* Support new **HBase 2.0** version
* Improve **testing** and **benchmark** module

#### Analysis

**Framework**

* Develop an **Analysis Framework**, this will allow users to extend and customise OpenCGA with their own analysis
* Implement a **WrappedAnalysis** functionality in this framework to make it easy to use any external tool such as Plink (see the *Variant* analysis section below)

**Variant**

* Implement on-demand **Variant Stats** and **Variant Sample Stats**
* Add GWAS **variant analysis**, this can optionally be stored and indexed in the new **Variant Score** object
* Add *Plink* as **wrapped analysis**

**Clinical Interpretation**

* Implement **Cancer Tiering** interpretation analysis algorithm
* Network-based clinical interpretation algorithm *(experimental)*
* Implement **Secondary Findings** analysis

#### Clinical

* Network-based clinical interpretation algorithm *(experimental)*

#### Cloud

* Full support for **Microsoft Azure and HDInsight 4.0**; this also includes **Azure AD, Azure Blob** and **Azure Batch**. We would like to **thank Microsoft Azure very much** for their amazing support and help here.
* Add **Kubernetes** for deployment and orchestration

**Note**: some of these features might be released in the Enterprise version coming soon

## OpenCGA 1.x Releases

### 1.4.0 (March 2019)

#### General

* Implement the new **HTSGET 1.0** protocol
* **IVA 0.9.0** will implement a full study and clinical analysis among many other features
* Add many more negative and variant **functional tests**
* **Documentation** improvements with new diagrams and tutorials

#### Catalog

* Complete and test all **delete** operations and implement *delete by query* to make it easier to delete batches of resources; with this, the **REST API** can be considered complete
* Implement a new **admin** REST API, which will allow OpenCGA administrators to execute administrative tasks remotely
* New **PermissionRule** feature: you can define rules for assigning permissions automatically when new data is created, e.g. *set VIEW permission to USER to all samples where HOSPITAL = 'X'*
* New implementation of how **clinical data** (*annotation sets*) are stored in the database; this new physical schema significantly improves querying annotations (even with nested objects or arrays), *group by* aggregations and *include/exclude* filtering, and allows annotations to be *flattened*
* Complete the ***ClinicalAnalysis*** and ***ClinicalInterpretation*** data models and functionality
* Add **DiseasePanel** entity to manage panels
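
The **PermissionRule** example above (*set VIEW permission to USER to all samples where HOSPITAL = 'X'*) can be pictured with a small sketch. The class and field names below are hypothetical and chosen only for illustration; they do not reflect the actual OpenCGA data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionRule:
    # Hypothetical shape: grant `permission` to `member` when every
    # key/value pair in `query` matches the new entry's annotations.
    query: dict
    member: str
    permission: str


@dataclass
class Sample:
    id: str
    annotations: dict
    acl: dict = field(default_factory=dict)  # member -> set of permissions


def apply_rules(sample, rules):
    """Assign permissions to a newly created sample according to the rules."""
    for rule in rules:
        if all(sample.annotations.get(k) == v for k, v in rule.query.items()):
            sample.acl.setdefault(rule.member, set()).add(rule.permission)
    return sample


rules = [PermissionRule(query={"HOSPITAL": "X"}, member="USER", permission="VIEW")]
matched = apply_rules(Sample("s1", {"HOSPITAL": "X"}), rules)
unmatched = apply_rules(Sample("s2", {"HOSPITAL": "Y"}), rules)
```

Here only `s1` receives the VIEW permission, because its annotations satisfy the rule's query.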

#### Variant Storage

* Final **HBase variant storage** implementation. The new architecture should scale to a few million genomes and billions of variants.
* Support the last pending structural variant: **Translocation**. With this, all structural variants are properly represented and stored
* Improve **variant stats** and add **simple variant analysis** such as association or Hardy-Weinberg tests; the results will be stored and indexed in the new ***VariantScore*** object
* Add INDEL **left-alignment** normalisation to *VariantNormaliser*
* **Variant Benchmark suite** to study scalability and performance
* Add a native implementation of Genomics England Tiering analysis

### 1.3.0 (November 2017)

#### General

* CLI **autocompletion** implemented
* New single CLI to execute **migrations** automatically
* New and fully functional **R client library** for REST web services; with this, all four client libraries are complete
* New **IVA 0.9.0** is being developed in coordination to exploit all the new features; they will be released together
* Many more **functional tests** added to test all new functionality described below
* Review and improve **Swagger** documentation and descriptions
* **Documentation** improvements with new diagrams and tutorials

#### Catalog

* New ***Family*** data model finished, now it is production ready, this completes and integrates three related data models: *Sample, Individual* and *Family*
* New ***Versioning*** feature implemented for *Sample, Individual* and *Family*. Now you can track any change in those data models, and users can query or review any *version* of those documents
* New ***Export*** functionality implemented; this allows a *Project* to be exported as it was at any specific release, and it can then be imported into a new OpenCGA server
* New Study administrative group called ***admins***, all users in this group will be granted some special permissions at Study level such as *create groups* or *share* data, this will make Study administration much easier
* New ***Confidential*** permission for Variable Sets, now you can make some clinical data private for some users
* New ***ClinicalAnalysis*** data model added; this allows different clinical interpretation analyses to be defined and stored. It is still experimental and should not be used in production
* Improvements in ***Group By*** queries, now you can pass a ***count*** parameter and aggregations only use data you can view, this can be useful for summarising data. Also, this has been added to *Individual* and *Family*
* Ensure that all query **GET** REST web services accept a **comma-separated list of IDs**; at the moment only a few of them accept ID lists. This will reduce the number of REST calls needed, improving performance
* New REST web service to **execute remote scripts** for Catalog, for instance "*move samples from Study*"
* **Performance improvements** when checking permissions (ACL) in *create* and *update* methods; on average, 50% fewer database queries are now needed
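
To illustrate why accepting a comma-separated list of IDs reduces the number of REST calls, here is a minimal client-side sketch. The helper name `chunk_ids`, the batch size, and the URL template are assumptions made for illustration; they are not part of the OpenCGA REST API.

```python
def chunk_ids(ids, batch_size):
    """Group IDs into comma-separated strings, one string per REST call."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [",".join(ids[i:i + batch_size]) for i in range(0, len(ids), batch_size)]


sample_ids = ["s1", "s2", "s3", "s4", "s5"]
batches = chunk_ids(sample_ids, batch_size=2)

# Hypothetical URL template, for illustration only.
urls = ["/hypothetical/samples/{}/info".format(batch) for batch in batches]
```

With five IDs and a batch size of two, the sketch issues three requests instead of five.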

#### Variant Storage

* Improve support for **Structural Variants**, in this release we will fully support *Insertion, Deletion* and *Copy Number* variants
* New ***VariantMetadata*** implemented, this is *exported* together with the variant data to be further analysed with other OpenCB projects using Spark
* New ***VariantScore*** object added to the Variant data model; in the next release this will allow variant scores from cohort-related analyses, such as association or Hardy-Weinberg tests, to be stored
* Implement some **HBase** physical schema improvements and a better integration with Solr
* Support ***Amazon EMR*** Hadoop cluster
* **Performance improvements** when querying variants from samples; this will have a big impact on clinical interpretation analysis

#### Alignment Storage

* Major improvements in the **BAM query** engine. New **server-side** filters have been added; this is more efficient because less data is sent over the network. The available filters are now: *region*, *minMapQ*, *maxNumberMismatches*, *maxNumberHits*, *properlyPaired*, *maxInsertSize*, *unmmapped* and *duplicated*.
* New **coverage** calculator using **BigWig**. Coverage is now calculated and stored in BigWig format, and the *windowSize* is configurable. Coverage can also be queried for a *region* and, optionally, a *windowSize*; the server will **aggregate and compute the average** over windows of that size.
* New **REST** and **gRPC** APIs implementing the new query filters and coverage functionality. When using **REST**, a JSON string is returned using the GA4GH data model. When **gRPC** is used, a binary stream is obtained. Note that in both protocols the filters are applied on the server.
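
The windowed coverage aggregation described above can be sketched as follows. This is only an illustration of averaging per-base coverage in fixed-size windows; the function name `window_average` is hypothetical, and the actual server-side implementation (including how partial windows are handled) may differ.

```python
def window_average(coverage, window_size):
    """Average per-base coverage values in consecutive windows of `window_size`."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    averages = []
    for start in range(0, len(coverage), window_size):
        window = coverage[start:start + window_size]
        # The last window may be shorter; it is averaged over its actual length.
        averages.append(sum(window) / len(window))
    return averages


per_base = [10, 12, 14, 20, 22, 24, 30]
averaged = window_average(per_base, window_size=3)
```

For the seven values above and a window size of 3, the result contains three averages, the last one computed from a single base.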

## Unscheduled features

The following features have been accepted but no release version has been assigned:

* Add tests for the CLI
* Support Slurm
* Add **Reactive Programming** (RxJava) and **Events**; this will allow OpenCGA to be easily integrated into other custom Java-based applications
* New **Gene Expression** database, this will include a Gene Annotation based on CellBase

You can find detailed information for some of them at <https://github.com/opencb/opencga/milestone/10>

