> For the complete documentation index, see [llms.txt](https://docs.opencga.opencb.org/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.opencga.opencb.org/develop-2/about/roadmap.md).

# Roadmap

In this section, you can find only the main top-level features planned for major releases. For a more detailed list, you can go to GitHub Issues at <https://github.com/opencb/opencga/issues>.

## OpenCGA 2.x Releases

{% hint style="info" %}
From OpenCGA version 2.0.0 we follow **time-based releases**, two minor releases a year will be scheduled in April and October.
{% endhint %}

### 2.1.0 (Apr 2021)

You can track GitHub issues at [GitHub Issues 2.1.0](https://github.com/opencb/opencga/issues?q=is%3Aopen+is%3Aissue+milestone%3Av2.1.0). You can follow the development at [GitHub Projects](https://github.com/opencb/opencga/projects).

#### General

* Implement a **Centralised Log** analytic solution, we are planning to use Kibana *\*\**

#### Catalog

* Implement a new **Action** system, Catalog will notify to a message queue *(RabbitMQ, Apache Kafka),* this will allow other applications to know what's going on
* Improve **RESTful** web services by adding standardise **error codes** to the response, this will improve debugging

**Variant Storage Engine**

* Extend consequence type and population frequency filter in the sample genotype index
* Improve sample genotype index for clinical and cancer by filtering by cosmic or VAF
* Allow the index of custom INFO or FORMAT fields&#x20;
* Implement a new **Cache** functionality, some sample and family-based variant queries and analysis can take up to few seconds, since this data is read-only this could be easily cached

#### Clinical

#### Analysis Framework

#### Others

* Implement **FIHR Genomics** API, this will allow FIHR applications to query genomic variants in OpenCGA

### 2.0.0 (Oct 2020)

You can track GitHub issues at [GitHub Issues 2.0.0](https://github.com/opencb/opencga/issues?q=is%3Aopen+is%3Aissue+milestone%3Av2.0.0). You can follow the development at [GitHub Projects](https://github.com/opencb/opencga/projects).

#### General

* Improve **Docker** images, now stable versions with the different variant storage are pushed to Docker Hub
* Upgrade **dependencies**: MongoDB 4.2, Solr 8.1.1, JUnit 5.5.1, ...
* **Clean ups** and **remove** deprecated code and APIs

#### Catalog

* Add **ACID Transactions** to all database operations
* Improve **Audit**, extend audit data model and ensure all actions are now audited. Also, make audit *queryable*.
* Implement a new **Task** system, this will be used internally by OpenCGA to schedule some jobs, this new functionality can be also used by external applications
* Improve **RESTful** web services response and **warning/error** notifications
* Prepare OpenCGA for supporting **Federation** in next releases
* Improve **performance** and **test coverage**

#### Storage Engines

**Alignment**

* Support CRAM file

**Variant**

* Implement **structural variant imprecise** queries
* Implement new **Variant Score** to store results from analysis such as GWAS, this can be used when filtering
* Remove any **blocking variant operation**, any variant operation should be able to run at any time in a consistent way
* Improve **HBase sample index**, this will improve the **performance** of some **queries and** **analysis**
* Implement HBase-based **aggregations**
* Support new **HBase 2.0** version
* Improve **testing** and **benchmark** module

#### Analysis

**Framework**

* Develop an **Analysis Framework**, this will allow users to extend and customise OpenCGA with their own analysis
* Implement a **WrappedAnalysis** functionality in this framework to make easy to use any external tool such as Plink (see below in *Varlant Analysis* section)

**Variant**

* Implement on-demand **Variant Stats** and **Variant Sample Stats**
* Add GWAS **variant analysis**, this can optionally be stored and indexed in the new **Variant Score** object
* Add *Plink* as **wrapped analysis**

**Clinical Interpretation**

* Implement **Cancer Tiering** interpretation analysis algorithm
* Network-based clinical interpretation algorithm *(experimental)*
* Implement **Secondary Findings** analysis

#### Clinical

* Network-based clinical interpretation algorithm *(experimental)*

#### Cloud

* Full support for **Microsoft Azure and HDInsight 4.0,** this also includes **Azure AD, Azure Blob** and **Azure Batch**. We would like to **thank very much Microsoft Azure** for their amazing support and help here.
* Add **Kubernetes** for deployment and orchestration

**Note**: some of these features might be released in the Enterprise version coming soon

## OpenCGA 1.x Releases

### 1.4.0 (March 2019)

#### General

* Implement the new **HTSGET 1.0** protocol
* **IVA 0.9.0** will implement a full study and clinical analysis among many other features
* Add many more negative and variant **functional tests**
* **Documentation** improvements with new diagrams and tutorials

#### Catalog

* Complete and test all **delete** operations and implement *delete by queries* to make easier to delete batches of resources, with this the **REST API** can be considered complete
* Implement a new **admin** REST API, this will allow OpenCGA administrator to execute administrative tasks remotely
* New **PermissionRule** feature, you can define rules for assigning permissions automatically when new data is created, e.g. *set VIEW permission to USER to all samples where HOSPITAL = 'X'*
* New implementation of how **clinical data** (*annotation sets*) are store in the database, this new physical schema significantly improves querying annotations (even with nested objects or arrays), *group by* aggregations, *include/exclude* filtering and allow to *flatten* the annotations&#x20;
* Complete ***ClinicalAnalysis and*** ***ClinicalInterpretation*** data models and functionality
* Add **DiseasePanel** entity to manage panels

#### Variant Storage

* Final **HBase variant storage** implementation. New architecture should scale to few million of genomes and billion of variants.
* Support the last pending structural variant: **Translocation**. With this all structural variants are properly represented and stored
* Improve **variant stats** and add **simple variant analysis** such as association or Hardy-Weinberg test, this will be stored and indexed in the new ***VariantScore*** object
* Add INDEL **left-alignment** normalisation to *VariantNormaliser*
* **Variant Benchmark suite** to study scalability and performance
* Add a native implementation of Genomics England Tiering analysis

### 1.3.0 (November 2017)

#### General

* CLI **autocompletion** implemented
* New single CLI for execute **migrations** automatically
* New and fully functional **R client library** for REST web services, with this the four client libraries are completed
* New **IVA 0.9.0** is developed coordinately to exploit all the new features, they will be released together
* Many more **functional tests** added to test all new functionality described below
* Review and improve **Swagger** documentation and descriptions
* **Documentation** improvements with new diagrams and tutorials

#### Catalog

* New ***Family*** data model finished, now it is production ready, this completes and integrates three related data models: *Sample, Individual* and *Family*
* New ***Versioning*** feature implemented for *Sample, Individual* and *Family*. Now you can track any change in those data models, users can query o review any *version* of those documents
* New ***Export*** functionality implemented, this allows to export a *Project* as it was at any specific release, this can then imported in a new OpenCGA server
* New Study administrative group called ***admins***, all users in this group will be granted some special permissions at Study level such as *create groups* or *share* data, this will make Study administration much easier
* New ***Confidential*** permission for Variable Sets, now you can make some clinical data private for some users
* New ***ClinicalAnalysis*** data model added, this allows to define and stored different clinical interpretation analysis, this is still experimental and it should not be used in production
* Improvements in ***Group By*** queries, now you can pass a ***count*** parameter and aggregations only use data you can view, this can be useful for summarising data. Also, this has been added to *Individual* and *Family*
* Ensure that all query **GET** REST web services accept **comma-separated list of IDs**, at the moment only few of them accept ID lists, this will reduce the number of REST calls needed improving the performance
* New REST web service to **execute remote scripts** for Catalog, for instance "*move samples from Study*"
* **Performance improvements** when checking permissions (ACL) in *create* and *update* methods, now on average 50% less database queries are needed

#### Variant Storage

* Improve support for **Structural Variants**, in this release we will fully support *Insertion, Deletion* and *Copy Number* variants
* New ***VariantMetadata*** implemented, this is *exported* together with the variant data to be further analysed with other OpenCB projects using Spark
* New ***VariantScore*** object added to Variant data model, this will allow to store variant scores from cohort-related analysis such as association or Hardy-Weinberg tests in the next release
* Implement some **HBase** physical schema improvements and a better integration with Solr
* Support ***Amazon EMR*** Hadoop cluster
* **Performance improvements** when querying variants from samples, this will have a big impact in clinical interpretation analysis

#### Alignment Storage

* Major improvements in **BAM query** engine. New **server-side** filters added, this is a more efficient implementation since the data sent through the network is reduced. The available filters now are: *region, minMapQ, maxNumberMismatches, maxNumberHits,  properlyPaired,  maxInsertSize,  unmmapped* and *duplicated.*
* New **coverage** calculator using **BigWig**. Now coverage is calculated and stored in BigWig format, the *windowSize* is configurable. Also, coverage can now be queried for a *region* and optionally a *windowSize,* the server will **aggregate and compute the average** in *windowSizes.*
* New **REST** and **gRPC** APIs implementing the new query filters and coverage functionality. When using **REST** a JSON string is returned using GA4GH data model. When **gRPC** is used a binary stream is obtained. Note that in both protocols the filters are applied in the server.

## Unscheduled features

The following features have been accepted but no release version has been assigned:

* Add test for the CLI
* Support Slurm
* Add **Reactive Programming** (RxJava) and **Events**, this will allow to be easily integrated into other custom Java-based applications
* New **Gene Expression** database, this will include a Gene Annotation based on CellBase

You can find detailed information for some of them at <https://github.com/opencb/opencga/milestone/10>