Update roadmap.md #6475

Merged: 3 commits, Jan 6, 2025
Changes from 1 commit
39 changes: 8 additions & 31 deletions docs/roadmap.md
@@ -9,39 +9,16 @@ This document highlights some ideas for major features we'd like to implement in
To get a more complete overview of planned features and current work, see the [issue tracker](https://github.com/cortexproject/cortex/issues).
Note that these are not ordered by priority.

## Helm charts and other packaging
## CNCF Graduation Status

We have a [helm chart](https://github.com/cortexproject/cortex-helm-chart) but it needs work before it can be effectively utilised by different backends. We also don't provide an official set of dashboards and alerts to our users yet. This is one of the most requested features and something we will tackle in the immediate future. We also plan on publishing debs, rpms along with guides on how to run Cortex on bare-metal.
Cortex was accepted to the CNCF on September 20, 2018 and moved to the Incubating maturity level on August 20, 2020. The Cortex maintainers are working towards promoting the project to the graduation status. See [issue #6075](https://github.com/cortexproject/cortex/issues/6075) for tracking this progress.

## Auth Gateway
## Support for Prometheus Remote Write 2.0
Contributor

Should we mention that this is a short-term roadmap, IIUC?
This sounds like something we can support soon, as there is already a PR for it.

Contributor

We should also have roadmap items for the longer term.

Member Author

I can add a new section for longer-term roadmap items (2+ months). I can also add a date at the top showing when the document was last updated, so the reader has an anchor for the expected completion dates.


Cortex server has a simple authentication mechanism (X-Scope-OrgId) but users can't use the multitenancy features out of the box without complicated proxy configuration. It's hard to support all the different authentication mechanisms used by different companies but plan to have a simple but opinionated auth-gateway that provides value out of the box. The configuration could be as simple as:
[Prometheus Remote Write 2.0](https://prometheus.io/docs/specs/remote_write_spec_2_0/) adds

```
tenants:
- name: infra-team
password: basic-auth-password
- name: api-team
password: basic-auth-password2
```
* a new Protobuf Message with new features enabling more use cases and wider adoption on top of performance and cost savings
* deprecates the previous Protobuf Message from a 1.0 Remote-Write specification
* mandatory X-Prometheus-Remote-Write-*-Written HTTP response headers for reliability purposes

## Billing and Usage analytics

We have all the metrics to track how many series, samples and queries each tenant is sending but don't have dashboards that help with this. We plan to have dashboards and UIs that will help operators monitor and control each tenants usage out of the box.
Member

I created cortexproject/cortex-jsonnet#64 because this would be nice to have, but it doesn't need to be part of the roadmap.


## Downsampling
Downsampling means storing fewer samples, e.g. one per minute instead of one every 15 seconds.
This makes queries over long periods more efficient. It can reduce storage space slightly if the full-detail data is discarded.
Comment on lines -32 to -34
Member

We have #4322, but it doesn't need to be part of the roadmap. I don't see this as a priority.

Contributor

I think downsampling support is a nice feature for our long-term roadmap.

Member Author

I can add this back in the longer-term roadmap so that it still gets attention in the future.
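
For readers skimming this thread, the downsampling idea quoted above (one sample per minute instead of one every 15 seconds) can be illustrated with a minimal sketch. This is not Cortex code: the `downsample` function, its `step_seconds` parameter, and the "keep the last sample per window" policy are all assumptions made for illustration. A real implementation would more likely store aggregates per window than a single sample.

```python
def downsample(samples, step_seconds=60):
    """Keep only the last sample seen in each step-sized time window.

    `samples` is an iterable of (timestamp_seconds, value) pairs.
    Hypothetical helper for illustration; not part of Cortex.
    """
    buckets = {}
    for ts, value in samples:
        # Align the timestamp to the start of its window.
        bucket_start = ts - (ts % step_seconds)
        current = buckets.get(bucket_start)
        # Keep the sample with the latest timestamp in this window.
        if current is None or ts >= current[0]:
            buckets[bucket_start] = (ts, value)
    return [buckets[start] for start in sorted(buckets)]


# 12 raw samples: one every 15 seconds over 3 minutes.
raw = [(t, float(t)) for t in range(0, 180, 15)]
reduced = downsample(raw)
```

With these inputs, `reduced` holds one sample per minute (three in total), which is why queries over long ranges touch less data.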


## Per-metric retention

Cortex blocks storage supports deleting all data for a tenant after a time period (e.g. 3 months, 1 year), but we would also like to have custom retention for subsets of metrics (e.g. delete server metrics but retain business metrics).
Member

This can be removed. Today you can already send different metrics to different tenants, each with its own retention.


## Exemplar support
[Exemplars](https://docs.google.com/document/d/1ymZlc9yuTj8GvZyKz1r3KDRrhaOjZ1W1qZVW_5Gj7gA/edit)
let you link metric samples to other data, such as distributed tracing.
As of early 2021 Prometheus will collect exemplars and send them via remote write, but Cortex needs to be extended to handle them.
Member

This can be removed; exemplar support is already there.

Member

I think there is an effort in Prometheus to create storage for exemplars, but I agree, it is still too early to put this on the roadmap.


## Scalability

Scalability has always been a focus for the project, but there is a lot more work to be done. We can now scale to 100s of Millions of active series but 1 Billion active series is still an unknown.
Member

1 billion active series in a single tenant is doable today.
There are of course other scalability improvements we could still implement, but I don't think we should put them on the roadmap; they are not that critical.

For more information tracking this, please see [issue #6116](https://github.com/cortexproject/cortex/issues/6116).