Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic endpoints #967

Closed
wants to merge 44 commits into from
Closed

Conversation

TomBebb
Copy link

@TomBebb TomBebb commented Oct 21, 2024

Fixes #370

Description

Adds support for dynamic queries by periodically re-fetching the query given.

Associates each dynamic query with a UUID.
Uses a static ref map to associate the UUIDs with the latest result.


This PR has:

  • [ y] been tested to ensure log ingestion and log query works.
  • [y ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [ y] added documentation for new or modified features or behaviors.

/claim #370

Copy link
Contributor

github-actions bot commented Oct 21, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@TomBebb
Copy link
Author

TomBebb commented Oct 21, 2024

I have read the CLA Document and I hereby sign the CLA

nitisht added a commit to parseablehq/.github that referenced this pull request Oct 21, 2024
@nitisht
Copy link
Member

nitisht commented Oct 22, 2024

Can you look at the CI failures @TomBebb ?

@@ -106,6 +106,7 @@ path-clean = "1.0.1"
prost = "0.13.3"
prometheus-parse = "0.2.5"
sha2 = "0.10.8"
uuid = { version = "1.11.0", features = ["v4", "fast-rng"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use ulid instead - already being used in other places

@TomBebb
Copy link
Author

TomBebb commented Oct 22, 2024

Done

@nikhilsinhaparseable
Copy link
Contributor

@TomBebb we need to add a few validations and update a few things in this PR -

  1. max cache duration should be 60 mins
  2. max number of unique url server stores should be 10
  3. results should be cached in disk not in memory - for this, you can expose an env var where user can provide a directory path and you can use this path to store the results in parquet with the file name uuid.parquet
  4. server should refresh the data every minute , delete the old parquet and write the new parquet in the disk so that at any point when client does a GET /uuid call, server has the preprocessed data and can return the data from the disk.

do let me know if you need further clarification to the points mentioned here

Thanks!

@TomBebb
Copy link
Author

TomBebb commented Oct 23, 2024

@nikhilsinhaparseable Cool, working on that now

@TomBebb
Copy link
Author

TomBebb commented Oct 24, 2024

@TomBebb we need to add a few validations and update a few things in this PR -

1. max cache duration should be 60 mins

2. max number of unique url server stores should be 10

3. results should be cached in disk not in memory - for this, you can expose an env var where user can provide a directory path and you can use this path to store the results in parquet with the file name uuid.parquet

4. server should refresh the data every minute , delete the old parquet and write the new parquet in the disk so that at any point when client does a GET /uuid call, server has the preprocessed data and can return the data from the disk.

do let me know if you need further clarification to the points mentioned here

Thanks!

What should the difference be between hitting the cache duration and waiting a minute be?

@TomBebb
Copy link
Author

TomBebb commented Oct 25, 2024

Implemented using advised notes, please can someone re-review?

@nikhilsinhaparseable
Copy link
Contributor

@TomBebb below are the review comments -

  1. the env DYNAMIC_QUERY_RESULTS_CACHE_PATH_ENV should be optional, if user provides, dynamic query endpoints work else should give error as env not set
  2. return the uuid as response first and separate thread processes the query and write parquet to disk, handler should not wait for query to be processed before returning the uuid
  3. query fails when I use aggregate query like Select count(*) from app2000 order by p_timestamp (please test with other aggregate queries as well)
thread 'actix-rt|system:0|arbiter:1' panicked at server/src/dynamic_query.rs:114:69:
called `Result::unwrap()` on an `Err` value: Datafusion(SchemaError(FieldNotFound { field: Column { relation: Some(Bare { table: "app2000" }), name: "p_timestamp" }, valid_fields: [Column { relation: None, name: "count(*)" }] }, Some("")))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  1. BTreeMap<Ulid, DynamicQuery> - store this in disk in root path of DYNAMIC_QUERY_RESULTS_CACHE_PATH_ENV so that you can load to memory at server start

I will update if I find anything else.

@TomBebb
Copy link
Author

TomBebb commented Oct 28, 2024

@nikhilsinhaparseable I cannot reproduce the aggregate query issue on my commit before master merge, but can in commits since and on master.
image

@TomBebb
Copy link
Author

TomBebb commented Oct 30, 2024

@nikhilsinhaparseable There is a datafusion byte serialization crate datafusion_proto but it only supports expression conversion to / from bytes for now. I can work around that or just shove the raw text query in the parquet file.

@nikhilsinhaparseable
Copy link
Contributor

@TomBebb please update if the PR is ready for review

@nitisht nitisht closed this Dec 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature: dynamic endpoints based on pre-defined sql queries
3 participants