[FEATURE] Enhance memory validation #1380

Open
amitgalitz opened this issue Dec 4, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@amitgalitz
Member

Is your feature request related to a problem?

We should enhance the Validate API for the OpenSearch Anomaly Detection plugin to provide users with better insights into the potential model size and memory requirements before creating a detector. This feature will help users make informed decisions about their detector configuration and resource allocation.

Current Situation:

  • Today, most of the model size estimation logic runs at detector creation, when we run estimation across the entire cluster to see whether there is enough space.
  • Users don't have a clear understanding of potential memory requirements before committing to a configuration.
  • There's no automatic estimation of cluster-wide resource needs based on the number of unique entities.

What solution would you like?

Option 1: Full Validation

  • Add the model size estimation logic that currently runs during detector creation to the validation flow, so we can check whether there is enough memory before the detector is created.
  • Implement a method that, for high-cardinality (HC) detectors, estimates the total memory required from the estimated number of entities in the data, and tells the user whether the cluster has enough memory and how much memory is needed (a rough sketch of this check follows the list).
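
As a rough illustration, the Option 1 check could boil down to arithmetic like the following. This is a minimal sketch, not the plugin's MemoryTracker API: the per-entity model size, entity count, and per-node memory figures are assumptions, and the method names are hypothetical.

```java
// Sketch only: hypothetical names and numbers, not the real MemoryTracker API.
public final class HcMemoryCheckSketch {

    /**
     * Rough cluster-wide check for an HC detector: total estimated model memory
     * versus the memory each data node can dedicate to AD models.
     */
    static boolean hasEnoughClusterMemory(
            long perEntityModelBytes, long estimatedEntities, long[] adMemoryBytesPerNode) {
        long requiredBytes = perEntityModelBytes * estimatedEntities;
        long availableBytes = 0;
        for (long nodeBytes : adMemoryBytesPerNode) {
            availableBytes += nodeBytes;
        }
        return requiredBytes <= availableBytes;
    }

    public static void main(String[] args) {
        // Assumed figures: ~200 KB per entity model, 50,000 entities,
        // three data nodes with ~4 GB of AD-dedicated memory each.
        long perModel = 200L * 1024;
        long entities = 50_000;
        long[] nodes = {4L << 30, 4L << 30, 4L << 30};
        System.out.printf("required=%d bytes, enough=%b%n",
                perModel * entities, hasEnoughClusterMemory(perModel, entities, nodes));
    }
}
```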

Option 2: Estimates Only

  • Skip the full validation in MemoryTracker.
  • Provide users with estimated model size and total memory needed based on entity estimates.

Additional context:
Much of the memory estimation and validation is already in place today, so adding the full check to the validation API might be unnecessary. What might help users more, and be easier to implement, is returning estimates directly: the model size, the number of entities expected per detector interval, and the estimated total memory needed, so customers can better judge whether their cluster is large enough, or far larger than it needs to be.
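
If we go the estimates-only route, the response could simply carry those three numbers. A placeholder shape for that payload, assuming only model size, entities per interval, and total memory are needed (field names are illustrative, not a committed API):

```java
// Illustrative response shape for the estimates-only option; names are placeholders.
public record MemoryEstimate(
        long estimatedModelSizeBytes,       // estimated size of a single model
        long estimatedEntitiesPerInterval,  // entities expected per detector interval
        long estimatedTotalMemoryBytes) {   // model size * entities per interval

    static MemoryEstimate of(long modelBytes, long entities) {
        return new MemoryEstimate(modelBytes, entities, modelBytes * entities);
    }
}
```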

Implementation Details:

  • For a single-entity detector, the model may land on any node, not necessarily the one with the least memory, so we can check whether the node with the most memory can at least hold the model. This is a reasonable starting safety check: if even the node with the most memory cannot hold the model, we should at least give a warning (a sketch of this check follows the list).
  • I can use getPartitionedForestSizes, and since we won't have created the RCF model yet, I can pass the RCF parameters to estimateModelSize().
  • For HC detectors, I can reuse the logic we have today in MemoryTracker for the initial validation, but I can also run queries against any available historical data to get a better estimate of how many nodes, or how much total memory, we need in the cluster (the cardinality query below is one example).
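
A minimal sketch of that single-entity safety check, assuming we already have an estimated model size (e.g. from estimateModelSize()) and a per-node view of AD-dedicated memory; names and figures below are hypothetical:

```java
// Sketch of the single-entity safety check: the model may be assigned to any node,
// so at minimum the node with the most AD-dedicated memory must be able to hold it.
public final class SingleEntityCheckSketch {

    /** Returns true if at least one node can hold a model of the estimated size. */
    static boolean someNodeCanHoldModel(long estimatedModelBytes, long[] adMemoryBytesPerNode) {
        long maxNodeBytes = 0;
        for (long nodeBytes : adMemoryBytesPerNode) {
            maxNodeBytes = Math.max(maxNodeBytes, nodeBytes);
        }
        // If even the roomiest node cannot hold the model, the caller should emit a warning.
        return estimatedModelBytes <= maxNodeBytes;
    }

    public static void main(String[] args) {
        long modelBytes = 512L * 1024 * 1024;               // assumed 512 MB RCF model
        long[] nodes = {256L << 20, 1L << 30, 768L << 20};  // 256 MB, 1 GB, 768 MB of AD memory
        System.out.println("fits on at least one node: " + someNodeCanHoldModel(modelBytes, nodes));
    }
}
```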

For the HC case, a query like the one below estimates how many entities to expect per 10-minute interval, based on the last week of data.

Cardinality query example:

```json
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "start_time",
        "lte": "end_time"
      }
    }
  },
  "aggs": {
    "by_interval": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "detector_interval"
      },
      "aggs": {
        "dimension": {
          "cardinality": {
            "field": "entity_field"
          }
        },
        "multi_buckets_sort": {
          "bucket_sort": {
            "sort": [{"dimension": {"order": "desc"}}],
            "size": 1
          }
        }
      }
    }
  }
}
```
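
A rough sketch of how the result of that query could feed the estimate. The response parsing below is simplified to nested maps (the real plugin would go through the search response API), and the per-model size and per-node memory are assumptions:

```java
// Sketch: turn the max per-interval cardinality from the query above into a memory estimate.
// JSON handling is simplified; keys follow the aggregation names used in the query.
import java.util.List;
import java.util.Map;

public final class EntityEstimateSketch {

    /** Extracts the highest per-interval entity count from the parsed aggregation response. */
    @SuppressWarnings("unchecked")
    static long maxEntitiesPerInterval(Map<String, Object> aggs) {
        Map<String, Object> byInterval = (Map<String, Object>) aggs.get("by_interval");
        List<Map<String, Object>> buckets = (List<Map<String, Object>>) byInterval.get("buckets");
        // bucket_sort with size 1 keeps only the bucket with the highest cardinality.
        Map<String, Object> dimension = (Map<String, Object>) buckets.get(0).get("dimension");
        return ((Number) dimension.get("value")).longValue();
    }

    /** Ceiling of (entities * per-model bytes) / per-node AD memory = nodes needed. */
    static long nodesNeeded(long entities, long perEntityModelBytes, long adMemoryBytesPerNode) {
        long totalBytes = entities * perEntityModelBytes;
        return (totalBytes + adMemoryBytesPerNode - 1) / adMemoryBytesPerNode;
    }

    public static void main(String[] args) {
        // Hypothetical parsed response: the busiest 10-minute interval saw 40,000 distinct entities.
        Map<String, Object> aggs = Map.of(
                "by_interval", Map.of(
                        "buckets", List.of(Map.of("dimension", Map.of("value", 40_000)))));
        long entities = maxEntitiesPerInterval(aggs);
        long perModel = 200L * 1024;   // assumed ~200 KB per entity model
        long perNode = 4L << 30;       // assumed 4 GB of AD-dedicated memory per node
        System.out.println("entities=" + entities + ", nodes needed="
                + nodesNeeded(entities, perModel, perNode));
    }
}
```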
  

Caveats:

  1. For single-entity detectors, we should check whether at least one node has enough memory to host the model.
  2. Consider use cases where the user wants to see the estimated model size, total memory, and number of entities on the frontend. This matters when users want to understand their data and configuration well enough to downsize their cluster; telling them only that there is enough space might not be good enough.