Is your feature request related to a problem?
We should enhance the Validate API for the OpenSearch Anomaly Detection plugin to provide users with better insight into the potential model size and memory requirements before creating a detector. This feature will help users make informed decisions about their detector configuration and resource allocation.
Current Situation:
Today, most of the model size estimation logic runs after detector creation, when we estimate memory usage across the entire cluster to check whether there is enough space.
Users don't have a clear understanding of potential memory requirements before committing to a configuration.
There's no automatic estimation of cluster-wide resource needs based on the number of unique entities.
What solution would you like?
Option 1: Full Validation
Add the existing model size estimation logic that runs during detector creation to the Validate API, so it can check whether the cluster has enough memory.
For high-cardinality (HC) detectors, implement a method that estimates the total memory required based on the estimated number of entities in the data, and tell users whether the cluster has enough memory and how much memory is needed (see the sketch after the options below).
Option 2: Estimates Only
Skip the full validation in MemoryTracker.
Provide users with estimated model size and total memory needed based on entity estimates.
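Either option ultimately comes down to the same arithmetic: multiply an estimated per-entity model size by the expected number of entities and compare the result to the memory the cluster can dedicate to anomaly detection. Below is a minimal, self-contained sketch of that calculation; the per-entity byte count, the 10% AD memory fraction, and the class/method names are illustrative assumptions, not the plugin's actual MemoryTracker API.

```java
// Minimal sketch (not the plugin's MemoryTracker API): estimate the total memory
// an HC detector would need and compare it against cluster capacity.
public class HcMemoryEstimate {

    // Assumed per-entity model footprint in bytes (illustrative placeholder; the
    // real number depends on shingle size, feature count, RCF parameters, etc.).
    private static final long BYTES_PER_ENTITY_MODEL = 160_000L;

    // Assumed fraction of each node's heap that AD models are allowed to use.
    private static final double AD_MEMORY_FRACTION = 0.10;

    /** Total memory the detector is expected to need, in bytes. */
    static long estimatedDetectorMemory(long estimatedEntityCount) {
        return estimatedEntityCount * BYTES_PER_ENTITY_MODEL;
    }

    /** Memory the cluster can dedicate to AD models, given per-node heap sizes. */
    static long estimatedClusterAdMemory(long[] nodeHeapBytes) {
        long total = 0;
        for (long heap : nodeHeapBytes) {
            total += (long) (heap * AD_MEMORY_FRACTION);
        }
        return total;
    }

    public static void main(String[] args) {
        long entities = 50_000;                              // e.g. from a cardinality query
        long[] nodeHeaps = { 8L << 30, 8L << 30, 8L << 30 }; // three 8 GiB heaps

        long needed = estimatedDetectorMemory(entities);
        long available = estimatedClusterAdMemory(nodeHeaps);

        System.out.printf("Needed: %,d bytes, available for AD: %,d bytes, fits: %b%n",
            needed, available, needed <= available);
    }
}
```

For Option 2, the same numbers (needed vs. available) would simply be returned to the user as estimates instead of being turned into a pass/fail validation result.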
Additional context:
Much of the memory estimation and validation is already in place today, so adding it to the Validate API might be unnecessary. What might help users more, and be easier to implement, is simply returning some estimates directly: the model size, the number of entities expected per detection interval, and the estimated total memory needed. That would give customers a better idea of whether their cluster is large enough, or far larger than it needs to be.
Implementation Details:
For single-entity detectors, the model may land on any node, not necessarily the node with the least memory, so we can check whether the node with the most memory can at least hold the model. This can serve as an initial safety check: if even the node with the most memory cannot hold the model, we should at least return a warning.
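A minimal sketch of that safety check, assuming the estimated model size and per-node heap sizes are already known; the 10% AD memory share and the names here are illustrative, not the plugin's real code.

```java
// Minimal sketch of the single-entity safety check described above: if even the
// node with the most heap cannot hold one model, validation should return a warning.
class SingleEntityMemoryCheck {
    // Assumed share of each node's heap that AD models may use (illustrative).
    private static final double AD_MEMORY_FRACTION = 0.10;

    static boolean maxNodeCanHoldModel(long estimatedModelBytes, long[] nodeHeapBytes) {
        long maxAdBytes = 0;
        for (long heap : nodeHeapBytes) {
            maxAdBytes = Math.max(maxAdBytes, (long) (heap * AD_MEMORY_FRACTION));
        }
        return estimatedModelBytes <= maxAdBytes; // false => surface a warning
    }
}
```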
For HC detectors, I can reuse the logic we have today in MemoryTracker for the initial validation, but I can also run queries against any available historical data to get a better estimate of how many nodes, or how much total memory, the cluster needs:
For example, a query like the following can give an estimate of how many entities to expect per 10-minute interval, based on the last week of data.
Cardinality query example:
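The sketch below shows one possible shape for such a query, expressed with the OpenSearch search builders: it restricts to the last week of data, buckets it into 10-minute intervals, and counts distinct entities per bucket with a cardinality aggregation. The index and field names are placeholders, and this is a sketch of the idea rather than the exact query referenced above.

```java
import org.opensearch.action.search.SearchRequest;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.opensearch.search.builder.SearchSourceBuilder;

// Sketch only: "my-logs", "timestamp", and "host.keyword" stand in for the
// detector's source index, time field, and category field.
class EntityCountEstimator {
    static SearchRequest entityCountPerIntervalRequest() {
        SearchSourceBuilder source = new SearchSourceBuilder()
            .size(0)                                                    // aggregations only
            .query(QueryBuilders.rangeQuery("timestamp").gte("now-7d")) // last week of data
            .aggregation(
                AggregationBuilders.dateHistogram("per_interval")
                    .field("timestamp")
                    .fixedInterval(DateHistogramInterval.minutes(10))   // detector interval
                    .subAggregation(
                        AggregationBuilders.cardinality("entity_count")
                            .field("host.keyword")));                   // category field
        // Taking the max (or a high percentile) of "entity_count" across the returned
        // buckets gives an estimate of entities expected per detection interval.
        return new SearchRequest("my-logs").source(source);
    }
}
```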
Caveats:
For single-entity detectors, we should check whether at least one node has enough memory to host the model.
Consider use cases where users want the estimated model size, total memory, and entity count surfaced on the frontend. Users may simply want to better understand their data and configuration, for example to downsize their cluster, so telling them there is enough space might not be good enough.