
[BUG] Invalid Field Names in Metric Aggregation Queries that use star tree return 500 Internal Server Error #16473

Closed
expani opened this issue Oct 24, 2024 · 3 comments · Fixed by #16481
Labels
bug Something isn't working Search:Aggregations Search Search query, autocomplete ...etc


@expani
Contributor

expani commented Oct 24, 2024

Describe the bug

When running metric aggregation queries over indices that have star tree enabled, using a field name that does not exist in the index mapping leads to a 500 Internal Server Error.

Related component

Search

Reproduction Steps

Check out the main branch of OpenSearch.

Enable the star tree feature flag by adding this line to OpenSearchNode#createConfiguration(), which is the default configuration used by gradlew run:

baseConfig.put("opensearch.experimental.feature.composite_index.star_tree.enabled", "true");

Build and run the server with ./gradlew run

Enable the cluster setting indices.composite_index.star_tree.enabled so that star tree indexing is allowed:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'{
  "persistent" : {
    "indices.composite_index.star_tree.enabled" : "true"
  }
}'

Create an index with a star tree composite field:

curl -XPUT -H'Content-type: application/json' -kv localhost:9200/logs -d '{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "size",
              "stats": [
                "sum"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "avg"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}'

Run a search query that aggregates on a field that is not in the mapping (request_size in this case):

curl -XPOST -H'Content-type: application/json' -kv localhost:9200/logs/_search -d '{
  "size": 0,
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}'

The request fails with a 500 Internal Server Error, and the following exception appears in the logs:

[2024-10-24T19:56:56,719][WARN ][r.suppressed             ] [runTask-0] path: /logs/_search, params: {index=logs}
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:775) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:815) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:548) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:316) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:760) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1719) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1505) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1619) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1593) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:81) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:75) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:982) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:1570) [?:?]
Caused by: org.opensearch.OpenSearchException$3: Cannot invoke "org.opensearch.search.aggregations.support.FieldContext.field()" because the return value of "org.opensearch.search.aggregations.support.ValuesSourceConfig.fieldContext()" is null
        at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:710) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:393) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        ... 23 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.opensearch.search.aggregations.support.FieldContext.field()" because the return value of "org.opensearch.search.aggregations.support.ValuesSourceConfig.fieldContext()" is null
        at org.opensearch.search.aggregations.support.ValuesSourceAggregatorFactory.getField(ValuesSourceAggregatorFactory.java:107) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.compositeindex.datacube.startree.utils.StarTreeQueryHelper.validateStarTreeMetricSupport(StarTreeQueryHelper.java:153) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.compositeindex.datacube.startree.utils.StarTreeQueryHelper.getStarTreeQueryContext(StarTreeQueryHelper.java:77) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService.parseSource(SearchService.java:1550) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService.createContext(SearchService.java:1108) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:705) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:678) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        ... 8 more

Expected behavior

A 400 Bad Request should be returned with an appropriate error message.

@expani
Contributor Author

expani commented Oct 24, 2024

The field context is retrieved from the shard context's mapper and remains null if the field is not found.

If we add a null check at that point and throw an appropriate exception, I think the issue can be solved.

Let me know your thoughts on this. @bharath-techie @sandeshkr419
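A minimal sketch of the proposed null check, assuming a simplified model of the classes in the stack trace. FieldContextSketch, ValuesSourceConfigSketch, GetFieldSketch and their methods are hypothetical stand-ins, not OpenSearch's API; the point is only that an unmapped field turns into an IllegalArgumentException with a clear message instead of a NullPointerException.

```java
import java.util.Map;

// Hypothetical stand-in for FieldContext: holds the resolved field name.
final class FieldContextSketch {
    private final String field;

    FieldContextSketch(String field) { this.field = field; }

    String field() { return field; }
}

// Hypothetical stand-in for ValuesSourceConfig: fieldContext() is null when
// the requested field is not present in the mapping.
final class ValuesSourceConfigSketch {
    private final FieldContextSketch fieldContext;

    ValuesSourceConfigSketch(FieldContextSketch fieldContext) { this.fieldContext = fieldContext; }

    FieldContextSketch fieldContext() { return fieldContext; }

    // Resolves a field name against a simple name -> type map, leaving the
    // field context null for unknown fields (mirroring the behaviour above).
    static ValuesSourceConfigSketch resolve(String fieldName, Map<String, String> mapping) {
        return new ValuesSourceConfigSketch(
            mapping.containsKey(fieldName) ? new FieldContextSketch(fieldName) : null);
    }
}

final class GetFieldSketch {
    // Null-checked version of config.fieldContext().field(): an unmapped field
    // raises IllegalArgumentException instead of NullPointerException.
    static String getField(String requestedName, ValuesSourceConfigSketch config) {
        FieldContextSketch ctx = config.fieldContext();
        if (ctx == null) {
            throw new IllegalArgumentException(
                "Field [" + requestedName + "] is not mapped in the index");
        }
        return ctx.field();
    }
}
```

With the mapping from the reproduction steps, getField("size", ...) returns "size", while getField("request_size", ...) throws IllegalArgumentException. Whether that exception surfaces as a 400 depends on how the search layer maps it to a REST status.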

@expani
Contributor Author

expani commented Oct 24, 2024

The above checks fixed the issue. However, there are related gaps, such as validating the field names used in the query against the star tree's metrics and dimensions (CompositeDataCubeFieldType) to ensure the query can actually use the star tree index.

I will take a deeper look at it later.
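A rough sketch of that extra validation, assuming the star tree config can be reduced to a list of ordered dimensions plus the stats configured for each metric field, as in the index created above. StarTreeConfigSketch and its methods are hypothetical, not the CompositeDataCubeFieldType API, and the check is a simplification: it only accepts an exact match between the requested stat and a configured one.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a star tree config: ordered dimensions plus the
// stats configured for each metric field (names are illustrative only).
final class StarTreeConfigSketch {
    final List<String> dimensions;
    final Map<String, Set<String>> metricStats;

    StarTreeConfigSketch(List<String> dimensions, Map<String, Set<String>> metricStats) {
        this.dimensions = dimensions;
        this.metricStats = metricStats;
    }

    // True only if the aggregation's field is a star tree metric and the
    // requested stat (e.g. "sum") is configured for it.
    boolean supportsMetric(String field, String stat) {
        Set<String> stats = metricStats.get(field);
        return stats != null && stats.contains(stat);
    }

    // True only if every field used in query filters is a star tree dimension.
    boolean supportsFilterFields(List<String> filterFields) {
        return dimensions.containsAll(filterFields);
    }
}
```

For the logs index, a sum on size passes, while a sum on request_size (not a metric) or an avg on size (stat not configured) would not be eligible for the star tree path.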

@expani
Contributor Author

expani commented Oct 25, 2024

When the star tree feature flag is disabled, the same query does not fail. It returns empty hits, with a null average and a sum of 0.0:

{"took":4,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]},"aggregations":{"average_latency":{"value":null},"sum_request_size":{"value":0.0}}}

So the fields used in aggregations were never validated against the actual fields of the index. Most existing aggregation queries that don't use star tree have the same limitation.

With these changes, metric aggregation queries that use star tree return the same response.
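One way to get that matching response, sketched under the assumption that an unmapped field simply disqualifies the star tree instead of failing, so the regular aggregation path runs and produces the same result as with the flag disabled. StarTreeFallbackSketch, starTreeMetric and regularSum are hypothetical names; this is not necessarily how the linked changes are implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class StarTreeFallbackSketch {
    // Hypothetical decision step: use the star tree only when the field is
    // mapped AND is a star tree metric; otherwise return empty (no error),
    // so the caller falls back to the regular aggregation path.
    static Optional<String> starTreeMetric(String field, Set<String> mappedFields, Set<String> starTreeMetrics) {
        if (!mappedFields.contains(field) || !starTreeMetrics.contains(field)) {
            return Optional.empty();
        }
        return Optional.of(field);
    }

    // Hypothetical regular (non star tree) sum: an unmapped field has no
    // values, so the sum is 0.0, like "sum_request_size":{"value":0.0} above.
    static double regularSum(String field, Map<String, List<Double>> docValues) {
        double total = 0.0;
        for (double v : docValues.getOrDefault(field, List.of())) {
            total += v;
        }
        return total;
    }
}
```

The design choice here is to treat "can't use the star tree" as a routing decision rather than a validation error, which keeps star tree and non star tree responses identical for the same query.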

Projects
Status: Done

2 participants