Include summary of data streams in /search endpoint responses #1252
Comments
This looks exactly like what we are looking for, yes!
Thanks for flagging, @jsoriano - happy to get this scheduled. It looks like surfacing this data in the EPR API wouldn't take too much effort?
Yes, I think so. I think we can avoid reindexing in the public EPR because this information is already indexed, so it will probably be a change only in package-registry.
@kpollich @jsoriano, do you think we could modify the PackageClient in the Fleet code to support this optional format? Ideally it would be behind a flag or similar, so that it does not require changes to existing code elsewhere. The reason is that we would like to keep using the PackageClient the way some other Kibana plugins do, as we do not have access to things such as whether the user has set a custom EPR endpoint address. I believe the only change would be an additional flag argument to
I think these changes are necessary, but we might also need to update the flow where installed packages are loaded from Elasticsearch instead of EPR, so that it stores and fetches the same data stream data. cc @nchaulet to keep me honest.
Yes, I think making a change to the packageClient seems good; there will probably be a special case there for uploaded packages loaded from ES. Could the response size be an issue? The search endpoint is not paginated, and including all data streams of all packages will probably create a huge response. What will happen when we add more packages?
I was hoping that this would not be the default behavior for now, so that fewer changes to the current behavior are needed, and we could work on the performance side moving forward. I think pagination would be good. We are not including ALL the metadata from the data streams, so I believe it is only a few extra lines per data stream; that is still some overhead, but not as much as the rest of the metadata would be.
Support for pagination in the
Security is implementing functionality that builds RAG searches over our integrations for use with the LLM. For this they need the package names, and the data stream names and titles. This information is already available in the registry, but data streams are only included in responses of the /package endpoint, so a request per package is needed. The information should be updated automatically. As this is intended to run on each deployment, that can be too many queries for a couple of fields.
For now they are building their own index from the integrations repository, but this is not an option for GA.
The most direct solution is to include this information in the /search responses, the same way policy templates are included. This could be selected optionally using a parameter, so that current queries are not modified.
The content added would be something like a subset of the content in the /package response, for each package something like this:
If more data is needed in the future, maybe we could prepare a "full index" that can be downloaded, but that looks like overkill at this point.
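To illustrate how a consumer could use a /search response that carries data stream summaries, here is a minimal Python sketch. It flattens the response into the package name, data stream name and title rows that the RAG index needs, without a request per package. The `data_streams` key, the `dataset` and `title` fields, and the sample data are assumptions made up for illustration, not the registry's confirmed response format.

```python
from typing import Any

# Hypothetical /search response body. The "data_streams" key and the
# "dataset"/"title" fields inside it are assumed for illustration only.
SAMPLE_SEARCH_RESPONSE: list[dict[str, Any]] = [
    {
        "name": "nginx",
        "title": "Nginx",
        "version": "1.0.0",
        "data_streams": [
            {"dataset": "nginx.access", "title": "Nginx access logs"},
            {"dataset": "nginx.error", "title": "Nginx error logs"},
        ],
    },
    # A package without the optional summary, e.g. when the parameter
    # was not requested or the package has no data streams.
    {"name": "legacy", "title": "Legacy package", "version": "0.1.0"},
]


def flatten_data_streams(packages: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Return one row per data stream: package name, dataset name, title."""
    rows: list[dict[str, str]] = []
    for package in packages:
        # Treat a missing or null "data_streams" entry as an empty list.
        for data_stream in package.get("data_streams") or []:
            rows.append(
                {
                    "package": package["name"],
                    "dataset": data_stream["dataset"],
                    "title": data_stream["title"],
                }
            )
    return rows


if __name__ == "__main__":
    for row in flatten_data_streams(SAMPLE_SEARCH_RESPONSE):
        print(row["package"], row["dataset"], row["title"], sep=" | ")
```

With a response like this, a single request would be enough to build the index, and packages that do not include the optional summary are simply skipped.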
cc @P1llus