Skip to content
This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Commit

Permalink
Merge pull request #173 from o19s/2.14.0-documentation
Browse files Browse the repository at this point in the history
Updating documentation
  • Loading branch information
jzonthemtn authored May 28, 2024
2 parents 262e416 + 19bdb5d commit 4a3d7e9
Show file tree
Hide file tree
Showing 11 changed files with 324 additions and 843 deletions.
333 changes: 322 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,33 +1,344 @@
# OpenSearch User Behavior Insights

This project is the OpenSearch plugin for storing and managing search requests and responses along with events produced client-side.
This repository contains the OpenSearch plugin for the User Behavior Insights (UBI) capability. This plugin
facilitates persisting client-side events (e.g. item clicks, scroll depth) and OpenSearch queries for the purpose of analyzing the data
to improve search relevance and user experience.

Related Links:
* OpenSearch UBI RFC - https://github.com/opensearch-project/OpenSearch/issues/12084
UBI and this plugin project was originally proposed in the [OpenSearch UBI RFC](https://github.com/opensearch-project/OpenSearch/issues/12084).

## Building the Plugin
## UBI Schemas

Build the plugin
Please note that this repository is the implementation of the UBI plugin for OpenSearch.

[!IMPORTANT] This release targets to the 1.0 version of the UBI Specification, which will be released when this plugin is released and the below links will be updated.

For details on the JSON Schema used by UBI to send and receive queries and events please see the [UBI](https://github.com/o19s/ubi) repository and the links below.
* [Query Request Schema](https://o19s.github.io/ubi/docs/html/query.request.schema.html)
* [Query Response Schema](https://o19s.github.io/ubi/docs/html/query.response.schema.html)
* [Event Schema](https://o19s.github.io/ubi/docs/html/event.schema.html)

## Getting Help

* For questions or help getting started, please find us in the [OpenSearch Slack](https://opensearch.org/slack.html) in the `#plugins` channel.
* For bugs or feature requests, please create [a new issue](https://github.com/o19s/opensearch-ubi/issues/new/choose).

## Useful Commands

* Get the indexed queries: `curl http://localhost:9200/ubi_queries/_search | jq`
*

## User Quick Start

### Installing the Plugin

To get started, download the plugin zip file from the [releases](https://github.com/o19s/opensearch-ubi/releases). Next, install the plugin into OpenSearch with the following command:

```
bin/opensearch-plugin install file:/opensearch-ubi-1.0.0-os2.14.0.zip
```

To create the UBI indexes called `ubi_queries` and `ubi_events`, send a query to an OpenSearch index with the `ubi` query block added:

```
curl -s http://localhost:9200/your-index/_search -H "Content-Type: application/json" -d'
{
"ext": {
"ubi": {
}
},
"query": {
"match": {
"name": "toner"
}
}
}'
```

These indexes can also be created manually by using the [mapping files](https://github.com/o19s/opensearch-ubi/tree/2.14.0/src/main/resources):

```
curl -s -X PUT "http://localhost:9200/ubi-events" -H "Content-Type: application/json" --data-binary @/path/to/events-mapping.json
curl -s -X PUT "http://localhost:9200/ubi-queries" -H "Content-Type: application/json" --data-binary @/path/to/events-queries.json
```

### Capturing Queries

Queries sent to OpenSearch containing a `ubi` section in the `ext` block will be captured by the plugin and stored in the `ubi_queries` index. For example:

```
curl -s http://localhost:9200/your-index/_search -H "Content-Type: application/json" -d'
{
"ext": {
"ubi": {
"query_id": "12300d16cb-b6f1-4012-93ebcc49cac90426"
}
},
"query": {
"match": {
"name": "toner"
}
}
}'
```

As shown in the query above, the presence of the `ubi` block in the `ext` section causes the UBI plugin to capture the query and store it in the `ubi_queries` index.
The `ubi` block can contain the following parameters. All parameters are optional.

* `query_id` - A unique identifier for the query. If not provided, the plugin will generate a random UUID for the query and return the UUID in the query response. The `query_id` can be in any format but we use UUIDs in the examples.
* `user_query` - The user-entered query. This is meant to be the actual text the user provided to initiate the search.
* `client_id` - A unique identifier for the originator of the query. The client may be a user, an application, or any other originator of the query. The `client_id` can be in any format but we use UUIDs in the examples.
* `object_id_field` - The name of a field in the index that contains a unique identifier for each result. If not provided, the `_id` field is used.

Following is an example query that provides the parameters:

```
curl -s http://localhost:9200/your-index/_search -H "Content-Type: application/json" -d'
{
"ext": {
"ubi": {
"query_id": "12300d16cb-b6f1-4012-93ebcc49cac90426",
"user_query": "Toner",
"client_id": "c4af7ca2-d6d2-4326-a41f-e616ebdd3d7b",
"object_id_field": "product_name"
}
},
"query": {
"match": {
"name": "toner"
}
}
}'
```

Similar to a query request, query responses will also include a `ubi` section in the `ext` block.

```
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 2.2413535,
"hits": [
{
"_index": "ecommerce",
"_id": "968447",
"_score": 2.2413535,
"_source": {
"id": "968447",
"name": "Cyan Toner Cartridge"
}
}
]
},
"ext": {
"ubi": {
"query_id": "12300d16cb-b6f1-4012-93ebcc49cac90426"
}
}
}
```

The only field present in the query response will be the `query_id`. The value of the `query_id` is the same as the `query_id` provided in the query, or, if not provided, a random UUID.

### Indexed Queries

To see how the query is indexed, you can search the `ubi_queries` index:

```
curl -s http://localhost:9200/ubi_queries/_search -H "Content-Type: application/json" | jq
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "ubi_queries",
"_id": "6CrooY8BzxoaOvIPKtWj",
"_score": 1,
"_source": {
"query_response_id": "e4bdd289-0875-421f-bc34-aa71eb8e1cb3",
"user_query": "toner",
"query_id": "12300d16cb-b6f1-4012-93ebcc49cac90426",
"query_response_object_ids": [
"9"
],
"client_id": null,
"query": "{\"query\":{\"match\":{\"name\":{\"query\":\"toner\",\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"auto_generate_synonyms_phrase_query\":true,\"boost\":1.0}}},\"ext\":{\"query_id\":\"12300d16cb-b6f1-4012-93ebcc49cac90426\",\"user_query\":\"toner\",\"client_id\":null,\"object_id_field\":null,\"query_attributes\":{\"system\":\"my_system\",\"experiment\":\"exp_1\"}}}",
"timestamp": 1716408298072
}
}
]
}
```

Each indexed query will have the following fields:

* `query_response_id` - A unique identifier for the query response.
* `user_query`- Corresponds to the `user_query` in the query request.
* `query_id` - Corresponds to the `query_id` in the query request, or a random UUID if a `query_id` was not provided in the query request.
* `query_response_object_ids` - A list of the values of the `object_id_field` field in the document.
* `client_id` - Corresponds to the `client_id` in the query request.
* `query` - The raw query that was provided to OpenSearch.
* `timestamp` - The Unix timestamp when the query was indexed.

### Capturing Events

The UBI plugin does *not* provide a way to capture client-side events. Sending client-side events requires action on the client to send the events to OpenSearch for indexing.

The following is an example of a client-side event. [Additional examples are available.](https://github.com/o19s/ubi/tree/main/samples)

```
curl -s -X POST http://localhost:9200/ubi_events/_doc/ -H "Content-Type: application/json" -d'
{
"action_name": "page_exit",
"user_id": "1821196507152684",
"query_id": "00112233-4455-6677-8899-aabbccddeeff",
"session_id": "c3d22be7-6bdc-4250-91e1-fc8a92a9b1f9",
"page_id": "/docs/latest/",
"timestamp": "2024-05-16T12:34:56.789Z",
"message_type": "INFO",
"message": "On page /docs/latest/ for 3.35 seconds",
"event_attributes": {
"position":{},
"object": {
"idleTimeoutMs": 5000,
"currentIdleTimeMs": 250,
"checkIdleStateRateMs": 250,
"isUserCurrentlyOnPage": true,
"isUserCurrentlyIdle": false,
"currentPageName": "http://localhost:4000/docs/latest/",
"timeElapsedCallbacks": [],
"userLeftCallbacks": [],
"userReturnCallbacks": [],
"visibilityChangeEventName": "visibilitychange",
"hiddenPropName": "hidden"
}
}
}'
```

## Analyzing Queries and Client-Side Events

With the queries and client-side events we can use OpenSearch's SQL capability to analyze the data.

### Queries with zero results

We can identify queries with zero results by querying either the `ubi_queries` or `ubi_events` indexes as shown below. Both queries should return the same value.

```sql
select
count(0)
from ubi_queries
where query_response_object_ids is null
```

```sql
select
count(0)
from ubi_events
where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_object_ids is null
order by timestamp
```

### Most Common Client-Side Events

Find the most common client-side events:

```sql
select
action_name, count(0) Total
from ubi_events
group by action_name
order by Total desc
```

action_name|Total
|---|---|
on_search|3199
brand_filter|3112
button_click|3150
type_filter|3149
product_hover|3132
product_sort|3115
login|2458
logout|1499
new_user_entry|208

### Client-Side Events Associated ith Queries

All client-side events that are associated with a query should have the same `query_id`.

```sql
select
action_name, count(0) Total
from ubi_events
where query_id is not null
group by action_name
order by Total desc
```
action_name|Total
|---|---|
on_search|1329
brand_filter|669
button_click|648
product_hover|639
product_sort|625
type_filter|613
logout|408

## Development

If you find bugs or want to request a feature, please create [a new issue](https://github.com/o19s/opensearch-ubi/issues/new/choose). For questions or to discuss how UBI works, please find us in the [OpenSearch Slack](https://opensearch.org/slack.html) in the `#plugins` channel.

The plugin provides an implementation of an `ActionFilter` plugin that can capture and index queries, a `SearchExtBuilder` that provides the UBI parameters, and the object classes used to index the queries. Testing is done by YAML rest tests and unit tests.

### Building and Testing

The plugin can be built using Gradle:

```
./gradlew build
```

Build the OpenSearch docker image and add the plugin, then start the containers:
To test and debug, build the OpenSearch docker image that contains the built plugin and then start the containers:

```
docker compose build
docker compose up
docker compose build && docker compose up
```

See the [Quick Start](documentation/documentation.md#quick-start) section in the documentation for a complete example of how to initialize the plugin and store events and queries.
Or to start a three-node OpenSearch cluster:

## Releasing
```
docker compose build
docker compose -f docker-compose-cluster.yaml up
```

### Releasing
Run `tag-and-release.sh` to create and push a tag to GitHub. The tag will kick off the GitHub Action to build the release. After running `tag-and-release.sh`, update the version numbers in `gradle.properties` and the `Dockerfile` and commit the changes.

## License
This code is licensed under the Apache 2.0 License. See [LICENSE.txt](LICENSE.txt).

## Copyright
Copyright OpenSearch Contributors. See [NOTICE.txt](NOTICE.txt) for details.

Loading

0 comments on commit 4a3d7e9

Please sign in to comment.