[MongoDB Atlas] Disk data stream (#10555)
* add disk data stream

* update readme

* update readme

* address review comments

* address review comments

* address review comments

* address review comments
niraj-elastic authored Oct 30, 2024
1 parent 881c579 commit e642c8e
Showing 25 changed files with 2,210 additions and 56 deletions.
23 changes: 19 additions & 4 deletions packages/mongodb_atlas/_dev/build/docs/README.md
@@ -6,7 +6,7 @@

Use the MongoDB Atlas integration to:

- - Collect MongoDB Atlas alert, mongod audit, mongod database, organization, and project logs, along with hardware and process metrics for comprehensive monitoring and analysis.
+ - Collect MongoDB Atlas alert, mongod audit, mongod database, organization, and project logs, along with disk, hardware, and process metrics for comprehensive monitoring and analysis.
- Create informative visualizations to track usage trends, measure key metrics, and derive actionable business insights.
- Set up alerts to minimize Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) by quickly referencing relevant logs during troubleshooting.

@@ -16,10 +16,11 @@ The MongoDB Atlas integration collects logs and metrics.

Logs help you keep a record of events that happen on your machine. The `Log` data streams collected by the MongoDB Atlas integration are `alert`, `mongod_audit`, `mongod_database`, `organization`, and `project`.

- Metrics give you insight into the statistics of the MongoDB Atlas. The `Metric` data stream collected by the MongoDB Atlas integration are `process` and `hardware` so that the user can monitor and troubleshoot the performance of the MongoDB Atlas instance.
+ Metrics give you insight into the statistics of MongoDB Atlas. The `Metric` data streams collected by the MongoDB Atlas integration are `disk`, `hardware`, and `process`, so that users can monitor and troubleshoot the performance of their MongoDB Atlas instance.

Data streams:
- `alert`: This data stream collects alerts generated by the MongoDB Atlas. Alerts cover a wide range of metrics and events, such as resource utilization thresholds (CPU, memory, disk space), database operations, security issues, and configuration changes.
- `disk`: This data stream collects disk or partition metrics for all the hosts in the specified group, capturing measurements such as I/O operations, read and write latency, and space usage.
- `hardware`: This data stream collects all the Atlas search hardware and status data series within the provided time range for one process in the specified project.
- `mongod_audit`: The auditing facility allows administrators and users to track system activity for deployments with multiple users and applications. Mongod Audit logs capture events related to database operations such as insertions, updates, deletions, user authentication, etc., occurring within the mongod instances.
- `mongod_database`: This data stream collects a running log of events, including entries such as incoming connections, commands run, and issues encountered. Generally, database log messages are useful for diagnosing issues, monitoring your deployment, and tuning performance.
@@ -149,8 +150,21 @@ Please refer to the following [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.

## Metrics reference

### Disk

This is the `disk` data stream. It provides a detailed overview of disk usage, capturing important data about I/O operations, read and write latency, and space utilization. To collect disk metrics, the API Key making the request must have the `Project Read Only` role.

{{event "disk"}}

**ECS Field Reference**

Please refer to the following [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.

{{fields "disk"}}
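The disk metrics are pulled from the Atlas Administration API's per-partition measurements endpoint using HTTP digest authentication with the public/private API key pair. As an illustrative sketch only (the helper name and default period are assumptions, not part of the integration), the URL the data stream queries can be built like this:

```python
# Illustrative sketch: construct the Atlas Administration API URL that the
# disk data stream queries for per-partition measurements. The helper name
# and default period are assumptions for illustration only.
def build_measurements_url(base_url, group_id, process_id, partition, period="10M"):
    path = (
        f"/api/atlas/v2/groups/{group_id}/processes/{process_id}"
        f"/disks/{partition}/measurements"
    )
    # granularity and period use ISO-8601 durations, e.g. PT10M
    return f"{base_url.rstrip('/')}{path}?granularity=PT{period}&period=PT{period}"

url = build_measurements_url(
    "https://cloud.mongodb.com", "group-1", "hostname-1", "data"
)
```

In the integration itself, this request is issued with HTTP digest authentication (`auth.digest` with the public and private API keys) and a versioned `Accept` header.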

### Hardware

- This data stream collects hardware and status metrics for each process in the specified group. It includes measurements such as CPU usage, memory consumption, JVM memory usage, disk usage, etc.
+ This is the `hardware` data stream. It collects hardware and status metrics for each process in the specified group, including measurements such as CPU usage, memory consumption, JVM memory usage, and disk usage. To collect hardware metrics, the requesting API Key must have the `Project Read Only` or higher role.

{{event "hardware"}}

@@ -161,7 +175,8 @@ Please refer to the following [document](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html) for detailed information on ECS fields.
{{fields "hardware"}}

### Process

- This data stream collects host metrics per process for all the hosts of the specified group. Metrics like measurements for the host, such as CPU usage, number of I/O operations and memory are available on this data stream. To collect process metrics, the requesting API Key must have the `Project Read Only` role.
+ This is the `process` data stream. It collects host metrics per process for all the hosts of the specified group. Measurements for the host, such as CPU usage, number of I/O operations, and memory, are available on this data stream. To collect process metrics, the requesting API Key must have the `Project Read Only` role.

{{event "process"}}

Binary file modified packages/mongodb_atlas/_dev/deploy/docker/mongodb_atlas/test
7 changes: 6 additions & 1 deletion packages/mongodb_atlas/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "0.0.9"
changes:
- description: MongoDB Atlas integration package with "disk" data stream.
type: enhancement
link: https://github.com/elastic/integrations/pull/10555
- version: "0.0.8"
changes:
- description: MongoDB Atlas integration package with "alert" data stream.
@@ -21,7 +26,7 @@
link: https://github.com/elastic/integrations/pull/9754
- version: "0.0.4"
changes:
-   - description: Add "hardware" data stream to MongoDB Atlas package.
+   - description: MongoDB Atlas integration package with "hardware" data stream.
type: enhancement
link: https://github.com/elastic/integrations/pull/9689
- version: "0.0.3"
@@ -0,0 +1,2 @@
dynamic_fields:
"event.ingested": ".*"
@@ -0,0 +1,30 @@
{
"events": [
{
"hostId": "host-1",
"groupId": "group-1",
"response": {
"MAX_DISK_PARTITION_IOPS_TOTAL": 0.2138048,
"DISK_PARTITION_IOPS_READ": 0.33244343545,
"MAX_DISK_PARTITION_IOPS_READ": 0.3422309,
"DISK_PARTITION_IOPS_TOTAL": 7.38912,
"DISK_PARTITION_IOPS_WRITE": 5.2244123,
"MAX_DISK_PARTITION_IOPS_WRITE": 44.125672,
"MAX_DISK_PARTITION_LATENCY_READ": 1.321539,
"DISK_PARTITION_LATENCY_READ": 1.500023,
"MAX_DISK_PARTITION_LATENCY_WRITE": 1.2394742,
"DISK_PARTITION_LATENCY_WRITE": 0.5678899,
"DISK_PARTITION_SPACE_FREE": 6.33455633344,
"MAX_DISK_PARTITION_SPACE_FREE": 6.182233244,
"MAX_DISK_PARTITION_SPACE_PERCENT_FREE": 73.2324,
"DISK_PARTITION_SPACE_PERCENT_FREE": 71.23445789,
"DISK_PARTITION_SPACE_USED": 1.27898776,
"MAX_DISK_PARTITION_SPACE_USED": 25.127876554,
"MAX_DISK_PARTITION_SPACE_PERCENT_USED": 25.197166,
"DISK_PARTITION_SPACE_PERCENT_USED": 25.23388449
},
"partitionName": "data",
"processId": "hostname-1"
}
]
}
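For context, the `response` map in this sample is produced by the CEL program's `zip(...)` call, which pairs each measurement's name with the first value of its `dataPoints` series. A minimal Python sketch of the same transformation, assuming the Atlas API's `measurements` array shape (a `name` plus a `dataPoints` list of `value` entries):

```python
# Sketch of the CEL zip(): pair each measurement name with the first
# dataPoints value (or None when the series is empty) to build a flat map.
def flatten_measurements(measurements):
    return {
        m["name"]: (m["dataPoints"][0]["value"] if m["dataPoints"] else None)
        for m in measurements
    }

flat = flatten_measurements(
    [{"name": "DISK_PARTITION_IOPS_READ", "dataPoints": [{"value": 0.33244343545}]}]
)
# flat == {"DISK_PARTITION_IOPS_READ": 0.33244343545}
```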
@@ -0,0 +1,83 @@
{
"expected": [
{
"ecs": {
"version": "8.11.0"
},
"event": {
"category": [
"database"
],
"kind": "metric",
"module": "mongodb_atlas",
"type": [
"info"
]
},
"group": {
"id": "group-1"
},
"mongodb_atlas": {
"disk": {
"read": {
"iops": {
"max": {
"throughput": 0.3422309
},
"throughput": 0.33244343545
},
"latency": {
"max": {
"ms": 1.321539
},
"ms": 1.500023
}
},
"space": {
"free": {
"bytes": 6.33455633344,
"max": {
"bytes": 6.182233244,
"pct": 73.2324
},
"pct": 71.23445789
},
"used": {
"bytes": 1.27898776,
"max": {
"bytes": 25.127876554,
"pct": 25.197166
},
"pct": 25.23388449
}
},
"total": {
"iops": {
"max": {
"throughput": 0.2138048
},
"throughput": 7.38912
}
},
"write": {
"iops": {
"max": {
"throughput": 44.125672
},
"throughput": 5.2244123
},
"latency": {
"max": {
"ms": 1.2394742
},
"ms": 0.5678899
}
}
},
"host_id": "host-1",
"partition_name": "data",
"process_id": "hostname-1"
}
}
]
}
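The renaming from flat measurement keys (for example `DISK_PARTITION_IOPS_READ`) to nested `mongodb_atlas.disk.*` fields happens in the data stream's ingest pipeline, which is not shown in this diff. A hypothetical Python sketch of that kind of dotted-path rename (the `FIELD_MAP` entries here are inferred from the sample document and cover only a few fields):

```python
# Hypothetical sketch of a flat-to-nested field rename; the real mapping
# lives in the data stream's ingest pipeline. FIELD_MAP is illustrative
# and intentionally incomplete.
FIELD_MAP = {
    "DISK_PARTITION_IOPS_READ": "mongodb_atlas.disk.read.iops.throughput",
    "MAX_DISK_PARTITION_IOPS_READ": "mongodb_atlas.disk.read.iops.max.throughput",
    "DISK_PARTITION_LATENCY_READ": "mongodb_atlas.disk.read.latency.ms",
}

def to_nested(flat):
    doc = {}
    for key, value in flat.items():
        target = FIELD_MAP.get(key)
        if target is None:
            continue  # unmapped keys are skipped in this sketch
        *parents, leaf = target.split(".")
        node = doc
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return doc
```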
@@ -0,0 +1,9 @@
vars:
url: http://{{Hostname}}:{{Port}}
public_key: admin
private_key: MongoDB@123
data_stream:
vars:
groupId: mongodb-group1
input: cel
service: mongodbatlas
166 changes: 166 additions & 0 deletions packages/mongodb_atlas/data_stream/disk/agent/stream/input.yml.hbs
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
config_version: 2
interval: {{period}}
{{#if enable_request_tracer}}
resource.tracer.filename: "../../logs/cel/http-request-trace-*.ndjson"
{{/if}}
{{#if ssl}}
resource.ssl: {{ssl}}
{{/if}}
{{#if http_client_timeout}}
resource.timeout: {{http_client_timeout}}
{{/if}}
tags:
{{#if preserve_original_event}}
- preserve_original_event
{{/if}}
{{#each tags as |tag|}}
- {{tag}}
{{/each}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}
auth.digest:
user: {{public_key}}
password: {{private_key}}
resource.url: {{url}}
state:
group_id: {{groupId}}
want_more: false
page_num: 1
disk_page_num: 1
query: /measurements?granularity=PT{{period}}&period=PT{{period}}
redact:
fields: ~
program: |
(
(has(state.host_list) && size(state.host_list) > 0) ?
state
:
state.with(
request(
"GET",
state.url.trim_right("/") + "/api/atlas/v2/groups/" + state.group_id + "/processes?" + {
"pageNum": [string(state.page_num)],
"itemsPerPage": ["100"],
}.format_query()
).with({
"Header": {
"Accept": ["application/vnd.atlas." + string(timestamp(now).getFullYear()) + "-01-01+json"],
},
}).do_request().as(resp, (resp.StatusCode == 200) ?
bytes(resp.Body).decode_json().as(body,
{
"host_list": body.results.map(e, state.url.trim_right("/") + "/api/atlas/v2/groups/" + state.group_id + "/processes/" + e.id + "/disks/"),
"next": 0,
"page_num": body.links.exists_one(res, res.rel == "next") ? (int(state.page_num) + 1) : 1,
})
:
{
"events": {
"error": {
"code": string(resp.StatusCode),
"id": string(resp.Status),
"message": "GET:" +
(
(size(resp.Body) != 0) ?
string(resp.Body)
:
string(resp.Status) + " (" + string(resp.StatusCode) + ")"
),
},
},
"want_more": false,
}
)
)
).as(state, (state.next >= size(state.host_list)) ? {} :
(
(has(state.disk_list) && size(state.disk_list) > 0) ?
state
:
state.with(
request("GET", string(state.host_list[state.next] + "?pageNum=" + string(state.disk_page_num) + "&itemsPerPage=100"))
.with({
"Header": {
"Accept": ["application/vnd.atlas." + string(timestamp(now).getFullYear()) + "-01-01+json"],
},
}).do_request().as(resp, (resp.StatusCode == 200) ?
bytes(resp.Body).decode_json().as(body,
{
"disk_list": body.results.map(e, e.partitionName),
"disk_next": 0,
"disk_page_num": body.links.exists_one(res, res.rel == "next") ? (int(state.disk_page_num) + 1) : 1,
}
)
:
{
"events": {
"error": {
"code": string(resp.StatusCode),
"id": string(resp.Status),
"message": "GET:" +
(
(size(resp.Body) != 0) ?
string(resp.Body)
:
string(resp.Status) + " (" + string(resp.StatusCode) + ")"
),
},
},
"want_more": false,
}
)
)
).as(state, (state.disk_next >= size(state.disk_list)) ? {} :
request("GET", string(state.host_list[state.next] + state.disk_list[state.disk_next] + state.query))
.with({
"Header": {
"Accept": ["application/vnd.atlas." + string(timestamp(now).getFullYear()) + "-01-01+json"],
},
}).do_request().as(res, (res.StatusCode == 200) ?
{
"events": bytes(res.Body).decode_json().as(f,
f.with(
{
"response": zip(
// Combining measurement names and actual values of measurement to generate `key : value` pairs.
f.measurements.map(m, m.name),
f.measurements.map(m, m.dataPoints.map(d, d.value).as(v, (size(v) == 0) ? null : (v[0])))
),
}
).drop(["measurements", "links"])
),
"disk_list": (int(state.disk_next) + 1 < size(state.disk_list)) ? state.disk_list : [],
"disk_next": (int(state.disk_next) + 1 < size(state.disk_list)) ? (int(state.disk_next) + 1) : 0,
"disk_page_num": state.disk_page_num,
"host_list": (int(state.next) + 1 >= size(state.host_list) && int(state.disk_page_num) == 1 && int(state.disk_next) + 1 >= size(state.disk_list)) ? [] : state.host_list,
"next": (int(state.disk_next) + 1 >= size(state.disk_list) && int(state.disk_page_num) == 1 && int(state.next) + 1 < size(state.host_list)) ? (int(state.next) + 1) : int(state.next),
"want_more": int(state.next) + 1 < size(state.host_list) || int(state.page_num) != 1 || int(state.disk_next) + 1 < size(state.disk_list) || int(state.disk_page_num) != 1,
"page_num": state.page_num,
"group_id": state.group_id,
"query": state.query,
}
:
{
"events": {
"error": {
"code": string(res.StatusCode),
"id": string(res.Status),
"message": "GET:" +
(
(size(res.Body) != 0) ?
string(res.Body)
:
string(res.Status) + " (" + string(res.StatusCode) + ")"
),
},
},
"want_more": false,
}
)
)
)
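The CEL program above is, in effect, a resumable triple loop: page through the group's processes, page through each process's disk partitions, then fetch measurements per partition, with `want_more` and the `*_page_num`/`next` fields carrying the continuation across scheduler runs. A rough Python analogue (eager rather than resumable; `fetch_json` is a hypothetical stand-in for the authenticated GET the CEL program performs):

```python
# Rough, eager analogue of the CEL paging logic: processes -> disks ->
# per-partition measurements. fetch_json is a hypothetical callable that
# takes a URL and returns the decoded JSON body.
def iter_disk_measurements(fetch_json, base, group_id, query):
    page = 1
    while True:
        procs = fetch_json(
            f"{base}/api/atlas/v2/groups/{group_id}/processes"
            f"?pageNum={page}&itemsPerPage=100"
        )
        for proc in procs["results"]:
            disk_page = 1
            while True:
                disks = fetch_json(
                    f"{base}/api/atlas/v2/groups/{group_id}/processes/"
                    f"{proc['id']}/disks/?pageNum={disk_page}&itemsPerPage=100"
                )
                for d in disks["results"]:
                    # query mirrors state.query, e.g. "/measurements?granularity=..."
                    yield fetch_json(
                        f"{base}/api/atlas/v2/groups/{group_id}/processes/"
                        f"{proc['id']}/disks/{d['partitionName']}{query}"
                    )
                if not any(l.get("rel") == "next" for l in disks.get("links", [])):
                    break
                disk_page += 1
        if not any(l.get("rel") == "next" for l in procs.get("links", [])):
            break
        page += 1
```

Unlike this sketch, the CEL version fetches one measurements page per evaluation and persists its cursor in `state`, so a long listing survives restarts of the input.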
