WIP-DEVOPS-7488: update node-exporter #8

Open
wants to merge 870 commits into
base: master
from

Conversation

kirandark

No description provided.

SuperQ and others added 30 commits September 4, 2020 11:15
Fix capitalization of CPU acronym throughout
…etheus#1835)

* Add: configure 2 thresholds for NodeFilesystemAlmostOutOfSpace alert

Signed-off-by: Nicolas Lamirault <[email protected]>
Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
* Bump Go modules to latest.
* Update to Go 1.15.
* Remove obsolete darwin/386 build.

Signed-off-by: Ben Kochie <[email protected]>
…#1810)

* Expose metric for state=check for node_md_state
* Added a new e2e output fixture including md201, which is in the checking state, and the new state=check labeled metric for all other md devices

Signed-off-by: Christian Rohmann <[email protected]>
We've gathered enough evidence that the CPU counter bug workaround is
working as intended. Downgrade the message from Warning to Debug.

Signed-off-by: Ben Kochie <[email protected]>
docs/node-mixin/alerts: use ratio for network alerts
This should be the way forward when importing libraries in jsonnet. It's
closer to how Go imports look and makes it more obvious where packages
live.

This is not breaking anything, as the old imports were already symlinks
to the now directly used directories.

Signed-off-by: Matthias Loibl <[email protected]>
…lute-import-paths

Use absolute jsonnet import paths
Create a metric node_zfs_zpool_state.

Signed-off-by: Artur Molchanov <[email protected]>
This change fixes the log message for the filesystem ignored-fs-types flag so that it prints that flag instead of the mountpoints flag.

Signed-off-by: xinau <[email protected]>
Signed-off-by: Trey Dockendorf <[email protected]>
I have rewritten all CGO dependencies for OpenBSD amd64
in pure Go, to be able to cross-compile node_exporter.

Signed-off-by: ston1th <[email protected]>
Signed-off-by: ston1th <[email protected]>
Signed-off-by: ston1th <[email protected]>
new type: `netDevStats map[string]map[string]uint64`

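A minimal sketch of how a nested map of this shape can be filled and read. The type definition is taken from the commit message; the reading of the keys as device name, then statistic name, and the `set` helper, device names, and statistic names are assumptions made for illustration and are not taken from the node_exporter source.

```go
package main

import "fmt"

// netDevStats has the shape named in the commit message. Treating the outer
// key as the device and the inner key as the statistic is an assumption.
type netDevStats map[string]map[string]uint64

// set is a hypothetical helper that creates the inner map on first use,
// because writing to a nil inner map would panic.
func (s netDevStats) set(device, stat string, value uint64) {
	if s[device] == nil {
		s[device] = make(map[string]uint64)
	}
	s[device][stat] = value
}

func main() {
	stats := netDevStats{}
	stats.set("em0", "receive_bytes", 1024)
	stats.set("em0", "transmit_bytes", 2048)
	stats.set("lo0", "receive_bytes", 64)

	// Reading a missing device or statistic yields the zero value, not a panic.
	fmt.Println(stats["em0"]["receive_bytes"], stats["em1"]["receive_bytes"])
}
```
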
Signed-off-by: ston1th <[email protected]>
The txt was changed to rst:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/accounting/psi.rst

But it's probably better to link to the rendered docs, since the link
should be more stable.

Signed-off-by: Louis Taylor <[email protected]>
* Modest doc improvements

Signed-off-by: Anthony D'Atri <[email protected]>
Move end-user install instructions to the top of the README.
* Add a Docker Compose example.
* Improve some wording.
* Link to the Cloud Alchemy Ansible role.
* Update to git clone method for dev/building

Signed-off-by: Ben Kochie <[email protected]>
Signed-off-by: kamijin_fanta <[email protected]>
jordy1024 and others added 23 commits October 24, 2021 12:48
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling.

Signed-off-by: bawenmao <[email protected]>
Upstream is replacing `golint` with `revive`.
* Cleanup unused mixin go files.

Signed-off-by: Ben Kochie <[email protected]>
* Exclude mountpoints under /run/credentials

Signed-off-by: ml <[email protected]>
Add a DMI collector to expose the Desktop Management Interface (DMI)
info from `/sys/class/dmi/id/`. This will expose information about the
BIOS, mainboard, chassis, and product.

Closes: prometheus#303
Signed-off-by: Benjamin Drung <[email protected]>
* feat: new collector about thermal conditions on macOS

Signed-off-by: STRRL <[email protected]>
* Extract powersupply linux code from collector common file.
* Add Darwin powersupply collector.

Signed-off-by: Alessio Caiazza <[email protected]>
The master branch of `ethtool` merged the fix for
safchain/ethtool#39

Signed-off-by: Benjamin Drung <[email protected]>
Disable `collector/zfs_linux_test.go` in case `!nozfs` is set to
completely disable ZFS.

Signed-off-by: Benjamin Drung <[email protected]>
Add test case for ethtool metrics with leading spaces reported in prometheus#2185:

```
$ ethtool -S
NIC statistics:
     Tx Queue#: 0
       TSO pkts tx: 0
       TSO bytes tx: 0
       ucast pkts tx: 20487
       ucast bytes tx: 1908107
       mcast pkts tx: 83
       mcast bytes tx: 5906
       bcast pkts tx: 4
       bcast bytes tx: 168
       pkts tx err: 0
       pkts tx discard: 0
       drv dropped tx total: 0
          too many frags: 0
          giant hdr: 0
          hdr err: 0
          tso: 0
       ring full: 0
       pkts linearized: 0
       hdr cloned: 0
       giant hdr: 0
     Rx Queue#: 0
       LRO pkts rx: 0
       LRO byte rx: 0
       ucast pkts rx: 25086
       ucast bytes rx: 2404103
       mcast pkts rx: 0
       mcast bytes rx: 0
       bcast pkts rx: 0
       bcast bytes rx: 0
       pkts rx OOB: 0
       pkts rx err: 0
       drv dropped rx total: 0
          err: 0
          fcs: 0
       rx buf alloc fail: 0
     tx timeout count: 0
```

Bug: prometheus#2185
Signed-off-by: Benjamin Drung <[email protected]>
'iowait' and 'steal' indicate specific idle/wait states, which shouldn't
be counted into CPU Utilisation. Also see
prometheus-operator/kube-prometheus#796 and
kubernetes-monitoring/kubernetes-mixin#667.

Per the iostat man page:

%idle
    Show the percentage of time that the CPU or CPUs were idle and the
    system did not have an outstanding disk I/O request.

%iowait
     Show the percentage of time that the CPU or CPUs were idle during
     which the system had an outstanding disk I/O request.

%steal
     Show the percentage of time spent in involuntary wait by the
     virtual CPU or CPUs while the hypervisor was servicing another
     virtual processor.

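As a rough illustration of the definition argued for above, the following sketch computes utilisation as the share of CPU time spent in modes other than `idle`, `iowait`, and `steal`. The function name and the input map are assumptions for this example; the actual change is made in the mixin's queries, not in Go code.

```go
package main

import "fmt"

// utilisation returns the fraction of CPU time spent doing work.
// modeSeconds maps a CPU mode name to the seconds spent in that mode over
// some interval. Time in idle, iowait, and steal is treated as not busy.
func utilisation(modeSeconds map[string]float64) float64 {
	var total, busy float64
	for mode, seconds := range modeSeconds {
		total += seconds
		switch mode {
		case "idle", "iowait", "steal":
			// Waiting states: counted in the total but not as busy time.
		default:
			busy += seconds
		}
	}
	if total == 0 {
		return 0 // avoid dividing by zero when no samples are present
	}
	return busy / total
}

func main() {
	sample := map[string]float64{
		"user":   40,
		"system": 10,
		"idle":   35,
		"iowait": 10,
		"steal":  5,
	}
	fmt.Printf("%.2f\n", utilisation(sample))
}
```

In this sample, counting `iowait` and `steal` as busy time would report 65% utilisation, while excluding them reports 50%.
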
Signed-off-by: Julian Wiedmann <[email protected]>
LLVM/Clang 11.0 adds a `-Wundef-prefix=TARGET_OS_` build flag which
breaks this build flag.

Signed-off-by: Ben Kochie <[email protected]>
* Add clocksource metrics to time collector

This closes prometheus#1336

Signed-off-by: Johannes 'fish' Ziemke <[email protected]>
Use SysctlTimeval from the golang.org/x/sys/unix package to
simplify the implementation of the boottime collector for the BSDs and
allow building it without cgo.

Tested on macOS 11.6, FreeBSD 13 and OpenBSD 7.

Signed-off-by: Tobias Klauser <[email protected]>
Sanitizing the metric names can lead to duplicate metric names:

```
caller=level.go:63 level=error caller="error gathering metrics: [from Gatherer #2] collected metric \"node_ethtool_giant_hdr\" { label:<name:\"device\" value:\"ens192\" > untyped:<value:0" msg=" > } was collected before with the same name and label values"
```

Generate a map from the sanitized metric names to the metric names from
ethtool. In case of duplicate sanitized metric names drop both metrics,
because it is unknown which one to take.

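The approach described above can be sketched like this. It is a simplified illustration, not the collector's implementation: the `sanitize` function, its regular expression, and `uniqueSanitized` are hypothetical stand-ins, and the sample names are taken from the `ethtool -S` output in the related test case.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidChars matches runs of characters that this sketch does not allow
// in a metric name. The exact rule used by the collector may differ.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]+`)

// sanitize is a hypothetical stand-in for the collector's name sanitizer.
func sanitize(name string) string {
	trimmed := strings.TrimSpace(name)
	return strings.ToLower(invalidChars.ReplaceAllString(trimmed, "_"))
}

// uniqueSanitized maps each sanitized name to its original name. When two
// or more originals collide on the same sanitized name, all of them are
// dropped, because it is unknown which one to keep.
func uniqueSanitized(names []string) map[string]string {
	result := make(map[string]string)
	dropped := make(map[string]bool)
	for _, name := range names {
		s := sanitize(name)
		if dropped[s] {
			continue
		}
		if _, seen := result[s]; seen {
			delete(result, s)
			dropped[s] = true
			continue
		}
		result[s] = name
	}
	return result
}

func main() {
	names := []string{"   giant hdr", "giant hdr", "ring full"}
	for sanitized, original := range uniqueSanitized(names) {
		fmt.Printf("%s <- %q\n", sanitized, original)
	}
}
```

Here both `giant hdr` entries collapse to `giant_hdr` and are dropped, so only `ring_full` remains.
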
Fixes: prometheus#2185
Signed-off-by: Benjamin Drung <[email protected]>
ethtool got its first release.

Signed-off-by: Benjamin Drung <[email protected]>
The new `lnstat` collector produces a high number of metrics, per-cpu,
and results in approximately double the number of metrics previously
scraped. For example, a typical server with 64 cores produces 3832
lnstat metrics compared to 4147 metrics for the remaining collectors.

Therefore disable the `lnstat` collector by default.

Signed-off-by: Benjamin Drung <[email protected]>
TCP timeouts count is a useful signal to show
abnormal network performance and is another
signal to aid debugging. This metric can be
used to generate proactive alerts for host
network namespace workloads.

Signed-off-by: Martin Kennelly <[email protected]>
NOTE: In order to support globs in the textfile collector path, filenames exposed by
      `node_textfile_mtime_seconds` now contain the full path name.

* [CHANGE] Add path label to rapl collector prometheus#2146
* [CHANGE] Exclude filesystems under /run/credentials prometheus#2157
* [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ prometheus#1771
* [FEATURE] Add darwin powersupply collector prometheus#1777
* [FEATURE] Add support for monitoring GPUs on Linux prometheus#1998
* [FEATURE] Add Darwin thermal collector prometheus#2032
* [FEATURE] Add os release collector prometheus#2094
* [FEATURE] Add netdev.address-info collector prometheus#2105
* [ENHANCEMENT] Support glob textfile collector directories prometheus#1985
* [ENHANCEMENT] ethtool: Expose node_ethtool_info metric prometheus#2080
* [ENHANCEMENT] Use include/exclude flags for ethtool filtering prometheus#2165
* [ENHANCEMENT] Add flag to disable guest CPU metrics prometheus#2123
* [ENHANCEMENT] Add DMI collector prometheus#2131
* [ENHANCEMENT] Add threads metrics to processes collector prometheus#2164
* [ENHANCEMENT] Reduce timer GC delays in the Linux filesystem collector prometheus#2169
* [BUGFIX] ethtool: Sanitize metric names prometheus#2093
* [BUGFIX] Fix ethtool collector for multiple interfaces prometheus#2126
* [BUGFIX] Fix possible panic on macOS prometheus#2133
* [BUGFIX] Collect flag_info and bug_info only for one core prometheus#2156

Signed-off-by: Ben Kochie <[email protected]>
@kirandark kirandark requested a review from lbaklan-af January 10, 2022 14:36
@kirandark kirandark force-pushed the DEVOPS-7488-update-node-exporter branch from 6373ef9 to a141dca Compare January 10, 2022 14:43
@kirandark kirandark force-pushed the DEVOPS-7488-update-node-exporter branch 2 times, most recently from 5dec9a3 to baaaf91 Compare January 11, 2022 14:49
@kirandark kirandark force-pushed the DEVOPS-7488-update-node-exporter branch from baaaf91 to 78d7730 Compare January 12, 2022 19:14