add service cmd flag and custom labels, add process metric (process_group_count) add process custom labels #1194

peekjef72 (Contributor) commented Apr 22, 2023

Replaces previous PRs #1180 and #1185.

This PR adds some features to the service collector (see service):

  • Add a config parameter, collector.service.services-list, holding a comma-separated list of service names:
collector:
  service:
    services-list: windows_exporter, winRM, pushprox_client, Dhcp

This parameter is only used if services-where is not set.
The list is used to build a services-where query on Name, and is equivalent to:

collector:
  service:
    services-where: "Name = 'elmt_1' or Name = 'elmt_X' or ..."
  • Add a new parameter, services, that can only be set in the config file.
    It allows setting any number of custom label values for each service,
    e.g.:
collector:
  service:
    services:
      windows_exporter:
        application: prometheus
        custom1: val1
      pushprox_client:
        application: prometheus
        custom1: val1
      winRM:
        application: windows
        custom1: val2
      Dhcp:
        application: windows
        custom1: val3
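The first item above says that services-list is turned into a services-where query on Name. A minimal Go sketch of that translation follows; buildServicesWhere is a hypothetical name and is not taken from the PR. It trims whitespace around each name and skips empty entries, but it does not escape quotes inside service names.

```go
package main

import (
	"fmt"
	"strings"
)

// buildServicesWhere (hypothetical) converts a comma-separated services-list
// value into a WQL condition of the form "Name = 'a' or Name = 'b'".
// Quotes inside service names are not escaped in this sketch.
func buildServicesWhere(list string) string {
	var clauses []string
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue // tolerate stray or trailing commas
		}
		clauses = append(clauses, fmt.Sprintf("Name = '%s'", name))
	}
	return strings.Join(clauses, " or ")
}

func main() {
	// Prints: Name = 'windows_exporter' or Name = 'winRM' or Name = 'pushprox_client' or Name = 'Dhcp'
	fmt.Println(buildServicesWhere("windows_exporter, winRM, pushprox_client, Dhcp"))
}
```

Joining with "or" mirrors the equivalent services-where value shown above; WQL keywords are case-insensitive, so "OR" would work equally well.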

Use case: build a generic Prometheus alert for services that are not running, and use the associated labels to drive specific behavior. In my case, they are used to route to the relevant documentation based on context.

Label names must be identical for every service; label names that are not shared by all services are removed!
This parameter has a lower priority than services-where and services-list.
It accepts either a dict (see the example above) or a list; in the latter case it behaves like services-list.
e.g.:

collector:
  service:
    services:
      - dhcp
      - pushprox_client
      - windows_exporter
      - winRM
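The two rules for the services parameter (it may be a dict or a list, and label names not shared by every service are dropped) can be sketched in Go as follows. This assumes the config file has already been decoded into map[string]any or []any; normalizeServices, commonLabelNames and pruneLabels are hypothetical names, not the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// normalizeServices (hypothetical) accepts the decoded "services" value:
// a map of service name -> label map, or a plain list of service names.
// A list yields services with no custom labels, like services-list.
func normalizeServices(raw any) map[string]map[string]string {
	out := map[string]map[string]string{}
	switch v := raw.(type) {
	case []any:
		for _, name := range v {
			out[fmt.Sprint(name)] = map[string]string{}
		}
	case map[string]any:
		for name, labels := range v {
			set := map[string]string{}
			if m, ok := labels.(map[string]any); ok {
				for k, val := range m {
					set[k] = fmt.Sprint(val)
				}
			}
			out[name] = set
		}
	}
	return out
}

// commonLabelNames returns, sorted, the label names defined on every service.
func commonLabelNames(services map[string]map[string]string) []string {
	counts := map[string]int{}
	for _, labels := range services {
		for k := range labels {
			counts[k]++
		}
	}
	var names []string
	for k, n := range counts {
		if n == len(services) {
			names = append(names, k)
		}
	}
	sort.Strings(names)
	return names
}

// pruneLabels deletes every label whose name is not shared by all services.
func pruneLabels(services map[string]map[string]string) {
	keep := map[string]bool{}
	for _, k := range commonLabelNames(services) {
		keep[k] = true
	}
	for _, labels := range services {
		for k := range labels {
			if !keep[k] {
				delete(labels, k) // deleting during range is safe in Go
			}
		}
	}
}

func main() {
	services := normalizeServices(map[string]any{
		"winRM": map[string]any{"application": "windows", "custom1": "val2"},
		"Dhcp":  map[string]any{"application": "windows"},
	})
	pruneLabels(services)
	fmt.Println(services["winRM"]) // map[application:windows]
}
```

Dropping the non-shared names keeps the label set identical across all windows_service_* series produced for the configured services.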

Then a generic alert for services can be written:

groups:
- name: Window Server Alerts
  rules:

  # Sends an alert when any monitored service is not in the running state for 3 minutes.
  - alert: WindowServiceNotRunning
    expr: windows_service_state{state="running"} == 0
    for: 3m
    labels:
      severity: high
    annotations:
      summary: "Service {{ $labels.name }} for {{ $labels.application }} down for 3 min."
      description: "Service {{ $labels.name }} for Application {{ $labels.application }} on instance {{ $labels.instance }} has been down for more than 3 minutes."

This PR also adds features to the process collector:
it allows defining "process groups" and setting any number of custom label values for each group,
e.g. (see process):

collector:
  process:
    processes:
      browsers:
        include: "(?i)(firefox|chrome).*"
        exclude: "(?i)safari"
        application: browsers
        custom1: val4
      Visual Studio Code:
        include: "(?i)code.*"
        application: "vscode"
        custom1: val5

The custom labels are added to every windows_process_* metric:

# HELP windows_process_handles Total number of handles the process has open. This number is the sum of the handles currently open by each thread in the process.
# TYPE windows_process_handles gauge
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="11184"} 257
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="11588"} 256
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="12428"} 320
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="12536"} 333
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="12544"} 323
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="13620"} 346
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="16700"} 366
....
windows_process_handles{application="vscode",creating_process_id="10392",custom1="val5",process="Code",process_id="9172"} 2474
windows_process_handles{application="vscode",creating_process_id="22356",custom1="val5",process="Code",process_id="12076"} 182
windows_process_handles{application="vscode",creating_process_id="22356",custom1="val5",process="Code",process_id="13124"} 186
windows_process_handles{application="vscode",creating_process_id="22356",custom1="val5",process="Code",process_id="24212"} 247
windows_process_handles{application="vscode",creating_process_id="9172",custom1="val5",process="Code",process_id="10664"} 296
windows_process_handles{application="vscode",creating_process_id="9172",custom1="val5",process="Code",process_id="11028"} 229
windows_process_handles{application="vscode",creating_process_id="9172",custom1="val5",process="Code",process_id="13516"} 236
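Each series above can be read as the process's own labels merged with its group's custom labels. The sketch below shows that merge and renders one exposition-format line. mergeLabels and formatSample are hypothetical helpers; letting the series' own label win on a name clash is an assumption, and %q quoting coincides with the exposition format's escaping only for simple values like these.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeLabels (hypothetical) copies the group's custom labels into the
// series labels. On a name clash the series' own label is kept; that
// precedence is an assumption of this sketch.
func mergeLabels(series, custom map[string]string) map[string]string {
	out := make(map[string]string, len(series)+len(custom))
	for k, v := range custom {
		out[k] = v
	}
	for k, v := range series {
		out[k] = v // built-in labels override custom ones
	}
	return out
}

// formatSample renders metric{k="v",...} value with label names sorted.
func formatSample(metric string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%q", k, labels[k])
	}
	return fmt.Sprintf("%s{%s} %g", metric, strings.Join(parts, ","), value)
}

func main() {
	series := map[string]string{"process": "firefox", "process_id": "11184", "creating_process_id": "15988"}
	custom := map[string]string{"application": "browsers", "custom1": "val4"}
	fmt.Println(formatSample("windows_process_handles", mergeLabels(series, custom), 257))
	// windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="11184"} 257
}
```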

It also adds a metric named windows_process_group_count that counts the number of matching processes in each group:

# HELP windows_process_group_count Number of processes found for the matching patterns.
# TYPE windows_process_group_count gauge
windows_process_group_count{application="browsers",custom1="val4",group="browsers"} 29
windows_process_group_count{application="vscode",custom1="val5",group="Visual Studio Code"} 12

This metric can be used to define a generic alert that fires when a group's count equals 0, instead of a specific alert per process such as absent(windows_process_handles{process="firefox"}) == 1.

peekjef72 requested a review from a team as a code owner on April 22, 2023 14:13
jkroepke (Member) commented:

> Use case: allow to build a generic Prometheus alert on not running service and to use the specified associated labels to drive a specific behavior. For me they are used to route for a specific documentation based on context.

Why aren't you using relabel config for this?

metric_relabel_configs:
- source_labels: [__name__, name]
  regex: windows_service_state;windows_exporter
  target_label: custom1
  replacement: val1
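To make the suggestion concrete, here is a simplified sketch of what such a rule does to one series under Prometheus's default replace action: the values of source_labels are joined with ";" (the default separator), the regex must match the whole joined string, and on a match target_label is set to replacement. Capture-group expansion in replacement and the other relabel actions are deliberately left out; relabelReplace is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabelReplace (hypothetical) applies a simplified "replace" relabel rule
// in place. The regex is anchored at both ends, as Prometheus does; the
// replacement is used literally (no $1-style expansion in this sketch).
func relabelReplace(labels map[string]string, sources []string, regex, target, replacement string) {
	values := make([]string, len(sources))
	for i, s := range sources {
		values[i] = labels[s] // a missing label contributes an empty string
	}
	re := regexp.MustCompile("^(?:" + regex + ")$")
	if re.MatchString(strings.Join(values, ";")) {
		labels[target] = replacement
	}
}

func main() {
	series := map[string]string{"__name__": "windows_service_state", "name": "windows_exporter", "state": "running"}
	relabelReplace(series, []string{"__name__", "name"}, "windows_service_state;windows_exporter", "custom1", "val1")
	fmt.Println(series["custom1"]) // val1
}
```

Each rule of this kind covers one service name, which is why the number of rules grows with the number of services (and hosts) that need distinct labels.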

peekjef72 (Contributor Author) replied:

It is easier for us to provide specific configuration data for each host: the files are generated with Ansible and deployed with the WinRM module.
We have more than 500 hosts with specific "application" switches; metric relabel rules would have to be specific to each service and each host.
But you are right, it is possible to do so!

peekjef72 closed this on Jun 18, 2023
peekjef72 force-pushed the add_service_custom_labels_v2 branch 2 times, most recently from 11543a9 to 6ba0297 on June 18, 2023 09:10
peekjef72 (Contributor Author) commented:

I can't understand why it was closed...

peekjef72 reopened this on Jun 18, 2023
ordimans commented:

Can we do the same thing for processes?
I have windows_exporter running on several computers.
But I don't know how to filter metrics for one server in Grafana, because there is no hostname or custom label on all metrics.
Am I wrong?

peekjef72 (Contributor Author) replied:

Of course it's possible with a little work, but it's pointless unless the current PR is accepted!
Right now it isn't, so it probably never will be!
In addition, the base (master branch) has changed since this PR was published, so the code would need a new review before the PR could be accepted...

peekjef72 closed this on Oct 21, 2023
peekjef72 force-pushed the add_service_custom_labels_v2 branch from a2274d0 to 6ba0297 on October 21, 2023 15:56
peekjef72 reopened this on Oct 23, 2023
peekjef72 changed the title from "add service cmd flag and custom labels" to "add service cmd flag and custom labels, add process metric (process_group_count) add process custom labels"
peekjef72 (Contributor Author) commented:

I've rebased the code/branch for the service collector.
I've also added a part for the process collector (see the docs on the dev branch).

Signed-off-by: Peekjef72 <[email protected]>
jkroepke (Member) commented:

Hi, you can use Prometheus's relabeling feature to set custom labels. It should be preferred over custom logic in this exporter.

peekjef72 (Contributor Author) replied:

As you already mentioned, and you are right, it is possible to do so. But only if you have 10 hosts to collect from, or if you do everything by hand.

Now imagine having 3,000 hosts! Then I no longer share your point of view.

Almost every server has its own configuration listing the processes and services to monitor. This configuration lives in files, generally generated by Ansible from a CMDB (a windows_exporter file and a job part).
So we have two files for each host: one for windows_exporter and one for the Prometheus config.

With the solution you propose, you would need a Prometheus configuration file with a gigantic relabel rule for the job, setting the labels per host and per service, evaluated at every scrape of every host!
With the PR's solution, which remains optional, it is enough to set the labels once for the host concerned when its configuration is first generated.

To sum up:

  • On the one hand (this PR):
    • the windows_exporter configuration is generated only once; the labels do not matter for, and do not interfere with, scraping.
    • to add or remove a host, add or remove its file (file_sd_configs.files) on the Prometheus side.
    • no relabel evaluation, therefore no additional CPU, therefore no additional electricity (yes, green IT, you know ;)!).
  • On the other hand (relabeling in Prometheus):
    • for each host, the global rule containing the metric_relabel_configs must be regenerated for the job,
    • or there must be one job per host with a specific rule.
    • In all cases, the rules must be evaluated at each scrape, costing CPU and therefore electricity.

jkroepke (Member) commented Jan 28, 2024

Hi,

I understand your case.

In summary, Grafana Agent is a better solution for you, since you could do the relabeling on the local machine, and you could eliminate your Prometheus scrape configuration by enabling the remote write endpoint. Grafana Agent also has remote configuration capabilities, which would reduce the Ansible part to installation only. That should work for 3,000 hosts; it is what we do with approx. 950 hosts at the moment.

The main purpose of windows_exporter is gathering metrics. Adding business logic such as process groups increases the complexity. Gathering metrics on Windows is already complex enough, since there are many APIs (WMI, performance counters, Registry) to observe.

In any case, to continue here, the merge conflicts need to be resolved, and the custom labels and process group features should be submitted as separate pull requests.

peekjef72 (Contributor Author) replied:

Thanks for the reply.

I am not sure I should continue developing this branch if there is no chance that the additions will one day be accepted.
So, should I continue?

github-actions bot added the Stale label on Apr 28, 2024
github-actions bot closed this on May 28, 2024