Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field validator should check for ECS fields marked as arrays #615

Closed
Tracked by #399
andrewkroh opened this issue Dec 8, 2021 · 6 comments
Closed
Tracked by #399

Field validator should check for ECS fields marked as arrays #615

andrewkroh opened this issue Dec 8, 2021 · 6 comments
Labels
enhancement New feature or request Team:Ecosystem Label for the Packages Ecosystem team

Comments

@andrewkroh
Copy link
Member

andrewkroh commented Dec 8, 2021

It would be helpful if elastic-package could trigger errors on ECS event fields that are not arrays. For example, host.ip is specified to be an array so events that contain the field should be an array of ips.

Screen Shot 2021-12-08 at 8 49 04 AM

The generated YAML specification indicates the array type. https://github.com/elastic/ecs/blob/9b463c9116eb594a3be10db01c3236fd6b44b996/generated/ecs/ecs_flat.yml#L4536-L4543

host.ip:
  dashed_name: host-ip
  description: Host ip addresses.
  flat_name: host.ip
  level: core
  name: ip
  normalize:
  - array
  short: Host ip addresses.
  type: ip
@andrewkroh andrewkroh added the enhancement New feature or request label Dec 8, 2021
@andrewkroh
Copy link
Member Author

Another validation that would be nice to apply is for the allowed_values listed in the schema for fields like event.{kind,type,category}.

https://github.com/elastic/ecs/blob/4cbe6f7f9cb78160153659802eb66dd135fb4677/schemas/event.yml#L75-L76

@andrewkroh
Copy link
Member Author

pattern has now been added to the ECS schema. See elastic/ecs#1741 elastic/ecs#1834.

So the elastic-package tool could now check data against the pattern defined in ECS for a given field.

@jsoriano
Copy link
Member

Hey @andrewkroh,

I am implementing the check for arrays in #765, and I have found that pipeline tests of several packages in the integrations repository (35 out of 140) would fail if we enable this. So I wonder if we should do it.

What would be the advantage of checking that fields are formatted as arrays? In principle for Elasticsearch it is the same.

This was referenced Mar 31, 2022
@andrewkroh
Copy link
Member Author

What would be the advantage of checking that fields are formatted as arrays? In principle for Elasticsearch it is the same.

You're right, for Elasticsearch indexing it doesn't make a difference whether the field is a scalar or an array. ECS added an array indicator to the schema primarily for things other than Elasticsearch that might be producing or consuming the data.

I think it would be advantageous if the _source data in events that claim to use ECS fields actually matches the field definition. That consistency makes it easier for anything that's trying to consume the ECS data. That includes the developers of ingest pipelines as they move between various integrations. And it's one less thing I need to spend time manually inspecting during integration PR reviews.

One of the earliest issues I recall while implementing ECS and having consumers in Kibana was with TypeScipt code in Kibana that was generated/derived from ECS. There was a mismatch between the _source data and the generated types due to arrays. That code now handles both IIRC, but it would nice if we were consistent with our types so future developers can write simpler code to read/process ECS data.

(35 out of 140) would fail if we enable this

Is there a way we can enable this in permissive manner while we fix those 35? It's kind of a general problem for any case where we enable stricter validation (like checking that event.* is defined in fields).

@jsoriano
Copy link
Member

jsoriano commented Apr 6, 2022

That consistency makes it easier for anything that's trying to consume the ECS data.

I was wondering if Elasticsearch provides something to help here, and I have found that there is a meta issue about improving the situation for single vs multi-valued fields. elastic/elasticsearch#80825

Is there a way we can enable this in permissive manner while we fix those 35? It's kind of a general problem for any case where we enable stricter validation (like checking that event.* is defined in fields).

There isn't at the moment, but this is something that has already appeared in other discussions. Marcin recently open an issue about different validation modes (elastic/package-spec#313).

I will put this issue on hold till we have something like this.

@jsoriano
Copy link
Member

jsoriano commented Sep 7, 2022

Implemented as part of package spec 2.0.0. Packages will be able to fix normalization issues as they upgrade to this format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Team:Ecosystem Label for the Packages Ecosystem team
Projects
None yet
Development

No branches or pull requests

3 participants