This repository has been archived by the owner on Mar 11, 2022. It is now read-only.

Add the ability to hardcode values #5

Open
webmat opened this issue Apr 8, 2020 · 4 comments

Comments

@webmat
Contributor

webmat commented Apr 8, 2020

Some fields need to be hardcoded per source.

Note that since ecs-mapper doesn't support complex logic (no conditionals), I don't expect this to be used to populate all categorization fields. But it's still very common that a single source log will only ever map to one event.type or that we'll be able to hardcode event.dataset or event.module with it.

@webmat
Contributor Author

webmat commented Apr 10, 2020

Note @tonymeehan that here we could simply add support for one more column, perhaps named "static_value".

Then valid lines would no longer be only the ones with both "source_field" and "destination_field":

| source_field | static_value | destination_field | outcome |
|---|---|---|---|
| present | | present | valid |
| | present | present | valid |
| present | present | * | error |
| present | | | skipped |
| | present | | skipped |
| | | present | skipped |

@tonymeehan
Contributor

tonymeehan commented Apr 10, 2020

I like the suggestion. I'm thinking about two things.

First, should static_value be on the right of destination_field? In most cases, users will likely be mapping fields instead of setting static values, so it reads a bit easier I think if it's on the right.

I also think there's another error case where all three columns are present since it's ambiguous what to do.

| source_field | destination_field | static_value | outcome |
|---|---|---|---|
| present | present | | valid |
| | present | present | valid |
| present | | present | error |
| present | present | present | error |
| present | | | skipped |
| | present | | skipped |
| | | present | skipped |
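As a rough illustration of the rules in the table above, here is a minimal sketch in Python. It is not ecs-mapper code: the function name `classify_row` and the choice to treat whitespace-only cells as empty are assumptions made for the example, as is skipping a row with no cells filled in.

```python
def classify_row(row):
    """Classify one CSV row (a dict of column name -> cell text).

    Returns "valid", "error", or "skipped", following the table above.
    Hypothetical helper for illustration only, not part of ecs-mapper.
    """
    def present(name):
        # Assumption: a missing column or a whitespace-only cell counts as empty.
        value = row.get(name)
        return value is not None and value.strip() != ""

    has_source = present("source_field")
    has_destination = present("destination_field")
    has_static = present("static_value")

    # A source field together with a static value is ambiguous,
    # whether or not a destination is given.
    if has_source and has_static:
        return "error"
    # Exactly one input (source or static) plus a destination is a usable line.
    if has_destination and (has_source or has_static):
        return "valid"
    # Anything else (a single filled cell, or nothing at all) is skipped.
    return "skipped"
```

For example, `classify_row({"source_field": "src_ip", "destination_field": "source.ip"})` returns `"valid"`, while a row with only `static_value` filled in returns `"skipped"`.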

The second thing I'm thinking of is how to handle the static value. I'm thinking this could work:

| source_field | destination_field | static_value | outcome |
|---|---|---|---|
| present | present | "static value" | valid |
| present | present | [ "static value", "static value 2" ] | valid |
| present | present | "static value | error |
| present | present | [ "static value, "static value 2" ] | error |
| present | present | [ , "static value 2" ] | error |
| present | present | [ "static value", "static value 2" | error |
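The quoted-string and bracketed-list notation above happens to be valid JSON, so one possible way to check it is to hand the cell to a JSON parser. The sketch below assumes that reading; the thread does not say the cell should be parsed as JSON, and `parse_static_value` is a hypothetical name.

```python
import json


def parse_static_value(cell):
    """Parse a static_value cell into a string or a list of strings.

    Assumption: the cell uses JSON syntax, i.e. "text" or [ "a", "b" ].
    Raises ValueError for anything else, including the malformed
    examples from the table above.
    """
    try:
        value = json.loads(cell)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed static value: {cell!r}") from exc

    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    # Numbers, objects, nested lists, etc. are outside what the table describes.
    raise ValueError(f"unsupported static value: {cell!r}")
```

With this approach the two valid rows parse to `"static value"` and `["static value", "static value 2"]`, and each of the four error rows raises `ValueError`.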

@webmat
Contributor Author

webmat commented Apr 10, 2020

Well the order of the columns doesn't matter for the tool. Users are even free to have all of the columns they want, for additional notes of any kind. Only the KNOWN_CSV_HEADERS are read.
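To illustrate why column order and extra columns don't matter when rows are read by header name, here is a small Python sketch. It is not the tool's code: `KNOWN_HEADERS` is a stand-in list built from the column names discussed in this thread (including the proposed "static_value"), and `read_known_columns` is a hypothetical helper.

```python
import csv
import io

# Assumption: a stand-in for the tool's known headers, using the column
# names mentioned in this thread. The real list may differ.
KNOWN_HEADERS = [
    "source_field",
    "format_action",
    "static_value",
    "destination_field",
    "copy_action",
]


def read_known_columns(csv_text):
    """Return one dict per CSV row, keeping only the known headers.

    Cells are looked up by header name, so column order is irrelevant
    and any extra columns (notes, etc.) are silently ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: (row.get(name) or "") for name in KNOWN_HEADERS}
        for row in reader
    ]
```

A sheet exported as `notes,destination_field,source_field` yields the same mapping as one exported as `source_field,destination_field`; the `notes` column is dropped and missing known columns come back as empty strings.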

The order we put the columns in the sample spreadsheet can still be adjusted for clarity. It's true that most lines will be meant to handle a source_field => destination_field conversion, and only very few are expected to hardcode.

But I think of the flow of data from left to right:

source_field => format_action => destination_field

And now

static_value => destination_field

So I thought these columns would make sense:

source_field, format_action, static_value, destination_field, copy_action

We can reinforce proper usage by improving the example section in the example/ directory, too, with a concrete example that takes all of this thinking into account.

@webmat
Contributor Author

webmat commented Apr 14, 2020

Looping back on this, I hadn't thought about capturing single values vs arrays of values, when users enter static values. Is this what you're describing with the square brackets and double quotes?

Here we'll need to find something that's really intuitive from the spreadsheet's POV. Then we'll need to look at how the major spreadsheets * manage the encoding to CSV. I could see them getting the details wrong, when we start adding quotes & stuff.
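To make the quoting concern concrete, here is what Python's standard `csv` module does with a cell that itself contains double quotes. This only shows one CSV writer's behaviour (doubling embedded quotes and wrapping the cell in quotes); whether each spreadsheet export does the same is exactly what would need checking. The helper name `to_csv_line` is made up for the example.

```python
import csv
import io


def to_csv_line(cells):
    """Encode one row with Python's csv writer and return it without the newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(cells)
    return buffer.getvalue().rstrip("\n")


# A row with an empty source_field, a destination, and a quoted static value.
single = to_csv_line(["", "event.dataset", '"static value"'])
# single == ',event.dataset,"""static value"""'

# The array notation contains both quotes and a comma inside one cell.
array = to_csv_line(["", "event.type", '[ "a", "b" ]'])
# array == ',event.type,"[ ""a"", ""b"" ]"'

# Reading the line back recovers the cell exactly as the user typed it.
round_trip = next(csv.reader(io.StringIO(array)))
# round_trip == ['', 'event.type', '[ "a", "b" ]']
```

So a user who types `"static value"` into a cell ends up with three consecutive quote characters on each side in the CSV file, which is the kind of detail that could go wrong if an export or a hand edit deviates from this convention.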

I'm tempted to say let's start with single values and not worry about arrays. Arrays are important for categorization with event.category and event.type. However, I don't think ecs-mapper should support conditionals. And I think in most cases a given event stream will contain more than one event category, and different event types. In other words, I don't think users will be able to populate categorization fields properly from this spreadsheet / CSV. This more fine-grained identification of events will have to happen in their actual pipeline, not in this starter tool.

* Those I would consider: Excel, Google Docs, Apple Numbers
