Added github events to s3 operation #24
Updated from 3a0b8d9 to 59f86e0
src/call/github-events-to-s3.ts (outdated)
```ts
import { PutObjectCommand } from '@aws-sdk/client-s3';

const putObjectCommand = new PutObjectCommand({
  Bucket: 'github-events',
  Body: JSON.stringify(context),
  Key: `${year}-${month}-${day}/${context.name}/${context.id}`,
});
```
Can you please share a sample event and its S3 prefix?
A sample event would look like:
```
{
  name: 'issues',
  id: '54518230-811c-11ef-8496-ec9c5edfd972',
  payload: {
    ...
  },
}
```
And the S3 path would be: `s3://github-events/2024-10-08/issues/54518230-811c-11ef-8496-ec9c5edfd972`
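The snippet under review builds the key from `year`, `month`, and `day` variables whose definitions are not shown. Below is a minimal sketch of one way to derive them. It assumes the date is taken in UTC and zero-padded so keys sort lexicographically in date order. `GitHubEvent`, `formatUtcDate`, and `buildEventKey` are hypothetical names, not the PR's code. One pitfall worth noting: `getUTCMonth()` is zero-based.

```typescript
// Hypothetical shape of the event object passed to the operation.
interface GitHubEvent {
  name: string;
  id: string;
  payload: unknown;
}

// Formats a date as YYYY-MM-DD in UTC, zero-padding month and day so that
// keys sort lexicographically in date order.
function formatUtcDate(date: Date): string {
  const year = date.getUTCFullYear();
  // getUTCMonth() is zero-based (January is 0), so add 1.
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  const day = String(date.getUTCDate()).padStart(2, '0');
  return `${year}-${month}-${day}`;
}

// Builds a key in the date-first layout: <date>/<event name>/<event id>.
function buildEventKey(event: GitHubEvent, date: Date): string {
  return `${formatUtcDate(date)}/${event.name}/${event.id}`;
}

const sample: GitHubEvent = {
  name: 'issues',
  id: '54518230-811c-11ef-8496-ec9c5edfd972',
  payload: {},
};

console.log(buildEventKey(sample, new Date(Date.UTC(2024, 9, 8))));
// → 2024-10-08/issues/54518230-811c-11ef-8496-ec9c5edfd972
```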
Option 1: `s3://github-events/2024-10-08/issues/54518230-811c-11ef-8496-ec9c5edfd972`
Advantages:
- Time-based access: Storing events by date first makes it easier to query or list events for a specific day. If the use case is mainly querying by date or building time-based reports, this structure is beneficial.
- Scalability: Grouping by date keeps each prefix smaller, which keeps per-day listings cheap as the dataset grows.
Disadvantages:
- Querying all events of a particular type (e.g., `issues`) is harder, because the event type comes after the date in the key.
Option 2: `s3://github-events/issues/2024-10-08/54518230-811c-11ef-8496-ec9c5edfd972`
Advantages:
- Event-type-based access: Storing by event type first makes it easier to query all events of a certain type (e.g., all issues), which helps when grouping events by type.
- Organized by type: If we handle multiple event types such as `issues` and `pull_request`, this layout keeps them better organized.
Disadvantages:
- Querying or grouping by date is slightly less efficient, since the date is nested under the event type.
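The trade-off between the two layouts comes down to which queries can be served by a single key prefix (the `Prefix` parameter of S3 `ListObjectsV2`). The sketch below makes that concrete. It is illustrative only: `Layout`, `Query`, and `prefixFor` are hypothetical names, and a `null` result stands for "no single prefix matches", meaning a listing would have to enumerate top-level prefixes or scan and filter client-side.

```typescript
type Layout = 'date-first' | 'type-first';

interface Query {
  eventType?: string; // e.g. 'issues'
  date?: string; // YYYY-MM-DD
}

// Returns the single key prefix that selects exactly the matching objects,
// or null when no single prefix exists for this query under the layout.
function prefixFor(layout: Layout, query: Query): string | null {
  const { eventType, date } = query;
  if (layout === 'date-first') {
    // Option 1 keys look like <date>/<eventType>/<id>.
    if (date && eventType) return `${date}/${eventType}/`;
    if (date) return `${date}/`;
    // Type-only queries have no common prefix; an empty query lists everything.
    return eventType ? null : '';
  }
  // Option 2 keys look like <eventType>/<date>/<id>.
  if (eventType && date) return `${eventType}/${date}/`;
  if (eventType) return `${eventType}/`;
  // Date-only queries have no common prefix; an empty query lists everything.
  return date ? null : '';
}

console.log(prefixFor('date-first', { eventType: 'issues' })); // null
console.log(prefixFor('type-first', { eventType: 'issues' })); // issues/
console.log(prefixFor('type-first', { date: '2024-10-08' })); // null
console.log(prefixFor('date-first', { date: '2024-10-08' })); // 2024-10-08/
```

Both layouts serve "one event type on one day" with a single prefix; they differ only in which one-dimensional query (by type or by date) stays a single prefix listing.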
Changed the PR to match Option 2.
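With Option 2, the upload parameters could be assembled as in the sketch below. This is an assumption-laden sketch, not the PR's final code. `buildPutObjectParams` and `PutObjectParams` are hypothetical. `PutObjectParams` mirrors a subset of the fields accepted by the AWS SDK v3 `PutObjectCommand` input (`Bucket`, `Key`, `Body`, `ContentType`). Setting `ContentType` is an addition not present in the original snippet. The SDK import is left out so the sketch runs on its own.

```typescript
// Hypothetical shape of the event object passed to the operation.
interface GitHubEvent {
  name: string;
  id: string;
  payload: unknown;
}

// Subset of the fields accepted by the AWS SDK v3 PutObjectCommand input.
interface PutObjectParams {
  Bucket: string;
  Key: string;
  Body: string;
  ContentType: string;
}

// Formats a date as zero-padded YYYY-MM-DD in UTC.
function formatUtcDate(date: Date): string {
  const month = String(date.getUTCMonth() + 1).padStart(2, '0'); // zero-based month
  const day = String(date.getUTCDate()).padStart(2, '0');
  return `${date.getUTCFullYear()}-${month}-${day}`;
}

// Builds upload parameters using the Option 2 layout:
// <event name>/<YYYY-MM-DD>/<event id>.
function buildPutObjectParams(event: GitHubEvent, receivedAt: Date): PutObjectParams {
  return {
    Bucket: 'github-events',
    Key: `${event.name}/${formatUtcDate(receivedAt)}/${event.id}`,
    Body: JSON.stringify(event),
    ContentType: 'application/json', // assumption: not in the original snippet
  };
}

const params = buildPutObjectParams(
  { name: 'issues', id: '54518230-811c-11ef-8496-ec9c5edfd972', payload: {} },
  new Date(Date.UTC(2024, 9, 8)),
);
console.log(params.Key); // issues/2024-10-08/54518230-811c-11ef-8496-ec9c5edfd972
```

In the operation itself, the result would be passed as `new PutObjectCommand(params)` to an `S3Client`'s `send` method.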
Also, do we need this?
We don't need it for the current design, but I'd like to keep it just in case and remove it once the Maintainer Dashboard is up.
Signed-off-by: Brandon Shien <[email protected]>
Updated from 59f86e0 to 5dad1fc
```yaml
name: Github Events To S3

events:
  - all
```
Is listening to all events too much at this point?
We are looking to build a data lake, so the more events the better. See the discussion starting here: opensearch-project/opensearch-metrics#76 (comment)
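The question of `events: - all` versus a narrower list can be pictured as a subscription check. This is a minimal sketch that assumes the app loads the YAML config into an object with an `events` list and treats the literal `all` as a wildcard. `OperationConfig` and `subscribes` are hypothetical names, not the app's actual config loader.

```typescript
// Hypothetical in-memory form of the YAML operation config.
interface OperationConfig {
  name: string;
  events: string[];
}

// Returns true if the operation should run for the given webhook event name.
// Assumption: the literal 'all' is treated as a wildcard.
function subscribes(config: OperationConfig, eventName: string): boolean {
  return config.events.includes('all') || config.events.includes(eventName);
}

// Config as in this PR: every event is forwarded to S3.
const allEvents: OperationConfig = { name: 'Github Events To S3', events: ['all'] };
// A narrower, hypothetical alternative that only archives issue and PR activity.
const selected: OperationConfig = { name: 'Github Events To S3', events: ['issues', 'pull_request'] };

console.log(subscribes(allEvents, 'workflow_run')); // true
console.log(subscribes(selected, 'workflow_run')); // false
```

With `all`, high-volume event types such as `workflow_run` also land in the bucket. That suits the data-lake goal at the cost of more storage and more `PutObject` requests.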
Updated from 991e13c to 725ed68
Signed-off-by: Brandon Shien <[email protected]>
Updated from 725ed68 to d647e3d
Description
Coming from opensearch-project/opensearch-metrics#76
Created an operation that uploads every GitHub event the app listens to, as JSON, to an S3 bucket named `github-events`.
Issues Resolved
Part of opensearch-project/opensearch-metrics#57
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.