
Added github events to s3 operation #24

Merged: 2 commits into opensearch-project:main from github-event-sink, Oct 8, 2024


@bshien (Contributor) commented Oct 8, 2024

Description

Coming from opensearch-project/opensearch-metrics#76
Created an operation that uploads all received GitHub events as JSON to an S3 bucket named github-events.

Issues Resolved

Part of opensearch-project/opensearch-metrics#57

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

const putObjectCommand = new PutObjectCommand({
  Bucket: 'github-events',
  Body: JSON.stringify(context),
  Key: `${year}-${month}-${day}/${context.name}/${context.id}`,
});
Member commented:

Can you please share a sample event and its S3 prefix?

@bshien (Contributor, author) replied:

A sample event would look like:
{
  name: 'issues',
  id: '54518230-811c-11ef-8496-ec9c5edfd972',
  payload: {
    ...
  },
}

And the S3 path would be: s3://github-events/2024-10-08/issues/54518230-811c-11ef-8496-ec9c5edfd972
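To make the key layout from the diff concrete, here is a minimal sketch that rebuilds the sample path above. It assumes the date segment is the UTC calendar date of the upload; the diff does not show how `year`, `month`, and `day` are computed. `GitHubEventContext`, `utcDateStamp`, and `eventUri` are hypothetical names. Slicing `toISOString()` is used because it already zero-pads month and day; deriving them by hand from `getUTCMonth()` would need a `+ 1` (JavaScript months are zero-based) plus padding.

```typescript
// Hypothetical event shape mirroring the sample above; payload is left opaque.
interface GitHubEventContext {
  name: string;
  id: string;
  payload: unknown;
}

// UTC calendar date as YYYY-MM-DD (assumption: keys use UTC, not local time).
function utcDateStamp(date: Date): string {
  // toISOString() is always UTC and zero-pads month and day.
  return date.toISOString().slice(0, 10);
}

// Date-first layout from the diff: s3://<bucket>/<date>/<event name>/<event id>
function eventUri(context: GitHubEventContext, date: Date, bucket = 'github-events'): string {
  return `s3://${bucket}/${utcDateStamp(date)}/${context.name}/${context.id}`;
}

const sampleEvent: GitHubEventContext = {
  name: 'issues',
  id: '54518230-811c-11ef-8496-ec9c5edfd972',
  payload: {},
};

// Month index 9 is October because JavaScript months are zero-based.
console.log(eventUri(sampleEvent, new Date(Date.UTC(2024, 9, 8))));
// prints s3://github-events/2024-10-08/issues/54518230-811c-11ef-8496-ec9c5edfd972
```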

@prudhvigodithi (Member) commented Oct 8, 2024:

Option 1: With s3://github-events/2024-10-08/issues/54518230-811c-11ef-8496-ec9c5edfd972

Advantages:

  • Time-based access: Storing events by date first makes it easier to query or list events for a specific day. If the use case is more focused on querying by date or creating reports based on time, this structure is beneficial.
  • Scalability: Grouping by date bounds the number of keys under any one date prefix, so listing a single day's events stays fast even as the total dataset grows.

Disadvantages:

If we need to query all events of a particular type (e.g., issues), this format is less convenient: because the event type comes after the date, every date prefix has to be listed to collect them.

Option 2: with s3://github-events/issues/2024-10-08/54518230-811c-11ef-8496-ec9c5edfd972

Advantages:

  • Event type-based access: Storing by event type first makes it easier to query all events of a certain type (e.g., all issues), which is helpful if looking to group events based on their type.
  • Organized by type: If we handle multiple event types, such as issues and pull_request, this format keeps each type's events together.

Disadvantages:

Querying or grouping by date would be slightly less efficient, since the date is nested under each event type and a per-day query needs one listing per event type.
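The trade-off between the two options comes down to how many prefixes a query has to list. The sketch below, using hypothetical names (`Layout`, `Query`, `prefixesToList`), returns the key prefixes under the github-events bucket that each layout would need to list for a by-type or a by-date query. It assumes the caller already knows which dates or event types are in range; in practice those could be discovered with a delimiter listing.

```typescript
// Hypothetical labels for the two key layouts: Option 1 | Option 2.
type Layout = 'date-first' | 'type-first';

type Query =
  | { kind: 'byType'; eventType: string; dates: string[] }
  | { kind: 'byDate'; date: string; eventTypes: string[] };

// Returns the key prefixes (relative to the bucket) that must be listed.
function prefixesToList(layout: Layout, query: Query): string[] {
  if (query.kind === 'byType') {
    const { eventType, dates } = query;
    // Option 1 nests the type under each date, so every date is a separate listing.
    return layout === 'date-first'
      ? dates.map((d) => `${d}/${eventType}/`)
      : [`${eventType}/`];
  }
  const { date, eventTypes } = query;
  // Option 2 nests the date under each type, so every type is a separate listing.
  return layout === 'date-first'
    ? [`${date}/`]
    : eventTypes.map((t) => `${t}/${date}/`);
}

const byType: Query = { kind: 'byType', eventType: 'issues', dates: ['2024-10-07', '2024-10-08'] };
const byDate: Query = { kind: 'byDate', date: '2024-10-08', eventTypes: ['issues', 'pull_request'] };

// Option 1 needs one listing per date for a by-type query.
console.log(prefixesToList('date-first', byType)); // 2024-10-07/issues/, 2024-10-08/issues/
// Option 2 needs one listing per event type for a by-date query.
console.log(prefixesToList('type-first', byDate)); // issues/2024-10-08/, pull_request/2024-10-08/
```

Each layout answers its "primary" query with a single listing and pays one listing per date (Option 1) or per event type (Option 2) for the other query. Since the number of event types is small and fixed while dates grow without bound, the by-type penalty of Option 1 keeps increasing over time, whereas the by-date penalty of Option 2 stays constant.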

@bshien (Contributor, author) replied:

Changed the PR to match Option 2.
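With Option 2 adopted, the upload reduces to building a type-first key and writing the serialized context. This is a minimal sketch, assuming, as in the diff, that the whole event context is stored via `JSON.stringify(context)` and that the date segment is the UTC upload date. To keep it runnable without AWS credentials, the S3 call sits behind a hypothetical `ObjectSink` interface with an in-memory implementation; in the app itself that call would presumably be `S3Client.send(new PutObjectCommand({ Bucket, Key, Body }))` from `@aws-sdk/client-s3`. `eventKey` and `uploadEvent` are likewise hypothetical names.

```typescript
// Hypothetical event shape: GitHub event name, delivery id, and raw payload.
interface GitHubEventContext {
  name: string;
  id: string;
  payload: unknown;
}

// Hypothetical storage abstraction standing in for S3Client + PutObjectCommand.
interface ObjectSink {
  put(bucket: string, key: string, body: string): Promise<void>;
}

// In-memory sink for local runs and tests; entries are stored as "<bucket>/<key>".
class InMemorySink implements ObjectSink {
  readonly objects = new Map<string, string>();

  async put(bucket: string, key: string, body: string): Promise<void> {
    this.objects.set(`${bucket}/${key}`, body);
  }
}

// Option 2 (type-first) key: <event name>/<YYYY-MM-DD in UTC>/<event id>
function eventKey(context: GitHubEventContext, date: Date): string {
  const day = date.toISOString().slice(0, 10);
  return `${context.name}/${day}/${context.id}`;
}

// Serializes the event context as JSON and writes it under the type-first key.
async function uploadEvent(
  context: GitHubEventContext,
  sink: ObjectSink,
  date: Date = new Date(),
  bucket = 'github-events',
): Promise<string> {
  const key = eventKey(context, date);
  await sink.put(bucket, key, JSON.stringify(context));
  return key;
}

// Usage: store the sample event from this thread.
const sink = new InMemorySink();
const sampleEvent: GitHubEventContext = {
  name: 'issues',
  id: '54518230-811c-11ef-8496-ec9c5edfd972',
  payload: {},
};

uploadEvent(sampleEvent, sink, new Date(Date.UTC(2024, 9, 8))).then((key) => {
  console.log(key); // issues/2024-10-08/54518230-811c-11ef-8496-ec9c5edfd972
});
```

Injecting the sink keeps the key layout and serialization testable on their own; swapping in an S3-backed sink changes only where the bytes go.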

@prudhvigodithi (Member) commented:

Also, do we need github-activity-events-monitor.ts moving forward?

@bshien (Contributor, author) commented Oct 8, 2024:

> Also, do we need github-activity-events-monitor.ts moving forward?

We don't need it for the current design, but I'm thinking of keeping it until the Maintainer Dashboard is up, just in case, and removing it then.

@bshien force-pushed the github-event-sink branch from 59f86e0 to 5dad1fc on October 8, 2024 18:45
name: Github Events To S3

events:
- all
Member commented:

Is listening to all events too much at this point?

@bshien (Contributor, author) replied:

We are looking to build a data lake, so the more events the better; see the discussion that follows from here: opensearch-project/opensearch-metrics#76 (comment)
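The `events: - all` entry in the operation config subscribes it to every event rather than a named list. A minimal sketch of how such a list could be matched against an incoming event, assuming `all` acts as a wildcard and that entries may name either an event (`issues`) or an event plus action (`issues.opened`); `listensTo` is a hypothetical helper, not the app's confirmed matching logic.

```typescript
// Hypothetical matcher for an operation's `events` list.
// Assumptions: 'all' is a wildcard, and entries may be "<event>" or "<event>.<action>".
function listensTo(configured: readonly string[], eventName: string, action?: string): boolean {
  if (configured.includes('all') || configured.includes(eventName)) {
    return true;
  }
  // Only consult "<event>.<action>" entries when the event carries an action.
  return action !== undefined && configured.includes(`${eventName}.${action}`);
}

console.log(listensTo(['all'], 'push')); // true
console.log(listensTo(['issues.opened'], 'issues', 'opened')); // true
console.log(listensTo(['issues.opened'], 'issues', 'closed')); // false
console.log(listensTo(['pull_request'], 'issues')); // false
```

A single wildcard keeps the config short and matches the data-lake goal of capturing as much as possible; narrowing later only means replacing `all` with explicit entries.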

@bshien force-pushed the github-event-sink branch from 725ed68 to d647e3d
@prudhvigodithi merged commit abfeaa1 into opensearch-project:main Oct 8, 2024
7 checks passed
Labels: None yet
Projects: None yet
3 participants