Contribution cron job #70

Open · wants to merge 23 commits into main

Changes from 8 commits
5 changes: 4 additions & 1 deletion packages/api/package.json
@@ -18,13 +18,16 @@
"@americanairlines/simple-env": "^1.0.4",
"@mikro-orm/core": "^4.5.9",
"@mikro-orm/postgresql": "^4.5.9",
"@types/cron": "^1.7.3",
"@types/express-session": "^1.17.4",
"@x/web": "*",
"cookie-parser": "^1.4.5",
"cron": "^1.8.2",
"dotenv-flow": "^3.2.0",
"express": "^4.17.1",
"fetch-mock-jest": "^1.5.1",
"express-session": "^1.17.2",
"fetch-mock-jest": "^1.5.1",
"node-cron": "^3.0.0",
"passport": "^0.5.0",
"passport-github2": "^0.1.12",
"readline-sync": "^1.4.10",
1 change: 1 addition & 0 deletions packages/api/src/__mocks__/env.ts
@@ -3,6 +3,7 @@ export const env = {
port: '3000',
githubId: 'mock-client-id',
githubSecretId: 'Secrete_secret',
githubToken: 'mock-token',
discordClientId: 'mock-discord-id',
discordSecretId: 'discord-secret',
};
231 changes: 0 additions & 231 deletions packages/api/src/api/contributions.ts

This file was deleted.

2 changes: 0 additions & 2 deletions packages/api/src/api/index.ts
@@ -5,7 +5,6 @@ import { health } from './health';
import { users } from './users';
import { videos } from './videos';
import { project } from './project';
import { contributions } from './contributions';

export const api = Router();

@@ -14,4 +13,3 @@ api.use('/users', users);
api.use('/videos', videos);
api.use('/auth', auth);
api.use('/project', project);
api.use('/contributions', contributions);
118 changes: 118 additions & 0 deletions packages/api/src/cronJobs.ts
@@ -0,0 +1,118 @@
import { EntityManager } from '@mikro-orm/core';
import { PostgreSqlDriver } from '@mikro-orm/postgresql';
import { Contribution } from './entities/Contribution';
import { Project } from './entities/Project';
import { User } from './entities/User';
import { env } from './env';
import logger from './logger';
import { buildQueryBodyString } from './utils/github/buildContributionQuery';
import { buildDateQuery } from './utils/github/buildDateQuery';
import { buildProjectsQuery } from './utils/github/buildProjectsQuery';
import { buildUsersQuery } from './utils/github/buildUserQuery';

interface ContributionResponse {
data: {
search: {
pageInfo: {
hasNextPage: boolean;
endCursor: string | null;
};
nodes: PullRequestContribution[];
};
};
}

export interface PullRequestContribution {
id: string;
title: string;
permalink: string;
mergedAt: string;
author: {
login: string;
};
}
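For reference, a payload matching these interfaces might look like the sketch below (the values are illustrative, not taken from the PR; only the shape mirrors GitHub's GraphQL search response):

```typescript
interface PullRequestContribution {
  id: string;
  title: string;
  permalink: string;
  mergedAt: string;
  author: { login: string };
}

interface ContributionResponse {
  data: {
    search: {
      pageInfo: { hasNextPage: boolean; endCursor: string | null };
      nodes: PullRequestContribution[];
    };
  };
}

// Example response for a single page of results with no further pages.
const sample: ContributionResponse = {
  data: {
    search: {
      pageInfo: { hasNextPage: false, endCursor: null },
      nodes: [
        {
          id: 'PR_node_1',
          title: 'Fix login redirect',
          permalink: 'https://github.com/example/repo/pull/1',
          mergedAt: '2021-12-01T00:00:00Z',
          author: { login: 'octocat' },
        },
      ],
    },
  },
};
```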

export const contributionPoll = async (entityManager: EntityManager<PostgreSqlDriver>) => {
try {
const timeRange = new Date();
const today = new Date();
const yesterday = new Date(today);
yesterday.setDate(today.getDate() - 100);
timeRange.setHours(timeRange.getHours() - 5);

const userList = await entityManager.find(User, {
$or: [
{
contributionsLastCheckedAt: {
$lt: timeRange,
},
},
{
contributionsLastCheckedAt: {
$eq: null,
},
},
],
});

if (userList.length === 0) {
return;
}

const projectList = await entityManager.find(Project, {});
const projectsString = await buildProjectsQuery(projectList);
const dateString = buildDateQuery(yesterday, today);
Contributor:
I think this is going to cause problems... here's an example:

  1. We query for User A on Monday and successfully update the DB
  2. We try again for User A on Tuesday but the call fails and we don't update the DB
  3. We try again for User A on Wednesday but the call fails and we don't update the DB
  4. We try again on Thursday: we use yesterday (Wednesday) as the beginning of the query and miss all the contributions between the last successful query and the beginning of yesterday

I think we need to be creating these ranges on a per-user basis using the date saved to the DB. Thoughts?

Contributor Author (@TevinBeckwith, Dec 2, 2021):

If we do that, we will have to make calls to GitHub on a per-user basis, which will drastically increase how close we get to our rate limit. Currently we have a rate limit of 5000 points to use per hour. If we do the search in one big chunk, each call costs 1 point and returns 100 nodes, so we can get a little less than 500,000 nodes per hour (less because we also make one call to get project nodes). If we make a call per user, each call will still cost 1 point, but each call will return fewer nodes and we'll make more calls, reducing the number of nodes we can fetch per hour. This will get worse when we add other contribution types in the future.

A compromise is to use a date range whose base is the oldest lastChecked time across users. This will potentially pick up more nodes than we need, but it should still be better than the per-user option.

Rate Limit Documentation: https://docs.github.com/en/graphql/overview/resource-limitations
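A minimal sketch of that compromise: derive the search window from the oldest contributionsLastCheckedAt across the users being polled. The User shape is assumed from this diff; the fixed lookback for never-checked users is a hypothetical default, not part of the PR.

```typescript
interface PolledUser {
  contributionsLastCheckedAt: Date | null;
}

// Hypothetical default window for users who have never been checked.
const DEFAULT_LOOKBACK_DAYS = 7;

function oldestCheckDate(users: PolledUser[], now: Date = new Date()): Date {
  // Users never checked fall back to a fixed lookback window.
  const fallback = new Date(now.getTime() - DEFAULT_LOOKBACK_DAYS * 24 * 60 * 60 * 1000);

  // Start from "now" and walk back to the oldest check time seen.
  let oldest = now;
  for (const user of users) {
    const checked = user.contributionsLastCheckedAt ?? fallback;
    if (checked < oldest) oldest = checked;
  }
  return oldest;
}
```

The returned date would then be passed to buildDateQuery as the start of the range, so one batched search covers every user without missing contributions after a failed run.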

const usersString = buildUsersQuery(userList);
let cursor = null;
let hasNextPage = true;

while (hasNextPage) {
const queryBodyString = buildQueryBodyString(projectsString, dateString, usersString, cursor);

const fetchRes = await fetch('https://api.github.com/graphql', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `bearer ${env.githubToken}`,
},
body: queryBodyString,
});

const { data: responseData }: ContributionResponse = await fetchRes.json();
for (const pr of responseData.search.nodes.entries()) {
// Only Add new contributions
if ((await entityManager.find(Contribution, { nodeID: { $eq: pr[1].id } })).length === 0) {
Contributor:

What's pr[1]? 🤔


So I could be wrong on this, but I think MikroORM is smart enough to upsert (insert OR update depending on whether the row already exists). Since you already made nodeId unique for this table, can you see what happens when this re-runs and tries to create new ones? I think this won't create duplicates, butttt I could also be totally wrong... try it and see if we can remove this extra call.

If we do need this, let's swap find for count - it'll be faster and easier for you to use here (and you don't need .length === 0; just throw a ! in front of the count call and it'll either be truthy for 1+ or falsy for 0).
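A runnable sketch of that count-based check, with a stub standing in for the database query (the real call would be MikroORM's entityManager.count(Contribution, { nodeID: ... })):

```typescript
// Stub: pretend these nodeIDs already exist in the contributions table.
const existingNodeIds = new Set(['PR_node_1']);

// Stand-in for entityManager.count(Contribution, { nodeID })
async function countContributions(nodeID: string): Promise<number> {
  return existingNodeIds.has(nodeID) ? 1 : 0;
}

async function isNewContribution(nodeID: string): Promise<boolean> {
  // count() resolves to a number; !n is true only when n === 0,
  // so no .length === 0 comparison is needed.
  return !(await countContributions(nodeID));
}
```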

Contributor Author:

I made the nodeId field unique to prevent accidental duplication of contributions. However, that had the unfortunate side effect of erroring out the program when a duplicate insert is attempted, so I had to implement my own check as well.

Contributor Author (@TevinBeckwith, Dec 2, 2021):

.entries() in responseData.search.nodes.entries() returns each node as a [number, PullRequestContribution] pair, so pr[1] accesses the actual PullRequestContribution object.
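To illustrate what Array.prototype.entries() yields (example values are made up): each entry is an [index, element] pair, so pr[1] is the element itself, and destructuring avoids the opaque indexing entirely.

```typescript
const nodes = ['first-pr', 'second-pr'];

// .entries() yields [index, element] pairs.
const pairs = [...nodes.entries()]; // [[0, 'first-pr'], [1, 'second-pr']]

// Destructuring makes the intent explicit: skip the index, keep the node.
const titles: string[] = [];
for (const [, node] of nodes.entries()) {
  titles.push(node);
}
```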

Contributor:

For that last comment, can you add something to that effect as an inline comment?

const user = await entityManager.findOne(User, {
githubUsername: { $eq: pr[1].author.login },
});
if (user) {
const newContribution = new Contribution({
nodeID: pr[1].id,
authorGithubId: user.githubId,
type: 'Pull Request',
description: pr[1].title,
contributedAt: new Date(Date.parse(pr[1].mergedAt)),
score: 100,
});
entityManager.persist(newContribution);
}
}
}
hasNextPage = responseData.search.pageInfo.hasNextPage;
cursor = responseData.search.pageInfo.endCursor;
}

for (const user of userList) {
user.contributionsLastCheckedAt = new Date();
entityManager.persist(user);
}

// Save new info in the db
void entityManager.flush();
logger.info('Successfully polled and saved new PR contributions');
return;
} catch (error) {
const errorMessage = 'There was an issue saving contribution data to the database';
logger.error(errorMessage, error);
}
};