GrowThePie API pulls #1157
base: main
Thanks for this @chuxinh, this is fantastic! I'll defer to @MSilb7 on the question about contract labeling.
I guess this depends on the API and whether it has a filter parameter. I'll take a closer look. For defillama we do have to pull the full history, but to save on memory usage and write time we keep and write out only the most recent 7 days.
The best way to test it end-to-end is to run the cli command. When I run the cli command for testing I manually change the write location to
That said, I think this is not a very robust approach, so what I'm going to do is automatically set the location to local when it detects that the CLI is not running from GitHub Actions or Kubernetes. 99% of the time that is what we want to do anyway. I'll get back to you here with that change. Will take a closer look at the PR and leave comments there.
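As a rough sketch of that detection (not the actual change): GitHub Actions sets `GITHUB_ACTIONS=true` on its runners and Kubernetes injects `KUBERNETES_SERVICE_HOST` into pods, so checking those two environment variables should be enough. The function name and the `"local"` / `"remote"` labels below are hypothetical placeholders, not names from this repo.

```python
import os
from typing import Mapping, Optional


def default_write_location(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a default write location from the runtime environment.

    Hypothetical helper: the "local" / "remote" labels are placeholders
    for whatever location values the real writer accepts.
    """
    # Allow an explicit mapping so the logic can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env

    # GitHub Actions sets GITHUB_ACTIONS=true on every runner.
    in_github_actions = env.get("GITHUB_ACTIONS") == "true"
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every pod.
    in_kubernetes = "KUBERNETES_SERVICE_HOST" in env

    return "remote" if (in_github_actions or in_kubernetes) else "local"
```

An explicit override (flag or env var) would probably still be useful for the remaining 1% of cases.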
Thanks @lithium323 !
I don't think they have any filter there, so far it's just access to their JSON files but you can check out here. I also made some changes for the data pull to run locally. Came across some issues with partition that requires
summary_df = summary_df.rename({"date": "dt"})

GrowThePie.FUNDAMENTALS_SUMMARY.write(
    dataframe=summary_df,
-    dataframe=summary_df,
+    dataframe=most_recent_dates(summary_df, n_dates=FUNDAMENTALS_LAST_N_DAYS),
Here we could write only the most recent data (to avoid many unnecessary writes), where FUNDAMENTALS_LAST_N_DAYS could be just the last 7 days.
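To make the intent of the suggestion concrete, here is a library-free sketch of the filtering idea: keep only rows whose `dt` falls within the N most recent distinct dates. This is not the repo's `most_recent_dates` implementation (which takes a dataframe); the function name, the list-of-dicts input, and the sample values are assumptions for illustration only.

```python
from datetime import date
from typing import Any, Dict, List


def most_recent_dates_sketch(
    rows: List[Dict[str, Any]], n_dates: int, date_key: str = "dt"
) -> List[Dict[str, Any]]:
    """Keep rows whose date is among the n_dates most recent distinct dates.

    Illustrative stand-in only; the real helper operates on a dataframe.
    """
    if n_dates <= 0:
        return []
    # Distinct dates, newest first, truncated to the requested window.
    keep = set(sorted({row[date_key] for row in rows}, reverse=True)[:n_dates])
    # Preserve the original row order while dropping older dates.
    return [row for row in rows if row[date_key] in keep]


# Made-up sample rows: two chains, three dates.
sample = [
    {"dt": date(2024, 1, 1), "origin_key": "ethereum", "value": 1.0},
    {"dt": date(2024, 1, 2), "origin_key": "ethereum", "value": 2.0},
    {"dt": date(2024, 1, 3), "origin_key": "ethereum", "value": 3.0},
    {"dt": date(2024, 1, 3), "origin_key": "optimism", "value": 4.0},
]
recent = most_recent_dates_sketch(sample, n_dates=2)
```

Filtering on distinct dates rather than on a row count keeps every chain's rows for the retained days, which is what we want when the write is partitioned by `dt`.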
I saw they have a non-full endpoint, limited to 365 days, but it does not include ethereum so maybe we don't want to use it. The full endpoint is not too bad, it returns quite quickly. To avoid writing out too many dataframes each time, we could filter the results to the last N days before writing; we do that for defillama as well. I'll run the CLI locally to reproduce the issues with the partition without dt and will get back. Since you'll be OOO I will also look into merging the PR and running it in GitHub Actions.
Description
Migrating GTP utils to bigquery as part of the Superchain Health Dashboard Pipeline #1093
GrowThePie API documentation: here
Currently getting:
Questions