Collect file extension stats in gh-pages
#12
base: main
Skips filenames without extensions and strips dirs from them
Because there is too much noise with double and more extensions
Firefox repo contains almost 22 million files, so that means
Replacing with a single `grep` call. `-P`: use Perl regexp; `-o`: output only the matched part; `(?<=)`: lookbehind (don't include this part in the match).
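A minimal sketch of what such a single `grep` call could look like, assuming GNU `grep` built with PCRE support (`-P`). The pattern and the `ext_stats` helper name are illustrative assumptions, not the code from this PR:

```shell
# Hypothetical helper: read file paths on stdin, print "count ext" lines,
# most frequent extension first.
ext_stats() {
  # (?<=[^/]\.)  lookbehind: a dot that does not start the filename
  #              (so dotfiles such as .gitignore are skipped)
  # [^./]+$      the last extension only, with no directory part
  grep -oP '(?<=[^/]\.)[^./]+$' \
    | sort | uniq -c \
    | awk '{print $1, $2}' \
    | sort -k1,1nr -k2,2
}

# Example: paths without an extension produce no output at all.
printf '%s\n' src/main.rs lib/util.rs archive.tar.gz Makefile dir.d/README \
  | ext_stats
```

Because the pattern is anchored at the end of the line and excludes `.` and `/`, a name like `archive.tar.gz` counts only as `gz`, which avoids the noise from double extensions.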
on:
  push:
    branches:
      - main
  pull_request:
  workflow_dispatch:
Why do we need it?
For testing the workflow in branches other than `main`.
- name: Gather commits by day and file extension statistics
  run: |
    ./01stats.sh gecko-dev build
What do we get out of this build step? Also, I see that it takes a pretty long time to complete.
The goal is to move the stats collection commands out of the GitHub Actions YAML, so that they can be run standalone.
As I continued experiments in my `main` branch after opening this PR, more things started to creep in. The command that takes the most time is `git fetch --unshallow`, added in the last commit to start playing with historical data.
Hi! Thanks for your contribution. Just curious, what's the motivation behind these changes?
@4e6 Well, this PR is far from finished. The final goal is to get the dataset for diagrams of Firefox Oxidization over time (like in #10). Because I didn't know the codebase, I started with the code that seemed the easiest to get running, such as counting file extensions over time. Because a full git checkout alone takes a whole 15 minutes, going commit by commit probably won't be feasible in one CI run, so the plan is to collect the data month by month over multiple CI runs.
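The month-by-month plan could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `month_starts` and `commit_before` are hypothetical, and `commit_before` assumes the clone already has full (unshallowed) history.

```shell
# Hypothetical helper: print the first day of every month from
# START to END inclusive, both given as YYYY-MM.
month_starts() (
  y=${1%-*}; m=${1#*-}; m=${m#0}              # strip a leading zero from the month
  end_y=${2%-*}; end_m=${2#*-}; end_m=${end_m#0}
  while [ "$y" -lt "$end_y" ] || { [ "$y" -eq "$end_y" ] && [ "$m" -le "$end_m" ]; }; do
    printf '%s-%02d-01\n' "$y" "$m"
    m=$((m + 1))
    if [ "$m" -gt 12 ]; then m=1; y=$((y + 1)); fi
  done
)

# Hypothetical helper: print the newest commit on HEAD made before a given date.
# $1 is the repository directory, $2 is a date such as 2022-01-01.
commit_before() {
  git -C "$1" rev-list -1 --before="$2" HEAD
}

# Listing the sampling points for a short range:
month_starts 2021-11 2022-02
```

Each CI run could then process a small slice of these dates, calling `commit_before gecko-dev "$day"` to pick one commit per month instead of walking the history commit by commit.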
Maybe it will be faster than restoring `gecko-dev` history with `git fetch --unshallow`, which takes about 15 minutes.
Doing complete https://github.com/abitrolly/firefox-lang-stats/runs/6713143940
Opened an issue, actions/checkout#818, to track possible solutions.
Cloning via the action with a specified branch was 1m slower: actions/checkout#818 (comment)
No description provided.