Indexing scripts rework #348
Open: tleb wants to merge 26 commits into bootlin:master from tleb:indexing-scripts-rework
COPY sources in two steps: (1) copy requirements.txt and run `pip install`, then (2) copy all remaining sources. This makes rebuilding the Docker image after editing sources much faster: from 22.3s to 7.3s on my machine.

Signed-off-by: Théo Lebrun <[email protected]>
Previous sequence:
- git clone ... # first fetch
- git remote add remote0 ...
- git fetch remote0 # second fetch
- git remote add remote1 ...
- git fetch remote1 # third fetch

Now:
- git init
- git remote add remote0 ...
- git remote add remote1 ...
- git remote add remote2 ...
- git fetch --all -j4 # all fetches at the same time

Signed-off-by: Théo Lebrun <[email protected]>
This is pretty useful as update-elixir-data gets called often to check for new updates. Most often, there are none, so checking all remotes at the same time saves time. This only applies to the kernel, which is the only project using multiple (three) remotes.

Signed-off-by: Théo Lebrun <[email protected]>
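The new sequence can be sketched as follows. This is a minimal illustration, not the script from this PR: the `fetch_all` helper and its arguments are assumptions; only `git init`, `git remote add` and `git fetch --all -j4` come from the commit message.

```shell
#!/bin/sh
# Sketch: register every remote first, then fetch them all in one call.
# fetch_all is a hypothetical helper; the remote names (remote0, remote1, ...)
# follow the naming used in the commit message.
fetch_all() {
    repo="$1"; shift
    git init --quiet "$repo"
    i=0
    for url in "$@"; do
        git -C "$repo" remote add "remote$i" "$url"
        i=$((i + 1))
    done
    # -j4 allows Git to fetch up to four remotes in parallel.
    git -C "$repo" fetch --all -j4 --quiet
}
```

A call would look like `fetch_all /path/to/repo URL0 URL1 URL2`. Note that this sketch is not idempotent: calling it twice on the same repository fails on `git remote add`.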
Simplify the script. We never `cd` into the directory, we instead use `git -C`. Avoid repeating it by creating a $git variable. Signed-off-by: Théo Lebrun <[email protected]>
Make utils/index-repository idempotent, meaning we can call it multiple times on the same repo and same remotes without issues. Also allow adding new remotes to an existing repo. Signed-off-by: Théo Lebrun <[email protected]>
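One way to make the remote setup idempotent is to add a remote only when it is missing. The sketch below assumes a hypothetical `ensure_remote` helper; it is not taken from the PR, and the actual script may handle existing remotes differently.

```shell
#!/bin/sh
# Sketch: add a remote only when it does not exist yet, so that running
# the same command again on the same repository is harmless.
# ensure_remote is a hypothetical helper, not a function from the PR.
ensure_remote() {
    repo="$1"
    name="$2"
    url="$3"
    if git -C "$repo" remote get-url "$name" >/dev/null 2>&1; then
        # Remote already configured: nothing to do.
        return 0
    fi
    git -C "$repo" remote add "$name" "$url"
}
```

Because a missing remote is simply added, the same helper also covers the case of adding new remotes to an existing repository.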
The $ELIXIR_THREADS fallback to nproc is straightforward code, much more so than the incantation to find the Elixir install path. Remove the incantation and replace it with simple code:

    if test -z "$ELIXIR_THREADS"; then
        ELIXIR_THREADS="$(nproc)"
    fi

Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
…ction Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
Allow calling like:

    ./utils/index musl

That will do the same thing as before (fetch+index). It works only if a previous call was made to add remotes.

Signed-off-by: Théo Lebrun <[email protected]>
Previously:

    LXR_PROJ_DIR=/srv/elixir-data ./utils/update-elixir-data

Now:

    ./utils/index /srv/elixir-data --all

The impact is slightly different: it also has the side effect of creating all known projects (Linux, U-Boot, etc.) if they didn't exist. We have asked around and we are not aware of any other Elixir instance. To keep the previous behavior, if people don't want to index all supported projects:

    x=/srv/elixir-data
    find $x -mindepth 1 -maxdepth 1 -printf "%f\n" | \
        xargs -L1 -r ./utils/index $x

Signed-off-by: Théo Lebrun <[email protected]>
Avoid the following Git warning:

    hint: Using 'master' as the name for the initial branch. This default branch name
    hint: is subject to change. To configure the initial branch name to use in all
    hint: of your new repositories, which will suppress this warning, call:
    hint:
    hint:   git config --global init.defaultBranch <name>
    hint:
    hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
    hint: 'development'. The just-created branch can be renamed via this command:
    hint:
    hint:   git branch -m <name>

Signed-off-by: Théo Lebrun <[email protected]>
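The commit message does not show how the warning is avoided. One possible approach, sketched here as an assumption, is to pass `init.defaultBranch` for the single `git init` call instead of changing the global configuration. The `init_repo` helper and the choice of `master` as the branch name are hypothetical.

```shell
#!/bin/sh
# Sketch: set the initial branch name for this one invocation only, which
# keeps Git from printing the "Using 'master' as the name" hint.
# init_repo is a hypothetical helper; the PR may use a different mechanism.
init_repo() {
    git -c init.defaultBranch=master init "$1"
}
```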
utils/pack-repositories did the following on repos which have a gc.log file (created when GC fails):

    git prune
    git gc --aggressive
    git prune
    git gc --aggressive

Here we:
- Delete utils/pack-repositories; we don't want that detection to be done manually. Instead, we integrate the gc.log detection into utils/index, which should be called often.
- Create a hidden flag ($ELIXIR_GC) to allow a manual trigger.
- Replace the above sequence with a simpler `git gc --aggressive`. Let's trust Git.
- Do a `git gc --auto` in the default case. This call is automatically done by porcelain commands but we don't run any, so let's give Git an opportunity to clean up from time to time (heuristic based).
- Replace the gc.log detection from:

      find . -name gc.log

  To:

      test -e $data/$project/repo/gc.log

  It should be more reliable. With the first approach, a project that contains a file named gc.log would trigger the detection on each run.

Signed-off-by: Théo Lebrun <[email protected]>
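The GC policy described above could be sketched like this. The `gc_mode` and `run_gc` helpers are hypothetical names; `$ELIXIR_GC`, the gc.log check and the two `git gc` variants are the ones named in the commit message. The sketch assumes that any non-empty `$ELIXIR_GC` value triggers the aggressive path, which the commit message does not specify.

```shell
#!/bin/sh
# Sketch of the GC policy: aggressive GC when requested through $ELIXIR_GC
# or when a previous GC left a gc.log file behind, automatic GC otherwise.
# gc_mode and run_gc are hypothetical helpers, not taken from the PR.
gc_mode() {
    repo="$1"
    if test -n "${ELIXIR_GC:-}" || test -e "$repo/gc.log"; then
        echo aggressive
    else
        echo auto
    fi
}

run_gc() {
    repo="$1"
    # Expands to either `git gc --aggressive` or `git gc --auto`.
    git -C "$repo" gc "--$(gc_mode "$repo")" --quiet
}
```

Checking one fixed path with `test -e` avoids the false positives that `find . -name gc.log` would produce for a project that happens to contain a file with that name.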
New script utils/index does an automatic call to `git gc --auto` and if it detects a gc.log file, it runs `git gc --aggressive`. There shouldn't be any reason for people to have to think about that aspect. Remove that info from the README and make it lighter weight. Signed-off-by: Théo Lebrun <[email protected]>
Previously, to start an indexing from scratch:

    ./utils/index /srv/elixir-data musl https://git.musl-libc.org/git/musl

This is annoying as the script already has the remote URLs for all known projects. Now, a call without remote will automatically add the remote URLs matching the project name:

    ./utils/index /srv/elixir-data musl

This copies the behavior that was previously only implemented for --all.

Signed-off-by: Théo Lebrun <[email protected]>
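The lookup by project name might be structured as below. Both helper names are hypothetical, and the list holds only the musl URL because it is the only one shown in the commit message; the real script's list of known projects is not reproduced here.

```shell
#!/bin/sh
# Sketch: map a project name to its known remote URL(s).
# known_remotes is a hypothetical helper; only the musl entry is taken
# from the commit message.
known_remotes() {
    case "$1" in
        musl) echo "https://git.musl-libc.org/git/musl" ;;
        *) return 1 ;;
    esac
}

# Use the URLs given on the command line if there are any, otherwise fall
# back to the built-in list. resolve_remotes is also hypothetical.
resolve_remotes() {
    project="$1"; shift
    if test "$#" -gt 0; then
        printf '%s\n' "$@"
    else
        known_remotes "$project"
    fi
}
```

With this shape, an unknown project without an explicit URL makes `resolve_remotes` return a non-zero status, which the caller can turn into an error message.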
Stop writing a global file when initializing projects. This can cause permission issues. We instead pass the option manually for each Git process call using: git -c safe.directory=... Signed-off-by: Théo Lebrun <[email protected]>
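A per-call wrapper along these lines avoids touching any global configuration file. The `git_in` name is an assumption, and the sketch uses a function rather than the `$git` variable mentioned in an earlier commit so that paths containing spaces stay intact. Whether `-c safe.directory=...` is honored depends on the installed Git version.

```shell
#!/bin/sh
# Sketch: run Git inside a given repository, marking that repository as
# safe for this single invocation instead of writing to a global config.
# git_in is a hypothetical wrapper, not taken from the PR.
git_in() {
    repo="$1"; shift
    git -C "$repo" -c safe.directory="$repo" "$@"
}
```

Usage would be `git_in /srv/elixir-data/musl/repo fetch --all`, with the repository given as an absolute path.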
Instead, start from $0 and move back up two times. So, something like:

    ./elixir/utils/index
    ./elixir/utils
    ./elixir
    ./elixir/update.py

Signed-off-by: Théo Lebrun <[email protected]>
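Moving up two levels from the script path can be done with two `dirname` calls. The `elixir_root` helper is a hypothetical name used for illustration; in the script itself the argument would be `$0`.

```shell
#!/bin/sh
# Sketch: derive the install directory from the script's own path by
# stripping two path components:
#   ./elixir/utils/index -> ./elixir/utils -> ./elixir
# elixir_root is a hypothetical helper, not taken from the PR.
elixir_root() {
    dirname "$(dirname "$1")"
}

# Inside utils/index this could be used as:
#   update_py="$(elixir_root "$0")/update.py"
```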
Signed-off-by: Théo Lebrun <[email protected]>
tleb force-pushed the indexing-scripts-rework branch from aec3728 to d6e1cb1 (December 20, 2024 21:19)
Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
This changes the stdout/stderr buffering behavior of Python. Without it, indexing scripts don't stream updates and use really big buffers. Signed-off-by: Théo Lebrun <[email protected]>
Signed-off-by: Théo Lebrun <[email protected]>
tleb force-pushed the indexing-scripts-rework branch from 54fd0df to d207b11 (December 21, 2024 16:33)
Hi!

This is a big rework of the wrapper scripts around indexing. Notice how we remove all scripts and replace them with a single one called `utils/index`.

- […] `update.py` path.)
- […] `index/index-repository`, so it can be called in the same manner (e.g. `./utils/index /srv/elixir-data musl https://git.musl-libc.org/git/musl`). This will init the project (if not already existing), add the remote (if not already existing), fetch and index.
- `./utils/index /srv/elixir-data musl`. It will notice that we want `musl` and automatically add the right remote URL from its list. This is matched on the project name.
- `./utils/index /srv/elixir-data --all`. This will add the remotes of all known projects (if not already existing) and fetch+index them. Replaces `index-all-repositories` and `update-elixir-data`.
- […] `pack-repositories`, but automatically. The previous setup meant manual intervention was required. We remove the section about `pack-repositories` from the README.

Opinions @fstachura? Commit messages contain much more information.

Closes #342