Indexing scripts rework #348

Open
wants to merge 26 commits into
base: master
26 commits
8c87a57
Dockerfile: COPY Elixir sources after `pip install`
tleb Nov 7, 2024
a07470f
utils/index-repository: fetch in parallel
tleb Nov 7, 2024
e64d325
utils/update-elixir-data: fetch in parallel
tleb Nov 7, 2024
a619961
utils/index-repository: add alias for `git -C ...`
tleb Nov 8, 2024
afdf910
utils/index-repository: support calling on existing repository
tleb Nov 8, 2024
a931e93
utils/*: delete common.sh and inline $ELIXIR_THREADS fallback
tleb Nov 8, 2024
a343612
utils/index-repository: refactor by creating project_init() function
tleb Nov 8, 2024
2fa895c
utils/index-repository: refactor by creating project_add_remote() fun…
tleb Nov 8, 2024
d8df220
utils/index-repository: refactor by creating project_fetch() function
tleb Nov 8, 2024
219640e
utils/index-repository: refactor by creating project_index() function
tleb Nov 8, 2024
a2febaa
utils: rename index-repository to index
tleb Nov 8, 2024
6297a12
utils: deduplicate index-all-repositories into index
tleb Nov 8, 2024
b2a4694
utils/index: make it possible to update a specific project
tleb Nov 8, 2024
ce78f48
utils: deduplicate utils/update-elixir-data into utils/index
tleb Nov 8, 2024
122ed22
utils/index: add init.defaultBranch= config to `git init` call
tleb Nov 8, 2024
6105f40
utils: deduplicate pack-repositories into index
tleb Nov 8, 2024
f50cb64
README: remove "Keeping git repository disk usage under control" section
tleb Nov 8, 2024
5248656
utils/index: allow indexing project with remote URLs
tleb Nov 8, 2024
bf1dbec
utils/index: remove `git config --system --add safe.directory` call
tleb Nov 8, 2024
4c7f61a
utils/index: remove /usr/local/elixir/update.py absolute path
tleb Nov 8, 2024
d6e1cb1
README: update following utils/* script changes
tleb Nov 8, 2024
80ca8ac
utils/index: force use of bash, we depend on it for ${@:5} syntax
tleb Dec 20, 2024
e898e29
utils/index: avoid passing argument to test(1)
tleb Dec 20, 2024
6af146b
Dockerfile: add virtualenv to $PATH by default
tleb Dec 21, 2024
c3eabc1
Dockerfile: set PYTHONUNBUFFERED=1 by default
tleb Dec 21, 2024
d207b11
Dockerfile: add utils/ in $PATH by default, for easy indexing
tleb Dec 21, 2024
45 changes: 8 additions & 37 deletions README.adoc
@@ -255,35 +255,12 @@ as a front-end to reduce the load on the server running the Elixir code.
== Keeping Elixir databases up to date

To keep your Elixir databases up to date and index new versions that are released,
we're proposing to use a script like `utils/update-elixir-data` which is called
we're proposing to use a script like `index /srv/elixir-data --all` which is called
through a daily cron job.

You can set `$ELIXIR_THREADS` if you want to change the number of threads used by
update.py for indexing (by default the number of CPUs on your system).
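
The `$ELIXIR_THREADS` fallback described above can be sketched as a small shell function. This is only an illustration of the behaviour, modelled on the inlined check in `utils/index`; the function name `elixir_threads` is hypothetical and does not exist in the repository.

```shell
#!/bin/bash
# Hypothetical helper (not part of Elixir): print the thread count that
# indexing would use.
elixir_threads() {
    # Use $ELIXIR_THREADS when it is set and non-empty,
    # otherwise fall back to the number of available CPUs.
    if test -z "$ELIXIR_THREADS"; then
        nproc
    else
        echo "$ELIXIR_THREADS"
    fi
}

# With an explicit value, that value wins.
ELIXIR_THREADS=4 elixir_threads
```

In practice the variable would be set in the environment of the cron job that calls `index`, rather than through a helper like this one.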

== Keeping git repository disk usage under control

As you keep updating your git repositories, you may notice that some can become
considerably bigger than they originally were. This seems to happen when a `gc.log`
file appears in a big repository, apparently causing git's garbage collector (`git gc`)
to fail, and therefore causing the repository to consume disk space at a fast
pace every time new objects are fetched.

When this happens, you can save disk space by packing git directories as follows:

----
cd <bare-repo>
git prune
rm gc.log
git gc --aggressive
----

Actually, a second pass with the above commands will save even more space.

To process multiple git repositories in a loop, you may use the
`utils/pack-repositories` that we are providing, run from the directory
where all repositories are found.

= Building Docker images

Dockerfiles are provided in the `docker/` directory.
@@ -305,22 +282,16 @@ The Docker image does not contain any repositories.
To index a repository, you can use the `index-repository` script.
For example, to add the https://musl.libc.org/[musl] repository, run:

# docker exec -it -e PYTHONUNBUFFERED=1 elixir-container \
/bin/bash -c 'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
/usr/local/elixir/utils/index-repository \
musl https://git.musl-libc.org/git/musl'

Without PYTHONUNBUFFERED environment variable, update logs may show up with a delay.
# docker exec -it elixir-container \
index -c '/srv/elixir-data musl'

Or, to run indexing in a separate container:

# docker run -e PYTHONUNBUFFERED=1 -v ./elixir-data/:/srv/elixir-data \
--entrypoint /bin/bash elixir -c \
'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
/usr/local/elixir/utils/index-repository \
musl https://git.musl-libc.org/git/musl'
# docker run -v ./elixir-data/:/srv/elixir-data \
--entrypoint index elixir -c \
'/srv/elixir-data musl'

You can also use utils/index-all-repositories to start indexing all officially supported repositories.
You can also use `index /srv/elixir-data --all` to start indexing all officially supported repositories.

After indexing is done, Elixir should be available under the following URL on your host:
http://172.17.0.2/musl/latest/source
@@ -332,7 +303,7 @@ If 172.17.0.2 does not answer, you can check the IP address of the container by
== Automatic repository updates

The Docker image does not automatically update repositories by itself.
You can, for example, start `utils/update-elixir-data` in the container (or in a separate container, with Elixir data volume/directory mounted)
You can, for example, start `index /srv/elixir-data --all` in the container (or in a separate container, with Elixir data volume/directory mounted)
from cron on the host to periodically update repositories.
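
A host-side cron job for this could look roughly like the sketch below. It assumes a running container named `elixir-container` (the name used in the examples above) and that `index` is reachable through the container's `$PATH`; the function name `update_command` is hypothetical. The function only prints the command, so it can be reviewed before being wired into a crontab.

```shell
#!/bin/bash
# Hypothetical host-side helper; the names below are assumptions.
container="elixir-container"   # container name taken from the README examples
data_dir="/srv/elixir-data"    # Elixir data path inside the container

update_command() {
    # No -it flags: a cron job has no terminal attached.
    echo docker exec "$container" index "$data_dir" --all
}

# Show the command that a daily cron job would run.
update_command
```

The schedule itself is left to the host's crontab; the printed command is what the crontab entry would invoke.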

== Using Docker image as a development server
7 changes: 6 additions & 1 deletion docker/Dockerfile
@@ -32,7 +32,7 @@ RUN \
libyaml-0-2 \
wget

COPY . /usr/local/elixir/
COPY ./requirements.txt /usr/local/elixir/requirements.txt

WORKDIR /usr/local/elixir/

@@ -43,6 +43,8 @@ RUN python3 -m venv venv && \
pip install /tmp/build/berkeleydb-*.whl && \
pip install -r requirements.txt

COPY . /usr/local/elixir/

RUN mkdir -p /srv/elixir-data/

COPY ./docker/000-default.conf /etc/apache2/sites-available/000-default.conf
@@ -55,5 +57,8 @@ ARG ELIXIR_VERSION
ENV ELIXIR_VERSION=$ELIXIR_VERSION

ENV ELIXIR_ROOT=/srv/elixir-data
ENV PATH="/usr/local/elixir/venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV PATH="/usr/local/elixir/utils:$PATH"

ENTRYPOINT ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]
24 changes: 0 additions & 24 deletions utils/common.sh

This file was deleted.

157 changes: 157 additions & 0 deletions utils/index
@@ -0,0 +1,157 @@
#!/bin/bash

if test $# -lt 2; then
echo "Usage: $0 <elixir_data_path> <project_name> [<repo_urls>...]"
echo "Usage: $0 <elixir_data_path> --all"
exit 1
fi

# $1 is the project path (inside will be created data/ and repo/).
# It supports being called on an existing project.
project_init() {
# Detect already inited projects. Avoids stderr logs.
# Using `git tag -n1` because `git status` doesn't work on bare repos.
if git -C $1/repo tag -n1 >/dev/null 2>/dev/null; then
return;
fi

mkdir -p $1/data $1/repo

git -C $1/repo -c init.defaultBranch=main init --bare
}

# $1 is the project path (parent of data/ and repo/).
# $2 is the remote URL.
project_add_remote() {
git="git -C $1/repo -c safe.directory=$1/repo"

# Do nothing if remote already exists.
if $git remote | xargs -L1 -r $git remote get-url 2>/dev/null | grep -qxF "$2"; then
return;
fi

# Remotes are called remote$i with $i = 0, 1, 2...
i="$($git remote | awk '
BEGIN { n=-1; }
$0 ~ /^remote[0-9]+$/ { i=substr($0, length("remote")+1);
if (i>n) n=i; }
END { print n+1; }')"

$git remote add remote$i "$2"
}

# $1 is the project path (parent of data/ and repo/).
project_fetch() {
git="git -C $1/repo -c safe.directory=$1/repo"

$git fetch --all --tags -j4

# A gc.log file implies a garbage collect failed in the past.
# Also, create a hidden flag which could be useful to trigger GCs manually.
if test -e $1/repo/gc.log -o "$ELIXIR_GC"; then
$git gc --aggressive
else
# Otherwise, give Git an occasion to trigger a GC.
# Porcelain commands should trigger that, but we don't use any.
$git gc --auto
fi
}

# $1 is the project path (parent of data/ and repo/).
project_index() {
if test -z "$ELIXIR_THREADS"; then
ELIXIR_THREADS="$(nproc)"
fi

elixir_sources="$(dirname "$(dirname "$0")")"

LXR_REPO_DIR=$1/repo LXR_DATA_DIR=$1/data \
python3 "$elixir_sources/update.py" $ELIXIR_THREADS
}

# $1 is the Elixir root data path.
# $2 is the project name.
# $... are the remote URLs.
add_remotes() {
dir="$1/$2"

project_init "$dir"

shift
shift
for remote
do
project_add_remote "$dir" "$remote"
done
}

# Call add_remotes() if no remotes are passed as arguments.
#
# $1 is the Elixir root data path.
# $2 is the CLI arg count.
# $3 is the CLI arg for project name (can be --all).
# $4 is the project name.
# $... are the default remote URLs.
add_default_remotes() {
if test $2 -eq 2 -a \( "$3" = "--all" -o "$3" = "$4" \); then
add_remotes "$1" "$4" ${@:5}
fi
}

do_index() {
if test ! "$(find $1/data -type f)"; then
# If we are indexing from scratch, do it twice as the initial one
# probably took a lot of time.
project_fetch "$1"
project_index "$1"
project_fetch "$1"
project_index "$1"
else
project_fetch "$1"
project_index "$1"
fi
}

# Add all known projects remotes. This works in two cases:
# ./utils/index <elixir_data_path> --all # => Add default remotes for all projects
# ./utils/index <elixir_data_path> musl # => Add default remote for musl
add_default_remotes $1 $# $2 amazon-freertos https://github.com/aws/amazon-freertos.git
add_default_remotes $1 $# $2 arm-trusted-firmware https://github.com/ARM-software/arm-trusted-firmware
add_default_remotes $1 $# $2 barebox https://git.pengutronix.de/git/barebox
add_default_remotes $1 $# $2 busybox https://git.busybox.net/busybox
add_default_remotes $1 $# $2 coreboot https://review.coreboot.org/coreboot.git
add_default_remotes $1 $# $2 dpdk https://dpdk.org/git/dpdk \
https://dpdk.org/git/dpdk-stable
add_default_remotes $1 $# $2 glibc https://sourceware.org/git/glibc.git
add_default_remotes $1 $# $2 llvm https://github.com/llvm/llvm-project.git
add_default_remotes $1 $# $2 mesa https://gitlab.freedesktop.org/mesa/mesa.git
add_default_remotes $1 $# $2 musl https://git.musl-libc.org/git/musl
add_default_remotes $1 $# $2 ofono https://git.kernel.org/pub/scm/network/ofono/ofono.git
add_default_remotes $1 $# $2 op-tee https://github.com/OP-TEE/optee_os.git
add_default_remotes $1 $# $2 qemu https://gitlab.com/qemu-project/qemu.git
add_default_remotes $1 $# $2 u-boot https://source.denx.de/u-boot/u-boot.git
add_default_remotes $1 $# $2 uclibc-ng https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git
add_default_remotes $1 $# $2 zephyr https://github.com/zephyrproject-rtos/zephyr
add_default_remotes $1 $# $2 toybox https://github.com/landley/toybox.git
add_default_remotes $1 $# $2 grub https://git.savannah.gnu.org/git/grub.git
add_default_remotes $1 $# $2 bluez https://git.kernel.org/pub/scm/bluetooth/bluez.git
add_default_remotes $1 $# $2 linux https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git \
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git \
https://github.com/bootlin/linux-history.git
add_default_remotes $1 $# $2 xen https://xenbits.xen.org/git-http/xen.git
add_default_remotes $1 $# $2 freebsd https://git.freebsd.org/src.git

# Index a single project
if test "x$2" != "x--all"; then
dir="$1/$2"
add_remotes "$@"
do_index "$dir"
else
# Index all projects.
# Note: this is not only the default projects ones but all the ones in $1.
find $1 -mindepth 1 -maxdepth 1 -type d | \
while read dir; do
do_index "$dir"
done
fi
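
The `remote$i` numbering used by `project_add_remote()` can be tried in isolation with the sketch below. It reads remote names on stdin instead of calling `git remote`, and the function name `next_remote_index` is hypothetical. Unlike the script, it adds `+ 0` to the extracted suffix so that awk compares numbers rather than strings, which should only matter once indices reach two digits.

```shell
#!/bin/bash
# Hypothetical helper (not part of utils/index): read remote names on
# stdin, one per line, and print the next free index for "remote$i".
next_remote_index() {
    awk '
        BEGIN { n = -1 }
        /^remote[0-9]+$/ {
            # Keep only the digits after "remote"; "+ 0" makes it a number.
            i = substr($0, length("remote") + 1) + 0
            if (i > n) n = i
        }
        END { print n + 1 }'
}

# Names that do not match remote<N> (such as "origin") are ignored.
printf 'origin\nremote0\nremote2\n' | next_remote_index
```

With no matching remotes the function prints 0, so the first remote added to a fresh repository would be `remote0`.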

96 changes: 0 additions & 96 deletions utils/index-all-repositories

This file was deleted.