Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makepkg: lint_pkgbuild is slow #19

Open
2 of 7 tasks
sskras opened this issue Sep 9, 2022 · 11 comments
Open
2 of 7 tasks

makepkg: lint_pkgbuild is slow #19

sskras opened this issue Sep 9, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@sskras
Copy link

sskras commented Sep 9, 2022

Description

I tried building newest msys2/MINGW-packages/mingw-w64-clang.

Every time I run makepkg-mingw, things after Building mingw64... and before Making package: mingw-w64-clang 14.0.6-6 take ~90-120 seconds to complete:

image

My machine has i5-6200U CPU running @ 2.30GHz:

image

Is this expected? And why?

Verification

Windows Version

Microsoft Windows [Version 10.0.19044.1889]

MINGW environments affected

  • MINGW64
  • MINGW32
  • UCRT64
  • CLANG64
  • CLANG32
  • CLANGARM64

Expected behavior

Tried tracing the scripts using set -x. Looks like it's doing a lot of Bash processing. Big part of which seems to be repetitive.

Eg. I noticed the Bash function lint_pkgbuild() coming from pacman/scripts/libmakepkg and occurring at the start of my trace.

I expect to know the root cause of this step taking so relatively long to complete.
And whether is this expected.

Actual behavior

The initial step takes ~90-120 seconds to complete.

Repro steps

1. Download the files:

$ mkdir temp
$ cd temp
$ git clone [email protected]:msys2/MINGW-packages.git

2. Go into the Clang dir:

$ cd MINGW-packages/mingw-w64-clang/

3. Start makepkg & stop it just before the build:

$ time makepkg-mingw -sCLf |& sed '/Making package/ q'; pkill -f bash./usr/bin/makepkg
@sskras sskras added the bug Something isn't working label Sep 9, 2022
@lazka
Copy link
Member

lazka commented Sep 9, 2022

I always disable linting because of this recently... Yeah, might be a good idea to make it opt in.

@lazka
Copy link
Member

lazka commented Sep 9, 2022

Excluding the msys2 dir from defender also helps a lot

@lazka
Copy link
Member

lazka commented Sep 10, 2022

I've created #18

Feedback welcome

@sskras
Copy link
Author

sskras commented Sep 10, 2022

@lazka, thanks for the efforts.

Another question goes about the actual parsing implementation.

@MehdiChinoune commented yesterday:

It is expected because It is a split package. msys2/MSYS2-packages#2593

As the original issue is already closed, I quote and comment it here:

@lazka commented on Jul 31, 2021 in msys2/MINGW-packages#2593:

not much we can do I think.. lots of forks..

I struggle at analyzing all the related Bash code at the moment. Could you please summarize:

  • Why is such heavy forking needed to parse PKGFILEs ?
  • Could it be rewritten in another language and so avoid forking more easily ?

@lazka
Copy link
Member

lazka commented Sep 11, 2022

I struggle at analyzing all the related Bash code at the moment. Could you please summarize:

* Why is such heavy forking needed to parse PKGFILEs ?

because it's fast on Linux, and pacman is primarily developed for Linux.

* Could it be rewritten in another language and so avoid forking more easily ?

Tricky since PKGBUILD files are written in bash. Someone would have to speed things up without complicating the code too much, and propose that to upstream.

I see some potential by evaluating functions once and then re-using that for further checks, but that's no small change.

@sskras
Copy link
Author

sskras commented Sep 11, 2022

I see some potential by evaluating functions once and then re-using that for further checks, but that's no small change.

Thanks. And I am yet to see the forking source (no pun intended:)

IOW I believe that eval works without invoking fork(). Eg.: What's the difference between eval and exec?

eval will run the arguments as a command in the current shell. In other words eval foo bar is the same as just foo bar. But variables will be expanded before executing, so we can execute commands saved in shell variables:

$ unset bar
$ cmd="bar=foo"
$ eval "$cmd"
$ echo "$bar"
foo

It will not create a child process, so the variable is set in the current shell. (Of course eval /bin/ls will create a child process, the same way a plain old /bin/ls would.)

I guess I will need to dedicate some days into diving into the set of related Bash scripts anyways.

@lazka
Copy link
Member

lazka commented Sep 11, 2022

You can run bash -x /usr/bin/makepkg --printsrcinfo to see what it's doing

just as an example, the following contains some parts of the code involved:

#!/bin/bash

for i in {1..1000}
do
    while read -r; do
        true
    done < <(echo "")
done

This takes:

  • 6.113s with defender active
  • 5.706s with MSYS2 in the defender exclusion list
  • 4.546s with defender switched off completely
  • 0.220s under WSL

@sskras
Copy link
Author

sskras commented Sep 12, 2022

Thanks. I added some traces to the scripts and found this little guy called extract_function_variable() invoking subshells:

image

Here it runs grep_function "$funcname" "$attr_regex" like this:

grep_function "package_mingw-w64-x86_64-clang" "^[[:space:]]* checkdepends_any\+?=[^(]"

... which in turn does some parsing by executing declare -f for the funcname and piping it to grep:

{ declare -f "package_mingw-w64-x86_64-clang" || declare -f package; } 2>/dev/null | grep -E "^[[:space:]]* checkdepends_any\+?=[^(]"

@lazka, do you know (why) is using subshell absolutely needed here?
Wouldn't the same parsing work by the means of eval ?

@lazka lazka transferred this issue from msys2/MINGW-packages Sep 13, 2022
@lazka lazka changed the title The makepkg preparation step before Making package: mingw-w64-clang ... is slow makepkg: lint_pkgbuild is slow Sep 13, 2022
@sskras
Copy link
Author

sskras commented Sep 13, 2022

Thanks for fixing the title (you beat me to it:)

Maybe it's worth reporting to the upstream?

@sskras
Copy link
Author

sskras commented Sep 14, 2022

OK, it seems my thought was wrong. I tried to move the grep_function ... upper and pipe it to the looped read directly:

 92         grep_function "$funcname" "$attr_regex" | \
 93         while read -r; do

This (unexpectedly) increased running time by the 18%, it's 110s => 130s!

Before the change:

real    1m50.572s
user    0m20.097s
sys     0m45.337s

After the change:

real    2m10.283s
user    0m23.358s
sys     0m59.751s

I guess, the fork()+exec() combo occurs to launch grep anyways, still incurring a big timing cost on Windows.
Sorry for the noise.

@sskras
Copy link
Author

sskras commented Sep 18, 2022

OK, I tried to use Bash ERE matching instead. The inspiration:
https://stackoverflow.com/questions/17420994/how-can-i-match-a-string-with-a-regex-in-bash/51960723#51960723

For that I used operator =~ inside [[ ... ]] and the BASH_REMATCH array. The PoC:

image

The ASCII diff
--- usr/share/makepkg/util/pkgbuild.sh  2022-09-02 13:25:59.000000000 +0300
+++ /usr/share/makepkg/util/pkgbuild.sh 2022-09-18 11:42:26.713924400 +0300
@@ -31,7 +31,15 @@
 }

 grep_function() {
-       { declare -f "$1" || declare -f package; } 2>/dev/null | grep -E "$2"
+       # Strip ^ (the beginning-of-a-line) symbol from the original regex:
+       REGEX="${2/#^/}"
+       # 1. Replace ^ with \n (to work in a multiline string);
+       # 2. wrap the regex into the "(...)" parentheses (to match the whole line later);
+       # 3. append any-chars-before-the-end-of-a-line to the original regex (to capture remainder of the line):
+       REGEX_BASH=$'\n('$REGEX$'[^\n]*)'
+
+       FUNC_BODY="$({ declare -f "$1" || declare -f package; } 2>/dev/null)"
+       if [[ $FUNC_BODY =~ $REGEX_BASH ]]; then echo "${BASH_REMATCH[1]}"; fi
 }

 array_build() {

This alone gave me ~30% or ~30s decrease (96 => 67) in running time of makepkg --printsrcinfo PKGBUILD.

The potential issue with the PoC at the moment is that it doesn't handle the next matching lines of the same function.

Though this could be worked around by multiplying REGEX_BASH into something like this in a loop:
${REGEX_BASH}.*${REGEX_BASH}.*${REGEX_BASH}.*${REGEX_BASH}<...>

Not sure if my PoC is worth of extending. The running time is still quite long: ~1m7s on my machine.
I wonder how long would it take the makepkg to do the same using Linux (no installation at hand).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants