Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python-based version of the QLever script #17

Merged
merged 0 commits into from
Feb 24, 2024
Merged

Python-based version of the QLever script #17

merged 0 commits into from
Feb 24, 2024

Conversation

hannahbast
Copy link
Member

The shell script we have so far was a proof of concept to play around with. The usability was great, but given the heavy use of bash-specific code, there were major portability issues with other operating systems. In particular, even the most basic functionality did not work on MacOS.

These are the first steps to rewrite the script in Python. As a welcome side effect, this also allows for cleaner code (bash scripts always end up very messy and cryptic when they grow bigger) and better testability.

Copy link

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first time I ran it, I just got this message and it kept running forever:

Follow olympics.server-log.txt until the server is ready (Ctrl-C stops following the log, but not the server)

nohup: ServerMain: No such file or directory

I think it would be good to check if the configured binary is found and if not, give a message to tell the user how to set PATH, QLEVER_HOME, or [server] binary - whatever you think is the recommended method. Also, I am using the recommended directory structure of qlever-code, qlever-control, and qlever-indices so it would be nice if it worked "out of the box" for people who set up that directory structure.

Index test

First, I tried PATH=../../qlever-code/build:$PATH ../qlever.py index but it gave a strange error:

libc++abi: terminating due to uncaught exception of type std::runtime_error: U_FILE_ACCESS_ERROR
cat: stdout: Broken pipe

This could be some problem with my compiled binary of IndexBuilderMain which is quite out of date at this point. If I get more time, I'll rebuild it and test this again.

Luckily, when I set USE_DOCKER = true in the .ini file, it worked perfectly and built the index!

Start test

With docker disabled, I got another error message at first:

Action: "start"

nohup ServerMain -i olympics -j 8 -p 7019 -m 5 -c 2 -e 1 -k 100 -a olympics_7643543846 > olympics.server-log.txt 2>&1 &

Follow olympics.server-log.txt until the server is ready (Ctrl-C stops following the log, but not the server)

2023-08-28 17:32:58.879 - INFO:  QLever Server, compiled on Fri Jun  9 09:58:04 PDT 2023 using git hash 8d0639
2023-08-28 17:32:58.883 - ERROR: U_FILE_ACCESS_ERROR

Once again, when I set USE_DOCKER = true in the file and try again, it works perfectly.

Stop

Stop is not working, but it warned me that "stop" isn't yet working on MacOS, so that's expected. Let me know if I can help with getting it to work, maybe after this PR is merged.

qlever.py Outdated
@@ -0,0 +1,441 @@
#!/usr/bin/python3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On MacOS, I have a newer version of python installed with Homebrew, and it gets installed at the path /opt/homebrew/bin/python3 to not conflict with the older system python3 at /usr/bin/python3. Likewise, people may have a virtualenv active in which case they want to use the venv's python, not this system python. So changing this to the following would let everyone run this as ./qlever.py regardless of how they have python configured, I think:

Suggested change
#!/usr/bin/python3
#!/usr/bin/env python3

Qleverfile.ini Outdated

[data]
BASE_URL = https://github.com/wallscope/olympics-rdf
GET_DATA_CMD = wget -q -nc -O olympics.zip ${BASE_URL}/raw/master/data/olympics-nt-nodup.zip && unzip -q -o olympics.zip && rm olympics.zip

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wget is not installed on MacOS, but curl is. Do you think we can use curl instead? If not, let's recommend installing wget along with psutil before using this.

Action: "get-data"

wget -q -nc -O olympics.zip https://github.com/wallscope/olympics-rdf/raw/master/data/olympics-nt-nodup.zip && unzip -q -o olympics.zip && rm olympics.zip

sh: wget: command not found
Suggested change
GET_DATA_CMD = wget -q -nc -O olympics.zip ${BASE_URL}/raw/master/data/olympics-nt-nodup.zip && unzip -q -o olympics.zip && rm olympics.zip
GET_DATA_CMD = curl -L ${BASE_URL}/raw/master/data/olympics-nt-nodup.zip --output olympics.zip && unzip -q -o olympics.zip && rm olympics.zip

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A partial review of the first ~150 lines of the Python script. It is very readable so far and makes it easy to reason about.

qlever.py Outdated
Comment on lines 31 to 32
self.config.read("Qleverfile.ini")
self.name = self.config['DEFAULT']['name']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do those give reasonable error messages when there is no, or an ill-formed Qleverfile?

qlever.py Outdated
Comment on lines 36 to 64
defaults = {
"DEFAULT": {
"log-level": "info",
},
"server": {
"binary": "ServerMain",
"num_threads": "8",
"cache_max_size_gb": "5",
"cache_max_size_gb_single_entry": "1",
"cache_max_num_entries": "100",
"with_text_index": "no",
"only_pso_and_pos_permutations": "no",
"no_patterns": "no",
},
"index": {
"binary": "IndexBuilderMain",
"with_text_index": "no",
"only_pso_and_pos_permutations": "no",
"no_patterns": "no",
},
"docker": {
"image": "adfreiburg/qlever",
"container_server": f"qlever.server.{self.name}",
"container_indexer": f"qlever.indexer.{self.name}",
},
"ui": {
"port": "7000",
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest moving those out of this class right to the beginning of the file.

qlever.py Outdated
Comment on lines 65 to 69
for section in defaults:
for option in defaults[section]:
if not self.config[section].get(option):
self.config[section][option] = defaults[section][option]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there also a check, that all the configurations from the Qleverfile are also valid options?
For example what happens if I use num_treads in the QLeverfile or a similar typo? I think rasing a hard error would be the most user-friendly, but therefore we need the set of all supported options as a whitelist (similar to the defaults).

qlever.py Outdated
except Exception:
self.docker_enabled = False
print("Note: `docker info` failed, therefore"
" docker.USE_DOCKER=true not supported")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it USE_DOCKER or use_docker (the other settings are lowercase). Or is this whole config case-insensitive?

qlever.py Outdated
print("Note: QLever binaries not found or failed, therefore"
" docker.USE_DOCKER=false not supported")
self.binaries_work = False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we immediately terminate the script if we have neither a working Docker nor a working native binary setup? I think this would be the right position to give a useful error message here.

Also: the self.binaries_work variable is set here, but never used as far as I see.

qlever.py Outdated

# Default values for options that are not mandatory in the Qleverfile.
defaults = {
"DEFAULT": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it "DEFAULT" in caps, but the rest is lowercase?

qlever.py Outdated
Comment on lines 154 to 157
curl_cmd = f"curl -s http://localhost:{port}/ping?msg={message}"
exit_code = subprocess.call(curl_cmd, shell=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a cross-platform python http library for these things and subprocess etc. only where really necessary?

@Qup42
Copy link
Member

Qup42 commented Sep 13, 2023

Support for ad-freiburg/wharfer would also be nice. I don't have access to docker directly on some machines. Adding wharfer support is probably best done as a separate PR once this PR is merged. I would be willing to contribute this feature.

@hannahbast
Copy link
Member Author

@Qup42 wharfer is our group-specific (and by now already quite old) solution for the security problems associated with running docker. Maybe it would be better to switch to something more commonly used. For example, how about using podman?

@Qup42
Copy link
Member

Qup42 commented Sep 13, 2023

@hannahbast Sure that's even better! Podman has been on my bucket list for some time now. :) Afaik dockers image format is compatible with podman. The cli options seem to differ quite a bit. So then it would be podman support (in a separate PR).

@Qup42 Qup42 mentioned this pull request Dec 22, 2023
@hannahbast hannahbast merged commit 1761afc into main Feb 24, 2024
@hannahbast hannahbast deleted the python-qlever branch August 17, 2024 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants