-
Notifications
You must be signed in to change notification settings - Fork 115
Add --print-progress flag to show puller progress #66
base: master
Are you sure you want to change the base?
Conversation
Currently this depends on a forked version of puller.par. Pending pull request to upstream puller.par is here: google/containerregistry#66
Add minimal prints to indicate puller progress if container_pull has print_progress=True attribute. The intention is that the pull operation would provide some feedback to the user, instead of bazel getting seemingly stuck for 10 minutes when pulling an image that is many gigabytes in size. Pending pull request to upstream puller.par is here: google/containerregistry#66
client/v2_2/save_.py
Outdated
@@ -168,7 +172,8 @@ def write_file(name, accessor, | |||
future_to_params[f] = digest_name | |||
|
|||
layer_name = os.path.join(directory, '%03d.tar.gz' % idx) | |||
f = executor.submit(write_file, layer_name, image.blob, blob) | |||
message = 'Downloading from {} ({}/{})'.format(image.name(), idx+1, num_layers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downloading from gcr.io/tensorflow/tensorflow:latest (1/12 layers)
Downloading from gcr.io/tensorflow/tensorflow:latest (2/12 layers)
...
maybe more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I went with slightly modified version:
Downloading from gcr.io/tensorflow/tensorflow:latest (layer 1/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (layer 2/12)
...
Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
I do think it makes sense to print some form of progress message for these long running actions. My one concern is wrt how to print out these messages (i.e., use of sys vs something else for logging that can help control these messages at a coarse grain). |
@KaylaNguyen ping? |
@KaylaNguyen could you please provide feedback about this PR? I opened it already in February, and still no response. |
Hi @scele, currently we don't have any plan to maintain this repo. But I can take a look at it in my personal time. Response time will be slow to very slow. I'll see what I can do and get back to you by the end of next week. Thanks for your understanding :) |
Please wait until the next release to merge with new changes. Ping me when you're ready and I'll export this PR from internal. Thanks! |
@KaylaNguyen Sorry, I missed your previous comment.. :( I have rebased now, can you take a new look to get this merged? Thanks! |
Will slowly get back to you by end of next week :) Thank you for your patience. |
@@ -22,6 +22,7 @@ | |||
import io | |||
import json | |||
import os | |||
import sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logging instead, you can check fast_flatten or fast_pusher for reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my commit message for explanation why I don't want to use logging
.
Also, I would like the print to go to stderr, not stdout, because other bazel build status messages also go to stderr:
Loading:
Loading: 0 packages loaded
currently loading: foo
Analyzing: target //foo:bar (45 packages loaded)
...
I think the pull status progresses should appear together with these messages in stderr like so:
Loading:
Loading: 0 packages loaded
currently loading: foo
Analyzing: target //foo:bar (45 packages loaded)
Downloading from gcr.io/tensorflow/tensorflow:latest (1/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (2/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (3/12)
...
If the output goes to stdout, then these status messages would clutter run_log.txt in the following invocation:
bazel run //foo:bar 2>build_log.txt >run_log.txt
@@ -144,7 +145,8 @@ def tarball(name, image, | |||
def fast(image, | |||
directory, | |||
threads = 1, | |||
cache_directory = None): | |||
cache_directory = None, | |||
print_progress = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update Args section below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Add minimal prints to indicate puller progress if --print-progress flag is passed to fast puller. The intention is that the bazel container_pull workspace rules could eventually output something, instead of bazel getting seemingly stuck for 10 minutes when pulling an image that is many gigabytes in size. We are intentionally not using the logging facility and the existing --stderrthreshold argument to display these prints, because that will produce log-formatted output that looks too detailed when inlined with other bazel output. The intention is to show user-friendly progress instead: Downloading from gcr.io/tensorflow/tensorflow:latest (layer 1/12) Downloading from gcr.io/tensorflow/tensorflow:latest (layer 2/12) Downloading from gcr.io/tensorflow/tensorflow:latest (layer 3/12) Downloading from gcr.io/tensorflow/tensorflow:latest (layer 4/12) Downloading from gcr.io/tensorflow/tensorflow:latest (layer 5/12) ...
Add minimal prints to indicate puller progress if --print-progress
flag is passed to fast puller. The intention is that the bazel
container_pull workspace rules could eventually output something,
instead of bazel getting seemingly stuck for 10 minutes when
pulling an image that is many gigabytes in size.
We are intentionally not using the logging facility and the existing
--stderrthreshold argument to display these prints, because that
will produce log-formatted output that looks too detailed when inlined
with other bazel output. The intention is to show user-friendly
progress instead:
Downloading from gcr.io/tensorflow/tensorflow:latest (1/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (2/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (3/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (4/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (5/12)
...