Skip to content
This repository has been archived by the owner on Jun 4, 2021. It is now read-only.

Add --print-progress flag to show puller progress #66

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

scele
Copy link

@scele scele commented Feb 14, 2018

Add minimal prints to indicate puller progress if --print-progress
flag is passed to fast puller. The intention is that the bazel
container_pull workspace rules could eventually output something,
instead of bazel getting seemingly stuck for 10 minutes when
pulling an image that is many gigabytes in size.

We are intentionally not using the logging facility and the existing
--stderrthreshold argument to display these prints, because that
will produce log-formatted output that looks too detailed when inlined
with other bazel output. The intention is to show user-friendly
progress instead:

Downloading from gcr.io/tensorflow/tensorflow:latest (1/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (2/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (3/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (4/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (5/12)
...

@scele scele changed the title Add --verbose flag to show puller progress Add --print-progress flag to show puller progress Feb 19, 2018
scele added a commit to scele/rules_docker that referenced this pull request Feb 20, 2018
Currently this depends on a forked version of puller.par.

Pending pull request to upstream puller.par is here:
google/containerregistry#66
scele added a commit to scele/rules_docker that referenced this pull request Oct 4, 2018
Add minimal prints to indicate puller progress if container_pull
has print_progress=True attribute.  The intention is that the pull
operation would provide some feedback to the user, instead of bazel
getting seemingly stuck for 10 minutes when pulling an image that is
many gigabytes in size.

Pending pull request to upstream puller.par is here:
google/containerregistry#66
@@ -168,7 +172,8 @@ def write_file(name, accessor,
future_to_params[f] = digest_name

layer_name = os.path.join(directory, '%03d.tar.gz' % idx)
f = executor.submit(write_file, layer_name, image.blob, blob)
message = 'Downloading from {} ({}/{})'.format(image.name(), idx+1, num_layers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downloading from gcr.io/tensorflow/tensorflow:latest (1/12 layers)
Downloading from gcr.io/tensorflow/tensorflow:latest (2/12 layers)
...

maybe more clear?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I went with slightly modified version:

Downloading from gcr.io/tensorflow/tensorflow:latest (layer 1/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (layer 2/12)
...

Is that ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@nlopezgi
Copy link

nlopezgi commented Oct 5, 2018

I do think it makes sense to print some form of progress message for these long running actions. My one concern is wrt how to print out these messages (i.e., use of sys vs something else for logging that can help control these messages at a coarse grain).
@KaylaNguyen could you comment on this PR? Do you have any advice as to how to print out the messages?

@scele
Copy link
Author

scele commented Oct 16, 2018

@KaylaNguyen ping?

@scele
Copy link
Author

scele commented Oct 25, 2018

@KaylaNguyen could you please provide feedback about this PR? I opened it already in February, and still no response.

@KaylaNguyen
Copy link
Contributor

Hi @scele, currently we don't have any plan to maintain this repo. But I can take a look at it in my personal time. Response time will be slow to very slow. I'll see what I can do and get back to you by the end of next week. Thanks for your understanding :)

@KaylaNguyen
Copy link
Contributor

Please wait until the next release to merge with new changes. Ping me when you're ready and I'll export this PR from internal. Thanks!

@scele
Copy link
Author

scele commented Jan 3, 2019

@KaylaNguyen Sorry, I missed your previous comment.. :( I have rebased now, can you take a new look to get this merged? Thanks!

@KaylaNguyen
Copy link
Contributor

Will slowly get back to you by end of next week :) Thank you for your patience.

@@ -22,6 +22,7 @@
import io
import json
import os
import sys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logging instead, you can check fast_flatten or fast_pusher for reference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my commit message for explanation why I don't want to use logging.
Also, I would like the print to go to stderr, not stdout, because other bazel build status messages also go to stderr:

Loading: 
Loading: 0 packages loaded
    currently loading: foo
Analyzing: target //foo:bar (45 packages loaded)
...

I think the pull status progresses should appear together with these messages in stderr like so:

Loading: 
Loading: 0 packages loaded
    currently loading: foo
Analyzing: target //foo:bar (45 packages loaded)
Downloading from gcr.io/tensorflow/tensorflow:latest (1/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (2/12)
Downloading from gcr.io/tensorflow/tensorflow:latest (3/12)
...

If the output goes to stdout, then these status messages would clutter run_log.txt in the following invocation:

bazel run //foo:bar 2>build_log.txt >run_log.txt

@@ -144,7 +145,8 @@ def tarball(name, image,
def fast(image,
directory,
threads = 1,
cache_directory = None):
cache_directory = None,
print_progress = False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update Args section below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Add minimal prints to indicate puller progress if --print-progress
flag is passed to fast puller.  The intention is that the bazel
container_pull workspace rules could eventually output something,
instead of bazel getting seemingly stuck for 10 minutes when
pulling an image that is many gigabytes in size.

We are intentionally not using the logging facility and the existing
--stderrthreshold argument to display these prints, because that
will produce log-formatted output that looks too detailed when inlined
with other bazel output.  The intention is to show user-friendly
progress instead:

   Downloading from gcr.io/tensorflow/tensorflow:latest (layer 1/12)
   Downloading from gcr.io/tensorflow/tensorflow:latest (layer 2/12)
   Downloading from gcr.io/tensorflow/tensorflow:latest (layer 3/12)
   Downloading from gcr.io/tensorflow/tensorflow:latest (layer 4/12)
   Downloading from gcr.io/tensorflow/tensorflow:latest (layer 5/12)
   ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants