-*- org -*-
HINT: org-mode global cycling: S-TAB
HINT: To show all content (including any drawers), regardless of org-mode startup visibility:
C-u C-u C-u TAB
[The above assumes the default key binding of TAB to org-cycle.]
This is the ‘NEWS’ file for the ‘ads-github-tools’ project. This file contains information on changes since the last release of the package, as well as a running list of high-level changes from previous versions. If critical bugs are found in any of the software, notice of such bugs and the versions in which they were fixed will be noted here, as well.
Corrected a bunch of typos in error messages of the ‘ads-github-repo-create’ program. They formerly read:
ads-github-repo-create (error): newly repo verification:...
and have now been corrected to read:
ads-github-repo-create (error): newly created repo verification:...
Corrected a bunch of typos in error messages of the ‘ads-github-repo-create’ program. They formerly read:
ads-github-repo-create (error): 'foo' attribute on newly create repo...
and have now been corrected to read:
ads-github-repo-create (error): 'foo' attribute on newly created repo...
(None)
The concurrency features added to the ‘ads-github-cache’ tool in the ads-github-tools-0.3.4 release (see issues #80 and #81) inadvertently broke some operations that emitted cached data. Data was not only concurrently fetched for updating the cache, but was sometimes emitted concurrently, as well, causing the output to be garbled (mainly in the form of JSON that was no longer well-formed due to being intermixed from multiple emitting background processes). This has been fixed. The data is still fetched concurrently (from the upstream GitHub v3 API endpoints), but emitting the data from cached objects now happens sequentially as one would expect. This is mainly relevant for paths that represent “paged collections”, where multiple pages of data are available.
The ‘HACKING’ file has been updated to reflect the versions of the GNU Autotools currently being used for the ‘ads-github-tools’ project. All developers working on the project need to be using the same versions of the Autotools programs in the toolchain. Currently (as of 2022-10), those versions are:
  automake    1.16.3
  autoconf    2.69
The build process itself was updated back in issue #79 (part of the ‘ads-github-tools-0.3.4’ release), but those earlier changes neglected to update the documentation in the ‘HACKING’ file.
The docs have also been improved by noting where source tarballs may be obtained, which is helpful for side-stepping the need to even have the GNU Autotools installed.
The ‘ads-github-repo-create’ tool now has an ‘--organization=NAME’ option that can be used to create the repository in the specified (existing) GitHub Organization (as opposed to in the account of the authenticated GitHub user).
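For example, to create a repo within an Organization rather than under the authenticated user’s own account (the Organization and repo names shown here are placeholders):

    $ ads-github-repo-create --organization=my-example-org 'some-new-repo'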
The API response emitted by the upstream GitHub v3 API for the endpoint:
/repos/:owner/:repo
changed regarding the ‘visibility’ field, which broke the verification code in the ‘ads-github-repo-create’ tool that was performing its checks based on the old behavior.
The impact of the breakage for users was that full automated verification of the newly created repo was not possible without the fix provided with the current release of the ‘ads-github-tools’. The repo got created as intended (assuming there were no other problems); it was only the automated verification checking that choked.
Specifically, when the ‘ads-github-repo-create’ program was originally implemented a few years ago, support for the ‘visibility’ field was a GitHub “opt-in only” experimental feature, and the field would either be omitted or null in the response when not opting in. Fast-forward to the present, and the field seems to be present in the response always. The program’s verification code has therefore been updated accordingly.
The configure-time checks for unused tools ‘groupadd’, ‘useradd’, and ‘usermod’ have been removed.
In earlier versions of the ‘ads-github-tools’ project, the ‘configure’ script did look for a program named ‘git-hub’ on $PATH at configure time, but at runtime the invocation used was that for Git subcommands:
git hub ...
That meant the location of the ‘git-hub’ program found at runtime was dependent upon the user’s current $PATH (and whatever other rules ‘git’ uses internally) rather than the $PATH as it was set at configure time.
This behavior has been changed so that the program is invoked directly with the name ‘git-hub’ rather than as a ‘git’ subcommand.
The only program affected by this change is ‘ads-github-fetch-all-upstreams’. The change does not result in any change to user-visible behavior.
The ‘ads-github-cache’ program now updates (only if needed) the locally cached data for each page of a “paged collection”. This new behavior creates the opportunity for all of the requests to make progress concurrently, since much of the time involved in such operations is spent waiting for the remote GitHub v3 API to respond.
Anecdotal testing shows a significant real-world improvement. For example, the author’s invocations of:
$ ads-github-cache --verbose --update
drop from more than two minutes (with the previous behavior) down to between nine and eleven seconds.
See also the entry for issue #81, which describes how the concurrency works by default, and how the user may control its behavior.
This change only affects developers working on the ‘ads-github-tools’ project, and only if they need to re-generate the auto-generated ‘configure’ script, the ‘Makefile.in’ files, or related. The project was previously using Automake and aclocal version 1.16.1, and is now using version 1.16.3 of both.
Please do not submit pull requests (PRs) or send patches with changes generated by other versions of these tools. If in doubt, just leave auto-generated files out of your PR or patch and note that that is the case. Thanks.
The ‘ads-github-cache’ program now accepts a new ‘-j NUM’ (--jobs=NUM) command line option to control the number of outstanding requests that will be made to the GitHub v3 API endpoints when obtaining data for objects represented as “paged collections”.
By default, the number is twice the number of CPUs found to be available at runtime, but with a hard cap of 20 (to avoid making an excessive number of requests on machines that happen to have a larger number of CPUs).
If provided, the user-specified number will be used unconditionally, without any cap (assuming the machine has enough resources available). Please use this feature responsibly.
Note that the value of NUM must be an integer greater than zero (the program will reject other values and exit with an error status).
If the value 1 is provided for NUM, the effect will be to eliminate concurrent requests.
This change introduces a new minor utility to the ads-github-tools project: ‘ads-github-nproc’. This new program simply prints on stdout the number of CPUs available to the current process. It provides a common interface to the rest of the ‘ads-github-tools’ programs that may need that data. To accomplish its task, however, the program will attempt to use a number of platform-specific techniques. The code is adapted from the ‘AX_COUNT_CPUS’ macro from the GNU Autoconf Archive. See ads-github-nproc(1) and the comments in the code for original authorship of the macro, as well as the licensing that is inherited from the original m4 macro source file (though still GPL compatible).
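As a rough illustration of how these pieces fit together, here is a minimal shell sketch (not the program’s actual code) that uses ‘ads-github-nproc’ to derive the kind of default described above, that is, twice the CPU count, capped at 20:

    # Illustrative sketch only; not the actual implementation.
    ncpus=$(ads-github-nproc) || ncpus=1   # fall back to 1 if CPU detection fails
    jobs=$(( ncpus * 2 ))                  # default: twice the number of CPUs...
    if test "${jobs}" -gt 20; then         # ...capped at 20
        jobs=20
    fi
    ads-github-cache --jobs="${jobs}" --update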
The ‘ads-github-cache’ program now accepts a ‘--cache-dir=DIR’ command line option to specify the root of the cache directory. The cache directory is the location in which the ‘ads-github-cache’ program reads and writes cached data.
If not specified, the default location for the cache directory will be used:
${HOME}/.ads-github-tools.d/cache/
Prior to this change, the above location was always used unconditionally.
The cache directory to use may also be specified in the ‘ADS_GITHUB_TOOLS_CACHE_DIR’ environment variable. The command line option, if provided, takes precedence over the environment variable.
Using the environment variable has the advantage that the ‘ads-github-cache’ program will honor it even when invoked indirectly by some of the other ‘ads-github-tools’ programs. If relying on this behavior, make sure the environment variable is “exported” in your shell environment.
The specified directory (but not its parent directories) will be created if it does not exist.
The ability to specify the cache directory is provided mainly for purposes of developing and testing the ‘ads-github-tools’ themselves. It is also useful for experimenting and benchmarking.
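For example, to exercise the tool against a throwaway cache location (the directory path shown is arbitrary):

    $ ads-github-cache --cache-dir=/tmp/scratch-gh-cache --verbose --update

or, to have indirect invocations of the cache (by the other ‘ads-github-tools’ programs) honor the same location:

    $ export ADS_GITHUB_TOOLS_CACHE_DIR=/tmp/scratch-gh-cache
    $ ads-github-fetch-all-upstreams -vcu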
The documentation also includes a new example of such ad hoc scripting: feeding data from the cache to the jq(1) command to extract specific pieces of data from the JSON structure, and then feeding those into downstream Unix text-processing tools in a shell pipeline to accomplish a task.
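The general shape of such a pipeline looks something like the following sketch (the jq(1) filter here is only illustrative; see ads-github-cache(1) for the real example):

    $ ads-github-cache --get-cached '/user/repos' \
          | jq --raw-output '.[].full_name' \
          | sort \
          | head -n 10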
It may be surprising to users that cached objects can need updating even if the user has not performed any activity with GitHub, so we now point out the fact in ads-github-cache(1).
The example given is the GitHub ‘/user/repos’ endpoint, which represents all repositories that the authenticated GitHub user has explicit permission (‘:read’, ‘:write’, or ‘:admin’) to access. The cached representation of such objects, therefore, can change if the authenticated user is a member of any GitHub Organizations – even without the user having performed any activity with GitHub. For example, a new repository might be added to an Organization with which the user is associated; the user would then have access to the new repo, but a cache update would be needed to have that new repo reflected in the local cache.
issue 39: “ads-github-fetch-all-upstreams: default to triangular workflow, but allow override” (reopened)
This issue was originally noted as fixed in ads-github-tools 0.3.1, but a bug was found in the implementation. The ‘--triangular’ and ‘--no-triangular’ options worked as advertised, but the default behavior of the app was inconsistent with what was documented. The documentation indicated that, if the user did not otherwise specify a preference, the tool would by default use a triangular configuration when cloning a repo, and that was indeed the intention.
The program’s behavior has been corrected to be consistent with the documentation in this regard; by default a triangular configuration will be used.
The ‘ads-github-repo-create’ tool shipped with ads-github-tools-0.3.2 (and earlier) did not properly escape JSON strings during new repo creation. When users supplied command line parameters that contained string values that would need such escaping in a JSON document, the program submitted JSON to the GitHub API that was not well-formed. That resulted in an HTTP 400 response, with the response body:
{"message":"Problems parsing JSON","documentation_url":"https://docs.github.com/rest/reference/repos#create-a-repository-for-the-authenticated-user"}
The program has been fixed to properly escape those string values.
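As a rough illustration of the kind of escaping that is needed (this sketch leans on jq(1) to do the quoting; it is not necessarily how the program implements it):

    # Build a well-formed JSON request body from values that contain
    # characters requiring escaping (double quotes, backslashes, etc.).
    description='An example with "embedded double quotes" in it'
    jq --null-input --arg name 'my-new-repo' --arg desc "${description}" \
        '{ name: $name, description: $desc }'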
One would have expected this issue to have cropped up earlier, but it only happened to be noticed recently. It would not have been an issue anywhere Bash was available as ‘/bin/bash’, which is common.
Earlier versions of the ‘ads-github-cache’ program compared the ‘Link:’ HTTP response header in a case-sensitive fashion: it recognized ‘Link:’ but not ‘link:’. Sometime in January 2021, GitHub’s API responses relevant to the program started using all lowercase letters for the HTTP header names in at least some responses, which caused the program to error out.
The use of lowercase headers is consistent with RFC 7230 section 3.2 (“Header Fields”), so the bug was in the program (not upstream).
The ‘ads-github-cache’ program has been modified to treat HTTP headers in a case-insensitive fashion.
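The gist of the fix can be sketched as follows, assuming the response headers have been captured in a file (the file name variable here is illustrative):

    # Extract the value of the 'Link' header regardless of the letter case
    # used in the header name ('Link:', 'link:', 'LINK:', ...).
    link_value=$(grep -i '^link:' "${headers_file}" | sed -e 's/^[^:]*:[[:space:]]*//')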
When attempting to create a new GitHub repo, the service may reject the API call for any number of reasons. The previous version of the command simply exited with an error message indicating the HTTP response code, but in isolation that information is not enough for the user to understand and address the problem (especially in the case of 400 errors).
The program has been improved to provide more information to the user when the repo creation operation fails. It will now dump the HTTP response content to the screen beneath the error messages, which is usually a small JSON document that provides more hints about what the problem was.
issue 69: “parse-netrc build: remove ‘-z config-profile’ from ‘cargo build’ flags (no longer needed)”
In earlier releases of the ‘ads-github-tools’, the ‘cargo-build’ make target used to build the ‘parse-netrc’ program specified the ‘-Z config-profile’ option. Prior to Rust 1.43.0 (released 2020-04-23), that flag was needed to cause profiles to be read from .cargo/config files.
However, the config-profile feature was stabilized as of 1.43.0, and now that Rust (and Cargo) versions newer than 1.42.x are widely deployed, the ‘config-profile’ feature is enabled by default. There is no longer a need to explicitly enable it, and in fact our doing so caused the build to break when using current versions of Cargo.
The ‘-Z config-profile’ option is no longer hard-coded into the build target.
There are a lot of other configure- and build-time improvements in this release made to the Rust-based machinery that produces the ‘parse-netrc’ program. See the Git log for details.
(None)
“implement cache for ‘/user/repos’, other paths” #59
The program was implemented primarily for internal use by other utilities in the ads-github-tools, but may be used directly from the command line, as well.
One major change is that users will want to get in the habit of periodically invoking:
$ ads-github-cache -v --update
either manually or (more likely) from an automated process (such as a daily cron job). Having the cache available is the difference between some operations being painfully slow and being pleasantly snappy on large collections of resources (e.g., GitHub repos).
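For example, a crontab(5) entry along the following lines (the schedule shown is just a suggestion) would quietly refresh the cache early each morning:

    # m   h   dom mon dow   command
      30  5   *   *   *     ads-github-cache --update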
Integration of the cache into the various utilities is still at the very early stages. Expect to see regular improvements in this area in subsequent releases. See ads-github-cache(1) for all the gory details.
Usage:
ads-github-cache --update
:
ads-github-cache --update 'https://api.github.com/user/repos'
ads-github-cache --update '/user/repos'
:
ads-github-cache --clear '/user/repos'
:
ads-github-cache --clear-all
:
ads-github-cache --get '/user/repos'
:
ads-github-cache --get-cached '/user/repos'
$ ads-github-cache --help
usage: ads-github-cache { -h | --help }
   or: ads-github-cache { -V | --version }
   or: ads-github-cache [OPTION...] --update [--] [URL_OR_PATH...]
   or: ads-github-cache [OPTION...] { --get | --get-cached } [--] URL_OR_PATH
   or: ads-github-cache [OPTION...] --clear [--] [URL_OR_PATH...]
   or: ads-github-cache [OPTION...] --clear-all

Manage a user-specific cache of GitHub v3 API responses.

Mandatory arguments to long options are mandatory for short options too.

  -h, --help         Print this help message on stdout
  -V, --version      Print the version of the program on stdout
      --clear        Remove the specified URLs or paths from the cache
      --clear-all    Remove all cached entries, for all cached URLs or paths
      --get          Obtain content of specfied URL or path through the cache
      --get-cached   Like '--get', but error out if item is not already present
                       in cache. Avoids updating the cache to obtain the item.
                       Think "offline".
      --update       Update the cache entry for the specified URLs or paths, or all
  -v, --verbose      Print program progress messages on stderr. Specify multiple
                       times to increase verbosity: info, debug, and tracing (set -x)
  --                 Signals the end of options and disables further options
                       processing. Any remaining argument(s) will be interpretted
                       as a repo name

Report bugs to Alan D. Salewski <[email protected]>.
“offline usage: parse GitHub username from ~/.netrc (new tool: parse-netrc)” #60
The motivation for this tool was to augment the new ‘ads-github-cache’ program (see notes about issue #59 in this release, above). While addressing that issue, it was realized that we needed a way to determine the username of the “in-effect GitHub user” without accessing the network, in the same way that curl(1) determines it when using its -n (--netrc) option. This is needed because some GitHub users have multiple GitHub accounts (e.g., work and personal), but access both from a single Unix account. We need to be able to tell them apart so we can keep separate per-GitHub-user caches.
We recently introduced the ads-github-whoami(1) utility[0] to obtain the info about the authenticated GitHub user, but that tool requires network access at runtime. Our cache needs to be able to work entirely offline for some use cases.
[0] see issue #52, and our entry for it in the NEWS section for ads-github-tools 0.3.1
The first approach considered was to just use curl to get the info, either from the ‘curl’ command line utility, or from a custom program written against libcurl. It turns out that the functions related to parsing the user’s ~/.netrc file are not intended as public symbols in libcurl, so we kept looking.
Eventually it was decided that a custom tool would be introduced for the purpose. Enter the new ‘parse-netrc’ utility.
The program currently lives in-tree in the ‘ads-github-tools’ project, but (as suggested by the lack of an “ads-github-” prefix in its name) it is a candidate for extracting from the tree. Time will tell.
Of note is the fact that the ‘parse-netrc’ program is implemented in the Rust programming language, the build of which has been integrated into our GNU Autotools-based build.
In the Rust ecosystem, the primary dependency management and build-orchestration tool is named “Cargo”. It is distributed together with the ‘rustc’ compiler.
Our build integration incorporates use of the Cargo tool, and during development it is the primary tool used when directly working on the ‘parse-netrc’ program (or any other Rust-based program that we might happen to introduce). In short, once your build tree is configured, everything “just works” in the same way as with a stand-alone Rust project that only uses Cargo. Experienced Rust developers should feel right at home.
Nevertheless, Cargo is a developer-focused tool. It is intended for use by software developers working on Rust-based projects, and who are interested in either consuming ecosystem libraries, or publishing new libraries and programs into that ecosystem (or both). Its audience is software developers, and software developers only. It sometimes gets pressed into service, awkwardly, for other purposes, but we avoid doing that here.
While working on the code base, we are wearing our developer hats. When operating in that role we use Cargo’s abilities just like any other Rust project.
When packaging the ‘ads-github-tools’ for distribution, however, we switch hats. The perspectives of the sysadmin and end-user take precedence.
Toward that end, we want to isolate users from the peculiarities of any particular language tooling as much as possible. Sysadmins and end-users who wish to build, install, and/or use the ads-github-tools should not need to understand anything about Rust or Cargo other than that they need to have the programs installed as prerequisites. Once the small number of documented prerequisites are installed, we do not want the build process to then reach out to the Internet and download a bunch of other stuff.
In that spirit, our incorporation of the Rust ecosystem involves “vendoring” the source code for our third-party libraries. That means copies of those source code artifacts live “in-tree” (beneath the directory ‘src/third-party/cargo-vendored’), and configuration is provided that causes the ‘cargo’ build orchestration tool to reference only those copies at build time. No network access is needed or even desired; everything you need to build the ads-github-tools (that is not a documented prerequisite) is included in the distribution tarball.
All of that means that the ‘./configure’, ‘make’, and ‘make install’ recipe that we all know and love will continue to “just work”. No new language-inflicted or tool-inflicted weirdness is introduced. Unless you consider the above weird :-)
It also means that when we eventually get around to writing the parse-netrc(1) manpage, it will automatically get installed in the normal way (without introducing yet more language-specific tooling).
Finally, it’s only appropriate to include a mention here that the parsing of the netrc file is accomplished via the use of the third-party ‘netrc’ crate:
http://yuhta.github.io/netrc-rs/doc/netrc/index.html
The library cannot yet parse netrc files that have comments in them, but the rest of the needed core functionality is there. In any event, it allowed us to whip up the ‘parse-netrc’ program with minimal effort, and we appreciate having the library available.
$ parse-netrc --help
usage: parse-netrc { -h | --help }
   or: parse-netrc { -V | --version }
   or: parse-netrc [OPTION...] { -u USER | --user=USER } [--] HOSTNAME

Extract and print fields from matching netrc record, if any.

Mandatory arguments to long options are mandatory for short options too.

  -h, --help         Print this help message on stdout
  -V, --version      Print the version of the program on stdout
  -u, --user=USER    Require match of USER in matched netrc record
  -v, --verbose      Print program progress messages on stderr. Specify multiple
                       times to increase verbosity: info, debug, and tracing
  --                 Signals the end of options and disables further options
                       processing. Any remaining argument(s) will be interpretted
                       as a hostname

Report bugs to Alan D. Salewski <[email protected]>.
The ‘ads-github-fetch-all-upstreams’ program has been enhanced to leverage the new ads-github-tools cache.
BACKWARD_INCOMPATIBILITY: The default behavior of the program has changed: offline cached data is now used by default. Users who have not yet initialized the cache will see an error:
$ ads-github-fetch-all-upstreams -vcu
ads-github-cache (error): "paged collection" metadata not found in cache for: /user/repos
ads-github-fetch-all-upstreams (error): was unable to obtain /user/repos data from cache; bailing out
The fix is easy: just update the cache and then rerun:
$ ads-github-cache -v --update [takes a while the first time]
:
$ ads-github-fetch-all-upstreams -vcu [note how fast it is!]
While manually updating the cache as shown above works well enough for very casual usage, if the cache is not (sufficiently) up to date then the problem still exists that you have to wait a painfully long time for something to run before ads-github-fetch-all-upstreams does what you want it to do.
What you really want is for the cached data to be ready and waiting, so that when you whip out ads-github-fetch-all-upstreams it can operate quickly. Toward that end, we expect most users will use a cron job (or similar) to update the cache once or twice a day (or maybe less frequently).
There is a new ‘--cache-mode=MODE’ command line option that allows the user to select which caching behavior is desired: ‘offline’ (the default), ‘online’, or ‘none’.
If you just want to get the old, slow behavior, say ‘--cache-mode=none’.
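For example (the ‘-vcu’ options are the same as those used in the examples above):

    $ ads-github-fetch-all-upstreams -vcu                     # default: --cache-mode=offline
    $ ads-github-fetch-all-upstreams --cache-mode=none -vcu   # the old, slow, cache-bypassing behavior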
See ads-github-fetch-all-upstreams(1) for the full scoop.
issue 33: ‘ads-github-fetch-all-upstreams -vcu foo*’ leads to ‘jq: error (at <stdin>:3): Cannot index string with string “owner”’
The ads-github-fetch-all-upstreams(1) tool was previously using the default behavior of curl(1) to invoke GitHub v3 API endpoints and emit the retrieved responses on stdout, feeding that output in Unix pipelines to other programs (mostly to jq(1)).
While that approach was fast and convenient when originally implemented, it also had the downside of preventing the program from properly detecting all failures because it could not check the HTTP response codes.
The program now follows the pattern (used elsewhere in the ‘ads-github-tools’) of configuring ‘curl’ to write the response body to a temporary file named by a well-known global variable, and to emit the HTTP response code on stdout. The calling code is thus able to check the ‘curl’ exit status, the emitted HTTP response code, and the size and content of the response body independently of one another. This makes the program more robust in general, and avoids complaints emitted by the downstream jq(1) process about problems that should have been handled upstream of it.
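A condensed sketch of that pattern (the variable names and the URL here are illustrative, not the program’s actual internals):

    # Save the response body to a temp file; emit only the HTTP status code on stdout.
    resp_body=$(mktemp) || exit 1
    http_code=$(curl --silent --show-error \
                     --output "${resp_body}" \
                     --write-out '%{http_code}' \
                     'https://api.github.com/user/repos')
    curl_status=$?
    if test ${curl_status} -ne 0 || test "${http_code}" != '200'; then
        printf 'request failed (curl exit status: %s; HTTP code: %s)\n' \
               "${curl_status}" "${http_code}" 1>&2
        exit 1
    fi
    jq '.' < "${resp_body}"   # only now hand the body to downstream tools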
issue 34: ‘ads-github-fetch-all-upstreams -vcu foo results in “line 1067: test: 7: unary operator expected” when github host is inaccessible’
This issue was partially fixed by the above changes for issue #33, but a check specific to this issue has been added to verify that the obtained page count value contains all digits. This keeps any emitted error messages as close as possible to the source of the problem.
Consistent with the ‘ads-github-tools-0.3.0’ release, when initially cloning a repo the ‘ads-github-fetch-all-upstreams’ program continues to use a “triangular” workflow by default.
The manpage entry explains:
--triangular
--no-triangular
    In newly cloned repos, use (or avoid using) a Git remote configuration
    suitable for supporting a triangular workflow.
Among other things, a triangular config will cause the default branch in the local working directory to be configured to track the branch of the same name from the upstream remote.
For a non-triangular configuration, the default branch in the local working directory will be configured to track the branch of the same name from the origin remote.
See git-hub(1) for details.
If neither “--triangular” nor “--no-triangular” is specified, then the “hub.triangular” setting from git-config(1) (scopes “global” and “system” only) is consulted, and honored if present. Finally, if neither command line options nor configuration specifies a preference, then a triangular configuration is the default.
Note: Whether a triangular or non-triangular configuration is used is purely a matter of preference for the user (if any) that will be working with the newly cloned repo; it has no bearing on the functionality of the “ads-github-fetch-all-upstreams” program.
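For reference, the “hub.triangular” preference mentioned above can be recorded at the “global” scope with git-config(1) along these lines (assuming a boolean value; consult git-config(1) and the program’s manpage for the exact semantics):

    $ git config --global hub.triangular false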
The ‘ads-github-show-rate-limits’ program is now careful to check not only the curl(1) exit status of the ‘GET /rate_limit’ request, but also the HTTP response status. Previously it was obtaining the JSON response on the curl(1) stdout, and had no way to determine what the HTTP response code was. The new behavior is to follow our pattern of saving the raw response output to a temporary file, and having curl write the HTTP response code to stdout; this allows us to deal with both aspects independently.
When the ‘GET /rate_limit’ call results in a non-success status, the program now emits an error message and exits with a non-zero status, as one would expect. If a JSON response payload was returned, it will be printed beneath the error message to provide additional context for the failure.
Ex. 1:
ads-github-show-rate-limits (error): HTTP response code was: "401"; expected 200 ("OK"); bailing out
HTTP response payload may contain additional info:
{"message":"Bad credentials","documentation_url":"https://docs.github.com/rest"}
Ex. 2:
ads-github-show-rate-limits (error): HTTP response code was: "404"; expected 200 ("OK"); bailing out
HTTP response payload may contain additional info:
{"message":"Not Found","documentation_url":"https://docs.github.com/rest"}
If no response payload was returned, that fact will be explicitly stated, as well:
Ex. 3:
ads-github-show-rate-limits (error): HTTP response code was: "500"; expected 200 ("OK"); bailing out
(HTTP response payload was empty)
“git config: honor ‘hub.upstreamremote’ and ‘hub.forkremote’, if set” #46
When initially cloning a repo, the default behavior at some point seemed to change from creating a Git remote named “origin” for the user’s GitHub repo to creating a Git remote named “fork”. This was due to a behavior change in the underlying git-hub(1) tool, which switched its behavior when it went to version 1.0 some time ago. Users of ‘ads-github-tools’ could influence the behavior of that tool by setting the ‘hub.forkremote’ and ‘hub.upstreamremote’ git-config(1) values, but in the absence of such a config the user still got default behavior that was different from what it had been.
The ‘ads-github-fetch-all-upstreams’ program has been modified to look for and honor the config mentioned above, but when not present it now asserts a preference for the name “origin” rather than “fork” when cloning a repo. It also asserts a preference for the name “upstream”, but that name is in alignment with the git-hub(1) default already.
Note that the changes introduced here cure the immediate problem of having Git remotes with the name “fork”, but the solution as implemented is not entirely general. In particular, the name “upstream” for the other Git remote is baked-in at a fairly deep level, and will need additional surgery to use an arbitrary name. Thus, if the ‘hub.upstreamremote’ setting is present and contains a value other than “upstream”, then when you run:
$ ads-github-fetch-all-upstreams -vcu
you will actually end up with three remotes rather than two for newly cloned repos. The program will create a remote named “upstream” in addition to whatever other name is specified. While this is clearly suboptimal, there have not been any problems with using the hard-coded name “upstream”; as a practical matter, it is not an emergency to address that. For best results, stick with the name “upstream” for now.
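For reference, the two git-config(1) settings mentioned above can be set globally like so (the values shown are the conventional names discussed in this entry):

    $ git config --global hub.forkremote     origin
    $ git config --global hub.upstreamremote upstream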
This was a minor in-tree cleanup activity. It was noticed that the *.in template files had their executable bit set, which was misleading. Those files are not intended to be directly executed, but rather are filtered at build time to produce the programs that are directly executable. This was probably “origin cruft”; each of the programs was probably first implemented as a shell script which was then parameterized and otherwise modified to create the *.in template.
GitHub has officially deprecated password-based authentication, and it will be disabled entirely on [2020-11-13 Fri]:
https://developer.github.com/changes/2020-02-14-deprecating-password-auth/
The ‘ads-github-tools’ programs were reviewed to determine the impact, if any, that those deprecations might have. There are no application behavioral changes related to the deprecations because the ads-github-tools were not using any of the deprecated authn or authz APIs.
However, anyone who is using their GitHub user password in their ~/.netrc file will need to change their config to use what GitHub calls a “personal access token”, instead. There are no known users that are not already using a personal access token, but if you happen to be one, you will want to check out the official GitHub docs on the subject.
The executive summary is that you use a “personal access token” in basically the same way that you would use a password: you put it in your ~/.netrc file as the value for the ‘password’ field. It is a better approach than using the password itself, though, because the token can be configured to have more constrained permissions than the password; you decide what it can do at the time you create the token. (You can also delete and replace a personal access token with less disruption than you could change the password on your account if that password were used by various tools.)
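A typical ~/.netrc entry of that shape looks like the following (the login and password values shown are placeholders; the token goes in the ‘password’ field):

    machine api.github.com
      login your-github-username
      password YOUR_PERSONAL_ACCESS_TOKEN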
You may wish to create a token that is not able to perform certain actions (e.g., deleting repositories). That will have a minor impact on the ads-github-tools, but they will simply tell you when they are not able to perform some action.
This new tool allows the user to create new GitHub repositories while working at the command line. The tool’s ‘–help’ output and sample usage below give a feel for what it can do:
$ ./src/main/bash/bin/ads-github-repo-create --help
usage: ads-github-repo-create { -h | --help }
   or: ads-github-repo-create { -V | --version }
   or: ads-github-repo-create [OPTION...] [--] REPO_NAME

Create a new GitHub repository.

Mandatory arguments to long options are mandatory for short options too.

  -h, --help                 Print this help message on stdout
  -V, --version              Print the version of the program on stdout
  -a, --auto-init            Auto-initialize the repo with a minimal README.md file
      --prop-secs=NUMBER     Seconds to wait for GitHub data propagation before verifying
                               Default is 4 seconds. Only relevant when -a (--auto-init)
                               is in-effect.
  -b, --default-branch=NAME  Specify the name to use for the default branch.
                               Implies -a (--auto-init).
  -d, --description=BLURB    Specify a short description blurb for the repo
      --force-replace        If repo already exists, attempt to replace it (delete,
                               then re-create; requires 'delete_repo' scope)
      --homepage=URL         Specify a URL with more information about the repo
  -I, --disable-issues       Disable GitHub issues for the new repo
  -P, --disable-projects     Disable GitHub projects for the new repo
  -O, --output-format=WORD   Emit output in the format specified by WORD [default: text]
                               Valid values for WORD include: 'text' and 'json'
  -p, --private              Make the repository a GitHub private repo (default is public)
  -v, --verbose              Print program progress messages on stderr. Specify multiple
                               times to increase verbosity: info, debug, and tracing (set -x)
  -W, --disable-wiki         Disable GitHub wiki feature for the new repo
  --                         Signals the end of options and disables further options
                               processing. Any remaining argument(s) will be interpretted
                               as a repo name

Report bugs to Alan D. Salewski <[email protected]>.
$ ads-github-repo-create 'aljunk-testing-repo-006'
Summary of newly created repository:
    Name:              aljunk-testing-repo-006
    Is fork?:          no
    Owner:             salewski
    Is archived?:      no
    Is disabled?:      no

    Created at:        2020-10-09 07:09:52+0000
    Updated at:        2020-10-09 07:09:56+0000

    Is private?:       no
    Visibility:        -

    Auto-initialized?: no
    Default branch:    main

    Has issues?:       yes
    Has projects?:     yes
    Has wiki?:         yes

    GitHub:            https://github.com/salewski/aljunk-testing-repo-006
    Homepage:          -
    Description:       -

    Clone via ssh:     [email protected]:salewski/aljunk-testing-repo-006.git
    Clone via https:   https://github.com/salewski/aljunk-testing-repo-006.git
This issue documents a change in direction we took while designing the ‘ads-github-repo-create’ tool. Originally the tool always auto-initialized the new repo, but that behavior interfered with use cases in which the GitHub repo was being created as the target for some existing Git repository (e.g., when mirroring a Git repo from elsewhere into GitHub). Consequently, the auto-initialize behavior was made optional, with the tradeoff being that the user needs to be cognizant of whether auto-initialization will be helpful to or an annoyance for the task at hand.
The old contact email address ‘[email protected]’ should still work, but the preferred address to use going forward is ‘[email protected]’. Note that the author has experienced a fair amount of mail lossage this year (2020) due to overly-aggressive mail filtering by the AT&T mail system; lots of legit mail does not get through, and there is no indication of the blockage. For best results, use the new address.
The new ‘ads-github-whoami’ tool shows information about the authenticated GitHub user.
By default, it simply emits the username (a.k.a. “owner” or “login”):
$ ads-github-whoami
salewski
The tool has a ‘-l’ (--long) option that can be specified more than once to request increasingly verbose amounts of information. An option is also provided to request that the raw JSON output be emitted.
Currently five “long” levels are supported: none, “1x”, “2x”, “3x”, and “4x”. Additional ‘-l’ opts are accepted, but (currently) anything more than four will not affect the output produced.
The behavior of this option allows us to optimize for the common case, yet still keep noisy-looking, less-often-valuable information within easy reach.
Note that the availability of the data displayed depends on the “scopes” of access associated with the GitHub “personal access token” used to authenticate. There are publicly visible fields (the type of thing you see on a given user’s GitHub page), and fields that are private. Reading the data values for the private fields requires the ‘user:read’ access scope. In the examples that follow, that access has NOT been provided. Users need to decide for themselves which access scopes are appropriate for their individual use cases. As you can see in the examples that follow, the ‘ads-github-whoami’ tool will simply display placeholder values for those fields that are not accessible.
Emits just the GitHub username (a.k.a. “owner” or “login”).
$ ads-github-whoami
salewski
In addition to the GitHub username, also show basic contact information for user, if available.
$ ads-github-whoami -l
salewski  999999  "Alan D. Salewski" <[email protected]>
In addition to basic contact information for the user, also show (if available) the user’s basic GitHub stats.
$ ads-github-whoami -ll
salewski  999999  "Alan D. Salewski" <[email protected]>
mfa enabled:    [DATA NOT AVAILABLE]
─────────────────────────────────────────────────────
public repos:   1234      private repos:  - (owned: -)
public gists:   20        private gists:  -
followers:      23        following:      239
collaborators:  -
plan space:     -         disk usage:     -
In addition to the user’s basic contact information and basic GitHub stats, also show (if available) location, website, organization affiliation, and bio.
$ ads-github-whoami -lll
salewski  999999  "Alan D. Salewski" <[email protected]>
mfa enabled:    [DATA NOT AVAILABLE]
─────────────────────────────────────────────────────
public repos:   1234      private repos:  - (owned: -)
public gists:   20        private gists:  -
followers:      23        following:      239
collaborators:  -
plan space:     -         disk usage:     -
─────────────────────────────────────────────────────
company:        -
hireable:       -
location:       -
github:         https://github.com/salewski
website:        https://salewski.github.io/
bio:            -
In addition to the user’s basic contact information, basic GitHub stats, and bio info, also show (if available) GitHub account metadata that is less frequently interesting.
$ ads-github-whoami -llll
salewski  997214  "Alan D. Salewski" <[email protected]>
mfa enabled:    [DATA NOT AVAILABLE]
─────────────────────────────────────────────────────
public repos:   1234      private repos:  - (owned: -)
public gists:   20        private gists:  -
followers:      23        following:      239
collaborators:  -
plan space:     -         disk usage:     -
─────────────────────────────────────────────────────
company:        -
hireable:       -
location:       -
github:         https://github.com/salewski
website:        https://salewski.github.io/
bio:            -
─────────────────────────────────────────────────────
created at:     2011-08-22 20:17:54+0000
updated at:     2011-08-22 20:17:54+0000
plan name:      -
node id:        MBCDabcdWXYZwxyz==
site admin:     no
This change only impacts developers working on the ‘ads-github-tools’.
If a repository is explicitly named on the ‘ads-github-merge-all-upstreams’ command line and is not actually processed, the program now emits an error message and exits with a non-zero status.
This helps avoid the false-warm-and-fuzzy from silent non-action when the user fat-fingers one or more repo names on the command line.
This issue was due to the way the underlying git-hub(1) tool was being invoked when cloning non-fork GitHub repos.
When cloning a repo, the default behavior of git-hub(1) is to create a triangular workflow configuration, which is usually what you want when there is an upstream repo from which your origin repo was forked. However, if the repo is not a GitHub fork of some other GitHub repo, then we need to tell git-hub(1) to not try to create a triangular workflow config; attempting to do so will result in the error:
Warning: Repository username/some-repo is not a fork, just cloning, upstream will not be set
usage: git-hub [-h] [--version] [-v] [-s] {clone,issue,pull,setup} ...
git-hub: error: Can't use triangular workflow without an upstream repo
ads-github-fetch-all-upstreams (error): was error while attempting to clone repo "some-repo" from GitHub; bailing out
The ‘ads-github-fetch-all-upstreams’ program is now careful to provide the ‘--no-triangular’ option to git-hub’s ‘clone’ sub-command when cloning a non-fork repo.
The ‘ads-github-show-rate-limits’ program has a new ‘-u’ (--utc) option that allows the user to request that time fields be output in human-readable (RFC 3339) formatted UTC (as opposed to the default of seconds since the Unix epoch, or, with the ‘-h’ (--human-readable) option, human-readable local time).
See ads-github-show-rate-limits(1) for details.
The ‘ads-github-show-rate-limits’ program has a new ‘-O’ (--output-format=WORD) option that allows the user to request different output formats. Currently the only two output formats accepted are ‘text’ (the default) and ‘json’.
The default output of the program has not changed. It is still whitespace-separated columns of text. Specifying the new ‘--output-format=text’ option is just an explicit way of requesting the default behavior.
See ads-github-show-rate-limits(1) for full details.
Some examples:
- Example 1a: Default output
$ ads-github-show-rate-limits
core     5000  5000  1489368062
graphql   200   200  1489368062
search     30    30  1489364522
- Example 1b: Same thing, only explicitly using the new ‘--output-format=text’ option:
$ ads-github-show-rate-limits --output-format text
core     5000  5000  1489368062
graphql   200   200  1489368062
search     30    30  1489364522
- Example 2a: Requesting JSON output:
$ ads-github-show-rate-limits --output-format json
{"resources":{"core":{"limit":5000,"remaining":5000,"reset":1489368065},"search":{"limit":30,"remaining":30,"reset":1489364525},"graphql":{"limit":200,"remaining":200,"reset":1489368065}},"rate":{"limit":5000,"remaining":5000,"reset":1489368065}}
- Example 2b: Requesting JSON output, and then using jq(1) to pretty-print it:
$ ads-github-show-rate-limits --output-format json | jq '.'
{
  "resources": {
    "core": {
      "limit": 5000,
      "remaining": 5000,
      "reset": 1489368082
    },
    "search": {
      "limit": 30,
      "remaining": 30,
      "reset": 1489364542
    },
    "graphql": {
      "limit": 200,
      "remaining": 200,
      "reset": 1489368082
    }
  },
  "rate": {
    "limit": 5000,
    "remaining": 5000,
    "reset": 1489368082
  }
}
issue 25: ads-github-show-rate-limits(1) indicates data is parsed from HTTP headers, but it is really parsed from the JSON in the HTTP body response of the …/rate_limit endpoint
The ads-github-show-rate-limits(1) man page previously mis-documented the underlying mechanism used by the tool to obtain the user’s GitHub API rate limits information. It was documented to obtain that information from the HTTP response headers rather than the JSON response body from the ‘/rate_limit’ service endpoint.
While per-endpoint calls do allow for the possibility of extracting GitHub service limit information out of the HTTP response headers (e.g., see the “Rate Limiting” section of the GitHub API documentation), the ‘ads-github-show-rate-limits’ program is actually making a call to the GitHub /rate_limit API endpoint and extracting the data that it displays out of the JSON returned in the HTTP response body.
The ‘ads-github-fetch-all-upstreams’ program now unconditionally unsets the GREP_OPTIONS environment variable.
This only has an effect if the grep program in use is GNU grep; other grep implementations did not recognize or alter their behavior based on the ‘GREP_OPTIONS’ variable. GNU grep prior to version 2.11 (~2014-11) would append values from this variable to the command line, which would make behavior of our invocations in the current program unpredictable. Versions of GNU grep 2.11 or newer no longer behave that way, but do emit a warning on stderr about the change in behavior.
Unsetting GREP_OPTIONS here has the effect of making our grep invocations predictable (when older versions of GNU grep are in use) and also of suppressing the spurious warning:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
when newer versions of GNU grep are in use.
Application of the same improvement as described above for ads-github-fetch-all-upstreams, only here for the ads-github-merge-all-upstreams tool.
This was a cosmetic issue with the documentation.
The project is now developed using GNU Automake 1.16 (rather than version 1.14).
The currently used version of GNU Autoconf (2.69) has not changed, but a crufty reference to an older version in our ‘bootstrap’ script has been cleaned up.
Fixed typo in log message.
Prior to this fix the program choked (when the repo directory was initially in a “detached HEAD” state) upon attempting to restore the working directory to the state it was in prior to the program switching to the default branch. This happened because the code only knew how to deal with a working directory “attached” to a named branch. Since the repo was originally in the detached HEAD state, the program did not have a named branch to which it could switch back.
Rather than attempt to switch back to an explicitly named branch, the code now basically does this:
git checkout @{-1}
which you can read as: “where HEAD used to be one move ago (before we checked-out the default branch)”.
The impact of the original problem was that the ‘ads-github-merge-all-upstreams’ program stopped dead in its tracks and left the repo directory with the default branch checked out.
The workaround was to simply re-run the program, leaving that repo with the default branch checked out. Because the program then found the repo with the default branch already checked out, it did not attempt to switch the branch, so it also did not need to attempt to restore it to its original state. Once the program completed, the user could then manually check out the tag (or whatever) to put the repo back into the original detached HEAD state (if so desired).
With this fix, such workarounds are no longer needed.
Replaced three invocations of the git porcelain command ‘git branch’ with (as appropriate) invocations of git plumbing commands:
git symbolic-ref --short --quiet HEAD
or:
git for-each-ref --format='%(refname:short)' 'refs/heads/'
or:
git for-each-ref --format='%(refname:short)' 'refs/remotes/upstream/'
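A simplified sketch of how these commands can fit together (not the program’s literal code; the ‘default_branch’ variable is assumed to have been determined elsewhere):

    # Remember which branch is checked out; an empty value means "detached HEAD".
    orig_branch=$(git symbolic-ref --short --quiet HEAD) || orig_branch=''

    git checkout "${default_branch}"
    git merge --ff-only "upstream/${default_branch}"

    # Return HEAD to wherever it was before, whether that was a named branch or not.
    git checkout '@{-1}'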
A ‘-m’ (--missing-only) command line option has been added to the ‘ads-github-fetch-all-upstreams’ program. This new option provides a shorthand way of invoking:
$ ads-github-fetch-all-upstreams -c <MISSING_REPO>...
while at the same time limiting the program’s behavior to only missing repos. And it does so without the user having to explicitly name (or even know) each missing repository name.
In this context, a “missing repository” has the same meaning as it does for the ‘-c’ (--clone-if-missing) option. That is: any repository from GitHub where the local git working directory is not present.
The following is the equivalent invocation of the above:
$ ads-github-fetch-all-upstreams -m
The semantics of:
$ ads-github-fetch-all-upstreams -m <REPO>...
are to limit the operation of the program to only those explicitly named repos, and to clone them only if they are missing (that is, the program will not fetch changes from the corresponding upstream repo when the local repo is already present; see the “Design Notes” section below for more on the rationale for this behavior). If any of the explicitly named repos are already present, then the program will print an error message and exit with a non-zero status at the end (after cloning all of the other explicitly named repos, if any). This is intended to make the user aware of any repo names fat-fingered on the command line.
Note that the ‘-m’ (--missing-only) option implies ‘-c’ (--clone-if-missing), so the ‘-c’ in the following is redundant (but innocuous):
$ ads-github-fetch-all-upstreams -m -c
Said another way, the ‘--missing-only’ option limits the behavior of the program to operate only on repos that are missing, so the behavior of the program when ‘--missing-only’ is specified would be meaningless unless ‘--clone-if-missing’ were also implied.
The author considered making the semantics of the ‘--missing-only’ option, when provided with explicitly named repo names on the command line, be “fetch (or clone) these particular repos plus any repos that are missing”.
That approach would be consistent with the behavior of:
$ ads-github-fetch-all-upstreams -c <NONMISSING_REPO>...
insofar as the repos would be cloned if they were missing, but fetched otherwise. That idea was ultimately rejected because it would fetch updates for a repo if its working directory happened to be present. That’s not “missing only” behavior, so it would make the behavior of ‘--missing-only’ (or whatever the option would be named) more difficult to explain and comprehend. It’s also not the common case itch being scratched.
This decision does mean, though, that there is currently no way to express “fetch (or clone) these particular repos plus any repos that are missing” when one or more of the explicitly named repos is already present, yet still have the program exit with a zero exit status. This use case can, of course, be easily accommodated by using two separate invocations of the program:
$ # first clone (as fast as possible) any missing repos...
$ ads-github-fetch-all-upstreams -m
:
$ # ...and then fetch changes for explicitly named repos (note no need for '-c' here)
$ ads-github-fetch-all-upstreams <REPO>...
The second invocation in that example will be an elaborate NOOP when all of the explicitly named repos were freshly cloned by the first invocation.
If this example represents your common use case and you find this behavior limiting or otherwise annoying, please contact the author as outlined in the BUGS file.
In the version of ‘ads-github-fetch-all-upstreams’ shipped with ads-github-tools-0.1.1, when a user fat-fingered more than one repo name on the command line, the program printed an error message that named only the first unprocessed repo name that was detected.
The current version of the program improves on that behavior to name all unprocessed repo names. This gives a better indication to the user about the size of the error, and also gives the user the opportunity to correct all such errors at once (rather than playing whack-a-mole fixing one at a time and having successive runs of the program fail on the remaining repos that were not named in the error message).
issue 17: ads-github-merge-all-upstreams: stderr message on failed ff merge indicates “bailing out”, which is not always true
In the version of ‘ads-github-merge-all-upstreams’ distributed as part of ads-github-tools-0.1.0, the tool would emit an incorrect “bailing out” message when the tool’s ‘-k’ (--keep-going) option was specified and a fast-forward merge was not possible for some reason:
$ ads-github-merge-all-upstreams -vkpp
...
ads-github-merge-all-upstreams (info): [repo: "foobar"] currently checked-out branch ("master") is the default branch (no need for checkout)
fatal: Not possible to fast-forward, aborting.
ads-github-merge-all-upstreams (error): [repo: "foobar"] was error while attempting to merge 'upstream/master'; bailing out
ads-github-merge-all-upstreams (warning): '-k' (--keep-going) specified; continuing
ads-github-merge-all-upstreams (info): [repo: "bazqux"] currently checked-out branch ("master") is the default branch (no need for checkout)
Already up-to-date.
...
That message is correct for the default behavior of the program (when the ‘-k’ option is not specified), but when the ‘-k’ option is specified, the message should not have indicated that the program was “bailing out”; it was not correct, and gave the impression that the error detection was not working correctly.
The program has been fixed to emit “context aware” messages in these scenarios. With the default invocation (no command line options specified), the behavior is the same as it was previously:
$ ads-github-merge-all-upstreams
...
fatal: Not possible to fast-forward, aborting.
ads-github-merge-all-upstreams (error): [repo: "foobar"] was error while attempting to merge 'upstream/master'; bailing out
:
$ echo $?
1
When the ‘-k’ (--keep-going) option is specified, the error message related to the failed fast-forward merge no longer indicates that the program is “bailing out”:
$ ads-github-merge-all-upstreams -k
fatal: Not possible to fast-forward, aborting.
ads-github-merge-all-upstreams (error): [repo: "serverless"] was error while attempting to merge 'upstream/master'
ads-github-merge-all-upstreams (warning): '-k' (--keep-going) specified; continuing
...
ads-github-merge-all-upstreams (error): one or more errors encountered with '-k' (--keep-going) specified (see error output above for details); bailing out
:
$ echo $?
1
Note that the error message from the ‘git merge --ff-only …’ command invoked internally still indicates that it is “aborting”. We hope that it is clear from context that that is not coming directly from the ‘ads-github-merge-all-upstreams’ tool and that users are not confused by it.
issue 18: ads-github-merge-all-upstreams: suppress “no upstream changes to merge” messages unless verbose output requested
The default output of the ‘ads-github-merge-all-upstreams’ program (that is, when the program is invoked with no command line options) was previously too chatty; it was emitting a bunch of “no upstream changes to merge” messages:
...
ads-github-merge-all-upstreams (info): [repo: "foo"] no upstream changes to merge, and "aggressive push" (a al '-p -p') not specified; skipping (okay)
ads-github-merge-all-upstreams (info): [repo: "bar"] no upstream changes to merge, and "aggressive push" (a al '-p -p') not specified; skipping (okay)
ads-github-merge-all-upstreams (info): [repo: "baz"] no upstream changes to merge, and "aggressive push" (a al '-p -p') not specified; skipping (okay)
...
The program now only emits those messages when the user requests verbose output via the ‘-v’ (--verbose) option.
issue 16: ads-github-merge-all-upstreams: use ‘git push origin heads/<BRANCH>’ to avoid ambiguity with tags of the same name as <BRANCH>
If a repository had a tag name that matched the default branch name (typically “master”), then the previous version of the program would not be able to push the merged changes to the origin repository because the git refspec used internally was ambiguous. The program has been enhanced to qualify the default branch name as “heads/<BRANCH>” rather than just “<BRANCH>” to remove this ambiguity.
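For example, with both a branch and a tag named “master” present (a contrived illustration):

    $ git push origin master          # ambiguous: matches both refs/heads/master and refs/tags/master
    $ git push origin heads/master    # unambiguous: push the branch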
- ads-github-normalize-url
Produces a “normalized” view of a given URL, suitable for use in generating an ID. It is currently a quick ’n’ dirty implementation optimized for this sole purpose, so there is no guarantee that the normalized variation of the URL will actually work.
- ads-github-hash-url
Similar in spirit to ‘git-hash-object(1)’, this tool takes a (presumably normalized) URL and emits a checksum for it. Currently uses the SHA-3 256-bit algorithm variant.
- ads-github-show-rate-limits
Shows the user’s GitHub API rate limits (“core” and “search”).
- ads-github-fetch-all-upstreams
Operates on the working directories of a collection of GitHub-hosted git repositories. The user can specify one or more repositories explicitly to restrict operations to just those repos. Each that is found with an ‘upstream’ remote defined will have ‘git fetch upstream’ invoked in it.
- With the ‘--clone-if-missing’ option, any of the user’s GitHub repos for which there is not a git working directory beneath the current location will be cloned (using the ‘git-hub’ tool’s ‘clone’ operation, which sets up the ‘upstream’ remote if the repo is a fork).
- There’s also an ‘--upstream-remote-if-missing’ option that will add the ‘upstream’ remote on existing project working directories that do not have it (only if the project is a fork of another project, of course).
- ads-github-merge-all-upstreams
Operates on the working directories of a collection of GitHub-hosted git repositories. Each that is found with both ‘origin’ and ‘upstream’ remotes defined will have:
git merge --ff-only upstream/<DEFAULT_BRANCH_NAME>
invoked in it. The user can specify one or more repositories explicitly to restrict operations to just those repos. The program is careful to sanity check the local repository before attempting any operations on it. Also, it will skip any repository for which the git index has any changes recorded. Will (temporarily) check out the default branch before merging (if the working directory happens to have some other branch checked out); will restore the originally checked out branch when done if the temporary switch was necessary.
- With the ‘--push’ option, will invoke:
git push origin <DEFAULT_BRANCH_NAME>
for each repo.