Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve fetch from multiple remotes #4923

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

mateon1
Copy link

@mateon1 mateon1 commented Nov 20, 2024

See #4920 for rationale.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes
    • I have updated existing tests affected by this change

@mateon1 mateon1 force-pushed the multifetch branch 2 times, most recently from e81be24 to bbb44a7 Compare November 20, 2024 00:11
@mateon1
Copy link
Author

mateon1 commented Nov 20, 2024

Here's how fetching looks like in the situation given in #4920 with this PR:
(--all-remotes, or passing --remote b --remote c in either order give identical results)

[nix-shell:/tmp/a]$ jj git fetch --all-remotes
bookmark: master@c [updated] untracked
bookmark: pr@b     [deleted] untracked

[nix-shell:/tmp/a]$ jj log -r 'all()'
◆    xqknuwot [email protected] 2024-11-19 17:21:30 master@c bdd92bb1
├─╮  (empty) Merge remote-tracking branch 'b/pr'
│ ◆  mszwrokv [email protected] 2024-11-19 17:08:09 1780e38a
│ │  work
◆ │  qrmnvrwp [email protected] 2024-11-19 17:15:43 8ac29516
├─╯  other
◆  yrpzrykw [email protected] 2024-11-19 17:06:57 master@b 103e2828
│  (empty) Initial commit
│ @  pznkzwlt [email protected] 2024-11-19 17:05:48 47efbbb4
├─╯  (empty) (no description set)
◆  zzzzzzzz root() 00000000

[nix-shell:/tmp/a]$ jj op log

[nix-shell:/tmp/a]$ jj op show 37dfff3a32f5
37dfff3a32f5 mateon@ceres 1 minute ago, lasted 7 milliseconds
fetch from git remote(s) b,c
args: jj git fetch --all-remotes

Changed commits:
○  Change xqknuwottovp
   + xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'

Changed remote bookmarks:
master@c:
+ untracked xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'
- untracked qrmnvrwp 8ac29516 other
pr@b:
+ untracked (absent)
- untracked mszwrokv 1780e38a work

For comparison, here are the situations with separate fetch operations (behavior not affected by this PR)
(cc @martinvonz, there was some confusion in the related issue around this, this should show that this implementation has behavior more consistent with performing the fetches separately vs the current code)

b, then c
[nix-shell:/tmp/a]$ jj git fetch --remote b
bookmark: pr@b [deleted] untracked
Abandoned 1 commits that are no longer reachable.

[nix-shell:/tmp/a]$ jj git fetch --remote c
bookmark: master@c [updated] untracked

[nix-shell:/tmp/a]$ jj op show 3734baa27878
3734baa27878 mateon@ceres 18 seconds ago, lasted 8 milliseconds
fetch from git remote(s) b
args: jj git fetch --remote b

Changed commits:
○  Change mszwrokvtxot
   - mszwrokv hidden 1780e38a work

Changed remote bookmarks:
pr@b:
+ untracked (absent)
- untracked mszwrokv hidden 1780e38a work

[nix-shell:/tmp/a]$ jj op show eec7
eec7028a1200 mateon@ceres 18 seconds ago, lasted 6 milliseconds
fetch from git remote(s) c
args: jj git fetch --remote c

Changed commits:
○  Change xqknuwottovp
│  + xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'
○  Change mszwrokvtxot
   + mszwrokv 1780e38a work

Changed remote bookmarks:
master@c:
+ untracked xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'
- untracked qrmnvrwp 8ac29516 other
c, then b
[nix-shell:/tmp/a]$ jj git fetch --remote c
bookmark: master@c [updated] untracked

[nix-shell:/tmp/a]$ jj git fetch --remote b
bookmark: pr@b [deleted] untracked

[nix-shell:/tmp/a]$ jj op show f03c2cf59b81
f03c2cf59b81 mateon@ceres 1 minute ago, lasted 5 milliseconds
fetch from git remote(s) c
args: jj git fetch --remote c

Changed commits:
○  Change xqknuwottovp
   + xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'

Changed remote bookmarks:
master@c:
+ untracked xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'
- untracked qrmnvrwp 8ac29516 other

[nix-shell:/tmp/a]$ jj op show c802
c8026bc76d32 mateon@ceres 1 minute ago, lasted 4 milliseconds
fetch from git remote(s) b
args: jj git fetch --remote b

Changed remote bookmarks:
pr@b:
+ untracked (absent)
- untracked mszwrokv 1780e38a work

@yuja
Copy link
Contributor

yuja commented Nov 20, 2024

Perhaps, the easiest workaround is to do rebase_descendants() per remote? Does this patch solve the problem?

diff --git a/cli/src/git_util.rs b/cli/src/git_util.rs
index 74ebe2d2fa..8949218232 100644
--- a/cli/src/git_util.rs
+++ b/cli/src/git_util.rs
@@ -496,6 +496,11 @@
             _ => user_error(err),
         })?;
         print_git_import_stats(ui, tx.repo(), &stats.import_stats, true)?;
+        let settings = tx.settings().clone(); // XXX silly clone
+        let num_rebased = tx.repo_mut().rebase_descendants(&settings)?;
+        if num_rebased > 0 {
+            writeln!(ui.status(), "Rebased {num_rebased} descendant commits")?;
+        }
     }
     warn_if_branches_not_found(
         ui,

It might be cleaner to do import at the end, but it's weird that git:fetch() accepts multiple remotes (and returns one default_branch.)

@mateon1
Copy link
Author

mateon1 commented Nov 20, 2024

Yuja's patch does work, with this result:

[mateon@ceres:/tmp/a]$ jj git fetch --all-remotes 
bookmark: pr@b [deleted] untracked
Abandoned 1 commits that are no longer reachable.
bookmark: master@c [updated] untracked

[mateon@ceres:/tmp/a]$ jj op show 63915faff2bc
63915faff2bc mateon@ceres 31 seconds ago, lasted 11 milliseconds
fetch from git remote(s) b,c
args: jj git fetch --all-remotes

Changed commits:
○  Change xqknuwottovp
   + xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'

Changed remote bookmarks:
master@c:
+ untracked xqknuwot bdd92bb1 master@c | (empty) Merge remote-tracking branch 'b/pr'
- untracked qrmnvrwp 8ac29516 other
pr@b:
+ untracked (absent)
- untracked mszwrokv 1780e38a work

So it's a direct fix to the current bug while keeping existing behavior.
Personally I find it rather unintuitive that branch changes on one remote can mess with your local work even if the commits still exist in the shared history.
Although frankly I find it surprising that any kind of fetch can rebase your local commits - my expectation as a new jujutsu user was having to rebase your work yourself when a fetch makes your commits stale. I also expected fetch from multiple remotes to be significantly more atomic than it is now.

Re: default_branch, that's also a bit weird. It's only used from jj git clone, which only supports a single remote. Even if it did support multiple, how could you possibly know which remote is 'most authoritative' for choosing the default branch? What if you clone two forks of the same repo? What if you clone two completely independent repos?
I'm not sure what the most correct thing to do here would be. Returning a Vec of default_branches corresponding to the passed remotes? Making a 'clone' implementation that supports only one remote, handles default_branch, and ripping out that code from fetch? Or just choose the first remote where HEAD refers to a branch?

@yuja
Copy link
Contributor

yuja commented Nov 20, 2024

I find it surprising that any kind of fetch can rebase your local commits - my expectation as a new jujutsu user was having to rebase your work yourself when a fetch makes your commits stale.

You can turn it off by git.abandon-unreachable-commits = false, but you'll need to clean up old branches manually (if the remote does rebase/squash merges.)

Re: default_branch, that's also a bit weird. It's only used from jj git clone, which only supports a single remote. Even if it did support multiple, how could you possibly know which remote is 'most authoritative' for choosing the default branch?

Perhaps, the API can be refactored to something like this:

let mut peer = GitFetch::new(..);
let default_branch = peer.fetch(remote_name, branch_names, ..);
...
let stats = peer.import_refs();

(and git::fetch() will be a thin wrapper.)

@essiene might be interested.

@essiene
Copy link
Contributor

essiene commented Nov 23, 2024

I find it surprising that any kind of fetch can rebase your local commits - my expectation as a new jujutsu user was having to rebase your work yourself when a fetch makes your commits stale.

You can turn it off by git.abandon-unreachable-commits = false, but you'll need to clean up old branches manually (if the remote does rebase/squash merges.)

Re: default_branch, that's also a bit weird. It's only used from jj git clone, which only supports a single remote. Even if it did support multiple, how could you possibly know which remote is 'most authoritative' for choosing the default branch?

Perhaps, the API can be refactored to something like this:

let mut peer = GitFetch::new(..);
let default_branch = peer.fetch(remote_name, branch_names, ..);
...
let stats = peer.import_refs();

This seems like a good approach.

We could call peer.fetch() in a loop for all remotes and at the end, call peer.import_refs just once.

(and git::fetch() will be a thin wrapper.)

We could also have a thin wrapper for git::clone which returns the default branch as git::fetch doesn't need to. And I'll be able to add git::sync that does the extra stuff I need.

If there are no objections, I go ahead and send out a few PRs that do this.

@essiene might be interested.

Indeed. Thanks for looping me in.

@yuja
Copy link
Contributor

yuja commented Nov 23, 2024

We could also have a thin wrapper for git::clone which returns the default branch as git::fetch doesn't need to. And I'll be able to add git::sync that does the extra stuff I need.

Sounds good, thanks.

BTW, I used peer in the example, but it's probably a bad word. I expect peer to be associated with a single remote.

@essiene
Copy link
Contributor

essiene commented Nov 23, 2024

We could also have a thin wrapper for git::clone which returns the default branch as git::fetch doesn't need to. And I'll be able to add git::sync that does the extra stuff I need.

Sounds good, thanks.

BTW, I used peer in the example, but it's probably a bad word. I expect peer to be associated with a single remote.

Yeah, makes sense.

essiene added a commit that referenced this pull request Nov 24, 2024
* This allows more control over the stages of the fetch.
* Provide thin and directed wrappers for clone and fetch
  which allow slightly different overall actions.
* Allow `git::fetch` to handle multiple remotes.
  * One important side-effect to note (as demonstrated in the changed
    cli test) is that when there are multiple remotes, all fetches
    must pass before `import_refs` starts; no branch imports if a 
    fetch from any remote fails.
* Update call sites to use new `git::clone` and `git::fetch`

Fixes: #4923
essiene added a commit that referenced this pull request Nov 24, 2024
* This allows more control over the stages of the fetch.
* Provide thin and directed wrappers for clone and fetch
  which allow slightly different overall actions.
* Allow `git::fetch` to handle multiple remotes.
  * One important side-effect to note (as demonstrated in the changed
    cli test) is that when there are multiple remotes, all fetches
    must pass before `import_refs` starts; no branch imports if a 
    fetch from any remote fails.
* Update call sites to use new `git::clone` and `git::fetch`

Fixes: #4923
essiene added a commit that referenced this pull request Nov 24, 2024
* Reimplement `git::fetch` in terms of the new lower-level
  `GitFetch` api, which allows more control over the stages
  of the fetch.
* Make `git::fetch` to accept multiple remotes, fetch from them
  all, before calling `import_refs`.
* Set `default_branch` to None in the `GitFetchStats` return value
  from `git::fetch`. 
* Update call sites to use new `git::clone` and `git::fetch`
* Update tests: One important side-effect to note (as demonstrated
  in the changed cli test) is that when there are multiple remotes,
  all fetches must pass before `import_refs` starts; no branch
  imports if a fetch from any remote fails. This is WAI.

Fixes: #4923
essiene added a commit that referenced this pull request Nov 24, 2024
* This allows more control over the stages of the fetch.
* Implement `git::clone` in terms of the new api.
* Update call sites to use new `git::clone`.
* Update tests to explicitly use `git::clone` where relevant.

Issue: #4923
essiene added a commit that referenced this pull request Nov 24, 2024
* Reimplement `git::fetch` in terms of the new lower-level
  `GitFetch` api, which allows more control over the stages
  of the fetch.
* Make `git::fetch` to accept multiple remotes, fetch from them
  all, before calling `import_refs`.
* Set `default_branch` to None in the `GitFetchStats` return value
  from `git::fetch`. 
* Update call sites to use new `git::clone` and `git::fetch`
* Update tests: One important side-effect to note (as demonstrated
  in the changed cli test) is that when there are multiple remotes,
  all fetches must pass before `import_refs` starts; no branch
  imports if a fetch from any remote fails. This is WAI.

Fixes: #4923
essiene added a commit that referenced this pull request Nov 24, 2024
* Reimplement `git::fetch` in terms of the new lower-level
  `GitFetch` api, which allows more control over the stages
  of the fetch.
* Make `git::fetch` to accept multiple remotes, fetch from them
  all, before calling `import_refs`.
* Set `default_branch` to None in the `GitFetchStats` return value
  from `git::fetch`. 
* Update call sites to use new `git::clone` and `git::fetch`
* Update tests: One important side-effect to note (as demonstrated
  in the changed cli test) is that when there are multiple remotes,
  all fetches must pass before `import_refs` starts; no branch
  imports if a fetch from any remote fails. This is WAI.

Fixes: #4923
@essiene
Copy link
Contributor

essiene commented Nov 24, 2024

We could also have a thin wrapper for git::clone which returns the default branch as git::fetch doesn't need to. And I'll be able to add git::sync that does the extra stuff I need.

Sounds good, thanks.
BTW, I used peer in the example, but it's probably a bad word. I expect peer to be associated with a single remote.

Yeah, makes sense.

#4959 & #4960

essiene added a commit that referenced this pull request Nov 30, 2024
* This allows more control over the stages of the fetch.
* Implement `git::clone` in terms of the new api.
* Update call sites to use new `git::clone`.
* Update tests to explicitly use `git::clone` where relevant.

Issue: #4923
essiene added a commit that referenced this pull request Nov 30, 2024
* This allows more control over the stages of the fetch.
* Implement `git::fetch_and_get_default_branch` in terms of the new api.
* Update the cli/src/commands/Git/clone.rs to use new `git::fetch_and_get_default_branch`.
* Update tests to explicitly use `git::fetch_and_get_default_branch` where relevant.
* Update tests to reflect that `git::fetch` doesn't return the default branch.

Issue: #4923
essiene added a commit that referenced this pull request Nov 30, 2024
* This allows more control over the stages of the fetch.
* Implement `git::fetch_and_get_default_branch` in terms of the new api.
* Update the cli/src/commands/Git/clone.rs to use new `git::fetch_and_get_default_branch`.
* Update tests to explicitly use `git::fetch_and_get_default_branch` where relevant.
* Update tests to reflect that `git::fetch` doesn't return the default branch.

Issue: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
* This allows more control over the stages of the fetch.
* Implement `git::fetch_and_get_default_branch` in terms of the new api.
* Update the cli/src/commands/Git/clone.rs to use new `git::fetch_and_get_default_branch`.
* Update tests to explicitly use `git::fetch_and_get_default_branch` where relevant.
* Update tests to reflect that `git::fetch` doesn't return the default branch.

Issue: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
* This allows more control over the stages of the fetch.
* Implement `git::fetch` in terms of the new api.
* Update tests to explicitly use `git::fetch_and_get_default_branch` where relevant.

Issue: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
* This allows more control over the stages of the fetch.
* Implement `git::fetch` in terms of the new api.
* Update tests to explicitly use `git::fetch_and_get_default_branch` where relevant.

Issue: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
* GitFetch{} separates `fetch()` from `import_refs()`.
* This allows more control over the stages of the fetch so multiple fetches
  can happen before `import_refs` resolves the changes in the local jj repo.
* Implement `git::fetch` in terms of the new api.
* Add a test case for initial fetch from a repo without `HEAD` set. This tests
  the default branch retrieving behaviour.

Issue: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
…rectly.

* Make the new `GitFetch` api public.
* Rewrite all tests that used `git::fetch*` to use the new API.
  One interesting artifact of the new API is that it holds onto a reference
  of the `tx.repo_mut()`, so the GitFetch object needs to fall go of scope
  completely before we can make any other mutable calls to `tx`.
* Delete the `git::fetch`
* Update `jj git clone` and `git_fetch` in `cli/src/git_utils.rs` to use
  the new api directly. Removing one redundant layer of indirection.
* This fixes #4923 as it first fetches from all remotes before `import_refs()`
  is called, so there is no race condition if the same commit is treated
  differently in different remotes specified in the same command.

Fixes: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
…rectly.

* Make the new `GitFetch` api public.
* Rewrite all tests that used `git::fetch*` to use the new API.
  One interesting artifact of the new API is that it holds onto a reference
  of the `tx.repo_mut()`, so the GitFetch object needs to fall go of scope
  completely before we can make any other mutable calls to `tx`.
* Delete the `git::fetch`
* Update `jj git clone` and `git_fetch` in `cli/src/git_utils.rs` to use
  the new api directly. Removing one redundant layer of indirection.
* This fixes #4923 as it first fetches from all remotes before `import_refs()`
  is called, so there is no race condition if the same commit is treated
  differently in different remotes specified in the same command.

Fixes: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
* GitFetch{} separates `fetch()` from `import_refs()`.
* This allows more control over the stages of the fetch so multiple fetches
  can happen before `import_refs` resolves the changes in the local jj repo.
* Implement `git::fetch` in terms of the new api.
* Add a test case for initial fetch from a repo without `HEAD` set. This tests
  the default branch retrieving behaviour.

Issue: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
…rectly.

* Make the new `GitFetch` api public.
* Rewrite all tests that used `git::fetch*` to use the new API.
  One interesting artifact of the new API is that it holds onto a reference
  of the `tx.repo_mut()`, so the GitFetch object needs to fall go of scope
  completely before we can make any other mutable calls to `tx`.
* Delete the `git::fetch`
* Update `jj git clone` and `git_fetch` in `cli/src/git_utils.rs` to use
  the new api directly. Removing one redundant layer of indirection.
* This fixes #4923 as it first fetches from all remotes before `import_refs()`
  is called, so there is no race condition if the same commit is treated
  differently in different remotes specified in the same command.

Fixes: #4923
essiene added a commit that referenced this pull request Dec 1, 2024
…rectly.

* Make the new `GitFetch` api public.
* Rewrite all tests that used `git::fetch*` to use the new API.
  One interesting artifact of the new API is that it holds onto a reference
  of the `tx.repo_mut()`, so the GitFetch object needs to fall go of scope
  completely before we can make any other mutable calls to `tx`.
* Delete the `git::fetch`
* Update `jj git clone` and `git_fetch` in `cli/src/git_utils.rs` to use
  the new api directly. Removing one redundant layer of indirection.
* This fixes #4923 as it first fetches from all remotes before `import_refs()`
  is called, so there is no race condition if the same commit is treated
  differently in different remotes specified in the same command.

Fixes: #4920
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants