Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: v2 fast rebuilding #3401

Merged
merged 3 commits into from
Dec 28, 2024
Merged

feat: v2 fast rebuilding #3401

merged 3 commits into from
Dec 28, 2024

Conversation

shuo-wu
Copy link
Contributor

@shuo-wu shuo-wu commented Dec 24, 2024

Which issue(s) this PR fixes:

longhorn/longhorn#9488

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

Copy link

coderabbitai bot commented Dec 24, 2024

Walkthrough

This pull request introduces modifications across multiple files to enhance replica management in the Longhorn system. The primary changes involve updating the ReplicaRemove method signature across various components to include a new replicaName parameter. Additionally, a new configuration setting V2DataEngineFastReplicaRebuilding is introduced to provide more granular control over replica rebuilding processes for the V2 Data Engine. These changes aim to improve the flexibility and control of replica handling in the system.

Changes

File Change Summary
controller/engine_controller.go Updated removeUnknownReplica and startRebuilding methods to include error return and modified ReplicaRemove method calls
controller/volume_controller.go Modified shouldCleanUpFailedReplica method to incorporate V2DataEngineFastReplicaRebuilding setting
engineapi/engine.go Updated ReplicaRemove method signature to include replicaName parameter
engineapi/enginesim.go Modified ReplicaRemove method signature to include replicaName parameter
engineapi/proxy_replica.go Updated ReplicaRemove method to pass replicaName to grpcClient
engineapi/types.go Updated EngineClient interface ReplicaRemove method signature
scheduler/replica_scheduler.go Added configuration setting checks in CheckAndReuseFailedReplica and RequireNewReplica methods
types/setting.go Added new V2DataEngineFastReplicaRebuilding setting definition

Sequence Diagram

sequenceDiagram
    participant Controller as Engine/Volume Controller
    participant Scheduler as Replica Scheduler
    participant Setting as Configuration
    participant Engine as Engine API

    Setting->>Scheduler: Check V2DataEngineFastReplicaRebuilding
    alt Setting Enabled
        Scheduler->>Controller: Allow Replica Reuse/Rebuild
        Controller->>Engine: ReplicaRemove(engine, url, replicaName)
    else Setting Disabled
        Scheduler->>Scheduler: Skip Replica Reuse
        Controller->>Controller: Standard Replica Handling
    end
Loading

Possibly related PRs

Suggested reviewers

  • mantissahz
  • innobead
  • ChanYiLin
  • COLDTURNIP

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Only one question and need to update the dependencies.

Found error

vet: engineapi/enginesim_test.go:59:41: not enough arguments in call to sim.ReplicaRemove
	have (*"github.com/longhorn/longhorn-manager/k8s/pkg/apis/longhorn/v1beta2".Engine, string)
	want (*"github.com/longhorn/longhorn-manager/k8s/pkg/apis/longhorn/v1beta2".Engine, string, string)
time="2024-12-27T16:44:46Z" level=fatal msg="exit status 1"

Comment on lines +4988 to +4997
V2DataEngineFastReplicaRebuilding, err := c.ds.GetSettingAsBool(types.SettingNameV2DataEngineFastReplicaRebuilding)
if err != nil {
log.WithError(err).Warnf("Failed to get the setting %v, will consider it as false", types.SettingDefinitionV2DataEngineFastReplicaRebuilding)
V2DataEngineFastReplicaRebuilding = false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it optional?

Copy link
Member

@innobead innobead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question.

@@ -1443,6 +1446,16 @@ var (
Default: "",
}

SettingDefinitionV2DataEngineFastReplicaRebuilding = SettingDefinition{
DisplayName: "V2 Data Engine Fast Replica Rebuilding",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this setting is temporary before v2 becomes GA, as delta replica rebuilding should be eventually enabled always. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES! Next, I will check when we can launch a setting similar to v1 Snapshot Data Integrity instead. If we can do that before v1.8 official release, V2 Data Engine Fast Replica Rebuilding won't be delivered to users at all.

go.mod Outdated Show resolved Hide resolved
@shuo-wu shuo-wu marked this pull request as ready for review December 27, 2024 21:31
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
engineapi/proxy_replica.go (1)

15-16: Validate or log replicaName for clarity.
Consider logging or validating the replicaName parameter for better traceability or debugging, given its potential diagnostic value.

engineapi/engine.go (1)

180-180: Unused parameter replicaName.
The newly added parameter isn’t used inside this method. If it’s intended to remain optional, consider documenting it. Otherwise, remove it to reduce confusion.

types/setting.go (1)

1449-1458: Check prerequisites for fast replica rebuilding.
The description mentions that the snapshot data integrity must be set to enabled or fast-check. Consider validating that requirement at runtime to prevent misconfigurations where fast rebuilding is enabled without snapshot checksums.

+ // Example snippet to validate prerequisites
+ if newSetting.Name == SettingNameV2DataEngineFastReplicaRebuilding && newSetting.Value == "true" {
+   sdiSetting, err := ds.GetSetting(SettingNameSnapshotDataIntegrity)
+   if err != nil {
+     return fmt.Errorf("failed to retrieve snapshot data integrity setting")
+   }
+   if sdiSetting.Value == string(longhorn.SnapshotDataIntegrityDisabled) {
+     return fmt.Errorf("cannot enable v2-engine-fast-replica-rebuilding without enabling snapshot-data-integrity")
+   }
+ }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac13db and 09d3349.

⛔ Files ignored due to path filters (38)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/fsnotify/fsnotify/.cirrus.yml is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/.editorconfig is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/.gitattributes is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/.gitignore is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/CHANGELOG.md is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/CONTRIBUTING.md is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_fen.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_inotify.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_kqueue.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_other.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/backend_windows.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/fsnotify.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/darwin.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_darwin.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_dragonfly.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_freebsd.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_kqueue.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_linux.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_netbsd.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_openbsd.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_solaris.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/debug_windows.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/freebsd.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/internal.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/unix.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/unix2.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/internal/windows.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/mkdoc.zsh is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/system_bsd.go is excluded by !vendor/**
  • vendor/github.com/fsnotify/fsnotify/system_darwin.go is excluded by !vendor/**
  • vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/google.golang.org/protobuf/internal/errors/is_go112.go is excluded by !vendor/**
  • vendor/google.golang.org/protobuf/internal/errors/is_go113.go is excluded by !vendor/**
  • vendor/google.golang.org/protobuf/internal/impl/message_reflect.go is excluded by !vendor/**
  • vendor/google.golang.org/protobuf/internal/version/version.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (9)
  • controller/engine_controller.go (2 hunks)
  • controller/volume_controller.go (1 hunks)
  • engineapi/engine.go (1 hunks)
  • engineapi/enginesim.go (1 hunks)
  • engineapi/enginesim_test.go (1 hunks)
  • engineapi/proxy_replica.go (1 hunks)
  • engineapi/types.go (1 hunks)
  • scheduler/replica_scheduler.go (2 hunks)
  • types/setting.go (4 hunks)
🔇 Additional comments (11)
controller/volume_controller.go (1)

4991-5001: Consider differentiating ephemeral errors from missing settings.
Currently, the code sets V2DataEngineFastReplicaRebuilding to false upon any retrieval error, potentially triggering undesired replica cleanup if the error is transient. Consider handling “not found” vs. other error scenarios separately, or adding fallback logic to avoid indiscriminate cleanup.

engineapi/enginesim_test.go (1)

59-59: LGTM for new parameter addition.
The extra argument aligns well with the updated ReplicaRemove signature, and the test remains clear and coherent.

engineapi/enginesim.go (1)

148-148: Consider utilizing new parameter if necessary.
Although the signature now includes replicaName, it remains unused in this method. If it can help diagnose or coordinate replica operations, consider logging it.

engineapi/types.go (1)

92-92: Ensure all callers are updated to the new method signature.
Adding the replicaName parameter helps distinguish replicas when removing them. Verify that all references to ReplicaRemove have been updated accordingly to avoid runtime errors.

scheduler/replica_scheduler.go (2)

570-578: Handle setting retrieval errors gracefully.
The code falls back to false if retrieving the V2DataEngineFastReplicaRebuilding setting fails. Consider logging more context or re-checking to ensure we do not silently ignore crucial issues.


670-678: Consistent logic for scheduling new replicas.
Similar to the earlier block, ensure the fallback logic for V2DataEngineFastReplicaRebuilding remains consistent so that scheduling does not unintentionally bypass fast rebuilding.

controller/engine_controller.go (2)

1658-1658: Passing an empty replica name.
When calling ReplicaRemove(e, url, ""), verify that this usage is supported. If the replica’s name is crucial for logs or further validation, consider replacing the empty string with a meaningful identifier.


1861-1861: Remove the rebuilding replica using its actual name.
Now that ReplicaRemove takes replicaName, it ensures clarity. Confirm that both the replica URL and the name match the intended target to avoid removing the wrong replica.

types/setting.go (3)

135-135: Properly name the new setting constant.
The naming convention follows existing patterns for setting definitions.


229-229: Inclusion in SettingNameList is correct.
This ensures the new setting is recognized and managed platform-wide.


349-349: Registering the setting in settingDefinitions.
Ensures that the setting is visible for validation and defaulting mechanisms.

Copy link
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekbit derekbit merged commit b4bcf52 into longhorn:master Dec 28, 2024
9 checks passed
@derekbit
Copy link
Member

@mergify backport v1.8.x

Copy link

mergify bot commented Dec 28, 2024

backport v1.8.x

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants