OCPBUGS-36735: Wait for carrier before announcing IPs via GARP/NA #112

Conversation

thom311
Contributor

@thom311 thom311 commented Jul 8, 2024

This is a backport of k8snetworkplumbingwg/sriov-cni#301 to the release-4.16 branch to fix OCPBUGS-30549. The 4.16.z issue is OCPBUGS-36735.

Those upstream patches are already on latest master (release-4.17).

The patches were cherry-picked without manual modifications or conflicts in git.


It also contains a backport of 2f64420, from k8snetworkplumbingwg/sriov-cni@2f64420

@openshift-ci openshift-ci bot requested review from bn222 and wizhaoredhat July 8, 2024 08:34
@openshift-ci openshift-ci bot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Jul 8, 2024
Contributor

openshift-ci bot commented Jul 8, 2024

Hi @thom311. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@bn222
Contributor

bn222 commented Jul 8, 2024

please use a merge commit instead.

@bn222
Contributor

bn222 commented Jul 8, 2024

Discussed offline: this is not a sync, it's a bugfix. Please rename the PR to OCPBUGS-30549: ...
and make sure that the jira bot is not reporting issues.

@thom311 thom311 changed the title from "[4.16] Wait for carrier before announcing IPs via GARP/NA" to "OCPBUGS-30549:[4.16] Wait for carrier before announcing IPs via GARP/NA" Jul 8, 2024
@openshift-ci-robot openshift-ci-robot added the jira/severity-important (referenced Jira bug's severity is important for the branch this PR is targeting) and jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) labels Jul 8, 2024
@openshift-ci-robot
Contributor

@thom311: This pull request references Jira Issue OCPBUGS-30549, which is invalid:

  • expected the bug to target the "4.16.z" version, but no target version was set
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required"
  • expected Jira Issue OCPBUGS-30549 to depend on a bug targeting a version in 4.17.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is a backport of k8snetworkplumbingwg/sriov-cni#301 to release-4.16 branch to fix OCPBUGS-30549.

Those upstream patches are already on latest master (release-4.17).

The patches were cherry-picked without manual modifications or conflicts in git.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) Jul 8, 2024
@thom311 thom311 marked this pull request as draft July 8, 2024 09:43
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jul 8, 2024
@zhaozhanqi

/retitle OCPBUGS-36735: Wait for carrier before announcing IPs via GARP/NA

@openshift-ci openshift-ci bot changed the title from "OCPBUGS-30549:[4.16] Wait for carrier before announcing IPs via GARP/NA" to "OCPBUGS-36735: Wait for carrier before announcing IPs via GARP/NA" Jul 9, 2024
@openshift-ci-robot
Contributor

@thom311: This pull request references Jira Issue OCPBUGS-36735, which is invalid:

  • expected Jira Issue OCPBUGS-36735 to depend on a bug targeting a version in 4.17.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@zhaozhanqi

/jira refresh

@openshift-ci-robot
Contributor

@zhaozhanqi: This pull request references Jira Issue OCPBUGS-36735, which is invalid:

  • expected Jira Issue OCPBUGS-36735 to depend on a bug targeting a version in 4.17.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@zhaozhanqi

/jira refresh

@openshift-ci-robot
Contributor

@zhaozhanqi: This pull request references Jira Issue OCPBUGS-36735, which is invalid:

  • expected Jira Issue OCPBUGS-36735 to depend on a bug targeting a version in 4.17.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@zhaozhanqi

/jira refresh

@openshift-ci-robot
Contributor

@zhaozhanqi: This pull request references Jira Issue OCPBUGS-36735, which is invalid:

  • expected dependent Jira Issue OCPBUGS-30549 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@thom311 thom311 force-pushed the th/announce-ips-wait-carrier-4.16 branch from fa19d63 to 4c8f82e Compare July 9, 2024 10:31
@thom311
Contributor Author

thom311 commented Jul 9, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) labels Jul 9, 2024
@openshift-ci-robot
Contributor

@thom311: This pull request references Jira Issue OCPBUGS-36735, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.z) matches configured target version for branch (4.16.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-30549 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-30549 targets the "4.17.0" version, which is one of the valid target versions: 4.17.0
  • bug has dependents

Requesting review from QA contact:
/cc @zhaozhanqi

@openshift-ci openshift-ci bot requested a review from zhaozhanqi July 9, 2024 10:32
@thom311 thom311 marked this pull request as ready for review July 9, 2024 10:33
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jul 9, 2024
@openshift-ci openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jul 30, 2024
@asood-rh

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved label (indicates a cherry-pick PR into a release branch has been approved by the release branch manager) Jul 31, 2024
zeeke and others added 6 commits August 19, 2024 14:12
Previously, we called AnnounceIPs() right after configuring the
interface. At that point, the interface had only just been set IFF_UP,
and it might not yet have carrier. In that case, the messages are
lost. We need to wait a bit for carrier.

Since AnnounceIPs() is an optional performance improvement, let's first
do all the important things. Move the non-critical call to the end.

This becomes relevant next, when we add some waiting
for the device to have carrier. Let's not do the waiting in the middle
of the critical operations, but only at the end.

Signed-off-by: Thomas Haller <[email protected]>
After setting up the interface, it might take a bit for the kernel to detect
carrier. If we send the GARP/NA packets before that, they are lost.

Instead, wait for up to 200 msec for the interface to get carrier. This
time is chosen somewhat arbitrarily. We don't want to block the process
for too long, but we also need to wait long enough that the kernel and driver
have time to detect carrier. Also, while busy waiting, sleep with an
exponential back-off time (growth factor 1.5).

Fixes: c241dcb ('Send IPv4 GARP and IPv6 Unsolicited NA in "cmdAdd"')

See-also: https://issues.redhat.com/browse/OCPBUGS-30549

Signed-off-by: Thomas Haller <[email protected]>
Co-authored-by: Thomas Haller <[email protected]>
Signed-off-by: Thomas Haller <[email protected]>
In the case where the function that sends gratuitous ARPs failed,
we failed the entire CNI create and reverted the configuration

Signed-off-by: Sebastian Sch <[email protected]>
@thom311 thom311 force-pushed the th/announce-ips-wait-carrier-4.16 branch from 4c8f82e to 43c8bcc Compare August 19, 2024 12:14
@openshift-ci-robot
Contributor

@thom311: This pull request references Jira Issue OCPBUGS-36735, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.z) matches configured target version for branch (4.16.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-30549 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-30549 targets the "4.17.0" version, which is one of the valid target versions: 4.17.0
  • bug has dependents

Requesting review from QA contact:
/cc @zhaozhanqi

@thom311 thom311 marked this pull request as ready for review August 19, 2024 12:15
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Aug 19, 2024
@openshift-ci openshift-ci bot requested a review from Billy99 August 19, 2024 12:16
@SchSeba
Contributor

SchSeba commented Aug 19, 2024

Let's wait for k8snetworkplumbingwg/sriov-cni#307 to be on master, and then we can merge it here as well.
@thom311 what do you think?

@thom311
Contributor Author

thom311 commented Aug 19, 2024

Let's wait for k8snetworkplumbingwg/sriov-cni#307 to be on master, and then we can merge it here as well. @thom311 what do you think?

One concern is that 307 enables optimistic DAD on the interface. I think that is the right thing to do, but it seems quite a large change (with potentially unknown consequences) for a "release-4.16" branch. Also, IPv6 is probably not important enough to warrant a backport.

But I don't mind. As you prefer. Setting back to Draft.

@thom311 thom311 marked this pull request as draft August 19, 2024 13:09
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Aug 19, 2024
@thom311
Contributor Author

thom311 commented Aug 28, 2024

@SchSeba hi. Following up on #112 (comment): I think this backport should not include k8snetworkplumbingwg/sriov-cni#307, because that enables optimistic DAD for IPv6. That is a relatively new change, possibly wide-ranging, and I think it should spend some time on master before being backported to a stable branch (if ever).

I would un-Draft this PR and I think it's good as-is.

The branch only contains cherry-picks from origin/master branch (without manual changes).

@thom311 thom311 marked this pull request as ready for review September 12, 2024 15:24
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Sep 12, 2024
Contributor

openshift-ci bot commented Sep 12, 2024

@thom311: all tests passed!

Contributor

@SchSeba SchSeba left a comment

/lgtm
/approve
/hold cancel

@openshift-ci openshift-ci bot added lgtm (indicates that a PR is ready to be merged) and removed do-not-merge/hold (indicates that a PR should not merge because someone has issued a /hold command) labels Oct 13, 2024
Contributor

openshift-ci bot commented Oct 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SchSeba, thom311, wizhaoredhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [SchSeba,wizhaoredhat]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit d61383f into openshift:release-4.16 Oct 13, 2024
4 checks passed
@openshift-ci-robot
Contributor

@thom311: Jira Issue OCPBUGS-36735: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-36735 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: sriov-cni
This PR has been included in build sriov-cni-container-v4.16.0-202410130935.p0.gd61383f.assembly.stream.el9.
All builds following this will include this PR.

@thom311 thom311 deleted the th/announce-ips-wait-carrier-4.16 branch November 18, 2024 21:07
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.