introduce ALTERNATE_RESPONSE_PORT TLV #430

Closed

leoleovich

Summary:
Introduce a new TLV as a POC to support switching ports on the GM side. This is an important step towards asymmetry compensation.
We use an offset flag, a uint16 (65536 possible values), which ensures consistency:

offset 0 = noop
offset 1 = switch to the next available port
offset 2 = switch to the next next available port
...
offset 65535 = you get the idea
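
As a rough illustration of how such an offset could travel on the wire, here is a minimal sketch that packs it into the generic PTP TLV layout (a 2-byte type, a 2-byte length, then the value, all big-endian). The type code below is a placeholder picked for the example, the assumption that the value is just the bare uint16 offset is mine, and the function names are hypothetical; none of this is taken from the actual change.

```python
import struct

# Placeholder type code for illustration only; the real value is defined
# by the change itself and is not stated in this description.
HYPOTHETICAL_TLV_TYPE = 0x2FFF


def encode_offset_tlv(offset: int) -> bytes:
    """Pack tlvType, lengthField and a uint16 offset, all big-endian."""
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in uint16")
    value = struct.pack("!H", offset)
    return struct.pack("!HH", HYPOTHETICAL_TLV_TYPE, len(value)) + value


def decode_offset_tlv(data: bytes) -> int:
    """Return the offset carried by a TLV produced by encode_offset_tlv."""
    tlv_type, length = struct.unpack_from("!HH", data, 0)
    if tlv_type != HYPOTHETICAL_TLV_TYPE or length != 2:
        raise ValueError("not an offset TLV")
    (offset,) = struct.unpack_from("!H", data, 4)
    return offset
```

Rejecting values outside 0..65535 at encode time mirrors the uint16 range described above.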

For example, we have 2 buildings with 2 paths between them, short and long (e.g. 1km vs 3+km).
When the forward and return paths are the same, (short + short) or (long + long), everything is good.
Suddenly something changes on the network, and path rehashing results in short + long paths (or long + short, for completeness).
At this moment we see a large change in measured path delay, from say (5us + 5us) / 2 = 5us to (5us + 15us) / 2 = 10us.
This problem is only going to get worse with other DC types, where 1km vs 10km paths are possible.
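
The arithmetic above can be written out as a small sketch. The function names are hypothetical. The second function reflects the usual consequence of a two-way time exchange that assumes symmetric paths: the clock offset estimate is wrong by half the difference between the two one-way delays.

```python
def mean_path_delay(forward_us: float, return_us: float) -> float:
    """Mean path delay as computed from a two-way exchange."""
    return (forward_us + return_us) / 2


def asymmetry_error(forward_us: float, return_us: float) -> float:
    """Magnitude of the offset error caused by unequal one-way delays."""
    return abs(forward_us - return_us) / 2


# Symmetric case (short + short): 5us mean delay, no asymmetry error.
symmetric = (mean_path_delay(5, 5), asymmetry_error(5, 5))

# After rehashing (short + long): 10us mean delay, 5us asymmetry error.
rehashed = (mean_path_delay(5, 15), asymmetry_error(5, 15))
```

With the numbers from the example, the measured delay doubles from 5us to 10us while the hidden offset error grows from 0 to 5us.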
What this change is going to do is:
If we suspect a path change, we can ask the GM to start sending packets from a different port until we recover path symmetry. We can't influence the exact port of the GM, but we can ask it to switch to the next one (in the test plan this is visible as 34488->38509). This will be a TLV with offset = 1.
If this doesn't help, we can try jumping to the "next next" port (seen in the test plan as 38509->47977); this is offset = 2.
Because the GM doesn't keep this state (hello, simple PTP), we now have to submit this counter every time to preserve the "shift". If we set it to 0 (or don't submit the TLV), we get back to the original port 34488.
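
To make the stateless behaviour concrete, here is a toy model, not the GM's actual port selection. It assumes a fixed list of three ports copied from the test plan and wraps the offset with a modulo; both choices are assumptions made for the sketch, as is the function name.

```python
from typing import Optional

# Ports taken from the test plan; how the GM really enumerates its
# ports is not described here, so this fixed list is an assumption.
PORTS = [34488, 38509, 47977]


def gm_response_port(offset: Optional[int]) -> int:
    """Stateless choice: depends only on the offset in this one request."""
    if offset is None:  # TLV not submitted, same as offset 0
        offset = 0
    # Wrapping with modulo is an assumption of this toy model.
    return PORTS[offset % len(PORTS)]


# The client must resend the offset on every request to keep the shift.
requests = [None, 1, 1, 2, 0]
ports = [gm_response_port(o) for o in requests]
```

Here `ports` comes out as `[34488, 38509, 38509, 47977, 34488]`: repeating offset 1 keeps the shifted port, and sending 0 (or no TLV) falls back to the original one.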

Randomly changing the counter value results in:

08:07:02.490110 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:02.490357 IP6 server.34488 > client.ptp-event: PTPv18
08:07:02.490434 IP6 server.47064 > client.ptp-general: PTPv18

08:07:03.490066 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:03.490379 IP6 server.38509 > client.ptp-event: PTPv18
08:07:03.490392 IP6 server.63604 > client.ptp-general: PTPv18

08:07:04.490507 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:04.490755 IP6 server.47977 > client.ptp-event: PTPv18
08:07:04.490836 IP6 server.49796 > client.ptp-general: PTPv18

08:07:05.491305 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:05.491541 IP6 server.34987 > client.ptp-event: PTPv18
08:07:05.491562 IP6 server.35270 > client.ptp-general: PTPv18
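
One way to check a capture like this is to pull out the server's source port for each event response. The sketch below is a hypothetical helper, not part of the change; it assumes the tcpdump line shape shown above and uses the same capture as input.

```python
import re

CAPTURE = """\
08:07:02.490110 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:02.490357 IP6 server.34488 > client.ptp-event: PTPv18
08:07:02.490434 IP6 server.47064 > client.ptp-general: PTPv18
08:07:03.490066 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:03.490379 IP6 server.38509 > client.ptp-event: PTPv18
08:07:03.490392 IP6 server.63604 > client.ptp-general: PTPv18
08:07:04.490507 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:04.490755 IP6 server.47977 > client.ptp-event: PTPv18
08:07:04.490836 IP6 server.49796 > client.ptp-general: PTPv18
08:07:05.491305 IP6 client.ptp-event > server.ptp-event: PTPv18
08:07:05.491541 IP6 server.34987 > client.ptp-event: PTPv18
08:07:05.491562 IP6 server.35270 > client.ptp-general: PTPv18
"""

# Matches only server -> client lines addressed to the ptp-event port
# whose source port is numeric.
EVENT_RESPONSE = re.compile(r"^\S+ IP6 server\.(\d+) > client\.ptp-event:")


def event_response_ports(capture):
    """Return the server source ports of event responses, in order."""
    ports = []
    for line in capture.splitlines():
        match = EVENT_RESPONSE.match(line)
        if match:
            ports.append(int(match.group(1)))
    return ports
```

For the capture above, `event_response_ports(CAPTURE)` returns `[34488, 38509, 47977, 34987]`, one distinct source port per exchange.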

One can see new source ports being used in server -> client communication.

Differential Revision: D66656872

@facebook-github-bot added the CLA Signed label Dec 4, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D66656872

leoleovich added a commit to leoleovich/time that referenced this pull request Dec 5, 2024

Reviewed By: abulimov

@facebook-github-bot

This pull request has been merged in f155c98.

Labels: CLA Signed, fb-exported, Merged