Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds:Move creating the retry timer in handleRpcStreamClosed to as late as possible and call close() #11776

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions core/src/main/java/io/grpc/internal/AbstractStream.java
Original file line number Diff line number Diff line change
Expand Up @@ -27,13 +27,17 @@
import io.perfmark.PerfMark;
import io.perfmark.TaskCloseable;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.annotation.concurrent.GuardedBy;

/**
* The stream and stream state as used by the application. Must only be called from the sending
* application thread.
*/
public abstract class AbstractStream implements Stream {
private static final Logger log = Logger.getLogger(AbstractStream.class.getName());

/** The framer to use for sending messages. */
protected abstract Framer framer();

Expand Down Expand Up @@ -371,6 +375,11 @@ private void notifyIfReady() {
boolean doNotify;
synchronized (onReadyLock) {
doNotify = isReady();
if (!doNotify) {
log.log(Level.FINE,
"Transport not ready. allocated/de:{0}/{3}, sent queued: {1}, ready thresh: {2}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can create an immense about of logspam. It's also not at all clear what it is talking about (if you aren't looking at the line of code logs this, it is meaningless). What is this trying to achieve?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to achieve with this? I don't follow what it has to do with anything.

new Object[] {allocated, numSentBytesQueued, onReadyThreshold, deallocated});
}
}
if (doNotify) {
listener().onReady();
Expand Down
22 changes: 10 additions & 12 deletions xds/src/main/java/io/grpc/xds/client/ControlPlaneClient.java
Original file line number Diff line number Diff line change
Expand Up @@ -449,16 +449,6 @@ private void handleRpcStreamClosed(Status status) {
stopwatch.reset();
}

// FakeClock in tests isn't thread-safe. Schedule the retry timer before notifying callbacks
// to avoid TSAN races, since tests may wait until callbacks are called but then would run
// concurrently with the stopwatch and schedule.

long elapsed = stopwatch.elapsed(TimeUnit.NANOSECONDS);
long delayNanos = Math.max(0, retryBackoffPolicy.nextBackoffNanos() - elapsed);

rpcRetryTimer =
syncContext.schedule(new RpcRetryTask(), delayNanos, TimeUnit.NANOSECONDS, timeService);

Status newStatus = status;
if (responseReceived) {
// A closed ADS stream after a successful response is not considered an error. Servers may
Expand Down Expand Up @@ -486,9 +476,17 @@ private void handleRpcStreamClosed(Status status) {
newStatus.getCode(), newStatus.getDescription(), newStatus.getCause());
}

closed = true;
close(newStatus.asException());

// FakeClock in tests isn't thread-safe. Schedule the retry timer before notifying callbacks
// to avoid TSAN races, since tests may wait until callbacks are called but then would run
// concurrently with the stopwatch and schedule.
long elapsed = stopwatch.elapsed(TimeUnit.NANOSECONDS);
long delayNanos = Math.max(0, retryBackoffPolicy.nextBackoffNanos() - elapsed);
rpcRetryTimer =
syncContext.schedule(new RpcRetryTask(), delayNanos, TimeUnit.NANOSECONDS, timeService);

xdsResponseHandler.handleStreamClosed(newStatus, !responseReceived);
cleanUp();
}

private void close(Exception error) {
Expand Down
4 changes: 4 additions & 0 deletions xds/src/main/java/io/grpc/xds/client/XdsClientImpl.java
Original file line number Diff line number Diff line change
Expand Up @@ -444,6 +444,9 @@
// cpcForThisStream is null when doing shutdown
private void cleanUpResourceTimers(ControlPlaneClient cpcForThisStream) {
Collection<String> authoritiesForCpc = getActiveAuthorities(cpcForThisStream);
String target = cpcForThisStream == null ? "null" : cpcForThisStream.getServerInfo().target();
logger.log(XdsLogLevel.DEBUG, "Cleaning up resource timers for CPC {0}, authorities {1}",
target, authoritiesForCpc);

for (Map<String, ResourceSubscriber<?>> subscriberMap : resourceSubscribers.values()) {
for (ResourceSubscriber<?> subscriber : subscriberMap.values()) {
Expand Down Expand Up @@ -957,6 +960,7 @@

ControlPlaneClient cpcClosed = serverCpClientMap.get(serverInfo);
if (cpcClosed == null) {
logger.log(XdsLogLevel.DEBUG, "Couldn't find CPC for {0}", serverInfo);

Check warning on line 963 in xds/src/main/java/io/grpc/xds/client/XdsClientImpl.java

View check run for this annotation

Codecov / codecov/patch

xds/src/main/java/io/grpc/xds/client/XdsClientImpl.java#L963

Added line #L963 was not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log statement is not at all clear. Any time this happens we'd expect us not to care because the CPC is shut down, but if you just read the message you would probably think this a surprise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the message clearer. We might care because cleanup and metric reporting are being skipped.

return;
}

Expand Down
Loading