Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle system time move backward case in timer task processing #7030

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yycptt
Copy link
Member

@yycptt yycptt commented Dec 21, 2024

What changed?

  • Handle system time move backward case in timer task processing by using max(now(), task visibility timestamp)

Why?

  • Monotonic time is used in time.Time comparison only when both operands have monotonic time value. In our case, the timestamp stored/derived in/from mutable state doesn't have monotonic, thus the comparison logic will use wall clock time to decide if a timer task should be processed. If wall clock move backwards (after we verifies that now() > task visibility timestamp when submitting the task to the task scheduler), the timer task will be dropped and cause workflow to stuck.

How did you test it?

  • Unit tests

Potential risks

  • In worst case, we will execute a timer task logic earlier than expected.

Documentation

Is hotfix candidate?

  • Could be. But the bug should be very rare as the use millisecond precision when doing time comparison in timer task processing.

@yycptt yycptt requested a review from bergundy December 21, 2024 02:22
@yycptt yycptt requested a review from a team as a code owner December 21, 2024 02:22
@yycptt yycptt force-pushed the fix-queues-is-time-expired branch from fb5f5fd to c3f87c1 Compare December 21, 2024 02:24
@@ -303,10 +304,14 @@ func (p *scheduledQueue) lookAheadTask() {
// IsTimeExpired checks if the testing time is equal or before
// the reference time. The precision of the comparison is millisecond.
func IsTimeExpired(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a note explaining why we take the task here (to prevent clock skew issues).

// NOTE: Persistence layer may lose precision when persisting the task, which essentially moves
// task fire time backward. But we are already performing truncation here, so doesn't need to
// account for that.
referenceTime = util.MaxTime(referenceTime, task.GetKey().FireTime).Truncate(persistence.ScheduledTaskMinPrecision)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between visibility timestamp and fire time and why take that here?

@@ -236,7 +240,7 @@ func (t *timerQueueStandbyTaskExecutor) executeActivityTimeoutTask(
// created.
isHeartBeatTask := timerTask.TimeoutType == enumspb.TIMEOUT_TYPE_HEARTBEAT
ai, heartbeatTimeoutVis, ok := mutableState.GetActivityInfoWithTimerHeartbeat(timerTask.EventID)
if isHeartBeatTask && ok && queues.IsTimeExpired(timerTask.GetVisibilityTime(), heartbeatTimeoutVis) {
if isHeartBeatTask && ok && queues.IsTimeExpired(timerTask, timerTask.GetVisibilityTime(), heartbeatTimeoutVis) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use referenceTime to allow progressing beyond the visibility timestamp in case task processing is delayed for a while?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants