Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Nightly job run twice #789

Open
wandering-tales opened this issue Oct 22, 2024 · 3 comments
Open

[BUG] - Nightly job run twice #789

wandering-tales opened this issue Oct 22, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@wandering-tales
Copy link

wandering-tales commented Oct 22, 2024

Describe the bug

It seems this long standing issue (see #52, #132, #159, and #601) is still unresolved.

I have configured a daily job running at 3:00, which processes a batch of entities relevant to my domain. The job completes in around 170ms:

_, _ = scheduler.NewJob(
		gocron.DailyJob(1, gocron.NewAtTimes(gocron.NewAtTime(3, 0, 0))),
		gocron.NewTask(task.MyTask, myParam1, myParam2),
		gocron.WithEventListeners(
			gocron.AfterJobRunsWithError(func(jobID uuid.UUID, jobName string, err error) {
				log.Printf("failed my task: %v", err)
			})),
)

but this occurred last night:

2024-10-22T03:00:00.245294425Z stderr F 2024/10/22 03:00:00 processing thing with ID 59e50053-0d33-4729-9732-9e69ade399a2
...
2024-10-22T03:00:00.414909773Z stderr F 2024/10/22 03:00:00 processing thing with ID 59e50053-0d33-4729-9732-9e69ade399a2

Version

  • gocron: 2.12.1

Expected behavior

The task should be run only once.

Additional context

I have tested the scheduler behavior locally and the NextRuns method of the job returned what is expected.

@wandering-tales wandering-tales added the bug Something isn't working label Oct 22, 2024
@rsvix
Copy link

rsvix commented Nov 4, 2024

EDIT: Removing my comment since in my case it was an implementation error, sorry guys

@JohnRoesler
Copy link
Contributor

heyo @wandering-tales

Yes, this has been a very hard to track down issue, and not reproducible with any regularity, so it's been incredibly difficult to debug.

I've considered different ways to try and account for this but don't have a solid solution at this point.

One idea, is check if the time returned by calling the job type's .next() method is too close to the lastRun or time.Now(). But, I have take a couple stabs at writing up some logic for that and it's gotten quite hairy.

The bug here, at least in my estimation, seems to be that the time being passed into the .next() method to schedule from, is somehow wrong, leading to the same next value being returned. Either that, or somehow two instances of the job are being scheduled 🤔

@wandering-tales
Copy link
Author

Hi @JohnRoesler

Sorry for the late reply. I confirm this is keep happening systematically. Here's an excerpts of relevant Grafana logs entries (in the end there are no sensitive information, so I can paste it here):

Screenshot 2024-12-20 at 15 52 15

Considering that those ID's are processed in a deterministic order and doing some cheap math, it seems that the two job instances starts within ~36ms (691849517ns - 655338486ns).

Luckily those operations performed by the job are idempotent, so no big deal in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants