Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SlidingInvisibilityTimeout behavior #209

Closed
AntonZiminSaritasa opened this issue Sep 23, 2021 · 12 comments
Closed

SlidingInvisibilityTimeout behavior #209

AntonZiminSaritasa opened this issue Sep 23, 2021 · 12 comments

Comments

@AntonZiminSaritasa
Copy link

AntonZiminSaritasa commented Sep 23, 2021

The SQL Server package supports SlidingInvisibilityTimeout property:
https://www.hangfire.io/blog/2017/06/16/hangfire-1.6.14.html

Can I simulate a similar config using Hangfire.PostgreSql?

My config:

InvisibilityTimeout = TimeSpan.FromDays(1)

Some jobs take more than 30 minutes to complete, that's why I need the large InvisibilityTimeout value. One worker crashes and the job stays in the queue for many hours.

Ideally, some watchdog should update the jobs with "Processing" state if the server is dead (ServerId does not exist in hangfire.server table anymore).

https://github.com/frankhommers/Hangfire.PostgreSql/blob/master/src/Hangfire.PostgreSql/PostgreSqlConnection.cs#L401
https://github.com/frankhommers/Hangfire.PostgreSql/blob/master/src/Hangfire.PostgreSql/PostgreSqlConnection.cs#L427

@frankhommers
Copy link
Collaborator

@azygis did you make this happen in the last PRs or am I wrong?

@azygis
Copy link
Collaborator

azygis commented Feb 12, 2022

Nope, none of the timeouts in there. Though I'd rather use transactions and obsolete these altogether.

@Frazi1
Copy link
Contributor

Frazi1 commented Feb 13, 2022

I'm willing to work on this, but don't have much time. I'll see if I can make something next weekend.

However, I'm still not sure how to better approach this.

  1. the simplest option is to have a watchdog monitor dead servers and requeue jobs (that's what @AntonZiminSaritasa suggested above). On my previous job, we had this solution working for a while in production without any noticable trouble.

  2. But Hangfire.SqlServer doesn't use a watchdog. Instead, they have a dedicated timer per job which periodically updates invisibility timeout on the job to keep it "alive".

This approach might be a little more difficult to implement, but it's more granural and might be a little more performant because it won't require scanning through the database to find stuck jobs. I don't have any numbers though.

My personal preference is to follow SqlServer's approach and not reinvent the wheel. Do you guys have any thoughts on this?

Nope, none of the timeouts in there. Though I'd rather use transactions and obsolete these altogether.

@azygis
How do transactions help with long running jobs?

@AntonZiminSaritasa
Copy link
Author

AntonZiminSaritasa commented Feb 14, 2022

Personally, I migrated my Hangfire DB to SQL Server, it works fine. I prefer Option 2.

@azygis
Copy link
Collaborator

azygis commented Feb 14, 2022

@Frazi1
Sorry, I did not see yet how the SqlServer provider does this. Had a quick google search when I saw this issue, their 1.5 release notes mention that they're now using transactions. Since the version is that old, they might have changed the implementation substantially.

All in all, my previous message can be translated as "I'd prefer to do it the way Hangfire.SqlServer does". As you say, not reinvent the wheel.

@timpikelmg
Copy link
Contributor

Hi all, I had a look at this and for the use cases we want to use it for (which will be a mixture of short and long running jobs) we would really like to have a better solution that the current invisibility timeouts.

To that end, I have ported the sliding invisibility timeouts from SQL Server across as the first step in this PR #353.

Sliding invisibility is useful especially for very long running jobs and also good to have as an option since apparently the transaction approach, which holds a transaction open for long periods of time, has been seen to cause issues with backups on SQL Server (see #864) so makes sense to have both options here as well.

I'm still getting up to speed on the internals of Hangfire and storage libraries so hopefully the code is suitable. It's turned off by default.

I'm hoping I can find some time soon to see about porting the transaction approach across as well.

@azygis
Copy link
Collaborator

azygis commented Apr 14, 2024

I might have some time to see this today or tomorrow.

IIRC SQL Server has sp_applock or something similar equivalent of which is not in PostgreSQL. Maybe advisory locks, but these were discussed in this repo somewhere a while back as being somewhat negative for this package. Might be worth reinvestigating.

@timpikelmg
Copy link
Contributor

I might have some time to see this today or tomorrow.

Appreciate any time you can spend having a look. We are still building out our distributed job management service internally and not in production yet but I had some time free so decided to give this a look (and this was a very useful learning exercise digging into the libraries).

IIRC SQL Server has sp_applock or something similar equivalent of which is not in PostgreSQL. Maybe advisory locks, but these were discussed in this repo somewhere a while back as being somewhat negative for this package. Might be worth reinvestigating.

For the transaction dequeuing it basically just holds the transaction open (and the row lock) until the job is finished. This works the same way in Postgres since the rowlock is per transaction. I don't know of any way to do a row lock at a session level, not at least out of the box.
I saw some discussion around advisory locks for distributed locks, but I believe that is separate functionality, this doesn't use distributed locking at all (unless I missed something else in the code).

This draft PR is it ported from SQL Server storage to Postgres. It still needs some tests written/ported for dequeuing and more testing but shows how it would work - timpikelmg#1

@robjakedorsett
Copy link

Hey,
Hangfire SQL Server now doesn't use InvisibilityTimeout, are there plans to follow suit?

@azygis
Copy link
Collaborator

azygis commented Jun 13, 2024

@timpikelmg sorry, life's happening quite a lot therefore there's really little personal time for reviewing stuff. Not going to make any specific promises (yeah, "today or tomorrow" hadn't quite happened yet).

@robjakedorsett not at this point in time, no, unless someone is willing to create a PR for it.

@timpikelmg
Copy link
Contributor

Hey, Hangfire SQL Server now doesn't use InvisibilityTimeout, are there plans to follow suit?

The end goal of these PRs (in my view) is to enable turning off InvisibilityTimeout eventually since this will replace it. I wouldn't expect that for a while, not until sliding invisibility timeout has been the default for a release or two.

azygis added a commit that referenced this issue Jun 27, 2024
…ibility-timeouts

#209 - Add Support for Sliding Invisibility Timeouts
@azygis
Copy link
Collaborator

azygis commented Jun 27, 2024

Implemented with #353.

@azygis azygis closed this as completed Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants