
cache: Poll dying connection pools asynchronously #4198

Open: wants to merge 1 commit into master from async_pool_poll
Conversation

@cartoush (Contributor):

This PR moves the blocking sleep loop to a CLI hook.

See #4064

@nigoroll (Member):

@cartoush IIRC your commit messages and pull requests so far all were extremely terse. Can you please elaborate such that reviewers can easily understand which problem you are trying to solve, why you chose a particular approach and, if necessary, how you implemented it?

I recently noticed a guide which I would be tempted to propose to adopt for our project: https://github.com/axboe/liburing/blob/master/CONTRIBUTING.md

@dridi (Member) left a comment:

There is room for improvement but this is a good start!

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
bin/varnishd/cache/cache_cli.c (review thread, resolved)
@cartoush cartoush force-pushed the async_pool_poll branch 2 times, most recently from b036a10 to 3e810ad Compare September 25, 2024 13:07
@dridi (Member) left a comment:

LGTM, with nitpicks.

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
@dridi (Member) left a comment:

I just realized splitting this operation brings it in two different contexts, see comments. I have not thought much about how to proceed so my questions are open questions.

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
bin/varnishd/cache/cache_conn_pool.c (review thread, resolved)
@nigoroll (Member) left a comment:

As pointed out by others, locking is not correct yet.
I think we should just have a single lock for dead_pools and insert each dead cp under that lock.
VCP_RelPoll (if we keep it like this) can then swap the list head under that mtx, work all entries unlocked, remember those still active (n_kill > 0) on a new list, and finally append that list back to dead_pools under the mtx.
Alternatively, why do we not just hand dead pool handling to a worker thread?

edit: this facility has a mascot

@dridi (Member) commented Sep 26, 2024:

As pointed out by others, locking is not correct yet. I think we should just have a single lock for dead_pools and insert each dead cp under that lock. VCP_RelPoll (if we keep it like this) can then swap the list head under that mtx, work all entries unlocked, remember those still active (n_kill > 0) on a new list, and finally append that list back to dead_pools under the mtx.

I guess we could work with a mutex dedicated to the dead pools and not even care about the per-vcp lock. Since we only care about observing n_kill == 0, we don't need to formally synchronize with waiters: we can only see n_kill reach zero after the waiter has woken up for all connections, and since this happens in a CLI hook, it will eventually be seen.

Alternatively, why do we not just hand dead pool handling to a worker thread?

Isn't a CLI hook good enough for this kind of cleanup sweep?

@cartoush cartoush force-pushed the async_pool_poll branch 2 times, most recently from 6e5f482 to d1fae69 Compare September 27, 2024 09:17
@nigoroll (Member):
Alternatively, why do we not just hand dead pool handling to a worker thread?

Isn't a CLI hook good enough for this kind of cleanup sweep?

It probably is, but I do not know; this was just an option to consider.

I would really like to see swapping the tree head and working the cleanup outside any connection pool mtx.

@cartoush (Contributor, Author) commented Oct 2, 2024:

Added a dead_pools_mtx instead of using conn_pools_mtx

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
@dridi (Member) left a comment:

I think I'm OK with this now, but I'm running out of brain cycles.

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
@cartoush (Contributor, Author) commented Oct 3, 2024:

Calls to vcp_free had not been turned into calls to vcp_destroy; it's fixed now.

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
@dridi (Member) left a comment:

Reconciling the desire to loop without holding the lock with the reuse of the VRBT_ENTRY() field already present in struct conn_pool is complicating the change beyond my acceptable threshold.

If we put the cursor on the lock, then we should probably have a tail queue of dead connections as suggested before. We keep constant-time operations, including the ability to easily steal and reinject dead pools before and after the cleanup loop.

@nigoroll thoughts?

Lck_Unlock(&dead_pools_mtx);
return;
}
TAKE_OBJ_NOTNULL(dead.rbh_root, &dead_pools.rbh_root, CONN_POOL_MAGIC);
Member:

I don't think you can safely do that, it should look something like this instead:

dead = dead_pools;
VRBT_INIT(&dead_pools);

Member:
Yeah, the TAKE_OBJ is a smart idea, but using the tree macros as intended would be my preference also.

Comment on lines 308 to 323
if (!VRBT_EMPTY(&dead)) {
Lck_Lock(&dead_pools_mtx);
VRBT_INSERT(vrb, &dead_pools, dead.rbh_root);
Lck_Unlock(&dead_pools_mtx);
}
Member:
VRBT_INSERT() inserts a single entry, not another tree, or does it?

Contributor (Author):

From what I understood, a single entry is a tree with only one element, so inserting the root of a tree should insert the whole tree. But I could be wrong.

Member:

This is actually pretty smart, IMHO, and I did not think of it: I originally thought we would iterate the leftovers and insert them one by one, but since we do not care about ordering in the dead_pools tree, this should work.

@nigoroll (Member) commented Oct 14, 2024:

I checked the tree macros and indeed, the comparison function is only used for insert and lookup (as expected). So yeah, I think this is still nicer than the alternative of using a union with list pointers in place of the tree pointers.

Member:

I wasn't suggesting a union, but if this is safe but "incorrect", we should add red tape to remind our future selves why this is good enough.

@nigoroll (Member):

Except for the one or two nitpicks, I like the structure of the patch now. I do not understand which complication @dridi is seeing, IMHO, this is a simplification.

@dridi (Member) commented Oct 14, 2024:

I do not understand which complication @dridi is seeing, IMHO, this is a simplification.

That was with a view to not misusing the VRBT API on purpose. I'll comment in that thread.

@cartoush (Contributor, Author):

Addressed the review comments and added code comments explaining the VRBT_INSERT hack.

@dridi (Member) left a comment:

Good job!

bin/varnishd/cache/cache_conn_pool.c (review thread, outdated, resolved)
Before, we would wait in a blocking loop for
connection pools to be freed before deleting
them for good.

This commit transforms this blocking loop into a
CLI hook that will delete the freed connection
pools at each trigger.

Refs varnishcache#4064

Better diff with the --ignore-all-space option.
@nigoroll (Member):

@bsdphk two people think this one is ripe
