Asynchronous `store` protocol #3672

whisperity · 2022-05-20T17:39:17Z

This is a write-up of a technical specification for an idea that I am sure we have had for several years now, but which was usually discussed only in private or orally.

`store`, or more specifically the API call to `massStoreRun`, blocks until the result of the store is returned to the client. Processing a store action takes a non-trivial amount of time on the server side (and the operation is executed on only one thread!), so returning from `massStoreRun` itself takes a non-trivial amount of time. The problem surfaces when the connection between the client and the server coughs, chokes, or otherwise misbehaves, because only the networking stack in the kernel is keeping the door open for the reply to arrive. A disappearing client is no problem from the server's side and no data is lost, but CI jobs can hang indefinitely, and scripts that expect data to be available for `cmd query` after `store` returns will break.

The proposal is to switch the blocking from relying on externalia like "the TCP stack" to a softer, but more local, blocking mechanism, while also making the API itself asynchronous. This proposal is backwards compatible.

Database changes

We already have information in `RunLock` about which runs are undergoing a store. However, this is not enough: we need to store some semi-temporary information about store "attempts" or "sessions". This could go into its own table, per product, because it needs to be kept for a while even after the run lock is released. The table would contain the run name, a unique session token/identifier, and some status flag. The identifier might be auto-incremented, or a hash of the time when the lock was initialised; it is not a "secret" resource.

These identifiers should be garbage collected in the usual process.
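To make the table a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table name `store_sessions`, the column names, the status strings and the helper functions are all made up for illustration, and the real schema would live in the product database layer rather than in raw SQL. The token is the auto-incremented variant mentioned above.

```python
import sqlite3
import time

# Hypothetical schema: table, column and status names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS store_sessions (
    token      INTEGER PRIMARY KEY AUTOINCREMENT,
    run_name   TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'in_progress',
    created_at REAL NOT NULL
)
"""


def open_session(conn, run_name, now=None):
    """Record a new store attempt and return its (non-secret) token."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "INSERT INTO store_sessions (run_name, created_at) VALUES (?, ?)",
        (run_name, now),
    )
    return cur.lastrowid


def set_status(conn, token, status):
    """Update the status flag, e.g. to 'done' or 'failed'."""
    conn.execute(
        "UPDATE store_sessions SET status = ? WHERE token = ?",
        (status, token),
    )


def collect_garbage(conn, max_age_seconds, now=None):
    """Delete finished sessions older than max_age_seconds.

    Sessions still in progress are never deleted here.
    Returns the number of rows removed.
    """
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM store_sessions "
        "WHERE status != 'in_progress' AND created_at < ?",
        (now - max_age_seconds,),
    )
    return cur.rowcount
```

Keeping the session row separate from the run lock is what lets a client ask about the outcome after the lock itself is already gone.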

CLI changes

There are no changes needed on the CLI. Optionally, the `store` command might be extended with a `--no-block` argument, which makes it exit immediately and return to the shell once the server has started processing the data, for the case where the user does not care about when the operation finishes.
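As a rough illustration of the optional flag, the following `argparse` sketch shows one way `--no-block` could be wired in. The parser layout, the positional `input` argument and the `block` attribute name are assumptions made for this example and do not reflect the existing `store` argument parser.

```python
import argparse


def build_parser():
    """Build a toy parser for a store-like command (illustrative only)."""
    parser = argparse.ArgumentParser(prog="store")
    # Hypothetical positional argument standing in for the report directory.
    parser.add_argument("input", help="directory containing the results")
    parser.add_argument(
        "--no-block",
        dest="block",
        action="store_false",
        help="return as soon as the server has accepted the data, "
             "without waiting for processing to finish",
    )
    return parser
```

With `action="store_false"` the default stays blocking, so existing invocations behave as before.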

API changes

A new endpoint, hereafter referred to as `massStoreRunAsync`, shall be created. This function should return the aforementioned "store session token", or throw. Its semantics should be that it returns once the server can confirm that processing of the results can reasonably continue (cheap early checks, like permissions or whether the data is validly encoded before unpacking it, should be performed first).

To query whether the store operation has succeeded or not, a new function should be added, which returns status information (from the database) about the store. The information needed here is malleable, but at least a boolean: "Is the operation still in progress?". (Consuming a successful result might want to remove the related information from the database, to ease garbage collection times at startup.)
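The intended semantics of the two calls can be sketched with an in-process toy server. Everything here is an assumption for illustration: the class `StoreService`, the status query name `getStoreStatus`, the status strings, the `user_may_store` parameter, and the choice of base64 as the "validly encoded" check. A background thread stands in for whatever the server would really use to process the data.

```python
import base64
import binascii
import itertools
import threading
import time


class StoreService:
    """Toy stand-in for the server side; not the real API handler."""

    def __init__(self):
        self._tokens = itertools.count(1)
        self._status = {}   # token -> "in_progress" | "done" | "failed"
        self._workers = {}  # token -> worker thread (kept for the demo)
        self._lock = threading.Lock()

    def massStoreRunAsync(self, run_name, b64_payload, user_may_store=True):
        """Run the cheap checks, start processing, return the token."""
        if not user_may_store:
            raise PermissionError("user may not store to this product")
        try:
            # Verify the encoding only; nothing is unpacked at this point.
            raw = base64.b64decode(b64_payload, validate=True)
        except binascii.Error as exc:
            raise ValueError("payload is not valid base64") from exc

        token = next(self._tokens)
        with self._lock:
            self._status[token] = "in_progress"
        worker = threading.Thread(
            target=self._process, args=(token, run_name, raw), daemon=True)
        self._workers[token] = worker
        worker.start()
        return token

    def _process(self, token, run_name, raw):
        """The expensive part, executed after the call has returned."""
        try:
            time.sleep(0.05)  # placeholder for unpacking and storing
            outcome = "done"
        except Exception:
            outcome = "failed"
        with self._lock:
            self._status[token] = outcome

    def getStoreStatus(self, token):
        """Return the status; raises KeyError for an unknown token."""
        with self._lock:
            return self._status[token]
```

The "is the operation still in progress?" boolean is then just `getStoreStatus(token) == "in_progress"`, and a richer status object could be returned later without changing the call pattern.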

Implementation changes

The `store` command should, once it has received the token from the server, close the connection and use the token to poll the server every once in a while for the status of the operation. Deciding on a good interval could be tough, but trivial choices like "every 10 sec" or "every 30 sec" should be fine for a prototype. As far as I gathered, we already count the reports during store (which is weird!), but if this information is available, the initial wait time and the requery interval could be estimated from it.

In between queries, the `store` binary should sleep using OS primitives for sleeping a process, without having to rely on the network stack. Every query is its own connection, like `cmd ...`.
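The client-side loop described above could look roughly like the sketch below. The function name `wait_for_store`, the `query_status` callable and the `"in_progress"` status string are hypothetical, and the optional `timeout` is an addition of this sketch rather than part of the proposal. `query_status` is expected to open a fresh, short-lived connection on every call.

```python
import time


def wait_for_store(query_status, token, interval=10.0, timeout=None,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the store identified by token is no longer in progress.

    query_status(token) must return a status string. Between polls the
    process sleeps locally and holds no connection open.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = query_status(token)  # one short-lived request per poll
        if status != "in_progress":
            return status
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"store session {token} did not finish in time")
        sleep(interval)
```

A fixed `interval` matches the "every 10 sec" prototype; if the report count is known, the caller could derive a larger first wait from it and pass a different interval. The `sleep` and `clock` parameters exist only so the loop can be exercised without actually waiting.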

Obsoletes #4039.

whisperity added the enhancement 🌟, API change 📄 (Content of patch changes API!), RFC ✒️ (Request For Comments), server 🖥️, usability 👍 (Usability-related features), and refactoring 😡 ➡️ 🙂 (Refactoring code.) labels May 20, 2022

whisperity added the database 🗄️ Issues related to the database schema. label Sep 19, 2023

whisperity self-assigned this Sep 19, 2023

whisperity added this to the release 6.24.0 milestone Oct 9, 2023

whisperity mentioned this issue Oct 11, 2023

feat(store): Explicitly time the client out if the connection hung #4039

Merged

whisperity modified the milestones: release 6.24.0, release 6.25.0 Feb 22, 2024

whisperity mentioned this issue Feb 22, 2024

[server] Multiprocessed schema upgrades for -j products in parallel #4171

Closed

whisperity mentioned this issue May 29, 2024

Give ability to detect if server is loading data. #2417

Open

whisperity mentioned this issue Jul 18, 2024

refactor(server/config): Implement checked access to server configuration, decoupled from SessionManager #4170

Open

This was referenced Aug 11, 2024

Make storing a run not wait until storage is complete #1306

Open

feat(server): Asynchronous server-side background task execution #4317

Open

This was referenced Aug 19, 2024

feat(cmd): Implemented a CLI for task management #4318

Open

feat: massStoreRunAsynchronous() #4326

Open

whisperity mentioned this issue Sep 20, 2024

doc: Background tasks #4352

Open
