Asynchronous store protocol #3672

Open
whisperity opened this issue May 20, 2022 · 0 comments · May be fixed by #4326
Labels
API change 📄 Content of patch changes API! database 🗄️ Issues related to the database schema. enhancement 🌟 refactoring 😡 ➡️ 🙂 Refactoring code. RFC ✒️ Request For Comments server 🖥️ usability 👍 Usability-related features

whisperity commented May 20, 2022

This is a write-up of a technical specification for an idea that I am sure we have had for several years now, but which was usually discussed only in private or orally.

store, or more specifically the API call to massStoreRun, blocks until the result of the store is returned to the client. As processing a store action takes a non-trivial amount of time on the server side (and this operation is also executed on only one thread!), returning from massStoreRun itself takes a non-trivial amount of time. The problem surfaces if the connection between the client and the server coughs, chokes, or otherwise misbehaves, because it is only the networking stack in the kernel that is keeping the door open for the reply to arrive. While a disappearing client is no problem from the server's side and data won't be lost, CI jobs can hang indefinitely, and scripts that expect data to be available for cmd query after a return from store will break.

The proposal is to switch the blocking from relying on externalia like "the TCP stack" to a softer, but more local, blocking mechanism, while also making the API itself asynchronous. This proposal is backwards compatible.

Database changes

We already have information in RunLock as to which runs are undergoing a store. However, this is not enough: we need to store some semi-temporary information about store "attempts" or "sessions". This could go into its own table, per product, as it needs to be kept for a time even after the run lock is released. This table would contain the run name, a unique session token/identifier, and some status flag. The identifier might be auto-incremented, or a hash of the time when the lock was initialised; it is not a "secret" resource.
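As a rough illustration of the table described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `store_sessions`, the column names, the status strings, and the helper functions are all hypothetical, and a random UUID stands in for the auto-incremented or time-hashed identifier mentioned above.

```python
import sqlite3
import time
import uuid

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE store_sessions (
    token      TEXT PRIMARY KEY,
    run_name   TEXT NOT NULL,
    status     TEXT NOT NULL,
    created_at REAL NOT NULL
)
"""


def open_session(conn, run_name):
    """Record a new store attempt and return its (non-secret) token."""
    token = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO store_sessions VALUES (?, ?, ?, ?)",
        (token, run_name, "in_progress", time.time()),
    )
    return token


def set_status(conn, token, status):
    """Update the status flag of an existing session."""
    conn.execute(
        "UPDATE store_sessions SET status = ? WHERE token = ?",
        (status, token),
    )


def get_status(conn, token):
    """Return the status of a session, or None if the token is unknown."""
    row = conn.execute(
        "SELECT status FROM store_sessions WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None


# An in-memory database is enough to try the sketch out.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

The session row is independent of the run lock, so it can outlive the lock and still be queried after the store has finished.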

These identifiers should be garbage collected in the usual process.
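What "the usual process" does with these records is not specified here. One possible policy, sketched below under the assumption that finished sessions are kept for a fixed retention period, is to drop any session that is no longer in progress and finished too long ago. The `StoreSession` class, its fields, and `collect_garbage` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StoreSession:
    token: str
    run_name: str
    in_progress: bool
    finished_at: float  # seconds since the epoch; 0.0 while still in progress


def collect_garbage(sessions, now, max_age_sec):
    """Return the sessions to keep: still running, or finished recently."""
    return [
        s for s in sessions
        if s.in_progress or now - s.finished_at <= max_age_sec
    ]
```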

CLI changes

There are no changes needed on the CLI. Optionally, the store command might be extended with a --no-block argument which makes it exit immediately and return to the shell once the server has started processing the data, in case the user does not care about when the operation finishes.
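The optional flag could be wired up along these lines. This is only a sketch with argparse: the parser below is a stand-alone stand-in, not the actual store command's parser, and the `block` destination name is an assumption.

```python
import argparse


def build_parser():
    # Stand-in parser; the real store command has many more arguments.
    parser = argparse.ArgumentParser(prog="store")
    parser.add_argument(
        "--no-block",
        dest="block",
        action="store_false",
        help="Exit as soon as the server has started processing the data, "
             "without waiting for the store to finish.",
    )
    return parser
```

Because the action is `store_false`, `block` defaults to `True`, so existing invocations keep the current blocking behaviour.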

API changes

A new endpoint, hereby referred to as massStoreRunAsync, shall be created. This function should return the aforementioned "store session token", or throw. The semantics of this function should be that it returns once the server can confirm that processing of the results can reasonably continue (cheap early checks like permissions, the fact that the data is validly encoded before unpacking it, etc. should be performed).

To query whether the store operation has succeeded or not, a new function should be added, which returns status information (from the database) about the store. The information needed here is malleable, but at least a boolean: "Is the operation still in progress?". (Consuming a successful result might want to remove the related information from the database, to ease garbage collection times at startup.)
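The two endpoints could behave roughly as in the following in-memory sketch. It is not the server's actual implementation: the class `StoreServer`, the method `get_store_status`, the status strings, and the specific early checks are assumptions made for illustration, and a plain dictionary stands in for the database table.

```python
import threading
import uuid


class StoreServer:
    """Toy model of the proposed asynchronous store endpoints."""

    def __init__(self, process):
        self._process = process  # slow callable doing the actual store
        self._status = {}        # token -> status; stands in for the DB table
        self._lock = threading.Lock()

    def mass_store_run_async(self, run_name, payload):
        """Run cheap checks, start processing, and return a session token."""
        # Cheap early checks; the real ones would include permissions and
        # whether the data is validly encoded.
        if not run_name:
            raise ValueError("run name must not be empty")
        if not isinstance(payload, bytes):
            raise ValueError("payload must be bytes")

        token = uuid.uuid4().hex
        with self._lock:
            self._status[token] = "in_progress"
        worker = threading.Thread(
            target=self._run, args=(token, run_name, payload), daemon=True
        )
        worker.start()
        return token  # returned before the slow processing finishes

    def _run(self, token, run_name, payload):
        try:
            self._process(run_name, payload)
            result = "succeeded"
        except Exception:
            result = "failed"
        with self._lock:
            self._status[token] = result

    def get_store_status(self, token):
        """Return 'in_progress', 'succeeded', or 'failed' for a token."""
        with self._lock:
            return self._status[token]
```

The caller gets the token back as soon as the early checks pass, and every later status query is an independent, short request.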

Implementation changes

The store command should, once it has received the token from the server, close the connection and use the token to poll the server every once in a while for the status of the operation. Deciding on a good interval here could be tough, but trivial choices like "every 10 sec" or "every 30 sec" should be fine for a prototype. As far as I gathered, we already count the reports during store (which is weird!), but if this information is available, the initial wait time and the requery interval could be estimated from it.

In between queries, the store binary should sleep using OS primitives for sleeping a process, without having to rely on the network stack. Every query is its own connection, like cmd ....
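The client-side loop described above might look like the sketch below. The function name `wait_for_store`, the `"in_progress"` status string, and the optional timeout are assumptions; `query_status` is a placeholder for whatever opens a fresh connection and asks the server about the token.

```python
import time


def wait_for_store(query_status, token, interval_sec=10.0, timeout_sec=None,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll the server until the store session is no longer in progress.

    query_status(token) is expected to open its own short-lived connection.
    Between queries the process just sleeps; no socket is kept open.
    """
    deadline = None if timeout_sec is None else clock() + timeout_sec
    while True:
        status = query_status(token)
        if status != "in_progress":
            return status
        if deadline is not None and clock() >= deadline:
            raise TimeoutError("store session %s did not finish in time" % token)
        sleep(interval_sec)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults apply.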


Obsoletes #4039.
