Add cancellation to RPC handlers #846

aryanjassal · 2024-11-19T02:40:54Z

Description

Without proper abortion or cancellation for a RPC, it will keep running and conclude first. This ends up adding a significant amount of time when closing the agent, as all outstanding RPC calls must resolve beforehand.

Issues Fixed

Fixes Review and refactor RPC handlers to handle cancellation properly #832 (FIX ENG-432)

Tasks

Add cancellation to node RPC
- NodesClaimNetworkSign
- NodesClaimNetworkVerify
- NodesClaimsGet
- NodesClosestActiveConnectionsGet
- NodesClosestLocalNodesGet
- NodesConnectionSignalFinal
- NodesConnectionSignalInitial
- NodesCrossSignClaim
- NotificationsSend
- VaultsGitInfoGet
- VaultsGitPackGet
- VaultsScan

Final checklist

linear · 2024-11-19T02:40:56Z

ENG-432 Review and refactor RPC handlers to handle cancellation properly

aryanjassal · 2024-11-19T02:58:33Z

For each RPC handler, we need to provide a cancel, meta, and ctx parameter. The type of cancel is (reason?: any) => void, meta is Record<string, JSONValue> | undefined, and ctx is ContextTimed. The type for ctx is easy to import and use, but it is long and error-prone to have to manually enter the type for cancel and meta.

Should I make a commit into js-rpc staging which exports the types properly, or should I just define it inside Polykey itself?

tegefaulkes · 2024-11-19T03:21:30Z

Don't bother defining the type for it unless you're actually using it. So you can just name the import parameter as _import it indicate it's not being used. I think I had the types of these parameters being inferred properly at some point. There's some oddity with abstract classes that make the types not work properly here.

aryanjassal · 2024-11-19T03:28:15Z

Don't bother defining the type for it unless you're actually using it. So you can just name the import parameter as _import it indicate it's not being used. I think I had the types of these parameters being inferred properly at some point. There's some oddity with abstract classes that make the types not work properly here.

I just did this to mute the warnings for type hints, as TS assumes it is any by default, and warns me about it. For unused methods, I do indeed prefix it with an underscore to avoid the usage warnings.

src/git/utils.ts

CMCDragonkai · 2024-12-05T21:01:55Z

Please make sure to review the actual situation where this can cause a big problem: MatrixAI/Polykey-CLI#264 (comment).

If you're fixing it, a test must be created to prevent regression of that specific problem.

aryanjassal · 2024-12-06T06:06:56Z

Please make sure to review the actual situation where this can cause a big problem: MatrixAI/Polykey-CLI#264 (comment).

If you're fixing it, a test must be created to prevent regression of that specific problem.

All this does is add support for cancelling RPC which are in the middle of something. This would ensure that a RPC won't keep the entire agent alive in a deadlock state.

I'm not sure how this relates to adding support for streaming progress updates back. They can still be done just fine after this PR is implemented, no?

aryanjassal · 2024-12-10T06:13:11Z

Polykey/src/client/handlers/VaultsClone.ts

Lines 28 to 42 in 19e5278

    
           const { 
        
             nodeId, 
        
           }: { 
        
             nodeId: NodeId; 
        
           } = validateSync( 
        
             (keyPath, value) => { 
        
               return matchSync(keyPath)( 
        
                 [['nodeId'], () => ids.parseNodeId(value)], 
        
                 () => value, 
        
               ); 
        
             }, 
        
             { 
        
               nodeId: input.nodeIdEncoded, 
        
             }, 
        
           );

This whole section of parsing nodeId can be shrunk down into this one-line solution, but we aren't using this solution. I asked Brian and he told me that using validateSync and matchSync is just a standard for parsing in handlers, but I can't help but feel that it is a huge waste.

const nodeId = ids.parseNodeId(input.nodeIdEncoded);

This single line is more concise and performant than the validateSync and matchSync block. There have been cases where I had to use the validateSync block to actually do parsing. This has mostly been a case with RawHandler RPC handlers. Using this solution everywhere seems like overkill, so shouldn't we try to simplify the code where we can?

@CMCDragonkai

aryanjassal · 2024-12-10T07:24:55Z

isomorphic-git/isomorphic-git#1867

isomorphic-git does not support cancellation and abort signals. This means that there is no native way to cancel a long-running operation, like git clone.

My idea is to use a deconstructed promise and reject the promise when the context sends an abort signal, but I'm not sure how this would affect other things, especially the underlying code in isomorphic-git.

CMCDragonkai · 2024-12-10T19:21:30Z

This requires fast check model checking applied. I don't trust it until sufficient variations of side effects are tested and the right defaults are found. Along with benchmarking.

aryanjassal · 2024-12-12T03:06:40Z

We check for abortion in multiple ways in Polykey. In some places, it is ctx.signal.throwIfAborted() and in others, it is if (ctx.signal.aborted) throw ctx.signal.reason;

Which one should we follow consistently throughout the repo?
@tegefaulkes

aryanjassal · 2024-12-12T06:48:54Z

await vaultManager.pullVault(
  {
    vaultId: vaultId,
    pullNodeId: nodeId,
    pullVaultNameOrId: pullVaultId,
    tran: tran,
  },
  ctx,
);

Here, the parameters are being sent as a JSON object for some reason. I want to know when a JSON object should be used for parameters and when should they be passed as-is.

Before, the ctx field did not exist here, so it looked fine. But after adding the context, having the JSON object there just adds unnecessary complications and it can be omitted.

await vaultManager.pullVault({
  vaultId: vaultId,
  pullNodeId: nodeId,
  pullVaultNameOrId: pullVaultId,
  tran: tran,
});

The one benefit I see is that the parameters don't need to be ordered and can be set via setting key-value pairs like in Python. However, I can't do the same with the context, as it needs a special decorator which doesn't work well with JSON objects.

@timedCancellable(true)
public async pullVault(
  {
    vaultId,
    pullNodeId,
    pullVaultNameOrId,
    tran,
  }: {
    vaultId: VaultId;
    pullNodeId?: NodeId;
    pullVaultNameOrId?: VaultId | VaultName;
    tran?: DBTransaction;
  },
  @context ctx: ContextTimed, // The decorator can't go in JSON object
): Promise<void>;

How should I handle these cases, @tegefaulkes?

aryanjassal · 2024-12-13T01:49:02Z

I went over a couple of things with Brian in a meeting today, and this is the key takeaways.

There is no real easy way to cancel git operations. Instead of racing promises, he suggested an alternate method. I need to proxy the efs which takes in a context, and if the context is aborted, the efs will throw back aborted errors and cancel the git operation. This is not a final solution as it's quite hacky. I think the best solution would be to implement signal support upstream.
Abortion is handled inconsistently throughout the codebase. Some places use if (ctx.signal.aborted) throw ctx.signal.reason); and others use ctx.signal.throwIfAborted(). The throwIfAborted method is a newer addition, and as such, not everything has been updated yet. So, update all abortion to use throwIfAborted.
Wherever I have defined contexts, they are defined as ctx?: Partial<ContextTimed>. This is not quite correct. The correct way to do this is ctx?: Partial<ContextTimedInput>. This will allow passing numbers to convert to a timeout automatically with the decorator. In places, I also was using ContextCancellable thinking ContextTimed is just a timeout, but I later realised that ContextTimed is an intersection with ContextCancellable, so I need to use ContextTimed everywhere instead.
In tests, I did {} as ContextCancellable to create an empty object. Some tests failed as they couldn't access the signal property of the context, but some tests never used the context at all, so all the tests passed even with this. This all needs to change to const ctx = { signal: abortController.signal } as ContextTimed;. The as keyword is used because ContextTimed expects a timer field, too, but its a hassle to set up and is not even used.
In places within the Vaults domain, we have used a JSON object to pass in parameters. This is to avoid spamming a bunch of undefined if a parameter won't be defined. However, some objects have the database transaction also bundled up inside this object. The transaction and the context must be defined outside like function abc(..., ctx, tran); so this will need to change in the code, too.
During a git.clone operation, when I try to cancel the gitPackGet stream, it throws an error along the lines of "Cannot execute readableCancel". I need to look into why this is happening and if this is a js-quic bug or not.

aryanjassal · 2024-12-13T03:52:02Z

src/vaults/VaultInternal.ts

  protected async start_(
    fresh: boolean,
    tran: DBTransaction,
+    ctx?: Partial<ContextTimed>,
    vaultName?: VaultName,
-  ) {
+  ): Promise<void>;
+  @timedCancellable(true)
+  protected async start_(
+    fresh: boolean,
+    tran: DBTransaction,
+    @context ctx: ContextTimed,
+    vaultName?: VaultName,
+  ): Promise<void> {


Here, as tran is expected to be present, and an overload also expects ctx to be present, the vaultName is the only optional parameter. Thus, it has been pushed back to the end of the argument list. This does not follow how the rest of the project has been formatted, but there is little that can be done here.

However, I am curious as to the reason for keeping the vault name optional. Does this mean that vaults can exist without vault names? I guess it would make sense, as a vault name is merely an alias and the vault ID is the actual identifier. But I'd still want to know the actual reason.

As a workaround, I can pack other parameters inside a JSON object and pass the context and transaction identifier outside like regular parameters.

Yea vault names are optional. They can be reset, or they can be cleared.

The signature might be wrong.

aryanjassal · 2024-12-13T05:12:09Z

I am kind of nitpicking here, but I noticed this minor point I want to bring up and get some clarification on.

We rely on getting the class names for logger messages. This is done differently for static and instance methods for a class. In static methods, this.name is required to get the class name. In instance methods, this.constructor.name is required. This inconsistency is difficult to know and can bite unknowing newcomers in their posteriors.

Can't we do something like a getter that returns the class name reliably in all contexts?

class VaultInternal {
  static get className(): string {
      return this.name;
  }

  get className(): string {
      return this.constructor.className;
  }

  // Now the class name can be obtained in any context using
  // this.className
}

Note that this code was generated by ChatGPT and has not been tested yet. This probably won't be the final implementation, but I am just asking if this idea can be implemented.

aryanjassal · 2024-12-13T05:17:51Z

We are also inconsistently using async function abc() and async function abc(): Promise<void>. Should a method be decorated with the return type if it is the default return type, or should the code detect that implicitly?

I think we should be explicit about this. Thoughts, @tegefaulkes?

aryanjassal · 2024-12-13T06:59:02Z

      // This should really be an internal property
      // get whether this is remote, and the remote address
      // if it is, we consider this repo an "attached repo"
      // this vault is a "mirrored" vault
      if (
        (await tran.get([
          ...this.vaultMetadataDbPath,
          VaultInternal.remoteKey,
        ])) != null
      ) {
        // Mirrored vaults are immutable
        throw new vaultsErrors.ErrorVaultRemoteDefined();
      }

This was a comment made in VaultInternal.ts. I don't think I should get this done as a part of this PR. Should I just make a new issue to track this?

CMCDragonkai · 2024-12-13T06:59:56Z

I am kind of nitpicking here, but I noticed this minor point I want to bring up and get some clarification on.

We rely on getting the class names for logger messages. This is done differently for static and instance methods for a class. In static methods, this.name is required to get the class name. In instance methods, this.constructor.name is required. This inconsistency is difficult to know and can bite unknowing newcomers in their posteriors.

Can't we do something like a getter that returns the class name reliably in all contexts?
class VaultInternal {
  static get className(): string {
      return this.name;
  }

  get className(): string {
      return this.constructor.className;
  }

  // Now the class name can be obtained in any context using
  // this.className
}
Note that this code was generated by ChatGPT and has not been tested yet. This probably won't be the final implementation, but I am just asking if this idea can be implemented.

Probably doesn't work that's why it's different.

aryanjassal · 2024-12-13T07:02:26Z

Probably doesn't work that's why it's different.

But is this a good thing to invest some time and effort in? This can probably be a quick fix as a part of this PR itself, or I can make an issue which explores this more in depth and maybe explores a way to get this in other repos without defining it in every class.

aryanjassal · 2024-12-13T07:22:45Z

src/vaults/fileTree.ts has been sitting in the vaults domain without usage. Now that we have basically confirmed glob pattern matching will not be used, should the file be removed? It would still exist in git history and we will be able to recover it if needs be.

Another option would be to repurpose the file to assist with other issues like #822. We shouldn't be keeping dead code unless there's a good reason.

aryanjassal · 2024-12-13T07:37:28Z

Polykey/src/vaults/utils.ts

Lines 54 to 68 in 72c7dfb

    
           async function* readDirRecursively( 
        
             fs, 
        
             dir = '.', 
        
           ): AsyncGenerator<string, void, void> { 
        
             const dirEntries = await fs.promises.readdir(dir); 
        
             for (const dirEntry of dirEntries) { 
        
               const res = path.join(dir, dirEntry.toString()); 
        
               const stat = await fs.promises.stat(res); 
        
               if (stat.isDirectory()) { 
        
                 yield* readDirRecursively(fs, res); 
        
               } else if (stat.isFile()) { 
        
                 yield res; 
        
               } 
        
             } 
        
           }

The type for fs parameter is any, but it's usage is discouraged. I skimmed the js-encryptedfs code but couldn't find any reference to a common base for both fs and efs. Is there a type we can use as a common type for both instead of any?

CMCDragonkai · 2024-12-15T19:51:24Z

Did you review this: https://github.com/MatrixAI/js-rpc?tab=readme-ov-file#specifications?

CMCDragonkai · 2024-12-15T20:13:21Z

src/client/handlers/AuditEventsGet.ts

    }> & {
      paths: Array<TopicSubPath>;
    },
-    _cancel,
-    _meta,
+    _cancel: (reason?: any) => void,
+    _meta: Record<string, JSONValue>,
    ctx: ContextTimed,
  ): AsyncGenerator<ClientRPCResponseResult<AuditEventSerialized>> {
    const { audit }: { audit: Audit } = this.container;


I'm a bit confused here since I didn't work directly on js-rpc.

But if a handler receives a cancel function. What exactly is it cancelling?

Is it cancelling itself? I find this API a bit strange.

From first principles, the idea is that a caller can "abort" its call, by using the controller to abort: controller.abort(reason);

This is transmitted to the handler side, and tells the handler that it is aborted. You can check synchronously with ctx.signal.aborted, and then the handler essentially deals with this problem gracefully.

In a single-process context, if the caller proceeds to call cancel, the caller's promise is rejected with the cancellation reason immediately, but the handler side may continue to function (this example comes from cancellable.test.ts in js-contexts:

const fCancellable = cancellable(f); // Signal is aborted afterwards const pC1 = fCancellable(); pC1.cancel('cancel reason'); await expect(pC1).rejects.toBe('cancel reason');

Therefore something similar happens across multi-process context as in the case of RPC. Essentially the call site rejects immediately with a reason, but the handler is able to at their discretion to deal with how it wants to cancel.

The problem right now I see in PK's rpc is that it does not seem that this abort signal is being properly handled in every case. The patterns used to handle abort signal from the caller side should be studied in js-contexts and js-rpc tests should be representing this correctly.

Now for the cancel function on the handler side, I don't quite understand this... why does this exist?

If it is to "cancel" the handling, I would argue that you should just raise an exception, or return early, there's no need to have a cancel method. What exactly is it cancelling?

I think this concept generally makes sense under unary handler and server streaming handlers.

In the case of client streaming and duplex streaming, it can be get a bit more complicated. Because now the client is also sending a stream, it's not just the case where the server side is handling a request and returning 1 response or multiple responses, but the client side can send multiple requests, and in cases like duplex streaming, it can also be interleaved order of request, response, response, request request... etc, like a "conversation". That's what I'm referring to as a RPC conversation.

I remember an older issue in js-rpc where I pointed out that the implementation of js-rpc is missing client side transmission of errors: MatrixAI/js-rpc#18.

This could mean that one could also cancel from the server side handling. That would indicate the idea that the SERVER no longer cares about the client's next request. This is important because normally under unary and server streaming, cancelling means "as a client I do not care about your next response", whereas in the case of client streaming and duplex streaming, cancelling from the server side means "as the server I do not care about your next request". Swap request/response for just "message" and you get the idea.

However if you cancel, then you want to make sure that you're also getting the abort reason properly. This is not currently documented or demonstrated in the js-rpc README.

@aryanjassal I expect you to update the README to explain the new implementation and model checked tests in js-rpc to achieve this.

In the case of client streaming and duplex streaming:

abstract class ClientHandler< Container extends ContainerType = ContainerType, Input extends JSONRPCRequestParams = JSONRPCRequestParams, Output extends JSONRPCResponseResult = JSONRPCResponseResult, > extends Handler<Container, Input, Output> { public async handle( /* eslint-disable */ input: AsyncIterableIterator<Input>, cancel: (reason?: any) => void, meta: Record<string, JSONValue> | undefined, ctx: ContextTimed, /* eslint-disable */ ): Promise<Output> { throw new ErrorRPCMethodNotImplemented(); } }

abstract class DuplexHandler< Container extends ContainerType = ContainerType, Input extends JSONRPCRequestParams = JSONRPCRequestParams, Output extends JSONRPCResponseResult = JSONRPCResponseResult, > extends Handler<Container, Input, Output> { /** * Note that if the output has an error, the handler will not see this as an * error. If you need to handle any clean up it should be handled in a * `finally` block and check the abort signal for potential errors. */ public async *handle( /* eslint-disable */ input: AsyncIterableIterator<Input>, cancel: (reason?: any) => void, meta: Record<string, JSONValue> | undefined, ctx: ContextTimed, /* eslint-disable */ ): AsyncIterableIterator<Output> { throw new ErrorRPCMethodNotImplemented('This method must be overwrtitten.'); } }

The cancel function here should mean send a cancel signal to the client side. But @tegefaulkes said that this is being used to cancel the webstream on the server side. This is incorrect and should not be done this way.

The unary and server streaming handler should not receive cancel as a parameter, it should never be used.

In the case of client streaming and server streaming, the cancel should not be cancelling the underlying stream, it should be about sending a abort signal to the client side.

This needs to be a new issue in js-rpc to refactor the cancel parameter being passed into all the high level handlers. Remove from unary and server streaming, while changing the implementation for client streaming and duplex streaming. You have to change it so that client streaming and duplex streaming on the call-site also gives back a ContextCancellable.

This needs to be solved with: MatrixAI/js-rpc#18

I think it's not clear in the js-rpc documentation how you're supposed to properly "cancel" the server handler for all 4 call types. Even in the case of unary calls, you don't get back a PromiseCancellable where you can do:

const callP = rpcClient.methods.testMethod(); callP.cancel();

@aryanjassal this needs a new issue.

We want to use PromiseCancellable on the client callers.

This means it should be possible to do callP.cancel(reason);

This should translate into a stream cancel reason using the reasonToCode.

On the server side where codeToReason occurs, that should try to recover an exception object or a generic exception object. (This will require indexing all exceptions by code numbers).

The handler on the server side should then see it as part of the ctx.signal.reason.

High level handlers on the server side should not have access to the cancel parameter.

@tegefaulkes mentioned that the cancel parameter is being used by isogit. However this shouldn't be done this, way I believe it should be possible to bridge the ctx.signal into the cancellation of the isogit http stream: Add cancellation to RPC handlers #846 (comment), it shouldn't be using the cancel parameter directly.

I don't know if the cancel parameter is needed at the handler level at all. It may be required in the case of client streaming/duplex streaming.

On point 7, it's not being used by isogit. It's just Isogit has no built in cancellation so we'd need to use the cancel() to propagate the cancellation to the stream layer to end the cloning and pulling operation earlier.

MatrixAI/js-rpc#77

CMCDragonkai · 2024-12-15T20:15:05Z

Polykey/src/client/handlers/VaultsClone.ts

Lines 28 to 42 in 19e5278

const {

nodeId,

}: {

nodeId: NodeId;

} = validateSync(

(keyPath, value) => {

return matchSync(keyPath)(

[['nodeId'], () => ids.parseNodeId(value)],

() => value,

);

},

{

nodeId: input.nodeIdEncoded,

},

);

This whole section of parsing nodeId can be shrunk down into this one-line solution, but we aren't using this solution. I asked Brian and he told me that using validateSync and matchSync is just a standard for parsing in handlers, but I can't help but feel that it is a huge waste.
const nodeId = ids.parseNodeId(input.nodeIdEncoded);
This single line is more concise and performant than the validateSync and matchSync block. There have been cases where I had to use the validateSync block to actually do parsing. This has mostly been a case with RawHandler RPC handlers. Using this solution everywhere seems like overkill, so shouldn't we try to simplify the code where we can?

@CMCDragonkai

No it is not equivalent. There are certain exceptions we expect from the validator and matcher.

CMCDragonkai · 2024-12-15T20:23:06Z

Please make sure to review the actual situation where this can cause a big problem: MatrixAI/Polykey-CLI#264 (comment).
If you're fixing it, a test must be created to prevent regression of that specific problem.

All this does is add support for cancelling RPC which are in the middle of something. This would ensure that a RPC won't keep the entire agent alive in a deadlock state.

I'm not sure how this relates to adding support for streaming progress updates back. They can still be done just fine after this PR is implemented, no?

Yes so refer to my comment here: #846 (comment).

Any progress update system SHOULD not be part of the regular RPC call, that over complicates our API and makes it harder to integrate. Remember even a unary call can trigger a long running operation on the agent side (and downstream effects magnify this due to "Call Amplification").

All progress updates depend on the measurement point. A client may measure between calls it makes by chunking up its calls. An example is like MULTIPLE RPC call or a client stream that streams chunks at a time expecting a synchronous response. This doesn't require the server side to necessarily have a separate protocol to give progress updates, it's implicit to the algorithm which makes it elegant.

On the other hand, let's say you want to "break up" a long running async operation to get progress updates, rather than building a separate protocol for that, you could try to just do chunking of the request.

On the other hand, in some cases it really doesn't make sense to chunk the request, since the request is spatially/structurally small on the client side, it just takes a long time on the server side. You don't necessarily want to change your unary call into a server streaming call cause that changes the API and makes it needlessly complex for users who just want to do a regular call.

So in those scenarios, you introduce out-of-band RPC calls to get this information. This is where #444 comes into play. We can make use of a generalized push outlet, and stream information from there, that allows clients at their discretion to get extra progress information from a separate system.

Another way is to introduce a streaming variant of the same call, this enables clients again to use a more complex API if they want the progress update version of the same call. This is common in functional programming where one creates f and f' as part of their API.

CMCDragonkai · 2024-12-15T20:31:35Z

isomorphic-git/isomorphic-git#1867

isomorphic-git does not support cancellation and abort signals. This means that there is no native way to cancel a long-running operation, like git clone.

My idea is to use a deconstructed promise and reject the promise when the context sends an abort signal, but I'm not sure how this would affect other things, especially the underlying code in isomorphic-git.

So isogit is a third party library. We often come across third party library problems, if it is structurally a problem, we fork them and implement it ourselves - we can do so by creating js-virtualgit.

However I remember that in the case of isogit, we actually hacked into the HTTP stream itself, where the HTTP stream is virtualised ontop of our RPC stream protocol.

That would mean, you could implement a sort of abortion at the raw RPC stream level, by aborting it at the stream.

This is because when we use isogit's features, we are actually wrapping its entire IO, so by being able to control the IO we can inject an abortion at the stream rather than relying on the internal mechanism of isogit to do so. This is absolutely possible but you need to diagram out a block diagram of the wrapped structure to demonstrate.

CMCDragonkai · 2024-12-15T20:33:23Z

We check for abortion in multiple ways in Polykey. In some places, it is ctx.signal.throwIfAborted() and in others, it is if (ctx.signal.aborted) throw ctx.signal.reason;

Which one should we follow consistently throughout the repo? @tegefaulkes

Obviously ctx.signal.throwIfAborted() is better choice here.

However beware that this only works synchronously, when dealing with asynchronous operations, you either want to propagate that signal downwards, so that all asynchronous loop steps into a check of the signal property. Sometimes that is not available, so you have to integrate it into the chunked loop step.

CMCDragonkai · 2024-12-15T20:34:43Z

await vaultManager.pullVault(
  {
    vaultId: vaultId,
    pullNodeId: nodeId,
    pullVaultNameOrId: pullVaultId,
    tran: tran,
  },
  ctx,
);
Here, the parameters are being sent as a JSON object for some reason. I want to know when a JSON object should be used for parameters and when should they be passed as-is.

Before, the ctx field did not exist here, so it looked fine. But after adding the context, having the JSON object there just adds unnecessary complications and it can be omitted.
await vaultManager.pullVault({
  vaultId: vaultId,
  pullNodeId: nodeId,
  pullVaultNameOrId: pullVaultId,
  tran: tran,
});
The one benefit I see is that the parameters don't need to be ordered and can be set via setting key-value pairs like in Python. However, I can't do the same with the context, as it needs a special decorator which doesn't work well with JSON objects.
@timedCancellable(true)
public async pullVault(
  {
    vaultId,
    pullNodeId,
    pullVaultNameOrId,
    tran,
  }: {
    vaultId: VaultId;
    pullNodeId?: NodeId;
    pullVaultNameOrId?: VaultId | VaultName;
    tran?: DBTransaction;
  },
  @context ctx: ContextTimed, // The decorator can't go in JSON object
): Promise<void>;
How should I handle these cases, @tegefaulkes?

I don't understand this, js-rpc must take an input JSON object. There's no choice around here. The ctx is part of the meta-protocol that's why it's not part of the input JSON object message.

CMCDragonkai · 2024-12-15T20:35:09Z

We are also inconsistently using async function abc() and async function abc(): Promise<void>. Should a method be decorated with the return type if it is the default return type, or should the code detect that implicitly?

I think we should be explicit about this. Thoughts, @tegefaulkes?

You should make it explicit. Sometimes people are lazy. Don't propagate laziness.

CMCDragonkai · 2024-12-15T20:41:38Z

I went over a couple of things with Brian in a meeting today, and this is the key takeaways.

There is no real easy way to cancel git operations. Instead of racing promises, he suggested an alternate method. I need to proxy the efs which takes in a context, and if the context is aborted, the efs will throw back aborted errors and cancel the git operation. This is not a final solution as it's quite hacky. I think the best solution would be to implement signal support upstream.

Abortion is handled inconsistently throughout the codebase. Some places use if (ctx.signal.aborted) throw ctx.signal.reason); and others use ctx.signal.throwIfAborted(). The throwIfAborted method is a newer addition, and as such, not everything has been updated yet. So, update all abortion to use throwIfAborted.

Wherever I have defined contexts, they are defined as ctx?: Partial<ContextTimed>. This is not quite correct. The correct way to do this is ctx?: Partial<ContextTimedInput>. This will allow passing numbers to convert to a timeout automatically with the decorator. In places, I also was using ContextCancellable thinking ContextTimed is just a timeout, but I later realised that ContextTimed is an intersection with ContextCancellable, so I need to use ContextTimed everywhere instead.

In tests, I did {} as ContextCancellable to create an empty object. Some tests failed as they couldn't access the signal property of the context, but some tests never used the context at all, so all the tests passed even with this. This all needs to change to const ctx = { signal: abortController.signal } as ContextTimed;. The as keyword is used because ContextTimed expects a timer field, too, but its a hassle to set up and is not even used.

In places within the Vaults domain, we have used a JSON object to pass in parameters. This is to avoid spamming a bunch of undefined if a parameter won't be defined. However, some objects have the database transaction also bundled up inside this object. The transaction and the context must be defined outside like function abc(..., ctx, tran); so this will need to change in the code, too.

During a git.clone operation, when I try to cancel the gitPackGet stream, it throws an error along the lines of "Cannot execute readableCancel". I need to look into why this is happening and if this is a js-quic bug or not.

See Add cancellation to RPC handlers #846 (comment)
See Add cancellation to RPC handlers #846 (comment)
Yes see the tests in js-contexts. Also update the readme of js-contexts with this understanding.
Why would you do that, just make a proper context object, go to js-contexts tests to see how to do that.
Yes generally tran is outside the input object. However ctx should always be the last parameter. There should be examples of doing this in other parts of the codebase.
Becareful with stream IO manipulation, WebStreams have their own sets of logic that you need to understand. Both js-ws and js-quic make use of webstreams to standardise stream logic. You have to prototype the code here and show different examples to really understand.

CMCDragonkai · 2024-12-15T20:44:12Z

I think this PR is lacking consideration/discussion of the timeouts/timers. Beware of how we expect timeouts to work.

You must read the README.md of js-rpc on the timeouts section and do some reflection there.

CMCDragonkai · 2024-12-15T20:45:17Z

If MatrixAI/Polykey-CLI#264 (comment) is not fixed by this PR, then this PR is not sufficient. You must target the fixing of MatrixAI/Polykey-CLI#264 (comment).

aryanjassal · 2024-12-16T05:17:05Z

If MatrixAI/Polykey-CLI#264 (comment) is not fixed by this PR, then this PR is not sufficient. You must target the fixing of MatrixAI/Polykey-CLI#264 (comment).

This specific PR lays down the groundwork to allow this. However, adding RPC progress updates is out of scope for this particular PR. I can work on that in a new PR after I close this one.

aryanjassal · 2024-12-19T03:06:21Z

src/vaults/VaultManager.ts

@@ -340,7 +350,7 @@ class VaultManager {
    );
    const vaultIdString = vaultId.toString() as VaultIdString;
    return await this.vaultLocks.withF(
-      [vaultId.toString(), RWLockWriter, 'write'],
+      [vaultId.toString(), RWLockWriter, 'write', ctx],


I've done this in a couple of places, but I don't think this is the right way to do this. The type itself is also pretty complex and I can't understand it. This feels wrong to me, but no error is raised here.

[ci skip]

aryanjassal · 2024-12-24T02:57:13Z

I have noticed inside VaultsSecretsStat, we are returning information like uid, gid, mode, etc. from the handler. Things like file permissions don't actually exist in the vault, so why are we returning redundant information to the user? Shouldn't we ignore this information completely? This would also reduce noise on the output and show only the relevant information to the user.

aryanjassal self-assigned this Nov 19, 2024

aryanjassal commented Nov 19, 2024

View reviewed changes

src/git/utils.ts Show resolved Hide resolved

aryanjassal marked this pull request as draft November 26, 2024 04:29

tegefaulkes and others added 3 commits December 10, 2024 13:59

wip: applying cancellability to handlers

acd0436

fix: build

e90e6e8

chore: added ctx to most of the vault domain

7b14e96

aryanjassal force-pushed the feature-eng-432-review-and-refactor-rpc-handlers-to-handle-cancellation branch from a2baf5f to 7b14e96 Compare December 10, 2024 03:39

feat: added cancellation for most polykey vaults commands

d518114

chore: added ctx to most of secrets commands

b596934

aryanjassal commented Dec 13, 2024

View reviewed changes

aryanjassal mentioned this pull request Dec 13, 2024

Add support for cancellation of potentially long-running operations MatrixAI/js-encryptedfs#86

Open

chore: cleaned up VaultInternal.ts

616884f

CMCDragonkai reviewed Dec 15, 2024

View reviewed changes

aryanjassal mentioned this pull request Dec 16, 2024

Allow clients to cancel a remote call and for the signal to properly propagate through the handler MatrixAI/js-rpc#77

Open

aryanjassal commented Dec 19, 2024

View reviewed changes

chore: added ctx to VaultInternal and VaultManager

d9b1ca0

[ci skip]

Add cancellation to RPC handlers #846

Are you sure you want to change the base?

Add cancellation to RPC handlers #846

Conversation

aryanjassal commented Nov 19, 2024 • edited Loading

Description

Issues Fixed

Tasks

Final checklist

linear bot commented Nov 19, 2024

aryanjassal commented Nov 19, 2024

tegefaulkes commented Nov 19, 2024

aryanjassal commented Nov 19, 2024

CMCDragonkai commented Dec 5, 2024

aryanjassal commented Dec 6, 2024

aryanjassal commented Dec 10, 2024

aryanjassal commented Dec 10, 2024

CMCDragonkai commented Dec 10, 2024

aryanjassal commented Dec 12, 2024 • edited Loading

aryanjassal commented Dec 12, 2024

aryanjassal commented Dec 13, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aryanjassal commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

CMCDragonkai commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

aryanjassal commented Dec 13, 2024

CMCDragonkai commented Dec 15, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CMCDragonkai Dec 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

CMCDragonkai commented Dec 15, 2024

aryanjassal commented Dec 16, 2024 • edited Loading

Choose a reason for hiding this comment

aryanjassal commented Dec 24, 2024

aryanjassal commented Nov 19, 2024 •

edited

Loading

aryanjassal commented Dec 12, 2024 •

edited

Loading

CMCDragonkai Dec 15, 2024 •

edited

Loading

aryanjassal commented Dec 16, 2024 •

edited

Loading