Use tar archival format to transfer file tree structure and data between Polykey vaults and local file systems #811

Open
aryanjassal opened this issue Sep 23, 2024 · 8 comments
@aryanjassal
Member

aryanjassal commented Sep 23, 2024

Specification

There are times when we need to transfer the secrets from a vault to either another vault or the user's file system. Sometimes, only one secret needs to be transferred. Other times, we need to transfer multiple file trees including their directory structure.

As all the vaults are stored on the same encrypted file system (EFS), transferring file trees between vaults only requires regular file copy/move operations on that file system; something along the lines of fs.promises.cp() should work well to transfer secrets between vaults.
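
As a rough illustration of the vault-to-vault case, the sketch below walks a tree and copies it with plain fs calls. It runs against Node's fs.promises and a temporary directory purely as a stand-in. Whether the EFS exposes the same readdir/stat/mkdir/copyFile calls, and how vault paths map onto it, are assumptions; `copyTree` and `demo` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Recursively copy `src` to `dst` using only stat/mkdir/readdir/copyFile,
// so the same logic could be pointed at any fs-like API offering these calls.
async function copyTree(src: string, dst: string): Promise<void> {
  const stat = await fs.promises.stat(src);
  if (!stat.isDirectory()) {
    await fs.promises.copyFile(src, dst);
    return;
  }
  await fs.promises.mkdir(dst, { recursive: true });
  for (const name of await fs.promises.readdir(src)) {
    await copyTree(path.join(src, name), path.join(dst, name));
  }
}

// Usage: two directories under a temp root play the role of two vaults.
async function demo(): Promise<string[]> {
  const root = await fs.promises.mkdtemp(path.join(os.tmpdir(), "vault-copy-"));
  const vaultA = path.join(root, "vaultA");
  const vaultB = path.join(root, "vaultB");
  await fs.promises.mkdir(path.join(vaultA, "dir"), { recursive: true });
  await fs.promises.writeFile(path.join(vaultA, "dir", "secret"), "value");
  await copyTree(vaultA, vaultB);
  return fs.promises.readdir(path.join(vaultB, "dir"));
}
```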

However, doing this between the vaults and the user's file system is not as straightforward. To efficiently transmit the file tree, we will be using an archival format like tar. The tar archival format is inherently streamable, and can be used to pack the file tree into a single archive, which can then be transmitted over an RPC call and unpacked on the client, effectively transferring the file structure to the user's file system. Note that tar only archives and does not compress; we can also compress the resulting archive, but we won't get into that quite yet.
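
To make the "inherently streamable" point concrete, here is a minimal sketch of a ustar writer that emits 512-byte-aligned chunks one entry at a time, so each chunk could be forwarded over a stream as soon as it is produced. It is not the js-virtualtar or tarjs API; `Entry`, `tarHeader` and `packTar` are hypothetical names, and the sketch only handles regular files with ASCII paths of at most 100 characters.

```typescript
const BLOCK = 512;

type Entry = { path: string; data: Uint8Array };

// Write an ASCII string into `buf` starting at `offset`.
function writeAscii(buf: Uint8Array, offset: number, text: string): void {
  for (let i = 0; i < text.length; i++) buf[offset + i] = text.charCodeAt(i);
}

// Zero-padded octal string with exactly `digits` digits.
function octal(value: number, digits: number): string {
  let s = value.toString(8);
  while (s.length < digits) s = "0" + s;
  return s;
}

// Build one 512-byte ustar header for a regular file.
function tarHeader(name: string, size: number, mtime: number = 0): Uint8Array {
  if (name.length > 100) throw new Error("path too long for this sketch");
  const h = new Uint8Array(BLOCK);
  writeAscii(h, 0, name);              // name (100 bytes)
  writeAscii(h, 100, octal(0o600, 7)); // mode
  writeAscii(h, 108, octal(0, 7));     // uid
  writeAscii(h, 116, octal(0, 7));     // gid
  writeAscii(h, 124, octal(size, 11)); // size in bytes
  writeAscii(h, 136, octal(mtime, 11)); // modification time
  writeAscii(h, 148, "        ");      // checksum counts as 8 spaces while summing
  writeAscii(h, 156, "0");             // typeflag: regular file
  writeAscii(h, 257, "ustar");         // magic, NUL-terminated by the zeroed buffer
  writeAscii(h, 263, "00");            // version
  let sum = 0;
  for (let i = 0; i < BLOCK; i++) sum += h[i];
  writeAscii(h, 148, octal(sum, 6));   // six octal digits...
  h[154] = 0;                          // ...then NUL...
  h[155] = 0x20;                       // ...then space
  return h;
}

// Emit the archive chunk by chunk; `emit` could write to an RPC stream.
function packTar(entries: Entry[], emit: (chunk: Uint8Array) => void): void {
  for (const entry of entries) {
    emit(tarHeader(entry.path, entry.data.length));
    // File data is padded with zeros up to the next 512-byte boundary.
    const padded = new Uint8Array(Math.ceil(entry.data.length / BLOCK) * BLOCK);
    padded.set(entry.data);
    if (padded.length > 0) emit(padded);
  }
  emit(new Uint8Array(BLOCK * 2)); // two zero blocks mark the end of the archive
}
```

Because nothing in the format requires knowing later entries in advance, the receiving side can likewise read a header, consume the stated number of bytes, skip the padding, and repeat.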

Additional context

  • Polykey#799 has seen discussion regarding streaming over file trees using RPC calls.
  • gera2ld/tarjs can be used as a zero-dependency package to generate tar from a file system.
  • matrixai/js-virtualtar can be looked into, as this was also attempting to make streamable tar bindings for JavaScript.

Tasks

  1. Make an RPC handler responsible for copying/moving a file tree.
  2. To move a file tree between vaults, just use the fs operations. Multiple locks might be required if transferring between multiple vaults.
  3. To move a file tree between vaults and file systems, make a tarball and stream it over RPC instead.
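
On the locking point in task 2, one common way to hold several locks safely is to always acquire them in a fixed order, so that two transfers touching the same vaults in opposite directions cannot deadlock. The sketch below illustrates that idea only; the `Mutex` class, `lockFor` and `withVaultLocks` are hypothetical stand-ins and not the locking API Polykey actually uses.

```typescript
// Minimal promise-based mutex, standing in for whatever lock a vault provides.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  // Resolves with a release function once all earlier holders have released.
  acquire(): Promise<() => void> {
    let release!: () => void;
    const released = new Promise<void>((resolve) => {
      release = resolve;
    });
    const ready = this.tail.then(() => release);
    this.tail = this.tail.then(() => released);
    return ready;
  }
}

const vaultLocks = new Map<string, Mutex>();

function lockFor(vaultId: string): Mutex {
  let lock = vaultLocks.get(vaultId);
  if (lock === undefined) {
    lock = new Mutex();
    vaultLocks.set(vaultId, lock);
  }
  return lock;
}

// Run `f` while holding the locks of every listed vault.
async function withVaultLocks<T>(
  vaultIds: string[],
  f: () => Promise<T>,
): Promise<T> {
  // Deduplicate and sort so every caller acquires locks in the same order.
  const ordered = Array.from(new Set(vaultIds)).sort();
  const releases: Array<() => void> = [];
  try {
    for (const id of ordered) releases.push(await lockFor(id).acquire());
    return await f();
  } finally {
    // Release in reverse acquisition order.
    for (const release of releases.reverse()) release();
  }
}
```

A copy from vault A to vault B would then wrap its fs operations in `withVaultLocks([a, b], ...)`, regardless of which vault is the source.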
@aryanjassal aryanjassal added the development Standard development label Sep 23, 2024
@aryanjassal aryanjassal self-assigned this Sep 23, 2024
Member Author

aryanjassal commented Oct 8, 2024

After taking a look at matrixai/js-virtualtar, I have realised that significant work still needs to go into the repo before it would be ready to be used within the Polykey ecosystem. This alone might take a cycle or two. I will need to play around with it before starting development on it.

@tegefaulkes
Contributor

What needs to be done before it's ready to be used?

@aryanjassal
Member Author

I haven't taken an in-depth dive into the requirements, just a brief skim-through over the code. As such, I'm not sure what exactly needs to be done for that. I'll need to see what it already has and what still needs to be done when I start working on this issue.

@CMCDragonkai CMCDragonkai changed the title Use tar archival format to transfer file tree structure and data between Polykey vaults and user file systems Use tar archival format to transfer file tree structure and data between Polykey vaults and lox file systems Oct 27, 2024
@CMCDragonkai CMCDragonkai changed the title Use tar archival format to transfer file tree structure and data between Polykey vaults and lox file systems Use tar archival format to transfer file tree structure and data between Polykey vaults and local file systems Oct 27, 2024
Member Author

The first iteration of the cp and mv commands will only perform the operations between two vaults. Support for local file paths is a separate topic and will be implemented later on. As such, this issue is no longer blocking secrets cp and secrets mv.

Member Author

We should look into tar integration sooner rather than later. The current state of js-virtualtar is that it is written in JavaScript and was last updated 7 years ago. The repo it was forked from has a commit from this year. So, should I rebase and update our repo? Or use it as-is?

I'd say updating js-virtualtar seems like our best bet. We will need to convert it to TypeScript and modernise the code. This might take a cycle or two, but would be well worth it.

From what I could tell, our library does not include streaming support yet. This is something pretty important to our use case, so this would need to be implemented in whatever solution we go ahead with.

I have yet to look into the repo itself and play around with it to get a better idea of its functionality, so I will report back on that later.

@CMCDragonkai
Member

You could start from scratch. The existing virtual tar codebase isn't up to our standards anymore. But you can use the same repo.

@CMCDragonkai
Member

There's an existing Node version of tar on npm and it's probably still up to date. You can use ChatGPT to help generate the right protocol code to avoid any Node-isms.
