to pack or not to pack ... #8572

Open
ThomasWaldmann opened this issue Nov 27, 2024 · 3 comments

@ThomasWaldmann
Member

ThomasWaldmann commented Nov 27, 2024

borg 1.x segment files

borg 1.x used:

  • "segment files" (elsewhere also known as "pack files") to store multiple repository objects in one file.
  • a "repository index" to be able to find these objects, using a mapping object id --> (segment_name, offset_in_segment).
  • transactions and rollback via log-like appending of operations (PUT, DEL, COMMIT) to these segment files
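
To illustrate the log-like approach, here is a toy sketch (not borg 1.x's actual on-disk format). It assumes a segment is just a Python list of `(op, id, data)` entries and that the list position stands in for `offset_in_segment`; `replay` is a hypothetical helper that rebuilds the index and ignores everything after the last COMMIT, which is what gives you rollback.

```python
PUT, DEL, COMMIT = "PUT", "DEL", "COMMIT"


def replay(log, segment_name=0):
    """Rebuild the index from a log, honouring only entries up to the last COMMIT."""
    committed = {}  # object id -> (segment_name, offset_in_segment)
    pending = {}    # state including not-yet-committed operations
    for offset, (op, obj_id, _data) in enumerate(log):
        if op == PUT:
            pending[obj_id] = (segment_name, offset)
        elif op == DEL:
            pending.pop(obj_id, None)
        elif op == COMMIT:
            committed = dict(pending)  # snapshot becomes the visible state
    return committed


log = [
    (PUT, "a", b"alpha"),
    (PUT, "b", b"beta"),
    (COMMIT, None, None),
    (DEL, "a", None),       # not followed by a COMMIT: rolled back on replay
    (PUT, "c", b"gamma"),   # same here
]
index = replay(log)
```

After the replay, `index` only knows about "a" and "b"; the uncommitted DEL and PUT at the end are simply ignored.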

borg2 status quo: objects stored separately

borg2 is much simpler:

  • implemented using borgstore (k/v store with misc. backends)
  • objects are stored separately: 1 file chunk --> 1 repo object
  • objects can be directly found by their id (e.g. the id is mapped to the fs path / file name)
  • no transactions, no log-like appending - but correct write order
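
A minimal sketch of the "id is mapped to the fs path" idea, under some assumptions: the two-level directory nesting, the `.tmp` suffix and the helper names are made up for illustration and are not borgstore's API, and plain SHA-256 stands in for borg's real id hash. Writing to a temporary name first and renaming afterwards is one way to get a sane write order without transactions.

```python
import hashlib
import tempfile
from pathlib import Path


def object_path(root: Path, obj_id: bytes, levels: int = 2) -> Path:
    """Map an object id to a nested path, e.g. root/ab/cd/abcd...."""
    hex_id = obj_id.hex()
    parts = [hex_id[2 * i:2 * i + 2] for i in range(levels)]
    return root.joinpath(*parts, hex_id)


def put(root: Path, data: bytes) -> bytes:
    obj_id = hashlib.sha256(data).digest()  # stand-in for the real id hash
    path = object_path(root, obj_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)   # write the complete object first ...
    tmp.replace(path)       # ... then make it visible under its final name
    return obj_id


def get(root: Path, obj_id: bytes) -> bytes:
    # No index lookup needed: the id alone tells us where the object lives.
    return object_path(root, obj_id).read_bytes()


with tempfile.TemporaryDirectory() as d:
    repo = Path(d)
    chunk_id = put(repo, b"one file chunk")
    roundtrip = get(repo, chunk_id)
```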

Pros:

  • simplicity
  • no need for some sort of "index" (which could be corrupted or out of date)
  • no segment file compaction needed, the server-side filesystem manages space allocation

Cons:

  • leads to large numbers of relatively small objects being transferred and stored individually in the repository
  • per-object latency and other overheads have quite a speed impact for remote repositories
  • depending on the storage type / filesystem, there will be more or less storage space overhead due to block size, esp. for many very small objects
  • dealing with lots of objects / doing lots of API calls can be expensive for some cloud storage providers
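
To put a rough number on the block size overhead: the sketch below assumes a filesystem that allocates whole 4096-byte blocks per file and ignores inode / metadata cost, tail packing and compression. The object sizes are made-up examples, not measurements.

```python
def allocated_size(object_size: int, block_size: int = 4096) -> int:
    """Space used when each object occupies a whole number of blocks."""
    blocks = -(-object_size // block_size)  # ceiling division
    return blocks * block_size


def overhead_ratio(object_sizes, block_size: int = 4096) -> float:
    """Allocated bytes divided by payload bytes (1.0 means no overhead)."""
    payload = sum(object_sizes)
    allocated = sum(allocated_size(s, block_size) for s in object_sizes)
    return allocated / payload


small_objects = [500] * 1000         # many very small objects
large_objects = [2_000_000] * 1000   # typical-ish bigger chunks

small_ratio = overhead_ratio(small_objects)
large_ratio = overhead_ratio(large_objects)
```

Under these assumptions the 500-byte objects use about 8x their payload size on disk, while for the 2 MB objects the overhead is far below 1%.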

borg2 alternative idea

  • the client assembles packs locally and transfers a pack to the store when it has reached the desired size or when there is no more data to write.
  • pack files have a per-pack index appended (pointing to the objects contained in the pack), so the per-pack index can be read without reading the full pack.
  • the per-pack index would also contain the RepoObj metadata (e.g. compression type/level, etc.)
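
One possible layout for such a pack, as a sketch only: the object payloads come first, then a JSON-encoded index, then a fixed-size footer holding the index length. The JSON encoding, the 8-byte footer and the field names are assumptions for illustration, not a proposed borg format. The point is that a reader can fetch the index by reading from the end of the file without touching the payloads.

```python
import io
import json
import struct

FOOTER = struct.Struct("<Q")  # little-endian uint64: length of the index in bytes


def write_pack(objects) -> bytes:
    """objects: iterable of (obj_id, payload, meta). Returns the pack as bytes."""
    body = io.BytesIO()
    index = {}
    for obj_id, payload, meta in objects:
        index[obj_id] = {"offset": body.tell(), "size": len(payload), "meta": meta}
        body.write(payload)
    index_bytes = json.dumps(index).encode()
    return body.getvalue() + index_bytes + FOOTER.pack(len(index_bytes))


def read_index(f) -> dict:
    """Read only the footer and the index, not the object payloads."""
    f.seek(-FOOTER.size, io.SEEK_END)
    (index_len,) = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(-(FOOTER.size + index_len), io.SEEK_END)
    return json.loads(f.read(index_len))


def read_object(f, entry: dict) -> bytes:
    f.seek(entry["offset"])
    return f.read(entry["size"])


pack = write_pack([
    ("aa", b"hello", {"compression": "lz4"}),
    ("bb", b"world!", {"compression": "none"}),
])
pack_file = io.BytesIO(pack)  # stands in for a file or a ranged read from the store
pack_index = read_index(pack_file)
second = read_object(pack_file, pack_index["bb"])
```

With a remote store, the two seeks in `read_index` would turn into one or two ranged reads at the end of the pack.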

Pros:

  • far fewer objects in the store, fewer API calls, less latency impact

Cons:

  • more complex in general
  • will need an additional global index mapping object_id -> pack_id, offset_in_pack
  • will need more memory for that global index
  • space is managed client-side, causing more (network) I/O: compact will need to read the pack, drop unused entries, write it back to the store and update the indexes
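
A rough sketch of what the global index and compact could look like. Assumptions: the store is a plain dict `pack_id -> bytes`, the global index also keeps the object size next to `(pack_id, offset_in_pack)` to keep the example short, and the compacted pack is written under a new, different id. All names are hypothetical.

```python
def build_pack(objects):
    """objects: dict obj_id -> payload. Returns (pack_bytes, {obj_id: (offset, size)})."""
    blob, entries = bytearray(), {}
    for obj_id, payload in objects.items():
        entries[obj_id] = (len(blob), len(payload))
        blob += payload
    return bytes(blob), entries


def compact(store, global_index, pack_id, new_pack_id, used_ids):
    """Read one pack, keep only used objects, write a new pack, update the index."""
    old_blob = store[pack_id]  # full read of the old pack (network I/O)
    survivors = {}
    for obj_id, (pid, offset, size) in list(global_index.items()):
        if pid != pack_id:
            continue
        if obj_id in used_ids:
            survivors[obj_id] = old_blob[offset:offset + size]
        else:
            del global_index[obj_id]  # unused entry is dropped
    new_blob, entries = build_pack(survivors)
    if survivors:
        store[new_pack_id] = new_blob  # write back (more network I/O)
    for obj_id, (offset, size) in entries.items():
        global_index[obj_id] = (new_pack_id, offset, size)
    del store[pack_id]  # only now is the space actually freed


store, global_index = {}, {}
blob, entries = build_pack({"a": b"AAAA", "b": b"BB", "c": b"CCC"})
store["pack-1"] = blob
for obj_id, (offset, size) in entries.items():
    global_index[obj_id] = ("pack-1", offset, size)

compact(store, global_index, "pack-1", "pack-2", used_ids={"a", "c"})
```

Even in this toy version, dropping one small object means reading and rewriting the whole pack, which is the extra I/O mentioned above.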

Side note: the desired pack "size" could be given by the number of objects in the pack (N) or by the overall size of all objects in the pack (S). The special case N == 1 would be a slightly different implementation (using a different file format) of what we currently have in borg2; it would not necessarily need that global index, and compact would still be very easy.
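
The two ways of defining the pack "size" could be sketched like this. `Packer`, `max_objects` (N), `max_bytes` (S) and the `upload` callback are hypothetical names; `upload` stands in for whatever would transfer a finished pack to the store.

```python
class Packer:
    """Collect objects locally and hand over a pack once it is 'full'."""

    def __init__(self, upload, max_objects=None, max_bytes=None):
        self.upload = upload            # called with a list of (obj_id, payload)
        self.max_objects = max_objects  # N: limit on the number of objects
        self.max_bytes = max_bytes      # S: limit on the summed payload size
        self.pending = []
        self.pending_bytes = 0

    def _full(self):
        if self.max_objects is not None and len(self.pending) >= self.max_objects:
            return True
        if self.max_bytes is not None and self.pending_bytes >= self.max_bytes:
            return True
        return False

    def add(self, obj_id, payload):
        self.pending.append((obj_id, payload))
        self.pending_bytes += len(payload)
        if self._full():
            self.flush()

    def flush(self):
        """Also call this when there is no more data to write."""
        if self.pending:
            self.upload(self.pending)
            self.pending = []
            self.pending_bytes = 0


uploaded = []
packer = Packer(uploaded.append, max_objects=2)
for i in range(3):
    packer.add(f"id{i}", b"x" * 10)
packer.flush()  # end of data: the last, partially filled pack goes out too
```

With `max_objects=1` every `add` results in one upload, i.e. one object per pack, which is the N == 1 special case.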

Related: #191

@dietmargoldbeck

I would really appreciate the use of packs. Currently borg 2 is "incompatible" with most USB hard disks that use SMR. I used a Toshiba 4 TB external USB hard drive for borg2 testing, and a borg check was approx. 50% done after 12 hours when I killed it (I needed the USB port). The repository was only approx. 1.3 TB.

@RonnyPfannschmidt
Contributor

I consider packs essential

An alternative would be key value stores that optimize content addressing

@ThomasWaldmann
Member Author

ThomasWaldmann commented Dec 3, 2024

@dietmargoldbeck what you've seen is 33 MB/s.

That's not too bad for an initial backup to a USB (SMR) HDD.

Initial backups always feel very slow just due to the amount of data and processing.
