to pack or not to pack ... #8572

ThomasWaldmann · 2024-11-27T18:03:24Z

borg 1.x segment files

borg 1.x used:

"segment files" (elsewhere also known as "pack files") to store multiple repository objects in one file.
a "repository index" to find these objects, using a mapping object id --> (segment_name, offset_in_segment).
transactions and rollback via log-like appending of operations (PUT, DEL, COMMIT) to these segment files.
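
To illustrate how log-like appending gives transactions and rollback, here is a minimal sketch. The record layout, the tag values and the helper names (`append_entry`, `replay`) are assumptions made for this example; the real borg 1.x on-disk format differs. The idea shown: replaying the log rebuilds the index id --> (segment_name, offset_in_segment), and anything after the last COMMIT is ignored.

```python
import struct
from typing import Dict, Tuple

# Hypothetical record tags, not the real borg 1.x values.
PUT, DEL, COMMIT = 0, 1, 2
HEADER = struct.Struct("<BBI")  # tag, id length, data length


def append_entry(segment: bytearray, tag: int, obj_id: bytes = b"", data: bytes = b"") -> int:
    """Append one log entry to the segment and return the offset where it starts."""
    offset = len(segment)
    segment += HEADER.pack(tag, len(obj_id), len(data)) + obj_id + data
    return offset


def replay(segments: Dict[int, bytes]) -> Dict[bytes, Tuple[int, int]]:
    """Rebuild the index id -> (segment_name, offset) from committed entries only."""
    committed: Dict[bytes, Tuple[int, int]] = {}
    pending: Dict[bytes, Tuple[int, int]] = {}
    for name in sorted(segments):
        seg, pos = segments[name], 0
        while pos < len(seg):
            tag, id_len, data_len = HEADER.unpack_from(seg, pos)
            obj_id = bytes(seg[pos + HEADER.size : pos + HEADER.size + id_len])
            if tag == PUT:
                pending[obj_id] = (name, pos)
            elif tag == DEL:
                pending.pop(obj_id, None)
            elif tag == COMMIT:
                committed = dict(pending)  # the transaction becomes visible
            pos += HEADER.size + id_len + data_len
    return committed  # entries after the last COMMIT are effectively rolled back
```

In this sketch a DEL that is never followed by a COMMIT has no effect on the rebuilt index, which is the rollback behaviour described above.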

borg2 status quo: objects stored separately

borg2 is much simpler:

implemented using borgstore (k/v store with misc. backends)
objects are stored separately: 1 file chunk --> 1 repo object
objects can be directly found by their id (e.g. the id is mapped to the fs path / file name)
no transactions, no log-like appending - but correct write order
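
A rough sketch of "the id is mapped to the fs path", assuming a local filesystem backend. The directory layout (a `data` directory with two nesting levels taken from the hex id) and the function names are assumptions for illustration, not necessarily what borgstore does. Writing to a temporary name and renaming is one simple way to avoid exposing half-written objects without any transaction log.

```python
import hashlib
from pathlib import Path


def object_path(root: Path, obj_id: bytes, levels: int = 2) -> Path:
    """Map an object id to a nested path, e.g. data/ab/cd/abcd... (hypothetical layout)."""
    hex_id = obj_id.hex()
    parts = [hex_id[2 * i : 2 * i + 2] for i in range(levels)]
    return root.joinpath("data", *parts, hex_id)


def put(root: Path, obj_id: bytes, data: bytes) -> None:
    """Store one object as one file; no index needs to be updated."""
    path = object_path(root, obj_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place so readers never see a partial object


def get(root: Path, obj_id: bytes) -> bytes:
    """Find the object directly by its id, without consulting any index."""
    return object_path(root, obj_id).read_bytes()
```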

Pros:

simplicity
no need for some sort of "index" (which could be corrupted or out of date)
no segment file compaction needed, the server-side filesystem manages space allocation

Cons:

leads to a large number of relatively small objects being transferred and stored individually in the repository
latency and other per-object overheads have quite a speed impact for remote repositories
depending on the storage type / filesystem, there will be more or less storage space overhead due to block size, especially for many very small objects
dealing with lots of objects / doing lots of API calls can be expensive with some cloud storage providers
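
To get a feeling for the block size overhead, here is a small back-of-the-envelope calculation. It assumes a filesystem that allocates whole blocks of 4096 bytes per file and ignores inode/metadata cost, inline data and tail packing, so real numbers will differ. The function names are made up for this example.

```python
from typing import Iterable


def allocated_size(size: int, block_size: int = 4096) -> int:
    """Bytes occupied by a file of `size` bytes when rounded up to whole blocks."""
    return -(-size // block_size) * block_size  # ceiling division


def overhead(sizes: Iterable[int], block_size: int = 4096) -> int:
    """Total bytes lost to block rounding when each size is stored as its own file."""
    sizes = list(sizes)
    return sum(allocated_size(s, block_size) for s in sizes) - sum(sizes)


# 1000 objects of 100 bytes each, stored separately vs. as one 100000 byte file
separate = overhead([100] * 1000)
packed = overhead([100 * 1000])
```

Under these assumptions, `separate` is 3996000 bytes of slack and `packed` is 2400 bytes, which shows why many very small objects are the worst case.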

borg2 alternative idea

the client assembles packs locally and transfers a pack to the store when it has reached the desired size or when there is no more data to write.
pack files have a per-pack index appended (pointing to the objects contained in the pack), so the per-pack index can be read without reading the full pack.
the per-pack index would also contain the RepoObj metadata (e.g. compression type/level, etc.)
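
One possible shape of such a pack, as a sketch only: object data first, then the index, then a small fixed-size footer that says where the index starts. The JSON index encoding, the footer layout and all names here are assumptions for illustration, not a proposed borg2 format. The point is that a reader can seek to the end, read the footer and the index, and fetch single objects without reading the full pack.

```python
import io
import json
import struct
from typing import BinaryIO, Dict, Tuple

FOOTER = struct.Struct("<QI")  # index offset (8 bytes), index length (4 bytes)


def build_pack(objects: Dict[bytes, Tuple[bytes, dict]]) -> bytes:
    """Build a pack from obj_id -> (data, meta); the index is appended after the data."""
    body = bytearray()
    index = {}
    for obj_id, (data, meta) in objects.items():
        index[obj_id.hex()] = {"offset": len(body), "size": len(data), "meta": meta}
        body += data
    index_bytes = json.dumps(index).encode()
    return bytes(body) + index_bytes + FOOTER.pack(len(body), len(index_bytes))


def read_index(f: BinaryIO) -> dict:
    """Read only the footer and the per-pack index from a seekable file."""
    f.seek(-FOOTER.size, io.SEEK_END)
    index_offset, index_len = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(index_offset)
    return json.loads(f.read(index_len))


def read_object(f: BinaryIO, entry: dict) -> bytes:
    """Fetch a single object using its index entry."""
    f.seek(entry["offset"])
    return f.read(entry["size"])
```

For a remote store, the two seeks in `read_index` would correspond to ranged reads, assuming the backend supports them.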

Pros:

far fewer objects in the store, fewer API calls, less latency impact

Cons:

more complex in general
will need an additional global index mapping object_id -> pack_id, offset_in_pack
will need more memory for that global index
space is managed client-side, causing more (network) I/O: compact will need to read the pack, drop unused entries, write it back to the store and update the indexes
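
A simplified model of the global index and of client-side compaction, to make the extra I/O visible. The store is modelled as an in-memory dict and the rewrite threshold is an arbitrary assumption; a real implementation would read packs from and write them back to the store, and a rewritten pack might get a new pack id.

```python
from typing import Dict, Set, Tuple

Store = Dict[str, Dict[bytes, bytes]]       # pack_id -> {obj_id: data} (hypothetical model)
GlobalIndex = Dict[bytes, Tuple[str, int]]  # obj_id -> (pack_id, offset_in_pack)


def build_global_index(store: Store) -> GlobalIndex:
    """Derive the global index from the contents of all packs."""
    index: GlobalIndex = {}
    for pack_id, objects in store.items():
        offset = 0
        for obj_id, data in objects.items():
            index[obj_id] = (pack_id, offset)
            offset += len(data)
    return index


def compact(store: Store, used: Set[bytes], threshold: float = 0.5) -> int:
    """Drop unused objects from packs; return the number of bytes written back."""
    rewritten = 0
    for pack_id in list(store):
        objects = store[pack_id]
        total = sum(len(d) for d in objects.values())
        live = {i: d for i, d in objects.items() if i in used}
        live_bytes = sum(len(d) for d in live.values())
        if not live:
            del store[pack_id]  # nothing left, the whole pack can be deleted
        elif (total - live_bytes) / total > threshold:
            store[pack_id] = live  # read the pack, drop unused entries, write it back
            rewritten += live_bytes
    return rewritten
```

After `compact`, the global index has to be rebuilt or updated (here: by calling `build_global_index` again), because offsets within rewritten packs change.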

Side note: the desired pack "size" could be given by the number of objects in the pack (N) or by the overall size of all objects in the pack (S). The special case N == 1 would be a slightly different implementation (using a different file format) of what we currently have in borg2; it would not necessarily need that global index, and compact would still be very easy.
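
A sketch of how a client-side writer could apply either limit. The class name, the `upload` callback and the parameter names are hypothetical; the pack is just a list of (id, data) pairs here, and the actual serialization is left out.

```python
from typing import Callable, List, Optional, Tuple

Pack = List[Tuple[bytes, bytes]]  # list of (obj_id, data)


class PackWriter:
    """Collect objects locally and hand a pack to `upload` once it is full."""

    def __init__(self, upload: Callable[[Pack], None],
                 max_objects: Optional[int] = None, max_bytes: Optional[int] = None):
        self.upload = upload
        self.max_objects = max_objects  # N: limit on the number of objects
        self.max_bytes = max_bytes      # S: limit on the overall size of all objects
        self._objects: Pack = []
        self._size = 0

    def add(self, obj_id: bytes, data: bytes) -> None:
        self._objects.append((obj_id, data))
        self._size += len(data)
        if self._full():
            self.flush()

    def _full(self) -> bool:
        by_count = self.max_objects is not None and len(self._objects) >= self.max_objects
        by_size = self.max_bytes is not None and self._size >= self.max_bytes
        return by_count or by_size

    def flush(self) -> None:
        """Upload whatever is pending; call this when there is no more data to write."""
        if self._objects:
            self.upload(list(self._objects))
            self._objects.clear()
            self._size = 0
```

With `max_objects=1` every `add` results in one upload, which corresponds to the N == 1 special case mentioned above.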

Related: #191


dietmargoldbeck · 2024-12-03T08:23:37Z

I would really appreciate the use of packs. Currently borg 2 is "incompatible" with most USB hard disks with SMR recording. I used a Toshiba 4TB external USB hard drive for borg2 testing, and a borg check was only approx. 50% done after 12 hours when I killed it (I needed the USB port). The repository was only approx. 1.3TB.

RonnyPfannschmidt · 2024-12-03T11:39:29Z

I consider packs essential

An alternative would be key value stores that optimize content addressing

ThomasWaldmann · 2024-12-03T14:36:30Z

@dietmargoldbeck what you've seen is 33MB/s.

That's not too bad for an initial backup to a USB (SMR) HDD.

Initial backups always feel very slow just due to the amount of data and processing.

ThomasWaldmann added the breaking label Nov 27, 2024

ThomasWaldmann added this to the 2.0.0rc1 milestone Nov 27, 2024

ThomasWaldmann mentioned this issue Nov 27, 2024

proposal draft: indexed segments #4217

Closed
