feat(server/objects): new column to track object size #3750

Draft: iainsproat wants to merge 9 commits into main
Conversation

@iainsproat (Contributor) commented Dec 30, 2024

Description & motivation

By calculating and storing each object's size when it is uploaded, we keep the data available for later analysis of the size of stored objects.
This enables future features such as ranking projects by stored size, tracking the rate of change of stored object size, computing average object size, and so on.
NB the size is approximate: the object is JSON-stringified and the calculation assumes one string character equals one byte.
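For illustration, a minimal sketch of that approximation (the function name is hypothetical, not the actual server code):

```ts
// Approximate the stored size by JSON-stringifying the object and treating
// each string character as one byte. Multi-byte UTF-8 characters are
// undercounted, which is acceptable when only the order of magnitude matters.
const estimateObjectSizeBytes = (obj: Record<string, unknown>): number =>
  JSON.stringify(obj).length
```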

Changes:

  • adds a new column to the objects table (see the migration sketch below)
  • the sizeBytes column is a big integer, is nullable, and defaults to null
  • calculates the object size (approximately) on upload
  • removes unused methods
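A minimal sketch of what the column addition could look like, assuming a Knex-style migration; the actual migration in this PR may be structured differently:

```ts
import type { Knex } from 'knex'

export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable('objects', (table) => {
    // Nullable bigint defaulting to null, so existing rows need no immediate backfill
    table.bigInteger('sizeBytes').nullable().defaultTo(null)
  })
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.alterTable('objects', (table) => {
    table.dropColumn('sizeBytes')
  })
}
```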

To-do before merge:

  • is a migration the best place for the backfill of data? It could cause the startup to fail for large object tables.
    • would it be better as an async process, as it is not required for the operation of the server?
    • would it be better as an external service? (though this might cause more work with more docker images & k8s manifests etc.)
  • should we calculate the actual string size accurately, using TextEncoder to count the bytes? Is that necessary, or is our approximation ok? (a small comparison sketch follows this list)
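As a point of comparison for that last question, an illustrative sketch of the gap between the one-character-one-byte approximation and an exact UTF-8 byte count:

```ts
const json = JSON.stringify({ name: 'façade', note: '日本語' })

// Approximation used in this PR: one character ≈ one byte
const approxBytes = json.length

// Exact UTF-8 byte count via TextEncoder (Buffer.byteLength(json, 'utf8') is equivalent in Node)
const exactBytes = new TextEncoder().encode(json).length

// approxBytes undercounts here: 'ç' takes 2 bytes and each CJK character takes 3
console.log({ approxBytes, exactBytes })
```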

Screenshots:

Validation of changes:

Checklist:

  • My pull request follows the guidelines in the Contributing guide.
  • My pull request does not duplicate any other open Pull Requests for the same update/change.
  • My commits are related to the pull request and do not amend unrelated code or documentation.
  • My code follows a similar style to existing code.
  • I have added appropriate tests.
  • I have updated or added relevant documentation.

References

linear bot commented Dec 30, 2024

  • this will cause slow startup for un-backfilled databases with large object tables
@iainsproat (Contributor, Author) commented Jan 2, 2025

The migration takes too long, so we need to move the backfill to a different process: either a background worker on the monolith, or a separate microservice.
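A rough sketch of what an incremental backfill could look like as a background worker, assuming Knex and the nullable sizeBytes column from this PR; batching by id is illustrative and the real objects table may be keyed differently:

```ts
import type { Knex } from 'knex'

// Backfill sizeBytes in small batches so the work runs alongside normal traffic
// instead of blocking server startup inside a migration.
export async function backfillObjectSizes(db: Knex, batchSize = 1000): Promise<void> {
  for (;;) {
    const result = await db.raw(
      `UPDATE "objects"
       SET "sizeBytes" = octet_length("data"::text)
       WHERE "id" IN (
         SELECT "id" FROM "objects" WHERE "sizeBytes" IS NULL LIMIT ?
       )`,
      [batchSize]
    )
    // With the pg driver, raw() resolves to the node-postgres result, which exposes rowCount
    if (!result.rowCount) break
  }
}
```

Computing the size in SQL (octet_length of the text cast) avoids streaming object data through Node, at the cost of a slightly different definition of "size" than the upload-time approximation.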

@iainsproat (Contributor, Author) commented Jan 7, 2025

Would we be better off just doing SELECT pg_column_size("data") FROM "objects";, with no need to memoize the data size? The sizes aren't exactly the same, but I'm not sure we care about exact values, only the order of magnitude.

Screenshot 2025-01-07 at 10 13 43

Or, if we care about more closely matching the size calculated by Node, the uncompressed size would be better: SELECT octet_length("data"::text) AS "derivedSize", "sizeBytes" FROM "objects" ORDER BY "derivedSize" DESC;

Screenshot 2025-01-07 at 11 11 16
