
Use external storage for large data attributes [Large Feature] #506

Open
longquanzheng opened this issue Dec 5, 2024 · 1 comment
@longquanzheng
Contributor

longquanzheng commented Dec 5, 2024

When a data attribute exceeds a configurable threshold (e.g. 100KB), iWF can store it in external storage such as S3 instead of writing it into Temporal history, keeping only the key and the S3 object ID in history.

With only keys and S3 object IDs in Temporal history, the iWF server will load the values from S3 before sending them to the application, and write to S3 on updates. As an optimization, the server could also load from S3 lazily, only when the application tries to read an attribute.

This is possible because the iWF server workflow never actually reads the values of the data attributes (DAs) -- they are opaque to the iWF server.

Offloading large data attributes to S3 makes it much easier for users to work with large datasets, and makes using Cadence/Temporal more cost effective.
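
A minimal sketch of this write/read path, assuming a hypothetical `ExternalStore` interface (S3 would be one implementation); the type and function names here are illustrative, not the actual iWF server code:

```go
package offload

import (
	"context"
	"fmt"
)

const offloadThreshold = 100 * 1024 // configurable, e.g. 100KB

// ExternalStore abstracts S3 (or any blob store) for data attribute values.
type ExternalStore interface {
	Put(ctx context.Context, objectID string, value []byte) error
	Get(ctx context.Context, objectID string) ([]byte, error)
}

// StoredAttribute is what gets written into Temporal history: either the
// inline value, or just the key plus a pointer to the external object.
type StoredAttribute struct {
	Key      string
	Inline   []byte // set when the value is small enough to keep in history
	ObjectID string // set when the value was offloaded to external storage
}

// writeAttribute offloads the value when it exceeds the threshold, so only
// the key and the object ID land in Temporal history.
func writeAttribute(ctx context.Context, store ExternalStore, wfID, key string, value []byte) (StoredAttribute, error) {
	if len(value) <= offloadThreshold {
		return StoredAttribute{Key: key, Inline: value}, nil
	}
	objectID := fmt.Sprintf("%s/%s", wfID, key) // could be deterministic, see the follow-up comment
	if err := store.Put(ctx, objectID, value); err != nil {
		return StoredAttribute{}, err
	}
	return StoredAttribute{Key: key, ObjectID: objectID}, nil
}

// readAttribute resolves the pointer before the value is sent to the
// application; this is also the point where loading could be deferred
// until the application actually reads the attribute.
func readAttribute(ctx context.Context, store ExternalStore, attr StoredAttribute) ([]byte, error) {
	if attr.ObjectID == "" {
		return attr.Inline, nil
	}
	return store.Get(ctx, attr.ObjectID)
}
```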

@longquanzheng
Contributor Author

longquanzheng commented Dec 18, 2024

Probably we don't need to store the S3 object ID at all. The ID can just be derived from workflowID + attributeKey,

so in the history we only store a flag like "s3" indicating that the DA lives externally.
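
A minimal sketch of this follow-up, reusing the hypothetical `ExternalStore` interface from the sketch above; because the object ID is a pure function of workflowID + attributeKey, it never has to be persisted:

```go
package offload

import (
	"context"
	"fmt"
)

// storageFlagS3 is the only marker that needs to be recorded in history.
const storageFlagS3 = "s3"

// externalObjectID derives the object ID deterministically from identifiers
// that are already known, so history never needs to store it.
func externalObjectID(workflowID, attributeKey string) string {
	return fmt.Sprintf("%s/%s", workflowID, attributeKey)
}

// resolveFlagged recomputes the ID from the flag plus identifiers and loads
// the value, instead of reading a stored object ID out of history.
func resolveFlagged(ctx context.Context, store ExternalStore, flag, workflowID, attributeKey string) ([]byte, error) {
	if flag != storageFlagS3 {
		return nil, fmt.Errorf("unknown storage flag %q", flag)
	}
	return store.Get(ctx, externalObjectID(workflowID, attributeKey))
}
```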
