How to scale Vuu server-side to handle very large tables (10m to 10bn) - Discussion #821
Closed
chrisjstevo started this conversation in Ideas
Replies: 2 comments
-
To begin this work, I will create a few new structures within the Vuu source code.
So 1) will contain org.finos.vuu.feature.table, and in 2) I will add a new module called table-hypersql under /vuu/features.
-
Closed as superseded by #971
-
The value proposition of Vuu is that:
One of Vuu's current restrictions is handling very large data sets (10m to 10bn rows). The cause is that Vuu currently uses a table model that caches all data in memory. This performs well for up to 1m rows but becomes less performant as we go above that.
One proposal for resolving this would be to break Vuu into its internal interfaces (such as DataTable, Viewport, ViewportKeys, Filter, Sort etc.) and then allow the Vuu team or other people to provide implementations of those interfaces in their favorite technology.
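As a rough illustration of what "an interface plus a pluggable implementation" could look like, here is a minimal Java sketch. The interface name echoes the DataTable mentioned above, but every method signature here is an assumption made up for this example and is not Vuu's actual API. The in-memory class stands in for the current "cache everything" behaviour; a database-backed class could implement the same interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: the methods are illustrative assumptions, not Vuu's real API.
interface DataTable {
    void upsert(String key, Map<String, Object> row);
    Map<String, Object> row(String key);
    List<String> keys(int from, int to); // keys for a viewport window, [from, to)
    long size();
}

// Default-style implementation that keeps every row on the heap.
final class InMemoryDataTable implements DataTable {
    // LinkedHashMap keeps insertion order so viewport windows are stable.
    private final Map<String, Map<String, Object>> rows = new LinkedHashMap<>();

    @Override
    public void upsert(String key, Map<String, Object> row) {
        rows.put(key, row);
    }

    @Override
    public Map<String, Object> row(String key) {
        return rows.get(key);
    }

    @Override
    public List<String> keys(int from, int to) {
        List<String> all = new ArrayList<>(rows.keySet());
        // Clamp the requested window to the rows that actually exist.
        int start = Math.max(0, Math.min(from, all.size()));
        int end = Math.max(start, Math.min(to, all.size()));
        return new ArrayList<>(all.subList(start, end));
    }

    @Override
    public long size() {
        return rows.size();
    }
}
```

A second implementation backed by an external store would only need to answer the same four calls, so viewport code written against the interface would not need to know where the rows live.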
Then we could allow users to use these features in a declarative manner:
And they could declare table types in the manner:
What would be the benefit of this approach?
This way we can choose the table storage and query mechanism based on the size and latency requirements.
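To make the size-based choice concrete, a sketch in Java might look like the following. The enum values, the class name and the cut-over constant are all hypothetical; the 1m figure is simply the row count quoted earlier in this post, and latency requirements are deliberately left out to keep the example small.

```java
// Hypothetical backends; neither name exists in Vuu today.
enum TableBackend { IN_MEMORY, EXTERNAL_STORE }

final class TableBackendChooser {
    // Assumption: the 1m-row figure quoted in this post is used as the cut-over point.
    static final long IN_MEMORY_ROW_LIMIT = 1_000_000L;

    private TableBackendChooser() { }

    // Picks a backend from the expected row count only; latency is ignored in this sketch.
    static TableBackend choose(long expectedRows) {
        if (expectedRows < 0) {
            throw new IllegalArgumentException("expectedRows must be >= 0");
        }
        return expectedRows <= IN_MEMORY_ROW_LIMIT
                ? TableBackend.IN_MEMORY
                : TableBackend.EXTERNAL_STORE;
    }
}
```

For example, `TableBackendChooser.choose(500_000L)` selects the in-memory backend, while `TableBackendChooser.choose(10_000_000_000L)` selects the external store.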
What are the implications of this approach?
The first implication of this approach is that we'd have to refactor the code base into abstract features and feature implementations. Most of the features are wrapped in interfaces so this shouldn't be a huge ask.
A more substantial impact to the code would be on components such as the join manager. The join manager's purpose is to manage the relationships between keys across data tables (for example, orders 1, 2 and 3 all have the RIC = VOD.L and use that as a foreign key to prices).
In this new design the join table would have to allow the loading of foreign key relations from an underlying data store. This would add some complexity.
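One way to picture the extra complexity is a foreign key index that loads relations on demand instead of holding them all in memory. The Java sketch below is an assumption for discussion, not Vuu's join manager: the class name is hypothetical, and the loader function stands in for whatever query the underlying data store would need. It also ignores invalidation when the underlying rows change, which a real design would have to handle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: maps a foreign key (e.g. a RIC) to the keys of the rows
// that reference it, fetching from the backing store on first use.
final class LazyForeignKeyIndex {
    // Stands in for a query against the underlying data store.
    private final Function<String, List<String>> loader;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int loads = 0;

    LazyForeignKeyIndex(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    // Returns the referencing row keys, hitting the store only on a cache miss.
    List<String> rowKeysFor(String foreignKey) {
        List<String> cached = cache.get(foreignKey);
        if (cached == null) {
            loads++;
            cached = List.copyOf(loader.apply(foreignKey));
            cache.put(foreignKey, cached);
        }
        return cached;
    }

    // Number of times the backing store was queried; useful for seeing the cache work.
    int loadCount() {
        return loads;
    }
}
```

Using the example above, a loader that returns orders 1, 2 and 3 for VOD.L would be called once on the first lookup, and later lookups for VOD.L would be served from the cache.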
What are the proposed next steps?
The proposed next step would be to abstract the interface definitions for the objects (or features) out, and then ship the existing implementation as the default.
The second step would be to provide a simple second implementation of a "table" feature using another technology to back it. An example of this could be a HyperSQL DB or an Apache Ignite data cache (TBC).
Discuss...