-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][client] Add schema cache to improve performance #23808
base: master
Are you sure you want to change the base?
Conversation
- Add SchemaCache implementation - Add cache configuration and metrics - Add cleanup strategy - Modify Schema creation methods - Add unit tests and performance tests This closes apache#23707
@yunmaoQu Please add the following content to your PR description and select a checkbox:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several inconsistencies in this PR. For example, the class names and the class file names don't match. Please test this PR in your own fork first to ensure that it passes tests.
It seems that this PR contains a lot of features related to the schema caching. Instead of adding a lot of features, it would be better to keep the implementation to the minimum.
I'm surprised by competing implementations for implementing the schema cache. There's currently already an open PR #23777.
ok,i test all and could you review it and give me some suggestions |
Instead of adding more code to test everything, please reduce to a minimal implementation. This means to remove features to track cache metrics. That's not something that is needed. For the cache implementation, I'd suggest using a example of minimal implementation for private static final ConcurrentMap<Class<?>, Schema<?>> PROTOBUF_CACHE = new MapMaker().weakKeys().makeMap();
public <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> newProtobufSchema(Class<T> clazz) {
return (Schema<T>) PROTOBUF_CACHE.computeIfAbsent(clazz,
k -> ProtobufSchema.of(SchemaDefinition.builder().withPojo(clazz).build())).clone();
} There shouldn't be a need to ever clear the cache since it's bounded by the number of classes with strong references. It won't consume a significant amount of memory in the first place. |
OK.Should i implement it based on the pre commit or what? |
That's something you can decide. Please read my previous message and draw your conclusions. |
Yes. We can work it together.@walkinggo |
I implement a minimal version. Could you review it and give me some suggestion. Thanks for your previous guide. |
close #23707
Motivation
Schema creation (e.g.,
Schema.AVRO(SomeClass.class)
) is fairly CPU intensive. It would be useful it there would be a weak reference cache for caching the schema instance.Modifications
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
N/A