[improve][client] Add schema cache to improve performance #23808

yunmaoQu · 2025-01-03T16:46:14Z

Motivation

Schema creation (e.g., Schema.AVRO(SomeClass.class)) is fairly CPU intensive. It would be useful to have a weak-reference cache for schema instances.

Modifications

- Add SchemaCache implementation using WeakHashMap for schema instance caching
- Add cache configuration and metrics for monitoring
- Add cleanup strategy for expired cache entries
- Modify Schema creation methods (AVRO/JSON/PROTOBUF) to use the cache
- Add cloning mechanism to maintain schema immutability

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

N/A

- Add SchemaCache implementation
- Add cache configuration and metrics
- Add cleanup strategy
- Modify Schema creation methods
- Add unit tests and performance tests

This closes apache#23707

github-actions · 2025-01-03T16:46:44Z

@yunmaoQu Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

lhotari

There are several inconsistencies in this PR. For example, the class names and the class file names don't match. Please test this PR in your own fork first to ensure that it passes tests.
This PR seems to contain a lot of features related to schema caching. Instead of adding many features, it would be better to keep the implementation to the minimum.
I'm surprised to see competing implementations of the schema cache. There is already an open PR: #23777.

yunmaoQu · 2025-01-03T18:38:06Z

OK, I tested everything. Could you review it and give me some suggestions?

lhotari · 2025-01-03T19:06:48Z

> OK, I tested everything. Could you review it and give me some suggestions?

Instead of adding more code to test everything, please reduce this to a minimal implementation. That means removing the features that track cache metrics; they aren't needed. For the cache itself, I'd suggest using a ConcurrentMap created with Guava's MapMaker. Rather than adding yet another abstraction, I'd suggest modifying the PulsarClientImplementationBinding interface and adding a new interface method `<T extends com.google.protobuf.GeneratedMessageV3> Schema<T> newProtobufSchema(Class<T> clazz)`. Then the cache stays an implementation-level detail.

Example of a minimal implementation of newProtobufSchema using Guava's MapMaker with weak keys:

    // The map holds its Class keys through weak references.
    private static final ConcurrentMap<Class<?>, Schema<?>> PROTOBUF_CACHE = new MapMaker().weakKeys().makeMap();

    @SuppressWarnings("unchecked")
    public <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> newProtobufSchema(Class<T> clazz) {
        // Build the schema once per class, then return a clone of the cached instance.
        return (Schema<T>) PROTOBUF_CACHE.computeIfAbsent(clazz,
                k -> ProtobufSchema.of(SchemaDefinition.builder().withPojo(clazz).build())).clone();
    }

There shouldn't be a need to ever clear the cache since it's bounded by the number of classes with strong references. It won't consume a significant amount of memory in the first place.
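
The build-once, return-a-clone pattern described above can be sketched with JDK-only types. This is an illustration under stated assumptions, not Pulsar code: `StubSchema` is a hypothetical stand-in for a schema class, a synchronized `WeakHashMap` is swapped in for Guava's `MapMaker` so the example runs without extra dependencies, and the names `SchemaCacheSketch`, `schemaFor`, `create`, and `CREATIONS` are invented for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaCacheSketch {

    /** Hypothetical stand-in for a schema; this is not Pulsar's Schema interface. */
    static final class StubSchema<T> implements Cloneable {
        final Class<T> type;

        StubSchema(Class<T> type) {
            this.type = type;
        }

        @Override
        @SuppressWarnings("unchecked")
        public StubSchema<T> clone() {
            try {
                return (StubSchema<T>) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Counts how many times a schema was actually constructed. */
    static final AtomicInteger CREATIONS = new AtomicInteger();

    /** Assumption: a synchronized WeakHashMap replaces Guava's weak-key ConcurrentMap. */
    private static final Map<Class<?>, StubSchema<?>> CACHE =
            Collections.synchronizedMap(new WeakHashMap<>());

    /** Stands in for the expensive schema construction step. */
    private static <T> StubSchema<T> create(Class<T> clazz) {
        CREATIONS.incrementAndGet();
        return new StubSchema<>(clazz);
    }

    /** Returns a clone of the cached schema, constructing it on first use only. */
    @SuppressWarnings("unchecked")
    static <T> StubSchema<T> schemaFor(Class<T> clazz) {
        StubSchema<?> cached = CACHE.computeIfAbsent(clazz, k -> create(k));
        return (StubSchema<T>) cached.clone();
    }

    public static void main(String[] args) {
        StubSchema<String> first = schemaFor(String.class);
        StubSchema<String> second = schemaFor(String.class);
        System.out.println("distinct instances: " + (first != second));
        System.out.println("creations: " + CREATIONS.get());
    }
}
```

Two calls to `schemaFor` with the same class run the construction step once, and each caller receives its own copy, so a caller that mutates its copy cannot affect the cached instance.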

yunmaoQu · 2025-01-03T19:14:15Z

OK. Should I implement it on top of the previous commit, or some other way?

lhotari · 2025-01-03T19:26:36Z

> OK. Should I implement it on top of the previous commit, or some other way?

That's something you can decide. Please read my previous message and draw your conclusions.

walkinggo · 2025-01-04T04:07:11Z

It looks like we're working on similar tasks. I've already created a pull request #23777 to complete this task. Should we work together to finish it, or what do you suggest? @yunmaoQu

yunmaoQu · 2025-01-04T04:57:01Z

Yes, we can work on it together. @walkinggo

yunmaoQu · 2025-01-05T18:06:40Z

@lhotari

> > OK. Should I implement it on top of the previous commit, or some other way?
>
> That's something you can decide. Please read my previous message and draw your conclusions.

I implemented a minimal version. Could you review it and give me some suggestions? Thanks for your earlier guidance.

[Enhancement] Add schema cache to improve performance

afca79f


github-actions bot added the doc-label-missing label Jan 3, 2025

github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Jan 3, 2025

lhotari requested changes Jan 3, 2025


fix the class name inconsistency

e3bd21d

lhotari mentioned this pull request Jan 3, 2025

[Enhancement] Cache Schema instances for classes in a weak reference cache since creating an instance could be CPU intensive #23777

Open

14 tasks

yunmaoQu added 2 commits January 5, 2025 17:50

make a minimal implementation

d342395

fix the pre delete

ace0e46
