feat: Add performance rules for persistence layer #27

baumeister25 · 2022-11-25T07:43:43Z

No description provided.

hohwille · 2023-02-07T18:10:06Z

@baumeister25 thanks for your PR. It is nice to get performance guidance into the new java stack.

IMHO if we declare something as rule, it should have the true character of a rule:

Rule 1: The performance of a use case is proportional to the number of SQL statements it generates

    Typical values of a query execution are 5 to 100 msec

This is dead wrong: I have seen a use-case in a real project triggering a single SQL that always ended after 6 hours due to a TX timeout. It was written so bad and there were so many many records (multiple billion!!!) that the DB had no chance to ever complete the SQL. Apart from this extreme example I have also seen many other cases where a single bad SQL was killing the UC performance...

What you are trying to archive could maybe stated as

Rule 1: Minimize the number of SQL statements per transaction

Do not repeat the same (SELECT) statement and instead reduce it to a single one (avoid N+1 problems). Also try to operate close to the database so instead of first reading large sets of data into your app (via SELECT) to then send it back to the database (via INSERT or UPDATE), try to combine all into a single statement (e.g. using SELECT sub-query). In general take some time to kick your O/R mapping frameworks out of your mind and think what native SQL you would write to do the task. Then check what SQL(s) are actually produced by your code and try to match this as close as possible.

And further

Rule 2: The performance of a use case is the product of objects in a session with the number of flush operations

while this might be partially true, my rule is to never use flush at all. Whenever you are (explicitly) writing flush in your code, you are doing something wrong. In case you think you have to use flush after all then at least ALWAYS write a detailed comment above explaining exactly why this is needed.

Other rules I would see:

Only load the data you need
Only load the data when you need it (e.g. in a search result screen you do not need to have all the details loaded from DB tier to app tier into frontend but only minimal sets of the records to display, once the user selects a search result for expansion or to open details screen then load the details for that record).
Declare all your JPA relations as lazy but avoid triggering of lazy loading at all - it is a stupid feature of JPA. Either you do not need to load things at all in your TX or you need them. If you do, then lazy loading only leads to N+1 problems and waste so simply load eagerly then (not via Annotations but via the query - e.g. using FETCH JOIN).
Check the execution plans of all your queries, in case of Oracle check AWRs and statistics frequently on production
Create indices - a rule of thumb is create an index for every foreign key column and consider indexes for all searched columns but not blindly but after checking execution plans
General rule: Ensure to have an expert for your DB technology in your project in case you are doing something serious (millions of records, anything below is peanuts and may work without deeper knowledge)
Avoid conversions (Cast, TO_DATE, TO_NUMBER, etc.) in your SQLs
Ensure that data-types in variable bindings fit to your DDL. Oracle does not convert the given variable value to the DDL type to convert to but instead converts every column value to the data-type you provided what can lead to catastrophic performance and DB CPU drain, etc.
Never use Criteria API - it sucks and leads to disaster (I have rescued a big-data project and mean it)
Do explicit joins (selecting from multiple tables and then "joining" in WHERE clause makes hibernate to do cross-join what makes Oracle build the Cartesian product before evaluating the WHERE clause. If both tables to "join" are really big this will lead to an ultimate disaster (DB IO death)
Avoid WHERE ID IN (...) expressions and be aware of limitations (1000 values for Oracle). Instead think if you can use a sub-query to avoid having the list of IDs in the first place.
so much more to tell - we should collect input from DB community, projects, etc. to get all the valuable knowledge collected.

baumeister25 · 2023-02-09T07:05:02Z

Hi @hohwille,
Thanks for the feedback and the additional input.
I like the further rules very much and if that is fine for you I can create a ticket with the list, so that we can add them in a separate PR. For now, it would be enough for me to get the first two rules on the road.

Those rules where meant to be a very general reminder to think of what hibernate does. Therefore, I also like your sentence of put ORM away from thinking.
Rule 1 Should have said: Every generated SQL increases your execution time.
Rule 2 should have added: Check if you're forcing hibernate to do flushes because of the orchestration of your code and try to avoid them.

For rule 1:
I like the rule of minimizing the number of SQL statements. That fits into the meaning of the rule. I'll take that over.

I'll also take over the first sentence (Do not repeat ...) I think that's a good addition. The second sentence I would put into a TIPP-box. I think it's a very good hint to make people aware that the ORM should not replace thinking.

I would keep the typical execution time (maybe in an info box) I still think it's valuable for developers to have at least a number in head of an SQL execution. Of course there are cases that exceed these numbers, but I think in general they're ok and that you can kill everything with long running SQLs should be clear I hope.

For rule2:
It was not meant only for manual flush operations.
You can easily force hibernate to do unnecessary flush operations when orchestrating Queries wrong instead of combining them.

Refs: devonfw#27

baumeister25 · 2023-03-22T12:02:59Z

@hohwille I've updated the text and created a new issue. Can you re-check the proposed article and approve if fine.

baumeister25 requested a review from SchettlerKoehler November 25, 2022 07:43

baumeister25 force-pushed the feature/sql_rules branch from 96b5cf5 to d944232 Compare November 25, 2022 11:36

feat: Add performance rules for persistence layer

c6badad

baumeister25 force-pushed the feature/sql_rules branch from d944232 to c6badad Compare November 25, 2022 11:37

fix: Incorporate PR review into the JPA performance article

6aebc55

Refs: devonfw#27

baumeister25 mentioned this pull request Mar 21, 2023

Add further JPA performance rules #38

Open

baumeister25 requested a review from hohwille March 22, 2023 12:03

SchettlerKoehler approved these changes May 4, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add performance rules for persistence layer #27

feat: Add performance rules for persistence layer #27

baumeister25 commented Nov 25, 2022

hohwille commented Feb 7, 2023 •

edited

Loading

baumeister25 commented Feb 9, 2023

baumeister25 commented Mar 22, 2023

feat: Add performance rules for persistence layer #27

Are you sure you want to change the base?

feat: Add performance rules for persistence layer #27

Conversation

baumeister25 commented Nov 25, 2022

hohwille commented Feb 7, 2023 • edited Loading

baumeister25 commented Feb 9, 2023

baumeister25 commented Mar 22, 2023

hohwille commented Feb 7, 2023 •

edited

Loading