Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add performance rules for persistence layer #27

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

baumeister25
Copy link
Contributor

No description provided.

@hohwille
Copy link
Member

hohwille commented Feb 7, 2023

@baumeister25 thanks for your PR. It is nice to get performance guidance into the new java stack.

IMHO if we declare something as rule, it should have the true character of a rule:

Rule 1: The performance of a use case is proportional to the number of SQL statements it generates

    Typical values of a query execution are 5 to 100 msec

This is dead wrong: I have seen a use-case in a real project triggering a single SQL that always ended after 6 hours due to a TX timeout. It was written so bad and there were so many many records (multiple billion!!!) that the DB had no chance to ever complete the SQL. Apart from this extreme example I have also seen many other cases where a single bad SQL was killing the UC performance...

What you are trying to archive could maybe stated as

Rule 1: Minimize the number of SQL statements per transaction

Do not repeat the same (SELECT) statement and instead reduce it to a single one (avoid N+1 problems). Also try to operate close to the database so instead of first reading large sets of data into your app (via SELECT) to then send it back to the database (via INSERT or UPDATE), try to combine all into a single statement (e.g. using SELECT sub-query). In general take some time to kick your O/R mapping frameworks out of your mind and think what native SQL you would write to do the task. Then check what SQL(s) are actually produced by your code and try to match this as close as possible.

And further

Rule 2: The performance of a use case is the product of objects in a session with the number of flush operations

while this might be partially true, my rule is to never use flush at all. Whenever you are (explicitly) writing flush in your code, you are doing something wrong. In case you think you have to use flush after all then at least ALWAYS write a detailed comment above explaining exactly why this is needed.

Other rules I would see:

  • Only load the data you need
  • Only load the data when you need it (e.g. in a search result screen you do not need to have all the details loaded from DB tier to app tier into frontend but only minimal sets of the records to display, once the user selects a search result for expansion or to open details screen then load the details for that record).
  • Declare all your JPA relations as lazy but avoid triggering of lazy loading at all - it is a stupid feature of JPA. Either you do not need to load things at all in your TX or you need them. If you do, then lazy loading only leads to N+1 problems and waste so simply load eagerly then (not via Annotations but via the query - e.g. using FETCH JOIN).
  • Check the execution plans of all your queries, in case of Oracle check AWRs and statistics frequently on production
  • Create indices - a rule of thumb is create an index for every foreign key column and consider indexes for all searched columns but not blindly but after checking execution plans
  • General rule: Ensure to have an expert for your DB technology in your project in case you are doing something serious (millions of records, anything below is peanuts and may work without deeper knowledge)
  • Avoid conversions (Cast, TO_DATE, TO_NUMBER, etc.) in your SQLs
  • Ensure that data-types in variable bindings fit to your DDL. Oracle does not convert the given variable value to the DDL type to convert to but instead converts every column value to the data-type you provided what can lead to catastrophic performance and DB CPU drain, etc.
  • Never use Criteria API - it sucks and leads to disaster (I have rescued a big-data project and mean it)
  • Do explicit joins (selecting from multiple tables and then "joining" in WHERE clause makes hibernate to do cross-join what makes Oracle build the Cartesian product before evaluating the WHERE clause. If both tables to "join" are really big this will lead to an ultimate disaster (DB IO death)
  • Avoid WHERE ID IN (...) expressions and be aware of limitations (1000 values for Oracle). Instead think if you can use a sub-query to avoid having the list of IDs in the first place.
  • so much more to tell - we should collect input from DB community, projects, etc. to get all the valuable knowledge collected.

@baumeister25
Copy link
Contributor Author

Hi @hohwille,
Thanks for the feedback and the additional input.
I like the further rules very much and if that is fine for you I can create a ticket with the list, so that we can add them in a separate PR. For now, it would be enough for me to get the first two rules on the road.

Those rules where meant to be a very general reminder to think of what hibernate does. Therefore, I also like your sentence of put ORM away from thinking.
Rule 1 Should have said: Every generated SQL increases your execution time.
Rule 2 should have added: Check if you're forcing hibernate to do flushes because of the orchestration of your code and try to avoid them.

For rule 1:
I like the rule of minimizing the number of SQL statements. That fits into the meaning of the rule. I'll take that over.

I'll also take over the first sentence (Do not repeat ...) I think that's a good addition. The second sentence I would put into a TIPP-box. I think it's a very good hint to make people aware that the ORM should not replace thinking.

I would keep the typical execution time (maybe in an info box) I still think it's valuable for developers to have at least a number in head of an SQL execution. Of course there are cases that exceed these numbers, but I think in general they're ok and that you can kill everything with long running SQLs should be clear I hope.

For rule2:
It was not meant only for manual flush operations.
You can easily force hibernate to do unnecessary flush operations when orchestrating Queries wrong instead of combining them.

@baumeister25
Copy link
Contributor Author

@hohwille I've updated the text and created a new issue. Can you re-check the proposed article and approve if fine.

@baumeister25 baumeister25 requested a review from hohwille March 22, 2023 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants