Parametrize SELECT queries #1777

henadzit · 2024-11-18T12:53:29Z

Description

This PR:

Parametrizes SELECT queries (including .count(), exists(), .values(), .values_list()). It does NOT implement parametrization of UPDATE and DELETE queries. I'll work on UPDATE and DELETE parametrization as part of a separate PR since this one is already quite big.
Ensures that AwaitableQuery._execute uses the same SQL as returned by AwaitableQuery.sql() by changing the signatures of _execute and_make_query.

This PR is dependent on changes in pypika-tortoise tortoise/pypika-tortoise#16. ⚠️ we should merge them first and release a new version of pypika-tortoise!

Motivation and Context

Parameterized queries are crucial for preventing SQL injection attacks and but also can improve performance of database operations.

Related ticket Change to all-parametrised queries #81
v1.0.0 target Discussion: v1.0.0 target #19

How Has This Been Tested?

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added the changelog accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

henadzit · 2024-11-18T12:54:56Z

tortoise/backends/psycopg/executor.py

+
+    @classmethod
+    def parameterizer(cls) -> Parameterizer:
+        return Parameterizer(placeholder_factory=lambda _: "%s")


psycopg decided to use non-standard placeholder for Postgres, so we have to account for that.

coveralls · 2024-11-18T14:56:45Z

Pull Request Test Coverage Report for Build 11958477626

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

146 of 152 (96.05%) changed or added relevant lines in 11 files are covered.
3 unchanged lines in 1 file lost coverage.
Overall coverage increased (+0.03%) to 89.53%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
tortoise/queryset.py	103	109	94.5%

Files with Coverage Reduction	New Missed Lines	%
tortoise/backends/mysql/executor.py	3	88.89%

Totals
Change from base Build 11922917874:	0.03%
Covered Lines:	6246
Relevant Lines:	6867

💛 - Coveralls

abondar · 2024-11-18T15:12:44Z

tests/test_case_when.py

+        if self.dialect == "mysql":
+            expected_sql = "SELECT `intnum` `intnum`,CASE WHEN `intnum`>=%s THEN %s ELSE `intnum_null` END `category` FROM `intfields`"
+        elif self.dialect == "postgres":
+            if self.is_psycopg:
+                expected_sql = 'SELECT "intnum" "intnum",CASE WHEN "intnum">=%s THEN %s ELSE "intnum_null" END "category" FROM "intfields"'
+            else:
+                expected_sql = 'SELECT "intnum" "intnum",CASE WHEN "intnum">=$1 THEN $2 ELSE "intnum_null" END "category" FROM "intfields"'


Shouldn't we add parameters asserts, if we are not checking them inside query text anymore?

Good point!

I did a bit of research. Django's equivalent of sql() method returns SQL with values in place but the SQL itself might be invalid. So it's basically just for reference when you want to have a quick check what SQL the query generates.

In Ruby on Rails they return SQL with values in place too. I read a few issues in their repo and looks like it's just for reference too.

In SQLAlchemy they return the parametrized query without parameters by default but you can pass an argument to the function to have values in place.

I'm a bit torn on this one. It feels to me that returning a not parametrized query with values in place means that we are returning a different query from the one actually being executed. So I'm inclined to introduce a function argument to sql() to configure whether to return parametrized query or a query with inlined values. WDYT?

Yeah, I think doing it with parameter is okay, if someone needs it for debug reasons - shouldn't be problem to pass parameter

Added params_inline arg to QuerySet.sql().

abondar · 2024-11-18T15:19:20Z

tortoise/queryset.py

+        parameterizer = pypika_kwargs.pop("parameterizer", self._db.executor_class.parameterizer())
+        return (
+            self.query.get_sql(parameterizer=parameterizer, **pypika_kwargs),
+            parameterizer.values,
+        )
+


Why you decided to implement logic of parametrisation in queryset, and not in executor?

If it is about reusing same logic between different backends - it seems to me that it should be no problem to store that logic in base executor

It just seem odd for queryset to get down in such details of query formatting as forming sql string and deriving params from it.

I would imagine queryset as quite abstract entity that doesn't go in such details in making query and pypika query seems to be okay mediator between queryset and executors

Why you decided to implement logic of parametrisation in queryset, and not in executor?

Honestly, I haven't thought about it before you asked but it's a great question.

I tried to refactor the code to have parametrization as part of executors and realized a few things:

The executor interface will need to be extended with a method to return SQL, parametrized or not

When using Subquery and parametrizing the query, pypika's keywords are passed from the parent query, so they will need to be passed to the executor too, basically, making the the logic a bit more complicated

I tried to see how we can have better abstractions but honestly right now pypika is all over the place. QuerySet modifies private fields of pypika's query and executor builds its own query, etc. So this PR doesn't change the situation by much.

However, I moved the parametrization code to a private method which doesn't seem't seem to be complicated, it's 5 lines of code - please see _parametrize_query

I'm kinda inclined to think that it's Okay to keep parametrization as part of queryset but please let me know what you think!

abondar · 2024-11-21T08:15:35Z

tortoise/expressions.py

@@ -215,10 +215,13 @@ def __init__(self, query: "AwaitableQuery") -> None:
        self.query = query

    def get_sql(self, **kwargs: Any) -> str:
-        return self.query.as_query().get_sql(**kwargs)
+        self.query._choose_db_if_not_chosen()
+        return self.query._make_query(**kwargs)[0]


It's seems unclear - what exactly are pypika_kwargs and what can be passed there

Because as I am reading code - it uses ambigious **pypika_kwargs everywhere, even though only parameter passed I see explicitly used - is parametrizer

What else can be there, that is passed further down to query.get_sql?

Are we able to give more explicit typing to such params?

@abondar the things that could be in kwargs: dialect, with_namespace, secondary_quote_char, alias_quote_char, as_keyword, etc. Basically it holds the context when pypika builds SQL.

I'm not fond of this either but this is a legacy of pypika, you can search for get_sql there to see what I mean. Since Subquery extends Term it needs to use this approach. Same for _make_query in queryset.py but I guess the issue became more obvious since I renamed kwargs to pypika_kwargs there. This PR does not change anything fundamentally about this approach. I think we can improve this by doing something similar as ResolveContext but I'm not sure that this PR is a good place for it since it's already quite large.

henadzit · 2024-11-21T17:43:47Z

@abondar, I replaced pypika's version, can you approve again? thanks!

henadzit commented Nov 18, 2024

View reviewed changes

henadzit mentioned this pull request Nov 18, 2024

Add Parametrized query support for SELECTs and EXPLAINs #1725

Closed

7 tasks

henadzit force-pushed the feat/parametrize-select-queries-2 branch 2 times, most recently from 7ad8250 to ca5e37d Compare November 18, 2024 14:32

henadzit requested a review from abondar November 18, 2024 14:57

abondar reviewed Nov 18, 2024

View reviewed changes

henadzit and others added 7 commits November 19, 2024 11:04

Parametrize SELECT queries

9fa413f

Use new parameter interface

07114ac

Make sure _execute() uses the same query as returned from sql()

505d29a

Parametrize .values, .values_list, .exists and .count queries

03796ae

Configure Parameterizer for psycopg

3bde481

Fix Postgres issues

46d47ac

Add params_inline arg to QuerySet.sql()

0d2fe63

henadzit force-pushed the feat/parametrize-select-queries-2 branch from ca5e37d to 0d2fe63 Compare November 19, 2024 22:52

abondar reviewed Nov 21, 2024

View reviewed changes

abondar previously approved these changes Nov 21, 2024

View reviewed changes

Use pypika-tortoise 0.3.0

dfe7553

henadzit dismissed abondar’s stale review via dfe7553 November 21, 2024 17:27

henadzit requested a review from abondar November 21, 2024 17:41

abondar approved these changes Nov 21, 2024

View reviewed changes

henadzit merged commit 7f077c1 into tortoise:develop Nov 21, 2024
7 checks passed

henadzit mentioned this pull request Nov 24, 2024

Parametrize UPDATE, DELETE and bulk operations #1785

Merged

7 tasks

Invisi mentioned this pull request Dec 5, 2024

add tests: update on filtered query fails when using subquery #1802

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parametrize SELECT queries #1777

Parametrize SELECT queries #1777

henadzit commented Nov 18, 2024 •

edited

Loading

henadzit Nov 18, 2024

coveralls commented Nov 18, 2024 •

edited

Loading

abondar Nov 18, 2024

henadzit Nov 19, 2024 •

edited

Loading

abondar Nov 19, 2024

henadzit Nov 19, 2024 •

edited

Loading

abondar Nov 18, 2024 •

edited

Loading

henadzit Nov 19, 2024

abondar Nov 21, 2024 •

edited

Loading

henadzit Nov 21, 2024

henadzit commented Nov 21, 2024

Parametrize SELECT queries #1777

Parametrize SELECT queries #1777

Conversation

henadzit commented Nov 18, 2024 • edited Loading

Description

Motivation and Context

How Has This Been Tested?

Checklist:

henadzit Nov 18, 2024

Choose a reason for hiding this comment

coveralls commented Nov 18, 2024 • edited Loading

Pull Request Test Coverage Report for Build 11958477626

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

abondar Nov 18, 2024

Choose a reason for hiding this comment

henadzit Nov 19, 2024 • edited Loading

Choose a reason for hiding this comment

abondar Nov 19, 2024

Choose a reason for hiding this comment

henadzit Nov 19, 2024 • edited Loading

Choose a reason for hiding this comment

abondar Nov 18, 2024 • edited Loading

Choose a reason for hiding this comment

henadzit Nov 19, 2024

Choose a reason for hiding this comment

abondar Nov 21, 2024 • edited Loading

Choose a reason for hiding this comment

henadzit Nov 21, 2024

Choose a reason for hiding this comment

henadzit commented Nov 21, 2024

henadzit commented Nov 18, 2024 •

edited

Loading

coveralls commented Nov 18, 2024 •

edited

Loading

henadzit Nov 19, 2024 •

edited

Loading

henadzit Nov 19, 2024 •

edited

Loading

abondar Nov 18, 2024 •

edited

Loading

abondar Nov 21, 2024 •

edited

Loading