Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parametrize SELECT queries #1777

Merged
merged 8 commits into from
Nov 21, 2024

Conversation

henadzit
Copy link
Contributor

@henadzit henadzit commented Nov 18, 2024

Description

This PR:

  • Parametrizes SELECT queries (including .count(), exists(), .values(), .values_list()). It does NOT implement parametrization of UPDATE and DELETE queries. I'll work on UPDATE and DELETE parametrization as part of a separate PR since this one is already quite big.
  • Ensures that AwaitableQuery._execute uses the same SQL as returned by AwaitableQuery.sql() by changing the signatures of _execute and_make_query.

This PR is dependent on changes in pypika-tortoise tortoise/pypika-tortoise#16. ⚠️ we should merge them first and release a new version of pypika-tortoise!

Motivation and Context

Parameterized queries are crucial for preventing SQL injection attacks and but also can improve performance of database operations.

How Has This Been Tested?

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added the changelog accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.


@classmethod
def parameterizer(cls) -> Parameterizer:
return Parameterizer(placeholder_factory=lambda _: "%s")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

psycopg decided to use non-standard placeholder for Postgres, so we have to account for that.

@henadzit henadzit force-pushed the feat/parametrize-select-queries-2 branch 2 times, most recently from 7ad8250 to ca5e37d Compare November 18, 2024 14:32
@coveralls
Copy link

coveralls commented Nov 18, 2024

Pull Request Test Coverage Report for Build 11958477626

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 146 of 152 (96.05%) changed or added relevant lines in 11 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.03%) to 89.53%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tortoise/queryset.py 103 109 94.5%
Files with Coverage Reduction New Missed Lines %
tortoise/backends/mysql/executor.py 3 88.89%
Totals Coverage Status
Change from base Build 11922917874: 0.03%
Covered Lines: 6246
Relevant Lines: 6867

💛 - Coveralls

@henadzit henadzit requested a review from abondar November 18, 2024 14:57
Comment on lines 115 to 121
if self.dialect == "mysql":
expected_sql = "SELECT `intnum` `intnum`,CASE WHEN `intnum`>=%s THEN %s ELSE `intnum_null` END `category` FROM `intfields`"
elif self.dialect == "postgres":
if self.is_psycopg:
expected_sql = 'SELECT "intnum" "intnum",CASE WHEN "intnum">=%s THEN %s ELSE "intnum_null" END "category" FROM "intfields"'
else:
expected_sql = 'SELECT "intnum" "intnum",CASE WHEN "intnum">=$1 THEN $2 ELSE "intnum_null" END "category" FROM "intfields"'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add parameters asserts, if we are not checking them inside query text anymore?

Copy link
Contributor Author

@henadzit henadzit Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

I did a bit of research. Django's equivalent of sql() method returns SQL with values in place but the SQL itself might be invalid. So it's basically just for reference when you want to have a quick check what SQL the query generates.

In Ruby on Rails they return SQL with values in place too. I read a few issues in their repo and looks like it's just for reference too.

In SQLAlchemy they return the parametrized query without parameters by default but you can pass an argument to the function to have values in place.

I'm a bit torn on this one. It feels to me that returning a not parametrized query with values in place means that we are returning a different query from the one actually being executed. So I'm inclined to introduce a function argument to sql() to configure whether to return parametrized query or a query with inlined values. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think doing it with parameter is okay, if someone needs it for debug reasons - shouldn't be problem to pass parameter

Copy link
Contributor Author

@henadzit henadzit Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added params_inline arg to QuerySet.sql().

Comment on lines 1127 to 1139
parameterizer = pypika_kwargs.pop("parameterizer", self._db.executor_class.parameterizer())
return (
self.query.get_sql(parameterizer=parameterizer, **pypika_kwargs),
parameterizer.values,
)

Copy link
Member

@abondar abondar Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you decided to implement logic of parametrisation in queryset, and not in executor?

If it is about reusing same logic between different backends - it seems to me that it should be no problem to store that logic in base executor

It just seem odd for queryset to get down in such details of query formatting as forming sql string and deriving params from it.

I would imagine queryset as quite abstract entity that doesn't go in such details in making query and pypika query seems to be okay mediator between queryset and executors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you decided to implement logic of parametrisation in queryset, and not in executor?

Honestly, I haven't thought about it before you asked but it's a great question.

I tried to refactor the code to have parametrization as part of executors and realized a few things:

  • The executor interface will need to be extended with a method to return SQL, parametrized or not
  • When using Subquery and parametrizing the query, pypika's keywords are passed from the parent query, so they will need to be passed to the executor too, basically, making the the logic a bit more complicated
  • I tried to see how we can have better abstractions but honestly right now pypika is all over the place. QuerySet modifies private fields of pypika's query and executor builds its own query, etc. So this PR doesn't change the situation by much.

However, I moved the parametrization code to a private method which doesn't seem't seem to be complicated, it's 5 lines of code - please see _parametrize_query

I'm kinda inclined to think that it's Okay to keep parametrization as part of queryset but please let me know what you think!

@henadzit henadzit force-pushed the feat/parametrize-select-queries-2 branch from ca5e37d to 0d2fe63 Compare November 19, 2024 22:52
@@ -215,10 +215,13 @@ def __init__(self, query: "AwaitableQuery") -> None:
self.query = query

def get_sql(self, **kwargs: Any) -> str:
return self.query.as_query().get_sql(**kwargs)
self.query._choose_db_if_not_chosen()
return self.query._make_query(**kwargs)[0]
Copy link
Member

@abondar abondar Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's seems unclear - what exactly are pypika_kwargs and what can be passed there

Because as I am reading code - it uses ambigious **pypika_kwargs everywhere, even though only parameter passed I see explicitly used - is parametrizer

What else can be there, that is passed further down to query.get_sql?

Are we able to give more explicit typing to such params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abondar the things that could be in kwargs: dialect, with_namespace, secondary_quote_char, alias_quote_char, as_keyword, etc. Basically it holds the context when pypika builds SQL.

I'm not fond of this either but this is a legacy of pypika, you can search for get_sql there to see what I mean. Since Subquery extends Term it needs to use this approach. Same for _make_query in queryset.py but I guess the issue became more obvious since I renamed kwargs to pypika_kwargs there. This PR does not change anything fundamentally about this approach. I think we can improve this by doing something similar as ResolveContext but I'm not sure that this PR is a good place for it since it's already quite large.

abondar
abondar previously approved these changes Nov 21, 2024
@henadzit
Copy link
Contributor Author

@abondar, I replaced pypika's version, can you approve again? thanks!

@henadzit henadzit merged commit 7f077c1 into tortoise:develop Nov 21, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants