support multi-statement `.execute()` calls #162
Comments
Hi @dataders, thanks for opening this issue. I appreciate you taking the time to do so. As discussed in dbt-labs/dbt-core#39, moving to support the simple query protocol requires a non-trivial amount of work on the redshift-connector side, and there are some pitfalls related to the text transfer protocol. Given that, I will raise this request with our Redshift driver team and let you know if it can be prioritized/added to our roadmap, as we are currently prioritizing other features.

In the meantime, my best recommendation would be to use a SQL parsing utility like sqlparse to split multi-statement strings into individual statements that can be executed by redshift-connector, to help immediately relieve the pain caused by redshift-connector's sole use of the extended query protocol.

I apologize we haven't been able to incorporate this functionality into the project earlier; it's something I've always wanted to see come to fruition, since multi-statement execution has been a long-running ask from folks on GitHub. 😢 I will provide an update here after discussing with the team! :)
Hey Brooke and OP, I'm glad to find this discussion here. I'm switching to redshift_connector for similar reasons as described (IAM auth), and my main concern with multiple statements not being possible (and sqlparse being the workaround) is the case of a longer set of SQL commands that generate temporary tables along the way. For instance, in this oversimplified example:

query_tmp = """ SELECT a

My understanding is that sqlparse will split this into two statements and execute the first, and the second will then fail because it doesn't see the temporary table. Similarly, submitting this to redshift_connector directly will lead to the 'cannot insert multiple commands into a prepared statement' error. Am I missing something here? Is there a way to get this to run with some combination of sqlparse and redshift_connector without refactoring the query to use non-temporary tables (and drop them at the end), which may be a time-consuming refactor depending on the size of the codebase? Appreciate all the work on this!
Hey @merubhanot, here's an example of how sqlparse could be used. Please note I expanded your example a bit to include creation of table1 and insertion of some test data so it can be run to completion. This works because a temporary table remains in scope for the duration of your session. You can read a bit more about this in the official AWS docs for CREATE TABLE, but the short of it is that these split-up SQL statements are executed in the same session.
```python
import sqlparse
import redshift_connector

query_tmp = sqlparse.split("""
CREATE TEMP TABLE table1(a int, b int);
INSERT INTO table1 (a, b) VALUES (1,2);
CREATE TEMP TABLE t1 AS (
    SELECT a, b
    FROM table1
);
SELECT a FROM t1;
""")

with redshift_connector.connect(...) as conn:
    with conn.cursor() as cursor:
        for stmt in query_tmp:
            cursor.execute(stmt)
        print(cursor.fetchall())  # ([1],)
```
related to: dbt-labs/dbt-core#39 dbt-labs/dbt-core#46 dbt-labs/dbt-core#82
allow for dbt (and other consumers/users) to make use of this driver in a simple query flow manner. two specific places that introduce friction are:
- `;`-delimited SQL statements cannot be sent to `cursor.execute()`

I love @Brooke-white's summary in #39 (comment). I've been looking for this terminology all week!
background
dbt is a framework that effectively takes `SELECT` statements and wraps them within boilerplate DDL to create a DAG of tables and views. Only a somewhat auxiliary feature, seeds, makes use of prepared statements with an `INSERT INTO ... VALUES` statement. In some ways, dbt is a stand-in for the parsing and preparation that a database might do in an extended query flow pattern. Using both complicates things greatly.
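To make the friction concrete, here is a minimal sketch of the kind of `;`-delimited batch this produces and what happens when it is handed to a single `cursor.execute()` call. The table name, the `GRANT`, and the SQL itself are made up for illustration, and the connection parameters are elided as in the example above; the error text is the one quoted earlier in this thread.

```python
import redshift_connector

# Illustrative stand-in for a dbt-style materialization: a SELECT wrapped in
# boilerplate DDL, followed by a post-hook, all in one ;-delimited string.
ddl = """
CREATE TABLE analytics.my_model AS (
    SELECT 1 AS a
);
GRANT SELECT ON analytics.my_model TO GROUP reporting;
"""

with redshift_connector.connect(...) as conn:
    with conn.cursor() as cursor:
        # Because redshift_connector only speaks the extended query protocol,
        # the whole string is prepared as one statement and the server rejects
        # it with "cannot insert multiple commands into a prepared statement".
        cursor.execute(ddl)
```

Under a simple query flow (e.g. `psycopg2`), the same string can be sent as-is and the server executes the statements one after another.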
recently, in dbt-labs/dbt-redshift#251, dbt-redshift migrated away from `psycopg2` to `redshift_connector` in order to enable IAM auth. however, this change has introduced a number of regressions for our users. While we have both:

- `autocommit=True` for all connections, as well as
for all connections, as well asexample
Error message (full traceback)
counter-counter-argument
@Brooke-white I saw you mention somewhere that the spec for `.execute()` in PEP 249 (the Python Database API Specification v2.0) lays out that the input may only be a single `operation`, and that `.executemany()` should be used for multi-operation/-statement queries. However, if I'm reading this right, the purpose of `.executemany()` is to execute the operation over a list of parameter sets. The first argument is still a singular `operation`.
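To illustrate those PEP 249 semantics, here is a small sketch, assuming `redshift_connector`'s default `format` paramstyle and a made-up table, with connection parameters elided as above. `.executemany()` repeats one parameterized operation across a sequence of parameter sets; it is not a vehicle for running several distinct statements.

```python
import redshift_connector

with redshift_connector.connect(...) as conn:
    with conn.cursor() as cursor:
        # One operation, executed once per parameter tuple.
        cursor.executemany(
            "INSERT INTO table1 (a, b) VALUES (%s, %s)",
            [(1, 2), (3, 4), (5, 6)],
        )
        # Still a single operation per call; something like
        # "CREATE TEMP TABLE t2 (a int); SELECT a FROM t2;" would not be
        # a valid first argument here any more than it is for .execute().
```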
sidebar
I'm very unclear on the exact meanings of the following terms here: `query`, `statement`, `operation`
further complications
all of this confusion is making it even harder to determine how transactions are handled by default between the simple and extended query flows.
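While that question is open, here is a minimal sketch of keeping transaction boundaries explicit rather than relying on either flow's defaults, using only the standard DB-API surface (`commit`/`rollback`) that `redshift_connector` exposes; connection parameters are elided as in the earlier example.

```python
import redshift_connector

conn = redshift_connector.connect(...)
try:
    with conn.cursor() as cursor:
        cursor.execute("CREATE TEMP TABLE t1 (a int)")
        cursor.execute("INSERT INTO t1 VALUES (1)")
    conn.commit()    # with autocommit left off, nothing is visible until this commit
except Exception:
    conn.rollback()  # a failed statement otherwise leaves an open transaction
    raise
finally:
    conn.close()
```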