Search before asking
I had searched in the issues and found no similar issues.
Version
2.1.5
What's Wrong?
The SQL is a simple SELECT INTO export to S3:
SELECT file_id,
       cast(data_start_time as String) as data_start_time,
       cast(data_end_time as String) as data_end_time,
       device_type...
FROM t_table_name
WHERE data_start_time >= '2024-11-28 00:00:00.000'
  AND data_start_time < '2024-11-28 01:00:00.000'
ORDER BY file_id, data_start_time
INTO OUTFILE "s3://xxx/xxx/2024_11_28_00/part_v2_"
FORMAT AS PARQUET
PROPERTIES(
    "s3.endpoint" = "http://xxx.com/",
    "s3.access_key" = "xxx",
    "s3.secret_key" = "xxx",
    "s3.region" = "xxx",
    "max_file_size" = "120MB"
);
With batch_size set to 100,000:
set batch_size=100000;
+------------+-----------+-----------+------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+-----------+------------------------------------------------------------------------------------------------+
| 5 | 14602938 | 671738439 | s3://xxx/2024_11_28_00/part_v2_66d16409bc2a4b37-9905759434b51248_* |
+------------+-----------+-----------+------------------------------------------------------------------------------------------------+
With batch_size set to the default value:
set batch_size=4096;
+------------+-----------+------------+------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+------------+------------------------------------------------------------------------------------------------+
| 10 | 29803106 | 1316012670 | s3://xxx/2024_11_28_00/part_v2_2cef1e9dba2a4749-a9a688a90843fa53_* |
+------------+-----------+------------+------------------------------------------------------------------------------------------------+
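The row count with the default batch_size (29,803,106) should be correct. With a larger batch_size, rows go missing: the batch_size=100000 run exported only 14,602,938 rows. With an even larger batch_size, the SQL hangs, and I cannot tell why.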
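To put a number on the discrepancy, here is a small sketch that compares the two runs using only the figures from the result tables above. It assumes, as stated, that the default batch_size run is the correct baseline; the `summarize` helper and the dictionary layout are made up for illustration and are not part of Doris.

```python
# Figures copied from the two OUTFILE result tables in this report,
# keyed by the batch_size used for each run.
runs = {
    100000: {"files": 5, "rows": 14602938, "bytes": 671738439},
    4096: {"files": 10, "rows": 29803106, "bytes": 1316012670},
}


def summarize(runs, baseline_batch_size=4096):
    """Compare each run against the baseline run (assumed to be correct)."""
    baseline_rows = runs[baseline_batch_size]["rows"]
    summary = {}
    for batch_size, run in runs.items():
        summary[batch_size] = {
            # Rows present in the baseline export but absent from this run.
            "missing_rows": baseline_rows - run["rows"],
            # Share of the baseline row count that this run exported.
            "fraction_of_baseline": run["rows"] / baseline_rows,
            # Average on-disk size per exported row.
            "avg_bytes_per_row": run["bytes"] / run["rows"],
        }
    return summary


summary = summarize(runs)
for batch_size, stats in summary.items():
    print(batch_size, stats)
```

Under that assumption, the batch_size=100000 run is missing 15,200,168 rows, so it exported a little under half of the baseline. The average bytes per row is similar in both runs, which suggests that whole rows were dropped rather than the files being truncated or corrupted.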
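The reason for setting batch_size is that, when exporting Parquet, I want to control the number of rows per block so that the Parquet files contain fewer blocks. Setting batch_size is the only way I have found to control this number.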
What You Expected?
skip
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct