You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The pypgstac load items command does not handle item loading atomically. There are scenarios where some items fail to load without pypgstac raising any exceptions. This is a serious issue because it creates the illusion that everything has been loaded correctly from the client's perspective, while in reality, some items are missing. Below is a reproducible test case that demonstrates this behavior:
Create 2 collections with a month partition
Prepare two files with items: items1.njson and items2.njson. Each file contains 10 items with the same datetime, belonging to collection1 (items1.njson) and 10 to collection2 (items2.njson).
Check the count. There should be 20 items, but only 10 were ingested:
postgis=# SELECT COUNT(1) FROM items;
count
-------
10
(1 row)
Logs indicate a deadlock during the update_partition_stats function execution:
pgstac | 2024-07-01 20:44:12.047 UTC [141] ERROR: deadlock detected
pgstac | 2024-07-01 20:44:12.047 UTC [141] DETAIL: Process 141 waits for AccessExclusiveLock on relation 18386 of database 16384; blocked by process 140.
pgstac | Process 140 waits for AccessExclusiveLock on relation 18972 of database 16384; blocked by process 141.
pgstac | Process 141: SELECT update_partition_stats_q($1);
pgstac | Process 140: SELECT update_partition_stats_q($1);
pgstac | 2024-07-01 20:44:12.047 UTC [141] HINT: See server log for query details.
pgstac | 2024-07-01 20:44:12.047 UTC [141] CONTEXT: SQL statement "REFRESH MATERIALIZED VIEW partitions"
pgstac | PL/pgSQL function update_partition_stats(text,boolean) line 35 at SQL statement
pgstac | SQL statement "SELECT update_partition_stats('_items_3_202003', 'f');"
pgstac | PL/pgSQL function run_or_queue(text) line 11 at EXECUTE
pgstac | SQL statement "SELECT run_or_queue(
pgstac | format('SELECT update_partition_stats(%L, %L);', _partition, istrigger)
pgstac | )"
pgstac | PL/pgSQL function update_partition_stats_q(text,boolean) line 4 at PERFORM
pgstac | 2024-07-01 20:44:12.047 UTC [141] STATEMENT: SELECT update_partition_stats_q($1);
I understand that concurrent data ingestion into the same partition is not recommended. However, it would be helpful to receive a clear exception when something goes wrong. Additionally, if I'm correct, in this example we are ingesting data into different partitions.
The text was updated successfully, but these errors were encountered:
The
pypgstac load items
command does not handle item loading atomically. There are scenarios where some items fail to load withoutpypgstac
raising any exceptions. This is a serious issue because it creates the illusion that everything has been loaded correctly from the client's perspective, while in reality, some items are missing. Below is a reproducible test case that demonstrates this behavior:items1.njson
anditems2.njson
. Each file contains 10 items with the same datetime, belonging tocollection1
(items1.njson
) and 10 tocollection2
(items2.njson
).Logs indicate a deadlock during the update_partition_stats function execution:
I understand that concurrent data ingestion into the same partition is not recommended. However, it would be helpful to receive a clear exception when something goes wrong. Additionally, if I'm correct, in this example we are ingesting data into different partitions.
The text was updated successfully, but these errors were encountered: