ClickHouse: pause KeeperMap insertion momentarily during backup #252
Conversation
@@ -137,6 +139,7 @@ def get_backup_steps(self, *, context: OperationContext) -> Sequence[Step[Any]]:
    FreezeTablesStep(
        clients=clickhouse_clients, freeze_name=self.freeze_name, freeze_unfreeze_timeout=self.freeze_timeout
    ),
    KeeperMapTablesReadOnlyStep(clients=clickhouse_clients, allow_writes=True),
I feel like we need stronger guarantees that this will run if there is a failure in RetrieveKeeperMapTableDataStep or FreezeTablesStep. Maybe each Step could optionally push a function onto a stack, and on failure all the functions are popped from the stack and run.
Or you could have a step that wraps other steps?
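For illustration, here is a minimal sketch of the "step that wraps other steps" idea; the Step interface shown (plain async callables plus a cleanup callable) is a simplification for the example, not the actual astacus API:

from collections.abc import Awaitable, Callable, Sequence


class CleanupOnFailureStep:
    """Runs the wrapped steps and guarantees a cleanup action runs if any of them fails."""

    def __init__(
        self,
        steps: Sequence[Callable[[], Awaitable[None]]],
        cleanup: Callable[[], Awaitable[None]],
    ) -> None:
        self.steps = steps
        self.cleanup = cleanup

    async def run(self) -> None:
        try:
            for step in self.steps:
                await step()
        except Exception:
            # e.g. re-enable KeeperMap writes before propagating the failure
            await self.cleanup()
            raise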
The only exception the KeeperMapTablesReadOnlyStep should be emitting is ClickHouseClientQueryError, so the handle_step_failure should be noexcept.
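As a rough sketch (not the actual astacus implementation), a "noexcept" failure handler can catch the one expected exception type and log it; ClickHouseClientQueryError is stubbed below to keep the example self-contained:

import logging

logger = logging.getLogger(__name__)


class ClickHouseClientQueryError(Exception):
    """Stub standing in for the real client exception class."""


async def reenable_keeper_map_writes() -> None:
    ...  # would issue the GRANT statements through the ClickHouse client


async def handle_step_failure() -> None:
    # Swallow the only exception the step is expected to raise, so the
    # failure handler itself never propagates an error.
    try:
        await reenable_keeper_map_writes()
    except ClickHouseClientQueryError:
        logger.warning("Could not restore write access to KeeperMap tables")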
await user_client.execute(
    b"INSERT INTO `keeperdata`.`keepertable` SETTINGS wait_for_async_insert=1 SELECT *, materialize(2) FROM numbers(3)"
)
with pytest.raises(ClickHouseClientQueryError, match=".*ACCESS_DENIED.*"):
So we are basically doing this for the "ClickHouse connector for Kafka". I have not read its code, but I suspect this introduces new problems during normal operation.
What exactly-once means: every part is processed, and no part is processed twice.
Before this commit, if there were no backup/restore operations and no errors, the behaviour was exactly-once.
Now, during a backup, some part will not be able to record that its work is done.
How exactly will it behave? In theory it could:
- fail instantly, and someone else repeats the whole processing
- be restarted, and the processing is done again
If so, it is no better than before, because previously this was not happening during normal operation.
The sink connector pauses writes to ClickHouse for these exceptions: https://github.com/ClickHouse/clickhouse-kafka-connect/blob/main/src/main/java/com/clickhouse/kafka/connect/util/Utils.java#L68
If the exception is retriable, the connector won't commit the offsets to the KeeperMap table, and won't try to write that part to ClickHouse until later. The design doc describes the state machine transitions: https://github.com/ClickHouse/clickhouse-kafka-connect/blob/main/docs/DESIGN.md
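To make that behaviour concrete, here is an illustrative sketch (in Python, although the connector itself is Java) of the commit/retry decision described above; the retriable-code set is a placeholder, as the real list lives in Utils.java:

from collections.abc import Callable

RETRIABLE_ERROR_CODES: set[int] = set()  # placeholder; see Utils.java for the real list


def on_insert_result(
    error_code: int | None,
    commit_offsets: Callable[[], None],
    schedule_retry: Callable[[], None],
) -> None:
    if error_code is None:
        commit_offsets()  # success: progress is recorded in the KeeperMap table
    elif error_code in RETRIABLE_ERROR_CODES:
        schedule_retry()  # offsets are not committed; the same records are written later
    else:
        # non-retriable: the task fails instead of retrying
        raise RuntimeError(f"Non-retriable ClickHouse error code: {error_code}")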
Critically, the raised ACCESS_DENIED corresponds to exception code 497, which isn't included in the retriable exceptions list. So we need to change the approach a bit.
statements = [
    privilege_altering_fun(table, escape_sql_identifier(user))
    for table in keeper_map_table_names
    for user in replicated_users_names
- this looks a bit dangerous, because num_users*num_tables may blow up in cases we do not control
- and it is not only potentially slow, it will also grow the replication queue very fast
- that replication queue not only has to be stored, it also has to be replayed by new replicas, repeating all those steps (as I understand it)
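As a quick illustration of the scaling concern: the comprehension issues one statement per (table, user) pair, and each of those becomes a replicated DDL entry. The SQL text below is schematic, not the exact statement built by privilege_altering_fun:

keeper_map_table_names = [f"keeperdata.table_{i}" for i in range(50)]
replicated_users_names = [f"user_{i}" for i in range(100)]

statements = [
    f"REVOKE INSERT ON {table} FROM {user}"  # schematic SQL, for illustration only
    for table in keeper_map_table_names
    for user in replicated_users_names
]
print(len(statements))  # 5000 statements for 50 tables x 100 users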
If the other problems are solved, maybe it makes sense to add a special setting for making KeeperMap read-only?
I don't think that's a problem in practice; we usually have few users and few or no KeeperMap tables.
    for table in keeper_map_table_names
    for user in replicated_users_names
]
await asyncio.gather(*(self.clients[0].execute(statement) for statement in statements))
The effect is not immediate. Ideally, we should wait until this statement propagates to all the replicas.
Right, that makes sense, will update the logic to wait for the effect. Thanks!
I've played around with waiting for the correct grants to appear in the system.grants table, or with using SHOW GRANTS FOR ALL. Unfortunately that doesn't work for inherited grants, since they're computed on the fly and the table is "compressed".
The other solution would be to have a sentinel value that we try to insert into the table, but that's not possible since we don't know the table structure: we would have to go read the types to create a valid INSERT statement.
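For reference, this is roughly what the polling approach described above would look like: query system.grants on every replica until the expected row disappears or a deadline passes. The client interface and quoting are simplified assumptions, and as noted it breaks down for inherited grants:

import asyncio
import time


async def wait_for_revoke_to_propagate(clients, table: str, user: str, timeout: float = 30.0) -> None:
    async def replica_updated(client) -> bool:
        # Naive quoting, for illustration only.
        query = (
            "SELECT count() FROM system.grants "
            f"WHERE access_type = 'INSERT' AND user_name = '{user}' AND table = '{table}'"
        ).encode()
        rows = await client.execute(query)
        return rows[0][0] == 0

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(await asyncio.gather(*(replica_updated(c) for c in clients))):
            return
        await asyncio.sleep(0.5)
    raise TimeoutError("REVOKE did not propagate to all replicas in time")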
Anyhow, there's no good way to wait for a GRANT statement to propagate. Let me know if the updated solution doesn't look too bad (I don't think we can aim for much better).
The PR alters replicated users' privileges (through GRANT & REVOKE) during the backup. This ensures the KeeperMap tables' snapshot is consistent with the frozen table data. [DDB-1237]
Some suggestions that can be covered in a follow-up:
    logger.info("Step %s failed: %s", step, str(e))
except (StepFailedError, WaitResultError) as exc:
    logger.info("Step %s failed: %s", step, str(exc))
    await step.handle_step_failure(cluster, context)
Oh, I was thinking of maybe using a stack, like:
failure_functions = []
for i, step in enumerate(self.steps, 1):
    failure_functions.append(step.handle_step_failure)
    ...
    try:
        ...
    except ...:
        for func in reversed(failure_functions):
            func(cluster, context)
But this is probably good enough.
maybe the except needs to be broader?
Reading the code, if I understand correctly there are two scenarios:
- If the failure is Transient, then the step will be retried, so there's nothing to put on the stack yet.
- If the failure is permanent (an astacus.common.exceptions.PermanentException subclass), then there won't be a next step. The except clause in question sets the step to failed, and higher up the call stack the whole Op is aborted (in this case the BackupOp).
Which case do you think could result in multiple steps needing to be unwound?

> maybe the except needs to be broader?

Looking at the usages of the other PermanentException subclasses, they're either raised during the restore steps or by the Rohmu storage subclasses (which would be caught while the server is being started/configured, I think). Might be worth broadening StepFailedError to PermanentException.
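For concreteness, the broadened clause could look roughly like this; the PermanentException import path is taken from the thread, while the loop structure, step.run_step signature, and whether WaitResultError is already a PermanentException subclass are assumptions rather than verified facts:

import logging

from astacus.common.exceptions import PermanentException

logger = logging.getLogger(__name__)


async def run_steps(steps, cluster, context) -> None:
    for step in steps:
        try:
            await step.run_step(cluster, context)
        except PermanentException as exc:  # previously: (StepFailedError, WaitResultError)
            logger.info("Step %s failed: %s", step, exc)
            await step.handle_step_failure(cluster, context)
            raise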