feat: additional options to allow for batching and asynchronous batch handling for BroadwayAdapter (#103)

## Related Ticket(s)

SIGNAL-7088

## Checklist

- [x] Code conforms to the [Elixir Styleguide](https://github.com/christopheradams/elixir_style_guide)

## Problem

Using Kafee's BroadwayAdapter with the FirehoseConsumer in wms-service, I ran a comparison test with a similar dummy handler for ProcessAdapter alongside a dummy handler for BroadwayAdapter. The ProcessAdapter version took 7 seconds with a LOT of DBConnection pool errors, while the BroadwayAdapter version took 66 seconds (at 12 partitions). That is roughly a 10x difference, so a crude way to match would be to raise the partition count from 12 to 120.

Naturally, I then raised the partition count to 120 while keeping consumer_concurrency at 12, and it did help: the run took 44 seconds. Then I raised consumer_concurrency to 120 as well, and started hitting DBConnection pool errors because my local DB couldn't support 120 pooled connections, even after raising the limit in dev.exs. This "vanilla" approach to scaling therefore won't work.

## Details

My first thought was to try naively tweaking the batch config values: concurrency, batch size, or the partitioning function. The partitioning function (to override Kafka's partitioning) is locked down by BroadwayKafka, so that isn't possible; the number of concurrent processes used is capped at the number of partitions, which keeps processing within Kafka's ordering rules.
But regardless, [batching](https://hexdocs.pm/broadway/Broadway.html#module-batching) does expose a place where a chunked operation can happen. Our events are fairly idempotent and not bound by strict rules of chronological ordering - this is already battle tested. The current ProcessAdapter runs the event handlers [asynchronously already](https://github.com/stordco/wms-service/blob/main/lib/warehouse/helpers.ex#L22). If we use batching, we mimic exactly the pattern ProcessAdapter goes through:

* ProcessAdapter records the events as they happen for a given request_id, in chronological order, in its process state, then runs them through the event handlers asynchronously.
* BroadwayAdapter, with this PR, can group messages arriving in chronological order from partitions into batches, and a new configuration option allows the messages in each batch to be handled asynchronously.

For simplicity of code paths, the pragmatic approach is chosen: messages always go through a default batching step with a size of 1, unless overriding config options are passed.

### Local test result

Using the same 400-threshold automation config trigger (see the description in this [PR](stordco/wms-service#4156)), running asynchronously with the following settings took **7 seconds (previously 66 seconds)** to process the same number of events, with NO DBConnection errors popping up! The batch options used were:

```elixir
batching: [
  concurrency: System.schedulers_online() * 2,
  size: 100,
  timeout: 500,
  async_run: true
]
```
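To make the async batch handling concrete, here is a minimal sketch of what handling a batch asynchronously can look like inside a Broadway `handle_batch/4` callback. This is illustrative only - the module and `handle_event/1` helper are hypothetical and not Kafee's actual internals - but it mirrors the `async_run: true` behaviour described above: fan the batch out over a bounded pool of tasks, then hand the messages back to Broadway.

```elixir
defmodule ExampleBatchHandler do
  @moduledoc """
  Illustrative sketch (not Kafee's actual implementation) of processing a
  Broadway batch asynchronously, similar in spirit to `async_run: true`.
  """

  # In a real Broadway consumer this would be the `handle_batch/4` callback.
  def handle_batch(_batcher, messages, _batch_info, _context) do
    messages
    |> Task.async_stream(
      fn message ->
        # Hypothetical per-message handler; replace with real event handling.
        handle_event(message.data)
      end,
      max_concurrency: System.schedulers_online() * 2,
      timeout: :timer.seconds(5)
    )
    |> Stream.run()

    # Broadway expects the (possibly updated) list of messages back.
    messages
  end

  defp handle_event(_data), do: :ok
end
```

Ordering within the batch is deliberately given up here, which is acceptable because the events are idempotent; `max_concurrency` keeps the fan-out bounded so downstream resources like the DBConnection pool are not overwhelmed.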