Using Benthos as a Conduit Processor #1614
Replies: 3 comments 20 replies
-
It was briefly mentioned in the original issue that this would face challenges, as standalone processors need to be WASM. Here is an article from the Benthos developer about how to compile Benthos to WASM; perhaps the same approach could work here? https://www.benthos.dev/blog/2019/05/27/compiling-benthos-to-wasm/
-
This is definitely an interesting idea. While Conduit provides a set of built-in processors (including a JavaScript processor) and a Go SDK for standalone processors, it's currently missing an easy way to enrich data in the pipeline. This is where I think Benthos could provide a lot of value. It could also be valuable to users who are already familiar with Benthos and would like to use Bloblang to process their data.
Sadly, we'll hit an issue if we try to tackle this as a standalone processor. Standalone processors currently run in a constrained WASM sandbox and don't have access to the file system or the network. This makes enrichment pretty much impossible, with or without Benthos. To solve this we'd have to bind system calls for the standard library. In other words, implementing a Benthos standalone processor would require significant effort to enable the needed functionality.
However, there's another way: a built-in processor. A built-in processor runs inside Conduit directly (regular Go code), so it doesn't have any of the WASM sandbox limitations. This approach could work, although we don't plan to tackle it in the next few months, so I'll give you some pointers in case you want to give it a go yourself. To create a fully functional Benthos processor, you would have to create your own entrypoint (like we have here) and add the Benthos processor to the global map of built-in processors before starting Conduit.
Let us know if this was helpful and whether we can provide any further guidance, in case you decide to give it a go.
-
I use https://github.com/calmera/jetscript. It's NATS and Benthos together with Bloblang. If Conduit CDC can output to NATS, then JetScript can react to that, so NATS is the bus between Conduit and Benthos in this case.
https://github.com/julien040/anyquery is like Conduit, but keeps a SQL DB of the changed records, allowing you to query it. Ironically, you could build a connector in Conduit to do CDC on the AnyQuery SQLite DB records. Just like Conduit CDC, AnyQuery can fire out to NATS, and so JetScript can react to those messages in NATS using Benthos.
The Benthos scripts are themselves stored in NATS as well, so you can easily scale out 1,000s of servers that just catch up to the scripts and then the messages. It's just like how devs store binaries in NATS: when a new server provisions, it asks NATS for them and then starts processing the messages sent to it. When the server dies or whatever, the work naturally falls back on the NATS system for another day.
I really like using NATS for everything :) You can build rings of NATS like we build rings of caches.
-
I'm relatively new to these sorts of tools, but it seems to me that Conduit and Benthos are somewhat redundant as they are both stream processors. As such, it seems somewhat silly to use them together (as with the PoC Benthos Connector - conduitio-labs/conduit-connector-benthos#4) - better to choose one.
The main difference between them seems to be that Conduit is much more focused on CDC from data sources/stores/bases, while Benthos is much more focused on the actual pipeline processing/transformation: it has many dozens of processors while Conduit has only a handful. Benthos processors also allow for enrichment via SQL queries, NATS KV, etc.
It seems to me that Conduit's universal OpenCDC format is far more fundamental/important, since a pipeline ultimately needs to start from some data source, and Conduit should therefore be used as the main tool. But it would be a shame not to leverage the immense processing power of Benthos.
So, what I'm thinking is: rather than using Benthos as a Conduit source/destination, as was attempted in this repo, why not just embed its pipeline processors into Conduit as a standalone processor? It could have some sort of Bloblang-based mechanism for choosing the desired Benthos processors.
This would allow Conduit to focus on its strength of CDC while leveraging Benthos' strength in stream processing. You could, of course, always write other custom processors in Go or JavaScript to suit your needs (or probably even reuse existing Benthos custom processors).
It's a topic that has been brought up various times in Benthos' GitHub and Discord, and it's generally answered with the following links:
Apparently this can be used to embed Benthos into a golang app/binary
https://pkg.go.dev/github.com/benthosdev/benthos/v4/public/service#example-package-StreamBuilderConfig
One more example of that API here: redpanda-data/connect#1727 (comment)
And here's a repo that apparently has relevant examples
https://github.com/benthosdev/benthos-plugin-example
Thoughts?