Here I'd like to share some findings from an attempt to integrate Hyperscan using an `@rx` plugin and Hyperscan 5.4, in the hope that they are useful for a future implementation.
Our findings:

- Bare Hyperscan does not work with a significant number of `SecRule` patterns in the OWASP Core Rule Set or most other rulesets. It fails on basic things like `!@rx ^$` with the error: `Start of match is not currently supported for patterns which match an empty buffer`.
- We then tried Chimera, which solved the regex support problem.
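As the error above shows, with start-of-match reporting enabled Hyperscan refuses patterns that can match an empty buffer. One way a plugin could avoid failing at compile time is to detect such patterns up front and route them to a fallback engine (Chimera, or Go's own `regexp`). The sketch below is only an approximation: it uses Go's RE2-based `regexp` as the oracle, which is not identical to PCRE, so patterns RE2 cannot parse (e.g. lookbehinds) are also sent to the fallback. The function name `splitByEmptyMatch` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitByEmptyMatch is a hypothetical pre-compilation step. It separates
// patterns that match the empty string (or that RE2 cannot parse at all)
// from the rest, so only the latter go into a Hyperscan database compiled
// with start-of-match reporting.
func splitByEmptyMatch(patterns []string) (hyperscanOK, fallback []string) {
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil || re.MatchString("") {
			// PCRE-only syntax, or matches "": route to Chimera / Go regexp.
			fallback = append(fallback, p)
			continue
		}
		hyperscanOK = append(hyperscanOK, p)
	}
	return hyperscanOK, fallback
}

func main() {
	ok, fb := splitByEmptyMatch([]string{`^$`, `(?i)union\s+select`, `a*`, `(?<=x)y`})
	fmt.Println("hyperscan:", ok)
	fmt.Println("fallback:", fb)
}
```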
Our initial implementation was roughly based on the patch, adapted to the Chimera Go API:
```go
import (
	"bytes"
	"sync/atomic"

	"github.com/flier/gohs/chimera" // assumed: the gohs Chimera bindings
)

var globalHandler chimera.HandlerFunc = func(id uint, from, to uint64, flags uint, captured []*chimera.Capture, ctx interface{}) chimera.Callback {
	// Type-assert ctx to *HandlerContext
	handlerCtx, ok := ctx.(*HandlerContext)
	if !ok {
		return chimera.Terminate // Stop scanning if ctx is not of the expected type
	}
	// Access the counter and matches fields from handlerCtx
	if atomic.LoadInt32(handlerCtx.counter) >= 10 {
		*handlerCtx.matches = true
		return chimera.SkipPattern // Stop after 10 matches
	}
	atomic.AddInt32(handlerCtx.counter, 1)
	// Capture the whole line containing the match if capturing is enabled
	if handlerCtx.tx.Capturing() {
		bts := handlerCtx.bts
		start := bytes.LastIndexByte(bts[:from], '\n')
		if start == -1 {
			start = 0 // no newline before the match: start of buffer
		} else {
			start++ // skip the newline itself
		}
		// Check IndexByte's -1 before adding the offset, otherwise the
		// "no newline after the match" case is never detected
		end := len(bts)
		if i := bytes.IndexByte(bts[to:], '\n'); i != -1 {
			end = int(to) + i
		}
		handlerCtx.tx.CaptureField(int(*handlerCtx.counter), string(bts[start:end]))
	}
	return chimera.Continue
}
```
This was followed by a call to `o.db.Database.Scan(bts, scratch, globalHandler, handlerCtx)`.
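The capture step in the handler extracts the whole line around a match. Isolated as a pure function, it is easier to test; the helper name `lineAround` is hypothetical. Computing `end` as `int(to) + bytes.IndexByte(...)` before the `-1` check would miss the case where no newline follows the match (the sum becomes `int(to) - 1`), so this version checks the raw index first.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineAround returns the full line of data that contains the match
// [from, to). The -1 checks are done on the raw LastIndexByte/IndexByte
// results, before any offset is added.
func lineAround(data []byte, from, to uint64) string {
	start := bytes.LastIndexByte(data[:from], '\n')
	if start == -1 {
		start = 0 // no newline before the match: start of buffer
	} else {
		start++ // skip the newline itself
	}
	end := len(data) // no newline after the match: end of buffer
	if i := bytes.IndexByte(data[to:], '\n'); i != -1 {
		end = int(to) + i
	}
	return string(data[start:end])
}

func main() {
	body := []byte("a=1\nq=union select\nb=2")
	// Suppose the engine reported a match for "union" at bytes [6, 11).
	fmt.Printf("%q\n", lineAround(body, 6, 11))
}
```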
I am not sharing the complete plugin code, because it performs worse than even Go's native `regexp` package. And this is where things get interesting:
Hyperscan is faster only if the relevant patterns are pre-compiled into a single Hyperscan database.
The patch I referenced, and the first version of our `@rx` plugin, allocated a separate Hyperscan database per pattern, which goes against Hyperscan's core design principle. This approach is slower, so we looked into how a shared Hyperscan database could be used with Coraza.
A second iteration of the `@rx` plugin simply registered all patterns into a single Hyperscan database. This degraded runtime performance even further: the "bloated" database scanned every data point against irrelevant patterns, and `Evaluate` was still invoked for each pattern at runtime anyway.
So several things need to be accounted for to make this faster, and they are in Coraza itself. Primarily, batching the patterns that inspect the same data point into a single Hyperscan database, per phase, e.g.:
These rules' patterns can be batched into a Hyperscan database because they:

- all inspect a single data point, `REQUEST_URI`
- belong to the same phase
- are not chained
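The three batching criteria above can be sketched as a grouping step at rule-load time. This is a simplified, hypothetical model: the `rule` struct is not Coraza's internal rule type, and real rules can target several variables, which this sketch ignores. Non-chained rules sharing a phase and variable land in one batch (one database each); chained rules are set aside for sequential evaluation, as today.

```go
package main

import (
	"fmt"
	"sort"
)

// rule is a hypothetical, heavily simplified view of a SecRule with @rx.
type rule struct {
	ID       int
	Phase    int
	Variable string // e.g. "REQUEST_URI"
	Pattern  string
	Chained  bool
}

// batchKey identifies one would-be Hyperscan database.
type batchKey struct {
	Phase    int
	Variable string
}

// groupRules batches non-chained rules by (phase, variable) and returns
// chained rules separately for sequential evaluation.
func groupRules(rules []rule) (map[batchKey][]rule, []rule) {
	batches := make(map[batchKey][]rule)
	var sequential []rule
	for _, r := range rules {
		if r.Chained {
			sequential = append(sequential, r)
			continue
		}
		k := batchKey{Phase: r.Phase, Variable: r.Variable}
		batches[k] = append(batches[k], r)
	}
	return batches, sequential
}

func main() {
	rules := []rule{
		{ID: 1, Phase: 1, Variable: "REQUEST_URI", Pattern: `\.\./`},
		{ID: 2, Phase: 1, Variable: "REQUEST_URI", Pattern: `(?i)union\s+select`},
		{ID: 3, Phase: 2, Variable: "REQUEST_URI", Pattern: `<script`},
		{ID: 4, Phase: 1, Variable: "REQUEST_URI", Pattern: `etc/passwd`, Chained: true},
	}
	batches, sequential := groupRules(rules)
	keys := make([]batchKey, 0, len(batches))
	for k := range batches {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].Phase < keys[j].Phase })
	for _, k := range keys {
		fmt.Println(k.Phase, k.Variable, len(batches[k]), "patterns")
	}
	fmt.Println("sequential:", len(sequential))
}
```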
But from what I can tell looking at the Coraza code, rules are currently processed sequentially. How feasible would it be to pre-aggregate patterns in order to support Hyperscan's "multiple patterns in a database" design requirement? Or does that completely contradict how ModSecurity rulesets must be processed? The immediate roadblocks I see are:
- Data transformations would require separate scans / Hyperscan databases
- Aggregating patterns that come from chained rules
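For the transformation roadblock, one option is to make the ordered transformation pipeline part of the batch key, so patterns share a database only when they scan exactly the same transformed bytes. A minimal sketch; `batchKey` here is a hypothetical helper and its string format an arbitrary choice.

```go
package main

import (
	"fmt"
	"strings"
)

// batchKey combines phase, variable and the ordered transformation pipeline.
// Order is kept as-is because transformations do not generally commute.
func batchKey(phase int, variable string, transforms []string) string {
	return fmt.Sprintf("%d|%s|%s", phase, variable, strings.Join(transforms, ","))
}

func main() {
	a := batchKey(2, "ARGS", []string{"urlDecodeUni", "lowercase"})
	b := batchKey(2, "ARGS", []string{"lowercase", "urlDecodeUni"})
	c := batchKey(2, "ARGS", []string{"urlDecodeUni", "lowercase"})
	fmt.Println(a == b, a == c)
}
```

Keeping the order matters: with `lowercase` first, `%4A` becomes `%4a` and then decodes to `J`, while decoding first yields `J` and lowercasing then yields `j`.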
Hey! Thank you for your detailed report.
Indeed, we know we would benefit from this, but we have a minor setback: most of the core team uses M1/M2 MacBooks, so we cannot test this locally. Hyperscan breaks compatibility with ARM, and it doesn't scale performance on AMD.
Regarding your implementation, I was looking at a similar approach integrated with our MEMOIZE feature, but I believe we would have to create a regex service that manages a regex pool asynchronously.
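A regex pool of this kind could start as a compile-once cache shared by all rules and transactions. The sketch below is synchronous and simplified: `regexPool` and `poolEntry` are hypothetical names, it is not Coraza's MEMOIZE implementation, and Go's `regexp` stands in for compiled Hyperscan databases. Concurrent callers asking for the same pattern wait on one `sync.Once`, so each pattern is compiled exactly once; `*regexp.Regexp` is safe for concurrent use.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexPool is a hypothetical compile-once cache keyed by pattern text.
type regexPool struct {
	mu      sync.Mutex
	entries map[string]*poolEntry
}

type poolEntry struct {
	once sync.Once
	re   *regexp.Regexp
	err  error
}

// get returns the shared compiled pattern, compiling it on first use.
func (p *regexPool) get(pattern string) (*regexp.Regexp, error) {
	p.mu.Lock()
	e, ok := p.entries[pattern]
	if !ok {
		e = &poolEntry{}
		p.entries[pattern] = e
	}
	p.mu.Unlock()
	// Compile outside the map lock so slow patterns don't block others.
	e.once.Do(func() { e.re, e.err = regexp.Compile(pattern) })
	return e.re, e.err
}

func main() {
	pool := &regexPool{entries: map[string]*poolEntry{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			re, _ := pool.get(`(?i)union\s+select`)
			_ = re.MatchString("1 UNION SELECT 2")
		}()
	}
	wg.Wait()
	fmt.Println("distinct compiled patterns:", len(pool.entries))
}
```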
This discussion is open, and I'm personally interested in this feature.
I have the same setback, but I used GoLand and set it up to compile/run the code on a remote Intel VPS. That said, it looks like it would be much easier to go with https://github.com/VectorCamp/vectorscan, which supports more platforms, including ARM.
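If a plugin shipped more than one `@rx` engine, it could pick one per architecture so that ARM laptops still build and run. A minimal sketch; `backendFor` and the engine labels are hypothetical, and the mapping is an assumption (Hyperscan targets x86, Vectorscan also builds on ARM, anything else keeps Go's `regexp`).

```go
package main

import (
	"fmt"
	"runtime"
)

// backendFor maps a GOARCH value to a hypothetical engine label.
func backendFor(goarch string) string {
	switch goarch {
	case "amd64":
		return "hyperscan"
	case "arm64":
		return "vectorscan"
	default:
		return "go-regexp"
	}
}

func main() {
	fmt.Println(runtime.GOARCH, "->", backendFor(runtime.GOARCH))
}
```

In a real build, the cgo-backed engines would more likely be gated with build constraints such as `//go:build amd64 && cgo`, so a pure-Go build on other platforms does not need the C library installed.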
Coraza would greatly benefit from Hyperscan support, which would essentially solve issues like "Performance drop with larger request body".