-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thresholds/v20 #11358
Thresholds/v20 #11358
Conversation
Add support for 'by_flow' track option. This allows using the various threshold options in the context of a single flow. Example: alert tcp ... stream-event:pkt_broken_ack; \ threshold:type limit, track by_flow, count 1, seconds 3600; The example would limit the number of alerts to once per hour for packets triggering the 'pkt_broken_ack' stream event. Implemented as a special "flowvar" holding the threshold entries. This means no synchronization is required, making this a cheaper option compared to the other trackers. Ticket: OISF#6822.
Allow rate_filter and thresholds from the global config to specify tracking "by_flow".
Traffic variables (flowvars, flowbits, xbits, etc) use a smaller int for their type than detection types. As a workaround make sure the values fit in a uint8_t.
Limits propegation checked for DETECT_DEPTH as a content flag, which appears to have worked by chance. After reshuffling the keyword id's it no longer worked. This patch uses the proper flag DETECT_CONTENT_DEPTH.
Thresholding often has 2 stages: 1. recording matches 2. appling an action, like suppress E.g. with something like: threshold:type limit, count 10, seconds 3600, track by_src; the recording state is about counting 10 first hits for an IP, then followed by the "suppress" state that might last an hour. By_src/by_dst are expensive, as they do a host table lookup and lock the host. If many threads require this access, lock contention becomes a serious problem. This patch adds a thread local cache to avoid the synchronization overhead. When the threshold for a host enters the "apply" stage, a thread local hash entry is added. This entry knows the expiry time and the action to apply. This way the action can be applied w/o the synchronization overhead. A rbtree is used to handle expiration.
Packet pointer is not used during allocation.
Add a callback and helper function to handle data expiration. Update datasets to explicitly not use expiration.
Instead of a Host and IPPair table thresholding layer, use a dedicated THash to store both. This allows hashing on host+sid+tracker or ippair+sid+tracker, to create more unique hash keys. This allows for fewer hash collisions. The per rule tracking also uses this, so that the single big lock is no longer a single point of contention. Reimplement storage for flow thresholds to reuse as much logic as possible from the host/ippair/rule thresholds. Ticket: OISF#426.
Use the same hash key as for the regular threshold storage, so include gid, rev, tentant id.
Information: QA ran without warnings. Pipeline 21192 |
if (track == TRACK_SRC) { | ||
addr = p->src.addr_data32[0]; | ||
} else if (track == TRACK_DST) { | ||
addr = p->dst.addr_data32[0]; | ||
} else { | ||
return -1; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPv6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, implementation is IPv4 only right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ticket required? Or implement before merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? It's an optimization, nothing is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreso for completeness. It is not discussed in any of the commits, or the tickets referenced by this PR that this is an IPv4 only optimization, so the lack of IPv6 looks like an oversight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call sites all use PacketIsIPv4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least to me that means where is the IPv6 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can create a ticket if you want
{ | ||
if (SCTIME_CMP_GTE(a->expires_at, b->expires_at)) { | ||
return 1; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unneeded else
SCFree(n); | ||
} | ||
return -1; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unneeded else
} | ||
|
||
/** \brief Check Thread local thresholding cache | ||
* \retval negative value cache miss (-1 miss, -2 expired) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: retval values missing
.retval = retval, | ||
.addr = addr, | ||
.sid = sid, | ||
.expires_at = expires, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Do we need expires_at
and retval
here as they're not used for lookup?
}; | ||
ThresholdCacheItem *found = HashTableLookup(threshold_cache_ht, &lookup, 0); | ||
if (found) { | ||
if (SCTIME_CMP_GT(p->ts, found->expires_at)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: what if it has just timed out? So the =
condition
THRESHOLD_CACHE_RB_REMOVE(&threshold_cache_tree, found); | ||
HashTableRemove(threshold_cache_ht, found, 0); | ||
cache_lookup_miss_expired++; | ||
return -2; // CACHE MISS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: also mention expired?
src/detect-engine-threshold.c
Outdated
typedef struct ThresholdEntry_ { | ||
uint32_t sid; /**< Signature id */ | ||
uint32_t gid; /**< Signature group id */ | ||
int track; /**< Track type: by_src, by_src */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo by_src
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, mostly. Will take another look when it is staged.
Nice thoughts around optimizations!
/* expired, so reset */ | ||
lookup_tsh->tv1 = p->ts; | ||
lookup_tsh->current_count = 1; | ||
if (SCTIME_CMP_LTE(p->ts, entry)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. so =
is considered within the time limit.
Please do the full review at this stage, not when things are staged into a next branch. |
ok. Will take another look tomorrow. Thank you. This is big work. |
.key[TC_GID] = gid, | ||
.key[TC_REV] = rev, | ||
.key[TC_TENANT] = p->tenant_id, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be vlan aware here? And elsewhere with respect to thresholding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created https://redmine.openinfosecfoundation.org/issues/7125 for this
It's an issue in existing code. I do not intend to address it in this work.
comments addressed in #11388 |
SV_BRANCH=OISF/suricata-verify#1938
#11324 rebased