Updated Code for #758 #2297

Open
wants to merge 48 commits into
base: next

johann8384
Member

Fixes #758

Eduardo95 and others added 30 commits December 18, 2018 09:54
#1458)

* For branch next, add an expression function named FirstDifference, which calculates the first difference of a time series. I noticed there is MovingAverage calculation, so I thought maybe I can enrich the mathematics functions into that.

* add some unit tests for FirstDifference
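
As a rough illustration of what a first difference computes, here is a minimal Python sketch. It is not the Java `FirstDifference` expression function from this commit; the function name and the `(timestamp, value)` tuple layout are assumptions made for the example. How the real implementation treats the first point is not stated here, so this sketch simply drops it.

```python
def first_difference(points):
    """Return (timestamp, delta) pairs for consecutive points.

    `points` is a list of (timestamp, value) tuples sorted by timestamp.
    The first input point has no predecessor, so this sketch drops it
    (an assumption, not necessarily what the Java function does).
    """
    return [
        (ts, value - prev_value)
        for (_, prev_value), (ts, value) in zip(points, points[1:])
    ]


series = [(0, 10.0), (60, 12.5), (120, 11.0), (180, 11.0)]
diffs = first_difference(series)
```

Each output value is the change since the previous point, so a flat stretch of the series maps to zero.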
- Add RpcResponder for handling callbacks asynchronously
UTILS:
 - Add two convenient methods in Config

Signed-off-by: Chris Larsen <[email protected]>
…#2034)

* Jackson has a serious security problem in 2.9.5, which will cause RCE

FasterXML/jackson-databind#2295

* Jackson has a serious security problem in 2.9.5, which will cause RCE

FasterXML/jackson-databind#2295

Co-authored-by: chi-chi weng <[email protected]>
* Make UniqueIdRpc aware of the mode

* Update javadoc on new method and rename test methods to be more descriptive

Co-authored-by: Simon Matic Langford <[email protected]>
Co-authored-by: Itamar Turner-Trauring <[email protected]>
Co-authored-by: Itamar Turner-Trauring <[email protected]>
* Jackson has a serious security problem in 2.9.5, which will cause RCE

FasterXML/jackson-databind#2295

* Jackson has a serious security problem in 2.9.5, which will cause RCE

FasterXML/jackson-databind#2295

Co-authored-by: chi-chi weng <[email protected]>
Enhanced check_tsd script evaluates each individual metric group separately when given a filter
* Fix SaltScanner race condition on spans maps

* Fix 1.6 compatibility
Synchronises the list that holds the KeyValues that have been produced
by the scanner callbacks. The list is accessed from multiple threads at
a time and wasn't thread-safe, causing inconsistent results and partial
loss of data in the response.

Relates to: #1753
Resolves: #1760
…x filter

and properly ignore rows that don't match the explicit filter. Also sort the
fuzzy filter list in ascending order and implement a static comparator instead
of instantiating one on each call.
Fixes a concurrency bug where scanners report their results into a map
and would overwrite each other's results

Resolves: #1753
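
The two scanner fixes above share one idea: results produced by concurrent scanner callbacks must be merged into the shared collection under synchronisation, and merged rather than assigned. A minimal Python sketch of that idea follows. It is not the Java `SaltScanner` code; `ScanResults`, `report`, and `run_scanners` are hypothetical names used only for illustration.

```python
import threading
from collections import defaultdict


class ScanResults:
    """Collects rows reported by several scanner threads (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = defaultdict(list)

    def report(self, row_key, key_values):
        # Merge under a lock instead of assigning, so a scanner that
        # reports an already-seen row key never replaces earlier data.
        with self._lock:
            self._rows[row_key].extend(key_values)

    def snapshot(self):
        # Copy under the lock so callers never see a half-updated map.
        with self._lock:
            return {key: list(values) for key, values in self._rows.items()}


def run_scanners(num_scanners=8, rows_per_scanner=100):
    results = ScanResults()

    def scan(scanner_id):
        for i in range(rows_per_scanner):
            # Row keys deliberately collide across scanners.
            results.report(f"row-{i % 10}", [(scanner_id, i)])

    threads = [
        threading.Thread(target=scan, args=(n,)) for n in range(num_scanners)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.snapshot()


merged = run_scanners()
total = sum(len(values) for values in merged.values())
```

With the lock and the merge in place, every reported value survives even though all scanners write to the same ten row keys.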
* renamed instancename of logger

The previous name was copied from another script, cosmetic change only

* Change behaviour of --ignore-recent option

The previous behaviour fetched data from OpenTSDB from --duration seconds
ago to time.now() and then tried to remove timestamps that were inside the
last --ignore-recent seconds. The logic was flawed, however, and it
actually kept only those seconds. OpenTSDB supports an "end" parameter,
so we use it to fetch only the data we want.

For example, -d 180 -I 80 renders a query parameter that looks like
`?start=180s-ago&end=80s-ago`. Keeps it simple.

Also added debug logging to output the actual query sent to OpenTSDB if
the --debug option is enabled.
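
The window construction described above could look roughly like this. This is a sketch only, not the script's actual code: `build_query_params` is a hypothetical name, and the check that `ignore_recent` is smaller than `duration` is an assumption added for the example.

```python
from urllib.parse import urlencode


def build_query_params(duration, ignore_recent=0):
    """Build start/end parameters for a relative-time window.

    `duration` and `ignore_recent` are in seconds, mirroring the -d and -I
    options. The function name and the validation are hypothetical.
    """
    if ignore_recent >= duration:
        raise ValueError("ignore_recent must be smaller than duration")
    params = {"start": f"{duration}s-ago"}
    if ignore_recent > 0:
        # Let the server cut off the most recent data instead of
        # filtering timestamps on the client.
        params["end"] = f"{ignore_recent}s-ago"
    return urlencode(params)


query = "?" + build_query_params(180, 80)
```

Here `query` reproduces the `?start=180s-ago&end=80s-ago` form quoted in the commit message.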

* fixed logic of --percent-over parameter

The previous logic was wrong and would set "crit" or "warn" to True
regardless. This change fixes that.

* better output from logging

Made log messages consistent across alerting scenarios and changed the
format of some floats. Fixed a log message that displayed the "crit" value
where it should have displayed the "warn" value.

* Fixed bug in logic that parses results

Removed an if statement that `continue`d the for-loop when a result was
neither `crit` nor `warn`. That check also made the logic skip the test
for the case where OpenTSDB returned no values and the -A flag was
specified to alert in such scenarios.

* changed check for timestamps type

Previous behaviour was to check if a timestamp could be cast as a float,
which is a bit weird, because opentsdb will return integers.

I doubt that OpenTSDB would return a timestamp that is not an integer
to begin with, so I suspect this check is redundant, but I am leaving it
in for now regardless, as per the discussion in the PR.
manolama and others added 18 commits October 26, 2020 17:49
* Add an SLA config flag for rollup intervals

Adds a configuration option for rollup intervals to specify their
maximum acceptable delay. Queries that cover a time between now and that
maximum delay will need to query other tables for that time interval.

* Add global config flag to enable splitting queries

Adds a global config flag to enable splitting queries that would hit the
rollup table when the rollup table has a delay SLA configured.
In that case, this feature allows splitting a query into two: one that
gets the data from the rollups table until the time where it's
guaranteed to be available, and one that gets the rest from the raw table.

* Add a new SplitRollupQuery

Adds a SplitRollupQuery class that supports splitting a rollup query into
two separate queries.
This is useful for when a rollup table is filled by e.g. a batch job
that processes the data from the previous day on a daily basis. Rollup
data for yesterday will then only be available some time today. This
delay SLA can be configured on a per-table basis. The delay would
specify by how much time the table can be behind real time.

If a query comes in that would query data from that blackout period
where data is only available in the raw table, but not yet guaranteed to
be in the rollup table, the incoming query can be split into two using
the SplitRollupQuery class. It wraps a query that queries the rollup
table until the last guaranteed to be available timestamp based on the
SLA; and one that gets the remaining data from the raw table.
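
The time arithmetic behind such a split can be sketched as follows. This is an illustration in Python, not the Java `SplitRollupQuery` class; the function name, its parameters, and the choice to use the boundary as the end of one part and the start of the other are all assumptions.

```python
DAY_MS = 24 * 60 * 60 * 1000


def split_rollup_query(start_ms, end_ms, now_ms, sla_delay_ms):
    """Split [start_ms, end_ms] at the last time the rollup table is
    guaranteed to be complete.

    Returns (rollup_range, raw_range); either may be None when the whole
    query falls on one side of the boundary. All names are hypothetical.
    """
    boundary = now_ms - sla_delay_ms
    if end_ms <= boundary:
        # Fully covered by the rollup table.
        return (start_ms, end_ms), None
    if start_ms >= boundary:
        # Entirely inside the blackout period: raw table only.
        return None, (start_ms, end_ms)
    return (start_ms, boundary), (boundary, end_ms)


now = 10 * DAY_MS
rollup_part, raw_part = split_rollup_query(now - 3 * DAY_MS, now, now, DAY_MS)
```

With a one-day SLA, a three-day query is served from the rollup table for the first two days and from the raw table for the last day.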

* Extract an AbstractQuery

Extracts an AbstractQuery from the TsdbQuery implementation since we'd
like to reuse some parts of it in other Query classes (in this case
SplitRollupQuery)

* Extract an AbstractSpanGroup

* Avoid NullPointerException when setting start time

Avoids a NullPointerException that happened when we were trying to set
the start time on a query that would be eligible to split, but due to
the SLA config only hit the raw table anyway.

* Scale timestamps to milliseconds for split queries

Scales all timestamps for split queries to milliseconds. It's important
to maintain consistent units between all the partial queries that make
up the bigger one.
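
A unit normalisation of this kind might be sketched as below. This Python snippet is not the code from this commit; it assumes the convention that a timestamp fitting in 32 bits is in seconds and anything larger is already in milliseconds, and `to_millis` is a hypothetical helper name.

```python
# Bits that are set only when a value does not fit in 32 bits.
SECOND_MASK = 0xFFFFFFFF00000000


def to_millis(timestamp):
    """Return `timestamp` in milliseconds.

    Assumes values that fit in 32 bits are seconds and larger values are
    already milliseconds (an assumption stated for this sketch).
    """
    if timestamp & SECOND_MASK:
        return timestamp
    return timestamp * 1000


start = to_millis(1_600_000_000)      # given in seconds
end = to_millis(1_600_000_060_000)    # already in milliseconds
```

After normalisation both bounds are in the same unit, so the partial queries can be compared and stitched together safely.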

* Fix starting time error for split queries

Fixes a bug that would happen when the start time of a query aligns
perfectly with the time configured in the SLA for the delay of a rollup
table.
For a defined SLA, e.g. 1 day, if the start time of the query was
exactly 1 day ago, the end time of the rollups part of the query would
be updated and then be equal to its start time. That isn't allowed and
causes a query exception.
* Allow end_time = start_time to be able to query and delete a single datapoint.

* Updating tests to support start and end time being the same

Co-authored-by: Øyvind Matheson Wergeland <[email protected]>
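
The relaxed check amounts to rejecting only ranges that end before they start. A minimal Python sketch, assuming a hypothetical `validate_time_range` helper rather than the actual Java validation code:

```python
def validate_time_range(start_time, end_time):
    """Reject ranges that end before they start; equal bounds are allowed
    so that a single datapoint can be queried or deleted."""
    if end_time < start_time:
        raise ValueError(
            f"end time {end_time} is before start time {start_time}"
        )
    return start_time, end_time


single_point = validate_time_range(1_600_000_000, 1_600_000_000)
```

A stricter `end_time <= start_time` comparison would reject the single-datapoint case shown here.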
* Start of new HTTP API.

* Start of new "is this table available across all/some regions" API.

* Finish plausibly OK implementation of checking table availability.

* Finish plausibly ok Telnet/HTTP status endpoint.

* It compiles.

* Minimal live test works (for success case).

* Set a timeout on RPC queries.

* Unit test for the HTTP status RPC query.

* Start tests for checkNecessaryTablesAvailability().

* Checkpoint. Not working, probably because of old Mockito.

* Fix API usage.

* Got the test to pass.

* More unit tests.
* added tsdb_list_running_queries.py

* updated error handling

* updated tsdb_list_running_queries.py

* various cleanup, removal of old code etc

* switched to custom ConnectionException

* added comment for now_length

Co-authored-by: Hari Sekhon <[email protected]>
* Made HTTP Request method checking consistent, fixes a few cases where behavior is unexpected.
Simplified loading of internal RPC Handlers
Stop Sending BAD_REQUEST response as a PNG, allowed random code execution!

Fixes #793
Fixes #781
Fixes #831
Fixes #830

* Fixes for #831, #830, #781, #793
@johann8384 johann8384 changed the base branch from master to next September 26, 2024 17:22
@johann8384 johann8384 force-pushed the next branch 2 times, most recently from a47781e to e6dd3f3 Compare December 12, 2024 18:29