Compaction verification enhancements #60

Merged
merged 9 commits into from
Nov 25, 2024

Contributor

@WillemKauf WillemKauf commented Nov 14, 2024

Adds new flags and validation enhancements to kgo-verifier:

--tombstone-probability, which makes the producer emit tombstone records at random with the probability given by the input parameter.

--compacted, which indicates that the topic to be verified is compacted. The warning about gaps in consumed offsets is not shown when this flag is set.

--validate-latest-values, which, if true, verifies the consumed keys against the latest values produced by a worker. The log should be fully compacted before the consumer is started if this flag is passed as true.

Allows the `producer_worker` and `repeater_worker` to generate tombstone
records.
@@ -108,7 +109,10 @@ func (pw *ProducerWorker) newRecord(producerId int, sequence int64) *kgo.Record
         pw.Status.AbortedTransactionMessages += 1
     }

-    payload := make([]byte, pw.config.messageSize)
+    payload := pw.config.valueGenerator.Generate()
Contributor Author

Not sure why valueGenerator wasn't used here before instead of an empty make([]byte)?

Contributor

😨

nvartolomei previously approved these changes Nov 14, 2024
Contributor

@nvartolomei nvartolomei left a comment

Please make a PR to redpanda and check that it works as expected by pointing kgo at this commit SHA before merging this PR. Otherwise, if we merge and detect a problem, we get "head of line blocking" on making concurrent changes to kgo.

Suggestion: For testing compacted topics I think we can do more. kgo-verifier should keep track of the keys and, at the end, checkpoint the expected set of keys and associated values (the last value produced for each key, minus tombstones). Then during consumption we can track the same and compare at the end.

Or maybe we have a test like this already?

@@ -53,7 +53,7 @@ type ValidatorStatus struct {
     lastLeaderEpoch map[int32]int32
 }

-func (cs *ValidatorStatus) ValidateRecord(r *kgo.Record, validRanges *TopicOffsetRanges) {
+func (cs *ValidatorStatus) ValidateRecord(r *kgo.Record, validRanges *TopicOffsetRanges, compacted bool) {
Contributor

Should this be a field on the ValidatorStatus struct instead? It doesn't change for the entire lifetime anyway, so why pass it as a param?

Contributor Author

WillemKauf commented Nov 14, 2024

> Please make a PR to redpanda and check that it works as expected by pointing kgo at this commit SHA before merging this PR. Otherwise, if we merge and detect a problem, we get "head of line blocking" on making concurrent changes to kgo.

Ack, will do with an upcoming PR.

> Suggestion: For testing compacted topics I think we can do more. kgo-verifier should keep track of the keys and, at the end, checkpoint the expected set of keys and associated values (the last value produced for each key, minus tombstones). Then during consumption we can track the same and compare at the end.

Yes, this is possible. Will push some follow up commits to this PR.

Offset gaps exist in compacted topics. Suppress a warning log
in the read workers if the user passes a `--compacted` flag
as a parameter.
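
A minimal sketch of the idea in this commit message, assuming a hypothetical `validator` type that stores the flag as a struct field (a simplified stand-in, not the real ValidatorStatus): an offset gap produces a warning only when the topic is not compacted.

```go
package main

import "fmt"

// validator is a hypothetical, simplified stand-in for ValidatorStatus.
type validator struct {
	compacted  bool            // set once at startup, e.g. from --compacted
	lastOffset map[int32]int64 // last offset seen per partition
	warnings   []string        // collected instead of logged, for the example
}

func newValidator(compacted bool) *validator {
	return &validator{compacted: compacted, lastOffset: map[int32]int64{}}
}

// observe records an offset and warns about a gap unless the topic is compacted.
func (v *validator) observe(partition int32, offset int64) {
	last, seen := v.lastOffset[partition]
	if seen && offset > last+1 && !v.compacted {
		v.warnings = append(v.warnings,
			fmt.Sprintf("gap in partition %d: %d -> %d", partition, last, offset))
	}
	v.lastOffset[partition] = offset
}

func main() {
	for _, compacted := range []bool{false, true} {
		v := newValidator(compacted)
		for _, off := range []int64{0, 1, 4, 5} {
			v.observe(0, off)
		}
		fmt.Printf("compacted=%v warnings=%d\n", compacted, len(v.warnings))
	}
}
```

Keeping the flag on the struct rather than passing it on every call reflects that it does not change over the lifetime of a worker.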
@WillemKauf
Contributor Author

Force push to:

  • Move compacted flag to the ValidatorStatus struct
  • Add LatestValueProduced and LatestValueConsumed to the ProducerWorkerStatus and ValidatorStatus structs respectively. These nested maps track the latest (and expected) key-value pair per partition for compacted topic verification.
  • Because these are nested map structures, JSON output may grow very large. Alter the String() function for both ProducerWorkerStatus and ValidatorStatus so that they are excluded from output. There will have to be changes on the ducktape side in the redpanda repo to ensure this output isn't in the logs there as well.
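
One possible way to keep a large nested map out of the JSON status output is sketched below. The `status` type, its field layout, and the use of the `json:"-"` struct tag are assumptions for illustration; the actual change to `String()` in this PR may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// latestValues maps partition -> key -> latest value (hypothetical layout).
type latestValues map[int32]map[string]string

// status is a hypothetical, simplified stand-in for ProducerWorkerStatus.
type status struct {
	Sent                int64        `json:"sent"`
	LatestValueProduced latestValues `json:"-"` // skipped by encoding/json
}

// record remembers value as the latest one produced for key in partition.
func (s *status) record(partition int32, key, value string) {
	if s.LatestValueProduced == nil {
		s.LatestValueProduced = latestValues{}
	}
	if s.LatestValueProduced[partition] == nil {
		s.LatestValueProduced[partition] = map[string]string{}
	}
	s.LatestValueProduced[partition][key] = value
	s.Sent++
}

// String renders the status as JSON without the nested maps.
func (s *status) String() string {
	b, err := json.Marshal(s)
	if err != nil {
		return fmt.Sprintf("marshal error: %v", err)
	}
	return string(b)
}

func main() {
	s := &status{}
	s.record(0, "a", "1")
	s.record(0, "a", "2")
	fmt.Println(s) // prints {"sent":2}
}
```

The map stays available in memory for verification while the printed status remains small.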

WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 15, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted` as input parameters,
and `LatestValueProduced`/`LatestValueConsumed` as return values in
producer/worker status structs.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 18, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 18, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted` as input parameters.

Also, use `errors='replace'` in `kafka_cat.py`, to avoid UTF-8 decoding issues
with randomly generated bytes in `kgo-verifier` records.
@WillemKauf WillemKauf changed the title Add --tombstone-probability and --compacted Compaction verification enhancements Nov 18, 2024
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 19, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted` as input parameters.

Also, use `errors='replace'` in `kafka_cat.py`, to avoid UTF-8 decoding issues
with randomly generated bytes in `kgo-verifier` records.
@WillemKauf WillemKauf force-pushed the tombstones branch 2 times, most recently from 0d2c91a to df9a29a Compare November 19, 2024 15:05
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 19, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 21, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
For verification of compacted topics, we can track the last
expected key-value pair that will be seen after the log has been
fully compacted.
To verify the results for a compacted topic, use the flag
`--validate-latest-values` to trigger validation.
@WillemKauf WillemKauf force-pushed the tombstones branch 2 times, most recently from 5ef813d to 65fa3e8 Compare November 22, 2024 14:39
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 22, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
nvartolomei previously approved these changes Nov 25, 2024
pkg/worker/verifier/producer_worker.go (outdated, resolved review thread)
And use it in the verifier workers.

In the case that the topic being consumed from had tombstones produced,
the high watermark may be given by a tombstone record that has been removed.
In trying to consume until this point, sequential readers will become stuck
polling for new records. Persist the last consumable offset in order to
adjust the offset we attempt to read up to in the read workers.
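
A minimal sketch of the adjustment, under stated assumptions: `lastOffsetToRead` is a hypothetical helper, the high watermark is taken to be the next offset to be written, and the persisted last consumable offset is represented as -1 when unknown. How kgo-verifier actually persists and reads this value is not shown in this PR description.

```go
package main

import "fmt"

// lastOffsetToRead picks the final offset a sequential reader should wait for.
// highWatermark is the next offset to be written; lastConsumable is the offset
// of the last record known to be readable, or -1 when unknown. (Hypothetical.)
func lastOffsetToRead(highWatermark, lastConsumable int64) int64 {
	target := highWatermark - 1
	// If the tail of the log was removed (e.g. compacted-away tombstones),
	// stop at the last record that can actually be consumed.
	if lastConsumable >= 0 && lastConsumable < target {
		target = lastConsumable
	}
	return target
}

func main() {
	fmt.Println(lastOffsetToRead(100, 96)) // tail removed: stop at 96
	fmt.Println(lastOffsetToRead(100, -1)) // unknown: fall back to 99
}
```

Without such a cap, a reader that waits for offset 99 would poll indefinitely when offsets 97 through 99 no longer exist.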
There was a race-y access to a map within the producer worker.
Add `OnAcked()` to allow for proper access of the producer's lock
during reading and writing.
@WillemKauf WillemKauf merged commit 27986ea into redpanda-data:main Nov 25, 2024
2 checks passed
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 25, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Nov 26, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Dec 3, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Dec 4, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Dec 5, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Dec 6, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Dec 9, 2024
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.

(cherry picked from commit 85bea2b)