Releases: redpanda-data/redpanda
v24.2.14
Bug Fixes
- Fixes a bug in which a segment being rolled and closed could race, leading to a triggered `vassert`. by @WillemKauf in #24559
Improvements
- Added metrics for pandaproxy resource usage. by @IoannisRP in #24604
- Show leader id in `/v1/cluster/partitions` response. by @ztlpn in #24584
Full Changelog: v24.2.13...v24.2.14
v24.3.2
Features
- Improve the user messages when the `topic_partitions_reserve_shard0` cluster config is used and a user tries to create a topic with more partitions than the core-based partition limit. by @pgellert in #24461
Bug Fixes
- Ensure the `redpanda_cloud_storage_cloud_log_size` metric is consistent across all replicas. It was previously updated only seldom, and only from the leader replica, which led to inconsistent/stale values. by @nvartolomei in #24364
- Fixed a bug in which sliding window compaction may become stuck on failing to build an index map for a single segment. by @WillemKauf in #24424
- Fixes a bug in which a segment being rolled and closed could race, leading to a triggered `vassert`. by @WillemKauf in #24560
- Fixes a bug in which segments which may have tombstones in them were not considered eligible for self-compaction. by @WillemKauf in #24500
- Fixes a bug that could prevent topic recovery on ABS object storage when there are objects in a bucket from multiple clusters (e.g. following a whole cluster restore). by @andrwng in #24455
- Fixes a bug where `rpk` wasn't parsing `--help` when used alongside `--redpanda-id` in `rpk cloud <provider> byoc apply` by @r-vasquez in #24396
- Fixes a bug where serializing manifests for Iceberg topics with decimal fields could cause Redpanda to crash or upload invalid manifests by @oleiman in #24467
- Fixes a crash resulting from incorrect cleanup of log readers used for iceberg translation. by @bharathv in #24576
- Fixes a race that could prevent Iceberg translation from happening following a leadership change. by @andrwng in #24562
- Fixes accounting of the iceberg commit lag metric, which could remain erroneously high in some cases even though the translation is fully caught up. Additionally, the change ensures that only partition leaders emit lag metrics while followers emit 0 lag. by @bharathv in #24575
- If a discrete disk is used for the cloud storage cache, Redpanda previously rejected writes if that disk (the cache disk) was full (in degraded state). This was incorrect since the cache disk isn't in the way of writes. From now on, writes are rejected only if the data disk is full (in degraded state). by @nvartolomei in #24486
- #24428 Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types. by @pgellert in #24430
- #24587 Redpanda will now permit topics to be created with `redpanda.remote.[read|write]` set to `true` when a license is expired or missing provided that the cluster config `cloud_storage_enabled` is set to `false`. by @michael-redpanda in #24588
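The disk-space fix in the list above changes which disk's state gates writes when a dedicated cache disk is in use. A minimal sketch of the decision before and after the fix follows. The function names are hypothetical, and the "before" rule (reject when either disk is degraded) is an assumption inferred from the note rather than taken from the Redpanda source.

```python
def rejects_writes_before_fix(data_disk_degraded: bool, cache_disk_degraded: bool) -> bool:
    # Assumed old rule: a full disk of either kind blocked writes.
    return data_disk_degraded or cache_disk_degraded


def rejects_writes_after_fix(data_disk_degraded: bool, cache_disk_degraded: bool) -> bool:
    # Rule stated in the release note: only the data disk gates writes.
    # The cache disk state is accepted but deliberately ignored.
    return data_disk_degraded


if __name__ == "__main__":
    # Cache disk full, data disk healthy: the one case the fix changes.
    print(rejects_writes_before_fix(False, True))  # True
    print(rejects_writes_after_fix(False, True))   # False
```

The two functions differ only when the cache disk is degraded and the data disk is not, which is the situation the note describes.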
Improvements
- Adds additional debug log messages in the datalake coordinator regarding files to be committed to Iceberg. by @andrwng in #24563
- Beta version of Iceberg support was incorrectly classified as "enterprise only". by @oleiman in #24443
- Leader balancer: don't treat each core as independent and balance total number of leaders on each node as well. by @ztlpn in #24440
- Show leader id in `/v1/cluster/partitions` response. by @ztlpn in #24585
- #24539 Disable datalake services in recovery mode by @ztlpn in #24549
- `rpk topic describe` now supports the `--format` flag to display the output in either JSON or YAML. by @r-vasquez in #24438
Full Changelog: v24.3.1...v24.3.2
v24.2.13
Features
- Improve the user messages when the `topic_partitions_reserve_shard0` cluster config is used and a user tries to create a topic with more partitions than the core-based partition limit. by @pgellert in #24462
Bug Fixes
- Ensure the `redpanda_cloud_storage_cloud_log_size` metric is consistent across all replicas. It was previously updated only seldom, and only from the leader replica, which led to inconsistent/stale values. by @nvartolomei in #24365
- Fixes a bug that could prevent topic recovery on ABS object storage when there are objects in a bucket from multiple clusters (e.g. following a whole cluster restore). by @andrwng in #24454
- Fixes a bug where `rpk` wasn't parsing `--help` when used alongside `--redpanda-id` in `rpk cloud <provider> byoc apply` by @r-vasquez in #24397
- If a discrete disk is used for the cloud storage cache, Redpanda previously rejected writes if that disk (the cache disk) was full (in degraded state). This was incorrect since the cache disk isn't in the way of writes. From now on, writes are rejected only if the data disk is full (in degraded state). by @nvartolomei in #24484
- #24431 Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types. by @pgellert in #24432
- PR #24200 [v24.2.x] cst/cache: fix use-after-move caused by calling get_exception twice by @nvartolomei
- PR #24329 [v24.2.x] Fixed race condition between appends and prefix truncation by @mmaslankaprv
- PR #24335 rm_stm: remove always true assert on transaction_ga feature by @bharathv
- PR #24349 [v24.2.x] c/balancer_planner: check if topic exists in node count map by @mmaslankaprv
- PR #24372 [v24.2.x] c/controller_backend: allow `shutdown_partition` to fail on app shutdown by @bashtanov
- PR #24459 [v24.2.x] raft/c: fix an indefinite hang in transfer leadership by @bharathv
Full Changelog: v24.2.12...v24.2.13
v24.3.1
Features
- Added support for Iceberg Topics (various improvements below)
- New REST API for mounting/unmounting topics by @mmaslankaprv in #23167
- adds rpk cluster storage topic mount, unmount, list-mount, status-mount, cancel-mount by @gene-redpanda in #23575
- Add leadership pinning: ability to set preferred racks for topic partition leaders. To configure, set `redpanda.leaders.preference` topic config property or `default_leaders_preference` cluster config property. by @ztlpn in #23691
- Enable `node_local_core_assignment` feature by default by @ztlpn in #23453
- Adds Schema Registry support for the JavaScript Data Transforms SDK by @oleiman in #21491
- Adds list-mountable to allow listing mountable topics by @gene-redpanda in #23924
- Adds the topic property `delete.retention.ms`, as well as the cluster property `tombstone_retention_ms`. Configuring these allows for the removal of tombstone records in compacted topics with tiered storage disabled in `redpanda`. by @WillemKauf in #23662
- Schema Registry: Support `normalize=true` by @BenPope in #22519
- Schema Registry: added support for the "verbose" query parameter on the schema compatibility checker endpoint by @pgellert in #22877
- Schema Registry: verbose compatibility error reporting is now supported for JSON as well by @pgellert in #23208
- #17984 Adds a new broker configuration transaction_max_timeout_ms. The configuration controls the maximum allowed user-set timeout for transactions. If a client-requested transaction timeout exceeds this value, the broker will return an error during transactional producer initialization. This guardrail prevents hanging transactions from blocking consumer progress. The default value is 15 minutes. by @bharathv in #21504
- rpk: Add `rpk registry mode` to manage the schema registry mode. by @r-vasquez in #22675
- rpk: supports triggering on-demand partition balancer by @daisukebe in #22855
- Added support for using PKCS#12 files for TLS services by @michael-redpanda in #21313
- Adds admin API endpoint for enterprise feature info `GET /v1/features/enterprise` by @oleiman in #23314
- A new metric (cluster_features_enterprise_license_expiry_sec) is added for easier monitoring of the enterprise license's expiry time. by @pgellert in #23367
- After the cluster is first formed, a trial license is automatically loaded to provide an evaluation period of enterprise features. by @pgellert in #23893
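The `transaction_max_timeout_ms` guardrail in the Features list above amounts to a bounds check at transactional producer initialization. A minimal sketch under stated assumptions: `check_transaction_timeout` and `TransactionTimeoutTooLarge` are hypothetical names standing in for the broker's behavior, and only the 15 minute default and the "exceeds" comparison come from the note.

```python
# 15 minutes, the default stated in the release note.
DEFAULT_TRANSACTION_MAX_TIMEOUT_MS = 15 * 60 * 1000


class TransactionTimeoutTooLarge(ValueError):
    """Hypothetical error type standing in for the broker's error response."""


def check_transaction_timeout(
    requested_ms: int,
    max_timeout_ms: int = DEFAULT_TRANSACTION_MAX_TIMEOUT_MS,
) -> int:
    # A request is rejected only when it exceeds the configured maximum;
    # a request equal to the maximum is accepted.
    if requested_ms > max_timeout_ms:
        raise TransactionTimeoutTooLarge(
            f"requested {requested_ms} ms exceeds maximum {max_timeout_ms} ms"
        )
    return requested_ms


if __name__ == "__main__":
    print(check_transaction_timeout(60_000))  # accepted: 60000
    try:
        check_transaction_timeout(DEFAULT_TRANSACTION_MAX_TIMEOUT_MS + 1)
    except TransactionTimeoutTooLarge as err:
        print(f"rejected: {err}")
```

Failing at initialization, rather than later in the transaction, means a client with an oversized timeout never opens a transaction that could hang and block consumers.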
Improvements
- --regex flag in `rpk topic describe` now supports internal topics. by @r-vasquez in #23487
- A number of optimizations to local storage compaction. by @WillemKauf in #23380
- Add an LRU caching layer to Rust transform SDK Schema Registry client by @oleiman in #19859
- Add support for differentiating tombstone records from empty-string value records in `rpk produce` and `rpk consume`. by @WillemKauf in #23264
- Added support for Metadata API v8 by @michael-redpanda in #22669
- Added the vectorized_kafka_rpc_connections_rejected_rate_limit metric, which counts incoming Kafka connections rejected due to the connection rate limit (if set), analogous to the existing vectorized_kafka_rpc_connections_rejected metric, which counts connections rejected due to hitting the open connection limit. by @travisdowns in #22803
- Adds a shard label to some consumer group metrics. by @ballard26 in #23339
- Adds support for setting schema registry connection parameters in the `rpk` stanza of `redpanda.yaml`. by @andrewstucki in #24017
- Adds the `cloud_storage_backend::oracle` value, and helps the `s3_client` properly configure for OCI storage. by @WillemKauf in #22902
- Adds the ability to configure Node UUID and ID overrides at broker startup. by @oleiman in #22972
- Allow `rpk cluster self-test start` to run, even in a cluster with mixed versions of `redpanda` (before and after `cloudcheck` addition in `24.2.x`). by @WillemKauf in #21370
- Allows `DeleteRecords` requests from Kafka clients or `rpk topic trim-prefix` to be called with `truncation_offset <= start_offset` without returning an error. The request is instead treated as a no-op. by @WillemKauf in #22905
- Allows the self-test to be completely compatible with a mixed version cluster, in the case of a rolling upgrade. by @WillemKauf in #22831
- Deprecate `leader_balancer_mode` cluster config property. by @ztlpn in #23780
- Implements `@redpanda-data/transform-sdk-sr.SchemaFormat` for the WASM Transforms JS module by @oleiman in #23164
- Improve handling of boolean property values during a `CreateTopics` request by making parsing case-insensitive. by @WillemKauf in #23682
- Improve handling of boolean values during a `CreateTopics` request by no longer silently ignoring an invalid value, instead throwing a configuration error. by @WillemKauf in #23682
- Improve handling of certain invalid topic configuration parameters that would lead to a timeout failure instead of a graceful error code during a `CreateTopics` request. by @WillemKauf in #23682
- Improve property configuration descriptions. by @Deflaimun in #23347
- Minimizes data loss in recovery scenarios by @mmaslankaprv in #24071
- Reduce the memory overhead of many small segments. by @rockwotj in #22962
- Return core assignments from health report in `/v1/cluster/partitions` admin API output. by @ztlpn in #22695
- Schema Registry: 5 new compatibility checks are added for protobuf (ONEOF_FIELD_REMOVED, MULTIPLE_FIELDS_MOVED_TO_ONEOF, REQUIRED_FIELD_{ADDED,REMOVED}, FIELD_NAMED_TYPE_CHANGED, MESSAGE_REMOVED) by @pgellert in #22798
- Schema Registry: Improve AVRO Normalization by @BenPope in #22519
- Schema Registry: now reports more specific error messages for Avro and Protobuf schemas when they are incompatible with earlier schemas. by @pgellert in #22958
- Set the default value of `topic_partitions_reserve_shard0` to zero. This means that we no longer weight shard 0 as if it has 2 more partitions than it actually has, leading to more even partition distribution in cases where the total number of partitions is close to the vCPU count. by @travisdowns in #22841
- The command line is now printed to the log at startup by the Redpanda process. by @travisdowns in #22826
- Upgrade data transforms tinygo compiler to version 0.34.0 by @rockwotj in #23969
- #17682 Schema Registry: Remove spurious log entry: `No syntax specified for the proto file` by @BenPope in #22633
- #21536 `rpk topic describe-storage` can be used now with internal topics. by @r-vasquez in #22338
- #22333 rpk debug bundle: include the result of `uname -a` by @JFlath in #22334
- #22666 Allows users to query the value of a cluster property with `rpk cluster config get` using either the original property name, or any of its aliases. Previously, `rpk cluster config get` using a property's aliased name would return a `Property {} not found` result. by @WillemKauf in #22674
- [#23038](https://github.com/redpanda-dat...
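The `topic_partitions_reserve_shard0` default change in the Improvements list above is easier to see with numbers. The sketch below is a toy least-loaded placement loop, not Redpanda's allocator: `place_partitions` and its lowest-index tie-breaking are assumptions made for illustration. Only the idea of weighting shard 0 as if it held 2 extra partitions comes from the note.

```python
def place_partitions(num_shards: int, num_partitions: int, shard0_reserve: int) -> list[int]:
    """Return the number of partitions placed on each shard.

    Each partition goes to the shard with the lowest weight, where shard 0
    is weighted as if it already held `shard0_reserve` extra partitions.
    Ties go to the lowest shard index (an assumption of this sketch).
    """
    counts = [0] * num_shards

    def weight(shard: int) -> int:
        return counts[shard] + (shard0_reserve if shard == 0 else 0)

    for _ in range(num_partitions):
        target = min(range(num_shards), key=weight)
        counts[target] += 1
    return counts


if __name__ == "__main__":
    # Four shards, four partitions: partition count equal to the vCPU count.
    print(place_partitions(4, 4, shard0_reserve=2))  # [0, 2, 1, 1]
    print(place_partitions(4, 4, shard0_reserve=0))  # [1, 1, 1, 1]
```

In this toy model a reserve of 2 leaves shard 0 empty and doubles up another shard, while a reserve of 0 puts one partition on each shard.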
v24.2.12
Bug Fixes
- Fixed an issue where creating a topic with a huge number of partitions could lead to a crash. by @IoannisRP in #24232
Improvements
- Schema Registry: Add some metrics for resource usage taken by in-memory schemas by @BenPope in #24270
Full Changelog: v24.2.11...v24.2.12
v24.2.11
Bug Fixes
- Construct audit metrics probe during service initialization to prevent null pointer access. by @michael-redpanda in #24127
- Fixed an issue where creating a topic with a huge number of partitions could lead to a crash. by @IoannisRP in #24232
- Fixes a bug in which upload candidates made from segments with missing batches would trigger metadata related errors in the `ntp_archiver_service`, due to assigned start offsets being lower than they should be. by @WillemKauf in #24106
- #24076 Fixes a rare bug during remote partition manifest downloads where broken pipe exceptions weren't retried in an edge case. by @pgellert in #24080
- #24144 This fixes a bug in the audit client where if the cluster config value `kafka_batch_max_bytes` was greater than `audit_client_max_buffer_size`, the audit client ends up not producing any messages and becomes stuck filling up the audit log buffers. by @pgellert in #24148
- #24207 Redpanda neglected to include ECDSA based ciphers in the cipher strings used for TLSv1.2 and below. This caused TLS connections that used ECDSA based certificates to fail cipher negotiation when using TLSv1.2 and below. ECDSA ciphers are now in the list of supported ciphers. by @michael-redpanda in #24209
Full Changelog: v24.2.10...v24.2.11
v24.1.18
Features
- #23454 A new metric (cluster_features_enterprise_license_expiry_sec) is added for easier monitoring of the enterprise license's expiry time. by @pgellert in #23467
- #23760 Adds admin API endpoint for enterprise feature info `GET /v1/features/enterprise` by @oleiman in #23761
Bug Fixes
- Construct audit metrics probe during service initialization to prevent null pointer access. by @michael-redpanda in #24128
- Fixes a bug in which upload candidates made from segments with missing batches would trigger metadata related errors in the `ntp_archiver_service`, due to assigned start offsets being lower than they should be. by @WillemKauf in #24105
- Fixes a bug where only a group static member's protocols would be updated on rejoin, even if more properties had been passed to the rejoin command by @IoannisRP in #23733
- #23863 Fixes a bug where audit log manager would retry a bad request forever, causing buffers to fill up, blocking audit log appends and preventing authZ. by @oleiman in #23868
- #23930 Ignore heartbeat requests/replies to/from unexpected node ids. by @ztlpn in #23934
- #24056 Cleanup tiered storage temporary cache file if exceptions are thrown during download. by @nvartolomei in #24064
- #24077 Fixes a rare bug during remote partition manifest downloads where broken pipe exceptions weren't retried in an edge case. by @pgellert in #24079
- #24143 This fixes a bug in the audit client where if the cluster config value `kafka_batch_max_bytes` was greater than `audit_client_max_buffer_size`, the audit client ends up not producing any messages and becomes stuck filling up the audit log buffers. by @pgellert in #24149
Improvements
- --regex flag in `rpk topic describe` now supports internal topics. by @r-vasquez in #23605
- Adds a shard label to some consumer group metrics. by @ballard26 in #23626
- #23404 Adds the ability to configure Node UUID and ID overrides at broker startup. by @oleiman in #23412
- fixed large allocation in Raft implementation by @mmaslankaprv in #24009
- rpk: `redpanda admin brokers list` exposes Host/Port/Rack/UUID additionally by @daisukebe in #23688
- PR #23414 [v24.1.x] `archival`: use `log_level_for_error()` for failed reupload candidates by @WillemKauf
- PR #23450 [v24.1.x] `storage`: catch `ss::gate_closed_exception` in `log_manager` (manual backport) by @WillemKauf
- PR #23501 [v24.1.x] tests/failure_injector: undo the failures on exit by @bashtanov
- PR #23506 [v24.1.x] cluster_recovery_backend_test: only reset relevant config by @andrwng
- PR #23515 [v24.1.x] rptest: produce more data in FullDiskReclaimTest to trigger gc conditions by @nvartolomei
- PR #23527 [v24.1.x] CORE-7689 dt/rp_installer: ensure cache directory exists by @pgellert
- PR #23537 [v24.1.x] rptest: do not expect cached segment readers at the end of the test by @nvartolomei
- PR #23555 [v24.1.x] ssx: exit early from sleep_abortable if already aborted by @nvartolomei
- PR #23608 [v24.1.x] tests: fix rpk generate test by @r-vasquez
- PR #23646 [24.1.x] tests: bump ducktape to latest of 0.11.x by @ivotron
- PR #23674 [v24.1.x] tests: test legacy dashboard in rpk generate by @r-vasquez
- PR #23708 [v24.1.x] gha: rm use of rp_storage_tool_uploader by @andrewhsu
- PR #23746 [v24.1.x] Keep producer inflight requests queue bounded by @mmaslankaprv
- PR #23766 [v24.1.x] gha: fix pip install on python actions by @ivotron
- PR #23818 [v24.1.x] rpk: debug bundle collecting broker UUIDs by @daisukebe
- PR #23831 [v24.1.x] rpk: introduce license warnings messages by @r-vasquez
- PR #23849 [v24.1.x] [CORE-7957] tests: wait for license information comparisons by @r-vasquez
- PR #23861 [v24.1.x] kafka: oversized alloc in list_offsets_topic by @IoannisRP
- PR #23866 [v24.1.x] [CORE-7719] Add has_valid_license & has_enterprise_features to phone-home metrics by @oleiman
- PR #23923 [v24.1.x] rpk: fill schema registry information in cloud profiles by @r-vasquez
- PR #23965 [v24.1.x] [DEVEX-36] rpk: change expiry check for free_trial by @r-vasquez
- PR #23988 [v24.1.x] rpk: fix printing new lines by @r-vasquez
- PR #24015 [v24.1.x] metrics: Add list of enterprise features to call-home POST by @oleiman
- PR #24025 [v24.1.x] transform-sdk/go/tests: remove -quiet flag by @rockwotj
- PR #24044 [v24.1.x] [CORE-8141] Add host information to metrics report by @michael-redpanda
- PR #24051 [v24.1.x] storage: housekeeping metrics by @nvartolomei
- PR #24062 [v24.1.x] [CORE-1478] rptest: fix retention value in `archive_retention_test` by @WillemKauf
- PR #24070 [v24.1.x] rptest: reduce cache eviction throttling for space leak test by @nvartolomei
- PR #24089 [v24.1.x] storage: remove assertion on `is_cloud_retention_active` by @ballard26
Full Changelog: v24.1.17...v24.1.18
v24.2.10
Bug Fixes
- #24057 Cleanup tiered storage temporary cache file if exceptions are thrown during download. by @nvartolomei in #24063
Full Changelog: v24.2.9...v24.2.10
v24.2.9
Improvements
- Adds a shard label to some consumer group metrics. by @ballard26 in #23627
- fixed large allocation in Raft implementation by @mmaslankaprv in #24010
- PR #24012 [v24.2.x] metrics: Add list of enterprise features to call-home POST by @oleiman
- PR #24027 [v24.2.x] transform-sdk/go/tests: remove -quiet flag by @rockwotj
- PR #24045 [v24.2.x] [CORE-8141] Add host information to metrics report by @michael-redpanda
- PR #24048 [v24.2.x] storage: housekeeping metrics by @nvartolomei
Full Changelog: v24.2.8...v24.2.9
v24.2.8
Bug Fixes
- #23846 fixes the term of recovered topic moving backward by @mmaslankaprv in #23879
- #23864 Fixes a bug where audit log manager would retry a bad request forever, causing buffers to fill up, blocking audit log appends and preventing authZ. by @oleiman in #23867
- #23928 Ignore heartbeat requests/replies to/from unexpected node ids. by @ztlpn in #23933
Improvements
- `rpk registry schema get` will now fail if it is incorrectly invoked using the `--print-schema` and `--format` flags. by @r-vasquez in #23901
- `rpk cluster license info` now includes more details about the license, including:
  - Whether your loaded license is expired. By @r-vasquez in #23636
  - License status and possible violations. By @r-vasquez in #23744
Full Changelog: v24.2.7...v24.2.8