Bookie 99% ledger disk usage #4100

Closed
mnit016 opened this issue Oct 8, 2023 · 4 comments

mnit016 commented Oct 8, 2023

BUG REPORT

One bookie in the cluster filled its ledger disk and switched to read-only mode
Hi there, I'm facing an issue in Pulsar 2.9.5: the bookie ledger disk usage increased to 99.9%. This is similar to #1908, but I can't find any ledger information in the error logs that would tell me what to clean up.
image

To Reproduce
<N/A>

Expected behavior
How can I recover from this issue? Is there any workaround or solution?

Screenshots
In Splunk, starting several days ago, many "Entering Safepoint region..." and "Leaving safepoint region..." messages appear in the logs.
A day later, "Exception ledger flush" / "Error in Rocksdb put" errors started to appear.
image
Four hours after that, the errors changed to "Error during flush".
image
The logs above continue until "Ledger directory ... is out-of-space" appears, and that message then keeps repeating.
image
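To watch how close a ledger directory is to the point where the bookie goes read-only, something like the following minimal sketch could be run on the bookie host. It only reports the filesystem usage as seen by Python's `shutil.disk_usage`; the function names and the threshold value are assumptions for illustration, and the bookie's own accounting may differ.

```python
import shutil


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_above_threshold(path, threshold):
    """Return True when the used fraction is at or above `threshold`."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    # "." is a placeholder; point this at the bookie ledger directory.
    # 0.95 is a hypothetical threshold, not a value taken from this issue.
    fraction = disk_usage_fraction(".")
    print(f"used: {fraction:.1%}, above threshold: {is_above_threshold('.', 0.95)}")
```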

Additional context
Pulsar 2.9.5 - K8s 1.26.3

@mnit016
Copy link
Author

mnit016 commented Oct 9, 2023

Update:
I've just found a loop in the logs: new entry log files were being created faster and faster until the bookie ledger disk was completely full.

SingleDirectoryDbLedgerStorage - Write cache is full, triggering flush
SyncThread - Exception flushing ledgers
RocksDBException: while fdatasync ..../current/ledgers/000xxx.log: Resource temporarily unavailable

After that, several log lines like the following appear:

EntryLogManagerBase - Creating a new entry log file : createNewLog = false, reachEntryLogLimit = true
EntryLogManagerBase - Flushing entry logger xxxxx back to filesystem, pending for syncing entry loggers : [....]
EntryLoggerAllocator - Created new entry log file .../ledgers/current/xxxx.log for logId xxxxx
...
SyncThread - Exception flushing ledgers
RocksDBException: while fdatasync ..../current/ledgers/000xxx.log: Resource temporarily unavailable
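To confirm a loop like this in an exported log, a small tally of the messages quoted above may help. This is a rough sketch: the message fragments are copied from the excerpts in this comment, while the function name, the labels, and the log file path passed on the command line are assumptions.

```python
import sys
from collections import Counter

# Message fragments copied from the log excerpts above; the keys are made-up labels.
PATTERNS = {
    "write_cache_full": "Write cache is full, triggering flush",
    "flush_exception": "Exception flushing ledgers",
    "fdatasync_error": "while fdatasync",
    "new_entry_log": "Created new entry log file",
}


def count_bookie_events(lines):
    """Count how many lines contain each known message fragment."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        for name, fragment in PATTERNS.items():
            if fragment in line:
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_events.py bookie.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for name, n in count_bookie_events(log_file).items():
            print(f"{name}: {n}")
```

Running it on successive time slices of the log would show whether the "new_entry_log" count grows from one slice to the next.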

I'm trying to figure out which topic's ledger was unavailable the whole time.


mnit016 commented Oct 9, 2023

I can't find the file mentioned in the log below anywhere in the bookie storage:

RocksDBException: while fdatasync ....bookkeeper/ledgers/current/ledgers/000xxx.log: Resource temporarily unavailable


mnit016 commented Oct 9, 2023

There are more than 5000 files matching /bookkeeper/ledgers/current/*.log, as shown below:
image
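For anyone wanting to reproduce this count without a screenshot, a minimal sketch along these lines could report how many entry log files exist and how much space they take. The directory path is the one mentioned in this comment; the function name is an assumption.

```python
from pathlib import Path


def summarize_entry_logs(directory):
    """Return (file_count, total_bytes) for *.log files directly in `directory`."""
    files = [p for p in Path(directory).glob("*.log") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


if __name__ == "__main__":
    # Path taken from the comment above; adjust it for your deployment.
    count, size = summarize_entry_logs("/bookkeeper/ledgers/current")
    print(f"{count} entry log files, {size / 1024**3:.1f} GiB")
```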


mnit016 commented Oct 10, 2023

Something might be wrong with the retention data.
Reducing the retention config solved the problem, and the bookie is back to normal.
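As a rough illustration of why a shorter retention window can relieve the disk, here is a back-of-the-envelope estimate. It assumes a steady ingest rate, time-based retention, and data spread evenly across bookies, none of which is confirmed for this cluster; the function name and all example numbers are hypothetical.

```python
def estimated_retained_bytes_per_bookie(
    ingest_bytes_per_sec, retention_minutes, write_quorum, bookie_count
):
    """Rough estimate of bytes each bookie holds for a given retention window.

    Assumes a constant ingest rate and an even spread of entries over all
    bookies, with each entry stored `write_quorum` times.
    """
    total = ingest_bytes_per_sec * retention_minutes * 60 * write_quorum
    return total / bookie_count


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from this issue.
    for minutes in (7 * 24 * 60, 24 * 60):
        estimate = estimated_retained_bytes_per_bookie(1_000_000, minutes, 2, 3)
        print(f"retention {minutes} min: about {estimate / 1024**3:.0f} GiB per bookie")
```

Under these assumptions the retained volume scales linearly with the retention time, so cutting the window shrinks the data a bookie must keep by the same factor.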

mnit016 closed this as completed Oct 10, 2023