-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lightning-liquidity
crate to the workspace
#3436
Add lightning-liquidity
crate to the workspace
#3436
Conversation
1cd0e69
to
510ae5d
Compare
AFAICT, remaining CI failures should be unrelated. |
510ae5d
to
d408ac6
Compare
Rebased on main to re-run CI with fixes. |
check_commits and linting are still failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran out of time to finish really digging into this, but at least have some doc requests and such.
Yup, both are expected. |
Sure, no worries! I'm generally happy to address any comments that come up during review, but would lean towards more or less only addressing anything that's needed to move the crate in this PR, and address anything that might be more substantial in follow-ups. |
1608102
to
dca99f4
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3436 +/- ##
==========================================
- Coverage 89.69% 89.44% -0.26%
==========================================
Files 130 149 +19
Lines 107335 116152 +8817
Branches 107335 116152 +8817
==========================================
+ Hits 96277 103894 +7617
- Misses 8658 9793 +1135
- Partials 2400 2465 +65 ☔ View full report in Codecov by Sentry. |
fe85637
to
8b9c6e0
Compare
Addressed all the pending feedback. |
8b9c6e0
to
4d20463
Compare
3a318f8
to
71212ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the new comments we can freely ignore and are more things to address later, but some of them probably are quite important.
71212ad
to
6f5875c
Compare
Responded to all pending comments and addressed most of them. |
@TheBlueMatt Let me know if I can squash fixups. |
Btw now moved all open issues from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to squash. I think I'm okay with landing this, AFAIU there's no remaining trivial "anyone can run us out of memory" vulnerabilities, but would like @johncantrell97 to take a glance at the set of changes and confirm that matches his understanding as well.
.. to which end we also need to make some additions to `debug_sync` and `nostd_sync`.
We add a size limit on the event queue, after which we'll just start dropping events to ensure we could never OOM. Additionally, we document the requirement that users need to handle generated events ASAP.
We introduce a new `MAX_PENDING_REQUESTS_PER_PEER` limit for the number of pending (get_info and buy) requests per peer.
.. which is a prefactor to also start checking the total number of pending requests in the next commit.
To this end we introduce a new counter keeping track of overall requests pending and reject inbound requests if they would put us over the limit.
We clean up any `get_info` request state when peers disconnect.
91c6f62
to
08fd499
Compare
Squashed fixups without further changes. |
Exciting for this to be upstreamed! @johncantrell97 is out currently, I went over the changes briefly. With regard to OOM vulnerabilities, it seems only limits on pending requests have been introduced i.e. requests that need to be handled by the server. Could memory not still increase with a constant throughput of handling requests?
Basically, we might need to limit the number of I also wonder if silently dropping events from |
Thanks for taking a look!
Yes, but to a degree that's unavoidable, at assuming least in the current protocol. Having users means having to keep some kind of state around, means increasing memory and/or storage requirements :)
Hm, that's a valid point, although I was hesitant to introduce a limit on this in this PR as we'll need to work out a larger strategy how we want to handle/when we want to drop the
Right, I don't think we can't do better than keeping them around as long as they are needed, and garbage collecting them eventually.
The number of
Yes, we hopefully never hit this, but timing out channels (which we need to develop a strategy, as mentioned above) or even force-closing individual channels is better than running out of memory and failing to service existing customers, losing funds, etc. FWIW, the limit on the event queue is really a last-resort limit that never should be reached.
Yes, LDK already does this at the network level, as well as limiting the number of peers. We could eventually think about adding overly spammy peers to the ignore list (which currently is only used for peer sending us bogus messages), but for the most part we should be good leaning on LDK here (or if we find that not, we probably want to improve it there). |
Given those limits are only applied in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking another look at the per-peer limit! I think that should cover all the DoS cases as far as I can tell.
I'm not sure about the interaction between ChannelManager
's limit in ChannelMessageHandler::peer_connected
, and the implementation here through CustomMessageHandler
(Does it require the former to be called first?). Will defer to Matt there.
Alright, I'm not convinced it's strictly necessary, but probably better safe than sorry here. Now added a commit adding such a 100k limit. |
19ddd84
to
c5c0d72
Compare
.. we clean up any pending buy requests that hit their `valid_until` time when the counterparty disconnects.
We're now also pruning any expired `OutboundJITChannels` if we haven't seen any related HTLCs.
In addition to pruning expired requests on peer disconnection we also regularly prune for all peers on block connection, and also remove the entire `PeerState` if it's empty after pruning (i.e., has no pending requsts or in-flight channels left).
We include any `OutboundJITChannel` that has not made it further than `PendingInitialPayment` in the per-peer request limit, and will of course prune it once it expires.
While LDK/`ChannelManager` should already introduce an upper-bound on the number of peers, here we assert that our `PeerState` map can't grow unboundedly. To this end, we simply return an `Internal error` and abort when we would hit the limit of 100000 peers.
c5c0d72
to
f68c6c5
Compare
Force-pushed with minor fixup to avoid warning with disabled > git diff-tree -U2 19ddd84 HEAD
diff --git a/lightning-liquidity/src/lsps2/utils.rs b/lightning-liquidity/src/lsps2/utils.rs
index 3de2d2eb5..8a085b76c 100644
--- a/lightning-liquidity/src/lsps2/utils.rs
+++ b/lightning-liquidity/src/lsps2/utils.rs
@@ -32,4 +32,5 @@ pub fn is_valid_opening_fee_params(
/// Determines if the given parameters are expired, or still valid.
+#[cfg_attr(not(feature = "std"), allow(unused_variables))]
pub fn is_expired_opening_fee_params(fee_params: &OpeningFeeParams) -> bool {
#[cfg(feature = "std")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna go ahead and land this to get it out of the way. A handful of doc nits are still here, and one Real DoS vuln that will need fixing before we can release an RC.
@@ -0,0 +1,50 @@ | |||
[package] | |||
name = "lightning-liquidity" | |||
version = "0.1.0-alpha.6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to bump this to 0.1 with the rest of LDK in the next release or will we just bump to a new alpha?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question, it would of course be nice to tag this 0.1 to ensure lock-step with lightning
.
As far as I understood you correctly above, you consider the absence of service-side persistence a blocker to consider this 0.1? I tend to agree that it's a critical feature for the service-side, but we could still consider to tag this 0.1 w.r.t. to the client side? That is, we could either add doc comments making the current limitations clear and labeling service support as alpha/experimental (or even introduce an cfg(lsps2_service)
flag for now?)?
And, given we might have an extended beta phase for 0.1, I hope to have a PR up to add service-side persistence by the time we'd get around to the 0.1 release (IIUC currently targetting mid-/late January?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a super strong opinion, certainly happy to just say its 0.1 "but service-side is in beta". We also really should have fuzzers for this logic.
In terms of 0.1, no, I don't think its mid/late January? We intend to get the beta out this week and then probably release kinda whenever bindings are done. We'll probably have an 0.1.1 with some minor fixes but that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a super strong opinion, certainly happy to just say its 0.1 "but service-side is in beta".
Will add a respective note to the docs in #3493.
We also really should have fuzzers for this logic.
The idea was to write proptests for these, which we already kinda started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to write proptests for these, which we already kinda started.
proptests aren't sufficient for the same level of coverage. They're useful to get a different kind of coverage, though.
user_channel_id: jit_channel.user_channel_id, | ||
intercept_scid, | ||
}); | ||
self.pending_events.enqueue(event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true in quite a few places in this crate, but just noting it here - enque
ing an event on the queue calls all the way into user code with the Waker
notify. Its mostly probably fine cause the Waker
is actually tokio and it doesn't immediately poll the future, but it could, and if it does, holding a lock when it does so is pretty gross. Where possible we should drop locks before pushing an event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as mentioned above we'll want to do some more refactoring like this for the message queues, too. We probably should switch to a RAII-style Notifier
that does all the (event/message) queue flushing when dropped for us.
use lightning::ln::channelmanager::InterceptId; | ||
use lightning_types::payment::PaymentHash; | ||
|
||
/// Holds payments with the corresponding HTLCs until it is possible to pay the fee. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to somehow handle timeouts of HTLCs in the queue? Presumably I could send one HTLC to the LSP which times out in a few blocks, then wait until it times out, then send another for one more satoshi, causing the LSP to open the channel but there's not actually payment for them. After that, I assume the client can then use the non-intercept SCID to bypass fees. Its maybe not a huge deal cause the client can already effectively bypass fees by just rejecting the initial HTLC forwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to somehow handle timeouts of HTLCs in the queue?
We should already pop any HTLCs for which we receive the HTLCHandlingFailed
event and then try to forward the next one in line.
Presumably I could send one HTLC to the LSP which times out in a few blocks, then wait until it times out, then send another for one more satoshi, causing the LSP to open the channel but there's not actually payment for them. After that, I assume the client can then use the non-intercept SCID to bypass fees. Its maybe not a huge deal cause the client can already effectively bypass fees by just rejecting the initial HTLC forwards?
Yes, this is a general issue applicable to the LSP-trusts-client flow. It can only mitigated by the LSP delaying broadcasting the funding transaction until they're positive the client claimed all parts required to pay for the channel open.
@@ -0,0 +1,685 @@ | |||
#![cfg(test)] | |||
// TODO: remove these flags and unused code once we know what we'll need. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, I'd like to wait until we added LSPS1 service work and possibly finished the LSPS2 integration test flow (note it currently only test up until the invoice generation step, not the channel opening/payment flow). However, the latter might additionally(?) get tested (and easier) in LDK Node integration tests as part of adding service-side support there.
We upstream the
lightning-liquidity
crate into therust-lightning
workspace. First, all source files are copied over as per current
main
of thelightning-liquidity
repository (commit c80eb75f5a31bea5c2b73e41c50ca382ec0020f8).Then, fixup commits adjust the crate to the current state of
rust-lightning
, and add it to our local CI script.Once this is merged I'll transfer any relevant open issues from https://github.com/lightningdevkit/lightning-liquidity and archive the repo.