
Probabilistic channel scoring #1227

Merged

Conversation

Contributor
@jkczyz jkczyz commented Jan 5, 2022

Add a Score implementation based on "Optimally Reliable & Cheap Payment Flows on the Lightning Network" by Rene Pickhardt and Stefan Richter [1].

Given the uncertainty of channel liquidity balances, probability distributions are defined based on knowledge learned from successful and unsuccessful attempts. Then the negative log of the success probability is used to determine the cost of routing a specific HTLC amount through a channel.

Knowledge about channel liquidity balances takes the form of upper and lower bounds on the possible liquidity. Certainty of the bounds is decreased over time using a decay function.

Resolves #1170.
Resolves #1172.
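The mechanics described above can be sketched roughly as follows. This is a minimal sketch, not LDK's actual API: the function names, the uniform-prior probability model, and the msat scaling constant are all assumptions for illustration.

```rust
// Hypothetical sketch of the scoring idea: given learned lower/upper bounds on
// a channel's liquidity, estimate the probability that an HTLC of `amount_msat`
// can be forwarded, assuming a uniform prior over the possible balances.
fn success_probability(amount_msat: u64, min_liquidity_msat: u64, max_liquidity_msat: u64) -> f64 {
    if amount_msat <= min_liquidity_msat {
        1.0 // We know at least this much liquidity is available.
    } else if amount_msat > max_liquidity_msat {
        0.0 // We know the channel cannot carry this amount.
    } else {
        // Chance that the unknown balance in [min, max] covers `amount_msat`.
        (max_liquidity_msat + 1 - amount_msat) as f64
            / (max_liquidity_msat + 1 - min_liquidity_msat) as f64
    }
}

// The routing cost is then the negative log of the success probability, scaled
// into msats (the scaling factor here is illustrative, not the PR's exact choice).
fn penalty_msat(success_probability: f64, scale_msat: u64) -> u64 {
    if success_probability == 0.0 {
        u64::MAX
    } else {
        (-success_probability.log10() * scale_msat as f64) as u64
    }
}

fn main() {
    let p = success_probability(500, 0, 999);
    assert!(p > 0.0 && p < 1.0);
    // A certain hop costs nothing; an uncertain hop costs more.
    assert_eq!(penalty_msat(1.0, 10_000), 0);
    assert!(penalty_msat(p, 10_000) > 0);
    println!("p = {:.3}, penalty = {} msat", p, penalty_msat(p, 10_000));
}
```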

// The RouteHintHop fields which will eventually be used if this hop is used in a final Route.
// Note that node_features is calculated separately after our initial graph walk.
node_id: NodeId,
short_channel_id: u64,
channel_features: &'a ChannelFeatures,
channel_features: ChannelFeatures,
Collaborator

Holy malloc-traffic-in-a-tight-loop Batman! Somewhat astoundingly to me this doesn't actually seem to slow things down much, but I'd still strongly prefer to avoid the additional malloc traffic if possible. This does so and drops further fields on this struct too, which is nice:

diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 14844171e..a822a68d7 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -331,6 +331,7 @@ impl cmp::PartialOrd for RouteGraphNode {
 /// A wrapper around the various hop representations.
 ///
 /// Used to construct a [`PathBuildingHop`] and to estimate [`EffectiveCapacity`].
+#[derive(Clone, Debug)]
 enum CandidateRouteHop<'a> {
        /// A hop from the payer to the next hop to the payee, where the outbound liquidity is known.
        FirstHop {
@@ -409,22 +410,19 @@ impl<'a> CandidateRouteHop<'a> {
 /// Fee values should be updated only in the context of the whole path, see update_value_and_recompute_fees.
 /// These fee values are useful to choose hops as we traverse the graph "payee-to-payer".
 #[derive(Clone, Debug)]
-struct PathBuildingHop {
+struct PathBuildingHop<'a> {
        // The RouteHintHop fields which will eventually be used if this hop is used in a final Route.
        // Note that node_features is calculated separately after our initial graph walk.
        node_id: NodeId,
        short_channel_id: u64,
-       channel_features: ChannelFeatures,
+       candidate: CandidateRouteHop<'a>,
        fee_msat: u64,
-       cltv_expiry_delta: u32,
 
        /// Minimal fees required to route to the source node of the current hop via any of its inbound channels.
        src_lowest_inbound_fees: RoutingFees,
-       /// Fees of the channel used in this hop.
-       channel_fees: RoutingFees,
        /// All the fees paid *after* this channel on the way to the destination
        next_hops_fee_msat: u64,
-       /// Fee paid for the use of the current channel (see channel_fees).
+       /// Fee paid for the use of the current channel (see candidate.fees()).
        /// The value will be actually deducted from the counterparty balance on the previous link.
        hop_use_fee_msat: u64,
        /// Used to compare channels when choosing the for routing.
@@ -432,10 +430,6 @@ struct PathBuildingHop {
        /// an estimated cost of reaching this hop.
        /// Might get stale when fees are recomputed. Primarily for internal use.
        total_fee_msat: u64,
-       /// This is useful for update_value_and_recompute_fees to make sure
-       /// we don't fall below the minimum. Should not be updated manually and
-       /// generally should not be accessed.
-       htlc_minimum_msat: u64,
        /// A mirror of the same field in RouteGraphNode. Note that this is only used during the graph
        /// walk and may be invalid thereafter.
        path_htlc_minimum_msat: u64,
@@ -459,11 +453,11 @@ struct PathBuildingHop {
 // Instantiated with a list of hops with correct data in them collected during path finding,
 // an instance of this struct should be further modified only via given methods.
 #[derive(Clone)]
-struct PaymentPath {
-       hops: Vec<(PathBuildingHop, NodeFeatures)>,
+struct PaymentPath<'a> {
+       hops: Vec<(PathBuildingHop<'a>, NodeFeatures)>,
 }
 
-impl PaymentPath {
+impl<'a> PaymentPath<'a> {
        // TODO: Add a value_msat field to PaymentPath and use it instead of this function.
        fn get_value_msat(&self) -> u64 {
                self.hops.last().unwrap().0.fee_msat
@@ -514,7 +508,7 @@ impl PaymentPath {
                        // set it too high just to maliciously take more fees by exploiting this
                        // match htlc_minimum_msat logic.
                        let mut cur_hop_transferred_amount_msat = total_fee_paid_msat + value_msat;
-                       if let Some(extra_fees_msat) = cur_hop.htlc_minimum_msat.checked_sub(cur_hop_transferred_amount_msat) {
+                       if let Some(extra_fees_msat) = cur_hop.candidate.htlc_minimum_msat().checked_sub(cur_hop_transferred_amount_msat) {
                                // Note that there is a risk that *previous hops* (those closer to us, as we go
                                // payee->our_node here) would exceed their htlc_maximum_msat or available balance.
                                //
@@ -542,7 +536,7 @@ impl PaymentPath {
                        // Irrelevant for the first hop, as it doesn't have the previous hop, and the use of
                        // this channel is free for us.
                        if i != 0 {
-                               if let Some(new_fee) = compute_fees(cur_hop_transferred_amount_msat, cur_hop.channel_fees) {
+                               if let Some(new_fee) = compute_fees(cur_hop_transferred_amount_msat, cur_hop.candidate.fees()) {
                                        cur_hop.hop_use_fee_msat = new_fee;
                                        total_fee_paid_msat += new_fee;
                                } else {
@@ -870,18 +864,15 @@ where L::Target: Logger {
                                                        PathBuildingHop {
                                                                node_id: $dest_node_id.clone(),
                                                                short_channel_id: 0,
-                                                               channel_features: $candidate.features(),
+                                                               candidate: $candidate.clone(),
                                                                fee_msat: 0,
-                                                               cltv_expiry_delta: 0,
                                                                src_lowest_inbound_fees: RoutingFees {
                                                                        base_msat: fee_base_msat,
                                                                        proportional_millionths: fee_proportional_millionths,
                                                                },
-                                                               channel_fees: $candidate.fees(),
                                                                next_hops_fee_msat: u64::max_value(),
                                                                hop_use_fee_msat: u64::max_value(),
                                                                total_fee_msat: u64::max_value(),
-                                                               htlc_minimum_msat: $candidate.htlc_minimum_msat(),
                                                                path_htlc_minimum_msat,
                                                                path_penalty_msat: u64::max_value(),
                                                                was_processed: false,
@@ -977,11 +968,8 @@ where L::Target: Logger {
                                                                old_entry.total_fee_msat = total_fee_msat;
                                                                old_entry.node_id = $dest_node_id.clone();
                                                                old_entry.short_channel_id = short_channel_id;
-                                                               old_entry.channel_features = $candidate.features();
+                                                               old_entry.candidate = $candidate.clone();
                                                                old_entry.fee_msat = 0; // This value will be later filled with hop_use_fee_msat of the following channel
-                                                               old_entry.cltv_expiry_delta = $candidate.cltv_expiry_delta();
-                                                               old_entry.channel_fees = $candidate.fees();
-                                                               old_entry.htlc_minimum_msat = $candidate.htlc_minimum_msat();
                                                                old_entry.path_htlc_minimum_msat = path_htlc_minimum_msat;
                                                                old_entry.path_penalty_msat = path_penalty_msat;
                                                                #[cfg(any(test, feature = "fuzztarget"))]
@@ -1287,12 +1275,10 @@ where L::Target: Logger {
                                        // so that fees paid for a HTLC forwarding on the current channel are
                                        // associated with the previous channel (where they will be subtracted).
                                        ordered_hops.last_mut().unwrap().0.fee_msat = new_entry.hop_use_fee_msat;
-                                       ordered_hops.last_mut().unwrap().0.cltv_expiry_delta = new_entry.cltv_expiry_delta;
                                        ordered_hops.push((new_entry.clone(), NodeFeatures::empty()));
                                }
                                ordered_hops.last_mut().unwrap().0.fee_msat = value_contribution_msat;
                                ordered_hops.last_mut().unwrap().0.hop_use_fee_msat = 0;
-                               ordered_hops.last_mut().unwrap().0.cltv_expiry_delta = final_cltv_expiry_delta;
 
                                log_trace!(logger, "Found a path back to us from the target with {} hops contributing up to {} msat: {:?}",
                                        ordered_hops.len(), value_contribution_msat, ordered_hops);
@@ -1463,7 +1449,7 @@ where L::Target: Logger {
                                // Now, substract the overpaid value from the most-expensive path.
                                // TODO: this could also be optimized by also sorting by feerate_per_sat_routed,
                                // so that the sender pays less fees overall. And also htlc_minimum_msat.
-                               cur_route.sort_by_key(|path| { path.hops.iter().map(|hop| hop.0.channel_fees.proportional_millionths as u64).sum::<u64>() });
+                               cur_route.sort_by_key(|path| { path.hops.iter().map(|hop| hop.0.candidate.fees().proportional_millionths as u64).sum::<u64>() });
                                let expensive_payment_path = cur_route.first_mut().unwrap();
                                // We already dropped all the small channels above, meaning all the
                                // remaining channels are larger than remaining overpaid_value_msat.
@@ -1481,16 +1467,21 @@ where L::Target: Logger {
        drawn_routes.sort_by_key(|paths| paths.iter().map(|path| path.get_total_fee_paid_msat()).sum::<u64>());
        let mut selected_paths = Vec::<Vec<Result<RouteHop, LightningError>>>::new();
        for payment_path in drawn_routes.first().unwrap() {
-               selected_paths.push(payment_path.hops.iter().map(|(payment_hop, node_features)| {
+               let mut path = payment_path.hops.iter().map(|(payment_hop, node_features)| {
                        Ok(RouteHop {
                                pubkey: PublicKey::from_slice(payment_hop.node_id.as_slice()).map_err(|_| LightningError{err: format!("Public key {:?} is invalid", &payment_hop.node_id), action: ErrorAction::IgnoreAndLog(Level::Trace)})?,
                                node_features: node_features.clone(),
                                short_channel_id: payment_hop.short_channel_id,
-                               channel_features: payment_hop.channel_features.clone(),
+                               channel_features: payment_hop.candidate.features(),
                                fee_msat: payment_hop.fee_msat,
-                               cltv_expiry_delta: payment_hop.cltv_expiry_delta,
+                               cltv_expiry_delta: payment_hop.candidate.cltv_expiry_delta(),
                        })
-               }).collect());
+               }).collect::<Vec<_>>();
+               let mut prev_cltv_expiry_delta = final_cltv_expiry_delta;
+               for hop in path.iter_mut().rev() {
+                       core::mem::swap(&mut prev_cltv_expiry_delta, &mut hop.as_mut().unwrap().cltv_expiry_delta);
+               }
+               selected_paths.push(path);
        }
 
        if let Some(features) = &payee.features {

Contributor Author

Heh, thought there might be some concern over that. Thanks for the workaround! Will incorporate it.

Collaborator

@TheBlueMatt TheBlueMatt Jan 5, 2022

Note that you can also drop the short_channel_id field trivially. We should even be able to drop the node_id field, given that it's all findable, but it's a bit more work (and probably involves dropping the use of the DirectedChannelInfo constructors entirely, or at least moving the NodeId into the DirectionalChannelInfos instead of being outside of them in the ChannelInfo).

Contributor Author

Any chance you could edit the patch to use tabs for indentation? Probably a copy/paste issue.

Contributor Author

Note that you can also drop the short_channel_id field trivially.

Looks like your most recent version did so. 🙂

We should even be able to drop the node_id field, given that it's all findable, but it's a bit more work (and probably involves dropping the use of the DirectedChannelInfo constructors entirely, or at least moving the NodeId into the DirectionalChannelInfos instead of being outside of them in the ChannelInfo).

Haven't taken a look at this in detail, but note that DirectedChannelInfo has accessors for the source and target NodeIds.

Collaborator

Haven't taken a look at this in detail, but note that DirectedChannelInfo has accessors for the source and target NodeIds.

Right, I guess we could store the DirectedChannelInfo directly in the CandidateRouteHop::PublicHop variant, but for now it's not there and there's no easy way to get at the NodeId because it loses direction information. I hacked up a solution to this to see, and it wasn't really any faster despite saving the 32 bytes. I think there's a direction there, but I may have added more branching, which made things worse enough that saving the memcpys didn't help.

Contributor Author

Ah, yeah... I think I couldn't get that to work when returning the DirectedChannelInfo from a closure. Probably just need to have the variant hold the ChannelInfo reference, too. But I also liked that direction was not an Option.

Contributor Author

Maybe not worth worrying about now.

@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from d64b8d4 to 63b509f Compare January 6, 2022 01:18
Collaborator

@TheBlueMatt TheBlueMatt left a comment

All generally pretty straightforward once I wrapped my head around the type indirection, which I always find a bit confusing, but it's not clear how to cut down the size of the patch much more without making it harder to reason about the logic.

lightning/src/routing/network_graph.rs (outdated, resolved)
enum Probability {
Zero,
One,
Ratio { numerator: u64, denominator: u64 },
Collaborator

Why not just store an f64 given it's gonna be divided at the callsite? I'm not really sure why we need a struct here at all; f64 is pretty well understood for probability, and you can still check for equality with zero and one without loss of precision.

Contributor Author

Had been considering whether having it in this form would be useful for approximating the log as per #1166 (comment). Basically, avoid the division by approximating the log of the numerator and denominator separately and taking the difference. Though, admittedly, the approximation technique wasn't very clear to me from that comment. Could use f64 if you prefer.
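For reference, the division-avoiding trick being described is presumably just the log identity (this is my reading of the comment, not something stated in the thread):

```latex
-\log_{10}\frac{n}{d} = \log_{10} d - \log_{10} n
```

so the two logs can be approximated separately on integers and subtracted, with no floating-point division in the hot path.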

Collaborator

Right, fair, avoiding the FPU in the inner loop of the router may be useful, but given this is an internal API that we can change later, I don't see why we should eat the complexity now vs just adding it later if we want to do some approximation. Sadly, our benchmarks for routing currently don't hit the scoring code much because the scorer is uninitialized initially. Doesn't need to happen in this PR, but ideally we'd add the scorer to the benchmark(s).

Contributor Author

Leaving in for now to try and update the benchmark in parallel with review, but happy to remove it before merge if needed.

renepickhardt

While the approximation of the negative log probability is just the mentioned fraction, all my tests show that this linear approximation is not too useful, as it (similarly to a linear fee rate) tends to saturate channels when computing the optimal split. As far as I understand, we are not computing optimal splits here anyway, but I thought it useful to point out that the convexity of the negative log probability is actually very desirable.
This problem of fully saturating channels when switching to the linearized problem can be avoided by changing the model of the uncertainty network and constraining channel capacities to be lower than they actually are. However, as of right now it is totally unclear to me how to do this properly without arbitrary choices. Yet it would allow us to use simplex-based solvers for linear min-cost flow problems, which would yield a huge runtime improvement (assuming the base fee is set to zero or heavily disincentivized, as ZmnSCPxj suggested).

Contributor Author

@renepickhardt Not sure I quite understand the comment. The discussion here is only about approximating log10, not about approximating the negative log of the success probability.

Collaborator

@TheBlueMatt TheBlueMatt Jan 19, 2022

So you're saying we should just use a linear division instead of log entirely? That is confusing to me.

/// [`ProbabilisticScoringParameters`] for details.
///
/// [1]: https://arxiv.org/abs/2107.05322
pub type ProbabilisticScorer<G> = ProbabilisticScorerUsingTime::<G, ConfiguredTime>;
Collaborator

Hmm, if we're not gonna delete Scorer (I think we can, tbh, given it's naive and we have a trait for it anyway, but if you don't want to that's fine), then we should probably put this new stuff in its own module, no?

Contributor Author

@jkczyz jkczyz Jan 6, 2022

I figured we'd get rid of Scorer eventually. Could do so now, if you want... I have no strong preference. But I didn't want to use the same name given the serialization formats are different. Not sure if it makes sense to blindly upgrade users (if we could even detect that we're reading the old format?) or force them to upgrade by changing the type name to ProbabilisticScorer. Let me know what you think.

Collaborator

@TheBlueMatt TheBlueMatt Jan 7, 2022

Hmmm, fair, I guess if the intent is to leave both for a version or two and then drop Scorer after a version or two that seems fine to me. Fine to leave them in the same module for now, then.

Probability::One => 0,
Probability::Ratio { numerator, denominator } => {
let success_probability = numerator as f64 / denominator as f64;
(-(success_probability.log10()) * amount_msat as f64) as u64
Collaborator

Does it make sense to multiply by the amount here given we're not trying to do any bandwidth-based scoring techniques? If we do eventually get a smarter router, it seems like that will just blow up the path into a ton of MPP retries. I'm a bit confused because this doesn't seem intuitive to me at all; I'd think you'd want some (configurable) constant multiplied by the success probability log.

Contributor Author

Wasn't really sure about this and had posed a question here: #1172 (comment). Had left the amount in the draft while waiting for a response, but yeah maybe a constant/param makes more sense.

I was primarily concerned with mixing two costs (a probability and the channel fees). Wasn't sure if we should normalize to use msats (as done here) -- and if so, how -- or have the scorer work in terms of either a probability or a penalty (e.g., by returning an enum) and have the router determine if the channel fees should be considered at all based on the variant.

Collaborator

Right, we have to normalize, but to normalize we'd want to make it a knob, I think: "msats charged per log(failure %)". The lack of a knob here means users aren't able to control by how much they want to avoid failures, i.e., they may be willing to take a little longer to pay by retrying a few times in order to avoid paying additional fees. Multiplying by the amount being sent doesn't seem like a useful "penalty per log(failure rate)" answer.

Contributor Author

Added a liquidity_penalty_multiplier_msat parameter and used that instead of the amount.
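As a rough illustration of the resulting knob (the helper name and signature here are assumed for the sketch, not the PR's exact code), the router effectively compares fee plus scaled negative-log-probability across candidate channels, so a larger multiplier shifts preference toward reliable channels:

```rust
// Hypothetical sketch: a user-set `liquidity_penalty_multiplier_msat` scales
// -log10(success probability) into msats, which is added to the channel's fee.
fn channel_cost_msat(
    fee_msat: u64,
    success_probability: f64,
    liquidity_penalty_multiplier_msat: u64,
) -> u64 {
    // Negative log of the success probability, normalized into msats.
    let penalty =
        (-success_probability.log10() * liquidity_penalty_multiplier_msat as f64) as u64;
    fee_msat.saturating_add(penalty)
}

fn main() {
    // A cheap but unreliable channel vs. a pricier, reliable one: with this
    // multiplier the reliable channel wins despite its higher fee.
    let cheap_flaky = channel_cost_msat(100, 0.25, 10_000);
    let pricey_solid = channel_cost_msat(2_000, 0.95, 10_000);
    assert!(cheap_flaky > pricey_solid);
    println!("flaky: {} msat, solid: {} msat", cheap_flaky, pricey_solid);
}
```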

lightning/src/routing/scoring.rs (outdated, resolved)
lightning/src/routing/network_graph.rs (outdated, resolved)
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch 2 times, most recently from 99e637b to 22d285d Compare January 7, 2022 15:54
codecov-commenter commented Jan 7, 2022

Codecov Report

Merging #1227 (18e467b) into main (457e48e) will increase coverage by 0.14%.
The diff coverage is 93.51%.

❗ Current head 18e467b differs from pull request most recent head d5db85e. Consider uploading reports for the commit d5db85e to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1227      +/-   ##
==========================================
+ Coverage   90.41%   90.56%   +0.14%     
==========================================
  Files          71       71              
  Lines       38356    39248     +892     
==========================================
+ Hits        34680    35544     +864     
- Misses       3676     3704      +28     
Impacted Files Coverage Δ
lightning-invoice/src/payment.rs 92.75% <ø> (ø)
lightning/src/util/test_utils.rs 82.26% <ø> (ø)
lightning/src/routing/router.rs 91.79% <80.79%> (-0.02%) ⬇️
lightning/src/routing/network_graph.rs 89.98% <90.74%> (+0.25%) ⬆️
lightning/src/routing/scoring.rs 95.03% <96.66%> (+1.76%) ⬆️
lightning-background-processor/src/lib.rs 93.04% <100.00%> (ø)
lightning-invoice/src/utils.rs 84.07% <100.00%> (ø)
lightning/src/ln/channelmanager.rs 84.47% <100.00%> (ø)
lightning/src/ln/functional_test_utils.rs 95.23% <100.00%> (ø)
lightning/src/ln/functional_tests.rs 97.21% <100.00%> (-0.07%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 457e48e...d5db85e.

@jkczyz jkczyz mentioned this pull request Jan 7, 2022
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from 22d285d to eb0b930 Compare January 11, 2022 16:03
@jkczyz jkczyz marked this pull request as ready for review January 11, 2022 16:11
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from eb0b930 to 055a356 Compare January 11, 2022 21:39
@TheBlueMatt TheBlueMatt added this to the 0.0.105 milestone Jan 13, 2022
Contributor Author

jkczyz commented Jan 13, 2022

Benchmark using Scorer::with_fixed_penalty(0):

test routing::router::benches::generate_mpp_routes        ... bench:  50,453,566 ns/iter (+/- 34,709,542)
test routing::router::benches::generate_routes            ... bench:  49,923,040 ns/iter (+/- 37,960,924)

Benchmark using ProbabilisticScorer with log10:

test routing::router::benches::generate_mpp_routes        ... bench:  71,292,994 ns/iter (+/- 47,668,027)
test routing::router::benches::generate_routes            ... bench:  68,730,551 ns/iter (+/- 48,475,471)

Benchmark using ProbabilisticScorer with log10_approx:

test routing::router::benches::generate_mpp_routes        ... bench:  67,494,709 ns/iter (+/- 44,359,544)
test routing::router::benches::generate_routes            ... bench:  65,806,830 ns/iter (+/- 48,331,735)

Note that each iteration of the ProbabilisticScorer versions needs to create a new ProbabilisticScorer and thus must copy the node's pubkey with NodeId::from_pubkey.

@@ -4968,6 +4994,8 @@ mod benches {
let dst = PublicKey::from_slice(nodes.keys().skip(seed % nodes.len()).next().unwrap().as_slice()).unwrap();
let payee = Payee::from_node_id(dst).with_features(InvoiceFeatures::known());
let amt = seed as u64 % 1_000_000;
let params = ProbabilisticScoringParameters::default();
let scorer = ProbabilisticScorer::new(params, &src, &graph);
Collaborator

Can you add new benchmarks - one with the classic Scorer with no data (or, better yet, a dummy scorer that always returns 0 and can be inlined), one with a Scorer with random data, and one with ProbabilisticScorer. In the interest of expediting this PR we can skip the second, because the first just requires copying the function and maybe a five-line struct definition.

Contributor Author

Is it worth keeping around both generate_routes and generate_mpp_routes variations of the benchmarks? AFAICT, get_route will use MPP if the node announcement in the NetworkGraph indicates support, even if the invoice doesn't. Looking at some results, there isn't a material difference between the two for a given scorer.

test routing::router::benches::generate_mpp_routes_with_default_scorer       ... bench:  92,654,109 ns/iter (+/- 31,531,162)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 107,804,711 ns/iter (+/- 53,006,509)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  75,926,475 ns/iter (+/- 37,127,351)
test routing::router::benches::generate_routes_with_default_scorer           ... bench:  92,888,708 ns/iter (+/- 32,657,001)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench: 107,996,503 ns/iter (+/- 51,133,247)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  75,003,204 ns/iter (+/- 36,967,684)

Contributor Author

Ah, seems I can hit the non-MPP case by using Some(InvoiceFeatures::empty()) instead of None.

Added some commits up front that:

  • Refactors out a benchmark utility to use with different scorers and features
  • Exercises first_hops, which simplifies testing ProbabilisticScorer
  • Seeds the scorer with success and failure data
  • Benchmarks a zero-penalty scorer

The commits are fairly self-contained, but let me know if you prefer that I pull them out into a separate PR.

@valentinewallace valentinewallace self-requested a review January 14, 2022 17:47
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch 3 times, most recently from 0544c75 to 25f8816 Compare January 17, 2022 04:56
lightning/src/routing/network_graph.rs (outdated, resolved)
match self {
CandidateRouteHop::FirstHop { .. } => 0,
CandidateRouteHop::PublicHop { info, .. } => {
info.direction().unwrap().cltv_expiry_delta as u32
Contributor

Not obvious to me that this is safe to unwrap, similar elsewhere 🤔 Could we add a comment?

Contributor Author

Good catch! One case was inadvertently removed in a fixup. The other is guarded by an if let. Added a comment to the field.

Comment on lines 1473 to 1476
let mut prev_cltv_expiry_delta = final_cltv_expiry_delta;
for hop in path.iter_mut().rev() {
core::mem::swap(&mut prev_cltv_expiry_delta, &mut hop.as_mut().unwrap().cltv_expiry_delta);
}
Contributor

I see tests fail without this, but I'm confused about what it's accomplishing. Comment?

Contributor Author
@jkczyz jkczyz Jan 19, 2022

This change was from Matt's fixup which removed modification of this field in ordered_hops inside the path_walk loop. Added a comment and used fold to propagate the delta backwards. @TheBlueMatt Let me know if the comment makes sense or could be expanded somehow.

Collaborator

Right, this was previously in the if node_id == our_node_id loop above (see the comment there for more info); it moved here to handle moving the data into the CandidateRouteHop.
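For readers following along, the effect of the swap loop quoted above can be sketched in isolation. This is a simplified illustration using plain u32 slices; shift_cltv_deltas is a hypothetical helper of mine, not LDK code. Each hop ends up with the cltv_expiry_delta of the hop after it, and the last hop receives final_cltv_expiry_delta.

```rust
// Hypothetical simplification of the backwards shift discussed above: walk
// the path in reverse, swapping so each hop takes the delta of its successor.
fn shift_cltv_deltas(deltas: &mut [u32], final_cltv_expiry_delta: u32) {
    let mut prev_cltv_expiry_delta = final_cltv_expiry_delta;
    for delta in deltas.iter_mut().rev() {
        core::mem::swap(&mut prev_cltv_expiry_delta, delta);
    }
}

fn main() {
    let mut deltas = [10, 20, 30];
    shift_cltv_deltas(&mut deltas, 40);
    // Each hop now carries its successor's delta; the last hop got 40.
    assert_eq!(deltas, [20, 30, 40]);
}
```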

lightning/src/routing/scoring.rs (resolved thread)
lightning/src/routing/scoring.rs (outdated, resolved thread)
let leading_zeros = x.leading_zeros();
let integer_part = (63 - leading_zeros) as f64;
let fractional_part = LOG2_FRACTIONAL_PART[
(((x << leading_zeros) >> (63 - FRACTIONAL_BITS)) & FRACTIONAL_BITMASK) as usize
Contributor
@valentinewallace valentinewallace Jan 18, 2022

This commit kinda adds a good amount of complexity imo... Are we sure the perf gainz are worthwhile? Maybe better for follow-up?

Contributor Author

Here are the latest benchmark numbers both before and after this optimization.

Without approximation:

test routing::router::benches::generate_mpp_routes_with_default_scorer       ... bench:  88,374,024 ns/iter (+/- 19,792,155)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 104,953,746 ns/iter (+/- 48,003,617)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  74,455,389 ns/iter (+/- 37,400,486)
test routing::router::benches::generate_routes_with_default_scorer           ... bench:  43,661,367 ns/iter (+/- 11,777,146)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  52,373,973 ns/iter (+/- 25,136,830)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  36,233,816 ns/iter (+/- 18,920,800)

With approximation:

test routing::router::benches::generate_mpp_routes_with_default_scorer       ... bench:  90,098,517 ns/iter (+/- 20,624,987)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 102,870,120 ns/iter (+/- 50,438,276)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  74,644,550 ns/iter (+/- 39,539,073)
test routing::router::benches::generate_routes_with_default_scorer           ... bench:  43,759,396 ns/iter (+/- 10,977,942)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  50,402,866 ns/iter (+/- 25,601,834)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  35,911,935 ns/iter (+/- 17,529,777)

So roughly 2 ms faster, though the numbers may be a little noisy. Happy to drop the commit if you both agree. It would save me the effort of updating the unit test assertions. 😛

Contributor

My initial reaction is that those gains seem somewhat not worth it or at least better suited for follow-up, but would have to get @TheBlueMatt's thoughts

Collaborator

Hmm, this seems... not useful at all? It's maybe useful, but clearly dwarfed by noise, so it's unclear if it's useful. I'd say avoid it until a followup PR?

Contributor Author
@jkczyz jkczyz Jan 19, 2022

Yeah, FWIW I think the noise is somewhat inherent to the benchmark setup. There's bound to be variance if the distance between source and destination nodes varies between iterations. We may want to use a smaller set of pairs but use all of them on each iteration instead of just one per iteration.

Contributor Author

Ran some benchmarks with a single path greater than 7 hops. The variance went down but similar results in terms of performance. Reverted the approximation commit and removed the Probability enum.
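For reference, the kind of fixed-point log approximation being benchmarked in this thread can be sketched as follows. This mirrors the quoted fragment but is an illustrative reimplementation, not LDK's actual code: FRACTIONAL_BITS = 4 is an assumption of mine, and the table entry log2(1 + i/16) is computed on the fly here rather than looked up from a precomputed LOG2_FRACTIONAL_PART table.

```rust
// Sketch of a fixed-point base-2 log approximation (illustrative, not LDK's
// exact code): the integer part of log2(x) is the index of the most
// significant bit; the fractional part comes from the next few bits.
const FRACTIONAL_BITS: u32 = 4;
const FRACTIONAL_BITMASK: u64 = (1 << FRACTIONAL_BITS) - 1;

fn log2_approx(x: u64) -> f64 {
    assert!(x > 0);
    let leading_zeros = x.leading_zeros();
    let integer_part = (63 - leading_zeros) as f64;
    // The FRACTIONAL_BITS bits immediately below the MSB select the fractional
    // entry; a real implementation would precompute these in a lookup table.
    let i = ((x << leading_zeros) >> (63 - FRACTIONAL_BITS)) & FRACTIONAL_BITMASK;
    let fractional_part = (1.0 + i as f64 / (1u64 << FRACTIONAL_BITS) as f64).log2();
    integer_part + fractional_part
}

fn main() {
    assert_eq!(log2_approx(1), 0.0);
    assert_eq!(log2_approx(1 << 20), 20.0);
    // 3 = 0b11: integer part 1, fractional part log2(1.5).
    assert!((log2_approx(3) - 3f64.log2()).abs() < 0.05);
}
```

The appeal of this shape is that `leading_zeros` compiles to a single instruction on most targets, so the whole approximation is a shift, a mask, and a table load.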

lightning/src/routing/scoring.rs (resolved thread)
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from 25f8816 to d2806cb on January 19, 2022 17:18
@renepickhardt renepickhardt left a comment

I am in the middle of reviewing and couldn't add more comments as the diff was changed. I hope the comments are not lost, but I thought maybe I can just review the comments that have already been made.

fn default() -> Self {
Self {
liquidity_penalty_multiplier_msat: 1000,
liquidity_offset_half_life: Duration::from_secs(3600),


How is that value being chosen? Is there empirical evidence or is this just an arbitrary choice to start with and collect experience? Might make sense to make this configurable either in config or in the API that invokes path selection. In this way, if the returned candidate paths are not satisfiable, one might be able to give the half-life duration as an argument and make all the knowledge be forgotten, for example by setting the value to 0 or 1 second.

Contributor Author

How is that value being chosen? Is there empirical evidence or is this just an arbitrary choice to start with and collect experience? Might make sense to make this configurable either in config or in the API that invokes path selection.

This is an arbitrary value and is only configurable upon initialization as of now.

In this way, if the returned candidate paths are not satisfiable, one might be able to give the half-life duration as an argument and make all the knowledge be forgotten, for example by setting the value to 0 or 1 second.

Hmm... not sure the best way to go about this (and if we'd want to do so). The caller is parameterized with any scorer, one which may not have any concepts of decay. I suppose the Score interface could have a method that returns an "empty" instance of itself to use temporarily.

Collaborator

Because the data is updated during processing I'm not sure how it could be configurable per-route-finding call, but it is configurable.
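For intuition, the half-life decay being discussed can be sketched as below. This is a hedged illustration of the documented behavior (learned knowledge halves every liquidity_offset_half_life), not LDK's actual implementation; decayed_offset_msat is a name I made up.

```rust
use std::time::Duration;

// Illustrative sketch of half-life decay: a learned liquidity offset loses
// half of its weight every `half_life`. Name and signature are hypothetical,
// not LDK's actual API.
fn decayed_offset_msat(offset_msat: u64, elapsed: Duration, half_life: Duration) -> u64 {
    let half_lives = elapsed.as_secs_f64() / half_life.as_secs_f64();
    (offset_msat as f64 * 0.5f64.powf(half_lives)) as u64
}

fn main() {
    let half_life = Duration::from_secs(3600); // the default under discussion
    // No time elapsed: the offset is untouched.
    assert_eq!(decayed_offset_msat(1_000_000, Duration::from_secs(0), half_life), 1_000_000);
    // After exactly one half-life, roughly half the offset remains.
    let d = decayed_offset_msat(1_000_000, Duration::from_secs(3600), half_life);
    assert!((d as i64 - 500_000).abs() <= 1);
}
```

Setting the half-life to a tiny value, as Rene suggests, would make `0.5^half_lives` collapse toward zero almost immediately, effectively forgetting all learned bounds.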

/// Returns the [`EffectiveCapacity`] of the channel in a specific direction.
///
/// This is either the total capacity from the funding transaction, if known, or the
/// `htlc_maximum_msat` for the direction as advertised by the gossip network, if known,


Let there be a channel of capacity 100M msats as seen in the funding tx, and let the htlc_maximum_msat be 10M msats. This means of course that the probability for any single htlc with an amount larger than 10M msats to be forwarded should be 0. However, I don't think the probability to forward exactly 10M msats should be close to 0 but rather close to 0.9 = (100M - 10M) / (100M) ~ (100M - 10M + 1) / (100M + 1). What I am trying to say: in probabilistic scoring we should always start from the full capacity. Of course, given prior knowledge, we might use a term like effective capacity that reduces the interval in which uncertainty exists. I will have to find the position in the code where the actual score is being computed; still, I thought I'd leave this comment here.

Contributor Author
@jkczyz jkczyz Jan 19, 2022

Yeah, I had spoken to @TheBlueMatt about this but in the context of MPP. In get_route we reduce the effective capacity by any potential in-flight HTLCs for the same payment. But, as you noted, if htlc_maximum_msat <<< capacity then this may not be desirable, as we should be able to route more through the channel using multiple HTLCs. IIRC, Matt mentioned that implementations typically default htlc_maximum_msat to a value close to the channel capacity, which may alleviate both concerns even though it's not necessarily correct.

One thought I had was to pass the EffectiveCapacity to the scorer (instead of converting to msats) possibly with in-flight amounts and have it determine whether or not to subtract those. This is partly why I wanted to add EffectiveCapacity. I guess ideally htlc_maximum_msat would only need to be considered in get_route, obviating the need for the MaximumHTLC variant.

Collaborator

One further issue is many clients only have the htlc_maximum_msat value - some clients do not verify the on-chain output at all (subject to some potential DoS attacks). Depending on the direction gossip goes this may be increasingly common.

enum Probability {
Zero,
One,
Ratio { numerator: u64, denominator: u64 },


While the approximation of the negative log probability is just this mentioned fraction, all my tests show that this linear approximation is not too useful, as it (similarly to a linear fee rate) tends to saturate channels when computing the optimal split. As far as I understand we are not computing optimal splits here anyway, but I thought it is useful to point out that the convexity of the negative log prob is actually very desirable.
This problem of fully saturating channels when switching to the linearized problem can be avoided by changing the model of the uncertainty network and constraining channel capacities to be lower than they actually are. However, as of right now it is totally unclear to me how to do this properly without arbitrary choices. Yet it would allow us to use simplex-based solvers for linear min cost flow problems, which would yield a huge runtime improvement (assuming the base fee is set to zero or heavily disincentivised, as ZmnSCPxj suggested).

impl Default for ProbabilisticScoringParameters {
fn default() -> Self {
Self {
liquidity_penalty_multiplier_msat: 1000,


This seems only relevant if (as one should) fees are also taken into consideration as a feature to compute the cost of a hop / path (I think you internally name the cost Score, so please don't get confused when I talk about cost instead of score). As our benchmarks, and prior to them simulations in c-lightning, demonstrated (ElementsProject/lightning#4771 (comment)), using the harmonic mean instead of a weighted arithmetic mean to combine several features (negative log probs, fees, cltv delta) worked better. In case you would do the same, this value seems useless. Otherwise one could of course ask why it is chosen as 1000 right now.

Contributor Author

This seems only relevant if (as one should) fees are also taken into consideration as a feature to compute the cost of a hop / path (I think you internally name the cost Score, so please don't get confused when I talk about cost instead of score). As our benchmarks, and prior to them simulations in c-lightning, demonstrated (ElementsProject/lightning#4771 (comment)), using the harmonic mean instead of a weighted arithmetic mean to combine several features (negative log probs, fees, cltv delta) worked better. In case you would do the same, this value seems useless.

Currently, the scorer returns a penalty in msats for using a channel, which is simply added to the fees to get the cost:

let old_cost = cmp::max(old_entry.total_fee_msat, old_entry.path_htlc_minimum_msat)
.checked_add(old_entry.path_penalty_msat)
.unwrap_or_else(|| u64::max_value());
let new_cost = cmp::max(total_fee_msat, path_htlc_minimum_msat)
.checked_add(path_penalty_msat)
.unwrap_or_else(|| u64::max_value());

Given a scorer that computes a negative log of a success probability, how would you recommend combining this "score" with the fees to get a cost?

Otherwise one could of course ask why it is chosen as 1000 right now.

FYI, the value is an arbitrary default which can be customized by the user. That said, we may want a more reasonable default.

Collaborator

What do you mean "better", here? There is no objective metric by which one could argue one way of combining different goals is "better" than another - hence why this is configurable.


What do you mean "better", here? There is no objective metric by which one could argue one way of combining different goals is "better" than another - hence why this is configurable.

Yes, you have already mentioned that on the mailing list and I agree with you (never disagreed) that there is not one single objective, universally true metric here. The way I used the word "better" was with combining the relevant features which had previously been identified (e.g. probabilities or fees) in mind.

That being said, the topic here goes far beyond Lightning path finding / payment delivery. In many cases where you have several features (like, in our case, probability to model reliability (and with it potentially speed), fees to model actual costs, latency or geodistance to model speed, ...) you will always have the question: "How should those features be combined into one single score / metric?". It turns out that many people intuitively take the arithmetic mean, which is essentially just adding the features, or a weighted arithmetic mean, which is almost the same as normalizing the features and adding them without weight. You are currently also doing this and I would strongly advise against it.

You take your old score and add the new one, and you currently weight it with a factor of 1000 as in liquidity_penalty_multiplier_msat. In almost all situations in my life when I was doing data science / machine learning, it turned out that in practice the arithmetic mean a+w*b is not as useful as using a*b/(a+b) (I ignore constants). The latter seems to have several advantages in practice:

  1. You do not need to learn and test for the weight(s)
  2. Features (in our case, for channels) might not be correlated, so a weighting for one channel might be totally wrong for another channel
  3. It seems to be more resilient against outliers and seems to capture the mix of features better even if they are in different orders of magnitude or are differently distributed.

I have discussed this at great length with the c-lightning team, in particular @cdecker, and my preliminary simulations as well as our mainnet experiments and evaluations show that the arithmetic mean heavily underperformed in comparison to the other way of computing a mean. Also, recently I was made aware that there might actually be deeper mathematical motivation here, which in the fields of economics and risk management is used all the time and might have already been discovered in the early 1700s by Bernoulli in this work, though I haven't looked deeper into this yet.

Also, on a side note: the arithmetic mean obviously has the advantage of being linear and useful for the case of min cost flows if all the features are linear and convex. This is not true for the mean that I suggest. But as your code currently only uses Dijkstra, we don't need a linear or convex cost function.

Also, as suggested to the c-lightning team: if you don't believe me, it should be very easy to have both versions of the mean computation and do a simple A/B test against any of your favourite test criteria (I usually like "number of attempts" or time to delivery) to see if you want to keep the arithmetic mean here.
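A tiny numerical sketch of the difference being argued here (my own illustration, code from neither project): combining a probability cost `a` with a fee cost `b` via the weighted sum `a + w*b` versus Rene's suggested `a*b/(a+b)`.

```rust
// Illustrative comparison (not from LDK or c-lightning) of two ways to
// combine two cost features, e.g. a -log(probability) term and a fee term.
fn weighted_sum(a: f64, w: f64, b: f64) -> f64 { a + w * b }
fn combined(a: f64, b: f64) -> f64 { a * b / (a + b) }

fn main() {
    let prob_cost = 2.0;
    let fee_outlier = 1.0e9; // one wildly out-of-range feature
    // The weighted sum is dominated by the outlier...
    assert!(weighted_sum(prob_cost, 1.0, fee_outlier) > 1.0e8);
    // ...while a*b/(a+b) stays bounded by the smaller feature.
    assert!(combined(prob_cost, fee_outlier) < prob_cost);
}
```

The `a*b/(a+b)` form is bounded above by min(a, b), which is where the claimed outlier resilience comes from; the weighted sum grows without bound as either feature does.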

Collaborator

I'm still very confused by this; I read through the links you provide and none of them discuss failure modes or seem to provide any justification for the harmonic mean here, aside from some anecdotal suggestions. The discussion from c-lightning just links to an lnd issue, which provides no justification, and it looks like c-lightning stuck with simply adding the scores as well?

Indeed, the code here (and the current model) focuses entirely on fees as the metric we're optimizing for. As a result of this, we denominate the scorer as "amount of fee you're willing to pay to avoid this channel". This is not only nicely simple from a theoretical perspective, but also simple enough that our users can at least roughly understand the metric and can set the tunables here without much interaction from us - they can easily run trivial calculations to determine how much their "avoid channels" cost will be, and can tune things appropriately based on how much fee they're willing to pay. In the future we could even allow these tunables to be set on a per-payment basis, given we have a model here that doesn't incorporate the concrete fee values.

Indeed, in the future, I'd very much like to have multiple top-to-bottom routing algorithms, eg there has been very little work done on how you'd design a lightning routing algorithm "for privacy", without seriously considering fees or time-to-pay/-retries, but even just as-is, users having the ability to drastically tune the knob between retry-count and fees-paid seems pretty important, which gets much harder with the harmonic mean, IIUC.


https://github.com/ElementsProject/lightning/blob/a98ccac7774297f255f0a4f585384609af87c3d6/plugins/libplugin-pay.c#L735

shows what c-lightning does, and it was evaluated by conducting a mainnet evaluation: https://medium.com/blockstream/c-lightning-v0-10-2-bitcoin-dust-consensus-rule-33e777d58657

The blog article however left out the comparison between the means but I remember evaluating the statistics for test payments myself.

Of course the evaluation that @cdecker and I did might not have been sufficiently large or realistic from the assumptions.


btw, I think the much bigger argument for using an arithmetic mean is that it is linear, which means that if the features are linear the cost function stays linear (or if the features are convex the cost function stays convex). When using min cost flow solvers to compute optimal splits (given the cost function, and assuming the cost function properly captures the goals of the node), the arithmetic mean will not have a negative impact on the runtime of the min cost flow solvers (assuming all features are at worst convex). As the cost function is currently only used in Dijkstra, we need neither linearity nor convexity, so we don't care.

lightning/src/routing/scoring.rs (outdated, resolved thread)
/// Returns the success probability of routing the given HTLC `amount_msat` through the channel
/// in this direction.
fn success_probability(&self, amount_msat: u64) -> Probability {
let max_liquidity_msat = self.max_liquidity_msat();


Regarding one of my earlier comments, I tried to chase down the logic of how this value is derived, and as far as I can tell this is the channel capacity or a potentially decayed learned bound, but it is not constrained by htlc_maximum_msat like in the effective capacity data structure. I just leave the comment as I would like to suggest that a second pair of eyes double-check this.

Contributor Author

The EffectiveCapacity::as_msats is passed to the scorer's channel_penalty_msat method, which is used as the capacity here via ChannelLiquidity::as_directed.

Collaborator

I believe this value comes from the as_directed call in channel_penalty_msat which gets its information from the available_liquidity in the router, which calls DirectedChannelInfo::effective_capacity which is bound by the htlc_maximum_msat. Not quite sure whether you were saying this should or should not be, but it is.


Yeah, I addressed this in another comment. I say for probability computation you should use the actual capacity (reduced by potential knowledge about existing liquidity) instead of capping it at htlc_maximum_msat. Using htlc_maximum_msat will result in potentially strange results (see this comment: #1227 (comment))


impl<L: DerefMut<Target = u64>, T: Time, U: DerefMut<Target = T>> DirectedChannelLiquidity<L, T, U> {
/// Adjusts the channel liquidity balance bounds when failing to route `amount_msat`.
fn failed_at_channel(&mut self, amount_msat: u64) {


I just have to say that I really love to see this and the following functions being implemented here, and that you will track the uncertainty / information gain properly! I will be really curious how much this also improves payment delivery in practice in comparison to the recent updates that we have seen from c-lightning, using only probabilistic scoring but forgetting most of the learnt information directly.

Collaborator

Hopefully Cash App switches and uses the new one and we can get some good data from that!


/// Adjusts the channel liquidity balance bounds when successfully routing `amount_msat`.
fn successful(&mut self, amount_msat: u64) {
let max_liquidity_msat = self.max_liquidity_msat().checked_sub(amount_msat).unwrap_or(0);


I am a bit confused about whether this and the prior two functions always work properly with concurrency, but it seems to me that concurrency is actually not messing up the bookkeeping here. But how do you handle the decay of the knowledge when subtracting from max_liquidity? This could happen at two very different moments. Let's say you forward one htlc at 12:00 and the next at 12:45, and both have been successful. How will you compute the max liquidity at 13:05? As far as I understand, the knowledge from the first routing event should have decayed to half at that time. I am not sure how this is reflected in this data structure. It might not be a problem, as the half-life is chosen arbitrarily at this time anyway, but I thought I should point this out.

Contributor Author

I am a bit confused about whether this and the prior two functions always work properly with concurrency, but it seems to me that concurrency is actually not messing up the bookkeeping here.

It's accomplished by having get_route take a non-mut reference to a Score, while the event handler needs a mut reference. The compiler ensures we only ever have one mut reference, which we accomplish through using a Mutex. If you're curious how this works see:

/// A scorer that is accessed under a lock.
///
/// Needed so that calls to [`Score::channel_penalty_msat`] in [`find_route`] can be made while
/// having shared ownership of a scorer but without requiring internal locking in [`Score`]
/// implementations. Internal locking would be detrimental to route finding performance and could
/// result in [`Score::channel_penalty_msat`] returning a different value for the same channel.
///
/// [`find_route`]: crate::routing::router::find_route
pub trait LockableScore<'a> {
/// The locked [`Score`] type.
type Locked: 'a + Score;
/// Returns the locked scorer.
fn lock(&'a self) -> Self::Locked;
}
/// (C-not exported)
impl<'a, T: 'a + Score> LockableScore<'a> for Mutex<T> {
type Locked = MutexGuard<'a, T>;
fn lock(&'a self) -> MutexGuard<'a, T> {
Mutex::lock(self).unwrap()
}
}

R: for <'a> Router<<<S as Deref>::Target as LockableScore<'a>>::Locked>,
S::Target: for <'a> LockableScore<'a>,

But how do you handle the decay of the knowledge when subtracting from max_liquidity? This could happen at two very different moments. Let's say you forward one htlc at 12:00 and the next at 12:45, and both have been successful. How will you compute the max liquidity at 13:05? As far as I understand, the knowledge from the first routing event should have decayed to half at that time. I am not sure how this is reflected in this data structure. It might not be a problem, as the half-life is chosen arbitrarily at this time anyway, but I thought I should point this out.

max_liquidity_msat and min_liquidity_msat in DirectedChannelLiquidity return the decayed values. The internal representation in ChannelLiquidity consists of min and max un-decayed offsets along with a time the values were modified. The decay is realized in the offset representation any time they need to be modified. See set_min_liquidity_msat and set_max_liquidity_msat.

@renepickhardt renepickhardt left a comment

I did not look at the test cases yet. Overall I really like the tracking of the channel liquidity to make path finding decisions and I am really looking forward to see how the final version will perform!

As already addressed in my first part of the review, I think I have now discovered the line where you seem to reduce the effective liquidity too heavily in computing the probabilities, and I have provided a counterexample (assuming my understanding of the code is correct) in my comment. While of course channels should have a 0 probability if the amount is larger than the htlc_max_size constraint, the probabilities for amounts smaller than or equal to the constraint should still be computed with the actual channel capacity and not this effective capacity measure.

Also, I did not see how you handle channel reserve values. Technically one would always have to subtract the reserves too in the computation of the probabilities, as the reserves are not accessible for routing, but I guess the effect can be neglected as we usually try not to saturate the full liquidity anyway.

Let me know if you need help with benchmarking and with evaluating in a mainnet experiment.

Probability::Zero => u64::max_value(),
Probability::One => 0,
Probability::Ratio { numerator, denominator } => {
let log_success_probability = log10_approx(numerator, denominator);


While the log is merely introduced to switch from the multiplicative nature of channel probabilities for an entire path to adding the scores per hop, making this metric usable for shortest path algos and min cost flows, the base of the log does not really matter for the actual computation / ranking of the scored paths. Of course it might when combining with other features in a weighted arithmetic mean...

However, from an information-theoretic point of view the base-2 log is usually used, as that would measure the entropy and information gain in bits. It would also allow us to have very fast approximations, as we would just need the first significant bit of a number.

Contributor Author

While the log is merely introduced to switch from the multiplicative nature of channel probabilities for an entire path to adding the scores per hop, making this metric usable for shortest path algos and min cost flows, the base of the log does not really matter for the actual computation / ranking of the scored paths. Of course it might when combining with other features in a weighted arithmetic mean...

Note earlier comment about how we combine penalty and fees to get the cost.

However, from an information-theoretic point of view the base-2 log is usually used, as that would measure the entropy and information gain in bits. It would also allow us to have very fast approximations, as we would just need the first significant bit of a number.

IIRC, when using base 2, I would have a zero penalty for larger success probabilities where base 10 would give me something small but non-zero. Though this may also have had to do with the value I was multiplying by.

Additionally, when using log2(numerator) - log2(denominator), it's easy to get a penalty of zero if the amount is close to the min liquidity. That is, subtracting them from max liquidity to get the numerator and denominator, respectively, may give two values with identical most significant bits (i.e., zero when subtracted).


While the log is merely introduced to switch from the multiplicative nature of channel probabilities for an entire path to adding the scores per hop and making this metric usable for shortest path algos and min cost flows the base of the log does not really matter. for the actual computation / ranking of the scored paths. Of corse it might when combinging with other features in a weighted arithmetic mean...

Note earlier comment about how we combine penalty and fees to get the cost.

note my reply: #1227 (comment) I advise against the way you currently do that.

IIRC, when using base 2, I would have a zero penalty for larger success probabilities where base 10 would give me something small but non-zero. Though this may also have had to do with the value I was multiplying by.

nit: which, when rounding is taking place, might actually be ok. Small payments kind of don't mind which channel to take. They have a really high probability, and the log could be close to 0 or exactly 0 (though technically only for a probability of 1 do we want zero as a result).

Additionally, when using log2(numerator) - log2(denominator), it's easy to get a zero penalty if the amount is close to the min liquidity. That is, subtracting them from max liquidity to get the numerator and denominator, respectively, may give two values with identical most significant bits (i.e., zero when subtracted).

Shouldn't that only be true if you take integer values of the base-2 log or, as I suggested, estimate it with the position of the first non-zero bit of the numerator and denominator?
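To make this rounding concern concrete, here is a hedged sketch (hypothetical function names, not the actual scorer internals) comparing an exact floating-point negative-log penalty with a most-significant-bit approximation of log2, assuming a uniform liquidity prior on [min, max]:

```rust
// Sketch only: assumes a uniform liquidity prior on [min, max] and an
// `amount` within those bounds; names are hypothetical, not LDK's API.

/// Success probability of forwarding `amount` msats:
/// P = (max + 1 - amount) / (max + 1 - min).
fn success_probability(amount: u64, min: u64, max: u64) -> f64 {
    (max + 1 - amount) as f64 / (max + 1 - min) as f64
}

/// Exact penalty: multiplier * -log2(P).
fn exact_penalty_msat(amount: u64, min: u64, max: u64, multiplier: u64) -> u64 {
    (-success_probability(amount, min, max).log2() * multiplier as f64) as u64
}

/// Integer approximation using only the most significant bit:
/// log2(n) ~= 63 - n.leading_zeros(). This truncates to zero whenever the
/// numerator and denominator share the same MSB position.
fn approx_penalty_msat(amount: u64, min: u64, max: u64, multiplier: u64) -> u64 {
    let numerator = max + 1 - amount;
    let denominator = max + 1 - min;
    let log2_num = 63 - numerator.leading_zeros() as u64;
    let log2_den = 63 - denominator.leading_zeros() as u64;
    (log2_den - log2_num) * multiplier
}

fn main() {
    // 100 msat over a [0, 1000] channel: P = 901/1001, so the exact penalty
    // is roughly 152 msat at a multiplier of 1000, while 901 and 1001 share
    // MSB position 9 and the approximation collapses to 0.
    println!("exact:  {} msat", exact_penalty_msat(100, 0, 1000, 1000));
    println!("approx: {} msat", approx_penalty_msat(100, 0, 1000, 1000));
}
```

This illustrates both points above: an integer base-2 log zeroes out the penalty exactly when the amount is small relative to the bounds, which may or may not be acceptable depending on how the penalty is combined with fees.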


let capacity_msat = network_graph.channels()
.get(&hop.short_channel_id)
.and_then(|channel| channel.as_directed_to(&target).map(|d| d.effective_capacity()))

If the effective_capacity is filled with values like htlc_maxsize_msat, I fear that the computed probabilities will be lower than the actual probabilities and the negative log probabilities will be overestimated. One might argue that this might not even be a problem if we make the same mistake for all channels, but the htlc_maxsize_msat value will differ across channels. Let me make this clear with an example.

Let's assume we want to route 1000 msat and we have two channels to choose from:

a) a channel of size 3000 msats with a max htlc size of 3000 msats
b) a channel of size 30000000 msats with a max htlc size of 2000 msats

With your definition of effective capacity being bounded by the max htlc size, you would give channel a) a higher likelihood, when in reality the success probability of channel b) should be much higher!
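A hedged sketch of this example, using the uniform-prior success probability P = (capacity + 1 - amount) / (capacity + 1) (the function name is illustrative, not the PR's code):

```rust
// Sketch only: uniform liquidity prior on [0, capacity]; not LDK's API.
fn success_probability(amount_msat: u64, capacity_msat: u64) -> f64 {
    (capacity_msat + 1 - amount_msat) as f64 / (capacity_msat + 1) as f64
}

fn main() {
    let amount = 1_000;
    let p_a = success_probability(amount, 3_000); // channel a): ~0.667
    let p_b_true = success_probability(amount, 30_000_000); // channel b): ~0.99997
    // Treating the max htlc size (2000 msat) as channel b)'s effective capacity:
    let p_b_capped = success_probability(amount, 2_000); // ~0.500
    println!("a: {:.3}, b true: {:.5}, b capped: {:.3}", p_a, p_b_true, p_b_capped);
    // The cap makes b) look worse than a), even though b)'s real success
    // probability is far higher.
}
```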

Contributor Author

Yep, I assume this is the same point made in the following conversation?

https://github.com/lightningdevkit/rust-lightning/pull/1227/files#r787934743

Yes, it is the same, only that here is the point where it is actually being initialized. I think this should really be fixed. While stuffing several small HTLCs into a channel in the context of MPP is one concern, I think the much bigger concern is that the probability computation becomes severely wrong. Large channels especially should be highly attractive / likely. But I guess folks with large channels will want to cap the max size of HTLCs, which means you would potentially not find such channels, which have a high chance of having the liquidity.

Collaborator

This may be worth fixing, though in practice today it's a relatively rare issue. It's also worth noting that not all users have chain data, and those don't bother fetching the channel's capacity by looking at the chain, using htlc_maximum_msat instead to determine it, because it's the best info they have. Our users aren't the only ones doing this; I believe Electrum does this as well. Thus, we really need the implementation here to be robust even if we're using htlc_maximum_msat data to infer channel sizes. I'm not sure how best to do that, but just ensuring we pipe the values from on-chain through doesn't really solve the concern.

Ultimately this needs to be resolved at the spec level, of course, but given the values are basically always the same today we may just wait until we have a new gossip format for that.

Contributor Author

Addressed this as per a top-level comment, but dropped it in favor of a follow-up as per offline discussion. The commit was a bit more involved and had a performance regression, though pre-computing DirectedChannelInfo::htlc_maximum_msat seems to fix it, which is odd given it should only be calculated once.

@TheBlueMatt (Collaborator)

Looks like this needs rebase now.

@jkczyz (Contributor Author)

jkczyz commented Jan 20, 2022

Looks like this needs rebase now.

Sure does! Mind if I squash the fixups for 3c2df0c? Should make it a little easier.

@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from e0d8080 to 97fdf76 Compare February 1, 2022 03:23
lightning/src/routing/network_graph.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
Comment on lines 90 to 91
/// such as chain data, network gossip, or invoice hints, the latter indicating sufficient
/// capacity (i.e., near [`u64::max_value`]). Thus, implementations should be overflow-safe.
Contributor

I thought the previous wording might've been clearer here, suggestion being: "For hints provided in the invoice, we assume the channel has sufficient capacity to accept the invoice's full amount, and provide a channel_capacity_msat of u64::max_value. Thus..."

Contributor Author

Reworded a bit to tighten it up. Technically, it can be less than u64::max_value since in-flight HTLCs are deducted for the time being.

lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch 2 times, most recently from 421833c to e4c49d7 Compare February 1, 2022 17:39
lightning/src/routing/scoring.rs Show resolved Hide resolved
lightning/src/routing/network_graph.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from e4c49d7 to 575cae0 Compare February 1, 2022 23:38
@TheBlueMatt (Collaborator) left a comment

Basically there, mostly doc comments. Still need to finish discussing #1227 (comment)

lightning/src/routing/router.rs Outdated Show resolved Hide resolved
lightning/src/routing/router.rs Outdated Show resolved Hide resolved
/// Knowledge about channel liquidity balances takes the form of upper and lower bounds on the
/// possible liquidity. Certainty of the bounds is decreased over time using a decay function. See
/// [`ProbabilisticScoringParameters`] for details.
///
Collaborator

I feel like we should say something here about how this is likely to be really great for probing-based nodes or nodes with high payment flow, but because we learn the specific liquidity on a channel, instead of learning how often a channel is available, this is unlikely to be a great approach for nodes which have very low payment volume. Scorer isn't a ton better, admittedly, but we likely want to extend this to measure node "badness" later.

Collaborator

Thanks for the update. I think it's still a little vague. I suppose the decaying is also bad for long-term learning, but my other long-term-learning concern here is just about how we go about learning to begin with. Our whole thing here is about trying to remember the current channel balances on every channel on the network. We don't so much learn anything as try to keep a mental map of the whole network up-to-date all the time. That's great if you're probing or have payment flows, but otherwise we're really not learning at all, IMO.

Fully agree! Those seem to be two orthogonal problems.

  1. The probability that liquidity is available (as a probabilistic model for the reliability)
  2. The actual observed reliability of a specific node / channel (as a provenance score or distribution)

I claim (but obviously don't know) that the first one will be much more impactful. In particular, for the first one we know how to properly quantify this, as the probabilistic model gives us the proper information-theoretic framework.

That being said, I am very interested in examining the second one and investigating whether it is useful to include it in the cost function. I have drafts of an approach for assigning provenance scores as a probability distribution that can be used to create the joint distribution between this and the liquidity-based model. The good thing is that models for the second question can be learnt from log files and also tested against the log files. I will be very happy to collaborate on this, in particular as you seem to have access to data / log files.

Collaborator

Right, I assume that both will be important (though somewhat annoyingly the lightning onion errors don't tell us which happened when we get a failure: liquidity limit or node offline), but either way we don't yet do either and that would be bad for some users :)

Contributor Author

Re-worded the first part of the sentence as discussed offline.

Having two opposing conjectures / hypotheses is one of the best starting points for research. As said, I am very interested in these two questions (1. how to define a reliability-based probabilistic provenance score for nodes / channels; 2. how useful this will be in the cost function) and happy to collaborate to find this out. Also very happy to at least anecdotally be proven wrong on my conjecture.

lightning/src/routing/scoring.rs Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
lightning/src/routing/scoring.rs Show resolved Hide resolved
lightning/src/routing/scoring.rs Outdated Show resolved Hide resolved
max_liquidity_offset_msat,
capacity_msat,
last_updated: &self.last_updated,
now: T::now(),
Collaborator

Wow, that's a lot of system-time-gets, Batman! I guess it's pretty quick because it's usually rdtsc on x86 when warm, but we should probably fix this later. Maybe add a comment here noting that when we remove old entries, we should just decay them then and not bother decaying during lookups.

Contributor Author

Wow, that's a lot of system-time-gets, Batman! I guess it's pretty quick because it's usually rdtsc on x86 when warm, but we should probably fix this later.

This should be similar behavior to Scorer::decayed_penalty_msat, right?

Maybe add a comment here noting that when we remove old entries, we should just decay them then and not bother decaying during lookups.

Not sure I follow. The penalty may not be accurate if we don't decay during lookup. Could you clarify what you mean by "remove old entries"?

Collaborator

This should be similar behavior to Scorer::decayed_penalty_msat, right?

I think so? That doesn't mean it's great, though. IIRC it can be quite cheap on x86, but on other platforms it can be a full syscall.

Not sure I follow. The penalty may not be accurate if we don't decay during lookup. Could you clarify what you mean by "remove old entries"?

I'm assuming when we implement the "remove entries for closed channels" logic, we'll instead just remove entries on a timer that have decayed almost all the way to zero. Once we have a timer, we can just do the decays on the timer, instead of doing the decays while routing.

Contributor Author

Gotcha. Yeah, that would be a simple way to remove closed channels. I'd hope there would be a way to use the scorer without a background thread, though.

Another thought would be to push Time into either the Score or find_route interface so T::now could be called once and passed in as needed. Or even build it into LockableScore in some way such that locking gives back some (S, T) implementing Score.

Collaborator

Yea, that would work too. I dunno what the right answer is. We already require a background thread for the network graph to expire channels; at the time we'd discussed either doing that there or on serialization, and decided the background thread was simpler. I guess we could have an explicit "channel is dead" call here like we do for the network graph, but that feels like about as much work as just hooking it up to the background thread / a timer? Anyway, it's something we can discuss another time. I don't think this needs fixing here, but I worry this is only practically performant on x86.
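For reference, the half-life decay being discussed can be sketched as follows (parameter names assumed; the actual scorer's implementation may differ), whether it runs during lookups or on a timer:

```rust
use std::time::Duration;

// Sketch only: halve the liquidity-bound offsets once per elapsed half-life.
// Whole half-lives only; a real implementation may interpolate between them.
fn decayed_offset_msat(offset_msat: u64, elapsed: Duration, half_life: Duration) -> u64 {
    let half_lives = (elapsed.as_secs() / half_life.as_secs()) as u32;
    // checked_shr returns None for shifts >= 64, i.e. fully decayed.
    offset_msat.checked_shr(half_lives).unwrap_or(0)
}

fn main() {
    let half_life = Duration::from_secs(3_600);
    // Two half-lives: a 1_000_000 msat offset decays to 250_000 msat.
    let decayed = decayed_offset_msat(1_000_000, Duration::from_secs(7_200), half_life);
    println!("{}", decayed);
}
```

Decaying on a timer would mean calling something like this once per period for every entry, rather than fetching the current time on every channel lookup during routing.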

@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch 2 times, most recently from 18e467b to f372fe9 Compare February 2, 2022 23:34
@TheBlueMatt (Collaborator)

Note the failing benchmarks CI run is already reported as an upstream rustc nightly bug at rust-lang/rust#93599

TheBlueMatt previously approved these changes Feb 3, 2022
@TheBlueMatt (Collaborator) left a comment

One further note about the "why not use this without lots of payment flows" comment at #1227 (comment) but I'm good with where we're at.

@valentinewallace (Contributor) left a comment

LGTM 🚀

lightning/src/routing/network_graph.rs Outdated Show resolved Hide resolved
A channel's capacity may be inferred or learned and is used to make
routing decisions, including as a parameter to channel scoring. Define
an EffectiveCapacity for this purpose. Score::channel_penalty_msat takes
the effective capacity (less in-flight HTLCs for the same payment), and
never None. Thus, for hops given in an invoice, the effective capacity
is now considered (near) infinite if over a private channel or based on
learned information if over a public channel.

If a Score implementation needs the effective capacity when updating a
channel's score, i.e. in payment_path_failed or payment_path_successful,
it can access the channel's EffectiveCapacity via the NetworkGraph by
first looking up the channel and then specifying which direction is
desired using ChannelInfo::as_directed.
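As a toy illustration of that lookup path (these are simplified stand-ins, not the actual NetworkGraph / ChannelInfo types):

```rust
// Toy model only: simplified stand-ins for LDK's graph types.
#[derive(Clone, Copy, PartialEq, Debug)]
struct NodeId(u8);

#[derive(Clone, Copy, PartialEq, Debug)]
enum EffectiveCapacity {
    /// Capacity in msats from chain data or gossip, capped by htlc_maximum.
    Known(u64),
    /// Invoice hint over a private channel: assumed (near) infinite.
    Infinite,
}

struct ChannelInfo {
    node_one: NodeId,
    node_two: NodeId,
    capacity_msat: Option<u64>,
    // Simplification: one htlc_maximum value for both directions.
    htlc_maximum_msat: Option<u64>,
}

impl ChannelInfo {
    /// Effective capacity in the direction *to* `target`, or None if
    /// `target` is not a party to the channel.
    fn as_directed_to(&self, target: &NodeId) -> Option<EffectiveCapacity> {
        if *target != self.node_one && *target != self.node_two {
            return None;
        }
        Some(match (self.capacity_msat, self.htlc_maximum_msat) {
            (Some(cap), Some(max)) => EffectiveCapacity::Known(cap.min(max)),
            (Some(cap), None) => EffectiveCapacity::Known(cap),
            (None, Some(max)) => EffectiveCapacity::Known(max),
            (None, None) => EffectiveCapacity::Infinite,
        })
    }
}

fn main() {
    let channel = ChannelInfo {
        node_one: NodeId(1),
        node_two: NodeId(2),
        capacity_msat: Some(30_000_000),
        htlc_maximum_msat: Some(2_000),
    };
    println!("{:?}", channel.as_directed_to(&NodeId(2)));
}
```

Note the cap by htlc_maximum here mirrors the PR's current behavior, which the review discussion above flags as a source of underestimated probabilities for large channels.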
Add a Score implementation based on "Optimally Reliable & Cheap Payment
Flows on the Lightning Network" by Rene Pickhardt and Stefan Richter [1].
Given the uncertainty of channel liquidity balances, probability
distributions are defined based on knowledge learned from successful and
unsuccessful attempts. Then the negative log of the success probability
is used to determine the cost of routing a specific HTLC amount through
a channel.

[1]: https://arxiv.org/abs/2107.05322
ProbabilisticScorer uses successful and unsuccessful payments to gain
more certainty of a channel's liquidity balance. Decay this knowledge
over time to indicate decreasing certainty about the liquidity balance.
@jkczyz jkczyz dismissed stale reviews from valentinewallace and TheBlueMatt via d5db85e February 3, 2022 02:23
@jkczyz jkczyz force-pushed the 2021-12-probabilistic-scorer branch from f372fe9 to d5db85e Compare February 3, 2022 02:23
Status: Done
6 participants