Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to decode onion error message from eclair to lnd #363

Closed
Roasbeef opened this issue Jan 8, 2018 · 13 comments
Closed

Unable to decode onion error message from eclair to lnd #363

Roasbeef opened this issue Jan 8, 2018 · 13 comments
Assignees
Labels

Comments

@Roasbeef
Copy link

Roasbeef commented Jan 8, 2018

Lately, when I've been trying to send to/through starblocks on some of my nodes, my HTLC's keep getting rejected. I haven't been able to determine why exactly, as I can't decode the error message sent by eclair:

2018-01-08 03:31:50.032 [DBG] PEER: Received UpdateFailHTLC(chan_id=04c37f4471d3e994c1bb455b9cb976959149e0b9bda1a0d377a2df14574c2e36, id=146, reason=06345bbb0e4e0d441abbd994c3713ee43c409b1856b9727b3b518f123d4b5eaf3f0272ddf0b44355951d71bc1fc61b0fbd6a1ba9cc5ab1591f1319f1230d244f61537ea925115271f2aac7b92298b35b7cc88cca6581ed44596cc58196e7111e53647863986acee00e0bd007335c373d82e6544476cdd44b7b0a0b1152d03aad40abf562e235ba7770c55244198e35640be0bed4ca91d8f472c39bdc820616b01a487e71ab7d858a63b9d1ed4f2a12715e1e488769b4cb055f6da10bf19ad3e46644cb6ad0572e140657516ec32ac7cc40b1b969773e16fcae505e946f249b3f3095abf810790bdb5720c576ce433917b76ed70d1b046a09741f3b032d7da4ce877ecdb2939891d56896a1381cafacf276a4bbe0209880d901ed98fc0520bada9225261c) from 
2018-01-08 03:31:50.043 [DBG] PEER: Received CommitSig(chan_id=04c37f4471d3e994c1bb455b9cb976959149e0b9bda1a0d377a2df14574c2e36, num_htlcs=0) from 
2018-01-08 03:31:50.050 [DBG] PEER: Sending RevokeAndAck(chan_id=04c37f4471d3e994c1bb455b9cb976959149e0b9bda1a0d377a2df14574c2e36, rev=4564ef7c7de66b5348820f993ab18cc21d4e11c1a8327518ccd4a596fe484e87, next_point=037346b4c008d039b46a2f577d4833a80133d91a29af1e03ae62e0fb191b571126) to 2018-01-08 03:31:50.059 [DBG] PEER: Sending CommitSig(chan_id=04c37f4471d3e994c1bb455b9cb976959149e0b9bda1a0d377a2df14574c2e36, num_htlcs=0) to 
2018-01-08 03:31:50.162 [DBG] PEER: Received RevokeAndAck(chan_id=04c37f4471d3e994c1bb455b9cb976959149e0b9bda1a0d377a2df14574c2e36, rev=379f27413b8a5154a4731a44a2d261368ce69885c363c4032e0a1af5bb6ae2f3, next_point=037b81d3eceb4fdfd3fca0fb8dad244ce1e670ee339266836a548ca5868221c5ab) from 
2018-01-08 03:31:50.165 [DBG] HSWC: ChannelPoint(372e4c5714dfa277d3a0a1bdb9e049919576b99c5b45bbc194e9d371447fc304:1) forwarding 1 HTLC's
2018-01-08 03:31:50.165 [DBG] HSWC: Closed completed onion circuit for 7740f2c4715c705ab2d0211d06ae67a37fa0e95338e8f73d7f424fb13ab86604: (0:0:0, 220) <-> (1257786:147:1, 146)
2018-01-08 03:31:50.168 [ERR] HSWC: unable to de-obfuscate onion failure, htlc with hash(7740f2c4715c705ab2d0211d06ae67a37fa0e95338e8f73d7f424fb13ab86604): unable to decode error update (type=*lnwire.FailTemporaryChannelFailure, len_bytes=130, bytes=1007b42a4030b07a3f456093a3020b980c1f54a7b15d663d122360823091945ac55d6941f4e3c0bccc2339020993008ac4024c6af59d9a4a4074b3f12fb1fae93a1543497fd7f826957108f4a30fd9cec3aeba79972084e90ead01ea3309000000001329ca00006200005a526d580001009000000000000027100000271000000064): unexpected EOF

lnd determines it's a FailTemporaryChannelFailure, then tries to decode the message:

1007b42a4030b07a3f456093a3020b980c1f54a7b15d663d122360823091945ac55d6941f4e3c0bccc2339020993008ac4024c6af59d9a4a4074b3f12fb1fae93a1543497fd7f826957108f4a30fd9cec3aeba79972084e90ead01ea3309000000001329ca00006200005a526d580001009000000000000027100000271000000064

Are y'all able to decode this?

@Roasbeef
Copy link
Author

Roasbeef commented Jan 8, 2018

Hmm, actually if I just try to decode the raw error (instead of along with its framing), I'm able to properly decode it:

len:  4103
(*lnwire.FailTemporaryChannelFailure)(0xc42000e0b8)(TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(0xc420082370)({
 Signature: (*btcec.Signature)(0xc4200107b0)({
  R: (*big.Int)(0xc42000d060)(81490963346137529563603176940925655100903038764901586889101343776863355061597),
  S: (*big.Int)(0xc42000d080)(47609384326767702852395815807248248468603241552974167823125523079358952585749)
 }),
 ChainHash: (chainhash.Hash) (len=32 cap=32) 000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943,
 ShortChannelID: (lnwire.ShortChannelID) 1255882:98:0,
 Timestamp: (uint32) 1515351384,
 Flags: (lnwire.ChanUpdateFlag) 1,
 TimeLockDelta: (uint16) 144,
 HtlcMinimumMsat: (lnwire.MilliSatoshi) 10000 mSAT,
 BaseFee: (uint32) 10000,
 FeeRate: (uint32) 100
})
))

The odd thing here still though, is that the prepended length of the update was 4103 bytes, when the update itself is just 130 bytes.

@sstone sstone self-assigned this Jan 10, 2018
@sstone
Copy link
Member

sstone commented Jan 10, 2018

We're missing the len field in our error message: we send 1007 | channelUpdate instead of 1007 | len | channelUpdate so the first 2 bytes you interpret as length are actually the first 2 bytes of the channel update's signature field.

The correct encoding should be:

10070080b42a4030b07a3f456093a3020b980c1f54a7b15d663d122360823091945ac55d6941f4e3c0bccc2339020993008ac4024c6af59d9a4a4074b3f12fb1fae93a1543497fd7f826957108f4a30fd9cec3aeba79972084e90ead01ea3309000000001329ca00006200005a526d580001009000000000000027100000271000000064

@sstone sstone added the bug label Jan 10, 2018
sstone added a commit that referenced this issue Jan 10, 2018
When a ailure message includes a channel update field, this field must be encoded as:
[ len | channel update]
And we did not include the len field.
@sstone
Copy link
Member

sstone commented Jan 10, 2018

Ok, with a correct encoding lnd crashes when it receives the error (see log below).
The parsed temporary failure message looks good (i.e. matches what was sent).
I'm using lnd at lightningnetwork/lnd@beeb75c

2018-01-10 16:40:56.082 [ERR] CRTR: Attempt to send payment abadffbc5695a4dc400a5a227b44da8231b7ee2b6f6bd9340b49fd8eb0eb3577 failed: TemporaryChannelFailure(update=(*lnwire.ChannelUpdate)(0xc420272dc0)({
 Signature: (*btcec.Signature)(0xc420678470)({
  R: (*big.Int)(0xc4205374a0)(49491959598261878261056428276442350416278241260864900079909248898946334365505),
  S: (*big.Int)(0xc4205374c0)(13460190816797932915642861930206119487212107405213384283682323122815355044768)
 }),
 ChainHash: (chainhash.Hash) (len=32 cap=32) 0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206,
 ShortChannelID: (lnwire.ShortChannelID) 592:1:0,
 Timestamp: (uint32) 1515598737,
 Flags: (lnwire.ChanUpdateFlag) 0,
 TimeLockDelta: (uint16) 144,
 HtlcMinimumMsat: (lnwire.MilliSatoshi) 1000 mSAT,
 BaseFee: (uint32) 10000,
 FeeRate: (uint32) 100
})
)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xace5cb]

goroutine 244 [running]:
github.com/lightningnetwork/lnd/routing.(*Route).nextHopChannel(0xc4205c9310, 0xc420471f20, 0x0, 0x0)
	/home/fabrice/go/src/github.com/lightningnetwork/lnd/routing/pathfind.go:154 +0x8b
github.com/lightningnetwork/lnd/routing.(*ChannelRouter).SendPayment(0xc420214210, 0xc420233c00, 0x0, 0x0, 0x0, 0x0, 0xc420555796, 0xc420555790, 0x0)
	/home/fabrice/go/src/github.com/lightningnetwork/lnd/routing/router.go:1651 +0xa52
main.(*rpcServer).SendPayment.func3(0xc42069a960, 0xc4205cb960, 0xc420232d40, 0xc4205cba20, 0xc42013cba0, 0x143f860, 0xc42014ca30, 0xc42069a900)
	/home/fabrice/go/src/github.com/lightningnetwork/lnd/rpcserver.go:1802 +0x14b
created by main.(*rpcServer).SendPayment
	/home/fabrice/go/src/github.com/lightningnetwork/lnd/rpcserver.go:1831 +0x5ad

@Roasbeef
Copy link
Author

If you run that test again with lnd on tracelevel logging (--debuglevel=CRTR=trace) for the router, can you paste the log dump? Thanks!

@Roasbeef
Copy link
Author

Roasbeef commented Jan 10, 2018

Working on a fix on our end. One question: this was triggered as we detected that the destination sent a temp chan failure, but there's an assumption in the code atm, that only any of the intermediate nodes will ever send that error (so we then try to prune that outgoing channel).

In this case the dest in starblocks, so why's it sending a temp chan failure if the HTLC reached it?

@Roasbeef
Copy link
Author

Or is it failing because it has too many incoming/accepted HTLC's already?

case (_: TooManyAcceptedHtlcs, Some(channelUpdate)) => TemporaryChannelFailure(channelUpdate)

@Roasbeef
Copy link
Author

Closed by mistake...

@Roasbeef Roasbeef reopened this Jan 10, 2018
@pm47
Copy link
Member

pm47 commented Jan 10, 2018

The failure is sent by endurance, not starblocks.

Reason is that the channel is depleted (there are actually other non-depleted channels between endurance and starblocks)

@pm47
Copy link
Member

pm47 commented Jan 10, 2018

I'm curious why you think that the destination sent the error?

@Roasbeef
Copy link
Author

I'm curious why you think that the destination sent the error?

That was just a hunch without the full logs.

Reason is that the channel is depleted (there are actually other non-depleted channels between endurance and starblocks)

If multiple channels are active, and an incoming HTLC request comes along, eclair will not just go ahead an utilize the available channel even if it wasn't the one specified in the onion payload?

@pm47
Copy link
Member

pm47 commented Jan 10, 2018

If multiple channels are active, and an incoming HTLC request comes along, eclair will not just go ahead an utilize the available channel even if it wasn't the one specified in the onion payload?

There is a TODO for that in the relayer ;-)

@Roasbeef
Copy link
Author

I'm curious why you think that the destination sent the error?

Looked into it a bit more, and the crash above happend as we detected that the destination sent the error. I say this as the only way that the map lookup that led to the crash can fail, is if the destination sent the error. Still investigating on my side, but adding a patch to address this edge case for now.

@sstone
Copy link
Member

sstone commented Jan 11, 2018

Yes, the crash above happens when it is the destination sends the failure message, not when it's a relaying a node. I've run the following tests locally:
eclair 1 -- eclair 2 -- lnd, eclair 1 fails the HTLC with a TemporaryChannelFailure => lnd crashes
eclair 1 -- eclair 2 -- lnd, eclair 2 fails the HTLC with a TemporaryChannelFailure => lnd logs the error and remains functional
In both cases the failure message is logged properly

@pm47 pm47 closed this as completed in 065de8b Jan 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants