cmd/dcrdex: Unable to stop app after shutting down dcrd node. #321

Closed
JoeGruffins opened this issue May 1, 2020 · 13 comments · Fixed by #325
@JoeGruffins
Member

If you shut down your dcrd node/asset before shutting down the server, the server hangs on shutdown. The logs stop here:

^C2020-05-01 21:03:08.847 [INF] MAIN: Received signal (interrupt). Shutting down...
2020-05-01 21:03:08.847 [INF] MAIN: Stopping DEX...
2020-05-01 21:03:08.847 [INF] DEX: Stopping subsystems...
2020-05-01 21:03:08.847 [INF] COMM: RPC server shutting down...
2020-05-01 21:03:08.847 [DBG] COMM: RPC listener done for 127.0.0.1:7232
2020-05-01 21:03:08.847 [INF] COMM: RPC server shutdown complete
2020-05-01 21:03:08.847 [INF] DEX: Comms Server shutdown.
2020-05-01 21:03:08.847 [INF] DEX: BookRouter shutdown.
2020-05-01 21:03:08.847 [DBG] MKT: Market "dcr_btc" stopped.
2020-05-01 21:03:08.847 [INF] DEX: Market[dcr_btc] shutdown.
2020-05-01 21:03:08.847 [INF] DEX: Swapper shutdown.
2020-05-01 21:03:08.847 [INF] DEX: Auth manager shutdown.

Starting the node back up does not seem to help.

@buck54321
Member

I've seen this too. Probably start by looking around here:

btc.run(ctx)
err := btc.wallet.LockUnspent(true, nil)

@JoeGruffins you want to investigate?

@JoeGruffins
Member Author

sure!

@chappjc
Member

chappjc commented May 1, 2020

Part of updating our dcrd/rpcclient dependency will be using the new context.Context input arg. That might be related if it's an RPC that hangs. That won't help with btc though, since I don't think its rpcclient is updated.

@JoeGruffins
Member Author

JoeGruffins commented May 2, 2020

I have tracked this down to the behavior of our rpcclient.

The dcrd backend polls for best blocks:

bestHash, err := dcr.node.GetBestBlockHash()

When there is no connection and auto-reconnect is on, the client does not return an error. It saves all requests to fire them all at once when a reconnect happens. So when not connected, the call stops there and never makes it back out to the shutdown path.
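
To make the failure mode concrete, here is a minimal sketch of that polling pattern. This is hypothetical; names like pollBestBlock and blockFetcher are illustrative, not actual dcrdex identifiers.

// A hypothetical sketch of the polling loop described above.
package dcr

import (
	"context"
	"log"
	"time"

	"github.com/decred/dcrd/chaincfg/chainhash"
)

// blockFetcher stands in for the rpcclient methods the backend uses.
type blockFetcher interface {
	GetBestBlockHash() (*chainhash.Hash, error)
}

// pollBestBlock checks the chain tip on a ticker until ctx is canceled.
func pollBestBlock(ctx context.Context, node blockFetcher) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // shutdown path; never reached while a call is queued
		case <-ticker.C:
			// With auto-reconnect on, a disconnected client queues this
			// request instead of failing it, so the call blocks here and
			// ctx.Done() is never selected again.
			bestHash, err := node.GetBestBlockHash()
			if err != nil {
				log.Printf("GetBestBlockHash error: %v", err)
				continue
			}
			log.Printf("tip: %s", bestHash)
		}
	}
}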

This issue can be solved by turning off autoreconnect here:

config := &rpcclient.ConnConfig{
	Host:         host,
	Endpoint:     "ws", // websocket
	User:         user,
	Pass:         pass,
	Certificates: dcrdCerts,
}

by adding the line DisableAutoReconnect: true,
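
With that field added, the same config reads:

config := &rpcclient.ConnConfig{
	Host:                 host,
	Endpoint:             "ws", // websocket
	User:                 user,
	Pass:                 pass,
	Certificates:         dcrdCerts,
	DisableAutoReconnect: true, // fail RPCs while disconnected instead of queueing them
}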

We "solved" a similar issue in dcrstakpool by writing a reconnect function. jrick also has a convenient package https://github.com/jrick/wsrpc that I think simplifies things. However, we would still need to handle reconnects manually.

Also, with auto-reconnect on, the client does reconnect and shutdown continues when using testnet. For some reason it does not behave the same when using the simnet harness.

> Part of updating our dcrd/rpcclient dependency will be using the new context.Context input arg.

Or maybe that's the fix? Not sure yet, but passing a context and having the client stop waiting once ctx.Done() fires would also solve the issue.

@chappjc
Member

chappjc commented May 4, 2020

PR #325 looks good to fix the hang, but let's open an issue or two for: (1) running with auto-reconnect disabled to prevent RPC calls from simply hanging, and (2) using the newer rpcclient API that takes Contexts, which would solve this another way and give us a way to set timeouts on any given RPC call. I'm not certain, but it seems like setting timeouts on RPC calls might allow us to keep auto-reconnect.
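
For instance, a per-call timeout with the Context-enabled API might look like this. This is a sketch assuming v6-style method signatures; whether the websocket transport actually honors the deadline is exactly what would need verifying.

// Bound any single RPC to ten seconds.
callCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
bestHash, err := client.GetBestBlockHash(callCtx)
if err != nil {
	// If the client honors the context, err would reflect a
	// deadline-exceeded condition when the call times out.
	return err
}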

@chappjc
Member

chappjc commented May 4, 2020

@JoeGruffins Assuming we stick with rpcclient, could you investigate the two approaches I named? What are the pros/cons of autoreconnect, and does the Context-enabled rpcclient API effectively fix the issues without requiring manual reconnect code?

@JoeGruffins
Member Author

Is the rpcclient that uses a context dcrd/rpcclient/v6? If so, I don't see where it has changed to use a context throughout.

@chappjc
Member

chappjc commented May 5, 2020

> Is the rpcclient that uses a context dcrd/rpcclient/v6? If so, I don't see where it has changed to use a context throughout.

There's a context.Context input on just about all the Client methods, at least on master. e.g.:

https://github.com/decred/dcrd/blob/ce2195fbc3de0ee60ebaebfe7a967f0d8b041498/rpcclient/chain.go#L56-L60

// GetBestBlockHash returns the hash of the best block in the longest block
// chain.
func (c *Client) GetBestBlockHash(ctx context.Context) (*chainhash.Hash, error) {
	return c.GetBestBlockHashAsync(ctx).Receive()
}

Currently that's rpcclient/v6.

@JoeGruffins
Member Author

It looks like that context doesn't concern websocket connections, only http.

https://github.com/decred/dcrd/blob/ce2195fbc3de0ee60ebaebfe7a967f0d8b041498/rpcclient/infrastructure.go#L878-L910

Will test out v6 anyway to see if canceling that ctx has the desired effect for us.
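
Roughly this kind of harness (hypothetical code) should show whether cancellation unblocks a pending call:

ctx, cancel := context.WithCancel(context.Background())
errC := make(chan error, 1)
go func() {
	_, err := client.GetBestBlockHash(ctx) // blocks while dcrd is down
	errC <- err
}()
time.Sleep(time.Second) // pretend a shutdown arrives while disconnected
cancel()
select {
case err := <-errC:
	log.Printf("call returned after cancel: %v", err) // the behavior we want
case <-time.After(5 * time.Second):
	log.Println("call still blocked after cancel") // the hang persists
}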

@chappjc
Member

chappjc commented May 6, 2020

But https://github.com/decred/dcrd/blob/ce2195fbc3de0ee60ebaebfe7a967f0d8b041498/rpcclient/infrastructure.go#L1403

chappjc reopened this May 6, 2020
@JoeGruffins
Member Author

JoeGruffins commented May 6, 2020

@chappjc
Member

chappjc commented May 6, 2020

Ohhhh, that's unfortunate. Seems like if the context is cancelled then it should cease reconnection attempts.

@JoeGruffins
Member Author

rpcclient/v6 should now be able to handle this better: decred/dcrd#2198. I guess closing this issue will depend on v6 being finalized, and on us using it.
