Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix NeoSystem shutdown order #2842

Merged
merged 2 commits into from
Jan 16, 2023
Merged

fix NeoSystem shutdown order #2842

merged 2 commits into from
Jan 16, 2023

Conversation

ixje
Copy link
Contributor

@ixje ixje commented Jan 11, 2023

fix neo-project/neo-modules#786

What I theorize is happening is that while the plugins are disposed (and they remove their Committing/Committed handlers) in the background the main persist loop continues to add blocks and transactions. The plugins (like app log) no longer process these blocks hence we lose data. By first shutting down the "main loop" we ensure plugins don't miss anything.

@superboyiii
Copy link
Member

Due to my test, some issues still exist.

  1. Throw exception while exit
neo> exit
Unhandled exception. System.InvalidOperationException: Cannot create child while terminating or terminated
   at Akka.Actor.ActorCell.MakeChild(Props props, String name, Boolean async, Boolean systemService)
   at Akka.Actor.ActorCell.AttachChild(Props props, Boolean isSystemService, String name)
   at Akka.Actor.Internal.ActorSystemImpl.SystemActorOf(Props props, String name)
   at Akka.Actor.Inbox.Create(ActorSystem system)
   at Neo.NeoSystem.EnsureStopped(IActorRef actor) in E:\PR-test\neo-PR#2842\neo\src\Neo\NeoSystem.cs:line 206
   at Neo.Plugins.StateService.StatePlugin.Dispose()
   at Neo.NeoSystem.Dispose() in E:\PR-test\neo-PR#2842\neo\src\Neo\NeoSystem.cs:line 171
   at Neo.CLI.MainService.Stop() in E:\PR-test\neo-PR#2842\neo-node\neo-cli\CLI\MainService.cs:line 450
   at Neo.CLI.MainService.OnStop() in E:\PR-test\neo-PR#2842\neo-node\neo-cli\CLI\MainService.cs:line 358
   at Neo.ConsoleService.ConsoleServiceBase.Run(String[] args) in E:\PR-test\neo-PR#2842\neo-node\Neo.ConsoleService\ConsoleServiceBase.cs:line 555
   at Neo.Program.Main(String[] args) in E:\PR-test\neo-PR#2842\neo-node\neo-cli\Program.cs:line 20
  1. Some plugins which need persist data like StateService, still losing data while exit or ctrl + C
    I've chosen some random height to verify state root hash after several exit, they showed the wrong hash which means data missing.
    For example:
    My test node
neo> state root 20000
{"version":0,"index":20000,"roothash":"0xe1989c24e4c87c77ed9f6c9a0c9a87cd4f7fac46fc8ca4108cfa409c630ebe20","witnesses":[]}

mainnet seed node

{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "version": 0,
        "index": 20000,
        "roothash": "0xa776adf30617805e86feccb360e2521cd5eb68cb0a12bc1cfce56109f3940faf",
        "witnesses": []
    }
}

I will take a look at StatesDumper plugin while I finished resyncing full data of mainnet. In previous version, it will surely lose data after being interupted by ctrl + c or exit. See issues here: neo-project/neo-modules#736

@superboyiii
Copy link
Member

As what I predicted, StatesDumper still has this issue:
1673506043809
1673507197104
Each exit will cause data lose in StatesDumper.

@ixje
Copy link
Contributor Author

ixje commented Jan 12, 2023

I've not encountered this shutdown error myself, but somebody else inside coz has so we'll need to investigate further.

@ixje
Copy link
Contributor Author

ixje commented Jan 12, 2023

It seems like plugins cannot call NeoSystem.EnsureStopped after ActorSystem.Dispose()

 public void Dispose()
        {
            EnsureStopped(LocalNode);
            // Dispose will call ActorSystem.Terminate()
            ActorSystem.Dispose();
            ActorSystem.WhenTerminated.Wait();
            foreach (var p in Plugin.Plugins)   <- TooLate
                p.Dispose();
            HeaderCache.Dispose();
            store.Dispose();
        }

Moving it up like shown next resolves that exception.

 public void Dispose()
        {
            EnsureStopped(LocalNode);
            // Dispose will call ActorSystem.Terminate()
            foreach (var p in Plugin.Plugins)   <- OK
                p.Dispose();
            ActorSystem.Dispose();
            ActorSystem.WhenTerminated.Wait();
            HeaderCache.Dispose();
            store.Dispose();
        }

but then blocks still get persisted e.g.

Persisting block 0
App log committing 0
App log committed 0
NEO-CLI v3.5.0  -  NEO v3.5.0  -  NEO-VM v3.5.0

neo> Persisting block 1
App log committing 1
App log committed 1
Persisting block 2
App log committing 2
App log committed 2
Persisting block 3
App log committing 3
App log committed 3
exit
Persisting block 4  <- OH NO (this is printed in Blockchain.Persist())

Process finished with exit code 0.

I can fix this by adding an EnsureStopped(blockchain) like below

 public void Dispose()
        {
            EnsureStopped(LocalNode);
            EnsureStopped(Blockchain);
            // Dispose will call ActorSystem.Terminate()
            ActorSystem.Dispose();
            ActorSystem.WhenTerminated.Wait();
            foreach (var p in Plugin.Plugins)
                p.Dispose();
            HeaderCache.Dispose();
            store.Dispose();
        }

to get

Persisting block 0
App log commiting 0
App log committed 0
NEO-CLI v3.5.0  -  NEO v3.5.0  -  NEO-VM v3.5.0

neo> Persisting block 1
App log commiting 1
App log committed 1
Persisting block 2
App log commiting 2
App log committed 2
exit
Persisting block 3
App log committing 3
App log committed 3
Persisting block 4
App log committing 4
App log committed 4

Process finished with exit code 0.

Now I still have to fix the exception that this causes when syncing from an offline chain.

@ixje
Copy link
Contributor Author

ixje commented Jan 12, 2023

@superboyiii this should be it. The exception during offline chain sync I was seeing was due some change I did to the offline syncing system and does not apply to the default system. Can you help test this please? :)

@superboyiii
Copy link
Member

@superboyiii this should be it. The exception during offline chain sync I was seeing was due some change I did to the offline syncing system and does not apply to the default system. Can you help test this please? :)

Sure. I'm doing that.

Copy link
Member

@superboyiii superboyiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ixje Nice fix! The exception and the inconsistent StateService issue has gone. For StatesDumper, I think it's another issue.

@superboyiii
Copy link
Member

For StatesDumper, I think the issue comes from neo-project/neo-modules#568 (comment)

Copy link
Member

@vncoelho vncoelho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix @ixje

@dusmart
Copy link

dusmart commented Jan 15, 2023

Thanks. This fixed my problem, too. #2624

@shargon shargon merged commit ba957ff into neo-project:master Jan 16, 2023
@superboyiii superboyiii mentioned this pull request Feb 2, 2023
@ixje ixje deleted the fix-applog-losing-state branch April 18, 2023 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

applicationlogs get lost on shutdown
5 participants