ZnSeasideGemServer crashes due to socket becoming nil after a Seaside error handler caught an exception #1297
Comments
first observation ... I reproduced the bug and in my case the stack overflow was "immediately" preceded by a Transcript show of an MNU:
The odd thing is that a continuation was not recorded so I would have expected the gem to fail with such an "unhandled exception" ... it is also likely that error handlers on the stack would dispose of sockets to prevent random stuff going out over the wire ... so at this point I'm going to say that the root cause of the problem is the "unhandled MNU" and that is what I think we need to understand ... The good news is that it is reproducing on my local machine, so we have a chance to track this puppy down ... |
... the Prior to the MNU, there was this:
Now I don't think the WAGemStoneProductionErrorHandler is expected to shut down the system, but I'm not sure whether the server is expected to survive a WAGsInvalidCallbackContext ... Okay I debugged this continuation (using
The code that gets executed in
Which spells doom for the server, because this didn't cause the server to immediately die ... Sooooo, now it seems that the Hmmm, it looks like Headed to dinner now, but there's a bit more meat to find on this bone ... |
I think that catastrophic failure in this case is probably expected; presumably there's something in 3.6.1 (and beyond?) that is choking and resulting in the WAGsInvalidCallbackContext ... debugging that particular error is next up ... If you can determine which test is leading to this ... Presumably the functional tests can be run interactively, so I'll head down that route and once we've isolated the particular test, we should be able to get to the bottom of the problem ... |
Well... But indeed! If I run the test interactively, after about a minute or so, the server dies. Since the problem occurred with the inclusion of the WACORSFunctionalTest, and since this test starts and stops gems to perform the test, I was focused on the idea that it had something to do with that... thanks for spotting that and redirecting the search effort. I'll continue down this path and see what's going on. |
The error does get handled by Seaside (
So, this last error went unnoticed and prevents the socket from being destroyed after handling the Edit: the
|
I have no automated test for it yet but |
Taking a slightly different tack ... I see that the code in ZnGemServerManagingMultiThreadedServer>>serveConnectionsOn: does not have an ifCurtailed: so I'm wondering if using ZnGemServerManagingMultiThreadedServer would be a better choice? That would entail using ZnNewGemServer or ZnSeasideNewGemServer instead of ZnGemServer/ZnSeasideGemServer ... but sets of GemServers are tested, so the New variants might just behave when errors are occurring ... In general, a crashing server (backed by a server restart) is a "good thing" since the transactional nature of the system means that the user is expected to retry their request when a truly fatal error occurs ... So for tests that are supposed to create errors, the new ZnNewGemServer may behave better from a testing perspective ... |
My problem is that I fail to see why the ifCurtailed block is executed. It should not because the error has been captured by the Seaside error handler and an appropriate response was produced. |
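A small model may help frame this question. The sketch below is in Python, not Smalltalk, and if_curtailed is a hypothetical helper that mimics ifCurtailed: (the cleanup runs only when the protected block is left abnormally). FakeServer and the two request functions are likewise made up for illustration; this is not the actual Zinc or Seaside code. It only shows that whether the cleanup fires depends on whether the handler that catches the error sits inside or outside the protected block. One difference to keep in mind: Python unwinds the stack before the outer handler body runs, whereas Smalltalk runs the handler first and unwinds when the handler returns.

```python
def if_curtailed(body, cleanup):
    """Run body(); call cleanup() only if body is left via an exception."""
    completed = False
    try:
        result = body()
        completed = True
        return result
    finally:
        if not completed:
            cleanup()


class FakeServer:
    """Hypothetical stand-in for a server holding one socket."""

    def __init__(self):
        self.socket = "open-socket"

    def release_socket(self):
        self.socket = None


def failing_request():
    raise RuntimeError("simulated application error")


def handled_inside(server):
    # The handler sits inside the protected block, so the block completes normally.
    def body():
        try:
            failing_request()
        except RuntimeError:
            return "error page"
    return if_curtailed(body, server.release_socket)


def handled_outside(server):
    # The handler sits outside the protected block, so the error unwinds through it.
    try:
        return if_curtailed(failing_request, server.release_socket)
    except RuntimeError:
        return "error page"


inside, outside = FakeServer(), FakeServer()
inside_response = handled_inside(inside)
outside_response = handled_outside(outside)
print(inside.socket)   # socket kept: the cleanup never ran
print(outside.socket)  # socket released: the cleanup ran during unwinding
```

In both cases the caller receives an error page, so from the outside the error looks equally "handled"; only the position of the handler relative to the protected block decides whether the socket is released.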
And with the following scenario, I am starting to think something is still wrong with the workaround code made in the context of #1198. In the Seaside
If you do not click on the 'Restart' button again (i.e. not triggering a restore of that continuation on the server), everything is fine. For example, if you go to the Seaside start page again instead of going 'back' after getting the error response: all is well. |
Hmmm, perhaps the socket instance variable is part of the continuation? If so, the restored continuation would include the nil'ed socket var? It is possible that the persistent behavior of Sockets has changed in 3.6.1, making them committable ... Hmm, hmmm, I think that if server state IS being persisted then the ZnTransactionSafeManagingMultiThreadedServer should be used ... it takes great care to wrap all objects on the stack with TransientValue wrappers so the stack can be persisted (as part of the continuation) ... although this was done to be able to simply save a continuation safely ... Restoring a continuation on the stack that has references to a Socket will never work ... so that depends upon where the continuation boundaries are located ... When I look at continuations in |
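To make this hypothesis concrete, here is a toy model in Python (not GemStone Smalltalk). All names are hypothetical, and a "continuation" is reduced to a saved callable that still references the server object, whereas real Seaside continuations capture a stack. It only illustrates the suspected mechanism: if the saved state points at an object whose socket was set to nil after an error, resuming it later fails on the nil socket.

```python
class FakeSocket:
    """Hypothetical socket that just records what was written."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


class FakeServer:
    """Hypothetical server with a single socket slot."""

    def __init__(self):
        self.socket = FakeSocket()

    def respond(self, text):
        # Fails with AttributeError once self.socket is None (the MNU analogue).
        self.socket.write(text)


def capture_continuation(server):
    # The saved state closes over the server, and thereby over its socket slot.
    return lambda: server.respond("resumed page")


server = FakeServer()
first_socket = server.socket
continuation = capture_continuation(server)

continuation()            # works while the socket is still set
server.socket = None      # what an error handler's cleanup might do

try:
    continuation()        # resuming after the cleanup hits the nil socket
    resume_error = None
except AttributeError as error:
    resume_error = error
print(type(resume_error).__name__)
```

Whether the real continuation actually reaches the socket depends on where the continuation boundaries are, which this model does not attempt to reproduce.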
Another scenario that does not involve the
|
@dalehenrich I am unable to reconstruct the issue when using Unfortunately, I don't feel at ease just changing to that version for the tests because I fail to understand why the |
Confirmed that it's working, but I'll keep this issue open for a little more investigation. |
@jbrichau ...
I think that the ifCurtailed: can probably be fixed ... don't know if there are other potential issues lurking in ZnSeasideGemServer :) I created ZnSeasideNewGemServer for a reason, but without looking at the diffs between the two implementations I don't recall what those reasons were ... this commit comment is interesting though: add ZnNewGsServerTests for testing ZnNewGemServer ... these tests can be run against an interactive gem server running in second tODE environment ... 19 run, 9 passes, 0 expected defects, 10 failures, 0 errors, 0 unexpected passes .... so blatant server-side errors have been cleaned up, as are the commit comments on this issue create GsApplicationTools v1.0.0 compatible gem server #69 ... ZnGemServer was a year old at that time (2015), so I was probably reluctant to wholesale replace ZnGemServer with ZnNewGemServer, since folks may have been using the class for a while and been happy with it ... |
... and I guess people have been using ZnGemServer for 6 years now and it took that long for this particular issue to pop up??? |
@dalehenrich I'm quite surprised by that, yes. I am not using the ZnGemServer myself (using FastCGI). Keeping this open to investigate the issue a bit more later on since I really want to try to fully grasp what's going on, but I will attend to more urgent matters first. Thanks for the feedback! |
Running the tests in WAWebDriverFunctionalTestCase, including the testCORSFilterFunctionalTest, on GemStone 3.6.x crashes the ZnServer with a stack overflow when handling the exception that the socket is nil.

Running all the tests with testCORSFilterFunctionalTest removed does not crash the ZnServer.

Since this test restarts all gems when it starts and stops, the cause can probably be found there. However, I fail to understand why the gem restarts in that test cause the socket to be nil at some later point, why it does not crash on other versions of GemStone, and why that also leads to a loop in exception handling. A lot of questions...