Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

tests: improve test-child-process-fork-dgram. #8045

Conversation

misterdjules
Copy link

Fixes test-child-process-fork-dgram on the SmartOS jenkins node.

  1. Wait for child process to register event handlers on server UDP socket
    before sending messages from the parent process.

  2. Increase the number of total messages so that both the child and the
    master process get a chance to handle at least one of them.

However, this PR looks like a hack for a test case that could fail at anytime depending on how the underlying platform decides to dispatch the messages to the child or the master.

What are we trying to test with this test case? Do we absolutely need both the child and the parent to always handle UDP messages? Could we remove the UDP 'message' event handler as soon as we receive one dgram in the child or the master, so that the one that did not receive a message yet will be the only one that can receive it? This would guarantee (well, as much as UDP can guarantee anything...) that both processes will receive at least one message.

@misterdjules
Copy link
Author

Also note that 2) (increasing the number of total messages) is enough to fix the issue for now but 1) makes the test case more robust (although it's not sufficient to fix the issue).

@misterdjules
Copy link
Author

@AndreasMadsen Since you're the original author of this test, I hope you don't mind if I put you in the loop so that I can better understand the intent of this test. More precisely, I have the following questions:

What are we trying to test with this test case? Do we absolutely need both the child and the parent to handle UDP messages for the whole duration of the test? Could we remove the UDP 'message' event handler as soon as we receive one dgram in the child or the master, so that the one that did not receive a message yet will be the only one that can receive it?

@AndreasMadsen
Copy link
Member

The purpose of second argument in process.send is to provide a cluster API at a lower level than what the cluster module allows for. The purpose of this test is to ensure at that there is some level of distributed behavior. I would therefor claim that changing the state of the "cluster" configuration (which process can receive messages) in order to make the tests passes, is far from ideal.

Maybe I remember wrong, but I don't think removing the message event will change anything. As far as I can see there is no reference to removeListener or newListener in dgram.js. I also remember that calling server.close() will close the entire server across all workers.

I think the solution here is to increase the amount of messages sent as you have done. But you should also consider changing the send mechanism, so it sends messages with less delay between. A UDP message can easily be read within 1 ms, so the OS may very well chose to only send the messages to one worker, as long as the load isn't any higher.

@misterdjules
Copy link
Author

@AndreasMadsen Thank you for your feedback!

It turns out that if we want to keep the same logic for this test, the only thing that improves its reliability is to increase the number of messages to send. Shortening the delay between messages actually makes things worse on SmartOS.

I don't like solving bugs by changing hard-coded values, but that's all I have right now.

@tjfontaine What do you think?

Send messages until both the parent and the child process have received
at least one message. If at least one of them doesn't receive any
message, the test runner will make the test timeout.

Fixes nodejs#8046.
@tjfontaine
Copy link

Thanks landed in b0277f3

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants