Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove premature call to workSocket() in TNonblockingServer #1476

Conversation

bgedik
Copy link
Contributor

@bgedik bgedik commented Jan 21, 2018

No description provided.

@bgedik bgedik changed the title remove workSocket call that is too early Remove premature call to workSocket() in TNonblockingServer Jan 21, 2018
@bgedik
Copy link
Contributor Author

bgedik commented Jan 21, 2018

The following tests FAILED:
16 - concurrency_test (Timeout)
20 - TNonblockingSSLServerTest (Timeout)

I think the failure in TNonblockingSSLServerTest is related. concurrency_test passes locally for me, so I am guessing that there may be a different problem with it (sporadic failure perhaps?).

@jeking3
Copy link
Contributor

jeking3 commented Jan 22, 2018

Yes, concurrency_test has sporadic failures - it is a lot better than it used to be; seems to be worse on Visual Studio 10. Perhaps the solution here is to handle EAGAIN properly?

@bgedik
Copy link
Contributor Author

bgedik commented Jan 22, 2018

@jeking3 My latest fix worked fine, but failed to compile on windows. I'll find a variation that works for all platforms.

@bgedik
Copy link
Contributor Author

bgedik commented Jan 23, 2018

@jeking3 This is ready for review now.

Copy link
Contributor

@jeking3 jeking3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use peek() which is already well-soaked and should provide the same answer?

@@ -78,6 +78,7 @@ class TSSLSocket : public TSocket {
bool peek();
void open();
void close();
bool hasPendingDataToRead();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this what peek() is for?

Copy link
Contributor Author

@bgedik bgedik Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peek is designed to be blocking for blocking sockets. And for non-blocking ones, it throws an exception if at least 1 byte is not available after a few retries. It may be well-soaked, but not well-designed IMO. It implements MSG_PEEK semantics and not MSG_PEEK | MSG_DONT_WAIT. Not to mention its doxygen being out of touch with its implementation. MSG_PEEK means peak at the data, but do not remove it from the buffer. In order to peek, it waits for the data to arrive. That is very different than what hasPendingDataToRead does.

I can replace the existing peek() with this new implementation if you prefer that. However, I don't know if there will be any implications. It will certainly be a behavior change and my guess is that it will break things. I can test and see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to have one way to ask, "is there any data I can read" that does not block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the cross test suite currently does not test asynchronous. It is more of a protocol test; it would be interesting to add one more matrix choice of threaded vs. nonblocking for the languages that support it, and test both against both.

Copy link
Contributor Author

@bgedik bgedik Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeking3 You said it would be preferable to have one way to ask, "is there any data I can read" that does not block. I completely agree. But the existing peek method is not designed to be non-blocking. So I have a few choices here:

  • Update the doxygen comment of peek to reflect what it does and keep hasPendingDataToRead
  • Add an optional argument to peek that says nonBlocking=false and if it is provided as true, do what hasPendingDataToRead used to do and remove hasPendingDataToRead.
  • Change peek to be always non-blocking.

How about #2 above? I think #3 is not easy, as it will require code changes in other places that I am not comfortable with.

Copy link
Contributor Author

@bgedik bgedik Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented option #2. In terms of your matrix choice suggestion: How can we make that happen? Is that something I can help with? I am bothered by the TNonblockingSeverTest test not catching the problem reported in this bug. The moment we switched to 0.11, it started throwing unexpected exceptions for us, when using Java client against C++ TNonblockingSeverTest server. This should have been caught by the existing tests. I would like to help with this. So if you have a suggestion for me, please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comments were wrong; peek blocks until there is something to do when the socket is a blocking socket. I think that if the socket knew it was non-blocking then a call to peek() would behave like a call to hasPendingDataToRead, i.e. it would be a non-blocking call. Perhaps that's a better way to approach it, but I would have to look at the code a little more closely. I think the original intention of my comment was correct, there should be only one way to peek. Calling peek() on a non-blocking socket should not block; calling peek() on a blocking socket should block.

Copy link
Contributor Author

@bgedik bgedik Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies to read. This goes back to my original point. TSocket should behave differently for blocking vs non-blocking sockets. Look at how TSocket and TSSLSocket handle reads on a non-blocking socket. They throw exceptions on EAGAIN. Luckily for those of us using TNonBlockingServer without SSL, it does not ever call read on a non-SSL socket that is not ready to return data. However, that is not the case for SSL (as there could be bytes in the socket but not enough to yield application level bytes). In such a case, the TSSLSocket throws an exception. Read calls in TNonBlockingServer are retried by catching the exception from TSSLSocket and searching for the "retry" string in the exception message. Uggh!

The correct way to fix all this mess is to make TSocket aware of whether the socket is blocking or non-blocking, and behave accordingly. For instance, it should return that 0 bytes were read when the socket is non-blocking and EAGAIN is received.

My suggesting is to handle the issue of making TSocket and TSSLSocket aware of the blocking/non-blocking nature of the socket in a separate Jira issue.

hasPendingDataToRead is a method that is non-blocking for both blocking and non-blocking sockets and is a valid operation on any type of socket. It corresponds to the MSG_PEEK | MSG_DONT_WAIT combination on a recv call, which behaves the same for blocking and non-blocking sockets. I think it has value independent of seek.

@jeking3
Copy link
Contributor

jeking3 commented Jan 24, 2018

I see now, I may have led you down the wrong path with my comments. So peek() blocks waiting for data the be available to read and the end result of peek is that there is data to read OR the socket disconnected. What you are looking for is a call to see if the socket has data available but does not block. As such, those are different code paths...

Perhaps your original code before this last push was correct. I will take another look.

}
initializeHandshake();
if (!checkHandshake())
throw TSSLException("SSL_peek: Handshake is not completed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably say something other than SSL_peek?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (!checkHandshake())
throw TSSLException("SSL_peek: Handshake is not completed");
// data may be available in SSL buffers (note: SSL_pending does not have a failure mode)
return TSocket::hasPendingDataToRead() || SSL_pending(ssl_) > 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should SSL_pending be checked first for efficiency? First check the SSL buffers, then if those are clear then check the socket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Nice catch.

@@ -78,6 +78,7 @@ class TSSLSocket : public TSocket {
bool peek();
void open();
void close();
bool hasPendingDataToRead();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comments were wrong; peek blocks until there is something to do when the socket is a blocking socket. I think that if the socket knew it was non-blocking then a call to peek() would behave like a call to hasPendingDataToRead, i.e. it would be a non-blocking call. Perhaps that's a better way to approach it, but I would have to look at the code a little more closely. I think the original intention of my comment was correct, there should be only one way to peek. Calling peek() on a non-blocking socket should not block; calling peek() on a blocking socket should block.

@bgedik
Copy link
Contributor Author

bgedik commented Jan 24, 2018

Ok, reverted back.
@jeking3 Did you have the time to take about look? How do we proceed on this?

@bgedik
Copy link
Contributor Author

bgedik commented Feb 9, 2018

@jeking3 anything else I can do to get this into a mergable state?

@jeking3
Copy link
Contributor

jeking3 commented Feb 19, 2018

Squash to one commit, please.

@bgedik bgedik closed this Feb 19, 2018
@bgedik bgedik reopened this Feb 19, 2018
@bgedik bgedik closed this Feb 19, 2018
@bgedik
Copy link
Contributor Author

bgedik commented Feb 19, 2018

@jeking3 Here: #1497

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants