Remove premature call to workSocket() in TNonblockingServer #1476

bgedik · 2018-01-21T17:45:55Z

No description provided.

bgedik · 2018-01-21T18:51:53Z

The following tests FAILED:
16 - concurrency_test (Timeout)
20 - TNonblockingSSLServerTest (Timeout)

I think the failure in TNonblockingSSLServerTest is related. concurrency_test passes locally for me, so I am guessing that there may be a different problem with it (sporadic failure perhaps?).

jeking3 · 2018-01-22T13:19:40Z

Yes, concurrency_test has sporadic failures - it is a lot better than it used to be; seems to be worse on Visual Studio 10. Perhaps the solution here is to handle EAGAIN properly?

bgedik · 2018-01-22T20:21:03Z

@jeking3 My latest fix worked fine, but failed to compile on Windows. I'll find a variation that works for all platforms.

bgedik · 2018-01-23T02:13:10Z

@jeking3 This is ready for review now.

jeking3

Why not use peek() which is already well-soaked and should provide the same answer?

jeking3 · 2018-01-24T00:01:20Z

lib/cpp/src/thrift/transport/TSSLSocket.h

@@ -78,6 +78,7 @@ class TSSLSocket : public TSocket {
  bool peek();
  void open();
  void close();
+  bool hasPendingDataToRead();


Isn't this what peek() is for?

Peek is designed to be blocking for blocking sockets. And for non-blocking ones, it throws an exception if at least 1 byte is not available after a few retries. It may be well-soaked, but not well-designed IMO. It implements MSG_PEEK semantics and not MSG_PEEK | MSG_DONTWAIT. Not to mention its doxygen being out of touch with its implementation. MSG_PEEK means peek at the data, but do not remove it from the buffer. In order to peek, it waits for the data to arrive. That is very different from what hasPendingDataToRead does.

I can replace the existing peek() with this new implementation if you prefer that. However, I don't know if there will be any implications. It will certainly be a behavior change and my guess is that it will break things. I can test and see.

It would be preferable to have one way to ask, "is there any data I can read" that does not block.

Note that the cross test suite currently does not test asynchronous. It is more of a protocol test; it would be interesting to add one more matrix choice of threaded vs. nonblocking for the languages that support it, and test both against both.

@jeking3 You said it would be preferable to have one way to ask, "is there any data I can read" that does not block. I completely agree. But the existing peek method is not designed to be non-blocking. So I have a few choices here:

1. Update the doxygen comment of peek to reflect what it does and keep hasPendingDataToRead.
2. Add an optional argument to peek that says nonBlocking=false and if it is provided as true, do what hasPendingDataToRead used to do and remove hasPendingDataToRead.
3. Change peek to be always non-blocking.

How about #2 above? I think #3 is not easy, as it will require code changes in other places that I am not comfortable with.

I implemented option #2. In terms of your matrix choice suggestion: How can we make that happen? Is that something I can help with? I am bothered by the TNonblockingServerTest test not catching the problem reported in this bug. The moment we switched to 0.11, it started throwing unexpected exceptions for us, when using a Java client against a C++ TNonblockingServer. This should have been caught by the existing tests. I would like to help with this. So if you have a suggestion for me, please let me know.

I think my comments were wrong; peek blocks until there is something to do when the socket is a blocking socket. I think that if the socket knew it was non-blocking then a call to peek() would behave like a call to hasPendingDataToRead, i.e. it would be a non-blocking call. Perhaps that's a better way to approach it, but I would have to look at the code a little more closely. I think the original intention of my comment was correct, there should be only one way to peek. Calling peek() on a non-blocking socket should not block; calling peek() on a blocking socket should block.

Same applies to read. This goes back to my original point. TSocket should behave differently for blocking vs non-blocking sockets. Look at how TSocket and TSSLSocket handle reads on a non-blocking socket. They throw exceptions on EAGAIN. Luckily for those of us using TNonblockingServer without SSL, it does not ever call read on a non-SSL socket that is not ready to return data. However, that is not the case for SSL (as there could be bytes in the socket but not enough to yield application level bytes). In such a case, the TSSLSocket throws an exception. Read calls in TNonblockingServer are retried by catching the exception from TSSLSocket and searching for the "retry" string in the exception message. Uggh!

The correct way to fix all this mess is to make TSocket aware of whether the socket is blocking or non-blocking, and behave accordingly. For instance, it should return that 0 bytes were read when the socket is non-blocking and EAGAIN is received.
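To make the "return 0 bytes on EAGAIN" idea concrete, here is a minimal POSIX-only sketch. It is not Thrift code: `readAvailable` is a hypothetical free function, and it uses the per-call `MSG_DONTWAIT` flag instead of tracking whether the socket itself is non-blocking, which is an assumption made to keep the example short.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper, not part of TSocket.
// Reads whatever is currently available on fd without blocking.
// Returns the number of bytes read, or 0 when nothing is available yet.
std::size_t readAvailable(int fd, void* buf, std::size_t len) {
  for (;;) {
    ssize_t n = ::recv(fd, buf, len, MSG_DONTWAIT);
    if (n >= 0) {
      return static_cast<std::size_t>(n);
    }
    if (errno == EINTR) {
      continue;  // interrupted by a signal, try again
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
      return 0;  // no data yet: report "0 bytes read" instead of throwing
    }
    throw std::runtime_error("recv failed");
  }
}
```

One caveat with this shape: `recv` also returns 0 when the peer has closed the connection, so a real implementation would likely need a separate way to tell "no data yet" apart from end of stream.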

My suggestion is to handle the issue of making TSocket and TSSLSocket aware of the blocking/non-blocking nature of the socket in a separate Jira issue.

hasPendingDataToRead is a method that is non-blocking for both blocking and non-blocking sockets and is a valid operation on any type of socket. It corresponds to the MSG_PEEK | MSG_DONTWAIT combination on a recv call, which behaves the same for blocking and non-blocking sockets. I think it has value independent of peek.
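For illustration, the recv flag combination described above could look roughly like the following on POSIX systems, where the flag is spelled `MSG_DONTWAIT`. This is a sketch, not the code in this PR: `hasPendingData` is a hypothetical free function, and Windows has no `MSG_DONTWAIT`, so a portable version would need a different mechanism there.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>

// Hypothetical helper, not the TSocket method from this PR.
// Returns true if at least one byte can be read from fd right now.
// Never blocks, whether or not the socket itself is in non-blocking mode.
bool hasPendingData(int fd) {
  char byte;
  ssize_t n;
  do {
    // MSG_PEEK leaves the byte in the receive buffer;
    // MSG_DONTWAIT makes this single call non-blocking.
    n = ::recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
  } while (n < 0 && errno == EINTR);
  // n == 0 means the peer closed; n < 0 with EAGAIN means no data yet.
  return n > 0;
}
```

Because the byte is only peeked, calling this twice in a row gives the same answer until someone actually reads from the socket.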

jeking3 · 2018-01-24T20:04:10Z

I see now, I may have led you down the wrong path with my comments. So peek() blocks waiting for data to be available to read, and the end result of peek is that there is data to read OR the socket disconnected. What you are looking for is a call to see if the socket has data available but does not block. As such, those are different code paths...

Perhaps your original code before this last push was correct. I will take another look.

jeking3 · 2018-01-24T20:06:46Z

lib/cpp/src/thrift/transport/TSSLSocket.cpp

+  }
+  initializeHandshake();
+  if (!checkHandshake())
+    throw TSSLException("SSL_peek: Handshake is not completed");


This should probably say something other than SSL_peek?

jeking3 · 2018-01-24T20:07:40Z

lib/cpp/src/thrift/transport/TSSLSocket.cpp

+  if (!checkHandshake())
+    throw TSSLException("SSL_peek: Handshake is not completed");
+  // data may be available in SSL buffers (note: SSL_pending does not have a failure mode)
+  return TSocket::hasPendingDataToRead() || SSL_pending(ssl_) > 0;


Should SSL_pending be checked first for efficiency? First check the SSL buffers, then if those are clear then check the socket.

Right. Nice catch.
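The reordering relies on `||` short-circuiting: if the cheap in-memory check already says data is buffered, the socket is never probed. A small sketch of that ordering, with both probes passed in as hypothetical stand-ins (the first plays the role of `SSL_pending(ssl_)`, the second of `TSocket::hasPendingDataToRead()`), so it runs without OpenSSL:

```cpp
#include <functional>

// Hypothetical stand-in for the reviewed method.
// bufferedBytes:  cheap check of already-decrypted bytes (role of SSL_pending).
// socketHasData:  costlier probe of the underlying socket (a system call).
bool hasPendingDataToRead(const std::function<int()>& bufferedBytes,
                          const std::function<bool()>& socketHasData) {
  // Cheap check first; the socket probe runs only if the buffer is empty.
  return bufferedBytes() > 0 || socketHasData();
}
```

The result is the same in either order; only the number of socket probes changes.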

jeking3 · 2018-01-24T20:11:19Z

lib/cpp/src/thrift/transport/TSSLSocket.h

@@ -78,6 +78,7 @@ class TSSLSocket : public TSocket {
  bool peek();
  void open();
  void close();
+  bool hasPendingDataToRead();


I think my comments were wrong; peek blocks until there is something to do when the socket is a blocking socket. I think that if the socket knew it was non-blocking then a call to peek() would behave like a call to hasPendingDataToRead, i.e. it would be a non-blocking call. Perhaps that's a better way to approach it, but I would have to look at the code a little more closely. I think the original intention of my comment was correct, there should be only one way to peek. Calling peek() on a non-blocking socket should not block; calling peek() on a blocking socket should block.

This reverts commit 501c440.

bgedik · 2018-01-24T20:48:06Z

Ok, reverted back.
@jeking3 Did you have the time to take another look? How do we proceed on this?

bgedik · 2018-02-09T13:19:13Z

@jeking3 Anything else I can do to get this into a mergeable state?

jeking3 · 2018-02-19T13:15:58Z

Squash to one commit, please.

bgedik · 2018-02-19T20:08:26Z

@jeking3 Here: #1497

remove workSocket call that is too early

f3c682e

bgedik changed the title ~~remove workSocket call that is too early~~ Remove premature call to workSocket() in TNonblockingServer Jan 21, 2018

Bugra Gedik added 2 commits January 21, 2018 16:35

Make the work socket conditional

e115634

Revert back the changes

8e5ba48

Bugra Gedik added 2 commits January 22, 2018 10:52

Only call workSocket() when there is pending data to read

017d242

Minor update to a comment

c929047

Bugra Gedik added 6 commits January 22, 2018 12:51

Only call workSocket() when there is pending data to read

fb72b94

Only call workSocket() when there is pending data to read

4777cab

Fix the CMake build

f69e91b

Fix the CMake build

8405119

Fix the CMake build

7b60db3

Use the correct type for ioctlsocket call

0810960

jeking3 reviewed Jan 24, 2018

Make non-blocking peek optional

501c440

jeking3 reviewed Jan 24, 2018

Revert "Make non-blocking peek optional"

a9dc3f1

This reverts commit 501c440.

Bugra Gedik added 3 commits January 24, 2018 13:12

Review fixes

c8d422a

Review fixes

d64a8e0

Review fixes

83d3c13

bgedik closed this Feb 19, 2018

bgedik reopened this Feb 19, 2018

bgedik closed this Feb 19, 2018

bgedik mentioned this pull request Feb 19, 2018

Do not call workSocket() in TNonblockingServer without ensuring that there is data on the socket #1497

Closed
