-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6328] [Python] Python API for StreamingListener #9186
Conversation
…mplement StreamingListenerEvent
this is ok to test. |
@zsxwing can you take a look. |
Test build #44014 has finished for PR 9186 at commit
|
The test that failed was a flaky StreamingKMeansTest. |
test this again |
@tdas it looks like Jenkins didn't retest |
retest this please |
Test build #44115 has finished for PR 9186 at commit
|
def onOutputOperationCompleted(self, outputOperationCompleted): | ||
pass | ||
|
||
def getEventInfo(self, event): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would make it easier to see which Info is associated with each StreamingListenerEvent but I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use this method to return instances of the Python friendly classes for these Scala objects?
I suggest adding some Python friendly classes for |
Test build #44248 has finished for PR 9186 at commit
|
Test build #44268 has finished for PR 9186 at commit
|
self._test_func(input, func, expected) | ||
|
||
# Test occasionally fails without a delay | ||
time.sleep(.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The listener sometimes receives 3 batchCompleted events instead of 4 when I run the test, even though it always gets the correct job output. However, when I add a minor delay the test passes consistently. Can I get someone's opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at PySparkStreamingTestCase.wait_for
. You can use it to wait for the expected results or timeout.
Test build #44377 has finished for PR 9186 at commit
|
def onBatchStarted(self, batchStarted): | ||
batch_info = BatchInfo(batchStarted.batchInfo()) | ||
batch_started = StreamingListenerBatchStarted(batch_info) | ||
self.userStreamingListener .onBatchStarted(batch_started) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: redundant space
Test build #45690 has finished for PR 9186 at commit
|
self.assertEqual(info.schedulingDelay(), -1) | ||
self.assertEqual(info.processingDelay(), -1) | ||
self.assertEqual(info.totalDelay(), -1) | ||
self.assertEqual(info.numRecords(), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just meant you can access the Map here, e.g.:
for streamId in info.streamIdToInputInfo():
streamInputInfo = info.streamIdToInputInfo()[streamId]
# access fields of streamInputInfo
for outputOpId in info.outputOperationInfos():
outputOperationInfo = info.outputOperationInfos()[outputOpId]
# access fields of outputOperationInfo
I just noticed that Streaming Python unit tests cannot report failure. It always says |
We should retest this PR after merging #9669. |
#9669 has been merged. please merge with master and test the PR again. |
retest this please |
Test build #45874 has finished for PR 9186 at commit
|
@djalova could you fix the test?
|
Sure, I'll change it to check for at least 4 batches. |
Test build #45880 has finished for PR 9186 at commit
|
@djalova could you add the following checks?
|
Sorry for my unclear comment. I just meant adding the real codes to access fields of
|
Sorry I wasn't reading your comment carefully. I'll make the update. |
Test build #45905 has finished for PR 9186 at commit
|
Test build #45906 has finished for PR 9186 at commit
|
Thanks @djalova LGTM |
Author: Daniel Jalova <djalova@us.ibm.com> Closes #9186 from djalova/SPARK-6328. (cherry picked from commit ace0db4) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
No description provided.