-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23064][DOCS][SS] Added documentation for stream-stream joins #20255
Conversation
Test build #86073 has finished for PR 20255 at commit
|
clickTime >= impressionTime AND | ||
clickTime <= impressionTime + interval 1 hour | ||
""" | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should just work for R, like this:
(I added withWatermark in 2.3)
impressions <- read.stream( ...
clicks <- read.stream( ...
# Apply watermarks on event-time columns
impressionsWithWatermark <- withWatermark(impressions, "impressionTime", "2 hours")
clicksWithWatermark <- withWatermark(clicks, "clickTime", "3 hours")
# Join with event-time constraints
impressionsWithWatermark.join(
clicksWithWatermark,
expr(
"clickAdId = impressionAdId AND
clickTime >= impressionTime AND
clickTime <= impressionTime + interval 1 hour"
))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests for stream-stream joins in R as well? :)
Actually, I would like it to be tested first before I add a code snippet. so that instead "should work" we can claim for sure "works".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
|
||
However, note that the outer NULL results will be generated with a delay (depends on the specified | ||
watermark delay and the time range condition) because the engine has to wait for that long to ensure | ||
there were no matches and there will be no more matches in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space?
@@ -2142,6 +2452,7 @@ write.stream(aggDF, "memory", outputMode = "complete", checkpointLocation = "pat | |||
|
|||
**Talks** | |||
|
|||
- Spark Summit 2017 Talk - [Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark](https://spark-summit.org/2017/events/easy-scalable-fault-tolerant-stream-processing-with-structured-streaming-in-apache-spark/) | |||
- Spark Summit Europe 2017 Talks - | |||
- [Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark](https://spark-summit.org/2017/events/easy-scalable-fault-tolerant-stream-processing-with-structured-streaming-in-apache-spark/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: this link needs to be updated. blocked on some links not working on the spark summit website.
Test build #86214 has finished for PR 20255 at commit
|
Test build #86215 has finished for PR 20255 at commit
|
Test build #86219 has finished for PR 20255 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except some nits
|
||
- Cannot use streaming aggregations before joins. | ||
|
||
- Cannot use mapGroupsWithState and flatMapGroupsWithState in Update mode cannot before joins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cannot before joins.
<td style="vertical-align: middle;">Inner</td> | ||
<td style="vertical-align: middle;"> | ||
Supported, optionally specify watermark on both sides + | ||
time constraints for state cleanup< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit" remove <
Test build #86303 has finished for PR 20255 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging to master and 2.3. |
## What changes were proposed in this pull request? Added documentation for stream-stream joins data:image/s3,"s3://crabby-images/521b8/521b8061d6d506a1d7dde9f478e84bb691c2a94a" alt="image" data:image/s3,"s3://crabby-images/26f17/26f175dba75f4c9b4d7ec1fe1434a1bf0c4bda83" alt="image" data:image/s3,"s3://crabby-images/db57c/db57c8bbd64677053c9ff17640657f9aab3ad8dd" alt="image" data:image/s3,"s3://crabby-images/01677/01677a1ebfe09331de48e682578affc95a8a361b" alt="image" ## How was this patch tested? N/a Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #20255 from tdas/join-docs. (cherry picked from commit 1002bd6) Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
What changes were proposed in this pull request?
Added documentation for stream-stream joins
How was this patch tested?
N/a