Question: Is it possible to restart reading a stream at a specified index? #523
Comments
Hi @andye2004, I think this is possible. How is Batch consuming your Spring Content storage? If you are using Spring Content REST on top of a Store, then you have an endpoint that supports byte ranges, so you can make a request with a `Range` header. If you are using the Store Java API directly, then after getting the InputStream can you skip ahead to the offset you need? Or am I misunderstanding your question?
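The two options mentioned above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Spring Content's own API: `OffsetReads`, `rangeRequest`, and `skipTo` are hypothetical names, and the content URL is whatever your Spring Content REST endpoint exposes. The first method only builds an HTTP request carrying a `Range` header; the second discards bytes from an already-open stream, which for many stream implementations still means reading and throwing away everything before the offset.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; none of these names come from Spring Content.
final class OffsetReads {

    // Build (but do not send) a GET asking for every byte from `offset` onward.
    static HttpRequest rangeRequest(URI contentUri, long offset) {
        return HttpRequest.newBuilder(contentUri)
                .header("Range", "bytes=" + offset + "-")
                .GET()
                .build();
    }

    // Discard the first `offset` bytes of an already-open stream and return it.
    static InputStream skipTo(InputStream in, long offset) throws IOException {
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() may return 0 before end of stream, so probe with read().
                throw new EOFException("stream ended before offset " + offset);
            } else {
                remaining--; // the probing read() consumed one byte
            }
        }
        return in;
    }
}
```

The loop is needed because `InputStream.skip` is allowed to skip fewer bytes than requested, including zero.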
Hi @paulcwarren, thanks for a super fast follow-up! For background, and I'm sure you already know how Spring Batch works, but... When executing a step (not a tasklet) in a job and the reader batches are smaller than the data available (I know, the whole point of batching), the transaction boundary is (roughly), and apologies for terrible pseudo code:-
Under the covers Spring Batch uses a countdown I tried the opposite and began the transaction boundary when calling This is what led me to asking the question. I had thought about skipping etc., BUT I'm reading 40MB+ files at whatever the configured Postgres blob size is defined as, say 1024 bytes maybe? I might end up reading the first 1024 bytes thousands of times and simply discarding them. That is going to be a pretty costly thing to do, and I reckon it is simply an infeasible approach to take. After thinking on this some more, I think I need to find a way of using a different session for the Spring Content InputStream, as opposed to having it use the same session/transaction as the current context. That way, when the batch transaction is committed, the content session is untouched and I can continue to read from the DB without having to re-open the stream and discard n bytes each time. Hopefully that all made sense.
Hmmm... interesting problem. Thanks for sharing the context. The behavior you describe makes sense to me, I think. I wrote the JPA storage module to participate in the current transaction because folks asked for basic transaction support. The use case for that was more like a developer building a Boot app with Spring Data and Spring Content, where a service does some work on one or more entities, then does some content operation on content associated with those entities, then perhaps does some more work on those entities. If anything fails, roll it all back. But it sounds like, when using Spring Batch (I'm definitely not an expert), processing steps start and stop the transaction. However, since a single step run might not process the entire content, you want to keep the cursor into that content stream open. I can think of some options:
Hi @paulcwarren, I did actually spend a bit of time on this problem again yesterday exploring some options and realised I could use a fairly simple approach that would work by wrapping the call to the content store in an Async method returning a callable future, e.g.
then in the reader I can just call This is clearly not a solution to the original question around returning an input stream starting at a specific offset, but it does solve the transactional issue described in my previous post. At least it should for the majority of situations, but I've now run into another issue. The input stream being returned from the content store wraps the This is probably sufficient for most cases as you would normally call However, even this doesn't solve my current issue. I need to wrap the input stream in a reader that I have no control over, which (in)conveniently calls More thought required methinks. P.S. DB storage is mandatory unfortunately.
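The idea of opening the stream on a different thread can be illustrated with plain `java.util.concurrent`. This is a sketch under assumptions, not the code from the comment above: `AsyncContentOpener` and `CURRENT_TX` are hypothetical names, and the `ThreadLocal` merely stands in for thread-bound transaction state (Spring binds transactional resources to the current thread, which is why work done on another thread does not join the caller's transaction).

```java
import java.io.InputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical helper that opens content on a dedicated thread.
final class AsyncContentOpener {

    // Stand-in for thread-bound transaction state (illustration only).
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Runs `opener` on the executor's thread, so it cannot see the
    // caller's thread-local state. In a real application `opener`
    // would be the call into the content store.
    CompletableFuture<InputStream> open(Supplier<InputStream> opener) {
        return CompletableFuture.supplyAsync(opener, executor);
    }

    // Stop the worker thread once the stream is no longer needed.
    void shutdown() {
        executor.shutdown();
    }
}
```

The reader would then block on the returned future (for example with `get()`) to obtain the stream, while the step's own transaction commits and restarts independently on the calling thread.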
Sorry, should've said, feel free to close this off. I'm happy the original question has been answered. I'll update the thread if I find a solution that solves my last remaining problem. Thanks again for providing such an excellent library and the speedy response to my question. Apologies for drivelling on in such long posts :), bad habit.
OK, so this was easier than I thought. Final two problems were
I thought about wrapping the input stream returned from the content store, but then realised someone else must've done this already and they have, loads of implementations out there. Then it was just a case of not calling close in the batch Final service and non-closing InputStream wrapper:-
With all that, all I need to do is call the Works like a dream. Thanks again for all your help!
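A non-closing wrapper of the kind described above might look like the following. This is a minimal sketch, not the implementation used in the thread: `NonClosingInputStream` and `closeDelegate` are hypothetical names. It relies only on `java.io.FilterInputStream`, which forwards reads to the wrapped stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: lets third-party readers call close() without
// actually closing the underlying content stream.
final class NonClosingInputStream extends FilterInputStream {

    NonClosingInputStream(InputStream delegate) {
        super(delegate);
    }

    // Deliberately a no-op, so a reader we do not control cannot
    // close the underlying stream between batch chunks.
    @Override
    public void close() {
        // intentionally empty
    }

    // Called by the owning code once the whole content has been consumed.
    void closeDelegate() throws IOException {
        in.close();
    }
}
```

The owner of the stream stays responsible for calling `closeDelegate()` at the end of the job; otherwise the underlying resource is never released.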
Thanks for the very detailed write-up @andye2004. As I try to build a community around Spring Content, it is super helpful to have these kinds of write-ups on issues, allowing others to advance faster and Spring Content to gain more usage, so I really appreciate the time you took to do that. And thanks also for going the extra mile to figure it out. That's a clever use of the CompletableFuture API. Leveraging its ability to run arbitrary code in a different thread led to a simple, clean solution, I think. A good outcome. I'll go ahead and close this issue, but if you ever spot a better implementation alternative for the blob resource that serves both the basic transaction support use case and these sorts of batch use cases, then please don't be shy. I'd be all ears.
I'm looking to read the content InputStream in a Spring Batch reader BUT, due to the way the transaction boundaries work, I either run into deadlock scenarios or I have to read the entire InputStream into a buffer. Ideally I'd like to be able to open / start reading the stream at a specified index. Would this be possible?
I can provide a detailed explanation of what is actually happening, with an indication of the interplay going on between Spring Batch and Spring Content, but it is pretty complex and is likely to be lengthy, hence the simplistic question above. That said, I'm sure there are lots of occasions where we might want to start reading content at a specified location.
Thanks in advance, Andy.