Allow StaxEventItemReader to auto-detect the input file encoding #4101
Comments
Hi @benas. I've checked that just assigning |
Thank you for checking. However, while reviewing this, I don't see anything wrong with using the platform's default encoding as a default encoding. I believe that is a better default than The title of this issue is implying that the default encoding is used unconditionally, but that is not accurate. It is just a default value, made configurable in #807. If the default encoding is not suitable, it can be changed with the setter added in #807. Closing this as a non issue. |
Hi, @fmbenhassine: XML files are a bit different from flat files, as they include a header that declares the encoding of the data that follows. In my experience it's quite common to pass null as the encoding to the XML parser so that it can determine the encoding by reading the header. Sometimes it's useful to force the parser to use a specific encoding, when the file to parse doesn't contain the encoding header or when the header is incorrect, but this situation is uncommon. Nevertheless, the configurable encoding introduced in #807 would be enough if the setter didn't reject the null value, as it does currently:
|
That's a different story. Accepting
Some StAX implementations might not accept `null`. The goal is to cover the option of letting the reader detect the encoding without passing `null`:

```java
/**
 * Create a new XMLEventReader from a java.io.InputStream
 * @param stream the InputStream to read from
 * @return an instance of the {@code XMLEventReader}
 * @throws XMLStreamException if an error occurs
 */
public abstract XMLEventReader createXMLEventReader(java.io.InputStream stream)
    throws XMLStreamException;

/**
 * Create a new XMLEventReader from a java.io.InputStream
 * @param stream the InputStream to read from
 * @param encoding the character encoding of the stream
 * @return an instance of the {@code XMLEventReader}
 * @throws XMLStreamException if an error occurs
 */
public abstract XMLEventReader createXMLEventReader(java.io.InputStream stream, String encoding)
    throws XMLStreamException;
```

If we make the setter accept `null`:

```java
eventReader = this.encoding != null ? xmlInputFactory.createXMLEventReader(inputStream, this.encoding) :
    xmlInputFactory.createXMLEventReader(inputStream);
```
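The overload selection discussed here can be tried outside Spring Batch. The following is only a minimal sketch, assuming the JDK's built-in StAX implementation; the class `NullableEncodingSketch` and its methods `open` and `readText` are hypothetical names introduced for illustration and are not part of `StaxEventItemReader`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class NullableEncodingSketch {

    // Hypothetical helper: use the two-argument overload only when an
    // encoding was configured; otherwise let the parser detect it.
    static XMLEventReader open(XMLInputFactory factory, InputStream in, String encoding)
            throws XMLStreamException {
        return encoding != null
                ? factory.createXMLEventReader(in, encoding)
                : factory.createXMLEventReader(in);
    }

    // Concatenates all character data found in the document.
    static String readText(byte[] xml, String encoding) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = open(factory, new ByteArrayInputStream(xml), encoding);
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The document declares ISO-8859-1 and is encoded accordingly.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readText(xml, null));          // encoding detected from the header
        System.out.println(readText(xml, "ISO-8859-1"));  // encoding supplied explicitly
    }
}
```

With a `null` encoding the helper never hands `null` to the factory, so it does not rely on how a given StAX implementation treats a `null` argument.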
That's a point!
Good question. My intuition suggests that both methods should have the same effect, but intuition and XML APIs do not always match.
Tried both methods in two different 1.8 JREs (IBM and SUN). All of them work the same way no matter you use I've drilled down a bit inside both code flows and all of them converge into
|
Thank you for your feedback, Appreciate it! I think using I will plan this enhancement for the upcoming 5.0.2, by accepting |
Backport-to-4.3.x, also appreciated, thanks! |
Before this commit, it was not possible to pass a null encoding to the StaxEventItemReader, which prevented the XML event reader from auto-detecting the file encoding. This commit makes the encoding setter more lenient by accepting a null value. Resolves #4101
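As a rough illustration of what a lenient setter means, here is a sketch under stated assumptions. `LenientEncodingHolder` is a hypothetical class, not the actual Spring Batch code, and the platform-default constant is assumed from the discussion above.

```java
import java.nio.charset.Charset;

public class LenientEncodingHolder {

    // Assumed default, taken from the discussion: the platform's default encoding.
    public static final String DEFAULT_ENCODING = Charset.defaultCharset().name();

    private String encoding = DEFAULT_ENCODING;

    // Lenient setter: null is accepted and means "let the XML parser
    // detect the encoding from the document itself".
    public void setEncoding(String encoding) {
        this.encoding = encoding;
    }

    public String getEncoding() {
        return encoding;
    }

    // True when no encoding should be forced on the parser.
    public boolean autoDetects() {
        return encoding == null;
    }
}
```

Callers that want auto-detection would call `setEncoding(null)`; everyone else keeps the configured default or sets an explicit charset name.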
Bug description
StaxEventItemReader uses platform default encoding to read XML files.
Environment
Spring-batch version 4.3.0. Related to #807
Steps to reproduce
Just try to parse an XML file with a declared encoding different to the platform encoding.
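A minimal reproduction sketch, using plain StAX from the JDK rather than `StaxEventItemReader` itself. The class `EncodingMismatchRepro` and its methods are hypothetical names. Forcing `ISO-8859-1` here stands in for a platform default that differs from the encoding declared in the file. The exact failure mode (garbled characters or an exception) may vary between StAX implementations, so the sketch only checks whether the text was decoded correctly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EncodingMismatchRepro {

    static final String EXPECTED = "caf\u00e9";

    // A UTF-8 encoded document whose header declares UTF-8.
    static byte[] sampleXml() {
        return ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>" + EXPECTED + "</name>")
                .getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if the element text is decoded as expected.
    // A null forcedEncoding lets the parser detect the encoding itself.
    static boolean decodesCorrectly(byte[] xml, String forcedEncoding) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            ByteArrayInputStream in = new ByteArrayInputStream(xml);
            XMLEventReader reader = forcedEncoding == null
                    ? factory.createXMLEventReader(in)
                    : factory.createXMLEventReader(in, forcedEncoding);
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
            reader.close();
            return EXPECTED.equals(text.toString());
        } catch (XMLStreamException e) {
            // Some parsers reject the mismatch outright instead of garbling the text.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] xml = sampleXml();
        System.out.println("auto-detected: " + decodesCorrectly(xml, null));
        System.out.println("forced ISO-8859-1: " + decodesCorrectly(xml, "ISO-8859-1"));
    }
}
```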
Expected behavior
Encoding field in StaxEventItemReader should default to null, so that the parser has the opportunity to autodetect the file encoding.