Cache mechanism implementation for metadata.json #14224
1 - Get the S3 object summary via gets3Object, which includes getLastModified(). Only the summary is fetched; the whole metadata.json file is not downloaded.
2 - Check whether the cache contains up-to-date metadata.
3 - If it does, return the cached metadata; otherwise, download metadata.json, store it in the cache, and return it.
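The three steps above can be sketched as follows. This is a minimal, language-agnostic sketch in Python, not the PR's actual implementation. `MetadataCache`, `fetch_last_modified` and `download` are hypothetical names standing in for the cheap S3 summary lookup (which exposes getLastModified()) and the full metadata.json download.

```python
from datetime import datetime
from typing import Callable, Optional


class MetadataCache:
    """Hypothetical cache for metadata.json, refreshed only when the remote copy is newer."""

    def __init__(self,
                 fetch_last_modified: Callable[[], datetime],
                 download: Callable[[], str]) -> None:
        # fetch_last_modified: cheap call that returns only the remote last-modified time
        # download: expensive call that returns the full metadata.json contents
        self._fetch_last_modified = fetch_last_modified
        self._download = download
        self._content: Optional[str] = None
        self._last_modified: Optional[datetime] = None

    def get(self) -> str:
        # Step 1: read only the summary (last-modified time), not the whole file.
        remote_last_modified = self._fetch_last_modified()
        # Step 2: the cache is up to date if its copy is at least as new as the remote one.
        up_to_date = (self._content is not None
                      and self._last_modified is not None
                      and self._last_modified >= remote_last_modified)
        # Step 3: otherwise download, store in the cache, and return the fresh copy.
        if not up_to_date:
            self._content = self._download()
            self._last_modified = remote_last_modified
        return self._content


# Usage with fake stand-ins for S3 (no network access).
remote_stamp = datetime(2024, 1, 1)
download_count = 0


def fake_download() -> str:
    global download_count
    download_count += 1
    return '{"version": %d}' % download_count


cache = MetadataCache(lambda: remote_stamp, fake_download)
first = cache.get()   # cache empty: downloads
second = cache.get()  # remote unchanged: served from cache
remote_stamp = datetime(2024, 2, 1)
third = cache.get()   # remote is newer: downloads again
```

The timestamp comparison is what keeps the common path cheap: as long as the remote last-modified time has not moved past the cached one, only the summary request is made.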
[SPARKNLP-1031] Solves Dependency Parsers training issue #14225
This PR fixes and optimizes the processing of the CoNLL-U format, which is used to train Dependency Parsers. The key improvements are:
Enhanced Multiword Token Handling: Lines whose ID column holds a range marking a multiword token (e.g., 2-3 no _ _ _ _ _ _ _ _) are now processed correctly, so multiword tokens are recognized and handled consistently throughout parsing.
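In CoNLL-U, a multiword token line carries a range such as 2-3 in its ID column and is followed by one line per syntactic word. A minimal sketch of recognizing such lines, assuming (as one common approach, not necessarily the PR's) that range lines are kept aside and only word lines feed the parser. `is_multiword_token` and `split_rows` are hypothetical names, not Spark NLP's API.

```python
import re
from typing import List, Tuple

# A multiword token line has a range in the ID column, e.g. "2-3".
_RANGE_ID = re.compile(r"^\d+-\d+$")


def is_multiword_token(token_id: str) -> bool:
    return _RANGE_ID.match(token_id) is not None


def split_rows(lines: List[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Split one CoNLL-U sentence into (word rows, multiword-token rows).

    Comment lines ("#") and blank lines are skipped. Word rows are the ones a
    dependency parser trains on; multiword-token rows only record the surface form.
    Empty nodes (decimal IDs such as 5.1) are not handled in this sketch.
    """
    words: List[List[str]] = []
    multiword: List[List[str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")  # CoNLL-U columns are tab-separated
        (multiword if is_multiword_token(cols[0]) else words).append(cols)
    return words, multiword


sentence = [
    "# text = Fui no mercado",
    "1\tFui\tir\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tno\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tem\tem\tADP\t_\t_\t4\tcase\t_\t_",
    "3\to\to\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tmercado\tmercado\tNOUN\t_\t_\t1\tobl\t_\t_",
]
words, multiword = split_rows(sentence)
```

Here the Portuguese contraction "no" (em + o) ends up as one multiword row, while its two syntactic words, each with its own head and relation, stay in the training rows.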
Improved Handling of Missing uPos Values: Previously, lines with no uPos value (written as _ in CoNLL-U) could break parsing. Such lines are now handled gracefully, so parsing continues even when uPos values are missing.
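A minimal sketch of tolerating a missing UPOS column instead of failing. The `Word` type, `parse_word`, and the choice of `None` as the fallback are illustrative assumptions, not the PR's code.

```python
from dataclasses import dataclass
from typing import Optional

# CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
NUM_COLUMNS = 10
UNAVAILABLE = "_"


@dataclass
class Word:
    index: int
    form: str
    upos: Optional[str]  # None when the UPOS column is "_"
    head: int
    deprel: str


def parse_word(line: str) -> Word:
    """Parse a regular word line, tolerating a missing UPOS value."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != NUM_COLUMNS:
        raise ValueError(f"expected {NUM_COLUMNS} columns, got {len(cols)}")
    upos = None if cols[3] == UNAVAILABLE else cols[3]
    return Word(index=int(cols[0]), form=cols[1], upos=upos,
                head=int(cols[6]), deprel=cols[7])


tagged = parse_word("4\tmercado\tmercado\tNOUN\t_\t_\t1\tobl\t_\t_")
untagged = parse_word("4\tmercado\tmercado\t_\t_\t_\t1\tobl\t_\t_")
```

Mapping `_` to `None` keeps the head and relation columns usable for training; substituting a placeholder tag would be an equally valid design choice.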
Beyond these functional changes, this PR refactors the underlying code to improve readability and maintainability, making future modifications and debugging easier.