Releases: JohnSnowLabs/spark-nlp

John Snow Labs Spark-NLP 1.2.5

08 Jan 22:11

Note: Pipelines saved with 1.2.4 or older cannot be loaded in 1.2.5.

New features

  • #70
    Word embeddings parameter for the CRF NER annotator
  • #78
    Annotator Features replace Spark Params and are now serialized using Kryo and partitioned Parquet files. This improves performance and reduces memory consumption in the driver when saving and loading pipelines with large corpora. These features are also broadcast for better performance in distributed environments. This enhancement is a breaking change: pipelines saved with older versions cannot be loaded.

Bug fixes

  • cb9aa43
    Stemmer could not be deserialized (it now implements DefaultParamsReadable)
  • #75
    Sentence Boundary Detector was not setting sentence bounds properly

Documentation (thanks @maziyarpanahi)

  • #79
    Fixed a typo in code
  • #74
    Fixed an incorrect description

John Snow Labs Spark-NLP 1.2.4

23 Dec 07:07

New features

  • c17ddac
    ResourceHelper now allows input files to be read as a Spark Dataset, which implicitly enables HDFS paths and allows larger annotator input files. To read input this way, set the input format Param to 'TXTDS'. Supported in: Lemmatizer, EntityExtractor, RegexMatcher, Sentiment Analysis models, Spell Checker and Dependency Parser.

Enhancements and progress

  • #64
    EntityExtractor refactored. This annotator uses an input file containing a list of entities to look for in the target text. It has been refactored to be easier to use and, in particular, faster, by using a Trie search algorithm. Examples are included in the Python notebooks.
  • 4920e5c
    CRF NER benchmarking progress. CRF NER documentation and an official release are coming soon.
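The Trie search mentioned for EntityExtractor (#64) can be illustrated with a minimal Python sketch. This is not Spark NLP's EntityExtractor code; the names `TrieNode`, `build_trie` and `extract_entities`, the whitespace tokenization, the case-insensitive matching and the longest-match, non-overlapping policy are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of token-level trie matching; NOT Spark NLP's implementation.


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_entity_end = False  # True if a complete entity phrase ends here


def build_trie(entities: List[str]) -> TrieNode:
    """Insert each entity phrase into the trie, one lowercased token per level."""
    root = TrieNode()
    for phrase in entities:
        node = root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.is_entity_end = True
    return root


def extract_entities(tokens: List[str], root: TrieNode) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, text) for the longest match at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        node = root
        longest_end = -1
        j = i
        # Walk the trie as far as the tokens allow, remembering the longest full match.
        while j < len(tokens) and tokens[j].lower() in node.children:
            node = node.children[tokens[j].lower()]
            j += 1
            if node.is_entity_end:
                longest_end = j
        if longest_end != -1:
            matches.append((i, longest_end, " ".join(tokens[i:longest_end])))
            i = longest_end  # continue after the match (no overlapping matches)
        else:
            i += 1
    return matches


if __name__ == "__main__":
    trie = build_trie(["New York", "New York City", "Spark NLP"])
    print(extract_entities("I moved to New York City to work on Spark NLP".split(), trie))
```

At each position the matcher walks at most as many trie nodes as the longest entity has tokens, rather than comparing every entity in the list against the text, which is the general reason a trie speeds up dictionary lookup.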

Bug fixes

  • Issue #41 <> d3b9086
    Fixed default resources not being loaded properly when using the library through --spark-packages. Improved input reading from resources and resource folders, with fallback to disk and better error handling.
  • 0840585
    Corrected param names in DocumentAssembler
  • Issue #58 <> 5a53395
    Deleted a misleading leftover deprecated function.
  • c02591b
    Added filtering to ensure no empty sentences reach the unnormalized Vivekn Sentiment Analysis

Documentation and examples

  • b81e95c
    Added additional resources to the FAQ page.
  • 0c3f43c
    Added a Spark Submit example notebook with a full Pipeline use case
  • Issue #53 <> 20efe4a
    Fixed Scala and Python documentation mistakes
  • 782eb8d
    Typo fixes

Other

  • 91d8acb
    Removed Regex NER due to its slowness and little use. CRF NER will replace it.