Commit
- Updated release number
- Updated documentation for release 1.2.4
- Updated missing format params
saif-ellafi committed Dec 23, 2017
1 parent 984ccf8 commit 022f98b
Showing 14 changed files with 316 additions and 89 deletions.
53 changes: 53 additions & 0 deletions CHANGELOG
@@ -1,3 +1,56 @@
========
1.2.4
========
---------------
New features
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/c17ddac7a5a9e775cddc18d672e80e60f0040e38
ResourceHelper now allows input files to be read as a Spark Dataset, implicitly enabling HDFS paths and allowing larger annotator input files. Set 'TXTDS' as the input format param to make annotators read this way (a hedged sketch follows below). Allowed in: Lemmatizer, EntityExtractor, RegexMatcher, Sentiment Analysis models, Spell Checker and Dependency Parser.
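A minimal sketch of what this looks like for the Lemmatizer, assuming the `setLemmaFormat` setter documented in components.html below; the dictionary path and column names are illustrative, not taken from the release:

```scala
import com.johnsnowlabs.nlp.annotators.Lemmatizer

// Read the lemma dictionary through Spark's Dataset reader so HDFS paths work
val lemmatizer = new Lemmatizer()
  .setInputCols(Array("token"))
  .setOutputCol("lemma")
  .setDictionary("hdfs:///data/lemmas.txt") // illustrative path
  .setLemmaFormat("TXTDS")                  // plain "TXT" remains the default
```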

---------------
Enhancements and progress
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/4920e5ce394b25937969cc4cab1d81172be722a3
CRF NER Benchmarking progress
* https://github.com/JohnSnowLabs/spark-nlp/pull/64
EntityExtractor refactored. This annotator uses an input file containing a list of entities to look for inside the target text. It has been refactored for better usability and, specifically, better speed, by using a Trie search algorithm; a short usage sketch follows below. Proper examples are included in the Python notebooks.
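A hedged usage sketch, assuming the `setEntitiesPath` setter introduced by the refactor and the 1.x package path; the input column and file path are illustrative:

```scala
import com.johnsnowlabs.nlp.annotators.EntityExtractor

// Matches phrases from a dictionary file against the target text via a Trie
val entityExtractor = new EntityExtractor()
  .setInputCols(Array("normalized"))     // illustrative upstream column
  .setOutputCol("entities")
  .setEntitiesPath("/data/entities.txt") // one phrase per line (assumed format)
```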

---------------
Bug fixes
---------------
* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/41 <> https://github.com/JohnSnowLabs/spark-nlp/commit/d3b9086e834233f3281621d7c82e32195479fc82
Fixed default resources not being loaded properly when using the library through --spark-packages. Improved input reading from resources and folder resources, falling back to disk, with better error handling.
* https://github.com/JohnSnowLabs/spark-nlp/commit/08405858c6186e6c3e8b668233e30df12fa50374
Corrected param names in DocumentAssembler
* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/58 <> https://github.com/JohnSnowLabs/spark-nlp/commit/5a533952cdacf67970c5a8042340c8a4c9416b13
Deleted a left-over deprecated function which was misleading.
* https://github.com/JohnSnowLabs/spark-nlp/commit/c02591bd683db3f615150d7b1d121ffe5d9e4535
Added filtering to ensure no empty sentences reach the unnormalized Vivekn Sentiment Analysis

---------------
Documentation and examples
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/b81e95ce37ed3c4bd7b05e9f9c7b63b31d57e660
Added additional resources to the FAQ page.
* https://github.com/JohnSnowLabs/spark-nlp/commit/0c3f43c0d3e210f3940f7266fe84426900a6294e
Added Spark Summit example notebook with full Pipeline use case
* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/53 <> https://github.com/JohnSnowLabs/spark-nlp/commit/20efe4a3a5ffbceedac7bf775466b7a8cde5044f
Fixed Scala and Python documentation mistakes
* https://github.com/JohnSnowLabs/spark-nlp/commit/782eb8dce171b69a615887b3defaf8b729b735f2
Fixed typos

---------------
Other
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/91d8acb1f0f4840dad86db3319d0b062bd63b8c6
Removed Regex NER due to slowness and little use. CRF NER will replace it.

========
1.2.3
========
12 changes: 6 additions & 6 deletions README.md
@@ -13,15 +13,15 @@ This library has been uploaded to the spark-packages repository https://spark-pa
To use the most recent version just add the `--packages JohnSnowLabs:spark-nlp:1.2.4` to your spark command

```sh
spark-shell --packages JohnSnowLabs:spark-nlp:1.2.3
spark-shell --packages JohnSnowLabs:spark-nlp:1.2.4
```

```sh
pyspark --packages JohnSnowLabs:spark-nlp:1.2.3
pyspark --packages JohnSnowLabs:spark-nlp:1.2.4
```

```sh
spark-submit --packages JohnSnowLabs:spark-nlp:1.2.3
spark-submit --packages JohnSnowLabs:spark-nlp:1.2.4
```

If you want to use an old version, check the spark-packages website to see all the releases.
@@ -36,19 +36,19 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
<version>1.2.3</version>
<version>1.2.4</version>
</dependency>
```

#### SBT
```sbtshell
libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.2.3"
libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.2.4"
```

If you are using `scala 2.11`

```sbtshell
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.2.3"
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.2.4"
```

## Using the jar manually
2 changes: 1 addition & 1 deletion build.sbt
@@ -7,7 +7,7 @@ name := "spark-nlp"

organization := "com.johnsnowlabs.nlp"

version := "1.2.3"
version := "1.2.4"

scalaVersion := scalaVer

82 changes: 77 additions & 5 deletions docs/components.html
@@ -336,6 +336,21 @@ <h4 id="Lemmatizer" class="section-block"> 5. Lemmatizer: Lemmas</h4>
setDictionary(path): Path to file containing multiple key to value
dictionary, or key,value lemma dictionary. Default: Not provided
</li>
<li>
setLemmaFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows hdfs)
Default:
Looks up path in configuration
</li>
<li>
setLemmaKeySep(sep): Separator for keys and multiple values
Default:
"->" or Looks up path in configuration
</li>
<li>
setLemmaValSep(sep): Separator among values
Default:
"\t" or Looks up path in configuration
</li>
</ul>
<b>Example:</b><br>
</p>
@@ -361,6 +376,21 @@ <h4 id="Lemmatizer" class="section-block"> 5. Lemmatizer: Lemmas</h4>
setDictionary(path): Path to file containing multiple key to value
dictionary, or key,value lemma dictionary. Default: Not provided
</li>
<li>
setLemmaFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows hdfs)
Default:
Looks up path in configuration
</li>
<li>
setLemmaKeySep(sep): Separator for keys and multiple values
Default:
"->" or Looks up path in configuration
</li>
<li>
setLemmaValSep(sep): Separator among values
Default:
"\t" or Looks up path in configuration
</li>
</ul>
<b>Example:</b><br>
</p>
@@ -396,10 +426,20 @@ <h4 id="RegexMatcher" class="section-block"> 6. RegexMatcher: Rule matching</h4>
MATCH_FIRST|MATCH_ALL|MATCH_COMPLETE
</li>
<li>
setRules(path): Path to file containing a set of regex,key pair.
setRulesPath(path): Path to file containing a set of regex,key pair.
Default:
Looks up path in configuration
</li>
<li>
setRulesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows hdfs)
Default:
TXT or looks up path in configuration
</li>
<li>
setRulesSeparator(sep): Separator for rules file
Default:
"," or looks up path in configuration
</li>
</ul>
<b>Example:</b><br>
</p>
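For illustration, a sketch combining the parameters above; treat the rules file layout (one regex,identifier pair per line) and the paths as assumptions rather than documented behavior:

```scala
import com.johnsnowlabs.nlp.annotators.RegexMatcher

// Each line of rules.txt is assumed to hold a "regex,identifier" pair,
// split on the value given to setRulesSeparator
val regexMatcher = new RegexMatcher()
  .setInputCols(Array("document"))
  .setOutputCol("regex_matches")
  .setStrategy("MATCH_ALL")        // or MATCH_FIRST | MATCH_COMPLETE
  .setRulesPath("/data/rules.txt") // illustrative path
  .setRulesFormat("TXT")           // TXTDS would read it as a Spark Dataset
  .setRulesSeparator(",")
```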
@@ -424,10 +464,20 @@ <h4 id="RegexMatcher" class="section-block"> 6. RegexMatcher: Rule matching</h4>
MATCH_FIRST|MATCH_ALL|MATCH_COMPLETE
</li>
<li>
setRules(path): Path to file containing a set of regex,key pair.
setRulesPath(path): Path to file containing a set of regex,key pair.
Default:
Looks up path in configuration
</li>
<li>
setRulesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows hdfs)
Default:
TXT or looks up path in configuration
</li>
<li>
setRulesSeparator(sep): Separator for rules file
Default:
"," or looks up path in configuration
</li>
</ul>
<b>Example:</b><br>
</p>
@@ -467,10 +517,15 @@ <h4 id="EntityExtractor" class="section-block"> 7. EntityExtractor: Phrase match
boundaries for better precision
</li>
<li>
setEntities(path): Provides a file with phrases to match. Default:
setEntitiesPath(path): Provides a file with phrases to match. Default:
Looks up
path in configuration
</li>
<li>
setEntitiesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows hdfs)
Default:
TXT or looks up path in configuration
</li>
</ul>
<b>Example:</b><br>
</p>
@@ -498,10 +553,15 @@ <h4 id="EntityExtractor" class="section-block"> 7. EntityExtractor: Phrase match
boundaries for better precision
</li>
<li>
setEntities(path): Provides a file with phrases to match. Default:
setEntitiesPath(path): Provides a file with phrases to match. Default:
Looks up
path in configuration
</li>
<li>
setEntitiesFormat(format): TXT for txt files or TXTDS for text files read as dataset (allows hdfs)
Default:
TXT or looks up path in configuration
</li>
</ul>
<b>Example:</b><br>
</p>
@@ -710,6 +770,12 @@ <h4 id="SentimentDetector" class="section-block"> 11. SentimentDetector: Sentime
<li>
setDictPath(path)
</li>
<li>
setDictFormat(format)
</li>
<li>
setDictSeparator(separator)
</li>
</ul>
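A sketch of the three dictionary parameters used together; the package path (taken from later spark-nlp releases), the dictionary layout, and the separator value are all assumptions:

```scala
import com.johnsnowlabs.nlp.annotators.sda.pragmatic.SentimentDetector

// The dictionary file is assumed to map words to sentiment tags,
// one "word,sentiment" pair per line
val sentimentDetector = new SentimentDetector()
  .setInputCols(Array("lemma", "sentence"))
  .setOutputCol("sentiment_score")
  .setDictPath("/data/sentiment-dict.txt") // illustrative path
  .setDictFormat("TXT")                    // or TXTDS for Dataset-based reading
  .setDictSeparator(",")
```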
<br>
<b>Input:</b>
@@ -739,6 +805,12 @@ <h4 id="SentimentDetector" class="section-block"> 11. SentimentDetector: Sentime
<li>
setDictPath(path)
</li>
<li>
setDictFormat(format)
</li>
<li>
setDictSeparator(separator)
</li>
</ul>
<br>
<b>Input:</b>
@@ -884,7 +956,7 @@ <h4 id="SpellChecker" class="section-block"> 13. SpellChecker: Token spell
setCorpusPath: path to training corpus. Can be any good text.
</li>
<li>
setCorpusFormat(format): Allowed “txt” or “txtds”. The latter uses spark dataframes from text
setCorpusFormat(format): Allowed “txt” or “txtds”. The latter uses spark dataframes from text
</li>
<li>
setSlangPath: path to custom dictionaries, separated by comma
18 changes: 5 additions & 13 deletions python/example/vivekn-sentiment/sentiment.ipynb
@@ -38,9 +38,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"#Load the input data to be annotated\n",
@@ -160,9 +158,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"pipeline = Pipeline(stages=[\n",
@@ -182,9 +178,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"for r in sentiment_data.take(5):\n",
@@ -217,9 +211,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"Pipeline.read().load(\"./ps\")\n",
@@ -239,7 +231,7 @@
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
