[SPARKNLP-1027] llama.cpp integration #14364

Merged
13 changes: 12 additions & 1 deletion build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)

organization := "com.johnsnowlabs.nlp"

-version := "5.4.2"
+version := "5.5.0"

(ThisBuild / scalaVersion) := scalaVer

@@ -180,6 +180,16 @@ val onnxDependencies: Seq[sbt.ModuleID] =
else
Seq(onnxCPU)

val llamaCppDependencies =
if (is_gpu.equals("true"))
Seq(llamaCppGPU)
else if (is_silicon.equals("true"))
Seq(llamaCppSilicon)
// else if (is_aarch64.equals("true"))
Review comment (Member): @DevinTDHa We don't need a special build for aarch64 or it's not supported?

// Seq(openVinoCPU)
else
Seq(llamaCppCPU)

val openVinoDependencies: Seq[sbt.ModuleID] =
if (is_gpu.equals("true"))
Seq(openVinoGPU)
@@ -202,6 +212,7 @@ lazy val root = (project in file("."))
utilDependencies ++
tensorflowDependencies ++
onnxDependencies ++
llamaCppDependencies ++
openVinoDependencies ++
typedDependencyParserDependencies,
// TODO potentially improve this?
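
The `llamaCppDependencies` block above mirrors the flag-based selection already used for the ONNX and OpenVINO dependencies. As a hedged sketch (an assumption about the build setup, not part of this diff), such flags are typically resolved from JVM system properties, so a GPU build would be invoked with something like `sbt -Dis_gpu=true assembly`:

```scala
// Sketch (assumption): how build flags like the ones used above are
// commonly resolved. Each flag defaults to "false" unless passed on the
// command line, e.g. `sbt -Dis_gpu=true assembly` for a GPU build.
val is_gpu: String = System.getProperty("is_gpu", "false")
val is_silicon: String = System.getProperty("is_silicon", "false")
val is_aarch64: String = System.getProperty("is_aarch64", "false")
```
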
135 changes: 135 additions & 0 deletions docs/en/annotator_entries/AutoGGUF.md
@@ -0,0 +1,135 @@
{%- capture title -%}
AutoGGUFModel
{%- endcapture -%}

{%- capture description -%}
Annotator that uses the llama.cpp library to generate text completions with large language
models.

For settable parameters and their explanations, see [HasLlamaCppProperties](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/HasLlamaCppProperties.scala) and refer to
the llama.cpp documentation of
[server.cpp](https://github.com/ggerganov/llama.cpp/tree/7d5e8777ae1d21af99d4f95be10db4870720da91/examples/server)
for more information.

If the parameters are not set, the annotator falls back to the defaults provided by
the model.

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val autoGGUFModel = AutoGGUFModel.pretrained()
.setInputCols("document")
.setOutputCol("completions")
```

The default model is `"gguf-phi3-mini-4k-instruct-q4"` if no name is provided.
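
A specific model can also be loaded by name and language (a minimal sketch, assuming the standard Spark NLP `pretrained(name, lang)` overload; the name shown is the default model mentioned above):

```scala
// Load a named pretrained model instead of the default.
val namedModel = AutoGGUFModel
  .pretrained("gguf-phi3-mini-4k-instruct-q4", "en")
  .setInputCols("document")
  .setOutputCol("completions")
```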

For available pretrained models please see the [Models Hub](https://sparknlp.org/models).

For extended examples of usage, see the
[AutoGGUFModelTest](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/test/scala/com/johnsnowlabs/nlp/annotators/seq2seq/AutoGGUFModelTest.scala)
and the
[example notebook](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/llama.cpp/llama.cpp_in_Spark_NLP_AutoGGUFModel.ipynb).

**Note**: To use GPU inference with this annotator, make sure to use the Spark NLP GPU package and set
the number of GPU layers with the `setNGpuLayers` method.

When using larger models, we recommend adjusting GPU usage with `setNCtx` and `setNGpuLayers`
according to your hardware to avoid out-of-memory errors.
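
For example, a GPU setup might look like the following (a minimal sketch; the values are illustrative, not recommendations, and should be tuned to your hardware):

```scala
// Sketch: offload layers to the GPU and bound the context window.
// Lower nCtx or nGpuLayers if you run into out-of-memory errors.
val gpuGGUFModel = AutoGGUFModel
  .pretrained()
  .setInputCols("document")
  .setOutputCol("completions")
  .setNCtx(4096)     // context size in tokens (illustrative)
  .setNGpuLayers(99) // number of layers offloaded to the GPU
```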
{%- endcapture -%}

{%- capture input_anno -%}
DOCUMENT
{%- endcapture -%}

{%- capture output_anno -%}
DOCUMENT
{%- endcapture -%}

{%- capture python_example -%}
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> document = DocumentAssembler() \
... .setInputCol("text") \
... .setOutputCol("document")
>>> autoGGUFModel = AutoGGUFModel.pretrained() \
... .setInputCols(["document"]) \
... .setOutputCol("completions") \
... .setBatchSize(4) \
... .setNPredict(20) \
... .setNGpuLayers(99) \
... .setTemperature(0.4) \
... .setTopK(40) \
... .setTopP(0.9) \
... .setPenalizeNl(True)
>>> pipeline = Pipeline().setStages([document, autoGGUFModel])
>>> data = spark.createDataFrame([["Hello, I am a"]]).toDF("text")
>>> result = pipeline.fit(data).transform(data)
>>> result.select("completions").show(truncate=False)
+-----------------------------------------------------------------------------------------------------------------------------------+
|completions |
+-----------------------------------------------------------------------------------------------------------------------------------+
|[{document, 0, 78, new user. I am currently working on a project and I need to create a list of , {prompt -> Hello, I am a}, []}]|
+-----------------------------------------------------------------------------------------------------------------------------------+
{%- endcapture -%}

{%- capture scala_example -%}
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")

val autoGGUFModel = AutoGGUFModel
.pretrained()
.setInputCols("document")
.setOutputCol("completions")
.setBatchSize(4)
.setNPredict(20)
.setNGpuLayers(99)
.setTemperature(0.4f)
.setTopK(40)
.setTopP(0.9f)
.setPenalizeNl(true)

val pipeline = new Pipeline().setStages(Array(document, autoGGUFModel))

val data = Seq("Hello, I am a").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("completions").show(truncate = false)
+-----------------------------------------------------------------------------------------------------------------------------------+
|completions |
+-----------------------------------------------------------------------------------------------------------------------------------+
|[{document, 0, 78, new user. I am currently working on a project and I need to create a list of , {prompt -> Hello, I am a}, []}]|
+-----------------------------------------------------------------------------------------------------------------------------------+

{%- endcapture -%}

{%- capture api_link -%}
[AutoGGUFModel](/api/com/johnsnowlabs/nlp/annotators/seq2seq/AutoGGUFModel)
{%- endcapture -%}

{%- capture python_api_link -%}
[AutoGGUFModel](/api/python/reference/autosummary/sparknlp/annotator/seq2seq/auto_gguf_model/index.html)
{%- endcapture -%}

{%- capture source_link -%}
[AutoGGUFModel](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/seq2seq/AutoGGUFModel.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}
1 change: 1 addition & 0 deletions docs/en/annotators.md
@@ -45,6 +45,7 @@ There are two types of Annotators:
{:.table-model-big}
|Annotator|Description|Version |
|---|---|---|
{% include templates/anno_table_entry.md path="" name="AutoGGUFModel" summary="Annotator that uses the llama.cpp library to generate text completions with large language models."%}
{% include templates/anno_table_entry.md path="" name="BGEEmbeddings" summary="Sentence embeddings using BGE."%}
{% include templates/anno_table_entry.md path="" name="BigTextMatcher" summary="Annotator to match exact phrases (by token) provided in a file against a Document."%}
{% include templates/anno_table_entry.md path="" name="Chunk2Doc" summary="Converts a `CHUNK` type column back into `DOCUMENT`. Useful when trying to re-tokenize or do further analysis on a `CHUNK` result."%}