SPARKNLP-605: ConvNextForImageClassification #13713

Merged
5 changes: 3 additions & 2 deletions docs/en/annotators.md
@@ -40,7 +40,6 @@ There are two types of Annotators:

- `pretrained(name, language, extra_location)` -> by default, `pretrained` downloads a default model; when more than one model is available, you may have to pass the name, language, or extra location to download a specific one (see the sketch below).
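A minimal sketch of this convention, using the ConvNeXT classifier added in this PR (the explicit model name and language shown are simply the documented default, used here for illustration; check the Models Hub for the models actually published for a given annotator):

```scala
import com.johnsnowlabs.nlp.annotator._

// With no arguments, pretrained() downloads the annotator's default model.
val defaultClassifier = ConvNextForImageClassification.pretrained()

// When several models are published, request one explicitly by name and language.
val namedClassifier = ConvNextForImageClassification
  .pretrained("image_classifier_convnext_tiny_224_local", "en")
```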


## Available Annotators

{:.table-model-big}
@@ -101,7 +100,8 @@ There are two types of Annotators:
{% include templates/anno_table_entry.md path="" name="YakeKeywordExtraction" summary="Unsupervised, Corpus-Independent, Domain and Language-Independent and Single-Document keyword extraction."%}

## Available Transformers
Additionally, these transformers are available to generate embeddings.

Additionally, these transformers are available.

{:.table-model-big}
|Transformer|Description|Version|
@@ -118,6 +118,7 @@ Additionally, these transformers are available to generate embeddings.
{% include templates/anno_table_entry.md path="./transformers" name="CamemBertEmbeddings" summary="CamemBert is based on Facebook’s RoBERTa model released in 2019."%}
{% include templates/anno_table_entry.md path="./transformers" name="CamemBertForSequenceClassification" summary="CamemBertForSequenceClassification can load CamemBERT Models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks."%}
{% include templates/anno_table_entry.md path="./transformers" name="CamemBertForTokenClassification" summary="CamemBertForTokenClassification can load CamemBERT Models with a token classification head on top."%}
{% include templates/anno_table_entry.md path="./transformers" name="ConvNextForImageClassification" summary="ConvNextForImageClassification is an image classifier based on ConvNet models."%}
{% include templates/anno_table_entry.md path="./transformers" name="DeBertaEmbeddings" summary="DeBERTa builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in RoBERTa."%}
{% include templates/anno_table_entry.md path="./transformers" name="DeBertaForQuestionAnswering" summary="DeBertaForQuestionAnswering can load DeBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD."%}
{% include templates/anno_table_entry.md path="./transformers" name="DistilBertEmbeddings" summary="DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base."%}
165 changes: 165 additions & 0 deletions docs/en/transformer_entries/ConvNextForImageClassification.md
@@ -0,0 +1,165 @@
{%- capture title -%}
ConvNextForImageClassification
{%- endcapture -%}

{%- capture description -%}
ConvNextForImageClassification is an image classifier based on ConvNet models.

The ConvNeXT model was proposed in [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)
by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining
Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision
Transformers, that claims to outperform them.

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val imageClassifier = ConvNextForImageClassification.pretrained()
.setInputCols("image_assembler")
.setOutputCol("class")
```

The default model is `"image_classifier_convnext_tiny_224_local"`, if no name is provided.
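To request a specific model instead of the default, the name and language can be passed to `pretrained` explicitly. A small sketch (the name used here is simply the default listed above):

```scala
val imageClassifier = ConvNextForImageClassification
  .pretrained("image_classifier_convnext_tiny_224_local", "en")
  .setInputCols("image_assembler")
  .setOutputCol("class")
```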

For available pretrained models please see the
[Models Hub](https://nlp.johnsnowlabs.com/models?task=Image+Classification).

Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To
see which models are compatible and how to import them, see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended examples, see
[ConvNextForImageClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/cv/ConvNextForImageClassificationTestSpec.scala).
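A minimal import sketch, assuming the external model has already been exported in the format described in the linked discussion and saved to a local folder (both paths below are placeholders):

```scala
import com.johnsnowlabs.nlp.annotator._

// Load the exported ConvNeXT model from a local folder (placeholder path).
val importedClassifier = ConvNextForImageClassification
  .loadSavedModel("/tmp/exported_convnext", spark)
  .setInputCols("image_assembler")
  .setOutputCol("class")

// Optionally persist it in Spark NLP's own format so it can be reloaded later.
importedClassifier.write.overwrite().save("/tmp/convnext_spark_nlp")
```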

**References:**

[A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)

**Paper Abstract:**

*The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers
(ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.
A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision
tasks such as object detection and semantic segmentation. It is the hierarchical Transformers
(e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers
practically viable as a generic vision backbone and demonstrating remarkable performance on a
wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still
largely credited to the intrinsic superiority of Transformers, rather than the inherent
inductive biases of convolutions. In this work, we reexamine the design spaces and test the
limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward
the design of a vision Transformer, and discover several key components that contribute to the
performance difference along the way. The outcome of this exploration is a family of pure
ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts
compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8%
ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K
segmentation, while maintaining the simplicity and efficiency of standard ConvNets.*
{%- endcapture -%}

{%- capture input_anno -%}
IMAGE
{%- endcapture -%}

{%- capture output_anno -%}
CATEGORY
{%- endcapture -%}

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
imageDF = spark.read \
.format("image") \
.option("dropInvalid", value = True) \
.load("src/test/resources/image/")
imageAssembler = ImageAssembler() \
.setInputCol("image") \
.setOutputCol("image_assembler")
imageClassifier = ConvNextForImageClassification \
.pretrained() \
.setInputCols(["image_assembler"]) \
.setOutputCol("class")
pipeline = Pipeline().setStages([imageAssembler, imageClassifier])
pipelineDF = pipeline.fit(imageDF).transform(imageDF)
pipelineDF \
.selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result") \
.show(truncate=False)
+-----------------+----------------------------------------------------------+
|image_name       |result                                                    |
+-----------------+----------------------------------------------------------+
|bluetick.jpg     |[bluetick]                                                |
|chihuahua.jpg    |[Chihuahua]                                               |
|egyptian_cat.jpeg|[tabby, tabby cat]                                        |
|hen.JPEG         |[hen]                                                     |
|hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
|junco.JPEG       |[junco, snowbird]                                         |
|ostrich.JPEG     |[ostrich, Struthio camelus]                               |
|ox.JPEG          |[ox]                                                      |
|palace.JPEG      |[palace]                                                  |
|tractor.JPEG     |[thresher, thrasher, threshing machine]                   |
+-----------------+----------------------------------------------------------+
{%- endcapture -%}

{%- capture scala_example -%}
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.ImageAssembler
import org.apache.spark.ml.Pipeline

val imageDF: DataFrame = spark.read
.format("image")
.option("dropInvalid", value = true)
.load("src/test/resources/image/")

val imageAssembler = new ImageAssembler()
.setInputCol("image")
.setOutputCol("image_assembler")

val imageClassifier = ConvNextForImageClassification
.pretrained()
.setInputCols("image_assembler")
.setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
val pipelineDF = pipeline.fit(imageDF).transform(imageDF)

pipelineDF
.selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result")
.show(truncate = false)
+-----------------+----------------------------------------------------------+
|image_name       |result                                                    |
+-----------------+----------------------------------------------------------+
|palace.JPEG      |[palace]                                                  |
|egyptian_cat.jpeg|[tabby, tabby cat]                                        |
|hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
|hen.JPEG         |[hen]                                                     |
|ostrich.JPEG     |[ostrich, Struthio camelus]                               |
|junco.JPEG       |[junco, snowbird]                                         |
|bluetick.jpg     |[bluetick]                                                |
|chihuahua.jpg    |[Chihuahua]                                               |
|tractor.JPEG     |[tractor]                                                 |
|ox.JPEG          |[ox]                                                      |
+-----------------+----------------------------------------------------------+

{%- endcapture -%}

{%- capture api_link -%}
[ConvNextForImageClassification](/api/com/johnsnowlabs/nlp/annotators/cv/ConvNextForImageClassification)
{%- endcapture -%}

{%- capture python_api_link -%}
[ConvNextForImageClassification](/api/python/reference/autosummary/sparknlp/annotator/cv/convnext_for_image_classification/index.html#sparknlp.annotator.cv.convnext_for_image_classification.ConvNextForImageClassification)
{%- endcapture -%}

{%- capture source_link -%}
[ConvNextForImageClassification](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/cv/ConvNextForImageClassification.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}
1 change: 1 addition & 0 deletions python/sparknlp/annotator/cv/__init__.py
@@ -13,3 +13,4 @@
# limitations under the License.
from sparknlp.annotator.cv.vit_for_image_classification import *
from sparknlp.annotator.cv.swin_for_image_classification import *
from sparknlp.annotator.cv.convnext_for_image_classification import *