Translated image_captioning from en to es #28960

Closed · wants to merge 55 commits
266 changes: 266 additions & 0 deletions docs/source/es/tasks/image_captioning.md
@@ -0,0 +1,266 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Image captioning

[[open-in-colab]]

Image captioning is the task of predicting a caption for a given image. Common real-world applications include
aiding visually impaired people, who can use it to help them navigate through different situations. Image captioning
therefore helps to improve content accessibility by describing images to people.

This guide will show you how to:

* Fine-tune an image captioning model.
* Use the fine-tuned model for inference.

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install transformers datasets evaluate -q
pip install jiwer -q
```

We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

```python
from huggingface_hub import notebook_login

notebook_login()
```
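
If you prefer to log in outside a notebook, the `huggingface_hub` library also exposes a programmatic login you can run from a script or terminal session:

```python
from huggingface_hub import login

# Prompts for your Hugging Face access token (or pass it via token="hf_...")
login()
```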

## Load the Pokémon BLIP captions dataset

Use the 🤗 Datasets library to load a dataset that consists of {image-caption} pairs. To create your own image captioning dataset
in PyTorch, you can follow [this notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/GIT/Fine_tune_GIT_on_an_image_captioning_dataset.ipynb).

```python
from datasets import load_dataset

ds = load_dataset("lambdalabs/pokemon-blip-captions")
ds
```
```bash
DatasetDict({
    train: Dataset({
        features: ['image', 'text'],
        num_rows: 833
    })
})
```

The dataset has two features, `image` and `text`.

<Tip>

Many image captioning datasets contain multiple captions per image. In those cases, a common strategy is to randomly sample a caption from the available ones during training; see the sketch after this tip.

</Tip>
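
A minimal sketch of that sampling strategy, assuming a hypothetical dataset with a `captions` column holding a list of strings per image (the Pokémon BLIP dataset used here has a single caption in its `text` column, so this is not needed below):

```python
import random


def sample_one_caption(example_batch):
    # Hypothetical "captions" column: draw one caption at random per image
    # each time the batch is accessed, e.g. inside a preprocessing transform.
    example_batch["text"] = [random.choice(captions) for captions in example_batch["captions"]]
    return example_batch
```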

Split the dataset's train split into a train and test set with the [`~datasets.Dataset.train_test_split`] method:

```python
ds = ds["train"].train_test_split(test_size=0.1)
train_ds = ds["train"]
test_ds = ds["test"]
```

Let's visualize a couple of samples from the training set.

```python
from textwrap import wrap
import matplotlib.pyplot as plt
import numpy as np


def plot_images(images, captions):
    plt.figure(figsize=(20, 20))
    for i in range(len(images)):
        ax = plt.subplot(1, len(images), i + 1)
        caption = captions[i]
        caption = "\n".join(wrap(caption, 12))
        plt.title(caption)
        plt.imshow(images[i])
        plt.axis("off")


sample_images_to_visualize = [np.array(train_ds[i]["image"]) for i in range(5)]
sample_captions = [train_ds[i]["text"] for i in range(5)]
plot_images(sample_images_to_visualize, sample_captions)
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/sample_training_images_image_cap.png" alt="Sample training images"/>
</div>

## Preprocess the dataset

Since the dataset has two modalities (image and text), the preprocessing pipeline will preprocess the images and the captions.

To do so, load the processor class associated with the model you are about to fine-tune.

```python
from transformers import AutoProcessor

checkpoint = "microsoft/git-base"
processor = AutoProcessor.from_pretrained(checkpoint)
```

The processor will internally preprocess the image (which includes resizing and pixel scaling) and tokenize the caption.

```python
def transforms(example_batch):
    images = [x for x in example_batch["image"]]
    captions = [x for x in example_batch["text"]]
    inputs = processor(images=images, text=captions, padding="max_length")
    inputs.update({"labels": inputs["input_ids"]})
    return inputs


train_ds.set_transform(transforms)
test_ds.set_transform(transforms)
```

With the dataset ready, you can now set up the model for fine-tuning.

## Load a base model

Load ["microsoft/git-base"](https://huggingface.co/microsoft/git-base) into an [`AutoModelForCausalLM`](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM) object.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(checkpoint)
```

## Evaluate

Image captioning models are typically evaluated with the [Rouge Score](https://huggingface.co/spaces/evaluate-metric/rouge) or Word Error Rate ([WER](https://huggingface.co/spaces/evaluate-metric/wer)). For this guide, you will use the Word Error Rate (WER).

We use the 🤗 Evaluate library to do so. For potential limitations and other gotchas of the WER, refer to [this guide](https://huggingface.co/spaces/evaluate-metric/wer); a small worked example of the metric follows the `compute_metrics` definition below.

```python
from evaluate import load
import torch

wer = load("wer")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predicted = logits.argmax(-1)
    decoded_labels = processor.batch_decode(labels, skip_special_tokens=True)
    decoded_predictions = processor.batch_decode(predicted, skip_special_tokens=True)
    wer_score = wer.compute(predictions=decoded_predictions, references=decoded_labels)
    return {"wer_score": wer_score}
```
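
As a quick illustration of the metric itself (not part of the original guide), WER counts word-level substitutions, insertions, and deletions relative to the reference length, so one substituted word in an eight-word reference yields 1/8 = 0.125:

```python
from evaluate import load

wer = load("wer")

# One substituted word ("green" vs. "blue") out of an 8-word reference -> 0.125
score = wer.compute(
    predictions=["a drawing of a green and pink pokemon"],
    references=["a drawing of a blue and pink pokemon"],
)
print(score)
```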

## Train!

Now you are ready to start fine-tuning the model. You will use the 🤗 [`Trainer`] for this.

First, define the training arguments using [`TrainingArguments`].

```python
from transformers import TrainingArguments, Trainer

model_name = checkpoint.split("/")[1]

training_args = TrainingArguments(
    output_dir=f"{model_name}-pokemon",
    learning_rate=5e-5,
    num_train_epochs=50,
    fp16=True,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,
    save_total_limit=3,
    evaluation_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    logging_steps=50,
    remove_unused_columns=False,
    push_to_hub=True,
    label_names=["labels"],
    load_best_model_at_end=True,
)
```

Then pass them along with the datasets and the model to the 🤗 Trainer.

```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    compute_metrics=compute_metrics,
)
```

To start training, simply call [`~Trainer.train`] on the [`Trainer`] object.

```python
trainer.train()
```

You should see the training loss drop smoothly as training progresses.

Once training is completed, share your model to the Hub with the [`~Trainer.push_to_hub`] method so everyone can use your model:

```python
trainer.push_to_hub()
```
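
Once the model is on the Hub, anyone can load it for captioning; a minimal sketch using the `image-to-text` pipeline, where `your-username/git-base-pokemon` is a hypothetical repo id standing in for wherever `push_to_hub` uploaded your checkpoint:

```python
from transformers import pipeline

# "your-username/git-base-pokemon" is a placeholder for your actual Hub repo id
captioner = pipeline("image-to-text", model="your-username/git-base-pokemon")
captioner("https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/pokemon.png")
```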

## Inference

Take a sample image from `test_ds` to test the model.

```python
from PIL import Image
import requests

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/pokemon.png"
image = Image.open(requests.get(url, stream=True).raw)
image
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/test_image_image_cap.png" alt="Test image"/>
</div>

Prepare the image for the model.

```python
device = "cuda" if torch.cuda.is_available() else "cpu"

inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values
```
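
Right after `trainer.train()` the model already sits on the training device; if you instead load the checkpoint in a fresh session, it may still be on CPU, so as a precaution move it to the same device as the inputs:

```python
# Make sure the model is on the same device as the processed inputs
model.to(device)
```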

Call [`generate`] and decode the predictions.

```python
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
```
```bash
a drawing of a pink and blue pokemon
```

It looks like the fine-tuned model generated a pretty good caption!