Add llm #2
Conversation
@apaniukov is there a C++ API for tokenization and detokenization?
Yes, there is. Right now it is on the openvino_contrib branch. Here is an instruction for building and installation. It does not support llama tokenizers from […]. Then you can convert the tokenizer to OV models:

```python
from transformers import AutoTokenizer
from openvino import save_model
from ov_tokenizer import init_extension, convert_tokenizer

init_extension("path/to/libuser_ov_extensions.so")

hf_tokenizer = AutoTokenizer.from_pretrained("microsoft/Llama2-7b-WhoIsHarryPotter")
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_decoder=True)
save_model(ov_tokenizer, "tokenizer.xml")
save_model(ov_detokenizer, "detokenizer.xml")
```

From here you can work with them like with any other OV model. You can also add similar postprocessing to the llama model to get […]. Models cannot work with strings at this point, so you need to convert the input to a uint8 tensor with a predefined format, see pack_strings. To get strings from the detokenizer's uint8 output, see unpack_strings.
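For the C++ side this PR targets, consuming the saved models might look roughly like the sketch below. The extension path, device, and file names mirror the Python snippet above and are assumptions, and the packed-u8 string I/O described by pack_strings/unpack_strings is left out:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Load the same custom-ops extension that init_extension() loads in Python.
    core.add_extension("path/to/libuser_ov_extensions.so");
    ov::InferRequest tokenizer =
        core.compile_model("tokenizer.xml", "CPU").create_infer_request();
    ov::InferRequest detokenizer =
        core.compile_model("detokenizer.xml", "CPU").create_infer_request();
    // Inputs and outputs are packed u8 string tensors; pack the prompt per
    // pack_strings before calling tokenizer.infer(), and unpack the
    // detokenizer output per unpack_strings.
}
```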
llm/llm.cpp (outdated)

```cpp
        throw std::runtime_error("Model and vocab number of tokens don't match");
    }
    float* logits = ireq.get_tensor("logits").data<float>() + (prompt.size() - 1) * n_vocab;
    ptrdiff_t out_token = std::max_element(logits, logits + n_vocab) - logits;
```
@yury-gorbachev, I was told that you requested adding beam search. Should it be a separate application, or should I provide only a beam search implementation, given that beam search with a beam size of 1 is greedy sampling?
I'm going to merge greedy sampling for now to unblock others.
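For context, greedy sampling amounts to the loop sketched below. It is a fragment reusing the names from the llm.cpp snippet above (ireq, prompt, n_vocab), assumes int32 input_ids as in the PR code and a LLaMA EOS id of 2 (an assumption), and omits the attention-mask/position bookkeeping of the real code:

```cpp
// Sketch of a greedy decoding loop (equivalent to beam search with beam size 1).
constexpr int32_t EOS_TOKEN = 2;  // assumption: LLaMA </s> id
ireq.infer();  // prefill on the whole prompt
float* logits = ireq.get_tensor("logits").data<float>()
    + (prompt.size() - 1) * n_vocab;  // logits for the last prompt position
int32_t out_token = int32_t(std::max_element(logits, logits + n_vocab) - logits);
while (out_token != EOS_TOKEN) {
    // Feed the chosen token back as a single-token step; the KV cache keeps the history.
    ireq.get_tensor("input_ids").set_shape({1, 1});
    ireq.get_tensor("input_ids").data<int32_t>()[0] = out_token;
    ireq.infer();
    logits = ireq.get_tensor("logits").data<float>();  // only one position now
    out_token = int32_t(std::max_element(logits, logits + n_vocab) - logits);
}
```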
llm/README.md (outdated)

```md
## Supported models

1. [LLaMA 2](https://huggingface.co/meta-llama/Llama-2-13b-hf)
```
Probably any model of these families?
```cpp
}
ireq.get_tensor("input_ids").set_shape(tokenizer.get_tensor("input_ids").get_shape());  // TODO: replace with ireq.set_tensor("input_ids", tokenizer.get_tensor("input_ids")); after it's fixed
ireq.get_tensor("attention_mask").set_shape(tokenizer.get_tensor("input_ids").get_shape());
std::copy_n(tokenizer.get_tensor("input_ids").data<int32_t>(), tokenizer.get_tensor("input_ids").get_size(), ireq.get_tensor("input_ids").data<int32_t>());
```
We have Tensor::copy_to, which can also allocate the output tensor.
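Applied to the snippet above, the suggestion would look roughly like this (a sketch; it relies on copy_to reshaping the destination to match the source, as the comment notes):

```cpp
// copy_to resizes/allocates the destination tensor to match the source,
// replacing the manual set_shape + std::copy_n pair for input_ids.
tokenizer.get_tensor("input_ids").copy_to(ireq.get_tensor("input_ids"));
ireq.get_tensor("attention_mask").set_shape(tokenizer.get_tensor("input_ids").get_shape());
```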
```cmake
)
else()
    target_compile_options(llm PRIVATE -Wall)  # Display all warnings
    target_compile_options(sentencepiece-static PRIVATE -Wno-stringop-overflow)  # Disable the warning from openvino_contrib
```
Let's move this code to contrib.