
add zephyr support in chatbot notebook #1447

Merged
3 commits merged into openvinotoolkit:main on Nov 13, 2023

Conversation

eaidova
Collaborator

@eaidova eaidova commented Nov 9, 2023

No description provided.


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@CalebXDonoho

Hello, in running the notebook on HuggingFaceH4/zephyr-7b-beta, I noticed that fp16 and int8 convert; however, int4 runs into an NNCF error as shown attached, claiming that the conversion to int4 cannot happen due to lack of support for bf16. I have been running these tests on an SPR HBM Max 9480, if that helps! Are there additional setup steps I should be taking, or is int4 quantization not possible on this machine? Thanks for your help! nncf_conversion_error

@ryanloney
Contributor

I was able to run on my 10th Gen Core Windows laptop with 32GB of RAM and generate int4 compressed IR, then run on CPU. Trying iGPU next.
image

@MaximProshin
Contributor

Hello, in running the notebook on HuggingFaceH4/zephyr-7b-beta, I noticed that fp16 and int8 convert; however, int4 runs into an NNCF error as shown attached, claiming that the conversion to int4 cannot happen due to lack of support for bf16. I have been running these tests on an SPR HBM Max 9480, if that helps! Are there additional setup steps I should be taking, or is int4 quantization not possible on this machine? Thanks for your help! nncf_conversion_error

I guess these are SPR specifics, where bf16 was enabled in the conversion step. @eaidova, is it possible to avoid it?
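As a sketch of the workaround under discussion (not code from the PR itself): since NNCF weight compression rejected bf16 weights at the time, one option is to upcast bf16 models to fp32 before conversion. `safe_export_dtype` is a hypothetical helper name introduced here for illustration.

```python
import torch


def safe_export_dtype(model_dtype: torch.dtype) -> torch.dtype:
    """Pick a dtype NNCF int4 weight compression can handle.

    Hypothetical helper: bf16 weights are upcast to fp32 before
    conversion; other dtypes pass through unchanged.
    """
    if model_dtype == torch.bfloat16:
        return torch.float32
    return model_dtype


# A bf16 model would be exported as fp32; fp16 stays as-is.
print(safe_export_dtype(torch.bfloat16))  # torch.float32
print(safe_export_dtype(torch.float16))   # torch.float16
```

On an SPR machine where the framework defaults to bf16, calling something like this before export would sidestep the error reported above.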

@eaidova
Collaborator Author

eaidova commented Nov 10, 2023

@CalebXDonoho could you please try replacing the optimum-intel installation with my branch instead of the one provided in the notebook?

%pip install git+https://github.com/eaidova/optimum-intel.git@ea/fp32_dtype

If it helps, then I'll submit a fix to optimum-intel.

@brmarkus

I have seen SPR CPUs (early and exotic variants, with early BSP/BIOS) "not supporting BF16"; lscpu didn't reveal the corresponding instruction sets...

@eaidova
Collaborator Author

eaidova commented Nov 10, 2023

@CalebXDonoho could you please execute the following code in your environment and share the results with me:

import torch
print(torch.get_default_dtype())

It looks like the default dtype on your system is set to torch.bfloat16, which leads to the model being converted with that dtype preserved.
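If the default dtype does turn out to be torch.bfloat16, a possible workaround (my suggestion, not something verified against this notebook) is to reset it to float32 before running the conversion cells:

```python
import torch

# Inspect the current default dtype; on some SPR setups this may
# report torch.bfloat16 rather than the usual torch.float32.
print(torch.get_default_dtype())

# Force float32 so newly created tensors (and hence the exported
# model's weights) use fp32, which NNCF int4 compression supports.
torch.set_default_dtype(torch.float32)
print(torch.get_default_dtype())  # torch.float32
```

This only affects the current process, so it would need to run near the top of the notebook, before model loading and conversion.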

@CalebXDonoho

Hello @eaidova. This is what I see when attempting the previously discussed steps on my SPR HBM, after reinstalling optimum-intel using the steps provided. This should also show the default dtype I am using. Let me know if I need to run further tests.
screenshot_1110

@eaidova eaidova merged commit 353739a into openvinotoolkit:main Nov 13, 2023
14 checks passed
@eaidova eaidova deleted the ea/zephyr_chat branch November 13, 2023 06:19
adrianboguszewski added a commit that referenced this pull request Dec 18, 2023
* Initial commit for a NLP recipe

* Added code to convert and export red pajama model

* Simple chatbot app

* Added support for llama2 models

* Removed support for redpajama

* Add possibility to quantize weights for the chat model

* WIP chatbot app

* Added access token

* Removed access token

* Improved app to behave as a chat

* Simple gradio interface

* Small changes in the app

* Changed virtual assistant to conversational agent

* Fixed llama quantization issues

* Added bark inference utils

* Feedback changes for bark script and requirements

* Updated the documentation for functions and fixed requirements

* model directory and individual model names update

* Removed use_small from unnecessary placeholders

* Requirements updates

* Small changes in tts conversion script

* Fixes in bark utils

* Free resources after conversion, restore use_small parameter

* add zephyr support in chatbot notebook (#1447)

* add zephyr support in chatbot notebook

* update readme

* change int8 compression path

* support whisper-large-v3 (#1449)

* whisper model selection (#1450)

* add model selection

* upd quantization

* update SD pipeline import (#1452)

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* substitute a new method in the original pipeline (#1451)

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* fix api migration issues (#1454)

* fix controlnet conversion for 2023.2 (#1453)

* fix controlnet conversion for 2023.2

* update other notebooks

* update transformers version according to optimum requirements (#1455)

* fix openvino-nightly install (#1456)

* fix lcm notebook running with GPU (#1457)

* Listing all notebooks in one file (#1458)

* align torch specific install (#1459)

* Fix TOC links (#1460)

* Image generation with Segmind Stable Diffusion 1B (SSD-1B) (#1437)

* SSD-B1

* pep8

* spelling

* spelling

* Comparison with SDXL

* Add output

* Standalone notebook footer

* Fixes

* Fixes

* Update notebooks/248-stable-diffusion-xl/README.md

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Table of contents

---------

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Add note for video codec (#1461)

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

* calibrate upcast to f32 for pajama, tiny SD and t5 encoder (#1382)

* use calibrate partially upcast to FP32

* fix for pajama

* successfully saved rt_info, gradio works fine for RedPajama

* redpajama: better placement and input for calibrate

* renamed to model_upcast_utils.py, added tiny SD, added proper saving into redpajama

* turn off debug messages: silent=True

* upcast DeepFloyd T5

* finalize T5 DeepFloyd

* corrected name of downloaded script

* RedPajama final working revision

* tiny SD ready for review

* revert tiny SD

* add explanations why we call calibrate/upcast_partially_to_fp32

* fix spelling errors

* update .pyspelling.wordlist.txt

* resolved conflict

* code check fix

* rename rt_info

* resolved conflicts

* reverted redpajama, double-checked T5

* resolve conflicts, fully revert redpajama

* uncomment model_upcast_utils.py download

* add Chinese models in LLM chatbot (#1448)

* add qwen and chatglm2

* fix the CI issues

* Added text to speech part to the interface and inference pipeline

* move chatglm patch out of converter.py (#1463)

* update Chinese README (#1462)

* update README

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

---------

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* update openvino version (#1465)

* update openvino version

* update docker

* fix ov version and links in notebook (#1466)

* solve chatglm model conversion issue (#1467)

* update nncf release (#1468)

* Select configuration for int4 compression (#1469)

* update model usage for compression dolly (#1474)

* update model usage for compression dolly

* Update 240-dolly-2-instruction-following.ipynb

* Fix chatglm2 quantization (#1473)

* Fix chatglm2 quantization

* Update 254-llm-chatbot.ipynb

* fix ChatGLM patching issue (#1475)

update

* Interface changes - added audio inputs

* Added a fix for llm-chatbot int8 weight compression in case fp16 model already exists (#1479)

* Update README.md (#1481)

* Update README.md

Reducing gif for faster load time

* Update README.md

* move model class to a separate file (#1477)

* update the chatglm reshape function

* move model class to a separate file

* move chatglm patch to converter.py

* remove model cache for int8 converter

* Show off AudioLDM2 model (#1464)

* draft pr

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* adapting gpt-2

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* update ov pipeline

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* gpt-2 functional

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* functioning pipeline with gradio

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* add text

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* ready notebook

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* update readme

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* ordering imports

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* rename notebook and add to ignore list

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* flake8 fix

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* spelling fix

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* review fixes

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* remove cell output

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* update model name

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* updated author org name

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* base model folder

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* spelling fix

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* add mo link

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

---------

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* Add notebook for ControlNet + LCM LoRA (#1478)

* Add notebook for ControlNet + LCM LoRA

* add text

* grammar and code style

* fix dependencies install

* install accelerate

* fix step

* Set seed in LCM notebook for quantization (#1485)

* Fix git for film in readme (#1486)

* Update README.md

* Change TensorFlow Hub links to Kaggle models (#1482)

* Change TensorFlow Hub links to Kaggle models

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

* Update links for direct download

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

---------

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

* Update .pyspelling.wordlist.txt

add SVTR to the list

* Connected synthesize function to the app

* Fixed paths for bark

* Update Readme for notebook 406 PaddleOCR-webcam (#1487)

* Update README.md

* Update .pyspelling.wordlist.txt

add SVTR to the list

* Added whisper to the app pipeline

* Applied chat template from tokenizer

* Small changes

* Allowed llama int4 quantization

* Added comments

* fix ci issues (#1490)

* fix get_box None (#1491)

* return red-pajama back working in llmchatbot (#1492)

* Tweak INT4 parameters for pajama model (#1493)

* notebooks improvements (#1496)

* Bump cryptography from 41.0.5 to 41.0.6 in /.docker (#1497)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.5 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.5...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump aiohttp from 3.8.6 to 3.9.0 in /.docker (#1498)

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.6 to 3.9.0.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.8.6...v3.9.0)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix distil whisper install if optimum exists (#1501)

* ignore pascal voc dataset website in links check (#1502)

* Update 263-latent-consistency-models-image-generation.ipynb (#1504)

We should turn on the safety filter by default to avoid any NSFW images being generated.

* apply feedback (#1503)

* add mistral to chatbot notebook (#1505)

* add mistral to chatbot notebook

* Update notebooks/254-llm-chatbot/254-llm-chatbot.ipynb

* Add SDXL turbo notebook (#1499)

* Add SDXL turbo notebook

* Apply suggestions from code review

* Add colab links check (#1507)

* add neural chat (#1506)

* Added quantization to SDXL-Turbo notebook (#1508)

* Added quantization to SDXL-Turbo notebook

* refactoring

* Apply comments

* add notus (#1509)

* Bump jupyter-server from 2.9.1 to 2.11.2 in /.docker (#1515)

Bumps [jupyter-server](https://github.com/jupyter-server/jupyter_server) from 2.9.1 to 2.11.2.
- [Release notes](https://github.com/jupyter-server/jupyter_server/releases)
- [Changelog](https://github.com/jupyter-server/jupyter_server/blob/main/CHANGELOG.md)
- [Commits](jupyter-server/jupyter_server@v2.9.1...v2.11.2)

---
updated-dependencies:
- dependency-name: jupyter-server
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Create test.cpp

* Add files via upload

* Delete recipes/conversational_voice_agent/test.cpp

* [Hackathon] Add notebook for Paint-by-Example (#1029)

* Add notebook for Paint-by-Example

* Add initial notebook template for paint by example
* Add gradio interfaces
* Update gradio output to use OpenVINO inference pipeline
* Update notebook to use Paint-By-Example pretrained model
* Add readme, descriptive sections, and some standards compliance changes

---------
Co-authored-by: Lee <jason.lee@intel.com>
Co-authored-by: Michelle J Nieman <michelle.j.nieman@intel.com>
Co-authored-by: Edmund Leemhuis <edmund.leemhuis@intel.com>
Co-authored-by: Angeline Alfred <angeline.alfred@intel.com>
Signed-off-by: Poonam Gupta <poonam.gupta@intel.com>

* update documentation, fix format, update readme docs

* Fix static analysis findings, fix image size in readme

* rename from 239 -> 246

* add detailed model pipeline flowchart

* fix spelling

* fix spelling

* Create output folder if it does not exist

* Skip paint-by-example for treon

* Update for OpenVINO 2023.1.0

* Fix deprecation messages

* Add openvino version to pip install

* set version for gradio, added selector for device, and other fixes from code review

* rename to 272

* add code to download images and remove them from repo

* convert image in doc to markdown

* including outputs with notebook

* fix mode of files back to 644

* fix mode of one more file back to 644

* Apply suggestions from code review

---------

Co-authored-by: Lee <jason.lee@intel.com>
Co-authored-by: Adrian Boguszewski <adekboguszewski@gmail.com>

* Add files via upload

* Remove openvino-dev from 272-paint-by-example (#1518)

* add tiny llama to chatbot notebook (#1516)

* Added quantization to LCM LoRA and ControlNet notebook (#1513)

* Added quantization to LCM LoRA and ControlNet notebook

* Change demo

* minor fixes

* apply comments

* fix spell

* disable GPU

* minor fix

* Small fixes in the ASR conversion script

* The assistant works for a car dealer now

* Small changes in TTS conversion script

* Fixed bark generation issues

* Encodec Model inclusion

* Readme update

* Readme updates

* Small changes in encodec conversion

* Minor readme changes

* Added encodec model to the pipeline

* Export only decoder model

* Use bark with IPEX instead of OpenVINO

* Adding the conversational agent to the overall readme

* Updated with image for the recipe

* Update recipes/README.md

---------

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>
Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: AnishaUdayakumar <anisha.udayakumar@intel.com>
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Igor Davidyuk <igor.davidyuk@intel.com>
Co-authored-by: Aleksandr Mokrov <aleksandr.mokrov@intel.com>
Co-authored-by: Ilya Trushkin <ilya.trushkin@intel.com>
Co-authored-by: Pavel Esir <pavel.esir@gmail.com>
Co-authored-by: Ethan Yang <ethan.yang@intel.com>
Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>
Co-authored-by: Nikita Savelyev <nikita.savelyev@intel.com>
Co-authored-by: Raymond Lo <raymond.lo@intel.com>
Co-authored-by: Liubov Talamanova <liubov.talamanova@intel.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Alexander Suvorov <alexander.suvorov@intel.com>
Co-authored-by: Edmund Leemhuis <103226580+eleemhui@users.noreply.github.com>
Co-authored-by: Lee <jason.lee@intel.com>