
add zephyr support in chatbot notebook #1447

Merged
3 commits merged into openvinotoolkit:main on Nov 13, 2023

Conversation

eaidova
Collaborator

@eaidova eaidova commented Nov 9, 2023

No description provided.


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@CalebXDonoho

Hello, in running the notebook on HuggingFaceH4/zephyr-7b-beta, I noticed that fp16 and int8 convert; however, int4 runs into an NNCF error as shown attached, claiming that the conversion to int4 cannot happen due to lack of support for bf16. I have been running these tests on an SPR HBM Max 9480, if that helps! Are there additional setup steps I should be taking, or is int4 quantization not possible on this machine? Thanks for your help! nncf_conversion_error

@ryanloney
Contributor

I was able to run on my 10th Gen Core Windows laptop with 32GB of RAM and generate int4 compressed IR, then run on CPU. Trying iGPU next.
image

@MaximProshin
Contributor

Hello, in running the notebook on HuggingFaceH4/zephyr-7b-beta, I noticed that fp16 and int8 convert; however, int4 runs into an NNCF error as shown attached, claiming that the conversion to int4 cannot happen due to lack of support for bf16. I have been running these tests on an SPR HBM Max 9480, if that helps! Are there additional setup steps I should be taking, or is int4 quantization not possible on this machine? Thanks for your help! nncf_conversion_error

I guess these are SPR specifics, where bf16 was enabled in the conversion step. @eaidova, is it possible to avoid it?
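As a sketch of the workaround under discussion (not code from the PR itself): since NNCF weight compression rejected bf16 weights at the time, one option is to upcast bf16 models to fp32 before conversion. `safe_export_dtype` is a hypothetical helper name introduced here for illustration.

```python
import torch


def safe_export_dtype(model_dtype: torch.dtype) -> torch.dtype:
    """Pick a dtype NNCF int4 weight compression can handle.

    Hypothetical helper: bf16 weights are upcast to fp32 before
    conversion; other dtypes pass through unchanged.
    """
    if model_dtype == torch.bfloat16:
        return torch.float32
    return model_dtype


# A bf16 model would be exported as fp32; fp16 stays as-is.
print(safe_export_dtype(torch.bfloat16))  # torch.float32
print(safe_export_dtype(torch.float16))   # torch.float16
```

On an SPR machine where the framework defaults to bf16, calling something like this before export would sidestep the error reported above.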

@eaidova
Collaborator Author

eaidova commented Nov 10, 2023

@CalebXDonoho could you please try replacing the optimum-intel installation with my branch instead of the one provided in the notebook?

%pip install git+https://github.com/eaidova/optimum-intel.git@ea/fp32_dtype

If it helps, then I'll submit a fix to optimum-intel.

@brmarkus

I have seen SPR CPUs (early and exotic variants, with early BSP/BIOS) "not supporting BF16"; lscpu didn't reveal the corresponding instruction sets...

@eaidova
Collaborator Author

eaidova commented Nov 10, 2023

@CalebXDonoho could you please execute the following code in your environment and share the results with me:

import torch
print(torch.get_default_dtype())

It looks like the default dtype on your system is set to torch.bfloat16, which leads to the model being converted with that dtype preserved.
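If the default dtype does turn out to be torch.bfloat16, a possible workaround (my suggestion, not something verified against this notebook) is to reset it to float32 before running the conversion cells:

```python
import torch

# Inspect the current default dtype; on some SPR setups this may
# report torch.bfloat16 rather than the usual torch.float32.
print(torch.get_default_dtype())

# Force float32 so newly created tensors (and hence the exported
# model's weights) use fp32, which NNCF int4 compression supports.
torch.set_default_dtype(torch.float32)
print(torch.get_default_dtype())  # torch.float32
```

This only affects the current process, so it would need to run near the top of the notebook, before model loading and conversion.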

@CalebXDonoho

Hello @eaidova. This is what I see when attempting the previously discussed steps on my SPR HBM, after reinstalling optimum-intel using the steps provided. This should also show the default dtype I am using. Let me know if I need to run further tests.
screenshot_1110

@eaidova eaidova merged commit 353739a into openvinotoolkit:main Nov 13, 2023
14 checks passed
@eaidova eaidova deleted the ea/zephyr_chat branch November 13, 2023 06:19
adrianboguszewski added a commit that referenced this pull request Dec 18, 2023
* Initial commit for a NLP recipe

* Added code to convert and export red pajama model

* Simple chatbot app

* Added support for llama2 models

* Removed support for redpajama

* Add possibility to quantize weights for the chat model

* WIP chatbot app

* Added access token

* Removed access token

* Improved app to behave as a chat

* Simple gradio interface

* Small changes in the app

* Changed virtual assistant to conversational agent

* Fixed llama quantization issues

* Added bark inference utils

* Feedback changes for bark script and requirements

* Updated the documentation for functions and fixed requirements

* model directory and individual model names update

* Removed use_small from unnecessary placeholders

* Requirements updates

* Small changes in tts conversion script

* Fixes in bark utils

* Free resources after conversion, restore use_small parameter

* add zephyr support in chatbot notebook (#1447)

* add zephyr support in chatbot notebook

* update readme

* change int8 compression path

* support whisper-large-v3 (#1449)

* whisper model selection (#1450)

* add model selection

* upd quantization

* update SD pipeline import (#1452)

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* substitute a new method in the original pipeline (#1451)

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* fix api migration issues (#1454)

* fix controlnet conversion for 2023.2 (#1453)

* fix controlnet conversion for 2023.2

* update other notebooks

* update transformers version according to optimum requirements (#1455)

* fix openvino-nightly install (#1456)

* fix lcm notebook running with GPU (#1457)

* Listing all notebooks in one file (#1458)

* align torch specific install (#1459)

* Fix TOC links (#1460)

* Image generation with Segmind Stable Diffusion 1B (SSD-1B) (#1437)

* SSD-B1

* pep8

* spelling

* spelling

* Comparison with SDXL

* Add output

* Standalone notebook footer

* Fixes

* Fixes

* Update notebooks/248-stable-diffusion-xl/README.md

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Table of contents

---------

Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* Add note for video codec (#1461)

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

* calibrate upcast to f32 for pajama, tiny SD and t5 encoder (#1382)

* use calibrate partially upcast to FP32

* fix for pajama

* successfully saved rt_info, gradio works fine for RedPajama

* redpajama: better placement and input for calibrate

* renamed to model_upcast_utils.py, added tiny SD, added proper saving into redpajama

* turn off debug messages: silent=True

* upcast DeepFloyd T5

* finalize T5 DeepFloyd

* corrected name of downloaded script

* RedPajama final working revision

* tiny SD ready for review

* revert tiny SD

* add explanations why we call calibrate/upcast_partially_to_fp32

* fix spelling errors

* update .pyspelling.wordlist.txt

* resolved conflict

* code check fix

* rename rt_info

* resolved conflicts

* reverted redpajama, double-checked T5

* resolve conflicts, fully revert redpajama

* uncomment model_upcast_utils.py download

* add Chinese models in LLM chatbot (#1448)

* add qwen and chatglm2

* fix the CI issues

* Added text to speech part to the interface and inference pipeline

* move chatglm patch out of converter.py (#1463)

* update Chinese README (#1462)

* update README

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

* Update README_cn.md

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>

---------

Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>

* update openvino version (#1465)

* update openvino version

* update docker

* fix ov version and links in notebook (#1466)

* solve chatglm model conversion issue (#1467)

* update nncf release (#1468)

* Select configuration for int4 compression (#1469)

* update model usage for compression dolly (#1474)

* update model usage for compression dolly

* Update 240-dolly-2-instruction-following.ipynb

* Fix chatglm2 quantization (#1473)

* Fix chatglm2 quantization

* Update 254-llm-chatbot.ipynb

* fix ChatGLM patching issue (#1475)

update

* Interface changes - added audio inputs

* Added a fix for llm-chatbot int8 weight compression in case fp16 model already exists (#1479)

* Update README.md (#1481)

* Update README.md

Reducing gif for faster load time

* Update README.md

* move model class to a separate file (#1477)

* update the chatglm reshape function

* move model class to a separate file

* move chatglm patch to converter.py

* remove model cache for int8 converter

* Show off AudioLDM2 model (#1464)

* draft pr

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* adapting gpt-2

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* update ov pipeline

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* gpt-2 functional

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* functioning pipeline with gradio

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* add text

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* ready notebook

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* update readme

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* ordering imports

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* rename notebook and add to ignore list

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* flake8 fix

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* spelling fix

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* review fixes

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* remove cell output

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* update model name

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* updated author org name

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* base model folder

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* spelling fix

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* add mo link

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

---------

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>

* Add notebook for ControlNet + LCM LoRA (#1478)

* Add notebook for ControlNet + LCM LoRA

* add text

* grammar and code style

* fix dependencies install

* install accelerate

* fix step

* Set seed in LCM notebook for quantization (#1485)

* Fix git for film in readme (#1486)

* Update README.md

* Change TensorFlow Hub links to Kaggle models (#1482)

* Change TensorFlow Hub links to Kaggle models

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

* Update links for direct download

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

---------

Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>

* Update .pyspelling.wordlist.txt

add SVTR to the list

* Connected synthesize function to the app

* Fixed paths for bark

* Update Readme for notebook 406 PaddleOCR-webcam (#1487)

* Update README.md

* Update .pyspelling.wordlist.txt

add SVTR to the list

* Added whisper to the app pipeline

* Applied chat template from tokenizer

* Small changes

* Allowed llama int4 quantization

* Added comments

* fix ci issues (#1490)

* fix get_box None (#1491)

* return red-pajama back working in llmchatbot (#1492)

* Tweak INT4 parameters for pajama model (#1493)

* notebooks improvements (#1496)

* Bump cryptography from 41.0.5 to 41.0.6 in /.docker (#1497)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.5 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.5...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump aiohttp from 3.8.6 to 3.9.0 in /.docker (#1498)

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.6 to 3.9.0.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.8.6...v3.9.0)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix distil whisper install if optimum exists (#1501)

* ignore pascal voc dataset website in links check (#1502)

* Update 263-latent-consistency-models-image-generation.ipynb (#1504)

We should turn on the safety filter by default to avoid any NSFW images being generated.

* apply feedback (#1503)

* add mistral to chatbot notebook (#1505)

* add mistral to chatbot notebook

* Update notebooks/254-llm-chatbot/254-llm-chatbot.ipynb

* Add SDXL turbo notebook (#1499)

* Add SDXL turbo notebook

* Apply suggestions from code review

* Add colab links check (#1507)

* add neural chat (#1506)

* Added quantization to SDXL-Turbo notebook (#1508)

* Added quantization to SDXL-Turbo notebook

* refactoring

* Apply comments

* add notus (#1509)

* Bump jupyter-server from 2.9.1 to 2.11.2 in /.docker (#1515)

Bumps [jupyter-server](https://github.com/jupyter-server/jupyter_server) from 2.9.1 to 2.11.2.
- [Release notes](https://github.com/jupyter-server/jupyter_server/releases)
- [Changelog](https://github.com/jupyter-server/jupyter_server/blob/main/CHANGELOG.md)
- [Commits](jupyter-server/jupyter_server@v2.9.1...v2.11.2)

---
updated-dependencies:
- dependency-name: jupyter-server
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Create test.cpp

* Add files via upload

* Delete recipes/conversational_voice_agent/test.cpp

* [Hackathon] Add notebook for Paint-by-Example (#1029)

* Add notebook for Paint-by-Example

* Add initial notebook template for paint by example
* Add gradio interfaces
* Update gradio output to use OpenVINO inference pipeline
* Update notebook to use Paint-By-Example pretrained model
* Add readme, descriptive sections, and some standards compliance changes

---------
Co-authored-by: Lee <jason.lee@intel.com>
Co-authored-by: Michelle J Nieman <michelle.j.nieman@intel.com>
Co-authored-by: Edmund Leemhuis <edmund.leemhuis@intel.com>
Co-authored-by: Angeline Alfred <angeline.alfred@intel.com>
Signed-off-by: Poonam Gupta <poonam.gupta@intel.com>

* update documentation, fix format, update readme docs

* Fix static analysis findings, fix image size in readme

* rename from 239 -> 246

* add detailed model pipeline flowchart

* fix spelling

* fix spelling

* Create output folder if it does not exist

* Skip paint-by-example for treon

* Update for OpenVINO 2023.1.0

* Fix deprecation messages

* Add openvino version to pip install

* set version for gradio, added selector for device, and other fixes from code review

* rename to 272

* add code to download images and remove them from repo

* convert image in doc to markdown

* including outputs with notebook

* fix mode of files back to 644

* fix mode of one more file back to 644

* Apply suggestions from code review

---------

Co-authored-by: Lee <jason.lee@intel.com>
Co-authored-by: Adrian Boguszewski <adekboguszewski@gmail.com>

* Add files via upload

* Remove openvino-dev from 272-paint-by-example (#1518)

* add tiny llama to chatbot notebook (#1516)

* Added quantization to LCM LoRA and ControlNet notebook (#1513)

* Added quantization to LCM LoRA and ControlNet notebook

* Change demo

* minor fixes

* apply comments

* fix spell

* disable GPU

* minor fix

* Small fixes in the ASR conversion script

* The assistant works for a car dealer now

* Small changes in TTS conversion script

* Fixed bark generation issues

* Encodec Model inclusion

* Readme update

* Readme updates

* Small changes in encodec conversion

* Minor readme changes

* Added encodec model to the pipeline

* Export only decoder model

* Use bark with IPEX instead of OpenVINO

* Adding the conversational agent to the overall readme

* Updated with image for the recipe

* Update recipes/README.md

---------

Signed-off-by: igor-davidyuk <igor.davidyuk@intel.com>
Signed-off-by: Ilya Trushkin <ilya.trushkin@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: AnishaUdayakumar <anisha.udayakumar@intel.com>
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: Igor Davidyuk <igor.davidyuk@intel.com>
Co-authored-by: Aleksandr Mokrov <aleksandr.mokrov@intel.com>
Co-authored-by: Ilya Trushkin <ilya.trushkin@intel.com>
Co-authored-by: Pavel Esir <pavel.esir@gmail.com>
Co-authored-by: Ethan Yang <ethan.yang@intel.com>
Co-authored-by: Zhuo Wu <zhuo.wu@intel.com>
Co-authored-by: Nikita Savelyev <nikita.savelyev@intel.com>
Co-authored-by: Raymond Lo <raymond.lo@intel.com>
Co-authored-by: Liubov Talamanova <liubov.talamanova@intel.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Alexander Suvorov <alexander.suvorov@intel.com>
Co-authored-by: Edmund Leemhuis <103226580+eleemhui@users.noreply.github.com>
Co-authored-by: Lee <jason.lee@intel.com>