Skip to content

旨在利用大模型对多种输入(文本,语言,图像)进行识别并进行操作,仍在起步阶段。Aims to recognise and manipulate multiple inputs (text, speech, images) using large models, still in its start part.

License

Notifications You must be signed in to change notification settings

Menghuan1918/Smartinput

Repository files navigation


Smartinput

License Releases Commit PR


We're rolling out the macOS app to Plus users starting today, and we will make it more broadly available in the coming weeks. We also plan to launch a Windows version later this year.

Here's a paragraph from GPT-4o's introductory page about the desktop client, without Linux anything of course... So I decided to try to make one myself! And make it have better support for more models. Its intended to run on native models, but is also compatible with the online API.

🗺️ ENGLISH | 简体中文

Note

Unless otherwise noted, all videos or images shown are on 3060M mobile graphics cards running llama3-8B on ollama, and the videos are not accelerated or multiplied!

What's new

  • New showtext window, now able to stream output/edit final text(show in gif here:)

    New_window

  • Windows support(⚠️ Note: Windows currently only supports listening to the clipboard)

    Win

  • Listening Mode Selection

    Select

  • Processing mode selection:Direct processing/pop-up secondary confirmation

    Go

Functions

Sit-click window to drag and drop, middle-click window to copy content, right-click window to hide, tray menu for mode switching.

Currently only global underline word fetching and translation or explanation is implemented, see the video:

2024-05-17.23-56-30.mp4

tary Traslate Coding!

Models Supported

Multiple models are supported! As long as the model's API is compatible with the OpenAI format it will work, just change APIKEY, llm_model, endpoint in the config file. Including but not limited to (the bolded ones are verified, although the others are theoretically fine):

  • Ollama
  • Groq
  • Deepseek
  • Openai
  • Yi Models
  • Zhipu
  • Moonshot

Of course, those accessed using one-api are fully supported.

Configuration

By default the configuration file is saved in $HOME/.config/Smartinput/config.

  • APIKEY: secret key, empty by default.
  • llm_model: model name, default
  • max_tokens: the maximum tokens for a single reply from the model, default 4000
  • temperature: temperature of the model, default 0.3
  • endpoint: model request
  • proxies: not implemented yet, just put it here.
  • timeout: timeout of the request, default is 60 seconds.
  • max_retry: maximum retries, default is 3.
  • font: the font used, default is DejaVu Sans
  • font_size: font size, default is 12.
  • lang: the detected language, it will be recognised automatically when you start it for the first time. You can also change it after.The software interface and answer target language will follow this setting.

Use

Important

Since xclip is used, it does not support reading selected text in wayland windows! Also please install xclip before using it:

Ubuntu/Debian: sudo apt install xclip

Arch/Manjaro:sudo pacman -S xclip.

Download the packaged binary from releases.The configuration file is automatically stored in $HOME/.config/Smartinput/config after the first run

Linux

Unzip and run Refresh_Desktopfile.sh, it will be automatically installed in the system desktop

Windows

Unzip and run main.exe

Or you can clone the source code to use it, please refer to:

conda create -n smart_input python=3.12
conda activate smart_input
pip install -r requirements.txt
python main.py

TODO

  • Improve smoothness
  • Allow customised prompts
  • Shortcut Binding
  • Wayland support
  • Completing the further Q&A interface
  • Reading files using RAGs

Reference and Learning

The code references many designs from other great projects that have inspired me as well as given me ideas, thanks to the selfless developers! In no particular order:

https://github.com/bianjp/popup-dict

https://github.com/binary-husky/gpt_academic

About

旨在利用大模型对多种输入(文本,语言,图像)进行识别并进行操作,仍在起步阶段。Aims to recognise and manipulate multiple inputs (text, speech, images) using large models, still in its start part.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published