diff --git a/docs/ai-chat.md b/docs/ai-chat.md
index 7543d39a3b..8c0fbf8d55 100755
--- a/docs/ai-chat.md
+++ b/docs/ai-chat.md
@@ -9,74 +9,70 @@ global:
 ---
 
 Protects against the following threat(s):
 
+- [:material-server-network: Service Providers](basics/common-threats.md#privacy-from-service-providers){ .pg-teal }
 - [:material-account-cash: Surveillance Capitalism](basics/common-threats.md#surveillance-as-a-business-model){ .pg-brown }
 - [:material-close-outline: Censorship](basics/common-threats.md#avoiding-censorship){ .pg-blue-gray }
 
-Since the release of ChatGPT in 2022, interactions with Large Language Models (LLMs) have become increasingly common. LLMs can help us write better, understand unfamiliar subjects, or answer a wide range of questions. Based on a vast amount of data scraped from the web, they can statistically predict the next word.
+Since the release of ChatGPT in 2022, interactions with Large Language Models (LLMs) have become increasingly common. LLMs can help us write better, understand unfamiliar subjects, or answer a wide range of questions. They can statistically predict the next word based on a vast amount of data scraped from the web.
 
-However, to improve the quality of LLMs, developers of AI software often use [Reinforcement Learning from Human Feedback](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback) (RLHF). This entails the possibility of AI companies reading your private AI chats as well as storing them, which introduces a risk of data breaches. Furthermore, there is a real possibility that an LLM will leak your private chat information in future conversations with other users. To solve these problems, you can use trusted and privacy-focused providers or run AI models locally so your data never leaves your device.
+## Privacy Concerns about LLMs
-<details class="admonition note" markdown>
-<summary>Ethical and Privacy Concerns about LLMs</summary>
 
+The data used to train AI models, however, includes a massive amount of _private_ data. Developers of AI software often use [Reinforcement Learning from Human Feedback](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback) (RLHF) to improve the quality of LLMs, which entails the possibility of AI companies reading your private AI chats as well as storing them. This practice also introduces a risk of data breaches. Furthermore, there is a real possibility that an LLM will leak your private chat information in future conversations with other users.
 
-AI models have been trained on massive amounts of public *and* private data. If you are concerned about these practices, you can either refuse to use AI, or use [truly open-source models](https://proton.me/blog/how-to-build-privacy-first-ai) which publicly release and allow you to inspect their training datasets. One such model is [Olmoe](https://allenai.org/blog/olmoe) made by [Allenai](https://allenai.org/open-data).
+If you are concerned about these practices, you can either refuse to use AI, or use [truly open-source models](https://proton.me/blog/how-to-build-privacy-first-ai) which publicly release their training datasets and allow you to inspect them. One such model is [OLMoE](https://allenai.org/blog/olmoe) made by [Ai2](https://allenai.org/open-data).
 
-[Ethical concerns](https://www.thelancet.com/journals/landig/article/PIIS2588-7500(24)00061-X/fulltext) about AI range from their impact on climate to their potential for discrimination.
-
-</details>
 
+Alternatively, you can run AI models locally so that your data never leaves your device. Locally run models offer a more private and secure alternative to cloud-based solutions, as your data is never sent to or stored by a third-party provider. This provides peace of mind and allows you to safely share sensitive information with the model.
 
-For the best experience, a dedicated GPU with sufficient VRAM or a modern system with fast LPDDR5X memory is recommended. Fortunately, it is possible to run smaller models locally even without a high-end computer or dedicated GPU. A computer with at least 8GB of RAM will be sufficient to run smaller models at lower speeds. Below is a table with more precise information :
+## Hardware for Local AI Models
 
-<details class="admonition note" markdown>
-<summary>Hardware Requirements for Local Models</summary>
 
+Local models are also fairly accessible, since smaller models can run on modest hardware. A computer with at least 8GB of RAM is sufficient to run smaller models at lower speeds. Using more powerful hardware, such as a dedicated GPU with sufficient VRAM or a modern system with fast LPDDR5X memory, will offer the best experience.
 
-Here are typical requirements for different model sizes:
 
+LLMs can usually be differentiated by their number of parameters, which can range from 1.3B to 405B. The higher the number of parameters, the more capable the LLM. For example, models below 6.7B parameters are only good for basic tasks like text summaries, while models between 7B and 13B are a great compromise between quality and speed. Models with advanced reasoning capabilities are generally around 70B.
 
-- 7B parameter models: 8GB RAM minimum, 16GB recommended
-- 13B parameter models: 16GB RAM minimum, 32GB recommended
-- 70B parameter models: Dedicated GPU with 24GB+ VRAM recommended
-- Quantized models (4-bit): Can run with roughly half these requirements
 
+For consumer-grade hardware, it is generally recommended to use [quantized models](https://huggingface.co/docs/optimum/en/concept_guides/quantization) for the best balance between model quality and performance. Check out the table below for more precise information about the typical requirements for different sizes of quantized models.
 
-</details>
 
+| Model Size (in Parameters) | Minimum RAM | Minimum Processor |
+|---|---|---|
+| 7B | 8GB | Modern CPU (AVX2 support) |
+| 13B | 16GB | Modern CPU (AVX2 support) |
+| 70B | 72GB | Dedicated GPU with sufficient VRAM |
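 
+As a rough rule of thumb (a back-of-the-envelope sketch assuming roughly 4-bit quantization, not a precise requirement), you can estimate how much memory a model's weights alone will occupy by multiplying the parameter count by the storage size of each parameter:
+
+```bash
+# Weights ≈ parameters × bytes per parameter (4-bit quantization ≈ 0.5 bytes/parameter)
+# For example, a 7B-parameter model quantized to 4 bits:
+echo "7 * 0.5" | bc
+# Output: 3.5 → about 3.5GB for the weights alone; the rest of an 8GB
+# system is needed for the context window and the operating system.
+```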
 
-To run AI locally, you need both an AI client and an AI model.
+To run AI locally, you need both an AI model and an AI client.
 
-## Downloading AI models
+## AI Models
+
+### Find and Choose a Model
 
 There are many permissively licensed **models available to download**. **[Hugging Face](https://huggingface.co/models?library=gguf)** is a platform that lets you browse, research, and download models in common formats like GGUF. Companies that provide good open-weights models include big names like Mistral, Meta, Microsoft, and Google. But there are also many community models and 'fine-tunes' available. For consumer-grade hardware, it is generally recommended to use [quantized models](https://huggingface.co/docs/optimum/en/concept_guides/quantization) for the best balance between model quality and performance.
 
 To help you choose a model that fits your needs, you can look at leaderboards and benchmarks. The most widely used leaderboard is [LM Arena](https://lmarena.ai/), a "Community-driven Evaluation for Best AI chatbots". There is also the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard), which focuses on the performance of open-weights models on common benchmarks like MMLU-PRO. However, there are also specialized benchmarks which measure factors like [emotional intelligence](https://eqbench.com/), ["uncensored general intelligence"](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard), and many [others](https://www.nebuly.com/blog/llm-leaderboards).
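 
+If you prefer the command line, you can also download individual model files with Hugging Face's own CLI. The following is a minimal sketch, assuming Python is already installed; the repository and file names are placeholders, not recommendations:
+
+```bash
+# Install the Hugging Face command-line tool
+pip install -U "huggingface_hub[cli]"
+
+# Download a single GGUF file from a model repository into ./models
+huggingface-cli download <repository>/<model>-GGUF <model>-Q4_K_M.gguf --local-dir ./models
+```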
 
-<details class="admonition note" markdown>
-<summary>Model Security and Verification</summary>
 
+### Model Security
 
-When downloading AI models, especially from Hugging Face, it's important to verify their authenticity. Look for:
+When you have found an AI model of your liking, you should download it in a safe manner. When you use an AI client that maintains its own library of model files (such as [Ollama](#ollama) and [Llamafile](#llamafile)), you should download the model from there. However, if you want to download models not present in that library, or if you use an AI client that doesn't maintain one (such as [Kobold.cpp](#koboldcpp)), you will need to take extra steps to ensure that the AI model you download is safe and legitimate.
 
-- Model cards with clear documentation
-- Verified organization badge
-- Community reviews and usage statistics
-- **When available**, verify the file checksum (a type of anti-tampering fingerprint). On Hugging Face, you can find the hash by:
 
+We recommend downloading model files from Hugging Face, as it provides several features to verify that your download is genuine and safe to use.
 
-    1. Clicking on a model file
-    2. Looking for "Copy SHA256" button below the file
-    3. Comparing this hash with the one you get after downloading (using tools like `sha256sum` on Linux/macOS or `certutil -hashfile file SHA256` on Windows)
 
+To check the authenticity and safety of the model, look for:
 
-Those steps help ensure you're not downloading potentially malicious models.
 
+- Model cards with clear documentation
+- A verified organization badge
+- Community reviews and usage statistics
+- A "Safe" badge next to the model file (Hugging Face only)
+- Matching checksums[^1]
+    - On Hugging Face, you can find the hash by clicking on a model file and looking for the **Copy SHA256** button below it. You should compare this checksum with the one from the model file you downloaded.
 
-</details>
 
+A downloaded model is generally safe if it satisfies all of the above checks.
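 
+For example, after downloading a model file, you can generate its checksum locally and compare it with the hash copied from Hugging Face (a minimal sketch; the file name is illustrative):
+
+```bash
+# Linux/macOS: generate the SHA-256 checksum of the downloaded file
+sha256sum model.gguf
+
+# Windows (Command Prompt): the built-in equivalent
+certutil -hashfile model.gguf SHA256
+```
+
+If the printed hash matches the one shown on the model page, the file is genuine and wasn't tampered with in transit.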
 
 ## AI chat clients
 
-| Feature | [Kobold.cpp](#koboldcpp) | [Ollama](#ollama) | [Llamafile](#llamafile) |
-|---------|------------|---------|-----------|
-| GPU Support | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-check:{ .pg-green } |
-| Image Generation | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } |
-| Speech Recognition | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } |
-| Auto-download Models | :material-close:{ .pg-red } | :material-check:{ .pg-green } | :material-alert-outline:{ .pg-orange } Few models available |
-| Custom Parameters | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-alert-outline:{ .pg-orange } |
-| Multi-platform | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-alert-outline:{ .pg-orange } Size limitations on Windows |
+| Local Client | GPU Support | Image Generation | Speech Recognition | Automatic Model Downloads | Custom Parameters |
+|---|---|---|---|---|---|
+| [Kobold.cpp](#koboldcpp) | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-check:{ .pg-green } |
+| [Ollama](#ollama) | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } | :material-check:{ .pg-green } | :material-close:{ .pg-red } |
+| [Llamafile](#llamafile) | :material-check:{ .pg-green } | :material-close:{ .pg-red } | :material-close:{ .pg-red } | :material-alert-outline:{ .pg-orange } Few models available | :material-alert-outline:{ .pg-orange } |
 
 ### Kobold.cpp
@@ -194,3 +190,5 @@ Our best-case criteria represent what we *would* like to see from the perfect pr
 - Should be easy to download and set up, such as having a one-click install process.
 - Should have a built-in model downloader option.
 - Should be customizable (the user can modify the LLM parameters, such as its system prompt or its temperature).
+
+[^1]: A file checksum is a type of anti-tampering fingerprint. A developer usually provides a checksum in a separate text file that can be downloaded, or on the download page itself. Verifying that the checksum of the file you downloaded matches the one provided by the developer helps ensure that the file is genuine and wasn't tampered with in transit. You can use commands like `sha256sum` on Linux and macOS, or `certutil -hashfile file SHA256` on Windows, to generate the downloaded file's checksum.