The Local AI Manual
The Strategy
Most AI tutorials focus on API keys and monthly subscriptions. Tebian's "AI Mode" focuses on your GPU. This guide explains how to run Large Language Models (LLMs) like Llama 3 and Mistral locally on your hardware using Ollama. No cloud. No telemetry. No censorship.
We use a runner built on the C/C++ llama.cpp engine that talks directly to your CUDA (NVIDIA) or ROCm (AMD) cores, ensuring maximum performance for your private brain.
1. The Ollama Engine
Ollama is a lightweight runner for LLMs, written in Go around the C/C++ llama.cpp engine. It handles the quantization and memory management for your models, allowing them to fit into your VRAM.
- Hardware Acceleration: Auto-detects NVIDIA/AMD GPUs for near-native inference speed.
- Quantization: Reduces model size (e.g., 8GB down to 4GB) with minimal loss in accuracy.
- REST API: Allows other apps (like our t-ask) to talk to the model.
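The REST API is the glue here: Ollama listens on `localhost:11434`, and `/api/generate` is its documented text-completion endpoint. Below is a minimal sketch using only Python's standard library; it assumes the Ollama server is running and the model (here `llama3`) has already been pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return its reply.
    Requires a running server and a pulled model."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example request body (no server needed to build it):
payload = build_payload("llama3", "Why is the sky blue?")
```

With `"stream": False`, the server returns one JSON object whose `response` field holds the full completion; omit it and you get a stream of partial chunks instead.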
2. The `t-ask` CLI Assistant
Tebian includes t-ask, a Go-based (and soon Rust-based) CLI tool that connects your terminal to your local AI. You can summarize files, write code, and get questions answered without leaving your shell.
- Piping Support: `cat logs.txt | t-ask "Find the error"`.
- System Prompts: Pre-configured "Developer," "Writer," and "Admin" personas.
- Context Aware: Remembers your previous questions for a seamless chat experience.
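The piping pattern above is simple to reason about: read whatever arrives on stdin, prepend a persona system prompt and the user's question, and hand the combined text to the local model. The sketch below is not t-ask's actual source; it is a hypothetical illustration of that pattern, and the persona wordings are assumptions.

```python
import sys

# Hypothetical persona prompts, in the spirit of t-ask's "Developer",
# "Writer", and "Admin" presets (exact wording is an assumption).
PERSONAS = {
    "Developer": "You are a senior developer. Be precise and terse.",
    "Writer": "You are a technical writer. Favour clear prose.",
    "Admin": "You are a Linux sysadmin. Prefer safe, standard commands.",
}

def compose_prompt(question: str, piped_input: str,
                   persona: str = "Developer") -> str:
    """Combine a persona, the user's question, and any piped stdin
    into a single prompt for the local model."""
    parts = [PERSONAS[persona], question]
    if piped_input:
        parts.append("Input:\n" + piped_input)
    return "\n\n".join(parts)

# In a real tool you would read piped data with sys.stdin.read()
# when stdin is not a terminal; here we use a fixed demo string.
prompt = compose_prompt("Find the error", "ERROR: disk full")
```

The composed prompt would then go to the Ollama REST API; keeping a list of previous question/answer pairs and appending it the same way is one straightforward way to get the context-aware behaviour described above.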
3. Choosing Your Model
Tebian's "AI Menu" provides one-click downloads for the world's most capable local models.
- Llama 3 (Meta): The current king of open-weight models. Great for general tasks.
- Mistral: Highly efficient and fast. Perfect for mobile or low-power machines.
- CodeLlama: Specialized for programming and debugging.
4. Memory Management (VRAM)
Local AI is memory-intensive. To get the best speed, your model should fit entirely into your GPU's **VRAM**. Tebian's setup script helps you pick the right model size for your hardware.
- 4GB VRAM: Use 3B or smaller models.
- 8GB VRAM: Use 7B or 8B models (Llama 3).
- 12GB+ VRAM: Use 13B models or run multiple small models at once.
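The sizing guidance above follows from a common rule of thumb: a quantized model's weights need roughly parameters × bits-per-weight ÷ 8 bytes of VRAM, with extra headroom for the KV cache and runtime. A minimal sketch of that arithmetic (weights only; the headroom is deliberately left to the reader):

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough VRAM needed for a quantized model's weights, in GB.
    Ignores KV-cache and runtime overhead, so leave headroom."""
    return params_billion * bits_per_weight / 8

# A 4-bit 8B model needs about 4 GB just for weights, which is why
# 7B/8B models are the sweet spot for an 8GB card.
for params in (3, 8, 13):
    print(f"{params}B @ 4-bit: ~{vram_estimate_gb(params):.1f} GB")
```

Running the loop gives ~1.5 GB, ~4.0 GB, and ~6.5 GB for 3B, 8B, and 13B models, consistent with the 4GB/8GB/12GB+ tiers listed above once cache overhead is added.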
Why Local AI on Tebian?
By running AI as a system utility on a stable Debian base, you turn your computer into a true Intelligent Workstation. You aren't just a user of AI; you are the host. It's faster, more private, and completely free. One ISO. One menu. One Private Brain.