Definitive Manual

The AI Bible: Part 1

Training the Sovereign Brain: Fine-Tuning Local LLMs on Consumer Hardware.

The Difference Between Running and Knowing

Running a local LLM (inference) is like reading a book. Fine-tuning an LLM is like writing one. Most AI guides stop at "How to chat with Llama 3." Tebian goes further. We provide the tools to teach Llama 3 your code style, your documentation, and your way of thinking.

This treatise explains the mathematics of QLoRA (Quantized Low-Rank Adaptation), a technique that lets you fine-tune models in the tens of billions of parameters on a single GPU; 7B-13B models fit comfortably on a consumer card like an RTX 3090 or 4090. This is the frontier of digital sovereignty.

1. The Mathematics of QLoRA

Training a full model requires updating billions of weights, which demands enormous VRAM: even a 7B model can exceed 100GB once gradients and optimizer states are counted. QLoRA solves this by freezing the main model in 4-bit quantized form and training only tiny low-rank "Adapter" matrices injected alongside its weights.
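To make the scale difference concrete, here is a back-of-the-envelope sketch in pure Python. The dimensions approximate a Llama-class 7B model and the choice of targeted matrices is an assumption for illustration, not a Tebian default:

```python
# Back-of-the-envelope: trainable LoRA adapter size vs. the full model.
# Dimensions below approximate a 7B Llama-class model (assumption).

HIDDEN = 4096   # model hidden size
LAYERS = 32     # transformer layers
RANK = 16       # LoRA rank r

# A LoRA adapter adds two low-rank matrices, A (r x d) and B (d x r),
# per targeted weight matrix. Assume we target the 4 attention
# projections (q, k, v, o) in every layer.
targeted_matrices = 4 * LAYERS
adapter_params = targeted_matrices * (RANK * HIDDEN + HIDDEN * RANK)

full_params = 7_000_000_000  # the frozen 7B base model

print(f"Trainable adapter params: {adapter_params:,}")
print(f"Fraction of full model:   {adapter_params / full_params:.4%}")
```

With these assumptions the adapter is under 17 million parameters, a fraction of a percent of the frozen base, which is why its weights and optimizer state fit in a few hundred megabytes.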

The Memory Equation

With QLoRA, the memory requirement is drastically reduced:

  • Base Model (Frozen): 7B parameters @ 4-bit ≈ 3.5GB of weights, ~5GB VRAM with overhead.
  • Adapter (Trainable): ~64MB of LoRA weights = ~200MB VRAM including optimizer state.
  • Gradients, optimizer states, and activations: ~2GB VRAM.

Total VRAM: ~8GB. This means you can fine-tune a state-of-the-art model on a standard gaming laptop running Tebian. We provide the Axolotl configuration scripts to automate this process.
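A minimal QLoRA configuration in Axolotl's YAML format might look like the following. The model name, dataset path, and hyperparameters are illustrative examples, not Tebian defaults:

```yaml
# Illustrative QLoRA config in Axolotl's YAML format (values are examples)
base_model: meta-llama/Llama-2-7b-hf
load_in_4bit: true
adapter: qlora

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj

datasets:
  - path: ./data/my_dataset.jsonl
    type: alpaca

micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 0.0002
```

The small micro-batch with gradient accumulation is what keeps the activation memory inside the ~2GB budget above.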

2. Dataset Curation: Garbage In, Garbage Out

The secret to a good AI isn't the model; it's the data. Tebian includes tools to convert your existing digital life into a training dataset.

  • Git Scraper: Turn your GitHub repos into `instruction/response` pairs. (e.g., "Write a function to connect to Redis" -> [Your Code]).
  • Obsidian/Markdown Parser: Turn your personal notes into a knowledge graph.
  • Chat Log Cleaner: Sanitize your Signal/Matrix logs to teach the AI your "voice."

We use Apache Arrow format for high-performance data loading, ensuring your GPU isn't waiting on your CPU during training.
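The kind of `instruction/response` pairs the Git scraper produces can be sketched with nothing but the standard library. The field names below are a common convention, not a fixed Tebian schema, and a JSONL intermediate like this converts cleanly to Arrow afterwards:

```python
import json

# Hypothetical scraped pairs: (instruction, your code). In practice a
# scraper would mine these from commit history and docstrings.
pairs = [
    ("Write a function to connect to Redis",
     "def connect(host='localhost', port=6379):\n    return redis.Redis(host, port)"),
    ("Write a function that reverses a string",
     "def reverse(s):\n    return s[::-1]"),
]

def to_jsonl(pairs):
    """Serialize (instruction, response) pairs as JSON Lines."""
    return "\n".join(
        json.dumps({"instruction": inst, "response": resp})
        for inst, resp in pairs
    )

jsonl = to_jsonl(pairs)
print(jsonl)
```

One record per line keeps the format streamable, so a dataset larger than RAM can still be tokenized in a single pass.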

3. The Training Loop (Unsloth)

Tebian optimizes the training loop using Unsloth, a library that reimplements key PyTorch operations as hand-written Triton kernels. This makes fine-tuning roughly 2x faster and cuts memory use by about half compared with standard HuggingFace training scripts.

The Sovereign Workflow

  1. Prepare: Run `tebian-train prepare` to tokenize your dataset.
  2. Train: Run `tebian-train start`. Watch the loss curve in real time via a local TensorBoard instance.
  3. Merge: Once trained, merge the adapter back into the base model or keep it separate for runtime loading in Ollama.
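The merge in step 3 is, at its core, simple matrix arithmetic: the merged weight is W' = W + (alpha/r) * B * A. A toy pure-Python sketch (the 2x2 numbers are purely illustrative; real tools such as peft's `merge_and_unload` apply this per weight matrix):

```python
# Toy illustration of LoRA merging: W' = W + (alpha / r) * (B @ A).
# Matrices are lists of rows; the values are purely illustrative.

def matmul(X, Y):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def merge(W, A, B, alpha, r):
    """Fold the low-rank update B @ A into the frozen weight W."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2 identity)
A = [[1.0, 2.0]]               # LoRA A: r x d_in, with r = 1
B = [[0.5], [0.25]]            # LoRA B: d_out x r
merged = merge(W, A, B, alpha=2, r=1)
print(merged)
```

Keeping the adapter separate instead of merging trades a tiny runtime cost for the ability to hot-swap adapters over the same frozen base.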

4. The Privacy Implication

Why fine-tune locally? Because if you upload your company's code or your medical records to OpenAI for "fine-tuning," you have lost control of that data. It is now part of their ecosystem.

When you train locally on Tebian, the weights are yours. The data is yours. The resulting intelligence is a permanent asset that lives on your hard drive. You can copy it to a USB stick, put it in a safe, or deploy it to your air-gapped server. It is Private Property.

Conclusion: Building the Exocortex

The future belongs to those who own their intelligence. By fine-tuning local models, you are building an "Exocortex"—an external extension of your own mind that knows what you know but thinks at the speed of silicon. Tebian is the foundry for this new organ.