MATRIX: Where Your GPT Model Goes to Die

Source: YouTube Video
Creator: Code AI Lab (David Andre)
Status: ✅ Analyzed (transcript captured)

Summary

David Andre makes the case for running uncensored AI models locally — arguing that prolonged use of censored cloud models will “fine-tune you” to their biases. He walks through setting up Super Gemma 4 26B Uncensored GGUF V2 via Ollama and open-sources an auto-research jailbreak loop for any model.

The MATRIX Concept

“If you use the LLM for many years, it will start to fine-tune you. Whatever model you talk to day-to-day, that model will influence you more than you influence that model.”

The “Matrix” is the idea that cloud models (ChatGPT, Claude, Gemini) subtly shape your thinking with their built-in biases — you don’t notice it because it’s gradual, but over time the model trains you, not the other way around.

Legitimate Use Cases for Uncensored Models

Use Case                         | Why Censored Models Fail
Cybersecurity / Malware analysis | Refuses to describe how attacks work
Pen testing / Red teaming        | Can’t advise on exploitation
Political analysis               | Models are heavily left-leaning
Fiction / Creative writing       | Refuses adult, dark, or violent themes
Journalism / OSINT               | Refuses extremist content analysis
Medical / Sexual health          | Guardrails block legitimate questions
Mental health journaling         | Concerns about “harmful content”
Confidential business docs       | Data leaves your machine

How Refusals Actually Work

Refusals are trained into the model weights, not just added through system prompts. Prompt engineering alone rarely gets around them on a cloud service, because a request passes through several layers there, while a local setup has none of them:

  • Cloud: Input filters → Hidden system prompt → Fine-tuned model (RLHF) → Output classifier → Policies
  • Local: Prompt → Model. That’s it. Full control.
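
To make the local path concrete, here is a minimal sketch of sending a prompt straight to a locally served model. It assumes an Ollama server is already running on its default port (11434) and that a model has been pulled. The helper names `build_request` and `ask_local` and the model tag in the usage note are hypothetical, not something shown in the video.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the reply text.

    Nothing sits between the prompt and the model here: no hosted input
    filter, hidden system prompt, or output classifier.
    """
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage would look like `ask_local("your-local-model", "Summarize this document.")`, where the model tag is a placeholder for whatever model is installed locally. The request never leaves the machine, which is also the point behind the “Confidential business docs” use case.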

Liberation Methods

Two main approaches to remove refusals:

  1. Abliteration: surgically find the refusal-causing direction inside the model and remove it from the weights (no retraining needed)
  2. Fine-tuning on uncensored datasets: train on thousands of examples where the model answers freely

The strongest uncensored models combine both: abliterate first, then fine-tune to restore quality.

The Model: Super Gemma 4 26B Uncensored GGUF V2

  • Base: Google’s Gemma 4 26B
  • Creator: Jeong Song (South Korea)
  • Size: ~16GB download, needs ~20GB VRAM
  • Run via: ollama run hf.co/jeong-song/Super-Gemma-4-26B-Uncensored-GGUF-V2
  • Speed: ~200 tok/s on M-series Mac with 128GB RAM, ~40-50 tok/s on 32GB
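
As a rough sanity check on these figures, the sketch below estimates the weight size of a quantized model and the time to generate a reply. The 5 bits per weight is an assumption chosen for illustration (the notes do not state the quantization level), and both function names are hypothetical.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


def seconds_for_reply(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a reply of the given length at a steady speed."""
    return tokens / tokens_per_second


# Assumption: about 5 bits per weight on average for the GGUF quantization.
size = quantized_size_gb(26, 5.0)       # 16.25 GB, close to the ~16GB download
fast = seconds_for_reply(500, 200)      # 2.5 s for a 500-token reply at ~200 tok/s
slow = seconds_for_reply(500, 45)       # about 11 s at ~45 tok/s
```

The gap between the ~16GB download and the ~20GB VRAM requirement is plausibly the context cache and runtime overhead, though the notes do not break it down.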

The Auto-Research Jailbreak Loop

David open-sourced a repo that automates jailbreak discovery for any model:

  • Agent 1 (Reviewer): Tries prompts with hidden “bad stuff”
  • Agent 2 (Judge): Evaluates whether the model answered
  • Runs hundreds/thousands of prompt variations autonomously
  • Works on ChatGPT, Claude, Gemini, Grok — any model
