Transcript
I had my entire weekend planned [music] out. I was going to lock in and use the most powerful AI model on the planet, Fable 5, to build this crazy idea I’ve been sitting on. Then Friday at 5:21 p.m., the US government sent Anthropic a letter. And by Friday night, the model was gone, disabled for everyone. No warning, no appeal. And I sat there thinking about how fragile this whole thing actually is. We’ve all been building our businesses, our workflows, our entire creative process on top of models that live on someone else’s servers, controlled by someone else’s terms, one government letter away from disappearing. So this weekend, I’m not building with any frontier [music] models. Not none. And this is the episode I needed to make. By the end of this episode, you’re going to understand what local models are, why they suddenly matter more than they did a week ago, [music] exactly which ones to use, what hardware you need, and a few startup ideas that only exist because intelligence [music] now runs on your desk for free. I think it’s opened up a bunch of money-making opportunities that I’m going to share by the end of this >> [music] >> So, let me paint the picture of what actually happened because the lesson is bigger than just this one model ban. So, Frontier models are incredible. You know, I’ll be the first to say that. Nobody’s arguing that. But they all share the same weakness. You don’t own them. You rent access. And rented access could be revoked at any time by a government, by a policy change, by a pricing change. Like they could just make it so expensive that it’s, you know, you can’t access it. By the company deciding your use case, violating a term you didn’t read. We just watched this happen in real time. The single most powerful model on Earth is gone overnight. And I want to just be clear that I’m not anticloud. I use these cloud models every day and the cloud models are going to be the strongest. You know, they’re going to be better than local models just in terms of like you’re getting the best possible stuff. They’re the smartest tool available. But what it’s taught me is that you do need to own a part of your stack. You need a layer that nobody can take away from you. And the way I think about it is like electricity. Most of the time you’re happy being on the grid, right? It’s cheaper. It’s easier. someone else maintains it. But the people who are truly resilient have a generator on in the garage. You know, a hurricane comes and lights go out, well, they got this generator that that continues going and they can actually use their stuff. Um, local models are basically that generator for you. And I know what a lot of people are going to say in the comments. They’re going to be like, “Well, local models aren’t good at all.” And it’s just not that true anymore. I think the switch probably happened about 6 months ago. Two years ago, running a model on your laptop was literally garbage. Maybe a year ago, too. Um, but today, a model that runs on a gaming GPU or a decent Mac is good enough for about I would say 80% of what most people use things like ChatGPT or cloud for. the gap between free and local [snorts] and expensive cloud uh closed faster than I think a lot of people expected, including myself. So, let’s actually talk about what a local model is. And I want to make it dead simple. You know how I am on this channel and this podcast. I’d like to just dumb it down uh for myself and for you because I don’t want to scare people off. A local model is an AI model that runs entirely on your own computer. You don’t need internet. You don’t need an API key. And you don’t need per token cost. No company is watching what you do. You just download the model file once and from that point on it’s yours. It runs on your machine the same way a video game or a photo editor might run on your machine. And that’s really it. That’s the whole concept. We don’t need to over complicate it. Basically, the intelligence lives on your hardware instead of someone else’s. And you get three main things that you don’t get with cloud models. The first thing you get is privacy. Your data never leaves your machine. Um, and it’s ju this isn’t nice for you just personally. Um, it’s an entire unlock for selling to a bunch of different uh industries that you might want to sell to like healthcare or legal or finance industries that legally cannot send their data to a third party API. And there’s actually a ton of those industries. Um, we’re going to talk about uh more of that when we get into the startup ideas. So, let’s let’s put put a hold on that. The second main point is you get zero marginal cost. So after you’ve got the hardware and of course you do need to spend uh money on hardware and hardware is getting more and more expensive. Um but after you’ve got the hardware uh every query is free uh it’s unlimited and you can run a model 24 hours a day for a month and your bill is just going to be the electricity. Um that does really change the math on an entire category of products and it opens up a lot. The third thing is nobody can turn it off. Uh, the model on your drive works whether or not the company that made it even exists. Uh, whether a government likes it or not doesn’t matter. Uh, whether or not your internet is up. It works on an airplane. Uh, it works in a bunker. It just works. Um, so yes, you get a lot you know, you get some main benefits, but with every everything in life, there’s pros and there’s cons. So, let’s talk about what the trade-offs uh are, cuz I don’t really want to sell you a fantasy. I’m not here to sell you a fancy. I’m here to tell you what are the pros, what are the cons, and how to and what I what I think is interesting about it. The trade-off is that local models are generally not as smart as the absolute frontier models. The biggest open models can match the cloud, but they need serious serious hardware. Um, and there’s, you know, you’ll see people on X and they are doing insane things with local and a lot of the times is they’re they’re spending 5, 10, 15, $20,000 on machines. The ones that run on a normal laptop are a notch below the best cloud models. Um, but the way I’m starting to think about it and reframing it is you don’t need frontier intelligence for most tasks. You need good enough intelligence that’s private, free, and always on. And then you got to match the right model to the right job. And that’s becoming a whole new skill set. And we’re going to get to that. So, um, how do we get good at local models, which is something that I’m spending my weekend trying to figure out and sharing everything in real time. This is really the meat of this, uh, episode. If you really want to get good at this and not just nod along and watching YouTube videos and podcasts, here’s the order I’d learn it in. Uh, the first is start with runtime. Everyone gets this backwards. They go hunting for the perfect model before they can even run one. That’s the wrong order. The first thing you download is the runtime, the program that actually runs models on your machines. There’s two main names to know, Ollama and LM Studio. Ollama is usually the favorite of a lot of my developer friends because it runs from the command line. Uh it’s it’s relatively uh simple um because it’s one command it and then it runs the model. But LM Studio is the one I’d start non-technical people on because it has a real interface. It’s got a model browser. Uh you click and it runs. Um, and there’s no terminal and you know those things are scary. Um, the this is sort of the part that a lot of people over complicate it. Um, just download one of these first, whichever one seems uh to resonate with you more and you’ll have a model running in, you know, 10, 15, 20 minutes. The second thing is you’re going to want to match the model to your hardware. A model’s size is measured in billions of parameters. You’ll see numbers like four uh 4 billion, 12 billion, uh 27 billion, 70 billion. Bigger basically means smarter, but bigger also means more memory to run. The single most useful thing to understand in this entire episode is the rough mapping of model size to hardware. A 4 billion model runs on basically anything. An 8 GB laptop, uh even a lot of phones. A 12 billion model is the sweet spot for a machine with 16 GB of RAM. This is where most people should live. A 27 to 35 billion model needs a really good Mac with 30 GB or more or a dedicated GPU. This is where it starts feeling genuinely capable. Uh, in my experience, a 7 billion and up model needs serious hardware. a maxed out Mac Studio or a dedicated box like an Nvidia uh DGX Sparks Spark with a 128 GB uh unified memory. The DGX Spark is interesting and I’ve talked about it on on this podcast before because it’s purposely built for exactly this 128 GB of memory decides to stay on 24/7. It runs Linux and it’s really becoming uh the default for AI box on your desk for people who are you know serious. I’m not affiliated with Nvidia. Um just what I’m noticing in the industry you run your model on it uh you leave it running and connect it uh you connect to it from from your phone. Um so your desk becomes this almost mini at least the way I see it as a mini uh data center. The third uh third thing to know is the third main thing to know is uh knowing which model for which job. Um there’s obviously a bunch of models and I can’t I don’t have enough time to cover all of them, but I’ll give you the four main ones that you know you need to know about. Qwen 3 and the new 3.6 series. The best all-around choice I think for most people. It’s Alibaba’s open model uh family. It’s it’s quite strong at coding, strong at multilingual. It’s clean commercial license. They’ve got a 27 billion and a 35 billion uh versions. And they it feels like it punches above its weight. It outperforms previous generation models, four times their size. Um if you only learn one, this is probably the one to learn. Um, but that’s that’s one of them. The other one is DeepSeek. You’ve probably heard of Deepseek. Um, this is uh quite good at hard thinking and coding problems. Um, but heads up uh the reasoning models take 10 to 30 seconds to think before they uh before they answer uh before they answer. And that’s normal. Uh if you install DeepS and you’re like, why is it taking so long? That’s just usually uh what I’ve seen it takes about 10 to 30 seconds. Uh the third is Gemma. Um and this is Google’s open model. Um and if I was Google right now, I would be, you know, launching a new version of Gemma right now and just taking advantage of this moment. Um this one runs remarkably small. Uh there’s actually a version that fits in 16 GB of RAM. Uh and that one that’s the one that can fit on your phone. Uh it’s beautiful, clean writing. Um the fact that Google gives this away for free uh is actually crazy. Um and I wouldn’t be surprised if Google double downs on this uh in the future. Then there’s Llama by Meta. It’s it’s uh really become very important in the whole open ecosystem. It’s got a huge community, a ton of fine tunes. It’s got a lot of tutorials that you can go and check out. it runs almost anywhere. Um, so when in doubt, there’s probably a llama for your situation. The fourth main point uh that you should learn around local models is what’s called quantization. Um, this no one really talks about and it’s a really important trick with respect to local models. Um, and quantization is this concept of shrinking a model so it runs on weaker hardware with barely any loss in quality. Uh, the analogy I think of uh, for this is a raw model is like a uncompressed photo. Quantization is like saving a high quality JPEG. It’s a lot smaller and your eye really can tell the difference. When you’re downloading models, you’ll see labels like Q4 or Q5. Quantization is like uh that’s the compression level. Um that’s the quantization compression level. And Q4 roughly has the memory uh a model needs with pretty minimal quality loss. uh and this is how a model that supposedly needs a server ends up running smoothly on your laptop. So understanding this concept is really key. Um and uh you know is is like it’s key because it’s it’s the thing that makes your hardware suddenly do twice as much. Uh the fifth main point is you’re going to want to connect to your agent. Um, so running a model and chatting with it is cool, but the real unlock is pointing an agent at your local model. So you can use something like Hermes to do that. I’ve covered Hermes. I think last week I did an episode on Hermes desktop app. You can go check that out. Hermes is the most used agent in the world right now, I would say. Uh, it’s definitely gaining the most amount of hype and buzz and it’s actually built specifically to run locally and never stop. You point a Hermes profile at your local model and now you have an agent that runs free, runs offline, remembers everything, writes its own skills and you can message it over, you know, your messaging app of choice like Telegram or whatever while the heavy work runs on the box of your desk. So, super cool. Again, I have that episode that I did um last week that I’ll include in the description if people want to watch it and learn more about agent profiles and pointing it uh to local models. Um so that’s those are the key points I would say around what do I need to know about local models? Um that helps you get you know up and running. Um but you know what are you know how do we take it to the next ne next level? How do we separate the pros from the tourist? One is uh the context window is your real constraint locally. So cloud models hand you a giant context window for free. That’s the way to think about it. Local models make you pay for it in memory. So the bigger the context, the more RAM it eats. So keep your sessions tight, super tight, and don’t dump your entire life into one thread or your machine is going to choke and you’re going to be like, “Local models aren’t very good.” You’re going to want to give your local model tools. So a small local model with web search, file access, the ability to run code beats a giant model with none. The capability gap closes fast when you wire up the right tools. So, think about it as the model is the engine and the tools are the wheels. Now, common thing that happens with local models is sometimes it forgets um your tools. I don’t know if other people have noticed this. Um so, I’m still trying to I you know, I’m learning in real time, you know, how to how to get the most of it, how it how it doesn’t forget, but just know that that is something that is a quirk that, you know, as of recording this June 2026, uh that happens. Um sometimes um remember that privacy is the killer feature here. So everything is running offline. Your data is not leaving the machine. Um and just you know I’ll talk about that actually more with the startup ideas and how how you can leverage that. Um, the last thing I’ll say about, you know, just concepts that separate the pros from the tourists, um, it’s actually super helpful to run a small local model versus a frontier cloud model side by side for a week. Um, because that actually helps you build the instinct. I think it’s the fastest way to build the instinct actually. And you’ll be shocked with how often the free local model is good enough. So you’re going to see yourself stop reaching for the expensive option for things a 12 billion, you know, handles fine. And that instinct, knowing what to run where is the skill that I, you know, we’re trying to learn here. This this whole Fable 5 moment of being banned and stuff like that. That is just it’s just a wakeup call for us to learn how to do local local models. And that’s probably why you’re here listening to me talk about it today. So I wanted to give you this is the start I want to give you some startup ideas. I mean after all this is the startup ideas podcast. I’m here not only to clarify how you know how you learn how to use AI and be practical but I also am here for helping you uh get your creative juices flowing around startup ideas that only exist uh you know for a certain reason. And there are some startup ideas that only exist now because local models exist and because a lot of people I mean this is mainstream news. A lot of people are seeing like hey these cloud models could get banned. So there’s going to be a huge amount of demand in my opinion for local models over the next few years. So one startup idea I wanted to give you is ondevice AI for regulated industries. So this is a big one. We kind of talked about it earlier, but healthcare, legal, finance, they have money, they have problems AI can solve, but they legally cannot send their data to a cloud API. So, a product where the model runs entirely on the customer’s device. The data never leaves the building. That opens a market that the cloud-based competitors can’t enter right now. So, that privacy constraint is your uh your remote and you just start selling to these types of people. Uh the second startup idea is you basically you sell it as the data your data never leaves version of existing AI tools. So you know go you know pick any popular cloud AI product notetakers meeting summaries uh document analyzers and then you just build local versions of those products. It’s the same product, but the pitch is basically nothing you give us touches the internet. And you slap that on to the main value proposition of the landing page. Uh you do it for lawyers, you do it for doctors, therapists, and anyone handling sensitive documents. Um that is the sentence that might help close the deal. Third startup idea, the airgapped agent for sensitive operations. So, some businesses can’t be online at all for security reasons, defense contractors, uh, certain financial operations, anyone paranoid about leaks. So, you do an agent setup that runs fully offline on local hardware, um, and they’re going to have, you know, willingness to pay. So it’s not just the startup idea number one is just regul uh regulated industries but startup ideas number three is is really around uh leakages and sensitive operations. Um so you might have not such a sensitive industry but they have a sensitive operation. That’s that niche. Um the fourth idea I have for you is offline AI for places with no internet. So ships, planes, rural clinics, field operations, disaster zones, um you know, useful AI, useful agents that work with zero internet is a product the entire cloud industry simply just can’t serve. Um and then the last idea I’ll give you is resilience as a service. So after this weekend, every serious company is going to be asking, “What happens to our AI workflows if our provider gets cut off?” and you just sell the answer. So it’s basically a fallback layer that kicks in when cloud models disappears. So you’re selling insurance against exactly what happened with the Fable 5 banning. you know, overall this has been I’m still like processing the news and stuff like that, but what I keep coming back to um is this um this weekend uh for me was supposed to be about building with the most powerful model on the planet, but instead it became about something more durable. The lesson isn’t that cloud is bad and local is good. I don’t want that. That’s not the case. The lesson is don’t build your entire life on something that can disappear with a single letter. Own a part of your stack. Have the generator in the garage. Local models are the insurance. And this is the weekend I finally bought the policy. Um, and you know, when you play with uh these local models, you’re going to learn that yes, they’re not perfect. Yes, they’re not, you know, the most powerful model on the planet, but for 60% 70% 80% of routine tasks, they’re actually quite good. And there’s a huge range of those use cases. So, over the next few days, uh, I encourage you to play with these, you know, don’t just watch this or listen to this and nod. download Ollama or LM Studio, pull Qwen 3, run it, uh, point Hermes at it, pick a real task, and force yourself to do it entirely local. And that’s really how this all all the stuff clicks. Um, and once you actually play with it and you get your hands dirty, uh, you’ll understand a little bit more of what I’m saying. And so then next time something gets banned or something gets priced out of you know oblivion um you can still run your business, you can still ship your ideas, you can still uh do things and you know in the best case scenario is you have cloud models doing XYZ and local models doing ABC. Um, if this was interesting to you, you learned a thing or two, um, do me a do yourself a favor. Actually, I was going to say do me a favor, but do a like, a comment, and subscribe. That just means more of this stuff is going to appear in your feed. Um, and also tells me I should, you know, continue doing this and sharing what I’m learning in real time. Um, I hope uh I hope you build something cool. I hope you learned a thing or two. I’m rooting for you. Now, go build something today that nobody could turn off. And I’ll see you in the next one. Take care and have a creative day.