Imagine if you could use Hermes Agent 100% privately and free on your own computer. In this video, I'm going to show you exactly how to power Hermes Agent 100% locally. So, you can build and use your Hermes operating system from $0 and have everything completely private with no limitations, rate limits, and work anywhere in the world with no internet. And I'm also going to cover a brand new Hermes update that makes this even easier, even if you're a complete beginner. And if you're new, I'm Jack. I built and sold my left tech starter with a gazillion customers. Now I build my own AI companies and I share on this channel the stuff that actually works. So if you haven't already, grab that beautiful coffee and let's jump straight in. So Hermes Agent is the world's number one AI agent assistant right now. And with Alama, we can own our own intelligence. And as somebody who recently scored a perfect 100 on an IQ test, I've been speaking a lot about intelligence recently. Now the idea here is it's a private AI and we're going to own it on your own machine. Now one core concept that you have to understand before we even get started here is that local AI is the direction travel. It is the future. Jensen Huang at Nvidia conference very very recently had this to say that everybody who uses computers today as a tool, every engineer, every creative artist will need an AI supercomputer. By the way, and if you watch the end of this video, you're going to know exactly how to run Hermes agent on your phone from anywhere um completely private and locally hosted. Now, the idea here with local AI is that it runs on your laptop. Your data stays in the room and you never get a monthly bill because it's 100% private. You physically own it. Everything and the data is not going to open AI or anthropic. It's completely yours. Now, if you think about the analogy that Jensen used here is that in a phone in the 1990s, a beautiful decade to be born if you ask me, the whole purpose and concept of the phone was that we would basically just make calls. And I'm told uh with a great degree of confidence that people used to have these kind of bricks on the side of their heads walking around. Whereas today we effectively do everything except make phone calls. It probably counts at least in my case probably 2% of my actual phone use is on calls. And Jensen's philosophy here which I think is really interesting obviously Jensen being the CEO of Nvidia is that today the one thing you don't do with your phone is make phone calls. You just about do everything else. his um direction and philosophy which he believes very strongly and is backing Nvidia's future on is the idea that the same will be true with your computer that we will all have these supercomputers. So understanding exactly how to go local and how that connects with your agent and claude code is going to be an incredibly powerful skill and I'm going to show you all that with no fluff in this video. So here's the thing. The idea is we're going to have a Hermes operating system which is your entire world in one place. Now we can speak to Hermes. We can run our beautiful Hermes operating system that shows our entire memory system uh and covers everything that we possibly may want to use. We can chat with it directly, see our connections, come down here, even look at goals. We can build personas and skills. We can even connect this to GitHub, do beautiful things, see our information, get skills, and even view documents in a beautiful configured way. This is the power of the operating system, which is really cool. Now, the idea of why we want to do a private operating system is going to be really interesting. So the idea here is that we have one home for every app. We can see our usage everywhere. We can schedule and run things. We read documents. It remembers what it learns and it proactively comes back and suggests how you can be better based on the conversations you've had. But crucially, it actually connects everything together. So it isn't just about Hermes. Sometimes we're talking to chat GBT and other times we're talking to Claude code. So being able to see and understand our entire AI world in one location is really really really important. And crucially with private you're not tied in to any vendor. You literally you're basically running everything you want to locally. Now interestingly Hermes have just released a desktop app and I'm going to show you exactly how that actually makes one thing easier in this video. When you should use it, when you shouldn't, and what this thing actually physically means. Now before we build up, we have to understand why we're actually doing this in the first place. So the cheat code essentially is ownership. So the idea here is that your data never leaves the home or if you're a business, all of the data, all the models are running directly on your own computer. That's one of the reasons why I really like running it locally, not on a VPS for several different reasons. I'm not sponsored to sell you some kind of VPS. I tell you what I do. I run it locally. This is what I tell my people. Run it locally. I just think it's a lot better way to do that. The idea here is your data never leaves your home. No internet needed. No company watching. It's all yours. And basically whether you're 16,000 ft underground or you're on a SpaceX rocket ship makes no difference. Okay? Works without the internet. No company gets your data uh to the extent to which you think that analyzing it is is completely your call. But they don't get it anyway. No brain limits ever. It's free forever. There's no gatekeeper and you own it outright, which is fantastic. So the idea here is that we want to stop, so to speak, renting our intelligence. Now, there are a few interesting trade-offs with this. I'm going to get into this video, but the idea here is that you're going to have it all running perfectly local. Okay? Again, it's completely free to do. Um, the top labs when open are usually, typically speaking, we see in markets, if you're behind, you just open source the thing and push it. And a laptop now beats old GPU. Now, just to put this into perspective, is that the because you might be thinking, Jack, what about performance? The best local model today is about one year behind wherever we're currently at. So, for example, the best local model today is as good as the best model that existed in around mid 2025. So that would be Claude Sonnet 4 just to give you some perspective. That's how good they are and how close they are behind current models. Our expectations just change so quickly. So these aren't exactly, you know, caveman basically running around in our laptops. And what we're going to be doing in this video is going to be using alarm to essentially unlock everything for us. So we have the uh metered way of doing AI, which is maybe using chat GBT, OpenAI, uh Cloud, etc. in cloud services. Just fancy way of saying that we're just running it on their service, their infrastructure. But with our powerful, handsome llama over here, he has the keys to unlock Quen, Deepseek, uh, Gemma, Mistral, loads of these kind of interesting open source models that we can do. We can download them once, run them for free, forever, and nothing ever leaves our machine. So, the very first thing I'd love you to do is head over to this beautiful website. Come over and just click on download. This actually sits like an app and you can when you get these things literally just chat to your models there if you want to. But, of course, we're getting connected to Hermes and we can chat to anything. So, we're going to come down and download that for Mac OS. And once we've done that, I need you to open up a terminal. So, that's command spacebar and type in terminal. And the terminal will appear. And then effectively all we're going to do is literally come over here and just copy this information like so. And then when I bring the terminal up, this will just install the latest version of Alama onto our computer. You can see all the code is going to be in the background, which is fantastic. And if this sounds like I'm actually speaking Icelandic right now, you can go ahead and grab this code full course. I'll put a link for it down below. It takes you from foundation setups, building websites, power features, memory systems, Hermes stuff I have never covered on YouTube. It is the best course I've ever created. You unlock all of this and you also get the entire cla code Hermes operating system immediately as well. I'll put a link down below for you so you can grab that if you find that beneficial. Now on the terminal, this is now fully done. What we want to do is open up the app, search for command spacear, and type in. And he will appear when we say his name. Cool. So this is the app. So if you wanted to at this point you could just talk to your model privately and effectively you can literally just click this download button like Gemma fork whatever it is and you'd be ready to go but the first thing that we need to do is understand what is the best model for you to look at and I've looked at alternatives to Alarm and it comes to the free space I just find it easiest to use. Now, if you click on the app, if you're on a MacBook, the top left, you come down to about this Mac. You'll then get some information. All you're going to do is literally screenshot this guy by coming up like so. And we can literally ask Hermes, hey, what is the best local model for us to run for this thing here? And the best way to show you that might be actually using the Hermes app itself. So, if you come over to this website, which is Hermes agent news research, you click on desktop app, come down, and what we can do is download the Mac OS here. What's cool is I haven't done it on this computer so we can go through the entire process together and you can just get an overview of what it is and how this fits into the stack. So you go we click on it and then we double click on Hermes and she does pop up. We're happy for that to happen. Click on open. Install Hermes. Although we've already got it which is fine. Now this Hermes app installation step is completely optional. You can just chat to it in Telegram if you want to. I just want to show you what Hermes have done so you understand it. Come down and click launch Hermes. This rocket here icon by the way is classic when we when you vibe code stuff. I've noticed it kind of like always pops up. Now guys, when I downloaded it, it could not launch the desktop app. If that happens, it is because your Hermes is not up to date. So, all we're going to do is come down and do Hermes update inside terminal. Um, man, you feel like you're in the matrix sometimes doing in the terminal. It's basically just a way to talk to your computer. Dead straightforward. Come down and we want to restore those changes, which is cool. And send that one off. Beautiful. So, now it's complete. If I come off this and just rerun one more time, that should work fantastically for us. Come down and install. Beautiful. And then we have the Hermes dashboard. So, pretty much we can do is create a new session here. Click on new session and then I'm l going to come down and just say hey and you'll see can we start a conversation and it's just exactly the same thing as talking to it on your um you know your actual telegram. The only difference here it realistically is think of the Hermes desktop app and the reason I've done a dedicated video on it is because for me basically at the moment in its current configuration it's just a um less intimidating way of using the terminal which I think is a wonderful thing and the guys are crushing it. So huge job to those guys. Now, what I'm going to do is give it a message. Hey, there based on the specifications of my MacBook. What do you think would be the best performance model that I could grab from Oama to download and run a model on my computer? Okay. And then going to send in the image. And then we send that one off and let Hermes work its magic. Cool thing just if you are going to be using this desktop app is you can come down to here and you can actually pick the models that you've installed. And I think what is really cool is you can kind of pick like a minimal, low, medium, high max, which I think is quite a nice one then. So top recommendations, Quen 33 32B or Quen 532B is best overall performance right now. Excellent speed and quality. So let's try that one. Let's say we want something speed. What and all we're going to do is literally copy this like so. Okay. Then basically to run it, you're going to basically open up terminal one more time. So come up to terminal like so. And you're going to give it two commands. And basically you can say, hey, what commands do I need to run to install insert the blank? So the first one for us is going to be alarm basically quan 3, which is going to be fantastic. And it's going to pull down the manifest for us. And you can see the whole thing is literally downloading. If you want to check out on the website, by the way, and come check out models, you can do that as well. Have a nice little browse. And it's really cool. I love the competition in the open- source market. And again, remember, it's as good as models a year a year behind. So, a year is not that much time when you really think about it. It's just incredible to see how much they're actually developing. So, this is downloading this puppy here. And take about 3 minutes, then we're ready to rock and roll. And just while that's downloading, one cool thing you can do. So if I come down here and I say, "Hey," and I begin a conversation, I can if I want to what we call branch something out into a new chat. So if I'm building on something in particular, I can come down here, click on this, click on branch and new chat. Okay? And then now it forks, which means like let's say that you're working on a project like I don't know to grow your LinkedIn or something and you think actually I really want to do two things now. One is I want to build strategy and the second thing I want to do is do some DM outreach. Fork the chat, same contact, two different windows. I think kind basically the strategy that they're building here is essentially to kind of be the use any model platform. That's kind of a direction traveler going in and it's cool they're building out the artifacts and you can save it. One of the reasons why I kind of built out our operating system this way and I do think the future is 100% configurable is because you can actually just like add different things like I can click on this and see it. I can actually come down and actually see the images as I'm actually pulling them out as I go. Right? Like this is an overview it did for me and you can edit it. So really interesting and as you see here this is now completely finished. So we need to give it a second instruction. Now that code basically would be to chat to it. So if I ever want to chat to it in the terminal I run this code here but in reality guys who's chatting to it in the terminal unless you've got a huge Windows laptop and you're just like completely nerding out. In reality we can come over to Alarm and you'll see in Alarma that this mysterious Quen model has mystically appeared. So if I come down here I'll be able to select this here which is Quen 3 30B just means billions of parameters which is fantastic. And I'm going to go ahead and just shut this one down so we can chat to it in a cooler interface that can say, "Hey there. Um, give me three interesting facts about color theory and design." And immediately we're getting all the thinking, which is fantastic. And by the way, if you're looking for what I'm doing for speech dictation, just tell what it is. I'm using something called glider.com, which is company that we founded, fastest in the world, super private. So if you want to check it out, I'll put a link down below with some goodies so you can have a little play around with that. We freaking love it. And look at this. This is completely guys, how freaking how fast was that? Number one. And how cool was that? Again, we went for a bit of a faster one, but now anything that we we talked to Quen about is running on your computer. Think about that. It's literally on your I have never had so much fun than running something locally, which is sick. But then it leads on to the next question, which is how do we get it from the computer into Hermes agent? And so Hermes requires a local model to have 64,000 tokens in its context window. The one we just downloaded doesn't. So what we effectively then want to ask the model effectively is what can I run on my computer that has enough context. I asked Claude this exact question and essentially it said you want to grab the Quen 3 coder 30B. Now the great thing is the thing we just downloaded we can use on a computer for anything for tasks that don't require more than 25 to 30,000 words. But due to the nature of the way that the Hermes agent works and its context on its memory, it needs that 64k. So we're going to download it and then we can connect it directly to Hermes agent. And now that's complete. We can see the Quen 3 kod 64K has arrived and we can chat to it within you know alarm and also it's here right here with the Hermes agent and you can see in the bottom right corner Quen 3 coda 64K so now we can literally chat to it and do various different things with Hermes completely locally and private on your computer and this then raises a really important question for us to understand here which is essentially how good is local realistically like how good is it actually well if you think about that we're one year behind these are the benchmarks marks the extent to which you back and believe these 88.6 for basically called Opus 4.8 and again you can make the argument that optimizing benchmarks I get that quen which we're running 74 so is it the absolute premier model no but you are trading off on privacy performance and price and bear in mind given the fact we're 12 only 12 months behind the best models I can tell you in the future within one year's time we will have a model like cloud opus 4.8 Okay, that you can run completely locally on your computer. Now, obviously, if you have some of the bigger, more powerful models, your computer might go a little bit slower. It might take two business days to go from here to here just to bring your mouse cursor across. So, it's not for work that we want a snappy answer, especially if you're going for a big model, but it absolutely has a really important place in our ecosystem. And if you take for example if you imagine you split all the work you do say for example this laptop right with my beautiful spring spangle on it represents 100% of the work you do with Hermes agent and you chop that up into percentages there'll be some things where like I actually want a private model that doesn't fall into the hands of any company that I can talk to and it's amazing for that sort of stuff and it's only getting better. So transparently it's free forever $0 for token. It is though only as fast as your machine. Like I say, like if it's really slow, it's going to take ages to move across. Now, it's total privacy. Data never leaves your computer. That's it. You could be 16 ft underground. You could be high in the sky. You could be on Mars. It doesn't matter. And I'll tell you, I was flying from Dubai, a beautiful place where I live right now to LA. And I remember like the internet at one point wasn't working or like I hadn't set up yet. And I was just using on my computer. And it felt really freaking cool. It's like so fun to be able to ask these questions locally on your laptop without the need for anything else. It's really freaking sick. I absolutely love it. Now, when it comes to peak performance, the Frontier models are still winning. They're still the best ones. They do the hardest jobs and they're great for those kinds of things. So, the philosophy that I'm progressing and I encourage you to think about because I'm not so I'm always going to keep it real with you guys. I'm never going to say, "Oh, we're always private. We're always local." No. Like, we bring in the best thing for the job. And if something's no longer the best thing, I'll tell you, I didn't. It's no longer the thing about something else. We, you know, we're not like ideological with this stuff. We just follow what works. That's that's the whole ethos of the channel. Stuff that works and is fun and just you can build anything. Now, the idea with this toggling your privacy thing is we have vault mode. Okay. So, things like maybe private data, health stuff, client information. We're going to go vault mode with that. And the cool thing is we can actually dynamically get Hermes agent to bring that in. I can say, "Hey, um, just run up my local, my private, send this to the private model." Okay, go do this over here if you want to. Very freaking cool. Then we have connected mode, which effectively is performance mode, right? Give a little bit more detail on it. When would I use private vault mode? Well, client data, finances, health notes, uh, your code base in proprietary IP. Interesting, right? Like maybe we're building a OpenAI 2.0 and we can't have someone knowing about it. That would be a private thing. It's offline. You're on a plane. You're off grid. Um, doesn't matter. You can do it. like my editor joke sometimes like the electricity went out and he had a private model. He could still like think about things which is really cool. 24/7 background agents all day. They can be running for you 24/7 at absolutely $0 which is code for good news. And then we've got the cloud. So when we want the best answer um quick one from your phone, fresh web info private search pipeline when raw quality beats privacy. And by the way, don't feel bad at all if you download a model and it's just makes your computer too slow. That's cool cuz you need a little bit of headroom above what you download. So, it's fine to download, delete, find models that you think are good. Have This is supposed to be fun. Like, this isn't supposed to be I got to get this. It's a fun thing. It's It's good. It's like has a lot of utility and it's also freaking fun. So, have a bit of time with it. Now, the idea here, you're going to run your business. Now, we are in a year's time going to have a model like Opus 4.8 that's going to run completely on your computer. We just just based on where we're at compared to local private. So, learning the skills that you learn right now in this video is going to put you so freaking far ahead, it's unfreaking believable. And you'll have these in offices, these private company brains running with client data. Everything is kind of, you know, basically boxed off and the cloud is outside. So we had this move, everyone going to the cloud. Now the cloud is old. Now we're going local. Local is the future. It's a big big big trend. Expect to see this blow up. And you've learned the schools exactly how to do that. The idea is that client data is always going to stay ours. One private agent for the entire team and it works in a regulated environment. In glider for example, we are going through sock 2 compliance, GDPR compliance, um ISO 27,000, all that sort of stuff because compliance is freaking absolutely critical and super duper important. I can tell you now when we're building these things out, having these local models does make a significant difference and it's great and really important for regulated work. The idea being we can own our intelligence. But it does bring us on to one interesting question and that's that going private is one thing, but if you don't have an operating system, you're not unlocking the full capabilities of this incredible technology. So, the next thing I'm going to do is set that up by watching this video right