How I Built T3 Chat in 5 Days
- A breakdown of the new AI chat app, T3 Chat, and its rapid development.
- Details on how the app was built in five days.
- Insights into the development process, including challenges faced and tools used.
- A mention of the app's sponsor, CodeRabbit, which simplifies code reviews.
- A call to action for users to provide feedback on T3 Chat's performance.
In case you haven't seen it yet, I just put out a new app called T3 Chat and I'm really proud of it. It's the fastest AI chat app I've ever used and, as far as I know, the fastest that currently exists. If you don't believe me, go try it, or watch my other videos about it. It flies.
We are getting a lot of questions about how I built it, how it's so fast, and most importantly, how the hell did I do this in five days? These are all great questions and not all of these questions have great answers. But I want to do my best to try and clue you guys in on what it took to build something like this as quickly as Mark and I were capable of.
Think of this more like a devlog-type video in retrospect, where I'm gonna go through each day, what I did, and how the process led to building an app that we're actually proud of and were able to hit a crazy deadline on.
Before we can do that, we need to hear a quick word from today's sponsor.
If you're anything like me, you're probably pretty tired of these AI tools that claim they can replace your job. They're never any good. The ones that are good are the ones that complement your job: they take the tedious things, make them less tedious, and give you information you might not have had otherwise. Things like code review.
And that's why I'm super hyped about today's sponsor: CodeRabbit. They make code review way easier by doing a first pass on your PRs and leaving a bunch of useful feedback: summarizing, drawing diagrams, and so much more.
This is a real pull request where we're no longer allowing people to upload EXE files without paying. Long story. Go check out my pirate software video if you want to know more about that.
But here's what CodeRabbit did. It summarized the pull request, giving a bunch of useful info, saying that it's introducing significant enhancements to file upload, validation, and error handling across multiple files in the ingest infrastructure.
Here, it's summarizing all the individual files and what they do. But where it gets real fun is once it starts reviewing the code directly. So here's a comment where it called out that we were returning a partial error and we should give a full error.
Here's somewhere it caught something that would not be a great experience for users: we were telling them in bytes how big the file should be and how big it actually was. Nobody knows how to read bytes; we should be giving this in megabytes and gigabytes. Since they called it out, we were able to change it before anyone on the team even had to touch the PR. Super handy.
When it has changes simple enough to propose a fix for, it appears inline, and you can one-click add it to your pull request. It's free to get started, it's fully free for open source, and if you want a full month of the Pro plan for free, use my code THEO1M. Check them out today at soydev.link/coderabbit.
Before we can get into T3 Chat, we should start with where I got started, which was DeepSeek. DeepSeek had just put out a new open-source model called DeepSeek V3, and I was blown away by what you could do with it.
It was really fast, really cheap, and comparable quality to what you'd expect from something like Claude. I played with it, and I was really impressed. But the chat app was awful. It was so annoying to navigate; my experience using it was garbage.
I wanted to take full advantage of this model, and I'd also been thinking about this for a while, because I've been frustrated with ChatGPT's and Claude's web applications for as long as I've been using them. Over the last six or so months, I've been using them more, and getting more and more frustrated. So I wanted to play with this model and have a better UI to do it in.
So, I went and tried a couple of the open-source starter kits for doing an AI chat, and quickly realized they were all garbage. No offense to the folks who made them; it's really hard to do these things well, and they were all built on the technical assumptions that existed when most of these tools were created. But I wanted to do something fundamentally different.
I've been dodging local-first for a while because, for most of what we build, it doesn't make sense. An app like UploadThing gets nothing out of being local-first. An AI chat app, though, actually benefits a lot from it, so it's disappointing to not see anyone take advantage of that.
So I decided to start scaffolding. I started with v0. I bet we can even find the point v0 got me to. Yeah, as you can see, it's pretty far from where we ended up; we have redone all of this since, but it gave us a rough starting point.
Using the Vercel AI SDK, I went over all my limits on v0 and had a UI that kind of worked. I was able to get it running in Next on my machine, got all the parts plugged together, and had it streaming.
I immediately had some things I wanted that Next wasn't going to help much with, though. Specifically, I wanted all navigation to happen on the client. As such, I ended up spending most of the day on the routing layer, and you'll see something interesting here.
This is the only page in the app on day one, because I moved the whole of the routing out of Next and over to React Router, with a catch-all route that handled every URL you went to. I didn't want the server to be involved in navigation as you moved around the app.
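To give a rough idea of the shape of this, here's a hypothetical sketch using Next's optional catch-all convention. This is not the actual T3 Chat source; the routes and component names are made up.

```tsx
// app/[[...slug]]/page.tsx — hypothetical sketch, not the actual T3 Chat source.
// The optional catch-all segment makes Next serve this one page for every URL;
// react-router needs the DOM, so the router itself is opted out of SSR entirely.
"use client";

import dynamic from "next/dynamic";

const ClientRouter = dynamic(() => import("../client-router"), { ssr: false });

export default function CatchAllPage() {
  return <ClientRouter />;
}
```

```tsx
// client-router.tsx — from here on, navigation never touches the server.
import { BrowserRouter, Routes, Route } from "react-router-dom";
import { HomePage, ChatThread } from "./pages"; // placeholder components

export default function ClientRouter() {
  return (
    <BrowserRouter>
      <Routes>
        <Route path="/" element={<HomePage />} />
        <Route path="/chat/:threadId" element={<ChatThread />} />
      </Routes>
    </BrowserRouter>
  );
}
```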
This, combined with a sync layer I built entirely through React context, technically worked, but it meant you lost everything as soon as you refreshed. My attempts to build the backend sync in a KV weren't going great either, but it kind of worked; with that rough sync layer, all the pieces were coming together.
Navigating it felt good, but it was far from what we wanted it to be. I can probably run it locally. I go to chat, we hit launchChat, which creates a new chat with an ID, and I can say "solve Advent of Code 2022 day two in TypeScript."
Yeah, it took a second, because I didn't have all my optimizations yet, and I was so used to it being fast that I just assumed it was broken. But it worked. It doesn't have auto-scroll, because I was fighting scroll constantly throughout, but I had at least a decent UI.
I had hacked in syntax highlighting in a way that was okay, but I had something here that worked. The sync engine was not one of the parts that worked, but at least I had all of this. I was proud of where we were at, even though I had also wasted a ton of time on random explorations.
I tried multiple different ways of storing the data. I have a Neon instance here that I had a schema for, but I ended up going with a KV through Upstash Redis, which worked fine with SuperJSON.
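For flavor, the KV-era persistence was roughly this shape. This is a hedged sketch; the key format and the Thread type here are made up, not the real schema.

```ts
// Hypothetical sketch of the KV-era persistence, not the real T3 Chat code.
import { Redis } from "@upstash/redis";
import superjson from "superjson";

// Turn off automatic deserialization so we get the raw string back for superjson.parse.
const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
  automaticDeserialization: false,
});

// SuperJSON keeps Dates, Maps, etc. intact, which plain JSON would silently mangle.
type Thread = { id: string; title: string; updatedAt: Date };

export async function saveThread(userId: string, thread: Thread) {
  await redis.set(`user:${userId}:thread:${thread.id}`, superjson.stringify(thread));
}

export async function loadThread(userId: string, threadId: string) {
  const raw = await redis.get<string>(`user:${userId}:thread:${threadId}`);
  return raw ? superjson.parse<Thread>(raw) : null;
}
```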
I've been using a lot of SuperJSON for this project, but yeah, it kind of worked. It proved that this could happen, even though it was nowhere near where it needed to be. But it was also five in the morning, so I went to bed.
But first, I made a quick update. I think I have it in here. Do I have my README?
Yeah, I wrote the things that I needed to do and then passed out.
After waking up the next day, I felt like I was far enough along to bring my CTO Mark in. I always feel bad bringing them in on these projects when they're so early, but I knew I couldn't do this one alone and would need a lot of help.
So I caved and brought them in, and did my best to notate all the things that needed to be done. I was also battling my hot water heater and spent most of the day with plumbers. Funnily enough, we still made a ton of progress.
First and foremost, we overhauled the UI. We now had tabs you could move between, as well as a chat box that wasn't anywhere near as cringe. It still had bugs, and I was still insistent on Command-Enter to submit, which was wrong. Enter is how you should submit.
We made a lot of progress here though. Parts were starting to come together. I had thrown away all of the sync because the context-based approach I was using before was garbage. At this point, I had started moving over to Dexie, which is, funny enough, kind of an ancient library.
If you don't believe me, just look at their website. You can tell this is from the 2010s. It's awesome. They have support for all these new cool things and the team works really hard and builds great stuff. But this library started in like 2011 and it has Internet Explorer 10 support.
This is not a project that I've seen anyone talk about. I understand it's kind of old, but I don't care. It was awesome. It made so many things that I was struggling with way, way easier and I had a lot of fun with it.
So we started architecting things with projects, threads, and messages, building a database layer where we could store all of this locally in IndexedDB on your machine. If you're not familiar with IndexedDB, it's a browser standard for storing a lot of data in the browser. Pretty cool.
I ended up with a couple of functions here for creating new messages and threads, and the code for the actual chat used the default hook Dexie provides, useLiveQuery, which syncs by getting updates through signals whenever something happens in Dexie.
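A rough sketch of what that layer looks like, with simplified, illustrative table and field names rather than the actual T3 Chat schema:

```ts
// db.ts — simplified sketch of the Dexie layer; names are illustrative.
import Dexie, { type Table } from "dexie";

export interface Thread {
  id: string;
  title: string;
  lastMessageAt: number;
}

export interface Message {
  id: string;
  threadId: string;
  role: "user" | "assistant";
  content: string;
}

class ChatDB extends Dexie {
  threads!: Table<Thread, string>;
  messages!: Table<Message, string>;

  constructor() {
    super("chat-db");
    // Primary key first, then the fields we query or sort on.
    this.version(1).stores({
      threads: "id, lastMessageAt",
      messages: "id, threadId",
    });
  }
}

export const db = new ChatDB();
```

And on the component side, useLiveQuery ties it to React:

```tsx
import { useLiveQuery } from "dexie-react-hooks";
import { db } from "./db";

function ThreadMessages({ threadId }: { threadId: string }) {
  // Re-runs automatically whenever a write touches the rows this query reads.
  const messages = useLiveQuery(
    () => db.messages.where("threadId").equals(threadId).toArray(),
    [threadId],
  );

  if (!messages) return null;
  return (
    <>
      {messages.map((m) => (
        <p key={m.id}>{m.content}</p>
      ))}
    </>
  );
}
```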
This method of getting messages in was really nice, especially after the hell I had dealt with trying to do all of this with the Vercel AI SDK. I don't want to shit on them too hard because the SDK is great and the backend side is still what we're using for our streaming in from the LLMs, but the client side was very limited.
It worked great for a quick demo, but as soon as I wanted things like local sync or custom IDs, it fell apart. Oh God, I was so frustrated with the message types and the way IDs worked in there. We'll have a whole tangent about that in a bit, don't worry.
I ended up spending a lot of time hacking on the data layer here and dealing with weird client behaviors, trying to get the state to behave. I couldn't, so I caved and moved everything on the client over to the Dexie layer I was increasingly invested in. That meant I could just hit a live query that updates whenever a message changes, and stream the response straight into my local DB.
It worked great. It did mean a lot of things re-rendered when they shouldn't have, even with the React Compiler, but overall it worked pretty well. We were a lot happier overall.
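Conceptually, the streaming side is just writes into that local DB. Here's a hedged sketch assuming a plain text stream; the real code also batches writes and handles errors.

```ts
// Hypothetical sketch: pipe the LLM response straight into Dexie. Every update
// invalidates the live query rendering the thread, so React state never gets involved.
import { db } from "./db";

export async function streamAssistantMessage(
  threadId: string,
  messageId: string,
  res: Response,
) {
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let content = "";

  // Create the row up front so the UI shows an (empty) assistant message instantly.
  await db.messages.add({ id: messageId, threadId, role: "assistant", content });

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    content += decoder.decode(value, { stream: true });
    await db.messages.update(messageId, { content });
  }
}
```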
We had decent UX flows here, with some actual Tailwind being written to make things kind of pretty, and I was at a point where I was happy enough to show it to people and get some feedback.
It's also worth noting that this is the point at which I stopped using Claude and ChatGPT to ask questions throughout dev, and I was just using T3 Chat for all my dev work.
We also picked the name T3 Chat that night; if you look at the commit logs here, it was at 5am that I decided T3 Chat was the name and put it in the corner.
The reason I picked the name is that I was able to snag the domain, so I used it, and I was really happy with it. The T3 Chat name was day two, as was all this overhauling. And here we are on day three.
You might notice things don't look that different, and there are good reasons for that. I spent the first half of this day at Vercel's office detailing my frustrations with the AI SDK. They were very happy to take me in, thankfully; they wrote down six pages of notes and are making meaningful changes to the AI SDK as a result.
Fine and dandy, awesome. By the time this video comes out, chances are building something like this will be much easier because of the changes Vercel is making. But I had to do it all myself.
So I spent most of the day gutting the remaining pieces of the SDK floating around and moving everything over to my Dexie layer. Sadly, when I got home and opened my laptop to get back to work, I got a notification from an UploadThing user that Malwarebytes was blocking their customers from accessing files on the service they had just released.
If you follow me on Twitter, you probably already saw this; it went pretty viral. I've also been dealing with things like this for a while; my video about Pirate Software's complaints touches on a lot of it.
Since UploadThing allows any developer to let their users upload files, we will inherently end up with people uploading malicious things. It's going to happen, despite the fact that we've been aggressive about removing those files and banning the users who upload them.
A couple of companies in the threat-security and antivirus space would block our domains weeks after the files had been deleted, because their checks weren't robust enough, and they never bothered to notify us.
So I had to spend a lot of time fighting Malwarebytes in their stupid goddamn forum, because it's the only place to report false positives. After fighting this for a while, we ended up getting it resolved.
I spent a decent bit of time on the tech side figuring out how to prevent this in the future (we have some cool subdomain stuff coming later), but I ended up losing probably four to five hours to all of this, sadly, which meant I didn't get to spend as much time coding as I would have liked.
That said, I was able to mostly finish the Dexie layer (not the sync part, just the local part), as well as get some startup credits from Anthropic. OpenAI still hasn't gotten back to me. It is what it is.
There was one other thing I forgot, and I probably shouldn't have. Oh, actually, we had a homepage now too. It did not fit well on the screen; we ended up fixing that literally an hour or two ago. I did finally get auth kind of working, though. It's probably gonna break really badly now because of how much we've changed the auth layer since then.
I wasn't running it here the way I am there, but we had a mostly working auth layer with cookies and local storage. I spent a lot of time thinking about auth for this app because I wanted everything local; I didn't want you to have to hit a server and get a thumbs-up from me every time you did something.
And as much as I love Clerk, it very much leans you in the direction of doing everything through middleware on your Next app, and I did not want to fight anything that would come from that.
So instead, I picked a worse battle: rolling my own auth. And it made me miss Clerk so much. I genuinely wish I had spent the time to figure out how to make Clerk work here. I know they've been a sponsor for a while, but they've been a sponsor for a while for a reason: I like the company and I like the product. I lost so much of this day and the next to auth, and you can see how bad this looks.
So between my time at the Vercel office, my time with Malwarebytes, my time fighting the AI companies, and my time trying to get auth set up, the only actual UI we got done was the delete message button. Yeah, not great.
So I was excited for the next day. The problem being that the next day was stream day, and if you've been around long enough, you know stream days are long.
I tend not to get much coding done on stream days, and at the end of this one, I actually had to go to the Vercel office again to hang out and do a little meetup.
Actually, the day before, there was one other thing I forgot about: I also spent a decent bit of time hanging out with the Laravel team at the Vercel office, which was very fun. Got to hang out with them a bunch and give them early feedback on Laravel Cloud. Hung out with Josh, who filmed the clip here. Great time. More lost time though, so sadly nowhere near as much code as I would have liked on day three.
Day four: stream day. I finished up auth, and also had to go to the Vercel office for the meetup I had agreed to; there were some friends I hadn't seen in a while. So I was at the Vercel office two days in a row, which is funny, because I'm almost never there. It just worked out that way.
I had also spent a bunch of time that day moving off of Next, changing my mind, and moving back. So yeah, I have a prototype version of all of this working with Vite and React plus Hono on Cloudflare, and all the hacks I had to do to make the streaming work on Cloudflare were enough for me to say fuck it and go back to Next for now.
In the future, we'll move this over the right way, but not yet. So day four was mostly polishing auth and streaming, and also setting up Linear so we could actually track our issues. I think I also turned on the React Compiler that day, if I hadn't earlier.
Yeah, pretty much no change in the UI. Everything still behaves basically exactly how it did; auth was the big thing. What if I go to auth? It'll work now. Yeah, it does. Cool. Google auth. Look at that.
All through OpenAuth. OpenAuth is a really good library that is not easy to set up. Day five. And I know, when day six shows up you're probably going to say, "wait, five days?" But can we honestly say these two days were both full days, considering how much of my time I lost to entirely unrelated things?
Yeah. Also, day one started at like midnight, so I'm pretty sure it's five days in terms of calendar dates overall. But yeah, be flexible with the five days; it was closer to five and a half.
Day five, I spent a lot more time on that sync layer, because I had the local DB working great with Dexie but had not cracked the cloud side. I'd tried a few things, wasn't happy, and decided to go back to exploring other options.
I had also said on stream that I wanted to talk about local-first and didn't get to it, because I had to end the stream early to go to the other Vercel event. But I had a lot of DMs from people I trust about local-first stuff, because as much as I don't think local-first is something we should all be reaching for, there are a lot of developers I really respect and look up to who care a lot about it, and they had a lot of things they wanted me to consider and look into.
We'd already explored Zero. Funny enough, I forgot to mention this earlier: I had Mark exploring Zero for most of day two, and we concluded that, as cool as it is, it's not quite ready. If you're not familiar, Zero is by the people who made Replicache.
It's a way to set up a Postgres layer with a cache-like JavaScript server (actually, I think it might be in Go), so there's a server sitting between your database and the client. I know: boring, typical. But you define all of the behavior for the app's API in a TypeScript file, and that becomes a websocket connection between the cache and your client.
So everything is done on the client and then synced up to the server, rather than the other way around. Really cool pattern, really crazy potential. Overall, though, it was a combination of hard to set up, not super flexible, a bad source of truth (you had to write the same code in five places and hope it all came together properly), and mandatory downtime when you upgraded.
All of those things left me unsure, and I had also gotten so deep into the Dexie layer that I wanted to lean in further. So we ended up shipping a Dexie sync layer on day five that I built myself, but not before trying Jazz Tools.
Jazz seems super cool. I spent a bunch of time talking with the team. We tried really hard to get things set up, but there were a couple fundamental design decisions that ran very against the way I was trying to build.
The short version of my issues is that it's very focused on collaboration and collaborative values, as well as on every user being fully authenticated before anything happens. I have the PR where I tried moving over to Jazz.
Here, you have to wrap everything with a provider as you would expect. But if the provider doesn't have a signed-in user, it will not render its children. So doing this actually broke the app entirely.
I couldn't get it to render, and it was really unclear why. It turns out you have to be authed before the Jazz provider will even return its children. Yeah, it is what it is. I got it kind of working, but every time I thought things were working, five new things would break.
Some of that is that I just hadn't wrapped my head around the data model. But a lot of it is that the data model was fucking weird. Everything has to be structured through a ME object.
So here's the schema I tried making with Jazz. It had a weird hierarchy. You have to globally register your account in their Jazz React package in order for the types to work at all. Then you define an account, which is a class that extends their Account.
This is my app account. They recommend you don't assign values directly on it, but you need to be able to access them, so instead you assign it a root value, which you type out. So I made MyAppRoot.
So MyAppRoot is a child of MyAppAccount. These are properties on classes, and if you know me and my functional programming brain, you know how angry this was starting to make me.
I then had to make a ThreadList, which is a CoList of refs to Thread. And I have my Thread, which has a title, a lastMessageAt, and its messages, which is a co ref to a MessageList. MessageList is a CoList of co refs to Message, which is this.
What this all means is that I can't select messages by thread ID; I have to do everything through the ME object. So if I want to render a list, I have to go to me.root.threads, select the one with the right ID, and then get those messages and render them.
And I did not want this type of hierarchy in my app. I have my createMessage function. It takes the thread ID and the message the user wants to send, and it does all of the things: it creates the user's message in the right thread, gets all the messages from the thread, and creates a cleaned-up version of those messages to send to the server, which starts streaming the new message in from the AI.
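In sketch form, using the hypothetical Dexie helpers from earlier (again, illustrative names, not the actual T3 Chat source):

```ts
import { db } from "./db";
import { streamAssistantMessage } from "./stream";

export async function createMessage(threadId: string, userText: string) {
  // 1. Persist the user's message locally so the UI updates instantly.
  await db.messages.add({
    id: crypto.randomUUID(),
    threadId,
    role: "user",
    content: userText,
  });

  // 2. Pull the thread's history and strip it down to what the API needs.
  const history = await db.messages.where("threadId").equals(threadId).toArray();
  const payload = history.map(({ role, content }) => ({ role, content }));

  // 3. Kick off the server stream and pipe the reply straight into the local DB.
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages: payload }),
  });
  await streamAssistantMessage(threadId, crypto.randomUUID(), res);
}
```

Note that nothing here needs a global ME object; the thread ID is enough.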
I said at the time, jokingly, fully believing there was no way in the world it could be true: "Ha ha, if I have to pass the ME object to createMessage, my head's going to fucking explode." To which they replied: yeah, about that. To their credit, they were hyped about how many issues I ran into.
They were super responsive and are taking the opportunity to fundamentally rethink the loading and data patterns around Jazz. If I were to move to a sync solution, Jazz is very high on my list of things I would consider.
But my realization throughout this, which is really a confirmation of a theory I had in the past, is that the needs of different local-first apps vary so much that if you try to build a generic solution for all of them, you're not building something anyone can actually use or wants.
So these attempts at generic solutions all kind of sucked for me, and I could not find one that was even close to what we were trying to do. So after spending probably three to four hours going back and forth on Jazz, I finally gave up and rolled my own instead.
It ended up going way better than I expected, considering how much time I'd lost to everything else going on. I also spent some time experimenting with other models; this is when I started playing with ChatGPT more, and with Claude a bit more too.
And the reason for that is actually kind of silly. I started paying more attention to the different performance characteristics of a handful of models. This site, by the way, is super killer: Artificial Analysis (artificialanalysis.ai).
They benchmark every model daily to get you performance information. So if we throw in 4o and the latest 4o mini, you've got the latest Claude in here too. I love the site.
The scroll breaks when you do that. With the latest Claude and DeepSeek V3 in here, this was really useful for me to start getting info. You'll see DeepSeek's quality is absurd, but there's a catch.
And the catch wasn't something I felt the first few days. When I started using it, the output speed was great: 90 tokens per second, which effectively means about 90 words coming in every second. And it felt great.
As we got closer to launch, the speeds were dropping significantly. They'd fallen to almost half of what they were before, and I was losing confidence quickly. I also checked all the alternative hosts for DeepSeek, since it's an open-source model (which was exciting), threw it on a few of the other providers, and saw all of them were even fucking slower.
So I started obsessing over the performance of the model, probably a little too much, and spent a lot of time testing all the different options. After playing around and screwing with ChatGPT and GPT-4o, I ended up getting 4o mini set up on Azure in a way that was really, really fast, and that's what we're using right now.
We're going to introduce the ability to select different models in the near future, but for now, the goal was fast without killing our bank accounts, and I'm happy with where we landed. DeepSeek is still hilariously cheap, so if you're looking for the cheapest option that's still high quality, check them out.
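For reference, wiring a model like that up through the AI SDK's Azure provider looks roughly like this. It's a sketch: the resource name, deployment name, and route are assumptions, not our actual config.

```ts
// Hypothetical route handler using the AI SDK's Azure provider.
import { createAzure } from "@ai-sdk/azure";
import { streamText } from "ai";

const azure = createAzure({
  resourceName: process.env.AZURE_RESOURCE_NAME!, // <resource>.openai.azure.com
  apiKey: process.env.AZURE_API_KEY!,
});

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: azure("gpt-4o-mini"), // your Azure deployment name
    messages,
  });
  return result.toTextStreamResponse();
}
```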
Oh, finally, we had a real homepage, by the way. For a long time, everything was on /chat, which meant if you just went to the site, you got a blank page. This fixed it. Yeah, I ended up not changing much UI-wise.
Oh, I think I added the collapse for the sidebar, which was cool but not the focus. The next day was grind day. This was yesterday, the day before launch, and Mark and I spent the entire day, from when I woke up to when I went to bed, hacking: overhauling the UI and making a ton of other changes, most of which, to be fair, Mark was making.
But we hadn't merged just yet. We changed the input box to look more like Claude's. We changed the sidebar to have a proper new-chat button instead of it just sitting there, reserved that area for your auth information, and, most importantly, got Stripe payments in.
I still hate setting up Stripe. There are a hundred ways to do it and none of them feel right. We have a solution I'm okay with, but we also had a couple of reports of people paying without their accounts being correctly flagged as paid, which makes me want to go mad.
So we'll be spending a lot of time tonight making sure it is as stable as possible. By the time you see this video, checking out should be fine, but, like chat's already saying: Stripe is hell. I'm afraid of Stripe.
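If you're wiring this up yourself, the heart of the "flag the account as paid" problem is a webhook handler. Here's a generic sketch, absolutely not our production billing code, and markUserAsPaid is just a placeholder:

```ts
// Generic sketch of the "flag the account as paid" webhook.
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Placeholder for whatever flips the paid flag in your own database.
declare function markUserAsPaid(userId: string): Promise<void>;

export async function POST(req: Request) {
  const signature = req.headers.get("stripe-signature")!;
  const body = await req.text();

  // Verify the event really came from Stripe before trusting any of it.
  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      body,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!,
    );
  } catch {
    return new Response("Invalid signature", { status: 400 });
  }

  if (event.type === "checkout.session.completed") {
    const session = event.data.object as Stripe.Checkout.Session;
    // client_reference_id is whatever user ID you attached at checkout time.
    if (session.client_reference_id) {
      await markUserAsPaid(session.client_reference_id);
    }
  }

  return new Response("ok");
}
```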
I have checked out Groq; I had a tab open for it earlier. The speed you can get things out of that is nuts. Trying it and seeing how fast it is, it's really nuts.
We spent a lot of time on Stripe. I also did an onboarding flow that I was really proud of, where when you first open the app, it creates three messages that describe what it is and what it does.
I did that instead of a traditional homepage, and I think it's really, really cool. I also spent a bunch of time with Aiden, the Million.dev guy who made React Scan. React Scan is a library that lets you see when things re-render.
I have a video all about it and React render patterns coming out soon. It might be out before this one; hard to know, my schedule's chaotic. But he is an industry-leading expert in all things React performance. He's also the CEO of Million, which was originally an alternative React runtime that would make your React apps way faster.
Now it's also more focused on the linting side where they will catch performance regressions in your app. He is so locked in on performance, it's nuts. And we ended up making a bunch of really cool changes.
The biggest one was markdown chunking. We start by identifying chunks. I think there's a regex in here, or it's the marked lexer, which splits the markdown by the block each chunk belongs to, so that we can memoize the blocks.
So when we get new text, we don't have to re-render the entire message; we only re-render the block the new text is going into. This was a huge win, particularly for messages with multiple code blocks in them.
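Here's a minimal sketch of the chunking idea, assuming marked for the lexer; the real implementation differs.

```tsx
// Sketch of markdown chunking: split streamed markdown into block-level tokens,
// then memoize each block so only the still-growing one re-renders.
import { memo, useMemo } from "react";
import { marked } from "marked";

// One block = one paragraph / code fence / list / heading. Finished blocks keep an
// identical `raw` string between renders, so memo() bails out of re-rendering them.
const MarkdownBlock = memo(function MarkdownBlock({ raw }: { raw: string }) {
  const html = marked.parse(raw, { async: false }) as string;
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
});

export function StreamedMarkdown({ content }: { content: string }) {
  // lexer() tokenizes at the block level; each token carries its raw source text.
  const blocks = useMemo(() => marked.lexer(content).map((t) => t.raw), [content]);

  // Only the final block has a changing `raw` as new tokens stream in, so it alone
  // re-renders per chunk. Index keys are fine here: blocks only ever append.
  return (
    <>
      {blocks.map((raw, i) => (
        <MarkdownBlock key={i} raw={raw} />
      ))}
    </>
  );
}
```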
This took the performance from shit to pretty good. It's still not where I want it to be; I'm going to spend a lot of time fighting Prism, or moving to something else for the syntax highlighting. But we got it running way, way better.
I'm very happy with the result. I also added some fun helper functions to make it easier to test in dev with a lot of threads, to get this all working well.
I still actually have it set up. Let me safely open up my environment variables here. I added a React Scan environment variable locally so that I can just go to the site and have React Scan running on it.
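The toggle itself is just a guarded init call, something like this (the env-var name is made up):

```ts
// Hypothetical sketch: enable React Scan only when a local env flag is set.
// Run this before the app renders (e.g. at the top of your client entry).
import { scan } from "react-scan";

if (process.env.NEXT_PUBLIC_ENABLE_REACT_SCAN === "true") {
  scan({ enabled: true }); // paints render highlights over the running app
}
```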
You can see when I make a new message here: "solve Advent of Code day 8 2021 in vanilla JS." Oh, that's really funny; I'm gonna just comment out the rate limit for now.
Okay, second attempt. You can see the block you're in re-renders, but none of the rest of the UI does anymore, and the result is you can hit a locked 60fps even with decent CPU slowdown. It can do 120fps, which is what my MacBook usually runs at when I'm not streaming, but it can dip down to around a hundred sometimes, which is why I want to go further.
I'm happy overall though; it's way better. You might have seen the chat itself was re-rendering, but those are memoized re-renders, so they're not actually recalculating. It's just checking and giving a thumbs up like, "hey, this is okay, we don't have to do it."
If you look closely... I'll see if I can do another one: "Now do it in Rust." If you look closely, you'll see there's a little star on these. The star means it's memoized, so it's not actually re-rendering; it's just being checked a whole bunch.
And yes, the things in this given message are being checked a lot, but they opt out really early, so it's not a big deal for performance. Here it is with the performance monitor on: "Now do Erlang." The error is just a React Scan thing; don't worry about it.
But you'll see that during the code block, CPU utilization spikes a bunch. As soon as you're out of the code block and rendering the things after it, utilization drops to nothing. It's only the code blocks that have this level of CPU utilization.
Right now I have the dev tools open, the CPU slowdown on, and I'm streaming at a really fast speed with React Scan in React dev mode, so it's not going as fast as it could. But see how much faster it goes and how quickly the utilization drops once the code block is done? It's just the code blocks.
So now you see why I want to optimize it further. But we've been to hell and back to make this as fast as possible: doing everything we possibly can locally on the machine, avoiding re-renders to the best of our ability, and streaming things through a data layer that actually makes sense.
We also built a routing paradigm that combines the things that work well in Next with the things I actually like about React Router. The result is, as far as I know, the fastest AI chat app that's ever been built.
There are a couple of other cool things I did. I'm not super proud of the state they're in, but they're getting to a state I'm really excited about. Like, I have this useQueryWithLocalCache function. It should really be named useActionQueryWithLocalCache, because I pass it a server action. The server action does something like get the user's subscription status, and I also store whatever the result is in local storage.
So instead of showing a loading state, I can show a default state, and from that point forward show whatever the server returned last time. Theoretically, what this enables is: if you're on the free tier and you upgrade to the paid tier and then go back to the homepage, it'll show "free" for just a millisecond before it pulls in the updated value.
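A minimal sketch of that hook, assuming JSON-serializable results and client-side rendering; the real version also deals with SSR edge cases and invalidation.

```tsx
import { useEffect, useState } from "react";

export function useActionQueryWithLocalCache<T>(
  key: string,
  action: () => Promise<T>, // a server action, e.g. "get subscription status"
  defaultValue: T,
) {
  // Start from the last value the server ever returned, falling back to a default.
  const [value, setValue] = useState<T>(() => {
    if (typeof window === "undefined") return defaultValue; // SSR-safe default
    const cached = localStorage.getItem(key);
    return cached ? (JSON.parse(cached) as T) : defaultValue;
  });

  // Refresh in the background and remember the result for next time.
  useEffect(() => {
    let cancelled = false;
    action().then((fresh) => {
      if (cancelled) return;
      setValue(fresh);
      localStorage.setItem(key, JSON.stringify(fresh));
    });
    return () => {
      cancelled = true;
    };
    // `action` is assumed to be a stable server-action reference, so key is the only dep.
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [key]);

  return value; // never a loading state: default, then last known, then fresh
}
```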
So I never have to deal with loading states, ever, and I never have animations anywhere. I had a couple of these things in the README as my strong stances: avoid animations as much as possible, and indicate changes as aggressively early as possible, with things like acting on mouse-down.
And the result is an app where, with a lot of work and thought put into every layer, every render, and every piece of data flowing through it, everything flies, and I'm really proud of it. Hopefully, at the very least, this can help you understand that React isn't slow; it's just easy to use it in a slow way.
Admittedly, we had a couple times where one small sync resulted in things re-rendering in ways that caused performance issues, but for the most part, it was just fine and I'm genuinely really happy with the results.
Have you had a chance to try T3 Chat yet though? I'm curious if you feel the wins that we put the time into here. Do you actually feel the difference between Claude and T3 Chat? I can't imagine you wouldn't, but if you somehow don't, please come tell us.
Hit up the feedback channel for T3 Chat in my Discord if you have any issues at all, especially performance-related ones, because we take them all very seriously.
I hope you enjoyed this breakdown of how we managed to build the app in five days. Five-ish, but you get the point. The goal here was to build something that felt better than every other chat app, and I'm proud to say that Mark and I somehow managed to do it.
Let me know what you think, and until next time, keep chatting.