Alexandr Wang: Building Scale AI, Transforming Work With Agents & Competing With China
- Since our conversation with Scale AI CEO Alexandr Wang, Meta has made a significant investment in Scale, valuing it at $29 billion.
- The discussion dives into the history of Scale, the AI industry, and Alexandr's journey from MIT to founding Scale AI.
- The importance of a deep care for one's work is emphasized as crucial for success in any field.
- Alexandr shares insights into the future of AI, including applications in various industries and the evolution of management roles.
- The conversation touches on training data, evaluation metrics, and the importance of agents in the evolving workforce.
Since we recorded this Lightcone episode with Scale AI CEO Alexandr Wang, Meta has agreed to invest over $14 billion in Scale, valuing the company at $29 billion. Alex has also announced he will lead Meta's new AI superintelligence lab. The conversation you're about to hear covers the history leading up to this investment, from Scale's early days at YC to its integral role in the training of foundation models.
Let's get to it. The AI industry really continues to suffer from a lack of very hard evals and very hard tests that show the real frontier of model capabilities. The biggest thing is you just have to really, really, really care. When you interview people, when you interact with people, you can tell the people who just phone it in versus the people who hang on to their work; it's so incredibly monumental, forceful, and important to them that they do great work. Very exciting times to see how the frontier of human knowledge expands.
Welcome to another episode of the Lightcone. Today we have a real treat: it's Alexandr Wang of Scale AI. Jared, you worked with Alexandr way back in the beginning, actually. What was that like, what year was it? Put us in the moment. Yeah, Alex, I mean, most of what we want to talk about today is what Scale is doing now. The current stuff is so awesome and so interesting. But since Scale got started at YC, I thought it seemed appropriate to start all the way at the start.
And it is funny, Diana and I were at MIT last month talking to college students, and of all the founders, the one they most look up to and want to emulate is actually you. Everybody wants to be the next Alexandr Wang, because everybody knows the story of how you dropped out of MIT and ended up starting Scale. But they don't know the real story. And so I thought it'd be cool to go back to the beginning and just talk about the real story of how you ended up dropping out of MIT and starting Scale.
So before I went to MIT, I worked at Quora for a year. And so this was 2015 to 2016, or no, sorry, 2014 and 2015, when I worked as a software engineer, and this was already at a point in the market where ML engineers, as they were called, machine learning engineers, made more than software engineers. So that was already the market state at that point. I went to these summer camps that were organized by rationalists, the rationality community in San Francisco, and they were for precocious teens.
But they were organized by many people who have become pivotal in the AI industry. One of the organizers is this guy Paul Christiano, who's the inventor of RLHF, actually. He was at OpenAI for a long time, and now he's a research director at the US AI Safety Institute. Greg Brockman came and gave a speech at one point. Eliezer Yudkowsky came and gave a speech at one point. And when I was, I don't know, it must have been 16, I was exposed to this concept that potentially the most important thing to work on in my lifetime was AI and AI safety. So it's something I was exposed to very early on.
So then when I went to MIT, I started when I was 18, I studied AI quite deeply. That was most of what I did in my sort of day job. And then I kind of got antsy, applied to YC, and the idea was kind of like, okay, how can you apply AI to things? And this was in the era of chatbots, which is crazy to think about actually, that there was this mini chatbot boom. Yeah, yeah, 100%. In 2016, which was, I guess, spurred by Magic, right? Or some of these apps.
And Facebook had a big vision around chatbots. And anyway, there was a little mini chatbot boom. So the initial thing that we wanted to work on was chatbots for doctors. Right? Which is a funny idea, because did you guys know anything about doctors? Yeah. No, not at all. Basically, no. It was just sort of, oh, doctors are a thing that sounds expensive.
And I think it's indicative of something, I mean, you guys see this all the time, but I feel like most of the time young founders' first 10 ideas are, first of all, very mimetic. So there are a lot of the same ideas. It's a dating app, or something for, you know, social. The same ideas.
And then I think young people have a very poor sense of alpha. Like, what are the things that they're actually going to be uniquely positioned to do? And most young people don't have a sense of self, so it's not clear. So when we were in YC, we were roommates with another YC company, and we were observing this chatbot boom that was happening at the time.
But it was very clear that chatbots, if you wanted to build them, and this is funny to say in retrospect, required lots of data and lots of human elbow grease to get them to work effectively. And so, kind of off the cuff at one point, it was like, oh, what if you just did that? What if you just did the data, the language data and the human data, so to speak, for the chatbot companies?
We were also very lost, by the way. I think you probably remember we were quite lost mid-batch, like many YC companies. And so then we switched to this concept. I think the initial idea was "API for human tasks" or something along those lines. And one night I was just trawling around for domains. ScaleAPI.com was available, so we bought it and we launched. I think a week later we were on Product Hunt.
Yeah, I remember the Product Hunt page is still live. I was reading it last night, and I remember the tagline: it was "an API for human labor." That's my recollection of the distilled insight that you had: what if there is an API, what if you could call a human with an API? Yeah. And it was, I think, three days for us to put up the landing page and launch on Product Hunt. I think this idea captured some amount of imagination in the startup community at the time, because it was this weird form of futurism where APIs delegate to humans in this interesting way.
Yeah, it's like an inversion. Yeah, yeah, humans doing work for the machines and not the other way around. Yeah, yeah, yeah. It's funny, because in the initial phase we just worked with all these engineers who reached out to us from that Product Hunt launch, which was a real grab bag of use cases. But that was enough for us to raise money at the time and to get going.
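To make the "API for human labor" idea concrete, here is a minimal sketch of what calling a human through an API might look like. The endpoint, field names, and helper function are hypothetical illustrations, not Scale's actual API:

```python
# Hypothetical sketch of "an API for human labor": a programmatic call
# that is fulfilled by a human worker rather than by a machine.
# The endpoint URL and request/response fields are illustrative only.
import requests

def submit_categorization_task(text: str, categories: list[str]) -> str:
    """Submit a task; a human worker picks the best category."""
    resp = requests.post(
        "https://api.example.com/v1/task/categorize",  # placeholder URL
        json={
            "instruction": "Pick the category that best fits this text.",
            "attachment": text,
            "categories": categories,
            "callback_url": "https://example.com/callbacks/task-done",
        },
        timeout=30,
    )
    resp.raise_for_status()
    # The human work completes asynchronously; results arrive via the callback.
    return resp.json()["task_id"]

task_id = submit_categorization_task(
    "Great food, terrible parking.", ["positive", "negative", "mixed"]
)
print(f"queued human task {task_id}")
```

The design point this sketch tries to capture is that the call returns immediately with a task ID, while the human work completes asynchronously and the result arrives later, exactly like any other delegated API job.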
And then a few months after that, it became clear that self-driving cars were actually the first major application that we needed to focus on. And so there were many very big decisions, I would say, in the first years of the company. One thing that was curious is that at that point there were other solutions that were already the game in town. Mechanical Turk from Amazon was sort of the thing that people were using, but you ended up capturing this whole other set of people that didn't know about it, you had a way better API, and you kind of won.
Yeah, it was not clear at that point, because you probably were compared a lot with Mechanical Turk. Yeah. So Mechanical Turk was definitely the concept in most people's minds at the time. I mean, it was kind of one of these things where a lot of people had heard about it, but anyone who had used it knew it was just awful.
And so whenever you're in a space where people mention a thing but it sucks, that's usually a pretty good sign. And so that was enough to give us early confidence. But then the thing that I would say was actually fundamental to the success of the company was focusing on this seemingly very narrow problem of self-driving cars.
I remember very early on, when it was maybe six months after we were out of YC, there was another YC company, Cruise, that reached out to us on our website, and in the blink of an eye they became our largest customer. And they found you just from your launch? Yeah, I think maybe they even Googled us, it's not totally obvious, but vaguely from our launch. And it was actually an ex-YC founder working at Cruise that reached out to us, so maybe some YC mumbo jumbo.
We're a keiretsu, who knows, the world works in mysterious ways. And so they grew very, very large. So then early on we made this decision. I remember we went to our lead investor at the time and we had this conversation. It was like, hey, actually we think we should probably just focus on this self-driving thing. It was actually a very interesting conversation, because the reaction was, oh, that's just obviously way too small a market.
And, you know, you're never going to build a gigantic business that way. And we're like, we think it's probably a much bigger market than you think it is, because all these self-driving companies are getting crazy amounts of funding and the automotive companies are doing huge programs in self-driving. And it clearly is the future.
Like, it feels like something that should exist. And so we're like, if we focus on it, we think we can build the business much more quickly. And it's funny looking back, because both things are true. It is both true that it enabled us to build the business and get to scale very quickly, and it's also true that it was not a big enough market to sustain a gigantic business.
The story of Scale in many ways is this progression of how you continue to adapt. AI is this incredibly dynamic space; lots of things are constantly changing. And a lot of what we pride ourselves on at the company is how we've been able to continue building on and contributing to this very fast-moving industry.
When did you become much more aware of the scaling laws? Because one of the interesting facts that sort of emerged is that you're a little bit the Jensen Huang of data. I think that in self-driving, scaling laws were not really a thing, and the biggest reason was that one of the biggest problems in self-driving is that your whole algorithm used to run on the car. So you're very constrained by the amount of compute you have access to.
So a lot of the engineers and a lot of the companies working on self-driving never really thought about scaling laws. They were just all thinking about, okay, how do you keep grinding these algorithms to be better and better while staying small enough to fit onto these cars?
But then we started working with OpenAI in 2019. This was the GPT-2 era, and I would say GPT-1 was sort of this curiosity. With GPT-2, I remember OpenAI would have a booth at these large AI conferences, and their demo would be to allow researchers to talk to GPT-2. It wasn't particularly impressive, but it was kind of cool.
It was kind of this thing. And then by GPT-3, that's when the scaling laws clearly felt very real. And GPT-3 was 2020, so it was actually long before the world caught on to what was happening.
Yeah. Did you know as early as 2020, did you have a strong inkling that this was really going to be the next big chapter of Scale, or was that not clear until ChatGPT took off? I think that in 2020 it was clear that scaling laws were going to be a big thing, but it was still not totally obvious.
I remember this interaction: I got early access to GPT-3, it was in the playground, and I was playing with it with a friend of mine. I told my friend, oh, you can talk to this model. And during the conversation, my friend got visibly frustrated and angry at the AI, but in a way that wasn't just, oh, this is a dumb toy.
It was in a way that was somewhat personal. And that's when I realized, whoa, this is somehow qualitatively different from anything that existed before. I feel like it was passing the Turing test at that point. Kind of. There were semblances. Yeah, there were glimpses of it potentially passing the Turing test.
Right. But I think the thing that really caused the recognition of what I would call generative AI, which is still the term in some ways, was really DALL-E. I think that convinced everyone. But my personal journey was that GPT-3 was highly interesting.
And so it was one of many bets at the company. And then in 2022, over the course of DALL-E and then later ChatGPT and GPT-4, etc., we worked with OpenAI on InstructGPT, which is kind of the precursor to ChatGPT. It became very obvious that that was the formative moment for the company and, frankly, for the world.
That's when we saw it as well, with the big shift in companies, because it was that GPT-3.5 release moment at the end of 2022, and we started seeing a bunch of companies and smart people changing directions and pivoting their companies in 2023. And that was the moment for this dynamic that you referenced, which is kind of the "Scale is the Nvidia for data" kind of thing.
I think that became quite obvious. I would say GPT-4 really was the moment where it was like, wow, scaling laws are very real. The need for data would basically grow to consume all available information and knowledge that humans have.
And so it was like, wow, this is this astronomically large opportunity. Yeah. For what seemed like the first time, it was something that you could get to not hallucinate, basically ever. You could actually have a zero-hallucination experience in limited domains, which is, we're still sort of in that regime even at this point.
The classic view is that if it's hallucinating, you're not giving it the correct data in the prompt or context, or you're trying to do too much in one step. Yeah, I mean, I think the reasoning paradigm has a lot of legs, and it's actually been interesting this last era of model improvement, because the gains are not really coming from pre-training; we're moving on to a new scaling curve of reasoning and reinforcement learning.
But it's shockingly effective, and I think the analogies between AI and Moore's Law are pretty clear: you'll get on different technical curves, but if you zoom way out it'll just feel like this smooth improvement of models.
One of the things that has been popping up with some of the really big, well-known wrappers is that they're getting access to full-parameter fine-tunes of the base models, especially the frontier closed-source base models. Is that a big part of your business, or something that people are coming to you for, these verticalized full-parameter fine-tune datasets?
Yeah, I think this is going to be a blueprint for the future. Right now, the total number of large-scale full-parameter fine-tuned or reinforcement fine-tuned models is still pretty small. But if you think about it, one version of the future is that every firm's core IP is actually their specialized model, their own fine-tuned model.
And just in the same way that today you would generally think the core IP of most tech companies is their code base, in the future you would generally think that their specialized IP might be the model that powers all their internal workflows. And what are the special things they can add on top?
Well, they can add on data and environments that are very specific to the day-to-day problems or information or challenges or business problems that they see. And that's the kind of really gritty real-world information that nobody else will have, because nobody else is doing the exact same business motion as them.
There's a lot of weird tension in that, though. I remember friends of ours from one of the top model companies came by and they were like, hey, do you think YC and YC companies would give us their evals so we could train against them? And we were like, no dude, what are you talking about? Why would they do that? Because that's their moat.
And I guess now, based on this conversation, evals are pretty important as part of RL cycles, and yet even the evals are not really the valuable part. The valuable part is actually the properly fine-tuned model for your dataset and your set of problems.
Yeah, it's like these Lego blocks, right? If you have the data and you have the environments and then you have a base model, you can stack those on top of each other and get a fine-tuned model, and obviously the evals are important. This is some of the tension, and this is basically in a nutshell the question: does AGI become a Borg that just swallows the whole economy as one firm, or do you still have a specialized economy?
My belief, generally speaking, is that you still do have a specialized economy. These models are platforms, but the alpha in the modern world will be determined by the degree to which you're able to encapsulate your business problems into datasets or environments that are then conducive to building differentiated models or differentiated AI capabilities.
Yeah, that's why asking for evals was so crazy to me, because it's like, okay, you give up the evals, the base model gets way better, and now all your competitors have exactly the same thing that used to be your advantage. I think we will undergo a process in AI where we learn what the bright lines are. Right.
I mean, it's very obvious and intuitive to tech companies that they should not give away their code base and they should not give away their data. The analogs of that in a highly AI-fueled economy I think we'll identify over time, but they are, yeah, the evals, your data, your environments, etc.
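As a purely illustrative sketch of the "Lego blocks" stacking described above (proprietary data plus environments layered on a base model to yield a differentiated, specialized model), here is a toy Python model of the pipeline. Every class, function, and name is hypothetical; no real training is performed:

```python
# Toy model of the "Lego blocks" idea: data and environments stacked on a
# base model produce a specialized model. All names here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    built_from: tuple[str, ...]  # provenance of everything stacked so far

def supervised_fine_tune(base: Model, dataset: str) -> Model:
    """SFT step: bake a proprietary dataset into the base model."""
    return Model(f"{base.name}+sft", base.built_from + (dataset,))

def reinforcement_fine_tune(model: Model, environment: str) -> Model:
    """RL step: optimize the model against a task environment and rubric."""
    return Model(f"{model.name}+rl", model.built_from + (environment,))

base = Model("frontier-base", ("model-provider",))
specialized = reinforcement_fine_tune(
    supervised_fine_tune(base, "internal-workflow-dataset"),  # your data
    "business-task-environment",                              # your environment
)
print(specialized.name, specialized.built_from)
# frontier-base+sft+rl ('model-provider', 'internal-workflow-dataset',
#                       'business-task-environment')
```

The point of the stacking order is that the dataset and the environment, not the base model, are where a firm's differentiation lives; the evals then tell you whether the specialized model actually beats the base.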
I think you have a very tech-optimistic view of what the future is going to be with how jobs are going to be shaped. Can you talk more about that? Because I think you hinted at it before. It's going to be more specialized. It's not that all these jobs are going to go away.
Right. First off, it's undeniably true that we're at the beginning of an era of a new way of working. There's this term that people have used for a long time, "the future of work." Well, we are entering the future of work, or certainly the next era of it. And so work fundamentally will change.
But I do think humans own the future, and we actually have a lot of agency and a lot of choice in how this reformatting of work, or of workflows, ends up playing out. You know, I think you kind of see this play out in coding right now.
And I think coding in some ways is really the case study for other fields and other areas of work, where the initial phase is the assistant-style thing: you're doing your work, and the models are assisting you a little bit here and there.
And then you go to the Cursor agent-mode kind of thing, where you're synchronously asking the models to carry out these workflows and you're managing one agent, or you're kind of pair programming with a single agent. And then now, with Codex or other systems, it's very clear.
The paradigm is, oh, you have this swarm of agents that you're going to deploy on all these various tasks, and you'll have this cohort of agents doing the work that you think is appropriate.
And that last job has a semantic meaning in the current workforce: it's a manager. You're basically managing this set of agents to do actual work. And I think AGI doomers or whatnot take the view that even this job of managing the agents will just be done by the agents.
So humans will be taken out of the process entirely. But our belief, my personal belief, is that management is very complicated. Management is also more about what's the vision you have and what's the end result you're aiming towards, and those will fundamentally be driven by humans, because we have a human-demand, human-desire-driven economy.
And so I think the terminal state of the economy is just, at large scale, humans managing agents. In a nutshell. I have a funny story where a founder friend of mine was trying to promote one of his junior employees, who's really, really smart and working on the agent infrastructure.
And he was like, hey, I'm looking for someone who could step into management. You've never managed people before; if we hired some people under you, how would you feel about that? And this mid-20-something, really smart engineer was just like, why would I do that? Just give me more compute and the model. Look at what just happened to the model literally last month, and I didn't have to do anything. It just started doing things that it couldn't do a month ago. Why would I want to manage people?
Just let me manage more agents for you and it's fine. Okay, so what are the unique things that humans will do over time? I mean, I think this element of vision is very important, and this element of debugging, of fixing things when they go wrong.
Most of a manager's job, speaking as a manager, is just putting out fires, dealing with problems, dealing with issues that come up. Intuitively, the idealized manager job seems very cushy, because you think, oh yeah, all the other people do all the work and I just vaguely supervise.
And then the reality is obviously highly chaotic. I think people often jump to this extreme reality where it's like, oh yeah, you're just going to manage the agents and live this kind of Victorian life where all your problems are solved. But no, I think it's still going to be pretty complicated: getting agents to coordinate well with one another, coordinating the workflows, and debugging the issues that come up. These are still complicated problems.
And having seen what happened in self-driving, which was more or less that it was easy to get to 90% and very, very hard to get to 99%, I think something similar will happen with large-scale agent deployments: that final 10% of accuracy will require a lot of work.
Yeah. Even for self-driving cars right now, there's remote assist for all these super edge cases. So there's still a human at the end managing the car. Yeah. And the ratio, by the way, the companies don't publish them, but I think it's something like five cars to one teleoperator, or maybe even less, maybe three cars per teleoperator. So the ratio is much lower than people think.
I think humans are much more involved even in self-driving cars than most people appreciate. Which, if you put it in that perspective, is still very optimistic; it's just more output of rides per person. In today's world, an Uber driver drives one car. In this world, you can do five cars.
Right. Well, for an optimistic version of the future where unemployment is still low, etc., you just have to believe that humans are almost insatiable in their desire and their demand, and that prices will go down, the economy will become more efficient, and we'll just want more.
And I think this has been a pretty reliable trend for the history of humanity: we have somewhat insatiable demand. So I have conviction that the economy can get as efficient as it needs to, it can get hyper, hyper efficient, and human demand will just continue to fill the bucket.
Yeah. In the early 20th century, when you said "computer," people didn't think of a computer as it is today. They thought of a human being that would sit in front of a punch card tabulator; that was what a computer was. It was a job title.
Literally, that was a real person's job. And then of course now it's like, where are all the computers? Well, they're actually real computers now. Think of the Apollo missions: it was a bunch of people just crunching the numbers for the trajectories of the Apollo. Because the computer that went on the rocket was, I think, running at only single-digit megahertz.
It was a very tiny amount of computation; it was just humans doing it. Totally. And even the concept of being a programmer is somewhat esoteric, in the sense that you're writing the instructions for these machines to just carry out repetitively.
And in some ways, the leverage boost that all humans will get is similar to the leverage boost that programmers have had historically for a long time. A lot of people in Silicon Valley say this: the closest thing to alchemy in our world, pre-AI, let's say, is programming.
Because you can do something that creates infinite replicas of whatever you build, and they can run an infinite number of times. And I think the entire human workforce will soon see that large of a leverage boost, which is extremely exciting, because programmers have benefited over the past few decades from this unique perch where one 10x or 100x engineer can build something absolutely incredible, very valuable, and shockingly productive.
And all of a sudden, I think humans in all trades will gain this level of leverage. Alex, I'm curious to return to a point that you made earlier about how Scale has kept reinventing itself. If you had to describe the arc of Scale, what's the story and what were the turning points?
Our initial business was all around producing data, generating data for various AI applications, primarily self-driving car companies. Right, so for the early years you were really focused on that? Yeah, for the first three years, fully focused on that. One of the properties of building that business is that over time we had this obligation to get ahead of most of the waves of AI, if that makes sense.
Because for AI to be successful in any vertical area, it needed data. And so the demand for our products would a lot of the time precede the actual evolution of AI into those industries. As an example, we started working with OpenAI on language models in 2019.
We started working with the DoD on government AI applications and defense AI applications in 2020, long before the recent drone-fueled AI craze in the Department of Defense. We started working with enterprises long before the recent, larger waves around enterprise AI implementation.
So almost systemically, or intrinsically, we've had to build ahead of the waves of AI. I think this is actually quite similar to Nvidia. Whenever Jensen gives his annual presentations about Nvidia and his outlook on trends, he is always so ahead of the trends, and that's because he has to get there before the trend can even happen.
That's, I think, one way in which our business has continued to adapt. Because AI is the fastest-moving industry, I think, ever in the history of the world. And so each turn, each evolution, has moved incredibly quickly.
The other thing that happened late 2021, early 2022: we started working on applications. We started building out AI-based applications, and now much more so agentic workflows and agentic applications, for enterprises and government customers.
And this was an interesting evolution of our business, because historically our core business is highly operational. We build this data foundry; we have all these processes to produce data. It's a very operational process that involves lots of humans and human experts to produce data, with quality control systems in place.
That highly operational business, and the success of that business, is what created the momentum for us to dream about building an applications business. When we went into it, I had studied other businesses that had successfully added on very different businesses: what are the unique traits, why do some of those work?
And the one that is probably the most interesting, I think the most singular in modern business history, is Amazon building AWS. If in 2000 you had written a short story that said this large online retailer would build a large-scale cloud computing, rent-a-server business, it would seem nonsensical.
I remember when they launched AWS in 2006, Amazon stock went down, because all the analysts thought it was such a terrible idea. It had never been done before. It doesn't seem related at all to their core business. It's this weird thing.
But the wisdom of that was, I think, twofold. From talking to people who were there at the outset, at the genesis moment of that business, probably the most important thing was that they had conviction that the underlying market would basically be infinitely large and growing.
That market would literally grow forever. There would be this exponential in the amount of compute that needed to be built up in the world. And if you did that, there were sufficient cost advantages from economies of scale.
I think with startups, you kind of have to switch modes at a certain point. Early on, you're trying to go for very, very narrow markets, almost the narrowest markets you can, and then you're just trying to gain momentum and slowly grow out from those hyper-narrow markets.
And then at some point, if you have ambitions to be a $100 billion company or more, you have to switch gears and ask: where are the infinite markets, and how do you build towards them? And so this was the moment where we realized that. The simple realization was that every business and every organization was just going to have to reformat their entire business with AI-driven technology.
And now, obviously, agent-driven technology. Over time, that would swallow the entire economy. And so it was another one of these moments: okay, that's an infinite business, building out AI applications and AI deployments for large enterprises and governments.
I think a lot of people don't realize that you guys are in the middle of this transformation. They still think of Scale as the data labeling company. But if you fast forward 10 years, do you think most of Scale will actually be the agent business?
Yeah, it's growing much faster at this point. I think it's an infinite market. The crappy thing about most markets is that they have a pretty shallow S-curve. But then you look at hyperscalers, or these mega-cap tech companies, and they just have these ridiculously large markets.
So you really want to get into these infinite markets. Our strategy so far has been to focus on a small number of customers and be quite selective. So we work with the number one pharma company in the world, the number one telco in the world, the number one bank, the number one healthcare provider.
And we work a lot with the U.S. government, the Department of Defense and other government agencies. And the whole thing is: how do we take a very focused approach towards building stuff that resembles real, differentiated AI capabilities?
And all of this, I think, sounds somewhat right, but we have this multi-hundred-million-dollar business in building all these applications. By my count, it's one of the largest AI application businesses in the industry; it's certainly what our investors tell us. And it's fueled by our differentiation in the data business, because our belief, fundamentally, is what we talked about before: the end state for every enterprise or every organization is some form of specialization imbued to them by their own data.
Our day jobs historically have been producing highly differentiated data for these large-scale model builders in the world. And then we can apply that wisdom, that capability, and those operational capabilities towards enterprises and their unique problem sets, and give them specialized applications.
Honestly, it kind of sounds like Palantir at the most zoomed-out level, if you squint, in that you're a technology provider. We're a technology provider to some of the largest organizations in the world, with a focus on data.
And I think the key difference is that Palantir has built a real focus around these data ontologies and really solving this messy data integration problem for enterprises. Our whole viewpoint is: what is the most strategic data that will enable differentiation for your AI strategy, and how do we generate or harness that data from within your enterprise towards developing that?
I guess you will end up being pretty big competitors in another five, ten years. But for now it's basically so greenfield. I mean, if it's an infinitely large market, you might not ever meet, actually. Which is interesting.
Yeah. I think in practice now, frankly, we're more partnered with Palantir than competitive with them. Makes sense. Yeah, well, that's because the problems at these giant organizations are actually so massive and intractable that they throw up their hands; they have no shot at ever hiring people who could possibly solve the problem.
But a company like Scale or a company like Palantir can actually hire kind of the same people who would apply to YC. The through line in my head right now is realizing there's plenty of capital, and the limiting reagent is actually really great, technical, smart people who are optimistic and actually work really hard. There are not enough of those people.
That's true for the world. And by the way, I think one of the cool things about agents, as we were talking about before, is that all of a sudden those people get near-infinite leverage. So I think that bottleneck gets exploded now, hopefully, due to AI again.
I think, just like how in cloud AWS is the largest by far but there are so many other cloud providers, it's not a winner-take-all kind of business per se, and it doesn't have to be. Yeah, exactly. And I think it's just too big of a market to even be close to winner-takes-all.
There's no single organization that could have the operational breadth to swallow the whole market. Talking about operations: you clearly are living in the future, which is super cool. I'm sure you're running Scale with all these agents and tools already to make it very efficient.
Could you share some of the things that you're doing internally as a company, and agents you're adopting so you can do more with fewer people? You know, we saw this early, because the model developers were starting to develop agents using reinforcement learning, actual reasoning models where the models could really do end-to-end workflows.
We were responsible for producing a lot of the datasets that enabled the agents to get there. And then we saw just how effective that training process is. I think the efficacy of reinforcement learning for agent deployments is pretty insane.
So once we realized that, we realized, okay, if you can turn existing human-driven workflows into environments and data for reinforcement learning, then you have the ability to convert these human workflows, especially ones where you're okay with some level of faultiness and a certain level of reliability, into agentic workflows.
So there are all sorts of agent workflows that happen in our hiring processes and our quality control processes, and that automate away certain data analyses and data processes, as well as various sales reporting.
It's embedded in every major org of the company, and the whole thing is really a mindset: can you identify these very repetitive human workflows and undergo this process where you convert them into datasets that enable you to build automation tools?
What do these datasets actually look like? I mean, for browser use, is it an environment, and then, you know, here's a video of a human going through the process of filling out a form and deciding yes or no on a dropdown or something?
I mean, what's a concrete example, just for the audience? One of the processes that we go through is: you'll take a full packet from a candidate, and you'll want to distill that into a brief of some sort that gives all the salient details about that candidate, for decision by a broader committee.
And these kinds of cases, broadly speaking, "deep research plus one" kind of things, are the lowest-hanging fruit. Can you take these processes that more or less look like: you have to click around a bunch of places, pull a bunch of pieces of information, blend them together, and then produce some analysis on top of that?
That fundamental information-driven analysis process is the easiest thing to drive via agentic workloads, and the kinds of data you need are, well, we call them environments. But usually it's just: what is the task? What is the full dataset that's necessary to conduct that task? And what is the rubric for how you conduct it effectively?
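To make that concrete, here is a hedged sketch of what one such environment record might contain, following the task / dataset / rubric breakdown Alex describes. The schema and field names are illustrative assumptions, not Scale's actual format:

```python
# Hedged sketch of an "environment" record for an agentic workflow:
# a task, the dataset needed to do it, and a rubric for grading the result.
# Field names are illustrative, not any real schema.
from dataclasses import dataclass, field

@dataclass
class RubricCriterion:
    description: str  # e.g. "covers the candidate's most recent role"
    weight: float     # contribution to the overall score

@dataclass
class TaskEnvironment:
    task: str                     # what the agent must accomplish
    context_documents: list[str]  # everything needed to perform the task
    rubric: list[RubricCriterion] = field(default_factory=list)

    def score(self, judged: list[bool]) -> float:
        """Weighted rubric score; in RL this would act as the reward signal."""
        total = sum(c.weight for c in self.rubric)
        earned = sum(c.weight for c, ok in zip(self.rubric, judged) if ok)
        return earned / total if total else 0.0

env = TaskEnvironment(
    task="Distill a candidate packet into a one-page brief for the committee.",
    context_documents=["resume.pdf", "interview_notes.txt", "work_sample.md"],
    rubric=[
        RubricCriterion("covers all salient details", 0.5),
        RubricCriterion("contains no fabricated claims", 0.3),
        RubricCriterion("fits on one page", 0.2),
    ],
)
print(env.score([True, True, False]))  # 0.8
```

In a setup like this, the rubric score is what closes the loop: it grades each agent attempt, which is what lets reinforcement learning improve the workflow rather than just replay it.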
Do you need RL and fine-tuning when prompt engineering and meta-prompting seem so good? I think that, yeah, as the models get better, prompting will get better. But prompting gets you to a certain level, and then reinforcement learning gets you beyond that level.
And actually, this is a good point. I think probably most of the time in our business, it's mostly prompting that just works really well. I mean, that's the weird thing: oh shoot, you don't have to crack open the models, and frankly the next models are going to be so good, and then the evals are mainly about picking which model, or at what point you switch to the next one.
I do think startups basically need a strategy for how they will walk up the complexity curve, so to speak. Whatever product or business you build needs to really benefit from the ability to race up this complexity curve, which is the broader curve of capability of the models.
I mean, you actually created this leaderboard that has a lot of these super hard tasks that are trying to get at this next curve of reasoning. Can you tell us about it? One of the things that we built, in partnership with the Center for AI Safety, is Humanity's Last Exam. It's a funny name.
I think, unfortunately, there will be yet another exam beyond it. But the idea was: let's effectively work with the smartest scientists in the field. And we worked with many brilliant professors, but also a lot of individual researchers who are quite brilliant.
And we just collated and aggregated this dataset of what the smartest researchers in the world would say are the hardest scientific problems that they've worked on recently. They were able to solve the problems, but they're the hardest problems that they're aware of and know of.
I was curious how you came up with these problems. So each of the professors contributed new problems. These are problems that have never appeared in any textbook or any exam, ever. They just came out of their brains; they typed up a new problem from scratch.
Am I saying this right? Yeah, yeah. And the general guidance was: what has come up recently in your research that you think is a particularly hard problem? Right. The problems are stupidly hard, incidentally. They're insane. I don't know if you guys have looked at these problems. They're totally crazy.
Yeah, it's totally crazy. And by the way, they cannot be searched on the Internet. You need to have a lot of expertise and actually think about them. Yeah, for quite a long time. Yeah, they require a lot of reasoning.
Right now we have a time limit where the models can only think for, I think, 15 minutes or 30 minutes. And one of the most recent requests from one of the labs is: can you please increase that time limit to a day, so that the model has up to a day to think about the problems?
But yeah, they're deviously hard problems. Unless you have expertise in the specific problem, you probably don't have a chance of getting it right. But even on this evaluation, when we first launched it, and this was just earlier this year, the best models were scoring like 7%, 8%.
Now the best model scores north of 20%. It's moved really, really quickly. Do you think we're going to get benchmark saturation for this one as well? I think eventually, yeah, it'll be saturated, and then we have to move on to new evaluations.
I mean, I think the saving grace for the naming is that it is the last exam. The new evals will be real-world tasks, real-world activities, which are fundamentally fuzzier and more complicated.
Have you solved any of the problems yourself, Alex? I know you were a competitive math person for a long time. Yeah, yeah. I mean, the math problems require a lot; they're very deep in the fields. I managed to get a handful, but most of them are hopeless for me.
Yeah, I looked at the ones that the models can solve. So that was one of the evals, and we've produced a number of other evaluations. But I think the AI industry really continues to suffer from a lack of very hard evals and very hard tests that show the real frontier of model capabilities.
When you build an eval that becomes popular in the industry, it has this deeper effect, which is that it's all of a sudden the North Star and the yardstick that researchers are trying to optimize for. And so it's actually this very gratifying activity. We built Humanity's Last Exam, and all the model providers will always report their results.
There are tons of researchers who are really motivated by doing a good job. And the models are going to get, you know, deviously good at frontier research problems, I guess. Sam's starting to talk about Level 4 of AGI, Innovators, coming, and that's the prognostication for the next year.
Do you think that's correct, that the next 12 to 24 months is really the moment when literally new scientific breakthroughs come from the operation of reasoning in these models? I mean, I think it's super plausible in fields like biology. And this is probably one of the ones that comes up the most.
There are probably intuitions that the models have about biology that humans don't even have, because they have this different form of intelligence. Right. And so you'd expect there to be some areas, some fields, where the models have some fundamental deep advantage versus humans.
And so I think it's very realistic to expect it in those fields. Biology, I think, is the clearest one for me. It already happened for chemistry last year: the Nobel Prize went to the Google team, Demis Hassabis and John Jumper, for AlphaFold.
Yeah, exactly. That was a huge jump. Before that, there was this competition where they were trying to get more protein folding structures solved, and progress was abysmal, and AlphaFold destroyed it. It's a strange time to be a scientist, but an exciting time for science.
There's this short story that talks about this future where there are effectively AIs conducting all the frontier R&D research, and what scientists do is just look at the discoveries that the AIs make and try to understand them.
Yeah, I mean, it's a very exciting time to see how the frontier of human knowledge expands. And I think that'll be great, because in areas like biology it will fuel breakthroughs in medicine and healthcare and all these other things.
And then the majority of the economy will chug along, giving humans what they want. China open-sourcing, or DeepSeek open-sourcing, their models is another very interesting question. How does that play out? And there's this awkward sort of thing that the best open-source models in the world now come out of China.
I mean, that's sort of this awkward reality to contend with. What do you think we can do to make sure that it's the American models that are ahead, or is that written in the stars? You know, the simplest explanation for me about why the Chinese models are so good is espionage.
I think that there are a lot of secrets in how these frontier models are trained. And when I say secrets, it sounds more interesting than it is, but there's just a lot of tacit knowledge. There are a lot of tricks and small intuitions about where to set the hyperparameters, ways to make these models work and to get the model training to work.
The Chinese labs have been able to move so quickly, accelerate, and make such fast progress, whereas some even very talented US labs have made progress less quickly. And I purely think it's because a lot of the secrets about how to train these models leave the frontier labs and make their way back to these Chinese labs.
I think the only way to model the future is that China has pretty advanced models. You know, the solace right now is they're not the best models. They're sort of like a half step behind, let's say. But it's tough to model what will happen when it's sort of truly neck and neck.
We're very behind on energy production, which is just pure regulation; that could be fixed in two seconds, but it hasn't been yet. That's a huge problem. I mean, not that the past will be a predictor of the future, but if you look at what US total grid production looks like, it's flat as a pancake.
And if you look at Chinese aggregate grid production, it's doubled over the past decade. It's just straight up and to the right. I saw that and it's astonishing. I mean, that's just a policy failure.
In China, the vast majority of that is coal, and coal is growing there, whereas in the United States renewables have actually grown a lot. But renewables trade off against fossil fuels. So we've done a transition of our energy grid, whereas they're just continuing to compound.
So we have this issue on power production. We're advantaged in chips, and I think, net net, we will come out ahead on compute. If you look at data, and this goes towards a lot of the questions you've been asking, I think China is fundamentally very well positioned on data.
It's weird to say, because obviously we help all the American companies with data. In China, they can ignore copyright and other privacy rules, and they can build these large models with abandon. And then the second issue is that there are actually large-scale government programs in China for data labeling.
There are seven data labeling centers in various cities that have been started up by the government itself. There are large-scale subsidies for AI companies to use data labeling, a voucher system. In fact, there are college programs. One of the interesting things is that in China, employment is such a large national priority that when they have a strategic area like AI, they'll figure out, okay, what are all the jobs, and they'll create these funnels to create those jobs.
And we're seeing this in robotics data too, where in China there are already large-scale factories full of robots that just go and collect data. And strangely enough, even a lot of US companies today actually rely on data from China in training these robotics foundation models.
Long story short, I think China likely has an advantage on data. And then on algorithms, the US is on net much more innovative, but if espionage continues to be a reality, then you're basically even on algorithms. So it's hard to model.
But I think it's probably 60-40, 70-30 that the United States has an undeniable continued advantage. And there are a lot of worlds where China just catches up or potentially even overtakes. I mean, the scary thing for me is, you know, watching Optimus, or, YC has some robotics companies like Weave Robotics, and when we look at those things, the software can be as good as or better than anything coming out of China.
But when it comes to the hardware, the BOM cost over here is 20,000 to 30,000 bucks. We can't even make high-precision screws over here, and over there the same robot, the embodied robot, could be made for, I don't know, two, three, four thousand dollars.
Right. You just walk down a street in Shenzhen and they've got it, you know. And so how do you compete against that at the state level? The degree to which China is incredible at manufacturing, that's a very big problem.
And it relates to defense and national security. It's a fundamental issue, because on some level defense and national security will boil down to which countries have more things that can deter conflict or, you know, can shoot other things down.
Yeah, I don't think it's going to be fighter jets and aircraft carriers anymore. It's probably going to be this micro war; it's hyper micro. It's drones and embodied robots. Yeah, exactly. Drones, embodied robots, cyber warfare. The Cold War-era philosophy was that you build bigger and bigger bombs.
It's the exact opposite of that. It's actually the fragmentation and move towards smaller, more nimble, attritable resources. That's one of the big-picture trends, I would say.
And the other big-picture trend is what we believe, which is the move towards agentic warfare or agentic defense. Basically, if you actually mapped out what warfare looks like today, the actual process of a conflict.
If you look at Russia-Ukraine or other conflict areas, the decision-making processes are remarkably manual and human-driven. All these very critical battle-time decisions are made with very limited information, unfortunately, in these very manual workflows.
And so it's very clear that if you used AI agents, you would have perfect information and you would have immediate decision making. And so we're going to see this like huge shift towards agent-driven warfare and agent-driven conflict. And it has the potential of turning these conflicts into these like almost incomprehensibly fast-moving kinds of scenarios.
And that's something that you guys are actively working on, right? Is there anything that you can talk about? I assume some of it is classified. Yeah, so one of the things we're doing is building this system called Thunderforge with the Indo-Pacific Command out in Hawaii.
It's responsible for the Indo-Pacific region, and it is the flagship DoD program for using AI for military planning and operations. So we're basically doing exactly what I said: we take the existing human workflow. The military works in what's called a doctrinal way; they're governed by the doctrine of this very established military planning process.
And you just convert that into a series of agents that work together and conduct the exact same task, but it's all agent-driven. And then all of a sudden you turn these very critical decision-making cycles from 72 hours into 10 minutes. It kind of changes it from, you know, when you play chess versus a human, they spend all this time thinking; it's this slow game. If you play chess against a computer, the moves come back immediately. It's this sort of unrelenting form of warfare.
I mean, some of it is like that. Being able to see the chain of thought immediately is the most powerful thing. Yeah, because I don't want the answer; I want to see how you got there. Actually seeing the reasoning itself was so powerful. That's actually why the launch of that first DeepSeek was way more interesting: I think o1 had come out, but they hid the reasoning, and it was like, no, the reasoning is actually a really important part of it.
And the only reason they hid it was they didn't want other people to steal it, which they did anyway. I think that's another interesting thing about this space: so far, you could really model it as, there are advanced capabilities, and you can try to keep those secret and closed, but they open up over time, kind of no matter what you do.
Well, clearly, Alex, you've done a lot of incredible things and transformed your company multiple times, and you have all this deep subject-matter expertise in many areas. You're clearly hardcore. Is there advice for the audience on how to be more like you?
You know, I think the biggest thing is you just have to really, really, really care. And it's like a folly of youth, in some ways, that when you're young, almost everything feels so astronomically important that you try immensely hard and you care about every detail.
Everything matters just way more to you. And I think that trait is really, really important, and it exists in varying degrees in different people. So I wrote this post many years ago called "Hire People Who Give a Shit." And it really is pretty simple.
You notice, when you interview people or when you interact with people, you can tell the people who just phone it in versus the people who hang onto their work, for whom it's so incredibly monumental and forceful and important that they do great work.
And it eats at them when they don't do great work, and when they do great work, they're so satisfied with themselves. So there's this magnitude of care. And one of the greatest indicators of how much I enjoyed working with people, or, frankly, how successful they were at Scale, was really just: to what degree is their soul invested in the work that they do?
And so if you were to pick one thing, that probably is the unifier in some way. I care a lot. I care a lot about every decision we make at the company. I still review every hire at the company.
We have this process where I approve or reject literally every single hire at the company. So I care immensely, and I work with all these people who care immensely, and that enables us to feel much more deeply what happens in the business.
And as a result, we'll change course more quickly, we'll learn more quickly, we'll take our work more seriously, we'll adapt more quickly. And I think that's been quite important to the success that we've had.
Alex, you were telling me a story recently that stuck with me, about how, quite recently, even when Scale was a very large company, you were personally hand-reviewing the data that was being sent to partner companies, basically being the final quality control.
Like, you know, that data point is not good enough. Yeah, exactly. I think a lot of founders would probably not agree with this, but what your customers feel, when your customers are happy or sad, it really gets to you.
When you have unhappy customers, it's personally a very painful thing. Broadly speaking, we have this value at our company: quality is fractal. And I do believe that high standards trickle down within an organization.
It's very rare that you see an organization where standards increase as you get lower and lower down in the organization. Most of the time, when people realize their manager, or their manager's manager, or their director or whomever doesn't really care, that removes the deep desire to care.
And so it's incredibly important that high standards and this deep care for quality are a deeply embedded tenet of the entire organization. Founder mode, man. Founder mode, man. We've got to have you back. Thank you so much for spending time with us. Sorry, we're out of time, but we'll see you next time.