Qwen3 Coder Is HERE — In-Depth Testing (Games, UI & Web Dev)

मूल वीडियो सामग्रीवीडियो बड़ा करें
  • Exciting news: Quin 3 Coder has finally been released!
  • Speed improvement: Noticeably faster token generation compared to previous versions.
  • User interface design: The UI and features seem to surpass previous models significantly.
  • Some issues: A few functional glitches were noted during testing, particularly with the calculator.
  • Overall impression: Impressive outcomes overall, especially for UI generation in coding tasks.

Now I suppose I should verify. Look at the way that goes translucent when you drag it. Alright, let's check out our first-person shooter dubbed Neon Strike.

So, wow, Quin 3 Coder has been released. This is very exciting as a lot of us have been eagerly awaiting this day. Now truthfully, I have also been eagerly awaiting doing a video on this, and it's not actually posted on Hugging Face yet.

So as much as I want to start out by taking a technical look and then reading through the model card and things like that, as of right now, the only place that this is actually available for myself to test is just through the chat Qun AI where we can see right here that Quin 3 Coder is listed as a strong coding agent capable of long horizon tasks.

The only other information I have about this model comes from some local llama subreddit posts where supposedly it has a maximum context length of over a million tokens, which is fantastic. In addition to that, some other folks have mentioned that it is a mixture of experts model with a total size of 480 billion parameters with 35 billion active.

So basically, let's just go ahead and jump in right here and begin testing Quin 3 Coder.

So those of you who have kept an eye on my channel, thank you. But two, you'll notice that I did just make a video on the updated version of Quin 3.235B like 12 hours ago, so I have not slept. With that, I do apologize for any erratic behavior that we may notice in the course of this video.

I have just gone ahead of course and began with a tried and true tested approach asking it to generate a web-based operating system using HTML, JavaScript, and CSS.

First impressions, and what I will notice right here is that this thing is far, far faster in terms of the actual token generation speed than what I noticed when I was testing the newly updated version of Quin 3.235. This was extremely fast.

I will say, first and foremost, I noticed that once again as the Quin 3.235B did, it just goes ahead and generates it all entirely in one HTML script, which is not the best if you are going to use this in like a production environment because you would want the script segmented out to actually be able to more easily understand and work with them.

However, we're just going to go with it.

Let's take our first look at the Quin 3 Coder result for a web-based operating system. Holy crap. Okay, yeah, this is definitely the right and the new model, this is based on what I see right here. Probably the best result I've seen from doing this prompt period that was previously held by Kimmy K2.

Now I suppose I should verify. Look at the way that goes translucent when you drag it. I have to verify functionality before I give this thing the crown. But this is okay, the windows can be resized. This is a significant improvement in anything I've seen, especially open source.

Wow, okay. It's almost more Ubuntu style. It does have a clock, and that is the correct time. The kind of galaxy background here is quite beautiful. Very Linux aesthetic. It does have a Windows icon there next to the start menu bar, which sometimes does happen.

Okay, so that's a little hard to see, but let's just go ahead and... Okay, settings doesn't do anything. Terminal, of course, it works. And it's green. I need to type help because I would like to see the available commands.

Okay, so help, clear, date, echo, who am I, about? Let's do who am I? User. Yep. And that is correct actually from like a Linux standpoint. Don't quote me on that.

So let's do about a web-based operating system simulation. Let's check the date. Okay, looks right to me. And then clear will obviously, hypothetically, clear this terminal. Quite well done.

Wow, okay, let's try our calculator. If we double click this. Damn, that's good looking. And look at the way that this goes translucent when you actually drag it. This is really quite good.

Alright, let's do 84 times 6, which would be 504. Oh, it didn't work. 84 times 6? Okay, so 4 plus... Alright, there is a slight error in the calculator where seemingly if you press one of the buttons to actually perform some form of arithmetic, it just resets the number to zero and then doesn't actually do anything. So I will say that's a dig on this. Functionally, I've not had one mess up a calculator before. I don't believe so; it is what it is.

Let's take a look at the text editor. Oh, beautiful. It even tells us you can type any. Anything. No, I'm kidding. I'm not gonna. Hey.

Yeah. Alright, well that checks out. Can I delete this? I can. Interesting.

Now I noticed Quin 235B, the updated version that I just tested, included a web browser in this. I don't see that here, but that is okay.

File Explorer. Okay, so these don't actually open when you click them. That's alright. And then we just kind of have these things. Interesting. It actually saved the thing that I typed in the text editor, even though I closed it.

Let's just verify that these do go in focus. When you drag one, it means it goes over the others, which it does. Yeah. This is really very, very good.

It seems just based off of a first initial result. I would say that this has a very nice grasp on actual like UI and front-end development. Just based off of what I'm seeing right here.

And if we open the terminal, that shows up there as well. Alright, I think what I'm going to do now is try some of these other options. So settings didn't do anything. Let's see what shutdown does. Okay, that doesn't do anything.

Which, I mean... Oh, I didn't do the... Okay. You can't open multiples of them. Alright. That's probably for the best, but that is a fantastic result.

We're now going to test this with a web-based 3D racing game in the style of Low Poly. A first-person view from the driver's seat of a car, an opponent, and some simple options to pause, restart, etc.

I will say this is a prompt that has like it's tough for the models. Kimmy K2 did a decent job with this but the gameplay logic was never quite there if you will. And the updated variant of Quin 235B that I just tested this with also struggled with this to the point where the second iteration of it was actually worse than the first.

So again we're seeing the token speed here is extremely fast. I just saw like something a variable for sunlight or something. So I'm interested to see if this tries to implement some form like skybox or pretty lighting if you will. And again.

So again, I do want to make note that I test these in Google Chrome just because it seems a little more kind of like supported.

Algh. We have a nice little retro aesthetic right here. Alright, well the controls menu here is very thorough. That is good.

Let's... Alright, so it is quite slow and in addition to that, not the best, not the worst but not the best. So in I will give this of course another of space is the handbrake. I don't think this is going to do anything at one mile an hour.

Algh. So pause. Okay, that did work; actually, that's a decent looking pause and restart menu. Okay, resume does work. Let's just try to restart the race. Okay. That did work.

Let's go ahead and see how the Model 1 fixes a previously generated result. Being that this supposedly does have an insane context length, that should very well be able to handle that. But two, let's see how it takes some somewhat harsh user critique.

O, it does show me a preview right here in the artifact window. I would like to see if it is also garbage in this window. Okay. It is. So alght.

So I've just kind of given it some critiques of things that it can perhaps go ahead and improve. Now I'm interested to see.

I apologize for the disappointment. You're absolutely right. Far from the engaging racing experience.

Okay, so it's supposedly going to do much better. Now, of course, we try not to leave really mean comments like this to humans, but for an AI, it's probably okay. So let's just go ahead and see how it's doing.

It legitimately generated the exact same thing. Part of me almost has to wonder if it kind of did this on purpose because I was such a jerk to it in my comment. Obviously unlikely, but possible.

This script is shorter, so it is a bit different.

Algh, let's... Hey, you know, that is what.

We can go faster. You can. Hey, you know, this is a better result. Better, of course, is a metric that is judged based upon the previous results.

So this is better. Yeah. Algh. Well, we're probably gonna leave this test for maybe like Kimmy 3 or Quin 4 or something of the sort, but it did improve upon its previously generated result.

So this is going to be... I don't know how to quantize this. Quantify this. I've pasted in the entirety of The Great Gatsby, but I'm also asking it to use this text to generate an intricate platformer game for the Web.

I'm not 100% sure what this is going to come up with here, but this will definitely test some of its context capabilities as well as its capability to be creative. I suppose I don't, you know, I don't want to look at the code because sometimes I look at the code and because I'm such a coding beast, I just realize what it's gonna look like. So I'll look away.

Algh HTML. Okay. The Great Gatsby. Navigate the dazzling parties and shadowy valleys of the Jazz Age. Collect golden opportunities and reach the green light at the end of Daisy's dock. Use the arrow keys to move and jump.

Let's do it. Oh, so we have a supposedly fixed iteration right here. Let's just go ahead and see if this has...

Alright, well. Oh, the party is over. The lights are flickering out. I like that. That's the jump. The jump logic is a little messed up, I will say.

Alright, well, it was more of a test of like, can it actually parse this entire book that I pasted into it and then use that to make a game? Which it did.

Now, truthfully this is probably an antiquated test, but I do have some nostalgia for it. So we're going to go ahead and just try to get this to generate a simple website for Steve's PC Repair.

Beautiful. It did a few things that I like to note. So I think from a business perspective, the yellow for the buttons that would draw the user in to essentially spend money at your service are done in yellow such as contact and get a free quote. They do have some nice hover effects as well.

In addition to that, this background image that it shows is definitely, definitely quite good. We have a vintage handheld gaming device here as well as a... Is that a Commodore? The Pet? I don't know; I may be wrong. I was not alive when that was built.

So we have okay hover effects on these cards for our solutions. Clicking them doesn't give us anything. But all of the little... I don't know what you'd refer to these as.

These little emoji icon things are kind of correct for the specific thing they are referencing. About Steve's PC repair. I love fake origin stories. Found in 2010 by Steve Johnson. Been serving our local community with Honest Reliable for over a decade.

He's a certified computer tech with 15 years of experience. Very good. And then they have like a thing here that's alright. Oh, fake testimonials. This is good.

This is what we want. Steve saved my business laptop when no one else could. Thank you, Michael Johnson. Kind of suspicious that he has the same last name as Steve though.

Algh, 'bring my family's computers here for years. David Peterson. Very cool. Get in touch. 123 Tech street in San Francisco. We do have a fake phone number, a fake email address, some business hours.

That's very cool. Steve's closed on Sundays. I don't know about that. I feel like you could get a lot of business on Sundays.

Even that. This contact form is actually quite aesthetically pleasing. And the drop-down here for the actual service needed.

The footer, this is a properly done footer, which I noticed is kind of... you would think it's like a simple thing, but it's very hit or miss when doing these sorts of testing. So this is a GLM like footer which GLM was very good at doing these footers.

So we have our quick links. The social media icons are all here incorrect which is nice to see. The contact info for some reason did copyright 2023. That's also hit or miss. Some models do 2023, some do 2025. I don't think I've seen 2024, which is interesting.

Algh, this is definitely a solid result for a simple test. I am very pleased with this.

I do want to do another kind of web test for it so I'm asking it to do a startup page for my SaaS startup called GPU for rent. I need to draw in both prospective clients and investors with this.

So it needs to be functional and aesthetically pleasing as well as utilizing the most up-to-date implementation of psychological and subliminal marketing and colors. I am glad to see that it wasn't like hey that's not right, you shouldn't do that. And it just went ahead and started generating.

So I'm INTERESTED to see this. And then yeah, okay. So it did specifically mention some of the psychological and subliminal marketing elements such as color psychology, floating elements, subtle animations, gradient text, rounded corners, and layered backgrounds.

Alright, I'm for it.

Alright. I mean this does definitely look fitting for a cloud startup. Our software is a service startup. GPU for rent. We do have a little microchip emoji.

Oh wow. I don't normally swear in these videos when I see things, but that is really the hover effect there.

Not only does the text change but it has the little line under it that is actually... that's quite impressive.

Alright, so it tells us about the high-performance GPUs. Now these cards don't have hover effects, but that is okay. It does tell us here 24/7 support. These are all things that would kind of help.

Awesome. This looks fantastic. This right here looks really good. The way I put the most popular banner on the middle one right here.

Okay. A V100. Is it V100? Is that a real thing?

So of course we have the Nvidia V100 GPU right here. A very popular offering. The A100 right here. It even gives additional information for the actual system that the card is contained within.

So 28 very good. Get started. Get started. Contact sales. Interesting. So for enterprise, it has contact sales but for starter and pro, it has get started.

Trusted by our fake testimonials. Sarah Johnson revolutionized our ML training pipeline. Michael Chen, CTO at RenderPR Pro Studios. Scalability. David Rodriguez, he's a founder. We couldn't afford expensive hardware. No, I'm kidding, sorry.

Like I said, I'm extremely overtired, so. Alright, start your free trial. Again, the footer is very good. Look at this.

Designed with love for the future of computing. That's like this sort of like, like tool, like kind of snippet you would see on one of these SaaS startup websites.

This really did a very good job capturing this. Again, we have 2023 which is. That's fine. And the social icons here, Discord. Interesting that it put that in GitHub. So it actually changed these comparatively to Steve's PC repair to be more fitting of this style of business where it put a GitHub and then a Discord logo here instead of like a Facebook or whatever the other one for Steve's was.

This is really quite good. And then obviously these are just kind of linked to the Blob or whatever it's referenced as on the page for. This is very good at UI design.

I'm giving it the most lackluster and basic prompt here. Just saying, generate a web-based music sequencer.

This test is obviously just kind of more like quick and fast because I want to just do a wide variety of different types of testing here with this in a short amount of time.

Alright, let's take a look. That is actually really quite good looking. Even the way it put these logos or emojis for the different instruments here.

This is a... I will say this model is a significant improvement in its ability to generate UI by a long, long shot. That is something.

Alright, so for the last thing I'm going to do, because to be honest with you, I could sit here and test this for hours as I'm very much enjoying it and it is very quick to generate.

I want to give it a vague prompt but just see what it comes up with. So I'm going to ask it to generate a web-based FPS, which of course is first-person shooter and we'll just go ahead and see what it actually comes up with.

I know folks will more likely want to see a model like this tested in an agentic coding scenario, but definitely for a first initial hands-on test, I personally like testing it this way better as we can do a lot more things as opposed to just focusing on one repository through Rotodde or something like that, which is... it's harder to get a feel for the flavor of the model's capabilities in doing that, at least from a personal opinion of mine.

Alright, let's check out our first-person shooter dubbed Neon Strike.

So wow. That... oh man. No, I can't use the... Interesting. So the mouse does not do anything except if you click it does shoot. But I...

Algh. Let's take a second here. Actually, this is quite good. It just generated this in I think like 900 or so lines of code right here. 803 lines of code that it generated this in.

And it's actually we have a health bar; we have ammunition that actually properly updates based on how we use it. The only thing I see is that the actual aim does not move with the mouse. So you can click to shoot but the aim does not work.

I have run out of ammo and my weapon is the pulse rifle. Look at the... I don't know how well you'll be able to see it, but the actual little icon to describe the weapon.

To show your cursor, press escape. Oh no. Okay, so then... And I don't think I can get any of these because...

Alright, so unfortunately the only issue with this is that I cannot actually move the aim here with the mouse. If I could do that, this would really be like a... I mean even so, just being able to do this with an open-source model in one single prompt, that was like truly a lackluster prompt.

This is really quite impressive.

Alright, well truthfully that is going to probably conclude my testing on the newly released Quin 3 Coder model. As I had mentioned in the beginning of this video.

Unfortunately, it is still not actually like officially released, at least here on Hugging Face. So there is no model card for me to be able to read off or technical things to talk about.

Being that that is the case, I'm going to go out on a limb here and perhaps guess I will say that this is going to benchmark in terms of like performance.

I would say the benchmark JPEGs are probably going to show this higher than all the other open-source models in terms of like coding things. I don't know how it will compare to closed source, and I don't want to take a guess because it's very likely that I'll end up incorrect.

But with that, I will say first impressions of this model, I can't really talk too much about like speed or anything like that. Being that I am using this through a web interface.

This is of course something that can be run locally. But if those rumors are true that this is like a 480 billion parameter MOE model with 35B active. Don't quote me on that. I forget what I had mentioned in the beginning.

It will take a relatively beefy system to run this, but a model like this is something that you could probably get away with if you are patient. A DDR5 heavy RAM build.

So those are not super expensive, at least comparatively to having a bunch of GPUs in a system. But the performance here is very good.

I will say that there are marked improvements that I've noticed here in its ability to do UI. Every single test that I did had a fantastic UI, especially compared to previous QUIN models that I've used.

But even like the GLM models and things like that, this is very, very impressive in terms of its UI generation.

I will say its creativity seemed interesting with the Gatsbyed game. Again, these are like simple and fast prompts just to kind of get an init.

Think of this as like a... like you go to do a taste test somewhere. It's like the samples at like a Costco or a big box store or something if you're familiar with that, where you just go and try a bunch of different things to kind of get a feel for it.

And that was kind of what I wanted to do with this. So this is really seemingly very impressive and I do look forward to actually seeing a model card here or something like that so there can be more prudent information and benchmarks about this model.

I suppose with that, that is probably going to conclude this. If you have any questions, please feel free to leave them in the comments. And with that, thank you for watching.