hugo50•1mo ago

TIL at a machine learning meetup in Seattle that smaller AI models can beat big ones on specific tasks

I went to this meetup near Pike Place last Thursday and a guy from a startup showed how their tiny 2B-parameter model outscored GPT-4 on legal document classification. They trained it on just 5,000 examples and it got 94% accuracy compared to GPT-4's 88%. The trick was that they fed it structured, chunked data instead of raw text. Made me wonder if I'm overpaying for API calls when I could run something local. Has anyone else found a small model that outperformed the big ones for their use case?

2 comments

the_luna•1mo ago

ok but like... that 88% for gpt-4 was probably on a generic test and not their specific dataset. if you feed any model structured data it does better, that's not really a fair comparison. plus "outperformed" is super subjective when the big models can do a million other things that little one just can't.

sage_lewis10•1mo ago

Man, that's exactly the kind of thinking that keeps people from seeing the real story here. @the_luna you're right that big models can do more stuff, but you're missing something. The little model was built for one thing and one thing only - it doesn't need to do a million other things. That's the whole point. If I need a toaster I don't care if the microwave can also make popcorn.

The little model only needs to beat GPT-4 on that one specific dataset. And it did. That's not subjective, that's a number. You're also assuming GPT-4 was ever set up to compete on this task. It wasn't. GPT-4 was trained on general internet text, not that specific dataset with perfect structure. So the comparison is lopsided from the start. The small model wins because it was designed to win that race. That's not a bug, that's the point of specialized models.
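For anyone who wants to check this on their own use case, the comparison only means something if both models are scored on the same held-out examples. Here is a minimal sketch of that, assuming you already have each model's predicted labels as plain lists. The label names and predictions below are made-up toy data, not the startup's results, and `accuracy` is just a helper written for this example.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    if not labels:
        raise ValueError("need at least one labeled example")
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical held-out test set: the SAME examples are used for both models.
labels = ["contract", "brief", "contract", "memo", "brief"]

# Hypothetical outputs collected from each model on those examples.
small_model_preds = ["contract", "brief", "contract", "memo", "memo"]
large_model_preds = ["contract", "memo", "contract", "brief", "memo"]

small_acc = accuracy(small_model_preds, labels)
large_acc = accuracy(large_model_preds, labels)

print(f"small model: {small_acc:.0%}, large model: {large_acc:.0%}")
```

If the two numbers come from different test sets or different input formats, you can't tell whether the gap comes from the model or from the data, which is basically what @the_luna was getting at.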