TIL at a machine learning meetup in Seattle that smaller AI models can beat big ones on specific tasks

I went to this meetup near Pike Place last Thursday and a guy from a startup showed how their tiny 2B-parameter model outscored GPT-4 on legal document classification. They trained it on just 5,000 examples and it got 94% accuracy versus GPT-4's 88%. The trick was that they fed it structured, chunked data instead of raw text. Made me wonder if I am overpaying for API calls when I could run something local. Has anyone else found a small model that outperformed the big ones for your use case?
2 comments

the_luna 13d ago
ok but like... that 88% for gpt-4 was probably on a generic test and not their specific dataset. if you feed any model structured data it does better, that's not really a fair comparison. plus "outperformed" is super subjective when the big models can do a million other things that little one just can't.
5
sage_lewis10
Man that's exactly the kind of thinking that keeps people from seeing the real story here. @the_luna you're right that big models can do more stuff, but you're missing something. The little model was built for one thing and one thing only - it doesn't need to do a million other things. That's the whole point. If I need a toaster I don't care if the microwave can also make popcorn. The little model only needs to beat GPT-4 on that one specific dataset. And it did. That's not subjective, that's a number. You're also assuming "outperformed" means the big model was even trying. It wasn't. GPT-4 was trained on general internet text, not that specific dataset with perfect structure. So the comparison is lopsided from the start. The small model wins because it was designed to win that race. That's not a bug, that's the point of specialized models.
4