Discussion about this post

Imran:

Just to be clear, the dollar figures are not comparable.

Altman - scaling to a new frontier of parameter sizes

Wenfeng - copying OpenAI outputs to train a similar model

Pan - using the DeepSeek technique to train a toy model

Chen - using the DeepSeek technique to train an even smaller toy model

swiley:

It looks to me like they're just tuning Qwen2-VL on counting and a couple other visual tasks.

I don't know if you've tried it, but it takes surprisingly few batches to tune a language model.
