Last Week AI Went from $100 Million to $30. This Week It Goes to $3.
Another week, another decimal point to the left.
Last week we reported that UC Berkeley Ph.D. candidate Jiayi Pan and his team successfully replicated DeepSeek's $5 million AI model for $30. This week Liang Chen and a team of computer science Ph.D. students at Peking University and the University of Hong Kong have introduced Deep-Agent/R1-V, a new AI model costing less than $3. Their code, models, datasets, and further details are all available as open-source resources on GitHub. They shared the following about their achievements:
We firstly reveal that Reinforcement Learning with Verifiable Rewards (RLVR) outperforms chain-of-thought supervised fine-tuning (CoT-SFT) in both effectiveness and out-of-distribution (OOD) robustness for vision language models.
In our experiment, we incentivize VLMs to learn generalizable visual counting abilities, rather than overfitting to the training set.
The 2B model outperforms the 72B model in OOD tests within just 100 training steps.
The training was conducted on 8 A100 GPUs for 30 minutes, costing $2.62.
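(Eight A100s for 30 minutes works out to four GPU-hours, so the $2.62 figure implies roughly 65 cents per GPU-hour of rented compute.)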
This new model relies on open-source resources from DeepSeek, Open-R1, QwenVL, Open-R1-Multimodal, CLEVR, and SuperCLEVR.
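The "verifiable rewards" idea is easier to see with a concrete example. Below is a minimal, hypothetical sketch of the kind of reward RLVR uses for the counting task: instead of a learned reward model, the trainer scores each answer with a simple programmatic check. None of these names come from the R1-V repository; they are illustrative only.

```python
# Hypothetical sketch of a "verifiable reward" for a visual counting task.
# Not taken from the R1-V code; names and logic are illustrative only.
import re

def counting_reward(model_output: str, true_count: int) -> float:
    """Return 1.0 if the model's stated count matches the ground truth, else 0.0.

    RLVR-style training replaces a learned reward model with a check like this:
    the answer is either verifiably correct or it is not.
    """
    # Pull the last integer the model mentions, e.g. "... so there are 7 cubes."
    numbers = re.findall(r"\d+", model_output)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == true_count else 0.0

# The scalar reward is what gets reinforced during RL training.
print(counting_reward("I count 3 spheres and 4 cubes, so 7 objects total.", 7))  # 1.0
print(counting_reward("There appear to be 6 objects.", 7))                       # 0.0
```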
Knowledge can go hyper-exponential where it’s free to be discovered and shared.
We just released The Best of 2024. It includes 50 stories about the growth in abundance on our infinitely bountiful planet. The book is a great gift for a young person who might be worried about too many people on the planet. Or for an older person who thinks resources are finite. (Get one for Thanos).
Order yours today. It’s sure to be a collector’s item. Thanks again for reading Gale Winds. We look forward to another year of discovering and sharing all of the creative superabundance around us.
Gale Pooley is a Senior Fellow at the Discovery Institute, an Adjunct Scholar at the Cato Institute, and a board member at Human Progress.
Just to be clear, the dollar figures are not comparable:
Altman - scaling to a new frontier of parameter sizes
Wenfeng - copying OpenAI outputs to train a similar model
Pan - using the DeepSeek technique to train a toy model
Chen - using the DeepSeek technique to train an even smaller toy model
It looks to me like they're just tuning Qwen2-VL on counting and a couple of other visual tasks.
I don't know if you've tried, but it takes surprisingly few batches to tune a language model.
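For what it's worth, the basic supervised loop really is that small. Here is a hedged sketch, not the R1-V code: a hundred steps of next-token fine-tuning on toy counting prompts, using a small text-only Qwen2 checkpoint as a stand-in for the Qwen2-VL vision-language model. The model name, data, and hyperparameters are placeholders.

```python
# Minimal sketch of "a few batches of tuning" on a small language model.
# Model, data, and hyperparameters are placeholders, not the R1-V setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-0.5B"  # small text-only stand-in for Qwen2-VL
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padded batches
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy counting-style examples; a real run would stream a dataset such as CLEVR.
examples = [
    "Q: How many red cubes are in the scene? A: 3",
    "Q: How many spheres are there? A: 5",
]

model.train()
for step in range(100):  # ~100 steps, echoing the figure quoted in the post
    batch = tokenizer(examples, return_tensors="pt", padding=True)
    outputs = model(**batch, labels=batch["input_ids"])  # standard causal LM loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```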