HarHarVeryFunny on Feb 7, 2025
on: Understanding Reasoning LLMs
DeepSeek's approach with R1 wasn't pure RL. They used pure RL only to develop R1-Zero from their V3 base model. For R1 they then went through two iterations of the same cycle: use the current model to generate synthetic reasoning data, SFT on that data, then RL fine-tune, and repeat.
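To make the ordering concrete, here is a toy sketch of the loop as described above. It is not DeepSeek's code. The function `train_reasoning_model` and the stub stages `rl_stub`, `generate_stub` and `sft_stub` are hypothetical, and a "checkpoint" is just a list of the stages applied so far. The comment doesn't say whether each SFT round starts from the current checkpoint or from the base model. Here `sft` receives the current checkpoint, which is an assumption.

```python
from typing import Callable, List

# Hypothetical stand-in for a model checkpoint: the list of stages applied so far.
Checkpoint = List[str]

def train_reasoning_model(
    base: Checkpoint,
    rl: Callable[[Checkpoint], Checkpoint],
    generate: Callable[[Checkpoint], List[str]],
    sft: Callable[[Checkpoint, List[str]], Checkpoint],
    rounds: int = 2,
) -> Checkpoint:
    # Stage 0: pure RL on the base model (the comment's "R0", i.e. R1-Zero).
    model = rl(base)
    for _ in range(rounds):
        data = generate(model)    # current model writes synthetic reasoning traces
        model = sft(model, data)  # supervised fine-tuning on those traces (assumed to continue from current checkpoint)
        model = rl(model)         # then RL fine-tuning, and repeat
    return model

# Stub stages that only record what happened; no real training occurs.
def rl_stub(ckpt: Checkpoint) -> Checkpoint:
    return ckpt + ["RL"]

def generate_stub(ckpt: Checkpoint) -> List[str]:
    return [f"trace@{len(ckpt)}"]

def sft_stub(ckpt: Checkpoint, data: List[str]) -> Checkpoint:
    return ckpt + [f"SFT({len(data)})"]

final = train_reasoning_model(["V3-base"], rl_stub, generate_stub, sft_stub)
print(final)  # ['V3-base', 'RL', 'SFT(1)', 'RL', 'SFT(1)', 'RL']
```

Passing the stages in as callables keeps the loop structure separate from what each stage actually does. The point is only the alternation: one pure-RL stage, then two generate/SFT/RL rounds.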