Should have seen it coming

Omgboom@lemmy.zip · 2 days ago

Should have seen it coming

theunknownmuncher@lemmy.world · edit-2 2 days ago

https://arxiv.org/abs/2405.20304 They invented their own reinforcement learning framework, called Group Relative Policy Optimization.

EDIT: DeepSeek publicly released and published the model and methods to the global community, and there is now an open effort by researchers to reproduce them (https://github.com/huggingface/open-r1). It is like the opposite of stealing.

Sanctus@lemmy.world · 2 days ago

Yeah, the original comment in this chain more describes US telcos and shit, not this particular instance.