
Open-R1: A Fully Open Reproduction of DeepSeek-R1
Hey there! This article is an intro to the project, not a claim that we've replicated R1 yet. We're building in the open, so as soon as we have evaluation numbers, we'll share them. You can follow our progress on Hugging Face and GitHub.
True, but it looks like there's nothing to be evaluated as of right now. I presume the ultimate goal is to train a new reasoning model and then use the same evaluation metrics as o1 and DeepSeek-R1.
Well, there should be at least some sanity check and validation to ensure the model was trained correctly.
Oh yes, if you are talking about the evaluation numbers of DeepSeek's model, they're coming very soon!
As mentioned in the blog post, there is no model called Open-R1 to test at all ... not yet anyhow. This is a blog post explaining that Hugging Face will take the DeepSeek-R1 model, work out how it was built as outlined in the paper and from what they released, and then replicate that process.
In reality this is pretty much how science works ... A comes up with a plan, discovery or invention, and it is tested by B, C and D to see if it is reproducible. That's been the foundation of research for a couple of centuries now.
This blog is not saying they have already done so ... It's a blog post outlining an intent to start training a model like R1 and calling it Open-R1.
Also, DeepSeek-R1 was only released last week, and even in their paper they outlined the compute hours needed. While those are low compute hours for a SOTA model, this does not mean you can train said model in a week. I'd personally like to be able to train a transformer model in a week, but we might need to wait a while for that level of compute innovation.
So there are no benchmarks for a model that has not been built yet, right? As outlined in the blog post, and again in reply to your question.
But fear not, there is a GitHub repo already and contributors (hell, I might join myself), some prelim work done, and a master plan. A good starting position.
@edbeeching has evaluated the released models already (src: https://x.com/edwardbeeching/status/1884273209136275742)
R1 just trained on o1 outputs, so collectively ... /s. This is what the new AI czars are saying.
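(For anyone taking the "trained on outputs" quip literally: training a student model on a teacher's outputs is usually framed as distillation, i.e. minimizing a KL divergence against the teacher's soft labels. A toy numpy sketch of that generic objective, my own illustration rather than anything DeepSeek or OpenAI has published:)

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation loss: mean KL(teacher || student),
    with both distributions softened by a temperature."""
    p = softmax(teacher_logits / temperature)  # teacher soft labels
    q = softmax(student_logits / temperature)  # student predictions
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

The loss is zero when the student matches the teacher exactly and grows as the student's distribution drifts away.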
Hi! This post is an intro to the project, not a claim that we have reproduced R1 yet. We will totally share the missing pieces when we have them; you can expect the models and datasets to be uploaded in this Hugging Face org and the code to be in this GitHub repo.
That's good and important for understanding this tremendous hype that lacks technical comprehension and explanation. Science is about reproduction, and if they claim to be open, let them fulfill the open part.
Please do release the training cost.
We will!
Hi @bojan2501, thanks! We will for sure be working hard to make sure this training recipe can work for small language models on consumer hardware, since not everybody has a cluster of H100s at home :-) The tool we used for the images was Excalidraw! https://excalidraw.com
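(Context for the training recipe: R1's RL stage uses GRPO, whose key trick, dispensing with a learned critic by normalizing rewards within a group of completions sampled from the same prompt, fits in a few lines. A minimal sketch, my own simplification of the paper's formula:)

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO: for a group of completions
    sampled from the same prompt, each reward is normalized by the
    group's mean and std, so no separate value model is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

With rewards [1, 0, 1, 0] (two correct answers, two wrong), the correct samples get advantage ~+1 and the wrong ones ~-1, which is exactly the signal the policy update amplifies.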
Looking forward to it!
should be a joke
It's really cool to see how the whole open source community comes together!
Oops ...
5.5M is the number reported in the DeepSeek-V3 tech report (just the training run, not the experiments afaik); for R1 it's hard to estimate tbh, but much less than 5.5M imo.
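(The headline figure is simple arithmetic from the V3 tech report: roughly 2.788M H800 GPU-hours, priced at the report's assumed rental rate of $2 per GPU-hour:)

```python
# Figures from the DeepSeek-V3 technical report: ~2.788M H800 GPU-hours
# total, at an assumed rental rate of $2 per GPU-hour.
gpu_hours = 2.788e6
dollars_per_gpu_hour = 2.0
total_cost = gpu_hours * dollars_per_gpu_hour
print(f"${total_cost / 1e6:.3f}M")  # -> $5.576M
```

Note this covers the final training run only, not research experiments, ablations, or staff costs.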
Historically, they have never released code or datasets of their LLM training, so I would not expect this time to be different. If they did release it, that would be incredible of course!
The code for the models is inside the model repositories, e.g. for V3: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py
Hello Team, I'm Ray Bernard, the author and creator of EQUATOR. My research team will be working on a paper focused on reproducing specific components of DeepSeek R1. Our objective is to recreate the cold start and provide your team with a dataset that includes CoT and other techniques to support these efforts. We'd like to contribute our work to help. Please let me know if you find this helpful. Best, Ray Bernard https://www.facebook.com/groups/1186310571520299/
Where are the evaluation numbers? Without them you can't call it a reproduction.
That's rather interesting. I was asking myself why the questions the author raised here are not being asked by others? I think the work they have done is remarkable, but at the same time I wonder why they wouldn't put up these missing pieces if they are supposed to be fully open.
Why, even without reproduction and understanding of the technology, could they impact the market so much in this way?
Interesting read, and it is great that we see more effort in this direction: more optimization and less brute force.
Also wondering what tool the author used for creating the action diagram.
Excalidraw
I'm so happy that initiatives like this already exist, I'm gonna try to contribute :-)
So racist article
WTF are you talking about?
Awesome to have this open replication started!
For Step #1 check out https://github.com/open-thoughts/open-thoughts!
https://x.com/ryanmart3n/status/1884284101265612856
Let’s do this thing!
Does anyone know the actual training cost of R1? I can't find it in the paper or the announcement post. Is the 6M cost reported by media just the number taken from V3's training cost?
Has anybody asked the DeepSeek team to release their training data and code, or at least share them privately with an independent replication project like this one? Have they declined such a request?
A faithful replication depends on using the same dataset and hyperparameters. Otherwise, any major discrepancies with the published benchmarks would be hard to pin down, whether due to training data differences or to the replication technique itself.
In the meantime we have to make best-guess estimates and see if we can get there ourselves.
You outline an excellent replication process for DeepSeek's reasoning training. I will try something similar to it.
This is really great info. Can we fine-tune it for specific use cases when the code is released?
Yes, of course!
Please consider removing biased, tainted or unaligned training data, and make an effort to remove copyrighted works from the crawl. This will make the model more usable. If you reused Anthropic's curation checks, this might also help; removing obviously biased data will likely add a lot of value. We don't want another polluted, unaligned open source model, right? And no corporation would ever use DeepSeek or a model that reuses it, right?
We appreciate your work for the benefit of humanity, we hope.
Miike C from NJ
So basically you're asking to replace existing censorship with another flavour of censorship?
Can't wait! Hopefully the model will be uncensored, but whatever you can do is alright! Love seeing open source building itself up. I'm not smart enough to actually help, but I can contribute support lol
Hello guys, I am just looking for the code for DeepSeek-V2, in order to fully understand multi-head latent attention. You don't seem to have code on Hugging Face even for that. Or am I missing something? I don't see anything in src/transformers/models. MLA is not properly described in their paper, so it would be essential to have code for this.
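(Until official code lands in src/transformers/models, the core of MLA can be sketched from the V2 paper: K and V are not cached directly; instead a small shared latent is cached and up-projected into K and V at attention time. A toy single-head numpy version, my own simplification that omits RoPE decoupling and query compression:)

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def mla_attention(x, W_q, W_dkv, W_uk, W_uv):
    """Toy multi-head latent attention, reduced to a single head.

    Instead of caching full K/V (T x d_head each), MLA caches only the
    down-projected latent c_kv (T x d_latent) and re-expands it into
    K and V with up-projections when attention is computed.
    """
    T = x.shape[0]
    q = x @ W_q                      # (T, d_head)
    c_kv = x @ W_dkv                 # (T, d_latent): the only thing cached
    k = c_kv @ W_uk                  # (T, d_head)
    v = c_kv @ W_uv                  # (T, d_head)
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # causal mask
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v, c_kv
```

The cache thus shrinks from 2*T*d_head floats (separate K and V) to T*d_latent floats, which is the memory saving the paper advertises; the weight names here follow the paper's notation only loosely.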