Replicating DeepSeek-R1 with the Open R1 Project

Hugging Face's open-source initiative successfully reproduces R1's distillation pipeline, unlocking frontier reasoning models for the developer community.

Mariana Souza
Senior Editor · Jun 11, 2026 · 4 min read

DeepSeek-R1 showed that frontier-level reasoning is not limited to closed-source labs: its weights and technical report are openly available. However, the datasets and training code behind the model were not released, so developers who want to customize, extend, or fully understand these reasoning models cannot get there from an API or static weights alone. True open-source progress requires a reproducible, end-to-end training pipeline.

To bridge this gap, Hugging Face launched Open R1, a community-driven, fully open-source reproduction of the DeepSeek-R1 pipeline. The project's goal is to build the missing pieces of the R1 training workflow so that any developer can train, evaluate, and iterate on reasoning models.

The Three-Stage Roadmap

The Open R1 project structures its reproduction efforts around the methodology outlined in the DeepSeek-R1 technical report. This strategy is broken down into three distinct phases:

  1. Replicate the R1-Distill models: Distill a high-quality corpus of reasoning traces from DeepSeek-R1 to train smaller, highly capable models.
  2. Replicate the pure RL pipeline (R1-Zero): Train models directly using Group Relative Policy Optimization (GRPO) on curated, large-scale datasets for mathematics, logic, and coding without an initial supervised fine-tuning (SFT) step.
  3. Multi-stage training: Combine SFT and RL to transition a base model into a fully RL-tuned reasoning model.
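Step 2 depends on GRPO, which scores each sampled completion against the other completions drawn for the same prompt instead of against a learned value model. The sketch below shows only that group-relative advantage, assuming the commonly described formulation (reward minus the group mean, divided by the group standard deviation). The function name, the epsilon, and the use of the population standard deviation are illustrative choices, not code from the Open R1 repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration; not the Open R1 implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 when the final answer was correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions end up with a positive advantage and incorrect ones with a negative advantage, and a group whose completions all score the same contributes no learning signal.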

In May 2025, the project completed Step 1 with the release of a recipe for OpenR1-Distill-7B, showing that a fully open distillation pipeline can replicate the reasoning capabilities of DeepSeek-R1-Distill-Qwen-7B.

Key Milestones and Datasets

Rather than relying on closed datasets, the Open R1 initiative has generated its own reasoning datasets and released them to the community. The milestones below are listed newest first:

  • Mixture-of-Thoughts (May 2025): Marking the completion of Step 1, this curated reasoning dataset contains 350,000 verified traces distilled from R1. Covering mathematics, coding, and science, it is designed to teach language models step-by-step reasoning. Along with the dataset, the project released a recipe to train OpenR1-Distill-7B, which replicates the reasoning capabilities of DeepSeek-R1-Distill-Qwen-7B.
  • CodeForces-CoTs & IOI24 (March 2025): A dataset of 10,000 competitive programming problems and 100,000 solutions distilled from R1. Alongside it, the project introduced IOI24, a benchmark of difficult international olympiad problems. Notably, a 7B Qwen model trained on the CodeForces-CoTs dataset outperformed Claude 3.7 Sonnet on IOI24, while a 32B variant outperformed DeepSeek-R1 itself.
  • OpenR1-Math-220k (February 2025): A dataset of 220,000 math reasoning traces distilled from R1 on a new version of NuminaMath. Models trained on it match the performance of DeepSeek's distilled models.
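Because the datasets are published on the Hugging Face Hub, individual traces can be inspected before committing to a training run. The following is a minimal sketch, assuming the `datasets` library is installed and that each record carries a `messages` list of `{"role", "content"}` dictionaries. That field name is an assumption to verify against the dataset card, and both helper names are hypothetical.

```python
def last_assistant_turn(record):
    """Return the final assistant message of a chat-style record, or None.

    Assumes a `messages` field holding {"role": ..., "content": ...} dicts;
    check the actual schema on the dataset card before relying on this.
    """
    for message in reversed(record.get("messages", [])):
        if message.get("role") == "assistant":
            return message.get("content")
    return None


def stream_traces(name="open-r1/Mixture-of-Thoughts", config="all"):
    """Stream records from the Hub without downloading the full dataset."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    return load_dataset(name, config, split="train", streaming=True)


# Usage (needs network access):
#   for record in stream_traces():
#       print(last_assistant_turn(record)[:500])
#       break
```

Streaming avoids pulling hundreds of thousands of long traces to disk just to look at a handful of them.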

The Open R1 Technical Stack

To keep the codebase accessible and modular, Open R1 keeps its core logic in three scripts under src/open_r1, one per task:

  • sft.py: Performs Supervised Fine-Tuning on a dataset.
  • grpo.py: Trains a model using Group Relative Policy Optimization (GRPO).
  • generate.py: Generates synthetic data from a model using Distilabel.

The libraries rely on CUDA 12.4, and the setup below creates a Python 3.11 environment. The project uses uv for fast package management and pins vLLM to a specific release. That release pulls in PyTorch 2.6.0, which FlashAttention is then built against:

uv venv openr1 --python 3.11 && source openr1/bin/activate
uv pip install --upgrade pip
uv pip install vllm==0.8.5.post1
uv pip install setuptools && uv pip install flash-attn --no-build-isolation
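Since a mismatched PyTorch or vLLM build is a common source of hard-to-read import errors, it can help to confirm the pins after installation. The sketch below is one way to do that with the standard library. The `EXPECTED` table mirrors the versions named in this article, and `find_mismatches` is a hypothetical helper, not part of Open R1.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the setup commands and the PyTorch version named in the text.
EXPECTED = {"vllm": "0.8.5.post1", "torch": "2.6.0"}


def find_mismatches(expected, get_version=version):
    """Return {package: (wanted, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            # Drop local build suffixes such as "+cu124" before comparing.
            installed = get_version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Usage inside the activated environment:
#   print(find_mismatches(EXPECTED) or "environment matches the pins")
```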

Developers must also authenticate with Hugging Face and Weights & Biases to manage model checkpoints and track training runs:

huggingface-cli login
wandb login

Training and Scaling in Practice

The training recipes provided in Open R1 are configured out of the box for distributed environments, specifically a node with 8 x H100 (80GB) GPUs. The repository supports both standard Distributed Data Parallel (DDP) and DeepSpeed (ZeRO-2 and ZeRO-3) configurations, which shard training state across GPUs so that large models and long sequences fit in memory.
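A rough calculation shows why sharding matters on this hardware. The sketch below uses the usual mixed-precision Adam accounting from the ZeRO paper: 2 bytes per parameter for weights, 2 for gradients, and 12 for optimizer state. The round 7-billion parameter count is an assumption, activations are excluded entirely, and real usage depends on the DeepSpeed configuration, so treat the figures as a lower bound.

```python
def model_state_gb(params, gpus, stage):
    """Approximate per-GPU model-state memory in GB (1e9 bytes), activations excluded.

    stage 0 = plain DDP, 2 = ZeRO-2, 3 = ZeRO-3.
    """
    weights, grads, optim = 2 * params, 2 * params, 12 * params
    if stage == 0:
        total = weights + grads + optim  # everything replicated on every GPU
    elif stage == 2:
        total = weights + (grads + optim) / gpus  # gradients and optimizer state sharded
    elif stage == 3:
        total = (weights + grads + optim) / gpus  # weights sharded as well
    else:
        raise ValueError("stage must be 0, 2, or 3")
    return total / 1e9


for stage in (0, 2, 3):
    print(stage, model_state_gb(7e9, gpus=8, stage=stage))
```

Under these assumptions, plain DDP needs 112 GB per GPU for model states alone, which exceeds an 80 GB H100. ZeRO-2 needs about 26 GB and ZeRO-3 about 14 GB, leaving the remainder for activations at long context lengths.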

For example, to run SFT on a base model with the Mixture-of-Thoughts dataset, developers can launch the training script with DeepSpeed ZeRO-3 and a context length of 32,768 tokens:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
  --model_name_or_path open-r1/Qwen2.5-Math-7B-RoPE-300k \
  --dataset_name open-r1/Mixture-of-Thoughts \
  --dataset_config all \
  --eos_token '<|im_end|>' \
  --learning_rate 4.0e-5 \
  --num_train_epochs 5 \
  --max_seq_length 32768 \
  --per_device_train_batch_size 2 \
  --gradient_checkpointing \
  --bf16 \
  --use_liger_kernel \
  --output_dir data/OpenR1-Distill-7B
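The flags in this command determine the effective batch size and the length of the run. The sketch below works through that arithmetic for an 8-GPU node. It assumes one gradient accumulation step and no sequence packing, neither of which the command states, and it uses the rounded figure of 350,000 traces, so the step count is an estimate rather than a number taken from the project.

```python
def training_budget(num_gpus, per_device_batch, max_seq_length,
                    num_examples, epochs, grad_accum_steps=1):
    """Derive batch size and optimizer-step counts from launch settings."""
    sequences_per_step = num_gpus * per_device_batch * grad_accum_steps
    max_tokens_per_step = sequences_per_step * max_seq_length
    steps_per_epoch = -(-num_examples // sequences_per_step)  # ceiling division
    return {
        "sequences_per_step": sequences_per_step,
        "max_tokens_per_step": max_tokens_per_step,
        "total_steps": steps_per_epoch * epochs,
    }


budget = training_budget(num_gpus=8, per_device_batch=2, max_seq_length=32768,
                         num_examples=350_000, epochs=5)
print(budget)
```

With these inputs each optimizer step sees 16 sequences, or at most 524,288 tokens, and five epochs come to 109,375 steps. Changing the GPU count or enabling packing shifts all three numbers, which is worth recomputing before adjusting the learning rate.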

By providing the exact data generation scripts, training configurations, and evaluation harnesses, Open R1 transitions reasoning models from black-box APIs into reproducible engineering workflows. Developers can now inspect the underlying reasoning traces, modify the reward functions, and train custom, domain-specific reasoning models on their own hardware.
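To make the idea of modifying reward functions concrete, here is a sketch of two rule-based rewards in the spirit of those described in the DeepSeek-R1 report: one for output format and one for answer accuracy. The tag template, the function names, the exact-match comparison, and the weighting are all illustrative assumptions, not the reward functions shipped with Open R1.

```python
import re

# Assumed output template: reasoning in <think> tags, final answer in <answer> tags.
TEMPLATE = re.compile(r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def format_reward(completion):
    """1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion, reference):
    """1.0 if the extracted answer equals the reference after trimming whitespace."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0


def total_reward(completion, reference, format_weight=0.5):
    """Weighted sum of both rewards; the weight is an arbitrary illustrative choice."""
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)
```

A domain-specific variant would typically swap the exact-match check for a verifier suited to the task, such as a symbolic math comparison or a unit-test runner for code.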

Sources & further reading

  1. Open Reproduction of DeepSeek-R1 — github.com
