DeepSeek-R1: OpenAI o1-Level Performance at Just 10% of the Cost
DeepSeek-R1 offers open-source AI at roughly 90% lower cost than proprietary models while matching top-tier LLM performance, making it a strong fit for real-world business use.
By Vladimir Damov
Category: AI & Automation

A new contender is stepping onto the global AI stage from Hangzhou, China: DeepSeek-R1. Developed by DeepSeek, an AI research lab under the hedge fund High-Flyer, DeepSeek-R1 stands toe-to-toe with OpenAI’s o1 model—but with drastically reduced training and inference expenses. Even more notably, it’s fully open-source under the MIT license, allowing organizations to adapt it to their specific use cases—an option not available for models like GPT-4 or o1.
DeepSeek-R1 is a first-generation reasoning model that uses large-scale reinforcement learning (RL) from the start, rather than relying on supervised fine-tuning (SFT) as a prerequisite. This RL-first approach unlocks diverse chain-of-thought capabilities—such as self-verification and reflection—and fuels advanced performance in math, code, and general reasoning tasks.
Cost, Licensing, and Origin
According to DeepSeek’s release data, output tokens for DeepSeek-R1 cost just $2.19 per million, whereas o1 charges $60 for the same volume. On the input side, with an API cache hit, DeepSeek charges $0.14 per million tokens, a fraction of o1’s $7.50. These figures make it attractive for teams that want powerful generative AI without committing a fortune to usage fees.
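Using the per-million-token rates quoted above, a quick back-of-the-envelope comparison is easy to sketch. The token volumes below are illustrative, and real bills depend on cache hit rates and current provider pricing:

```python
# Rough cost comparison using the per-million-token prices quoted above.
# Rates are illustrative snapshots; check each provider's pricing page.

PRICES = {
    # model: (input per 1M tokens with cache hit, output per 1M tokens), USD
    "deepseek-r1": (0.14, 2.19),
    "openai-o1": (7.50, 60.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the monthly API cost in USD for a given token volume."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Example workload: 50M input tokens and 10M output tokens per month.
r1 = monthly_cost("deepseek-r1", 50_000_000, 10_000_000)
o1 = monthly_cost("openai-o1", 50_000_000, 10_000_000)
print(f"DeepSeek-R1: ${r1:,.2f}")        # $28.90
print(f"OpenAI o1:   ${o1:,.2f}")        # $975.00
print(f"Savings:     {1 - r1 / o1:.0%}")  # 97%
```

At this workload the gap is even wider than the headline 90%, because the cached-input discount compounds with the cheaper output tokens.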
DeepSeek emerges from a background in quantitative trading—High-Flyer initially used machine learning for stock strategies, eventually founding an AI lab that focuses on advanced modeling. By 2021, all of High-Flyer’s trading strategies were powered by AI, setting the stage for DeepSeek to push the technology beyond finance.
Benchmark Comparisons
Accuracy and Reasoning
Below is a comparison of DeepSeek-R1 against OpenAI-o1 and OpenAI-o1-mini. In tasks like MATH-500, SWE-bench Verified, or AIME 2024, DeepSeek-R1 stands out with top-percentile accuracy and pass rates.
| Benchmark | DeepSeek-R1 | OpenAI-o1-mini | OpenAI-o1-1217 |
| --- | --- | --- | --- |
| AIME 2024 (Pass@1) | 79.8% | 63.6% | 79.2% |
| Codeforces (Percentile) | 96.3 | 93.4 | 96.6 |
| GPQA Diamond (Pass@1) | 71.5 | 60.0 | 75.7 |
| MATH-500 (Pass@1) | 97.3% | 90.0% | 96.4% |
| MMLU (Pass@1) | 90.8% | 85.2% | 91.8% |
| SWE-bench Verified (Resolved) | 49.2% | 41.6% | 48.9% |
Note: On many of these benchmarks, DeepSeek-R1 ties or beats OpenAI-o1, reinforcing the idea that large-scale AI excellence isn’t limited to a single region or proprietary ecosystem.
Here's a simplified explanation of each benchmark test:
AIME 2024 (Pass@1): Tests how well the model solves advanced math problems from the American Invitational Mathematics Examination (AIME). Pass@1 means solving correctly on the first try.
Codeforces (Percentile): Measures how well the model performs in competitive programming challenges on the Codeforces platform. Percentile indicates how its performance compares to other participants.
GPQA Diamond (Pass@1): Tests the model's ability to answer graduate-level science questions (GPQA stands for Graduate-Level Google-Proof Q&A), with Pass@1 meaning it answers correctly on the first try.
MATH-500 (Pass@1): Assesses the model’s ability to solve competition-level math problems, focusing on correctness on the first attempt.
MMLU (Pass@1): Evaluates the model's knowledge across various university-level subjects and its ability to answer questions correctly on the first try.
SWE-bench Verified (Resolved): Tests the model's skills in resolving software engineering tasks, specifically verifying whether the solutions it provides are correct and functional.
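Pass@1 is usually estimated by sampling the model several times per problem rather than running it once. A minimal sketch of the standard unbiased pass@k estimator (popularized by OpenAI's HumanEval evaluation; n samples per problem, c of them correct) might look like:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn from n total with c correct, solves the problem."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws without a success
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples on a problem, 12 of them correct.
print(round(pass_at_k(16, 12, 1), 3))  # 0.75 (for k=1 this is just c/n)
print(round(pass_at_k(16, 12, 4), 3))  # 0.999
```

For k=1 the estimator reduces to the fraction of correct samples, which is why Pass@1 is often read simply as "first-try accuracy."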
Operational Savings and Throughput
DeepSeek-R1 invests heavily in reinforcement learning at scale—with minimal reliance on supervised fine-tuning—and features architectural optimizations to reduce memory footprint. This lowers training overhead by over 40% compared to older DeepSeek models, while decreasing KV cache usage and boosting throughput, much like its sibling DeepSeek-V3.
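To see why KV cache size matters for throughput, here is a rough per-sequence memory estimate for a generic transformer. The layer, head, and context figures below are illustrative for a 70B-class dense model with grouped-query attention, not DeepSeek-R1's actual architecture, which compresses the cache further with Multi-head Latent Attention:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence: keys + values (factor
    of 2), stored per layer, per KV head, per token (fp16 -> 2 bytes)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative: 80 layers, 8 KV heads, head_dim 128, 32k-token context.
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=32_768)
print(f"{size / 2**30:.1f} GiB per sequence")  # 10.0 GiB
```

Every GiB shaved off this cache is another concurrent sequence that fits on the same GPU, which is how KV-cache reductions translate directly into higher serving throughput.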
Why It Matters
Openness Fuels Innovation
With an MIT license, DeepSeek-R1 lets you customize the model for your exact needs—an approach typically off-limits for proprietary offerings. From domain-specific optimizations to full-scale modifications, the flexibility here is a game-changer for teams that want to stay in control.

Sustainable Costs
DeepSeek-R1 disrupts the pay-per-token model by charging around 90% less than o1. That makes scaling AI workloads affordable, eliminating the sticker shock associated with large-scale inference and fine-tuning.

Proven Real-World Potential
Originating from a hedge fund’s successful machine learning strategies, DeepSeek-R1 isn’t just a lab experiment—it’s battle-tested in finance, mathematics, code generation, and more. This track record underscores its readiness for production deployments beyond academia.
Distilled Variants
DeepSeek-R1 includes multiple distilled models, balancing performance with different parameter sizes. You can choose from 1.5B to 70B—plus the full 671B—for various use cases:
| Model | Base Model | Param Size |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 1.5B |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 7B |
DeepSeek-R1