DeepSeek-R1: OpenAI o1-Level Performance at Just 10% of the Cost

DeepSeek-R1 delivers open-source AI at roughly 90% lower cost than proprietary models while matching top-tier LLM performance, making it a practical choice for real-world business use.

By Vladimir Damov | Category: AI & Automation

A new contender is stepping onto the global AI stage from Hangzhou, China: DeepSeek-R1. Developed by DeepSeek, an AI research lab under the hedge fund High-Flyer, DeepSeek-R1 stands toe-to-toe with OpenAI’s o1 model—but with drastically reduced training and inference expenses. Even more notably, it’s fully open-source under the MIT license, allowing organizations to adapt it to their specific use cases—an option not available for models like GPT-4 or o1.

DeepSeek-R1 is a first-generation reasoning model that uses large-scale reinforcement learning (RL) from the start, rather than relying on supervised fine-tuning (SFT) as a prerequisite. This RL-first approach unlocks diverse chain-of-thought capabilities—such as self-verification and reflection—and fuels advanced performance in math, code, and general reasoning tasks.

Cost, Licensing, and Origin

According to DeepSeek’s release data, the output token cost for DeepSeek-R1 is just $2.19 per million tokens, whereas o1 charges $60 for the same volume. On the input side, with an API cache hit, DeepSeek charges $0.14 per million tokens, a fraction of o1’s $7.50. These figures make it attractive for teams looking to leverage powerful generative AI without committing a fortune to usage fees.
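To make the pricing gap concrete, here is a quick back-of-the-envelope calculation using the per-million-token figures quoted above. The workload size is a made-up example, and real pricing may have changed since publication:

```python
# Per-million-token prices (USD) as quoted in the article; treat as illustrative.
PRICES = {
    "deepseek-r1": {"input_cached": 0.14, "output": 2.19},
    "openai-o1":   {"input_cached": 7.50, "output": 60.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for a token volume, assuming all inputs hit the cache."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input_cached"] + (output_tokens / 1e6) * p["output"]

# Hypothetical workload: 50M input tokens and 10M output tokens per month.
r1 = monthly_cost("deepseek-r1", 50_000_000, 10_000_000)
o1 = monthly_cost("openai-o1", 50_000_000, 10_000_000)
savings = 1 - r1 / o1

print(f"DeepSeek-R1: ${r1:,.2f}  OpenAI o1: ${o1:,.2f}  savings: {savings:.0%}")
```

Under these assumptions the same workload costs $28.90 versus $975.00, a savings of about 97% — the exact percentage depends on the input/output mix and the cache hit rate.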

DeepSeek emerges from a background in quantitative trading—High-Flyer initially used machine learning for stock strategies, eventually founding an AI lab that focuses on advanced modeling. By 2021, all of High-Flyer’s trading strategies were powered by AI, setting the stage for DeepSeek to push the technology beyond finance.

Benchmark Comparisons

Accuracy and Reasoning

Below is a comparison table of DeepSeek-R1 against OpenAI-o1 and OpenAI-o1-mini. In tasks like MATH-500, SWE-bench Verified, and AIME 2024, DeepSeek-R1 stands out with top-percentile accuracy and pass rates.

| Benchmark | DeepSeek-R1 | OpenAI-o1-mini | OpenAI-o1-1217 |
| --- | --- | --- | --- |
| AIME 2024 (Pass@1) | 79.8% | 63.6% | 79.2% |
| Codeforces (Percentile) | 96.3 | 93.4 | 96.6 |
| GPQA Diamond (Pass@1) | 71.5 | 60.0 | 75.7 |
| MATH-500 (Pass@1) | 97.3% | 90.0% | 96.4% |
| MMLU (Pass@1) | 90.8% | 85.2% | 91.8% |
| SWE-bench Verified (Resolved) | 49.2% | 41.6% | 48.9% |

Note: DeepSeek-R1 ties or beats OpenAI-o1 on several of these benchmarks, reinforcing the idea that large-scale AI excellence isn’t limited to a single region or proprietary ecosystem.

Here's a simplified explanation of each benchmark test:

AIME 2024 (Pass@1): Tests how well the model solves advanced math problems from the American Invitational Mathematics Examination (AIME). Pass@1 means solving correctly on the first try.

Codeforces (Percentile): Measures how well the model performs in competitive programming challenges on the Codeforces platform. Percentile indicates how its performance compares to other participants.

GPQA Diamond (Pass@1): Tests the model's ability to answer complex general-knowledge questions accurately, with Pass@1 meaning it answers correctly on the first try.

MATH-500 (Pass@1): Assesses the model’s ability to solve challenging competition-style math problems, focusing on correctness on the first attempt.

MMLU (Pass@1): Evaluates the model's knowledge across various university-level subjects and its ability to answer questions correctly on the first try.

SWE-bench Verified (Resolved): Tests the model's skills in resolving software engineering tasks, specifically verifying whether the solutions it provides are correct and functional.
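Several of these benchmarks report Pass@1, the simplest case of the pass@k family of metrics. A common way to estimate pass@k is the unbiased estimator popularized by code-generation benchmarks; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples passes, given n generated samples of which c are correct.
    For k=1 this reduces to the raw correct fraction c / n."""
    if n - c < k:
        # Fewer incorrect samples than k draws: at least one correct is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 attempts per problem, 3 correct: pass@1 is simply 3/10.
print(pass_at_k(n=10, c=3, k=1))
```

The variable names (`n`, `c`, `k`) follow the usual convention for this estimator; the benchmark papers themselves specify how many samples are drawn per problem.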

Operational Savings and Throughput

DeepSeek-R1 invests heavily in reinforcement learning at scale—with minimal reliance on supervised fine-tuning—and features architectural optimizations to reduce memory footprint. This lowers training overhead by over 40% compared to older DeepSeek models, while decreasing KV cache usage and boosting throughput, much like its sibling DeepSeek-V3.
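To see why KV cache usage matters for throughput, here is a rough estimate of per-request cache memory under standard multi-head attention. The model dimensions below are hypothetical, and this is the uncompressed baseline that KV-cache-reduction techniques shrink, not DeepSeek’s actual attention math:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    """Per-batch KV cache size for standard multi-head attention:
    2 tensors (K and V) x layers x KV heads x head dim x sequence length."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Hypothetical 60-layer model with 128 KV heads of dimension 128,
# serving a single 32K-token request in fp16 (2 bytes per value):
full = kv_cache_bytes(layers=60, kv_heads=128, head_dim=128,
                      seq_len=32_768, batch=1)
print(f"{full / 2**30:.1f} GiB per request")  # 120.0 GiB
```

At these (made-up) dimensions a single long-context request would need about 120 GiB of cache, which is why shrinking the KV cache directly translates into more concurrent requests per GPU and higher throughput.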

Why It Matters

Openness Fuels Innovation

With an MIT license, DeepSeek-R1 lets you customize the model for your exact needs—an approach typically off-limits for proprietary offerings. From domain-specific optimizations to full-scale modifications, the flexibility here is a game-changer for teams that want to stay in control.

Sustainable Costs

DeepSeek-R1 disrupts the pay-per-token model by charging around 90% less than o1. That makes scaling AI workloads affordable, eliminating the sticker shock associated with large-scale inference and fine-tuning.

Proven Real-World Potential

Originating from a hedge fund’s successful machine learning strategies, DeepSeek-R1 isn’t just a lab experiment—it’s battle-tested in finance, mathematics, code generation, and more. This track record underscores its readiness for production deployments beyond academia.

Distilled Variants

DeepSeek-R1 includes multiple distilled models, balancing performance with different parameter sizes. You can choose from 1.5B to 70B—plus the full 671B—for various use cases:

| Model | Base Model | Param Size |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 1.5B |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 7B |
| DeepSeek-R1 | DeepSeek-V3-Base | 671B |