Key Concepts of DeepSeek, Part 1
DeepSeek has been in the spotlight since before Chinese New Year. A month ago I had never even heard the name “DeepSeek”; now it is so popular that it has become a point of contention between countries. Amidst this frenzy, beyond the technical discussions, there are plenty of articles full of biased opinions and political spin.
I see this as a great opportunity to exercise my critical thinking: to navigate the flood of information and biased opinions, and to build a rational understanding of DeepSeek by going back to the primary sources, namely DeepSeek’s open-source papers and technical reports.
My research plan for DeepSeek can be divided into several parts:
- What is a Reasoning Language Model (RLM)?
- Comparison between System 1 thinking and System 2 thinking
- What is the Reasoning Schema?
- Technical architecture and research outcomes of DeepSeek V3
- Technologies and research methods used in DeepSeek R1
- Analysis of originality and inspiration from DeepSeek
This will be a series of articles. Stay tuned.
RLM
A Reasoning Language Model (RLM) can be defined as an artificial intelligence model that combines the generation capabilities of large language models (LLMs) with advanced reasoning mechanisms. RLMs introduce structured reasoning processes, enabling complex logical analysis and step-by-step reasoning to solve problems that demand deep thinking.
Definition
A Reasoning Language Model (RLM) is an artificial intelligence model that combines the generation capabilities of large language models (LLMs) with advanced reasoning mechanisms, such as reinforcement learning (RL) and Monte Carlo tree search (MCTS), to achieve systematic and structured reasoning on complex problems. RLMs can perform multi-step logical analysis, generate high-quality inference paths, and continuously optimize their reasoning strategies through self-learning.
Core Components
Reasoning Structure:
- Definition: The reasoning structure is the organizational form of RLM’s inference process, typically represented as a tree-like, chain-like, or graph-like structure. Each node represents an inference step, and connections between nodes represent inference paths.
- Functionality: Reasoning structures help models systematically explore possible solutions, gradually building complete inference paths.
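To make this concrete, below is a minimal Python sketch of a tree-shaped reasoning structure. The class and field names (ReasoningNode, step_text, value) are illustrative choices of mine rather than anything specified by DeepSeek or the RLM literature; the point is simply that each node holds one inference step and the chain back to the root reconstructs a full inference path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReasoningNode:
    """One inference step in a tree-shaped reasoning structure."""
    step_text: str                                   # the reasoning step itself
    parent: Optional["ReasoningNode"] = None
    children: List["ReasoningNode"] = field(default_factory=list)
    value: float = 0.0                               # quality estimate, filled in by a value model

    def add_child(self, step_text: str) -> "ReasoningNode":
        child = ReasoningNode(step_text=step_text, parent=self)
        self.children.append(child)
        return child

    def path_from_root(self) -> List[str]:
        """Reconstruct the inference path leading to this node."""
        node, path = self, []
        while node is not None:
            path.append(node.step_text)
            node = node.parent
        return list(reversed(path))


# A two-step path rooted at the user's question
root = ReasoningNode("Question: what is 17 * 24?")
step1 = root.add_child("17 * 24 = 17 * 20 + 17 * 4")
step2 = step1.add_child("= 340 + 68 = 408")
print(step2.path_from_root())
```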
Policy Model:
- Definition: The policy model is a neural network based on LLMs that generates new inference steps. It predicts the next most likely inference step based on the current state of the reasoning process.
- Functionality: The policy model drives the progression of the inference process by generating new inference steps, embodying System 1 Thinking and relying on pattern matching and statistical rules.
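As a rough illustration of this role, the sketch below turns the current reasoning state into a prompt and asks a language model for candidate next steps. The llm_generate callable and the fake_llm stub are placeholders I introduce for the example; a real policy model would sample continuations from an actual LLM.

```python
from typing import Callable, List


def propose_next_steps(
    llm_generate: Callable[[str], List[str]],
    question: str,
    partial_path: List[str],
    num_candidates: int = 3,
) -> List[str]:
    """Policy-model role: given the current reasoning state, return candidate next steps."""
    prompt = (
        f"Problem: {question}\n"
        "Reasoning so far:\n"
        + "\n".join(f"- {step}" for step in partial_path)
        + "\nNext step:"
    )
    # A real RLM would sample several continuations of this prompt from an LLM.
    return llm_generate(prompt)[:num_candidates]


# Stub generator standing in for an actual LLM call
def fake_llm(prompt: str) -> List[str]:
    return [
        "Split the product: 17 * 24 = 17 * 20 + 17 * 4",
        "Estimate the order of magnitude first",
        "Compute 17 * 24 directly",
    ]


print(propose_next_steps(fake_llm, "What is 17 * 24?", ["Restate the problem."]))
```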
Value Model:
- Definition: The value model is a neural network based on LLMs that evaluates the quality of inference paths. It predicts the expected cumulative reward for an inference path starting from the current node.
- Functionality: The value model helps models choose the most promising inference paths and optimize the inference process, embodying System 2 Thinking and relying on global evaluation of inference paths.
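In the same spirit, the value model can be pictured as a scorer over partial paths. The value_model callable and the fake_value_model heuristic below are stand-ins invented for illustration; in practice the scorer is an LLM-based network trained to predict the expected cumulative reward.

```python
from typing import Callable, List


def score_path(
    value_model: Callable[[str], float],
    question: str,
    path: List[str],
) -> float:
    """Value-model role: estimate the expected cumulative reward of a (partial) inference path."""
    state = f"Problem: {question}\nReasoning:\n" + "\n".join(path)
    return value_model(state)


# Stub scorer: rewards paths whose latest step already contains a numeric result
def fake_value_model(state: str) -> float:
    last_step = state.splitlines()[-1]
    return 1.0 if any(ch.isdigit() for ch in last_step) else 0.2


print(score_path(fake_value_model, "What is 17 * 24?", ["17 * 24 = 340 + 68 = 408"]))
```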
Reasoning Strategy:
- Definition: A reasoning strategy is an algorithm that guides how the reasoning structure evolves in RLMs. Common reasoning strategies include Monte Carlo tree search (MCTS), beam search, and ensemble methods.
- Functionality: The reasoning strategy balances exploration and exploitation to optimize the selection and extension of inference paths.
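The sketch below shows one such strategy, a plain beam search that keeps only the best-scoring partial paths at each depth. It is a generic illustration under my own simplifications (expand, score, and is_terminal are placeholder callables), not the search used in any particular DeepSeek model; MCTS would additionally track visit counts to balance exploration against exploitation.

```python
from typing import Callable, List, Tuple


def beam_search(
    expand: Callable[[List[str]], List[str]],    # policy model: propose next steps
    score: Callable[[List[str]], float],         # value model: rate a partial path
    is_terminal: Callable[[List[str]], bool],    # has this path reached an answer?
    root_path: List[str],
    beam_width: int = 2,
    max_depth: int = 4,
) -> List[str]:
    """Keep only the `beam_width` best-scoring partial paths at each depth."""
    beam: List[Tuple[float, List[str]]] = [(score(root_path), root_path)]
    for _ in range(max_depth):
        candidates: List[Tuple[float, List[str]]] = []
        for _, path in beam:
            if is_terminal(path):
                candidates.append((score(path), path))   # keep finished paths in the running
                continue
            for step in expand(path):
                new_path = path + [step]
                candidates.append((score(new_path), new_path))
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(is_terminal(path) for _, path in beam):
            break
    return max(beam, key=lambda c: c[0])[1]


# Toy demo: a path scores higher once it contains an "answer" step
expand = lambda path: ["intermediate step", "answer"]
score = lambda path: path.count("answer") + 0.1 * len(path)
is_terminal = lambda path: "answer" in path
print(beam_search(expand, score, is_terminal, ["question"]))
```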
Training Mechanism:
- Definition: The training mechanism for RLMs includes supervised learning (Supervised Fine-Tuning, SFT) and reinforcement learning (Reinforcement Learning, RL). SFT trains policy models and value models to generate and evaluate high-quality inference steps. RL further optimizes the model’s reasoning strategy through interaction with the environment.
- Functionality: The training mechanism optimizes policy models and value models to improve the model’s inference capabilities and generalizability.
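The outline below is only a schematic of how these two phases fit together: the policy and value objects, their methods (fit_next_step, fit_quality, reinforce, predict), and the environment interface are hypothetical names chosen to show the shape of the loop, not an actual API from DeepSeek or any library.

```python
def supervised_fine_tune(policy, value, annotated_paths):
    """SFT phase: fit the policy to expert next steps and the value model to path-quality labels."""
    for path, next_step_label, quality_label in annotated_paths:
        policy.fit_next_step(path, next_step_label)   # e.g. cross-entropy on the annotated next step
        value.fit_quality(path, quality_label)        # e.g. regression toward the labelled path quality


def reinforcement_learning(policy, value, environment, num_iterations):
    """RL phase: roll out reasoning paths, score them, and push the policy toward high-reward paths."""
    for _ in range(num_iterations):
        path = environment.rollout(policy)            # generate a full inference path
        reward = environment.reward(path)             # e.g. correctness of the final answer
        advantage = reward - value.predict(path)      # how much better than the critic expected
        policy.reinforce(path, advantage)             # policy-gradient style update
        value.fit_quality(path, reward)               # keep the value model calibrated
```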
Working Principle
Inference Process:
- User Input: Users provide a problem or task description as the starting point for reasoning.
- Reasoning Structure Construction: The model constructs a reasoning structure, typically represented as a tree-like structure, where each node represents a reasoning step.
- Policy Model Generation: The policy model generates new reasoning steps to extend the reasoning structure.
- Value Model Evaluation: The value model evaluates the quality of the reasoning path, helping the model choose the most promising path.
- Termination: When a terminating step is reached, the reasoning process ends and the final answer is formed.
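Putting the pieces together, the inference loop can be sketched as below. This is a greedy simplification written for illustration, with propose, score, and is_final standing in for the policy model, the value model, and the termination check; a real system would usually explore several branches with beam search or MCTS rather than committing to one step at a time.

```python
from typing import Callable, List


def run_inference(
    propose: Callable[[List[str]], List[str]],   # policy model: candidate next steps
    score: Callable[[List[str]], float],         # value model: rate each candidate extension
    is_final: Callable[[str], bool],             # detects a terminating step, e.g. "Answer: ..."
    question: str,
    max_steps: int = 8,
) -> List[str]:
    """Greedy variant of the inference loop: always extend with the best-scoring candidate."""
    path = [question]
    for _ in range(max_steps):
        candidates = propose(path)
        if not candidates:
            break
        best = max(candidates, key=lambda step: score(path + [step]))
        path.append(best)
        if is_final(best):
            break
    return path


# Toy demo with stand-in models
propose = lambda path: ["Work through the arithmetic."] if len(path) < 3 else ["Answer: 408"]
score = lambda path: float(len(path))
is_final = lambda step: step.startswith("Answer:")
print(run_inference(propose, score, is_final, "What is 17 * 24?"))
```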
Training Process:
- Supervised Fine-Tuning (SFT): Using annotated data to train the policy and value models to generate and evaluate high-quality reasoning steps.
- Reinforcement Learning (RL): Through interaction with the environment, optimizing the model’s reasoning strategy to better choose and evaluate reasoning paths.
- Self-Learning: The model generates new data through simulated reasoning processes, which are used to retrain the model, further optimizing its reasoning ability.
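The self-learning step can be sketched as a loop that simulates reasoning, filters the resulting paths by reward, and feeds the survivors back into training. The rollout, reward_fn, and retrain helpers and the 0.5 keep-threshold are assumptions of mine for illustration, not details taken from the DeepSeek reports.

```python
def self_learning_round(policy, value, problems, rollout, reward_fn, retrain, keep_threshold=0.5):
    """One self-learning round: simulate reasoning, keep high-reward paths, retrain on them."""
    new_data = []
    for problem in problems:
        path = rollout(policy, problem)        # simulated reasoning, no user request involved
        reward = reward_fn(problem, path)      # e.g. whether the final step matches a known answer
        if reward > keep_threshold:            # keep only paths judged good enough
            new_data.append((problem, path, reward))
    retrain(policy, value, new_data)           # another SFT/RL pass on the freshly generated data
    return new_data
```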
Data Generation:
- Independent of User Requests: New training data is produced through simulated reasoning processes, independently of any user request.
- Diversity: Generating diverse reasoning paths to ensure the model can handle various complex scenarios.
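One simple way to picture the diversity requirement: sample several independent reasoning paths for the same problem rather than always extending with the single best step, as in the sketch below. The propose callable is again a placeholder for the policy model; real systems would also diversify through sampling temperature and varied prompts.

```python
import random
from typing import Callable, List


def generate_diverse_paths(
    propose: Callable[[List[str]], List[str]],   # policy model: candidate next steps
    question: str,
    num_paths: int = 4,
    max_steps: int = 6,
    seed: int = 0,
) -> List[List[str]]:
    """Sample several independent reasoning paths for the same problem."""
    rng = random.Random(seed)
    paths = []
    for _ in range(num_paths):
        path = [question]
        for _ in range(max_steps):
            candidates = propose(path)
            if not candidates:
                break
            path.append(rng.choice(candidates))  # sample a step instead of always taking the best
        paths.append(path)
    return paths


# Toy demo: two candidate steps at every depth, so sampled paths differ
propose = lambda path: [f"step {len(path)}a", f"step {len(path)}b"]
for sampled in generate_diverse_paths(propose, "What is 17 * 24?", num_paths=2, max_steps=3):
    print(sampled)
```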
Features
- Combination of System 1 and System 2: RLMs combine the fast generation capabilities of LLMs (System 1 Thinking) with advanced reasoning mechanisms (System 2 Thinking), enabling efficient solutions to complex problems.
- Structured Reasoning: By building a reasoning structure and adopting reasoning strategies, RLMs can perform multi-step logical analysis and generate high-quality reasoning paths.
- Self-Learning Ability: Through self-learning and reinforcement learning, RLMs can continuously optimize their reasoning strategy, improving reasoning ability and generalizability.
- Modular Design: RLM’s architecture supports modular design, facilitating experimentation and optimization, and adapting to different task requirements.
Summary
A Reasoning Language Model (RLM) is an artificial intelligence model that combines the generation capabilities of large language models with advanced reasoning mechanisms. By constructing a reasoning structure, adopting reasoning strategies, and optimizing training mechanisms, RLMs can perform complex logical analysis and step-by-step reasoning to solve problems requiring deep thinking. This approach not only improves reasoning ability but also enhances generalizability and adaptability, providing a new way to tackle complex problems.