lpm_kernel/L2/README.md
This implementation adds Long Chain-of-Thought (CoT) capability to the data synthesis pipeline when DeepSeek R1 is used as the base model. The feature enables multi-step reasoning for more context-aware responses.
- **Long CoT Mode**: When enabled, the system generates synthetic data with extended reasoning chains.
- **DeepSeek R1 Integration**: CoT data generation uses the DeepSeek-R1 model exclusively.
- **Enhanced Training**: Produces models with improved long-context reasoning capabilities.
## Backend Configuration

- Set `is_cot=True` in the `trainprocess_service.py` initialization.
- Configure via `train_for_user.sh` with `--is_cot True/False`.
- Environment variables in `lpm_kernel/L2/.env`:

```
DEEPSEEK_MODEL_NAME=deepseek-*
DEEPSEEK_API_KEY=your_api_key
DEEPSEEK_BASE_URL=your_base_url
```
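The environment variables above can be picked up at startup before any generator runs. A minimal sketch (the helper name `load_deepseek_config` and the returned dict shape are assumptions for illustration, not the project's actual code):

```python
import os

def load_deepseek_config() -> dict:
    """Read the DeepSeek settings from the environment (see lpm_kernel/L2/.env)."""
    config = {
        "model_name": os.getenv("DEEPSEEK_MODEL_NAME"),
        "api_key": os.getenv("DEEPSEEK_API_KEY"),
        "base_url": os.getenv("DEEPSEEK_BASE_URL"),
    }
    # Fail fast if any setting is absent rather than erroring mid-synthesis.
    missing = [key for key, value in config.items() if not value]
    if missing:
        raise ValueError(f"Missing DeepSeek settings: {missing}")
    return config
```

Validating all three settings up front keeps a misconfigured `.env` from surfacing only after a long generation run has already started.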
## Supported Data Types

- SelfQA data
- Preference data
- Diversity data
## Prompt Structure

```
<think>reasoning_content</think>
<answer>final_content</answer>
```
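A model response in this format can be split back into its reasoning and answer parts with a small parser. This is an illustrative sketch (the function name and return shape are assumptions), falling back to the raw text when no `<answer>` tag is present:

```python
import re

def split_cot_response(text: str) -> tuple[str, str]:
    """Split a <think>...</think><answer>...</answer> response into (reasoning, answer)."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    # If the model omitted the tags, treat the whole text as the answer.
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final
```

`re.DOTALL` lets the non-greedy groups span multi-line reasoning chains, which long CoT outputs usually are.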
## Model Whitelisting
## Modified Files

- `selfqa.py`:
  - Added `is_cot` initialization option
  - Updated prompt templates
  - Modified response handling
- `preference_QA_generate.py`:
  - Added CoT support
  - Enhanced question extraction
- `diversity_data_generator.py`:
  - Added CoT templates
  - Updated generation logic
## New Functions

- Unified `get_remote_response()` function
- Enhanced logging with `tqdm` integration
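A unified remote-call helper typically wraps the API request in one shared retry loop so the SelfQA, preference, and diversity generators don't each reimplement error handling. The sketch below assumes that shape; the `call_fn` parameter stands in for the actual DeepSeek client call, and the signature is an illustration rather than the project's exact API:

```python
import logging
import time

logger = logging.getLogger(__name__)

def get_remote_response(call_fn, prompt: str, max_retries: int = 3, backoff: float = 1.0) -> str:
    """Call the remote model via call_fn(prompt), retrying with linear backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            return call_fn(prompt)
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_retries, exc)
            if attempt == max_retries:
                raise
            time.sleep(backoff * attempt)
```

In the pipeline, a loop over synthesis prompts can be wrapped in `tqdm(prompts)` so retries and progress are visible in one place.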