lpm_kernel/L2/README.md
This implementation adds Long Chain-of-Thought (CoT) capability to the data synthesis pipeline when DeepSeek R1 is used as the base model. The feature enables multi-step reasoning for more context-aware responses.
- **Long CoT Mode**: When enabled, the system generates synthetic data with extended reasoning chains.
- **DeepSeek R1 Integration**: CoT data generation uses the DeepSeek-R1 model exclusively.
- **Enhanced Training**: Produces models with improved long-context reasoning capabilities.
## Backend Configuration

- Set `is_cot=True` in the `trainprocess_service.py` initialization.
- Configure via `train_for_user.sh` with `--is_cot True/False`.
- Environment variables in `lpm_kernel/L2/.env`:

```
DEEPSEEK_MODEL_NAME=deepseek-*
DEEPSEEK_API_KEY=your_api_key
DEEPSEEK_BASE_URL=your_base_url
```
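The environment variables above can be picked up at startup before any generator runs. A minimal sketch (the helper name `load_deepseek_config` and the returned dict shape are assumptions for illustration, not the project's actual code):

```python
import os

def load_deepseek_config() -> dict:
    """Read the DeepSeek settings from the environment (see lpm_kernel/L2/.env)."""
    config = {
        "model_name": os.getenv("DEEPSEEK_MODEL_NAME"),
        "api_key": os.getenv("DEEPSEEK_API_KEY"),
        "base_url": os.getenv("DEEPSEEK_BASE_URL"),
    }
    # Fail fast if any setting is absent rather than erroring mid-synthesis.
    missing = [key for key, value in config.items() if not value]
    if missing:
        raise ValueError(f"Missing DeepSeek settings: {missing}")
    return config
```

Validating all three settings up front keeps a misconfigured `.env` from surfacing only after a long generation run has already started.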
## Supported Data Types

- SelfQA data
- Preference data
- Diversity data
## Prompt Structure

```
<think>reasoning_content</think>
<answer>final_content</answer>
```
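A model response in this format can be split back into its reasoning and answer parts with a small parser. This is an illustrative sketch (the function name and return shape are assumptions), falling back to the raw text when no `<answer>` tag is present:

```python
import re

def split_cot_response(text: str) -> tuple[str, str]:
    """Split a <think>...</think><answer>...</answer> response into (reasoning, answer)."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    # If the model omitted the tags, treat the whole text as the answer.
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final
```

`re.DOTALL` lets the non-greedy groups span multi-line reasoning chains, which long CoT outputs usually are.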
## Model Whitelisting
## Modified Files

- `selfqa.py`:
  - Added `is_cot` initialization option
  - Updated prompt templates
  - Modified response handling
- `preference_QA_generate.py`:
  - Added CoT support
  - Enhanced question extraction
- `diversity_data_generator.py`:
  - Added CoT templates
  - Updated generation logic
## New Functions

- Unified `get_remote_response()` function
- Enhanced logging with `tqdm` integration
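A unified remote-call helper typically wraps the API request in one shared retry loop so the SelfQA, preference, and diversity generators don't each reimplement error handling. The sketch below assumes that shape; the `call_fn` parameter stands in for the actual DeepSeek client call, and the signature is an illustration rather than the project's exact API:

```python
import logging
import time

logger = logging.getLogger(__name__)

def get_remote_response(call_fn, prompt: str, max_retries: int = 3, backoff: float = 1.0) -> str:
    """Call the remote model via call_fn(prompt), retrying with linear backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            return call_fn(prompt)
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_retries, exc)
            if attempt == max_retries:
                raise
            time.sleep(backoff * attempt)
```

In the pipeline, a loop over synthesis prompts can be wrapped in `tqdm(prompts)` so retries and progress are visible in one place.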