Training Jobs
The Training Jobs module in DeepExtension lets you fine-tune language models through a visual, no-code interface. It supports multiple fine-tuning methods, works with your uploaded datasets, and provides monitoring and comparison tools throughout the training lifecycle.
Start a New Train
To begin a new training job:
- Click "Start a New Train" on the Model Training page.
- Select a fine-tuning method from those defined in Training Method Management.
- Fill in the required parameters as described below.
Parameter Reference Table
| Parameter Name | Meaning | Typical Values / Range |
|---|---|---|
| Base Model / `MODEL_PATH` | Foundation model to fine-tune | e.g., Qwen1.5-7B, LLaMA2, registered in Base Models |
| Dataset / `DATASET_PATH` | Training data to be used | Uploaded dataset in JSONL format |
| `LORA_RANK` | Rank of LoRA adapter matrix | 4, 8, 16 |
| `LOAD_IN_4BIT` | Whether to use 4-bit quantization during training | true or false |
| `MAX_SEQ_LENGTH` | Maximum sequence length for the model | 512, 1024, 2048 |
| `MAX_INPUT_LENGTH` | Max token length for input prompt | 256 – 2048 |
| `MAX_CONTENT_LENGTH` | Max token length for training content | 256 – 2048 |
| `MAX_SAMPLES` | Max number of training samples to load | e.g., 1000, 5000, -1 for all |
| `NUM_GENERATIONS` | Number of generations for reward/validation | 1 – 10 |
| `MAX_GRAD_NORM` | Gradient clipping norm | 0.5 – 5.0 |
| `EPOCHS` | Number of full dataset iterations | 1 – 50 (used if `MAX_STEPS` not provided) |
| `MAX_STEPS` | Total training steps | 100 – 50000 |
| `BATCH_SIZE` | Number of samples per batch | 1 – 64 |
| `GRAD_ACCUM_STEPS` | Gradient accumulation steps | 1 – 16 |
| `LEARNING_RATE` | Initial learning rate | e.g., 1e-4, 5e-5, 2e-5 |
| `WARMUP_STEPS` | Warmup steps before full learning rate | e.g., 0 – 5000 |
| `WARMUP_RATIO` | Proportion of total steps used for warmup | 0.01 – 0.2 (used if `WARMUP_STEPS` not provided) |
| `OUTPUT_DIR` | Where to store training output (system-handled) | Auto-generated by system |
| `PromptInputColumn` | Dataset column used as input prompt (e.g., the "question") | e.g., question, instruction, query |
| `PromptOutputColumn` | Dataset column used as expected output (e.g., the "answer") | e.g., answer, response, completion |
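To illustrate how the dataset-related parameters fit together, the sketch below reads JSONL records, picks the prompt and answer columns named by PromptInputColumn and PromptOutputColumn, and applies MAX_SAMPLES. The function `load_pairs` and the sample records are hypothetical and only show the idea; DeepExtension's own loader may behave differently.

```python
import json
from typing import Iterable


def load_pairs(lines: Iterable[str], input_col: str, output_col: str,
               max_samples: int = -1) -> list[tuple[str, str]]:
    """Return (prompt, expected_output) pairs from JSONL lines.

    max_samples = -1 loads every record, mirroring MAX_SAMPLES in the table.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if max_samples >= 0 and len(pairs) >= max_samples:
            break  # sample limit reached
        record = json.loads(line)
        pairs.append((record[input_col], record[output_col]))
    return pairs


# Hypothetical two-record dataset in JSONL form (one JSON object per line).
sample_jsonl = [
    '{"question": "What is LoRA?", "answer": "A low-rank adapter method."}',
    '{"question": "What is a token?", "answer": "A unit of text."}',
]
pairs = load_pairs(sample_jsonl, input_col="question",
                   output_col="answer", max_samples=1)
```

With `max_samples=1`, only the first record is kept; passing `-1` would load both.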
Most of these parameters are passed directly to your training logic. To customize their use, refer to Implement Your Own Training.
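As a rough sketch of how training logic might consume these parameters, the example below applies the two fallback rules from the table: EPOCHS is used only when MAX_STEPS is absent, and WARMUP_RATIO only when WARMUP_STEPS is absent. The function `resolve_schedule` is hypothetical, and counting one step per optimizer update (BATCH_SIZE × GRAD_ACCUM_STEPS samples) is an assumption, not a documented DeepExtension behavior.

```python
import math


def resolve_schedule(params: dict, num_samples: int) -> dict:
    """Derive total steps and warmup steps from the form parameters."""
    # Assumption: one training step consumes BATCH_SIZE * GRAD_ACCUM_STEPS samples.
    effective_batch = params["BATCH_SIZE"] * params.get("GRAD_ACCUM_STEPS", 1)

    # MAX_STEPS takes precedence; otherwise derive the step count from EPOCHS.
    max_steps = params.get("MAX_STEPS")
    if max_steps is None:
        steps_per_epoch = math.ceil(num_samples / effective_batch)
        max_steps = steps_per_epoch * params["EPOCHS"]

    # WARMUP_STEPS takes precedence; otherwise fall back to WARMUP_RATIO.
    warmup_steps = params.get("WARMUP_STEPS")
    if warmup_steps is None:
        warmup_steps = int(max_steps * params.get("WARMUP_RATIO", 0.0))

    return {
        "effective_batch": effective_batch,
        "max_steps": max_steps,
        "warmup_steps": warmup_steps,
    }


params = {"BATCH_SIZE": 4, "GRAD_ACCUM_STEPS": 4, "EPOCHS": 3, "WARMUP_RATIO": 0.1}
schedule = resolve_schedule(params, num_samples=1000)
```

With 1000 samples, these values give an effective batch of 16, 63 steps per epoch, 189 total steps, and 18 warmup steps.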
Once you click "Run the Train", the job is submitted and runs in the background on your backend infrastructure.
View Train Details & Monitor Progress
To inspect an existing training job:
- Go to the Model Training main page.
- Click "View Details" on a training job.
You will see three tabs:
- Train Overview: Shows all training parameters used
- Evaluation Data: Real-time visualizations of loss curves, reward scores, and performance metrics
- Train Log: Raw logs generated during training for debugging or auditing
Copy a Train
To quickly replicate and tweak a previous training job:
- While viewing a training job’s Train Overview tab, scroll to the bottom.
- Click "Copy the Train".
- This pre-fills the new job form with the same configuration, so you can make minimal changes (e.g., dataset or learning rate) and run a comparable training job immediately.
This is ideal for A/B testing or iterative improvements.
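Conceptually, copying a train amounts to cloning a configuration and overriding a few fields. The sketch below shows that idea in plain Python; `copy_train` and the sample configuration are hypothetical and are not part of DeepExtension, where this is done through the UI.

```python
import copy


def copy_train(config: dict, **overrides) -> dict:
    """Return a new job config that differs from `config` only in `overrides`."""
    unknown = set(overrides) - set(config)
    if unknown:
        # Guard against typos in parameter names.
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    new_config = copy.deepcopy(config)  # leave the original job untouched
    new_config.update(overrides)
    return new_config


# Hypothetical baseline job and a variant that changes only the learning rate.
baseline = {"LORA_RANK": 8, "LEARNING_RATE": 1e-4, "EPOCHS": 3}
variant = copy_train(baseline, LEARNING_RATE=5e-5)
```

Changing one parameter at a time keeps the two jobs comparable, which is what makes the A/B result interpretable.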
Training Comparison
You can compare multiple training jobs side-by-side:
- Select any two or more jobs on the main page
- Click "Compare" to see configuration, performance metrics, and outcomes all in one view
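When reading a comparison, the parameters that differ between jobs are usually the interesting part. The sketch below shows one way to compute that difference offline; `diff_configs` and the job names are hypothetical, and the built-in Compare view may present the information differently.

```python
def diff_configs(jobs: dict[str, dict]) -> dict[str, dict]:
    """Map each parameter whose value differs across jobs to its per-job values."""
    all_keys = sorted({key for config in jobs.values() for key in config})
    differences = {}
    for key in all_keys:
        # A parameter missing from a job shows up as None.
        values = {name: config.get(key) for name, config in jobs.items()}
        # Compare via repr so unhashable values (lists, dicts) also work.
        if len({repr(value) for value in values.values()}) > 1:
            differences[key] = values
    return differences


# Hypothetical configurations of two training jobs.
jobs = {
    "train-a": {"LORA_RANK": 8, "LEARNING_RATE": 1e-4},
    "train-b": {"LORA_RANK": 8, "LEARNING_RATE": 5e-5},
}
changed = diff_configs(jobs)
```

Here only `LEARNING_RATE` is reported, since `LORA_RANK` is identical in both jobs.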