Training Jobs

The Training Jobs module in DeepExtension lets users fine-tune language models through a visual, no-code interface. It supports several fine-tuning strategies, works with the datasets you have uploaded, and provides monitoring and comparison tools throughout the training lifecycle.


Start a New Train

To begin a new training job:

  1. Click "Start a New Train" on the Model Training page.
  2. Select a fine-tuning method from those defined in Training Method Management.
  3. Fill in the required parameters as described below.

Parameter Reference Table

| Parameter Name | Meaning | Typical Values / Range |
|---|---|---|
| Base Model / MODEL_PATH | Foundation model to fine-tune | e.g., Qwen1.5-7B, LLaMA2 (registered in Base Models) |
| Dataset / DATASET_PATH | Training data to be used | Uploaded dataset in JSONL format |
| LORA_RANK | Rank of LoRA adapter matrix | 4, 8, 16 |
| LOAD_IN_4BIT | Whether to use 4-bit quantization during training | true or false |
| MAX_SEQ_LENGTH | Maximum sequence length for the model | 512, 1024, 2048 |
| MAX_INPUT_LENGTH | Max token length for input prompt | 256 – 2048 |
| MAX_CONTENT_LENGTH | Max token length for training content | 256 – 2048 |
| MAX_SAMPLES | Max number of training samples to load | e.g., 1000, 5000, -1 for all |
| NUM_GENERATIONS | Number of generations for reward/validation | 1 – 10 |
| MAX_GRAD_NORM | Gradient clipping norm | 0.5 – 5.0 |
| EPOCHS | Number of full dataset iterations | 1 – 50 (used if MAX_STEPS is not provided) |
| MAX_STEPS | Total training steps | 100 – 50000 |
| BATCH_SIZE | Number of samples per batch | 1 – 64 |
| GRAD_ACCUM_STEPS | Gradient accumulation steps | 1 – 16 |
| LEARNING_RATE | Initial learning rate | e.g., 1e-4, 5e-5, 2e-5 |
| WARMUP_STEPS | Warmup steps before full learning rate | e.g., 0 – 5000 |
| WARMUP_RATIO | Proportion of total steps used for warmup | 0.01 – 0.2 (used if WARMUP_STEPS is not provided) |
| OUTPUT_DIR | Where to store training output | Auto-generated by the system |
| PromptInputColumn | Dataset column used as input prompt (e.g., the "question") | e.g., question, instruction, query |
| PromptOutputColumn | Dataset column used as expected output (e.g., the "answer") | e.g., answer, response, completion |
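
Before uploading, it can help to check locally that a JSONL dataset contains the columns you plan to enter as PromptInputColumn and PromptOutputColumn. The following is a minimal sketch, not DeepExtension code: the `load_samples` helper and the example records are hypothetical, and the handling of `-1` as "load all" simply mirrors the MAX_SAMPLES description in the table.

```python
import json

def load_samples(lines, input_col, output_col, max_samples=-1):
    """Parse JSONL lines into (prompt, expected_output) pairs.

    max_samples=-1 means "load everything", mirroring MAX_SAMPLES.
    """
    samples = []
    for line_no, line in enumerate(lines, start=1):
        if max_samples != -1 and len(samples) >= max_samples:
            break  # sample limit reached
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if input_col not in record or output_col not in record:
            raise KeyError(
                f"line {line_no}: missing {input_col!r} or {output_col!r}"
            )
        samples.append((record[input_col], record[output_col]))
    return samples

# Hypothetical dataset with "question" / "answer" columns.
example = [
    '{"question": "What is LoRA?", "answer": "A low-rank adapter method."}',
    '{"question": "What is 2 + 2?", "answer": "4"}',
]
pairs = load_samples(example, "question", "answer", max_samples=-1)
print(len(pairs), "samples loaded")
```

To check a real file, pass an open file handle instead of the list, for example `load_samples(open("train.jsonl", encoding="utf-8"), "question", "answer")`, where the file name is a placeholder.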

Most of these parameters are passed directly to your training logic. To customize their use, refer to Implement Your Own Training.
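
Several parameters in the table interact: EPOCHS applies only when MAX_STEPS is not provided, and WARMUP_RATIO applies only when WARMUP_STEPS is not provided. The sketch below illustrates one way training logic might resolve them. The `resolve_schedule` function is hypothetical, and it assumes a single device, one optimizer step per BATCH_SIZE × GRAD_ACCUM_STEPS samples, and rounding up; your actual training method may count steps differently.

```python
import math

def resolve_schedule(num_samples, batch_size, grad_accum_steps,
                     epochs=None, max_steps=None,
                     warmup_steps=None, warmup_ratio=None):
    """Resolve total and warmup steps using the precedence in the table."""
    # Samples consumed per optimizer step (single-device assumption).
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)

    # MAX_STEPS wins; EPOCHS is the fallback.
    if max_steps is not None:
        total_steps = max_steps
    elif epochs is not None:
        total_steps = steps_per_epoch * epochs
    else:
        raise ValueError("provide either max_steps or epochs")

    # WARMUP_STEPS wins; WARMUP_RATIO is the fallback.
    if warmup_steps is not None:
        warmup = warmup_steps
    elif warmup_ratio is not None:
        warmup = math.ceil(total_steps * warmup_ratio)
    else:
        warmup = 0

    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup,
    }

plan = resolve_schedule(num_samples=1000, batch_size=8, grad_accum_steps=4,
                        epochs=3, warmup_ratio=0.1)
print(plan)
```

With 1000 samples, BATCH_SIZE 8, and GRAD_ACCUM_STEPS 4, each optimizer step covers 32 samples, so 3 epochs resolve to 96 steps under these assumptions.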

Once you click "Run the Train", the job is submitted and runs in the background on your backend infrastructure.


View Train Details & Monitor Progress

To inspect an existing training job:

  1. Go to the Model Training main page.
  2. Click "View Details" on a training job.

You will see three tabs:

  • Train Overview: Shows all training parameters used
  • Evaluation Data: Real-time visualizations of loss curves, reward scores, and performance metrics
  • Train Log: Raw logs generated during training for debugging or auditing

Copy a Train

To quickly replicate and tweak a previous training job:

  1. While viewing a training job’s Train Overview tab, scroll to the bottom.
  2. Click "Copy the Train".
  3. The new job form opens pre-filled with the same configuration, so you can change only what you need (e.g., dataset or learning rate) and run a comparable training job immediately.

This is ideal for A/B testing or iterative improvements.
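
Conceptually, copying a train amounts to cloning a parameter set and overriding a few values. The sketch below models that idea in plain Python; the `copy_train` helper and the example values are hypothetical and are not part of DeepExtension, which performs this step through the UI.

```python
import copy

def copy_train(config, **overrides):
    """Return a new config identical to `config` except for `overrides`."""
    unknown = set(overrides) - set(config)
    if unknown:
        # Guard against typos such as LEARNING_RTE.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    new_config = copy.deepcopy(config)  # leave the original job untouched
    new_config.update(overrides)
    return new_config

# Hypothetical baseline job using parameter names from the table.
baseline = {
    "LORA_RANK": 8,
    "LEARNING_RATE": 1e-4,
    "BATCH_SIZE": 8,
    "EPOCHS": 3,
}
variant = copy_train(baseline, LEARNING_RATE=5e-5)
print(variant)
```

Changing one parameter at a time, as here, keeps the two jobs comparable, since any difference in results can be attributed to that single change.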


Training Comparison

You can compare multiple training jobs side-by-side:

  • Select any two or more jobs on the main page
  • Click "Compare" to see configuration, performance metrics, and outcomes all in one view
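
When reading a comparison, the parameters that differ between jobs are usually what matter. As an illustration only, the sketch below computes such a difference for configurations held as dictionaries; the `diff_configs` function and the job names are hypothetical and do not reflect how the Compare view is implemented.

```python
def diff_configs(jobs):
    """Return {parameter: {job_name: value}} for parameters that differ.

    `jobs` maps a job name to its parameter dictionary. A parameter
    missing from a job is reported as None for that job.
    """
    all_keys = sorted({key for cfg in jobs.values() for key in cfg})
    diff = {}
    for key in all_keys:
        values = {name: cfg.get(key) for name, cfg in jobs.items()}
        # repr() lets unhashable values (lists, dicts) be compared too.
        if len({repr(v) for v in values.values()}) > 1:
            diff[key] = values
    return diff

# Hypothetical jobs that differ only in learning rate.
jobs = {
    "job-a": {"LORA_RANK": 8, "LEARNING_RATE": 1e-4, "EPOCHS": 3},
    "job-b": {"LORA_RANK": 8, "LEARNING_RATE": 5e-5, "EPOCHS": 3},
}
for param, values in diff_configs(jobs).items():
    print(param, values)
```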
