Training Jobs

The Training Jobs module in DeepExtension lets you fine-tune language models through a visual, no-code interface. It supports multiple fine-tuning methods, works with your uploaded datasets, and provides monitoring and comparison tools throughout the training lifecycle.

Start a New Train

To begin a new training job:

Click "Start a New Train" on the Model Training page.
Select a fine-tuning method from those defined in Training Method Management.
Fill in the required parameters as described below.

Parameter Reference Table

| Parameter Name | Meaning | Typical Values / Range |
| --- | --- | --- |
| `Base Model` / `MODEL_PATH` | Foundation model to fine-tune | e.g., `Qwen1.5-7B`, `LLaMA2`, registered in Base Models |
| `Dataset` / `DATASET_PATH` | Training data to be used | Uploaded dataset in JSONL format |
| `LORA_RANK` | Rank of LoRA adapter matrix | 4, 8, 16 |
| `LOAD_IN_4BIT` | Whether to use 4-bit quantization during training | `true` or `false` |
| `MAX_SEQ_LENGTH` | Maximum sequence length for the model | 512, 1024, 2048 |
| `MAX_INPUT_LENGTH` | Max token length for input prompt | 256 – 2048 |
| `MAX_CONTENT_LENGTH` | Max token length for training content | 256 – 2048 |
| `MAX_SAMPLES` | Max number of training samples to load | e.g., 1000, 5000, `-1` for all |
| `NUM_GENERATIONS` | Number of generations for reward/validation | 1 – 10 |
| `MAX_GRAD_NORM` | Gradient clipping norm | 0.5 – 5.0 |
| `EPOCHS` | Number of full dataset iterations | 1 – 50 (used if `MAX_STEPS` not provided) |
| `MAX_STEPS` | Total training steps | 100 – 50000 |
| `BATCH_SIZE` | Number of samples per batch | 1 – 64 |
| `GRAD_ACCUM_STEPS` | Gradient accumulation steps | 1 – 16 |
| `LEARNING_RATE` | Initial learning rate | e.g., 1e-4, 5e-5, 2e-5 |
| `WARMUP_STEPS` | Warmup steps before full learning rate | e.g., 0 – 5000 |
| `WARMUP_RATIO` | Proportion of total steps used for warmup | 0.01 – 0.2 (used if `WARMUP_STEPS` not provided) |
| `OUTPUT_DIR` | Where to store training output (system-handled) | Auto-generated by system |
| `PromptInputColumn` | Dataset column used as input prompt (e.g., the "question") | e.g., `question`, `instruction`, `query` |
| `PromptOutputColumn` | Dataset column used as expected output (e.g., the "answer") | e.g., `answer`, `response`, `completion` |
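
The table states two fallback rules: `EPOCHS` is used only if `MAX_STEPS` is not provided, and `WARMUP_RATIO` is used only if `WARMUP_STEPS` is not provided. The sketch below shows how those rules could resolve into concrete step counts. It is not DeepExtension's implementation. The function name `resolve_schedule`, the use of `None` for "not provided", the single-device assumption, and the round-up behavior are all assumptions made for illustration.

```python
import math

def resolve_schedule(num_samples, batch_size, grad_accum_steps,
                     epochs=None, max_steps=None,
                     warmup_steps=None, warmup_ratio=None):
    """Hypothetical helper: derive total and warmup steps from the table's parameters."""
    # Assumption: one optimizer step consumes BATCH_SIZE * GRAD_ACCUM_STEPS samples
    # (single device, no data parallelism).
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)

    # MAX_STEPS wins when given; EPOCHS is the fallback.
    if max_steps is not None:
        total_steps = max_steps
    elif epochs is not None:
        total_steps = steps_per_epoch * epochs
    else:
        raise ValueError("Provide either MAX_STEPS or EPOCHS")

    # WARMUP_STEPS wins when given; WARMUP_RATIO is the fallback.
    if warmup_steps is not None:
        warmup = warmup_steps
    elif warmup_ratio is not None:
        warmup = math.ceil(total_steps * warmup_ratio)  # rounding up is an assumption
    else:
        warmup = 0

    return {
        "effective_batch_size": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup,
    }

# 1000 samples, BATCH_SIZE=8, GRAD_ACCUM_STEPS=4, EPOCHS=3, WARMUP_RATIO=0.1
schedule = resolve_schedule(1000, 8, 4, epochs=3, warmup_ratio=0.1)
print(schedule)
# {'effective_batch_size': 32, 'total_steps': 96, 'warmup_steps': 10}
```

Under these assumptions, raising `GRAD_ACCUM_STEPS` increases the effective batch size without increasing per-batch memory, at the cost of fewer optimizer steps per epoch.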

Most of these parameters are passed directly to your training logic. To customize their use, refer to Implement Your Own Training.
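
As an illustration of how custom training logic might consume the dataset-related parameters, the sketch below reads JSONL records and extracts prompt/response pairs using `PromptInputColumn`, `PromptOutputColumn`, and `MAX_SAMPLES`. The function `load_pairs` and the sample records are hypothetical and are not part of DeepExtension. The only behaviors taken from the table are the JSONL format, the column names, and `-1` meaning "load all samples".

```python
import json

def load_pairs(jsonl_lines, input_column, output_column, max_samples=-1):
    """Hypothetical helper: return (prompt, expected_output) pairs from JSONL lines."""
    pairs = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # MAX_SAMPLES = -1 means no limit; otherwise stop once the limit is reached.
        if max_samples != -1 and len(pairs) >= max_samples:
            break
        record = json.loads(line)
        pairs.append((record[input_column], record[output_column]))
    return pairs

# Made-up sample data; a real job would iterate over the uploaded dataset file.
sample = [
    '{"question": "What is LoRA?", "answer": "A low-rank adapter method."}',
    '{"question": "What is JSONL?", "answer": "One JSON object per line."}',
]

all_pairs = load_pairs(sample, "question", "answer")
first_only = load_pairs(sample, "question", "answer", max_samples=1)
print(len(all_pairs), len(first_only))
```

Because `load_pairs` accepts any iterable of lines, an open file handle can be passed in place of the sample list.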

Once you click "Run the Train", the job is submitted and runs in the background on your backend infrastructure.

View Train Details & Monitor Progress

To inspect an existing training job:

Go to the Model Training main page.
Click "View Details" on a training job.

You will see three tabs:

Train Overview: Shows all training parameters used
Evaluation Data: Real-time visualizations of loss curves, reward scores, and performance metrics
Train Log: Raw logs generated during training for debugging or auditing

Copy a Train

To quickly replicate and tweak a previous training job:

While viewing a training job’s Train Overview tab, scroll to the bottom.
Click "Copy the Train".
This pre-fills the new job form with the same configuration, so you can change only what you need (e.g., dataset or learning rate) and immediately run a comparable training job.

This is ideal for A/B testing or iterative improvements.
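
In DeepExtension this is done entirely through the UI. Conceptually, copying a train amounts to duplicating a configuration and overriding a small number of fields, which the following sketch illustrates. The function `copy_train` and the parameter values are hypothetical. Only the parameter names come from the reference table.

```python
import copy

def copy_train(config, **overrides):
    """Hypothetical helper: duplicate a job configuration and change selected fields."""
    unknown = set(overrides) - set(config)
    if unknown:
        # Guard against typos so the variant differs only where intended.
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    new_config = copy.deepcopy(config)  # leave the original job untouched
    new_config.update(overrides)
    return new_config

baseline = {"LORA_RANK": 8, "LEARNING_RATE": 1e-4, "EPOCHS": 3, "BATCH_SIZE": 4}

# A/B variant: identical to the baseline except for the learning rate.
variant = copy_train(baseline, LEARNING_RATE=5e-5)
print(variant)
```

Changing one parameter at a time keeps the two runs comparable, since any difference in results can be attributed to that parameter.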

Training Comparison

You can compare multiple training jobs side by side:

Select two or more jobs on the Model Training main page.
Click "Compare" to view their configurations, performance metrics, and outcomes in a single view.
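
When reading a comparison, it helps to focus on the parameters that actually differ between jobs. The sketch below shows that idea on plain dictionaries. It does not describe how the Compare view is implemented, and the function `diff_configs`, the job names, and the values are hypothetical.

```python
def diff_configs(jobs):
    """Hypothetical helper: return only the parameters whose values differ across jobs.

    jobs maps a job name to its configuration dict.
    The result maps each differing parameter to {job_name: value}.
    """
    all_params = sorted({param for cfg in jobs.values() for param in cfg})
    differences = {}
    for param in all_params:
        # A parameter missing from a job is reported as None.
        values = {name: cfg.get(param) for name, cfg in jobs.items()}
        observed = list(values.values())
        if any(v != observed[0] for v in observed[1:]):
            differences[param] = values
    return differences

jobs = {
    "job_a": {"LEARNING_RATE": 1e-4, "LORA_RANK": 8, "EPOCHS": 3},
    "job_b": {"LEARNING_RATE": 5e-5, "LORA_RANK": 8, "EPOCHS": 3},
}
changed = diff_configs(jobs)
print(changed)
```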
