Quick Start: Model Evaluation with Vision Models and Multi-Image Comparison Datasets
1. Prepare Your Dataset
- Download sample dataset:
  - Location: Tutorials → Quick Start → Run Your First Training
- Or use custom dataset:
  - Format requirements: Follow the User Guide → Dataset Management → Multimodal Datasets specifications
Tips:
Our dataset features paired original and modified images (generated by Black Forest Labs) with corresponding annotations and generation prompts in the generation field.
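Before uploading a custom dataset, a quick local check can catch malformed records. The exact schema is defined by the Multimodal Datasets specification in the User Guide, so the following is only a rough sketch: it assumes one JSON object per line and a hypothetical `images` list holding the original and modified image. Only the generation field is mentioned in this guide; everything else here is an assumption to adapt.

```python
import json


def validate_records(lines):
    """Return a list of (line_number, problem) tuples for malformed records.

    Assumes one JSON object per line with a hypothetical "images" list
    (original, modified) and a "generation" string. Adjust both names to
    match the real Multimodal Datasets specification.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        images = record.get("images")  # hypothetical field name
        if not isinstance(images, list) or len(images) != 2:
            problems.append((number, "expected exactly two image references"))
        generation = record.get("generation")
        if not isinstance(generation, str) or not generation.strip():
            problems.append((number, "missing or empty generation prompt"))
    return problems
```

Calling `validate_records(open("dataset.jsonl", encoding="utf-8"))` on a local file (the file name is a placeholder) returns an empty list when every record passes these checks.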
2. Upload Dataset
- Log in to the DeepExtension platform
- Navigate to the Dataset Management page
- Click the Upload Dataset button
- Select the prepared dataset file
- Click Submit and check the result
3. Create Evaluation Task
- Go to the Model Evaluation page
- Click New Evaluation Task
Configuration:
- Evaluation Mode: Select Referee Model Mode
- Dataset: Choose the uploaded multimodal dataset
- Model Selection:
  - Model A
  - Model B
  - Referee Model (for judgment)
- Prompt Configuration:
  - User prompt (example): Compare these two images and describe the differences
  - Ensure both image placeholders are included
  - Use identical or similar content for the referee prompt
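Since each prompt must reference both images, it can help to check a prompt template before pasting it into the form. The sketch below counts placeholder occurrences in a prompt string. The token `<image>` is purely hypothetical; substitute whatever placeholder syntax the platform actually uses.

```python
def missing_image_placeholders(prompt, placeholder="<image>", required=2):
    """Return how many image placeholders are still missing from the prompt.

    The default placeholder token is an assumption for illustration only.
    """
    found = prompt.count(placeholder)
    return max(0, required - found)


# Example: a prompt that references both images, and one that references none.
complete = "Compare <image> and <image> and describe the differences"
incomplete = "Compare these two images and describe the differences"
print(missing_image_placeholders(complete))
print(missing_image_placeholders(incomplete))
```

A result of 0 means the prompt contains at least the required number of placeholders; the same check can be run on the referee prompt.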
4. Preview & Submit
- Click Preview to verify the configuration
- If you are satisfied with the preview, confirm the settings and click Submit Evaluation
- Monitor progress in the logs section
5. Analyze Results
- Click View Results after completion
- Key sections:
  - Parameter Details: Verify configuration
  - System Logs: Check execution records
  - Result Comparison: Analyze model outputs and referee feedback
- Available actions:
  - Download CSV results (from list page)
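Once the CSV results are downloaded, a short script can summarize how often the referee favored each model. This is a minimal sketch: the column name `referee_verdict` is hypothetical, because the layout of the exported file is not described here, so check the header row of your own export first.

```python
import csv
import io
from collections import Counter


def tally_verdicts(csv_text, verdict_column="referee_verdict"):
    """Count the values found in the verdict column of a CSV export.

    The column name is an assumption; rows with a missing or empty
    verdict are counted under "unknown".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        verdict = (row.get(verdict_column) or "").strip()
        counts[verdict or "unknown"] += 1
    return counts
```

To use it on a downloaded file, read the file contents into a string (for example with `open(path, encoding="utf-8").read()`) and pass the actual verdict column name as `verdict_column`.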
Quick Testing Tips
- Access the DeepText feature
- Workflow:
  - Select vision-language model
  - Upload two comparison images
  - Enter prompt (e.g., "Describe image differences")
- View real-time model inferences
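Important Notes
- Token usage: Preview the configuration before submitting so that tokens are not wasted on a misconfigured task
- Image format: Ensure images are in a format the platform supports
- Referee model: Use a referee model that is stronger than the models being tested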
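For the image format note, one way to catch mislabeled files before upload is to inspect their leading bytes rather than trusting the file extension. The sketch below recognizes four common formats by their standard signatures. Which formats the platform accepts is not stated here, so treat the returned label as input to your own compatibility check.

```python
def sniff_image_format(data):
    """Identify PNG, JPEG, GIF, or WebP from the leading bytes, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

Reading the first 12 bytes of a file in binary mode is enough input for this function.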