DeepExtension Documentation

Quick Start: Model Evaluation with Vision Models and Multi-Image Comparison Datasets

1. Prepare Your Dataset

Download the sample dataset:
Location: Tutorials → Quick Start → Run Your First Training
Or use a custom dataset:
Format requirements: follow the specifications in User Guide → Dataset Management → Multimodal Datasets

Tip:
The sample dataset contains paired original and modified images (generated by Black Forest Labs), each with corresponding annotations and the generation prompt stored in the generation field.
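
Before uploading a custom dataset, it can help to confirm that every record carries both images and the generation prompt. The following is a minimal sketch, assuming a JSON Lines file in which each record has the hypothetical keys `original_image` and `modified_image` next to the `generation` field mentioned above. The real key names and file layout are defined in User Guide → Dataset Management → Multimodal Datasets and should be substituted here.

```python
import json

# Hypothetical key names; replace them with the ones from the
# Multimodal Datasets specification. Only "generation" comes from this guide.
REQUIRED_KEYS = ("original_image", "modified_image", "generation")


def find_incomplete_records(jsonl_text):
    """Return (line_number, missing_keys) for every record lacking a required key."""
    problems = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # A key counts as missing when it is absent or holds an empty value.
        missing = [key for key in REQUIRED_KEYS if not record.get(key)]
        if missing:
            problems.append((line_number, missing))
    return problems


# Made-up records for illustration only.
sample = "\n".join([
    json.dumps({"original_image": "a.png", "modified_image": "a_edit.png",
                "generation": "add a red hat"}),
    json.dumps({"original_image": "b.png", "generation": "remove the tree"}),
])
print(find_incomplete_records(sample))  # [(2, ['modified_image'])]
```

An empty result means every record has all three fields; anything else lists the line numbers to fix before upload.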

2. Upload Dataset

Log in to the DeepExtension platform
Navigate to the Dataset Management page
Click the Upload Dataset button
Select the prepared dataset file
Click Submit and check the upload result

3. Create Evaluation Task

Go to the Model Evaluation page
Click New Evaluation Task

Configuration:

Evaluation Mode: Select Referee Model Mode
Dataset: Choose the uploaded multimodal dataset
Model Selection:
- Model A
- Model B
- Referee Model (for judgment)
Prompt Configuration:
- User prompt (example):
```
Compare these two images and describe the differences
```
- Ensure the prompt includes both image placeholders
- Use the same or a closely similar prompt for the referee model
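
A missing image placeholder is easy to overlook when the user and referee prompts are edited separately. The sketch below checks a prompt string for both placeholders. The tokens `{image_1}` and `{image_2}` are hypothetical stand-ins, since this guide does not state the placeholder syntax; use whatever the platform's prompt editor actually inserts.

```python
# Hypothetical placeholder tokens; replace with the platform's real syntax.
IMAGE_PLACEHOLDERS = ("{image_1}", "{image_2}")


def missing_placeholders(prompt, placeholders=IMAGE_PLACEHOLDERS):
    """Return the placeholders that do not appear in the prompt."""
    return [token for token in placeholders if token not in prompt]


user_prompt = "Compare these two images and describe the differences: {image_1} {image_2}"
referee_prompt = "Compare these two images and describe the differences: {image_1}"

print(missing_placeholders(user_prompt))     # []
print(missing_placeholders(referee_prompt))  # ['{image_2}']
```

Running the same check on both prompts also makes it easier to keep the referee prompt aligned with the user prompt.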

4. Preview & Submit

Click Preview to verify the configuration
If the preview looks correct, confirm the settings and click Submit Evaluation
Monitor progress in the logs section

5. Analyze Results

Click View Results after the task completes
Key sections:
- Parameter Details: Verify configuration
- System Logs: Check execution records
- Result Comparison: Analyze model outputs and referee feedback
Available actions:
- Download CSV results (from list page)
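
Once the CSV is downloaded, a short script can summarize how often the referee favored each model. This is a sketch under stated assumptions: the column name `referee_verdict` and the sample rows are hypothetical, because the export's actual columns are not described in this guide. Check the header row of the real file and adjust `verdict_column` accordingly.

```python
import csv
import io
from collections import Counter


def tally_verdicts(csv_text, verdict_column="referee_verdict"):
    """Count how often each value appears in the verdict column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[verdict_column].strip() for row in reader)


# Made-up rows for illustration only; not real evaluation output.
sample_csv = """sample_id,referee_verdict
1,Model A
2,Model B
3,Model A
"""
counts = tally_verdicts(sample_csv)
print(counts.most_common())  # [('Model A', 2), ('Model B', 1)]
```

For a real export, read the file with `open(path, newline="", encoding="utf-8")` and pass its contents to `tally_verdicts`.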

Quick Testing Tips

Open the DeepText feature
Workflow:
- Select a vision-language model
- Upload the two images to compare
- Enter a prompt (e.g., "Describe image differences")
View the model's inference output in real time

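Important Notes

Token usage: Use Preview to catch configuration mistakes before submitting, so tokens are not spent on a misconfigured run
Image format: Ensure the images are in a format the platform supports
Referee model: Choose a referee model that is more capable than the models under test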
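
For the image format note, a file's leading bytes are a more reliable indicator than its extension. The sketch below recognizes PNG and JPEG from their standard file signatures. Which formats the platform accepts is not stated in this guide, so treating PNG and JPEG as the supported set is an assumption to verify against the dataset specification.

```python
# Standard file signatures (magic numbers) for two common image formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_format(data):
    """Identify PNG or JPEG from the leading bytes; return None otherwise."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


print(sniff_image_format(PNG_SIGNATURE + b"rest of file"))  # png
print(sniff_image_format(b"GIF89a"))                        # None
```

To check a file on disk, read its first few bytes in binary mode, for example `open(path, "rb").read(8)`, and pass them to `sniff_image_format`.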