Zum Inhalt

Quick Start: Model Evaluation with Vision Models and Multi-Image Comparison Datasets

1. Prepare Your Dataset

2. Upload Dataset

  1. Log in to DeepExtension platform
  2. Navigate to Dataset Management page
  3. Click Upload Dataset button
  4. Select prepared dataset file
  5. Click Submit and check the result

3. Create Evaluation Task

  1. Go to Model Evaluation page
  2. Click New Evaluation Task

Configuration:

  • Evaluation Mode: Select Referee Model Mode
  • Dataset: Choose uploaded multimodal dataset
  • Model Selection:

    • Model A
    • Model B
    • Referee Model (for judgment)
  • Prompt Configuration:

    • User prompt (example):
      Compare these two images and describe the differences
      
    • Ensure both image placeholders are included
    • Use identical/similar content for referee prompt

4. Preview & Submit

  1. You can click Preview to verify configuration
  2. If you are satisfy with the preview information,you can confirm settings and click Submit Evaluation
  3. You can monitor progress via logs section

5. Analyze Results

  1. Click View Results after completion
  2. Key sections:

    • Parameter Details: Verify configuration
    • System Logs: Check execution records
    • Result Comparison: Analyze model outputs and referee feedback
  3. Available actions:

    • Download CSV results (from list page)

Quick Testing Tips

  1. Access DeepText feature
  2. Workflow:

    • Select vision-language model
    • Upload two comparison images
    • Enter prompt (e.g., "Describe image differences")
  3. View real-time model inferences

Important Notes

  1. Token usage: Preview helps avoid waste
  2. Image format: Ensure platform compatibility
  3. Referee model: Recommended to be stronger than test models