Pipeline

5-Phase Golden Dataset Pipeline

Each phase runs asynchronously and is resumable. The pipeline processes products through text extraction, multi-model consensus, active learning triage, self-verification, and flywheel checking.

Phase 1: Text Fast-Path

Extract attributes from product text (title, description). Resolves material, color, care, size, gender, age_group, origin, size_type, and material_composition. Products fully resolved here skip all VLM phases — saving time and cost.

material, color, care, size, gender, age_group, origin, material_composition, size_type
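Example: a minimal sketch of how a text fast-path might decide whether a product can skip the VLM phases. The regex patterns and the function names `extract_from_text` and `needs_vlm` are hypothetical, chosen for illustration; the pipeline's actual extraction rules are not documented here.

```python
import re

# The nine attributes named in the Phase 1 description.
ATTRIBUTES = (
    "material", "color", "care", "size", "gender", "age_group",
    "origin", "size_type", "material_composition",
)

# Hypothetical keyword patterns (assumption): only three attributes are
# covered to keep the sketch short.
PATTERNS = {
    "material": re.compile(r"\b(cotton|wool|linen|polyester|silk)\b", re.I),
    "color": re.compile(r"\b(black|white|red|blue|green)\b", re.I),
    "gender": re.compile(r"\b(men|women|unisex)\b", re.I),
}


def extract_from_text(title, description):
    """Return the attributes that can be resolved from text alone."""
    text = f"{title} {description}"
    resolved = {}
    for attr, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            resolved[attr] = match.group(1).lower()
    return resolved


def needs_vlm(resolved, required=ATTRIBUTES):
    """List the required attributes still unresolved after the text pass.

    An empty list means the product is fully resolved and could skip
    the VLM phases.
    """
    return [attr for attr in required if attr not in resolved]


resolved = extract_from_text("Blue Cotton Shirt", "Classic fit for men")
missing = needs_vlm(resolved)
print(resolved, missing)
```

In this sketch the product still has unresolved attributes (care, size, and so on), so it would continue to the later phases rather than take the fast path.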

Tip: Starting from Phase 1 runs the full pipeline. You can also start from a later phase to re-run specific steps. The pipeline is resumable: if interrupted, it resumes from the last checkpoint.
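Example: a rough sketch of the start-from-any-phase and resume behavior described in the tip. The phase names, the JSON checkpoint file, and the `run_pipeline` signature are assumptions made for illustration, and the sketch checkpoints once per phase, which may be coarser than what the real pipeline does.

```python
import json
import os

# Hypothetical phase identifiers, in the order the pipeline description lists them.
PHASES = [
    "text_fast_path",
    "consensus",
    "triage",
    "self_verification",
    "flywheel_check",
]


def load_last_completed(checkpoint_path):
    """Return the number of the last completed phase, or 0 if none."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            return json.load(f)["last_completed"]
    return 0


def save_last_completed(checkpoint_path, phase_number):
    with open(checkpoint_path, "w") as f:
        json.dump({"last_completed": phase_number}, f)


def run_pipeline(handlers, checkpoint_path, start_phase=None):
    """Run phases in order and return the names of the phases that ran.

    start_phase=None resumes after the last checkpointed phase.
    An explicit start_phase (1-based) re-runs from that phase onward.
    """
    if start_phase is None:
        start_phase = load_last_completed(checkpoint_path) + 1
    ran = []
    for number in range(start_phase, len(PHASES) + 1):
        name = PHASES[number - 1]
        handlers[name]()  # may raise; the checkpoint then stays at the prior phase
        save_last_completed(checkpoint_path, number)
        ran.append(name)
    return ran
```

Calling `run_pipeline(handlers, "checkpoint.json")` after an interruption would pick up at the first phase that did not finish, while `run_pipeline(handlers, "checkpoint.json", start_phase=1)` would run everything again.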

Run History