5-Phase Golden Dataset Pipeline
Each phase runs asynchronously and is resumable. The pipeline processes products through text extraction, multi-model consensus, active learning triage, self-verification, and flywheel checking.
Phase 1: Text Fast-Path
Extract attributes from product text (title, description). Resolves material, color, care, size, gender, age_group, origin, size_type, and material_composition. Products fully resolved here skip all VLM phases — saving time and cost.
Attributes: material, color, care, size, gender, age_group, origin, material_composition, size_type
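To illustrate the fast-path idea, here is a minimal sketch that resolves a few attributes from the title and description by keyword matching and then checks whether anything is left for the VLM phases. The keyword tables, the three-attribute subset, and the function names `extract_from_text` and `is_fully_resolved` are assumptions made for illustration, not the pipeline's actual rules or API.

```python
import re

# Hypothetical keyword tables; a real extractor would cover all nine
# attributes and use much richer rules than plain keyword matching.
KEYWORDS = {
    "material": ["cotton", "polyester", "wool", "leather"],
    "color": ["black", "white", "red", "blue"],
    "gender": ["women", "men", "unisex"],
}

# Attributes that must be present for a product to skip the VLM phases
# (an illustrative subset).
REQUIRED = ("material", "color", "gender")


def extract_from_text(title, description):
    """Return the attributes that can be read directly from product text."""
    text = f"{title} {description}".lower()
    found = {}
    for attr, words in KEYWORDS.items():
        for word in words:
            # Word boundaries keep "men" from matching inside "women".
            if re.search(rf"\b{re.escape(word)}\b", text):
                found[attr] = word
                break
    return found


def is_fully_resolved(attrs, required=REQUIRED):
    """True when every required attribute was resolved from text alone."""
    return all(name in attrs for name in required)


attrs = extract_from_text("Women's Blue Cotton T-Shirt", "Machine wash cold.")
skip_vlm = is_fully_resolved(attrs)  # product can bypass the VLM phases
```

Products for which `is_fully_resolved` returns False would continue to the later phases with only the unresolved attributes still open.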
Tip: Starting from Phase 1 runs the full pipeline. You can also start from any later phase to re-run specific steps. The pipeline is resumable: if interrupted, it resumes from the last checkpoint.
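The start-from-any-phase and checkpoint behavior can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: the phase identifiers, the JSON checkpoint file, and the `run_pipeline` and `load_checkpoint` names are hypothetical, and the real pipeline runs its phases asynchronously.

```python
import json
from pathlib import Path

# Hypothetical phase identifiers, in pipeline order (Phase 1 to Phase 5).
PHASES = [
    "text_fast_path",
    "multi_model_consensus",
    "active_learning_triage",
    "self_verification",
    "flywheel_check",
]


def load_checkpoint(path):
    """Return the 1-based number of the next phase to run (1 if no checkpoint)."""
    if path.exists():
        return json.loads(path.read_text())["next_phase"]
    return 1


def run_pipeline(handlers, checkpoint_path, start_phase=None):
    """Run phases in order and record progress after each one.

    An explicit start_phase re-runs from that phase; otherwise the run
    picks up at the phase recorded in the checkpoint file.
    """
    start = start_phase if start_phase is not None else load_checkpoint(checkpoint_path)
    executed = []
    for number in range(start, len(PHASES) + 1):
        name = PHASES[number - 1]
        handlers[name]()  # may raise; the checkpoint then still points here
        executed.append(name)
        checkpoint_path.write_text(json.dumps({"next_phase": number + 1}))
    return executed
```

Because the checkpoint is written only after a phase completes, an interrupted run repeats the phase that was in progress rather than skipping it, so each phase handler should be safe to run twice.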