04 · Phytoset-10M Corpus

The data behind the latent code.

A unified corpus of curated, field-collected, and synthetic mixed-disease samples. Augmentations include hyperspectral channel jitter, weather-stress overlays, and CutMix-style co-pathology synthesis.

Recent Records
showing 6 of 619,932
ID         Crop     Label                        Region           Date
QX-9942-B  Potato   Phytophthora · K-deficiency  Andhra Pradesh   2024.11.18
QX-9941-A  Soybean  Cercospora Sojina            Mato Grosso, BR  2024.11.18
QX-9940-D  Wheat    Septoria · Healthy (mixed)   Punjab           2024.11.17
QX-9939-C  Tomato   Mosaic Virus                 Almería, ES      2024.11.17
QX-9938-B  Maize    Northern Leaf Blight         Iowa, USA        2024.11.16
QX-9937-A  Rice     Bacterial Leaf Streak        Mekong Delta     2024.11.16
Source Composition
Source           Samples  Share
PlantVillage      54,303  38%
PlantDoc field     2,598   6%
Quoryn-Field     412,109  41%
Synthetic mixed  151,022  15%
Augmentation Pipeline
  1. RandomResizedCrop · 0.6–1.0
  2. Hyperspectral channel jitter
  3. CutMix co-pathology blend (p=0.3)
  4. Weather-stress overlay (sun · drought)
  5. Domain randomization · field vs lab
  6. MixUp on latent activations
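
Steps 2, 3, and 6 above can be illustrated with a minimal NumPy sketch. This is not the corpus's actual pipeline code: the function names (channel_jitter, cutmix_copathology, mixup_latents), the jitter strength, the Beta parameters, and the multi-hot label format are all assumptions made for illustration. Only the blend probability p=0.3 comes from the list above.

```python
import numpy as np


def channel_jitter(img, rng, strength=0.05):
    """Scale each spectral channel by an independent random gain near 1.0.

    img has shape (H, W, C). `strength` is an assumed value, not a documented one.
    """
    gains = 1.0 + rng.uniform(-strength, strength, size=(1, 1, img.shape[2]))
    return img * gains


def cutmix_copathology(img_a, lab_a, img_b, lab_b, rng, p=0.3):
    """With probability p, paste a random box from img_b into img_a.

    Labels are assumed to be multi-hot float vectors. The blended label is
    weighted by the pasted area, so a two-disease sample carries both labels.
    """
    if rng.random() >= p:
        return img_a.copy(), lab_a.astype(float)
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)  # target fraction of img_a to keep
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    top = rng.integers(0, h - cut_h + 1)
    left = rng.integers(0, w - cut_w + 1)
    out = img_a.copy()
    out[top:top + cut_h, left:left + cut_w] = img_b[top:top + cut_h, left:left + cut_w]
    area = (cut_h * cut_w) / (h * w)  # pasted fraction after integer rounding
    return out, (1.0 - area) * lab_a + area * lab_b


def mixup_latents(z_a, z_b, lab_a, lab_b, rng, alpha=0.2):
    """Convexly combine two latent vectors and their labels (MixUp)."""
    lam = rng.beta(alpha, alpha)
    return lam * z_a + (1.0 - lam) * z_b, lam * lab_a + (1.0 - lam) * lab_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    leaf_a = rng.random((64, 64, 8))  # stand-in 8-band image
    leaf_b = rng.random((64, 64, 8))
    blight = np.array([1.0, 0.0])     # hypothetical two-class multi-hot labels
    rust = np.array([0.0, 1.0])
    img, label = cutmix_copathology(channel_jitter(leaf_a, rng), blight,
                                    leaf_b, rust, rng)
    print(img.shape, label)
```

Weighting the label by the pasted area after rounding, rather than by the sampled Beta value, keeps the soft label consistent with the pixels that actually changed.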