BreedingAIDB Release 1.0
Release 1.0 comprises 143,477 rice, 284,395 soybean, and 12,654 maize genome-to-phenotype (G2P) paired data entries. It also provides three core functional modules: Feature Extraction, Phenotype Prediction, and ML Project.

About
We are dedicated to collecting, organizing, and sharing genome-to-phenotype paired data for various crops to support research on machine learning (ML) for breeding. We also aim to develop or integrate tools that advance the application of ML in breeding. Our aspiration is for BreedingAIDB to serve as a bridge connecting researchers across different fields, facilitating the integration of machine learning into breeding technology for enhanced agricultural production.
This project was supported by the Biological Breeding-National Science and Technology Major Project (2023ZD04076).
Please cite us: Zijie Shen, Enhui Shen, Kun Yang, Zuoqian Fan, Qian-Hao Zhu, Longjiang Fan, Chu-Yu Ye. 2024. BreedingAIDB: a database integrating crop genome-to-phenotype paired data with machine learning tools applicable in breeding. Plant Communications, 5(7):100894.
Types of G2P Data
GVCF-to-phenotype paired data
- Unstructured data
- GVCF files cover all loci in the genome
VCF-to-phenotype paired data
- Unstructured data
- VCF files contain high-quality SNP information
gsctool-to-phenotype paired data
- Structured data
- The gsctool features are generated by GSCtool
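
The difference between unstructured and structured data can be illustrated with a minimal sketch that turns VCF genotype calls into a numeric feature vector per sample and pairs it with a phenotype value. This is not how GSCtool computes its features (that is not described here); it uses a generic additive 0/1/2 encoding as a stand-in. The sample names, variant records, phenotype values, and all function names below are hypothetical.

```python
# Illustrative sketch only: generic additive (0/1/2) genotype encoding.
# This is NOT GSCtool's feature definition; all data below is made up.

VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1
chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t0|1\t./.\t0/0
"""

# Hypothetical phenotype table (e.g. a trait value per sample); S2 has no phenotype.
PHENOTYPES = {"S1": 98.5, "S3": 110.2}


def encode_gt(sample_field):
    """Count non-reference alleles in a GT value; return None if any allele is missing."""
    gt = sample_field.split(":")[0]  # GT is the first FORMAT key when present
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def vcf_to_features(vcf_text):
    """Return (sample names, {sample: [encoded genotype per variant]})."""
    samples, features = [], {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            features = {name: [] for name in samples}
            continue
        for name, sample_field in zip(samples, fields[9:]):
            features[name].append(encode_gt(sample_field))
    return samples, features


def pair_with_phenotypes(features, phenotypes):
    """Keep only samples that have both genotype features and a phenotype."""
    return [(name, vec, phenotypes[name])
            for name, vec in features.items() if name in phenotypes]


samples, features = vcf_to_features(VCF_TEXT)
pairs = pair_with_phenotypes(features, PHENOTYPES)
for name, vec, trait in pairs:
    print(name, vec, trait)
```

Missing calls are kept as None rather than imputed, so the choice of how to handle them is left to the downstream model.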