BreedingAIDB Release 1.0

Release 1.0 comprises 143,477 rice, 284,395 soybean, and 12,654 maize genome-to-phenotype (G2P) data pairs. It also provides three core functional modules: Feature Extraction, Phenotype Prediction, and ML Project.

About

We are dedicated to collecting, organizing, and sharing genome-to-phenotype paired data for various crops to support research in ML for breeding. We also hope to develop or integrate tools that advance the application of ML in breeding. Our aspiration is for BreedingAIDB to serve as a bridge connecting researchers across different fields, facilitating the integration of machine learning into breeding technology for enhanced agricultural production.

Crops in BreedingAIDB

Rice

129,449 rice genome-to-phenotype data pairs are available here.

Soybean

284,395 soybean genome-to-phenotype data pairs are available here.

Maize

16,284 maize genome-to-phenotype data pairs are available here.

Types of G2P Data

GVCF-to-phenotype paired data

  • Unstructured data
  • GVCF files cover all loci in the genome

VCF-to-phenotype paired data

  • Unstructured data
  • VCF files contain high-quality SNP information
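Because VCF genotypes are unstructured text, they usually need to be encoded numerically before they can be paired with phenotypes for ML. The following is a minimal sketch, not BreedingAIDB's own pipeline: it assumes diploid, tab-separated VCF records with a GT field and maps each call to an alternate-allele count (0, 1, or 2). The function names `genotype_to_dosage` and `parse_vcf_dosages` and the example records are hypothetical.

```python
def genotype_to_dosage(gt):
    """Map a GT string such as '0/1' or '1|1' to a count of non-reference alleles.

    Returns None when any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    # Any non-zero allele index is counted as "alternate" (multi-allelic sites
    # are therefore collapsed; this is a simplifying assumption).
    return sum(1 for a in alleles if a != "0")


def parse_vcf_dosages(lines):
    """Return (sample_names, {(chrom, pos): [dosage per sample]}) from VCF lines."""
    samples = []
    dosages = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            continue
        chrom, pos, fmt = fields[0], int(fields[1]), fields[8]
        gt_index = fmt.split(":").index("GT")
        dosages[(chrom, pos)] = [
            genotype_to_dosage(sample.split(":")[gt_index]) for sample in fields[9:]
        ]
    return samples, dosages


# Hypothetical records, built with explicit tabs so the example is copy-safe.
example_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "s1", "s2", "s3"]),
    "\t".join(["chr1", "1001", ".", "A", "G", "50", "PASS", ".",
               "GT:DP", "0/0:12", "0/1:9", "1/1:15"]),
    "\t".join(["chr1", "2050", ".", "C", "T", "43", "PASS", ".",
               "GT", "./.", "1|0", "0/0"]),
]

samples, dosages = parse_vcf_dosages(example_vcf)
print(samples)
print(dosages)
```

Each row of the resulting mapping can then be joined to a phenotype table by sample name. Missing calls are kept as `None` so that an imputation strategy can be chosen separately.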

GSCtool-to-phenotype paired data

  • Structured data
  • The features are generated by GSCtool

ML Tools

Feature Extraction

The module utilizes GSCtool to extract essential genomic features.

Phenotype Prediction

The module predicts rice grain length and width from uploaded feature files extracted by GSCtool.

ML Project

The module integrates Optuna and LightGBM to allow users to customize and optimize ML models for their specific research needs.