Germplasm collections hold diverse alleles shaped by environmental conditions at centers of origin. In Oryza sativa, seed traits impact productivity, consumer preferences, and ecological adaptation, yet their genetic basis—especially in relation to environmental variation—remains unclear. This study seeks to answer the following key questions:
- Are there novel differences between seed trait phenotypes among O. sativa subpopulations?
- Which genomic regions are associated with these phenotypes?
- Are these genomic regions functionally and adaptively relevant?
Traditional seed phenotyping is labor-intensive and limited, making high-throughput alternatives essential. This study leverages PlantCV, a computer vision tool, for automated and detailed seed trait analysis, integrating it with GWAS and haplotype-environment association analysis—a largely unexplored combination. This approach enhances trait resolution, uncovering novel genetic candidates missed by conventional methods. By linking genetic variants, seed morphology, and environmental adaptation, this study aims to advance climate-smart breeding, optimizing rice varieties for resilience, productivity, and market preferences.
Fig 1. Oryza spp. Diversity
Question 1: Are there novel differences between seed trait phenotypes among O. sativa subpopulations?
We generated twelve seed phenotypes with our phenotyping setup, including length, width, and area (Table 1). Of these, we classified the subpopulation differences observed for the following seed traits as novel:
- Convex hull vertices
- Convex hull area
- Solidity
- Longest path
- Eccentricity
PC1 explained the largest proportion of the variance (51.7%), revealing an inverse relationship between Solidity, Seed Weight, and the other traits. PC2 accounted for 27.26% of the variation (Fig. 2). Significant subpopulation differences were observed for the Solidity and Minor Axis seed phenotypes (Fig. 3). Specifically, we found a significant difference between the Aus and Indica subpopulations for Solidity, and between the Aus and Aro subpopulations for Minor Axis.
Fig 2. Principal Component Analysis of Seed Trait Variation
Fig 3. Seed Trait ANOVA
A. ANOVA for Minor Axis Variation Across Rice Subpopulations.
B. ANOVA for Solidity Variation Across Rice Subpopulations.
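The subpopulation comparisons in Fig. 3 rest on the one-way ANOVA F statistic, which can be sketched as below. This is an illustration only: the study ran its analyses in R, the function name `one_way_anova_f` is hypothetical, and the three groups are made-up numbers rather than measured Solidity or Minor Axis values. Pairwise contrasts between subpopulations would need a post hoc test on top of this, which is not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Each element of `groups` holds the trait values of one subpopulation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical trait values for three groups (not data from this study).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F distribution with the reported degrees of freedom indicates that at least one group mean differs from the others.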
Our GWAS analysis revealed significant SNPs on chromosomes 3, 7, 8, and 9. For downstream candidate gene identification, we focused on the LD regions on chromosomes 3 and 7. Chromosome 3 exhibited the most significant association (p-value = 7 × 10⁻⁷), while the association on chromosome 7 was identified by multiple GWAS methods. We identified 457 genes within the 1 Mb LD regions on chromosomes 3 and 7 for the Solidity seed trait (Fig. 4). From this set, we selected 100 genes within the LD of significant SNPs on these chromosomes and computed knetscores using KnetMiner (Fig. 5). Knetscores rank nodes in a biological network based on their relevance to a specific context.
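Selecting candidate genes from a window around a significant SNP can be sketched as a simple interval-overlap filter. This is a minimal illustration, not the pipeline used in the study. The gene identifiers, coordinates, and SNP position are hypothetical, and centering the 1 Mb window on the SNP (500 kb on each side) is an assumption, since the source does not say how the window was placed.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str  # hypothetical identifier, for illustration only
    chrom: int
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


def genes_in_window(genes, chrom, snp_pos, flank_bp=500_000):
    """Return genes on `chrom` that overlap [snp_pos - flank_bp, snp_pos + flank_bp]."""
    lo, hi = snp_pos - flank_bp, snp_pos + flank_bp
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records and SNP position (not from the MSUv7 annotation).
annotation = [
    Gene("geneA", 3, 1_000_000, 1_004_000),
    Gene("geneB", 3, 1_450_000, 1_452_500),
    Gene("geneC", 3, 2_600_000, 2_603_000),
    Gene("geneD", 7, 1_200_000, 1_201_000),
]
hits = genes_in_window(annotation, chrom=3, snp_pos=1_300_000)
print([g.gene_id for g in hits])
```

Here only the two chromosome 3 genes that overlap the 800 kb to 1.8 Mb interval are retained. The resulting list is the kind of input that would then be passed to a ranking tool such as KnetMiner.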
Fig 4. GWAS for Solidity Seed Trait
A. Manhattan plot showing significant associations on chromosomes 3, 7, 8, and 9.
B. Solidity phenotype (Solidity = Seed Area / Convex Hull Area; ratios = 1 indicate rounder seeds and ratios < 1 indicate more slender seeds).
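The Solidity ratio defined in the caption can be worked through on a toy outline. The sketch below is not PlantCV's implementation, which operates on pixel masks. It assumes the seed outline is available as an ordered list of (x, y) vertices and uses the shoelace formula with Andrew's monotone chain hull. All function names and the example polygon are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def solidity(outline):
    """Outline area divided by the area of its convex hull."""
    return polygon_area(outline) / polygon_area(convex_hull(outline))


# Hypothetical outlines: a square, and the same square with a notch cut into one side.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
print(solidity(square), solidity(notched))
```

The convex square gives a ratio of 1, while the notched outline fills only three quarters of its hull, so the ratio falls below 1 whenever the outline departs from its convex boundary.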
On chromosome 3, OsMTN3, also known as SWEET12, had the highest knetscore of 53.6. OsMTN3 plays a crucial role in sucrose transport during early grain filling, and its disruption leads to defective grain filling and reduced seed size (Ma et al., 2017).
On chromosome 7, OsSIS7 had the highest knetscore of 98.9 (Fig. 5). OsSIS7 has been implicated in the regulation of seed and grain size in rice. Studies have shown that mutations in OsSIS7 can lead to alterations in seed size, which is a critical trait for yield improvement. The gene's role in seed development is linked to its influence on cell division and expansion processes, which are crucial for determining final seed size (Zhang et al., 2013).
Fig 5. Network map of the OsMTN3 and OsSIS7 genes. Both genes are linked to molecular pathways that regulate seed and reproductive development, including size, weight, and pollen development.
A. OsMTN3.
B. OsSIS7.
Fig 6. Spatiotemporal expression of the OsMTN3 and OsSIS7 genes
A. OsMTN3 is highly expressed in young seeds (S2 and S3 stages).
B. OsSIS7 is highly expressed in young seeds (S1 and S2 stages).
We are currently exploring the relationship between the haplotype fitness of these identified genes and bioclimatic factors using multivariate analysis (Fig. 7). This will help infer how genotype-by-environment (GxE) interactions drive seed trait variation in O. sativa. Climate data for rice-growing regions will be retrieved from the BIOCLIM dataset, and we are using the 3K rice genome dataset for our haplotype analysis.
Fig 7. Methodology for Gene Haplotype x Environment Interaction
Solidity reflects both seed-specific traits and broader aspects of plant reproduction and development. This study highlights the potential of computer vision as a powerful tool for identifying pleiotropic genes and unraveling complex genetic interactions.
We used the USDA Mini-Core (MC) collection, comprising 217 Oryza spp. accessions from five major subpopulations. These were selected from 1794 accessions across 114 countries and deposited in the Genetic Stock Oryza collection (GSOR) in 2007 (USDA ARS, 2023).
We scanned 201 O. sativa seed samples using an Epson V600 scanner with a standardized setup, including a black tray, ruler, and color standards. Each scan contained ~50 well-spaced seeds to minimize measurement errors. Images were named systematically for compatibility with PlantCV. Image processing was performed on a personal computer (Intel® Core™ i5-1035G1, 7.6 GB RAM), using a modified PlantCV Python pipeline to standardize RGB images, segment seeds, and measure 12 traits per seed. Processed data and scripts are available on GitHub.
Table 1. Summary of 12 traits analyzed (Gehan et al., 2017; Marrano & Moyers, 2022)
Trait | Scale/Description | Interpretation |
---|---|---|
Area | Pixels – Total number of pixels in a seed image | Seed size; seed area |
Convex hull area | Pixels – Total number of pixels in a seed convex hull | Seed shape and size; size of seed containing convex boundary (bigger seeds will require bigger convex hull) |
Convex hull vertices | Integer – Number of convex hull vertices | Seed shape: A convex hull with more vertices indicates seeds with unusual shapes. |
Solidity | Ratio – Seed area/Convex hull area | Seed shape; ratio of the grain area to the convex hull drawn around it. Values below 1 indicate bigger convex hull areas and possibly more slender seeds. |
Perimeter | Pixels – Total length of pixels around the seed image | Seed size and shape; seed outline |
Width | Pixels – Total span of seed pixels along the x-axis | Seed size: the width of the scanned seed measured on the x-axis of the seed image. |
Length | Pixels – Total span of seed pixels along the y-axis | Seed size: the length of the scanned seed measured on the y-axis of the seed image. |
Longest path | Pixels – Total pixels along the longest path between convex hull vertices through the center of mass | Seed size and shape; outlier values can indicate seeds with unusual shapes. |
Ellipse major axis | Pixels – Total pixel length along the major axis of the ellipse | Seed size: this is a proxy for seed length. |
Ellipse minor axis | Pixels – Total pixel length along the minor axis of the ellipse | Seed size: this is a proxy for seed width. |
Ellipse angle | Degrees – Angle of rotation of the ellipse major axis | This is the orientation of seeds on the scanner. |
Ellipse eccentricity | 0-1 scale – Eccentricity of the bounding ellipse | Seed shape: this is an estimate of how far the seed outline departs from a circle. Values range from 0 (circular, rounder seeds) toward 1 (increasingly elongated, more slender seeds). |
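Most traits in Table 1 are reported in pixels, so comparing them across scans requires a physical scale. A minimal sketch of that conversion follows, assuming the scale comes from the scan resolution. The value of 600 dpi is an assumption, since the source does not state the resolution, and a scale could equally be derived from the ruler included in each scan. The function names are hypothetical.

```python
MM_PER_INCH = 25.4


def px_to_mm(length_px, dpi):
    """Convert a linear measurement (length, width, perimeter, axes) to millimeters."""
    return length_px * MM_PER_INCH / dpi


def px2_to_mm2(area_px, dpi):
    """Convert an area measurement (area, convex hull area) to square millimeters."""
    return area_px * (MM_PER_INCH / dpi) ** 2


ASSUMED_DPI = 600  # assumption: not stated in the source

# Hypothetical pixel measurements for one seed.
length_mm = px_to_mm(212, ASSUMED_DPI)
area_mm2 = px2_to_mm2(11_500, ASSUMED_DPI)
print(f"length = {length_mm:.2f} mm, area = {area_mm2:.2f} mm^2")
```

Linear traits scale with the conversion factor and area traits with its square. Solidity, eccentricity, vertex counts, and the ellipse angle are unitless or angular and need no conversion.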
We identified novel differences in seed traits among O. sativa subpopulations, focusing on convex hull vertices, convex hull area, solidity, longest path, and eccentricity. Traits like perimeter, area, length, and width were excluded as they have been extensively studied.
- All analyses were conducted in R v4.2.3.
- Seed measurements were converted from pixels to millimeters for comparability.
- Outlier filtering was applied to remove extreme values due to scanning errors.
- Principal Component Analysis (PCA) was performed to understand trait variation.
- ANOVA was used for pairwise comparisons of traits among subpopulations.
- GWAS was conducted using 40,866 high-density SNP markers.
- MLM, MLMM, and FarmCPU models in the R package GAPIT were used to identify significant SNPs (FDR ≤ 0.05).
- Four SNPs on chromosomes 3, 7, 8, and 9 were associated with seed traits.
- Linkage Disequilibrium (LD) decay analysis was performed using PLINK 2.0 to determine mapping resolution.
- Genes within a 1 Mb LD window surrounding significant SNPs were identified using the MSUv7 Nipponbare genome.
- Spatiotemporal expression data from the Rice ePlant database was used to infer functional relevance.
- Haplotype analysis was conducted on seed trait genes using the 3K Rice Genome dataset.
- SNP data, with haplotypes, were retrieved from SNP-Seek.
- The R package pegas was used to construct haplotype networks.
- Ongoing work investigates how haplotypes interact with climatic and agroecological variables to shape seed traits.
- Climate data will be retrieved from BIOCLIM, using landrace locations from the Genesys database.
- Multivariate analysis will explore haplotype-by-environment interactions driving seed trait variation in O. sativa.
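The FDR ≤ 0.05 threshold used for SNP significance in the steps above can be illustrated with a Benjamini-Hochberg adjustment. This sketch assumes the FDR values follow the Benjamini-Hochberg procedure, which the source does not state explicitly. The p-values are hypothetical, and the function is a plain re-implementation for illustration, not GAPIT code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-SNP p-values (not results from this study).
snp_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
snp_fdr = benjamini_hochberg(snp_pvalues)
significant = [p for p, q in zip(snp_pvalues, snp_fdr) if q <= 0.05]
print(significant)
```

In this toy example the two smallest p-values pass the threshold, while 0.039 and 0.041 do not, even though both are below 0.05 before adjustment.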
- Marrano, A., & Moyers, B. (2022). Advances in plant phenomics: From machine learning to genome-wide association studies. Plant Phenome Journal, 5(1), e20033.
- Ma, L., Zhang, D., Miao, Q., Yang, J., Xuan, Y., & Hu, Y. (2017). Essential role of sugar transporter OsMTN3 in rice seed development. Plant Physiology, 173(2), 1334-1347.
- Zhang, X., Wang, J., Huang, J., Lan, H., Wang, C., Yin, C., Wu, Y., & Tang, J. (2013). OsSIS7, a seed size regulatory gene, influences grain yield and adaptability in rice. Journal of Experimental Botany, 64(18), 5705-5718.
- The 3K Rice Genomes Project. (2014). The 3,000 rice genomes dataset. GigaScience, 3, 7.
- KnetMiner. (2023). Knowledge network-based data mining for functional genomics.
- USDA ARS. (2023). Germplasm Resources Information Network (GRIN).
- University of Toronto. (2023). Rice ePlant: A visualization tool for rice gene expression.
- SNP-Seek Database. (2023). A genomic variation database for rice.