Workflow for seed phenotyping of USDA rice minicore population, including statistical analysis, and GWAS (Modified from Marrano and Moyers, 2022)

Uzezi93/Genetic-Basis-of-Seed-Trait-Variation-Rice

Latest commit

a8a81f0 · Feb 10, 2025


Genetic Basis of Rice Seed Traits for Adaptation

Germplasm collections hold diverse alleles shaped by environmental conditions at centers of origin. In Oryza sativa, seed traits impact productivity, consumer preferences, and ecological adaptation, yet their genetic basis—especially in relation to environmental variation—remains unclear. This study seeks to answer the following key questions:

  • Are there novel differences between seed trait phenotypes among O. sativa subpopulations?
  • Which genomic regions are associated with these phenotypes?
  • Are these genomic regions functionally and adaptively relevant?

Traditional seed phenotyping is labor-intensive and limited, making high-throughput alternatives essential. This study leverages PlantCV, a computer vision tool, for automated and detailed seed trait analysis, integrating it with GWAS and haplotype-environment association analysis—a largely unexplored combination. This approach enhances trait resolution, uncovering novel genetic candidates missed by conventional methods. By linking genetic variants, seed morphology, and environmental adaptation, this study aims to advance climate-smart breeding, optimizing rice varieties for resilience, productivity, and market preferences.

Fig 1. Oryza spp. Diversity


Results

Question 1: Are there novel differences between seed trait phenotypes among O. sativa subpopulations?

We generated twelve seed phenotypes with our phenotyping setup, including length, width, and area (Table 1). However, we classified differences observed between subpopulations for the following seed traits as novel differences:

  • Convex hull vertices
  • Convex hull area
  • Solidity
  • Longest path
  • Eccentricity

PC1 explained the largest proportion of the variance (51.7%), revealing an inverse relationship between Solidity, Seed Weight, and other traits. PC2 accounted for 27.26% of the variation (Fig. 2). Significant subpopulation differences were observed for the Solidity and Minor Axis seed phenotypes (Fig. 3). Specifically, we found a significant difference between the Aus and Indica subpopulations for Solidity, and between the Aus and Aromatic (Aro) subpopulations for the Minor Axis.

Fig 2. Principal Component Analysis of Seed Trait Variation

Fig 3: Seed Trait ANOVA:
A. ANOVA for Minor Axis Variation Across Rice Subpopulations.
B: ANOVA for Solidity Variation Across Rice Subpopulations.


Question 2: Which genomic regions are associated with these seed traits?

Our GWAS analysis revealed significant SNPs on chromosomes 3, 7, 8, and 9. For downstream candidate gene identification, we focused on the LD regions on chromosomes 3 and 7. Chromosome 3 exhibited the most significant association (p-value = 7 × 10⁻⁷), while the association on chromosome 7 was identified by multiple GWAS methods. We identified 457 genes within the 1 Mb LD region on chromosomes 3 and 7, specifically for the Solidity seed trait (Fig. 4). From this set, we selected 100 genes within the LD of significant SNPs on these chromosomes and computed knetscores using KnetMiner (Fig. 5). Knetscores rank nodes in a biological network based on their relevance to a specific context.

Fig 4: GWAS for Solidity Seed Trait:
A. Manhattan plot showing significant associations on chromosomes 3, 7, 8, and 9.
B: Solidity phenotype (Solidity = Seed Area / Convex Hull Area; a ratio of 1 indicates rounder seeds and a ratio below 1 indicates more slender seeds).
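The solidity ratio in Fig. 4B can be illustrated with a minimal, standard-library sketch. It works on a polygon outline rather than on pixel masks, so it is not the PlantCV implementation used in this workflow. The function names and the example outlines are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon from ordered (x, y) vertices."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def solidity(outline):
    """Outline area divided by the area of its convex hull (always <= 1)."""
    return polygon_area(outline) / polygon_area(convex_hull(outline))


# Hypothetical outlines: a convex rectangle and a concave L-shape.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
```

Any fully convex outline gives a solidity of exactly 1 in this sketch, and concavities in the outline push the ratio below 1.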


Question 3: Are these genomic regions likely of any functional and adaptive relevance?

On chromosome 3, OsMTN3, also known as SWEET12, had the highest knetscore of 53.6. OsMTN3 plays a crucial role in sucrose transport during early grain filling, and its disruption leads to defective grain filling and reduced seed size (Ma et al., 2017).

On chromosome 7, OsSIS7 had the highest knetscore of 98.9 (Fig. 5). OsSIS7 has been implicated in the regulation of seed and grain size in rice. Studies have shown that mutations in OsSIS7 can lead to alterations in seed size, which is a critical trait for yield improvement. The gene's role in seed development is linked to its influence on cell division and expansion processes, which are crucial for determining final seed size (Zhang et al., 2013).

Fig 5: Network map of OsMTN3 and OsSIS7 genes: Both genes were biologically relevant in molecular pathways involved in regulating seed and reproductive development, including size, weight, and pollen development.
A. OsMTN3.
B. OsSIS7.

Fig 6: Spatiotemporal expression of OsMTN3 and OsSIS7 genes.
A. OsMTN3 is highly expressed in young seeds (S2 and S3 stage).
B. OsSIS7 is highly expressed in young seeds (S1 and S2 stage).

We are currently exploring the relationship between the haplotype fitness of these identified genes and bioclimatic factors using multivariate analysis (Fig. 7). This will help infer how genotype-by-environment (GxE) interactions drive seed trait variation in O. sativa. Climate data for rice-growing regions will be retrieved from the BIOCLIM dataset, and we are using the 3K rice genome dataset for our haplotype analysis.

Fig 7. Methodology for Gene Haplotype x Environmental Interaction


Conclusion

Solidity reflects both seed-specific traits and broader aspects of plant reproduction and development. This study highlights the potential of computer vision as a powerful tool for identifying pleiotropic genes and unraveling complex genetic interactions.


Methods

Plant Material

We used the USDA Mini-Core (MC) collection, comprising 217 Oryza spp. accessions from five major subpopulations. These were selected from 1794 accessions across 114 countries and deposited in the Genetic Stock Oryza collection (GSOR) in 2007 (USDA ARS, 2023).

Seed Phenotyping

We scanned 201 O. sativa seed samples using an Epson V600 scanner with a standardized setup, including a black tray, ruler, and color standards. Each scan contained ~50 well-spaced seeds to minimize measurement errors. Images were named systematically for compatibility with PlantCV. Image processing was performed on a personal computer (Intel® Core™ i5-1035G1, 7.6 GB RAM), using a modified PlantCV Python pipeline to standardize RGB images, segment seeds, and measure 12 traits per seed. Processed data and scripts are available on GitHub.

Table 1. Summary of 12 traits analyzed (Gehan et al., 2017; Marrano & Moyers, 2022)

| Trait | Scale/Description | Interpretation |
| --- | --- | --- |
| Area | Pixels – Total number of pixels in a seed image | Seed size; seed area |
| Convex hull area | Pixels – Total number of pixels in a seed convex hull | Seed shape and size; size of seed containing convex boundary (bigger seeds will require bigger convex hull) |
| Convex hull vertices | Integer – Number of convex hull vertices | Seed shape: A convex hull with more vertices indicates seeds with unusual shapes. |
| Solidity | Ratio – Seed area/Convex hull area | Seed shape; ratio of the grain area to the convex hull drawn around it. Values below 1 indicate bigger convex hull areas and possibly more slender seeds. |
| Perimeter | Pixels – Total length of pixels around the seed image | Seed size and shape; seed outline |
| Width | Pixels – Total span of seed pixels along the x-axis | Seed size: the width of the scanned seed measured on the x-axis of the seed image. |
| Length | Pixels – Total span of seed pixels along the y-axis | Seed size: the length of the scanned seed measured on the y-axis of the seed image. |
| Longest path | Pixels – Total pixels along the longest path between convex hull vertices through the center of mass | Seed size and shape; measurement outlier values can indicate seeds with unusual shape. |
| Ellipse major axis | Pixels – Total pixel length along the major axis of the ellipse | Seed size: this is a proxy for seed length. |
| Ellipse minor axis | Pixels – Total pixel length along the minor axis of the ellipse | Seed size: this is a proxy for seed width. |
| Ellipse angle | Degrees – Angle of rotation of the ellipse major axis | This is the orientation of seeds on the scanner. |
| Ellipse eccentricity | 0-1 scale – Eccentricity of the bounding ellipse | Seed shape: this is an estimate of the degree of roundness of the seed objects. Values range from 0 (rounder seeds) to 1 (perfect ellipse or more slender seeds). |
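To make the eccentricity scale in Table 1 concrete, here is a small sketch using the standard ellipse definition, e = sqrt(1 - (minor/major)^2). We assume the pipeline's value follows this definition. The function name and the example axis lengths are hypothetical.

```python
import math


def ellipse_eccentricity(major_axis, minor_axis):
    """Eccentricity of an ellipse from its full axis lengths (same units)."""
    if major_axis <= 0 or minor_axis <= 0:
        raise ValueError("axis lengths must be positive")
    if minor_axis > major_axis:
        # Swap so the ratio stays within (0, 1].
        major_axis, minor_axis = minor_axis, major_axis
    return math.sqrt(1.0 - (minor_axis / major_axis) ** 2)


# Hypothetical fitted ellipses (axis lengths in pixels).
round_seed = ellipse_eccentricity(100, 100)    # circle-like outline
slender_seed = ellipse_eccentricity(100, 30)   # elongated outline
```

A circle-like outline yields 0, and the value approaches 1 as the minor axis shrinks relative to the major axis.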

Data Analysis

Question 1: Novel Differences in Seed Traits Among O. sativa Subpopulations

We identified novel differences in seed traits among O. sativa subpopulations, focusing on convex hull vertices, convex hull area, solidity, longest path, and eccentricity. Traits like perimeter, area, length, and width were excluded as they have been extensively studied.

  • All analyses were conducted in R v4.2.3.
  • Seed measurements were converted from pixels to millimeters for comparability.
  • Outlier filtering was applied to remove extreme values due to scanning errors.
  • Principal Component Analysis (PCA) was performed to understand trait variation.
  • ANOVA was used for pairwise comparisons of traits among subpopulations.
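The analyses in this repository were run in R. The Python sketch below only illustrates the arithmetic behind three of the steps above: unit conversion, outlier filtering, and a one-way ANOVA F statistic. The scan resolution (`dpi`), the Tukey 1.5 × IQR rule, and all function names are assumptions for illustration, not values or methods taken from the project scripts.

```python
import statistics

MM_PER_INCH = 25.4


def px_to_mm(length_px, dpi):
    """Convert a length in pixels to millimeters for a given scan resolution."""
    return length_px * MM_PER_INCH / dpi


def px2_to_mm2(area_px, dpi):
    """Convert an area in pixels to square millimeters (scale factor is squared)."""
    return area_px * (MM_PER_INCH / dpi) ** 2


def drop_outliers(values, k=1.5):
    """Keep values inside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (v - statistics.fmean(g)) ** 2 for g in groups for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical trait values; 100 mimics a scanning artefact.
raw = [10, 11, 12, 13, 14, 15, 16, 100]
clean = drop_outliers(raw)

# Hypothetical per-subpopulation measurements.
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The F statistic alone does not say which subpopulation pairs differ. Pairwise statements such as Aus versus Indica require a post hoc comparison on top of the ANOVA, which this sketch does not cover.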

Question 2: Genomic Regions Associated with Seed Traits

Genome-Wide Association Study (GWAS)

  • Conducted GWAS using 40,866 high-density SNP markers.
  • Used MLM, MLMM, and FarmCPU models in R GAPIT to identify significant SNPs (FDR ≤ 0.05).
  • Four SNPs on chromosomes 3, 7, 8, and 9 were associated with seed traits.
  • Linkage Disequilibrium (LD) decay analysis was performed using PLINK 2.0 to determine mapping resolution.
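The FDR ≤ 0.05 threshold above can be illustrated with the Benjamini-Hochberg adjustment. This is a minimal sketch that assumes the FDR control follows that procedure. It is not GAPIT's code, and the p-values shown are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_markers(pvalues, fdr=0.05):
    """Indices of markers whose adjusted p-value is at or below the FDR level."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


# Hypothetical raw p-values for four markers.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

With tens of thousands of markers, the adjustment is far stricter than in this four-marker toy case, since each raw p-value is scaled by the total number of tests divided by its rank.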

Question 3: Functional and Adaptive Relevance of Genomic Regions

Candidate Gene Identification

  • Genes within a 1 Mb LD window surrounding significant SNPs were identified using the MSUv7 Nipponbare genome.
  • Spatiotemporal expression data from the Rice ePlant database was used to infer functional relevance.
  • Conducted haplotype analysis on seed trait genes using the 3K Rice Genome dataset.
  • SNP data, including haplotypes, were retrieved from SNP-Seek.
  • R pegas package was used to construct haplotype networks.
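The window-based candidate gene step above can be sketched as a simple interval overlap query. The sketch assumes the 1 Mb window is centered on the SNP (500 kb on each side), which the text does not specify. The gene identifiers, coordinates, and function name are hypothetical placeholders, not MSUv7 annotations.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_window(genes: List[Gene], chrom: str, snp_pos: int,
                    window_bp: int = 1_000_000) -> List[Gene]:
    """Genes on `chrom` overlapping a window of `window_bp` centered on the SNP."""
    half = window_bp // 2
    low = max(1, snp_pos - half)
    high = snp_pos + half
    return [
        g for g in genes
        if g.chrom == chrom and g.end >= low and g.start <= high
    ]


# Hypothetical annotation records and SNP position.
annotation = [
    Gene("GENE_A", "chr3", 1_600_000, 1_605_000),  # inside the window
    Gene("GENE_B", "chr3", 2_499_000, 2_503_000),  # overlaps the window edge
    Gene("GENE_C", "chr3", 3_000_000, 3_004_000),  # outside the window
    Gene("GENE_D", "chr7", 2_000_000, 2_002_000),  # different chromosome
]
candidates = genes_in_window(annotation, "chr3", snp_pos=2_000_000)
candidate_ids = [g.gene_id for g in candidates]
```

Genes that only partly overlap the window are kept here. Requiring full containment would be an equally defensible choice and would shrink the candidate list slightly.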

Fig 7. Method workflow

Phenotyping setup

Follow-Up Analysis

  • Investigate how haplotypes interact with climatic and agroecological variables to shape seed traits.
  • Climate data will be retrieved from BIOCLIM, using landrace locations from the Genesys database.
  • Multivariate analysis will explore haplotype-by-environment interactions driving seed trait variation in O. sativa.
