Family-based integrative genomics identifies neural–metabolic–skeletal pathways underlying scoliosis susceptibility

Show Outline

Summary

Scoliosis is a complex spinal deformity with strong familial aggregation, yet its genetic basis remains incompletely defined. Here, we performed whole-exome sequencing (WES) in 34 scoliosis families, identifying candidate single nucleotide polymorphisms (SNPs) potentially contributing to disease susceptibility. To connect genomic variation with transcriptional regulation, we integrated these data with cis-expression quantitative trait loci (cis-eQTL) summary statistics from eQTLGen and scoliosis GWAS data from FinnGen R12 using summary-based Mendelian randomization (SMR). After harmonization and filtering (PSMR < 0.05, PHEIDI > 0.05), we identified 31 shared genes overlapping between family-based SNPs and eQTL-associated loci. Among them, LINC00299 and NPIPB11 exhibited strong colocalization signals, with LINC00299 expression inversely correlated with scoliosis risk. Functional annotation further highlighted NCAM1, AGPAT3, SMIM12, and CTSH as core pathogenic nodes linking neural development, lipid metabolism, mitochondrial energy regulation, and bone remodeling. Together, these findings reveal a multi-systemic genetic architecture of scoliosis and nominate metabolic-neuro-skeletal pathways as potential targets for precision diagnosis and therapy.

Keywords

Family-based integrative genomics,Scoliosis,Susceptibility

Introduction

Scoliosis represents a common structural deformity of the spine, affecting up to 3% of the global population, yet its etiology remains largely elusive
1,
2,
3
. Familial aggregation and twin studies strongly support a genetic component, but known risk loci explain only a fraction of disease heritability. Previous genome-wide association studies (GWAS) have identified several susceptibility regions; however, pinpointing the causal variants and their regulatory mechanisms has been hindered by the complexity of spinal development and by the polygenic nature of the disorder
4
.

To better define the genetic determinants underlying scoliosis, family-based sequencing approaches offer a complementary strategy that reduces population heterogeneity and enhances power to detect rare, high-impact variants. In parallel, advances in expression quantitative trait loci (eQTL) resources and summary-based Mendelian randomization (SMR) analysis enable systematic integration of transcriptional regulation with GWAS signals to infer putative causal relationships between gene expression and disease phenotypes.

In this study, we combined whole-exome sequencing (WES) from 34 scoliosis families with cis-eQTL data from eQTLGen and scoliosis GWAS data from FinnGen R12
5,
6,
7
. We aimed to identify genes whose regulatory variants exert causal effects on scoliosis susceptibility, characterize their biological functions, and delineate the molecular pathways connecting neural, muscular, metabolic, and skeletal systems in spinal morphogenesis. Our integrative analysis highlights novel pathogenic genes and provides a mechanistic framework for understanding the multi-system basis of scoliosis.

Methods

WES Data from Scoliosis Families

The WES data of scoliosis families included in this study were obtained from 34 scoliosis families previously collected by our research group. The study protocol was approved by the ethics committee (approval number: 20181106003-GZ2021). After obtaining informed consent, peripheral blood samples were collected, and genomic DNA was extracted. WES was performed for sequencing, and following quality control and variant filtering, candidate SNP data were generated for subsequent intersection analysis with the SMR results.Detailed sequencing data, sampling details, and systematic sample information for the 34 families are provided in Supplementary Table 1.

cis-eQTL Data

The cis-eQTL summary statistics used in this study were obtained from the eQTLGen Consortium, which provides standardized cis-eQTL summary files suitable for SMR analysis via its official website. This dataset integrates whole-blood samples from 31,684 individuals of European ancestry across 37 independent studies, offering comprehensive and high-quality cis-eQTL data
8
. The data are available at:https://www.eqtlgen.org/cis-eqtls.html.

GWAS Data

The scoliosis GWAS summary statistics were obtained from the FinnGen R12 database, which integrates genomic and health data from 500,000 Finnish biobank participants. FinnGen is a large-scale genomics initiative that has analyzed over 500,000 Finnish biobank samples and linked genetic variants to health data to understand disease mechanisms and susceptibility
9
. Case-control summary statistics for scoliosis were retrieved using the keyword “scoliosis”. The data are available at: (1) FinnGen R12 summary statistics: https://www.finngen.fi; (2) Scoliosis data:https://storage.googleapis.com/finngen-public-data-r12/summary_stats/ release/finngen_R12_M13_SCOLIOSIS.gz.

Data Preprocessing

The raw summary statistics from the FinnGen R12 scoliosis GWAS and the eQTLGen cis-eQTL data were subjected to standardized preprocessing and format harmonization. For the GWAS data, core fields including SNP identifier, effect allele, reference allele, allele frequency, effect size, standard error, and P-value were uniformly extracted, and field naming conventions were standardized. Duplicate and blank SNP records were removed to ensure unique variant identification. Total sample size information was added, and the data were converted into the outcome data format required for SMR analysis. Subsequently, the cis-eQTL and GWAS data underwent allele orientation alignment, genomic coordinate matching, and gene symbol standardization to eliminate format discrepancies and naming inconsistencies across different data sources, ensuring complete matching of variant information between the two datasets. Finally, standardized input files directly applicable to SMR colocalization analysis were generated.

SMR Analysis

Based on the principle of Mendelian Randomization (MR), this study used SMR (Windows 1.3.1) software to perform colocalization and causal inference analysis between gene expression and scoliosis. By integrating cis-eQTL and GWAS summary data, this method overcomes the limitation of traditional GWAS that can only identify genetic associations but cannot pinpoint functional causal genes. Single nucleotide polymorphisms (SNPs) significantly associated with gene expression in the cis-eQTL data were used as instrumental variables, with gene expression level (cis-eQTL) as the exposure and scoliosis (GWAS) as the outcome. The effect estimates of the same variants from the GWAS summary data were integrated to quantitatively infer the potential causal effect of gene expression on the risk of scoliosis.

The significance threshold was set at PSMR< 0.05. Since two different variants located in linkage disequilibrium (LD) may show significant mutual associations, relying solely on the significance of SMR results cannot directly distinguish true causal associations from false positives due to LD, i.e., it is difficult to exclude pleiotropic effects
10
. To effectively differentiate LD effects from true pleiotropic causal effects, Zhu et al.
10
proposed the HEIDI test based on instrumental variables. In this study,PHEIDI > 0.05 was used as a filtering criterion to remove false positive associations caused by LD interference. Ultimately, significantly associated genes passing both filters were obtained, along with their association information, generating the SMR.filter.csv dataset, which serves as a core candidate set for subsequent functional enrichment and mechanistic analysis.

Family-Based Gene Intersection Analysis

To integrate the results from family-based WES and public omics data, and to improve the reliability and specificity of candidate genes, an intersection analysis was performed between the two independent gene sets. The significant genes identified by SMR analysis were intersected with those carrying key SNPs from the WES analysis of scoliosis families. Venn diagrams were plotted using R software (version 4.4.2) with the ggvenn package
11
to visualize the overlapping genes. Finally, 31 common genes present in both datasets were identified as core candidates for subsequent functional annotation and mechanistic exploration of scoliosis pathogenesis.

Data Visualization Analysis

All genome-wide data visualization analyses were performed using R software (version 4.4.2). The dplyr and tidyr packages
12,
13
were used for data manipulation, and the CMplot package
14
was used to generate Manhattan plots. A Manhattan plot was drawn with chromosomes on the x-axis and -log10 P-value on the y-axis for the 31 common genes, providing a visual representation of the genome-wide distribution of scoliosis-associated SNPs across human chromosomes. The Manhattan plot clearly shows the chromosomal localization of significant association loci and preliminarily reveals the distribution pattern of scoliosis susceptibility regions, offering a visual basis for subsequent core gene screening and functional interpretation.

To further validate the colocalization association between gene expression and scoliosis risk, scatter plots of eQTL effect size versus GWAS effect size were generated for each of the 31 common genes using the magick and TeachingDemos packages
15,
16
in R 4.4.2. In these plots, the x-axis represents the eQTL effect size of the variant on gene expression, and the y-axis represents the GWAS effect size of the variant on scoliosis risk, with the top cis-eQTL and common cis-eQTL sites annotated for each gene. These scatter plots effectively illustrate the correlation trend between eQTL effects and GWAS effects, assist in evaluating the strength of the causal association between gene expression and disease phenotype, and provide visual support for the colocalization validation of core candidate genes.

Functional Enrichment

The Gene Ontology (GO) database provides a standardized annotation system for gene functions and serves as one of the core bioinformatics tools for interpreting gene expression data. GO enrichment analysis systematically annotates gene functions from three dimensions: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC), effectively identifying statistically enriched functional categories and biological characteristics under disease conditions
17
. In this study, GO enrichment analysis was performed using the clusterProfiler package
18
in R software (version 4.4.2), and bubble plots were generated. A P-value < 0.05 was considered statistically significant for enrichment results
19
.

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a comprehensive molecular function database constructed by the Bioinformatics Center of Kyoto University, covering 17 sub-databases and integrating information on cellular processes, genetic information, human diseases, and more
20,
21
. Enrichment analysis based on KEGG data is a common approach to identify biological pathways and functional modules involved in target genes. In this study, KEGG enrichment analysis was performed using the clusterProfiler package in R 4.4.2, and bubble plots were visualized. A P-value < 0.05 was considered statistically significant for KEGG enrichment results
19
.

Results

Family-based analysis of scoliosis cohorts

Based on the integrated analysis of WES data from 34 scoliosis families and SMR results, a total of 31 common genes present in both datasets were identified. These genes passed stringent filtering criteria (PSMR < 0.05 and PHEIDI > 0.05), indicating that they carry key genetic variants at the family level while also showing causal associations with scoliosis risk at the population level. The PSMR and PHEIDI values for these 31 common genes are detailed in Table 1.

Table 1. SMR and HEIDI test P-values for the 31 common genes associated with scoliosis
Correlation with loci Gene PSMR PHEIDI
Positive NCAM1 0.000922659 0.07680346
Positive AGPAT3 0.004800337 0.1413867
Positive SMIM12 0.00501161 0.05906706
Negative CTSH 0.00691361 0.8436791
Negative/td> LINC00299 0.01113481 0.06945593
Negative NPIPB11 0.01167407 0.2094704
Positive NAPRT 0.01293904 0.2441823
Negative HSPBP1 0.0159643 0.9743946
Negative TMEM132D 0.01659424 0.8338818
Negative DEAF1 0.01783776 0.5167472
Positive TSHR 0.01815261 0.9751504
Positive ALOX12-AS1 0.0189947 0.4511633
Negative L3MBTL4 0.02060692 0.5297665
Positive PRCP 0.02117731 0.6506786
Positive EIF3E 0.02500594 0.5799573
Positive KCNN3 0.02503178 0.8401287
Negative CRYGN 0.02639662 0.9189739
Positive CAMK2D 0.02760503 0.2768263
Negative NMNAT3 0.02938423 0.3255898
Negative NUP214 0.03011871 0.6775703
Not determined DLG2 0.03200031 0.3766503
Positive NMT1 0.03203178 0.1570335
Negative HEG1 0.03310556 0.1746253
Negative GOLGA8O 0.03618912 0.2041633
Positive SYNJ2BP 0.03664463 0.420418
Positive VDR 0.03686418 0.5702664
Negative GRB10 0.03821055 0.2389569
Negative ACOT7 0.03850855 0.6677867
Negative DOCK7 0.04185344 0.8608807
Negative LDHAL6A 0.04815689 0.6968079

Colocalization validation at the LINC00299 locus

The Manhattan plot illustrates the distribution of the 31 common genes across all chromosomes (Figure 1A), revealing that multiple genomic loci are significantly associated with scoliosis susceptibility. Colocalization analysis indicated that the long non-coding RNA LINC00299 exhibits a significant cis-eQTL effect in the context of scoliosis genetic susceptibility.

The single‑gene scatter plot for LINC00299 (Figure 1B) showed that most data points fall along a negative correlation trend line, indicating that alleles increasing its expression tend to decrease the risk of scoliosis. That is, higher expression of LINC00299 is associated with lower disease susceptibility, suggesting that down-regulation of this gene may promote abnormal spinal development. The top cis-eQTL for this locus (marked by a red triangle) also perfectly aligns with the overall fitted trend. Statistical analysis showed a significant SMR causal association test (PSMR = 0.01113481, P < 0.05) and a HEIDI heterogeneity test (PHEIDI = 0.06945593, P > 0.05) indicating no horizontal pleiotropy, thus excluding false colocalization bias.

Further analysis using the regional colocalization plot (Figure 1C) revealed that within the 8.5-9.0 Mb region on chromosome 2, the significant association peak from the scoliosis GWAS and the strong cis-eQTL association peak for LINC00299 are highly overlapping. The two genetic signals are consistently distributed across the genomic interval, exhibiting clear colocalization characteristics.

eQTL-GWAS effect characteristics of core candidate genes

Through integrative colocalization analysis of scoliosis GWAS signals and cis-eQTL signals, this study identified four candidate genes with potential pathogenic relevance:NCAM1, AGPAT3, SMIM12, and CTSH.

For NCAM1(Figure 2A), AGPAT3(Figure 2B), and SMIM12(Figure 2C), the cis-eQTL effect sizes showed a stable positive correlation with the GWAS disease effect sizes, indicating that the direction of genetic variant effects on gene expression was consistent with the direction of variant effects on scoliosis risk. In contrast, CTSH(Figure 2D) exhibited a clear negative correlation between eQTL effect sizes and GWAS effect sizes. Moreover, the top cis-eQTL (marked by red triangles) for each gene fell exactly on the overall fitted trend line, suggesting that altered expression levels of these four genes provide reliable genetic evidence for an association with the risk of scoliosis.

Functional annotation of key genes associated with scoliosis

Functional enrichment analysis was conducted on the 31 consensus genes associated with scoliosis. GO analysis identified significant enrichment across three standard functional dimensions(Figure 3A-B).

In the BP category, these genes were mainly clustered in pathways related to energy coenzyme biosynthesis and neural signal transduction. The most significantly enriched terms included NAD biosynthetic process, nicotinamide nucleotide biosynthetic process, pyridine nucleotide biosynthetic process, pyridine-containing compound biosynthetic process, Rho protein signal transduction, and regulation of cold-induced thermogenesis.

For the CC category, prominent enrichment was found in synaptic complexes, neuronal structures and specialized cell membrane domains, such as basal part of cell, postsynaptic density, asymmetric synapse, juxtaparanode region of axon, postsynaptic specialization, and neuron-to-neuron synapse.

In the MF group, enriched functions were primarily involved in catalytic activity, ion channel regulation and molecular binding. Key significant terms covered exopeptidase activity, dipeptidyl-peptidase activity, bile acid binding, myristoyltransferase activity, nuclear export signal receptor activity, and calcium-activated potassium channel activity. Full detailed statistics for all GO enrichment results are listed in Table 2.

Further KEGG pathway enrichment analysis revealed that these genes were significantly enriched in nicotinate and nicotinamide metabolism, insulin secretion, glucagon signaling pathway, Hypoxia-Inducible Factor-1 (HIF-1) signaling pathway, cofactor biosynthesis, and the renin-angiotensin system (Figure 3C).

Table 2. GO enrichment analysis results of the 31 common genes associated with scoliosis
ID Description Pvalue Enrichment factor
BP GO:0009435 NAD biosynthetic process 5.7e-04 56.17
BP GO:0019359 nicotinamide nucleotide biosynthetic process 6.7e-04 52.16
BP GO:0019363 pyridine nucleotide biosynthetic process 6.7e-04 52.16
BP GO:0072525 pyridine-containing compound biosynthetic process 8.2e-04 47.11
BP GO:0007266 Rho protein signal transduction 8.8e-04 15.76
BP GO:0120161 regulation of cold-induced thermogenesis 1.1e-03 14.51
CC GO:0045178 basal part of cell 9.5e-04 9.00
CC GO:0014069 postsynaptic density 1.4e-02 5.96
CC GO:0032279 asymmetric synapse 1.5e-02 5.70
CC GO:0044224 juxtaparanode region of axon 1.7e-02 59.40
CC GO:0099572 postsynaptic specialization 1.8e-02 5.40
CC GO:0098984 neuron to neuron synapse 2.0e-02 5.19
MF GO:0008238 exopeptidase activity 1.0e-02 12.99
MF GO:0008239 dipeptidyl-peptidase activity 1.5e-02 66.92
MF GO:0032052 bile acid binding 1.5e-02 66.92
MF GO:0019107 myristoyltransferase activity 1.6e-02 60.83
MF GO:0005049 nuclear export signal receptor activity 1.8e-02 55.76
MF GO:0015269 calcium-activated potassium channel activity 1.8e-02 55.76

Discussion

By integrating multi-omics data from family-based WES, cis-eQTL, and GWAS, combined with SMR-HEIDI colocalization analysis and family-based variant validation, this study systematically dissected the genetic susceptibility mechanisms underlying scoliosis. We identified 31 common genes that link family-derived rare variants with population-level expression regulation signals, constructing a convergent genetic map connecting transcriptional regulation to scoliosis susceptibility
22,
23,
24
. Among these, the non-coding RNA LINC00299 and the protein-coding genes NCAM1, AGPAT3, SMIM12, and CTSH demonstrated key functional and regulatory potential, providing new insights into the pathogenesis of scoliosis.

The strong colocalization signal observed for LINC00299 suggests that its expression level is inversely correlated with scoliosis risk, indicating an important role for this long non-coding RNA as a regulatory element in spinal morphogenesis
25,
26
. As a key non-coding regulatory gene identified in this study, dysregulation of LINC00299 may interfere with the gene regulatory network governing normal spinal development, positioning it as a potential critical regulatory node mediating genetic susceptibility to scoliosis.

Beyond LINC00299, four protein-coding genes were also confirmed to participate in core biological processes relevant to scoliosis. NCAM1 mediates neuronal adhesion and synaptic signaling, influencing muscle tone balance and the maintenance of spinal symmetry through the regulation of neuronal development and neuromuscular junction function
27,
28,
29
. AGPAT3 is involved in phospholipid biosynthesis and fatty acid remodeling, playing an important role in maintaining chondrocyte membrane integrity and regulating normal skeletal morphogenesis
30,
31,
32
. SMIM12 is a mitochondrial membrane protein that regulates cellular energy metabolism and oxidative stress homeostasis; its expression imbalance may induce muscle metabolic disturbances and bilateral muscle tension asymmetry
33
. CTSH encodes a lysosomal cysteine protease that participates in extracellular matrix degradation and bone remodeling, thereby regulating vertebral bone metabolism and spinal structural stability
34,
35
.

Together, these genes cover multiple key biological dimensions including neural regulation, lipid and energy metabolism, muscle function, and bone remodeling, collectively forming a multi-system pathogenic regulatory network spanning the metabolic-neural-skeletal axis. This provides new molecular evidence and research directions for deciphering the complex multi-dimensional molecular pathogenesis of scoliosis.Recent genetic studies have identified susceptibility genes for scoliosis through genome-wide association analysis, and bidirectional Mendelian randomization investigations have further revealed causal associations of inflammatory cytokines and gut microbiota with scoliosis, supporting that scoliosis is regulated by multiple genetic and molecular factors
36,
37,
38
.

Functional enrichment analysis revealed that GO annotations of the candidate genes were mainly enriched in core functions such as neuronal synapse structure, energy coenzyme biosynthesis, and ion channel regulation, confirming their involvement in neural development and energy metabolism regulation at the levels of cellular component, biological process, and molecular function. KEGG pathway enrichment further highlighted significant enrichment in nicotinate and nicotinamide metabolism, insulin and glucagon signaling pathways, HIF-1 signaling, and the renin-angiotensin system, suggesting extensive interplay between endocrine-metabolic regulation and skeletal growth. The key genes identified in this study may link systemic metabolic regulation with spinal skeletal development by modulating metabolic homeostasis, energy substrate synthesis, and neuromuscular signaling. These findings indicate that scoliosis is not a purely orthopedic disorder but rather a multi-organ developmental condition involving coordinated neural and metabolic regulation
34,
39
.

Several limitations of this study should be acknowledged. First, the GWAS summary data were obtained from the FinnGen database, which predominantly includes individuals of Finnish European ancestry, and the eQTL data were also derived from whole-blood samples of European populations. Therefore, the generalizability of our conclusions to other ethnic groups requires further validation
40
. Second, although SMR analysis can effectively identify causal relationships between gene expression and disease, it cannot completely exclude bias due to horizontal pleiotropy, and the power of the HEIDI test depends on the number of instrumental variables and the accuracy of the linkage disequilibrium structure
10
. Third, the functional roles of the key genes identified in this study are primarily based on bioinformatics annotations and lack experimental validation in vitro or in vivo; their specific pathogenic mechanisms await further investigation. Finally, scoliosis exhibits high clinical heterogeneity, and the GWAS data used in this study were not stratified by different clinical subtypes (e.g., idiopathic, congenital, etc.), which may affect the fine mapping of our results.

Conclusion

This study integrated family-based multi-omics data and SMR-HEIDI colocalization analysis to identify 31 scoliosis susceptibility-related consensus genes, with LINC00299 and four protein-coding genes emerging as critical regulatory and functional hubs. Our findings reveal that scoliosis arises from a complex neural–metabolic–skeletal interactive network, rather than being a simple orthopedic disease. This family-driven integrative genomic analysis expands the understanding of scoliosis genetic mechanisms, and provides novel candidate targets and a foundational framework for future mechanistic research and translational development.

Author contributions

Xuanchen Li, Zhenrui Lin: Conceptualization, data curation, formal analysis, visualization, manuscript drafting. Xiaosheng Chen, Zhixiang Zhu, Rui Zhang: Data evaluation, Data collection, Data filtering. Weijun Wang, WenYu Zhou, Yi He, Lei Yang: Data analysis, result visualization, manuscript editing. Lihe Jiang, Guodong Huang and Qian Liang: Conceptualization, investigation, data curation, supervision, writing manuscript and review.

Funding

This work was supported by the Natural Science Foundation of Guangdong Province (2024A1515013174), Sanming Project of Medicine in Shenzhen (SZSM202211003), Shenzhen Key Medical Discipline Construction Fund (SZXK022), the National Natural Science Foundation Youth Project of China (82203807), Shenzhen-Hong Kong Jointly Funded Project, Shenzhen Science and Technology Program (SGDX20230116093645007), Shenzhen Medical Research Fund (B2303005), Development and Reform Commission of Shenzhen Municipality (S2002Q84500835), Shenzhen Basic Research Project (JCYJ20220530150810024), (JCYJ20240813141020027).

Acknowledgements

The authors would like to express sincere thanks to all participating schools and students of the study and all the rehabilitation therapists from Shenzhen Youth Spine Health Center for their help with the scoliosis screening and data collection. Thanks to/Supported by Shenzhen Portion of Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone, project No. HTHZQSWS-KCCYB-2023060.

References

Figure Legends(3)

Supplemental information(1)
Citation

Li, X., Lin, Z., Chen, X., Zhu, Z., Zhang, R., Wang, W., Yang, L., He, Y., Jiang, L., Huang, G., Liang, Q. (2026). Family-based integr ative genomics identifies neural–metabolic–skeletal pathways und erlying scoliosis susceptibility. iCell, 3(1), 23-31. https://doi.org/ 10.71373/DIBK7560