Identification of Characteristic lncRNA Molecular Markers in Osteoarthritis by Integrating GEO Database and Machine Learning Strategies and Experimental Validation
Abstract:
Objective To screen for long non-coding RNA (lncRNA) molecular markers characteristic of osteoarthritis (OA) by combining the Gene Expression Omnibus (GEO) database with machine learning.
Methods Samples from 185 OA patients and 76 healthy individuals (normal controls) were included. GEO datasets were screened for differentially expressed lncRNAs. Three algorithms, least absolute shrinkage and selection operator (LASSO) logistic regression, support vector machine recursive feature elimination (SVM-RFE), and random forest (RF), were used to screen for candidate lncRNAs, and receiver operating characteristic (ROC) curves were plotted to evaluate the models. Peripheral blood samples were collected from 30 clinical OA patients and 15 healthy controls, and immunoinflammatory indicators were measured. RT-PCR was used to quantify the expression of the lncRNA molecular markers in peripheral blood mononuclear cells (PBMC). Pearson analysis was used to examine the correlation between the lncRNAs and the immunoinflammatory indicators.
Results LASSO identified 14 key markers, SVM-RFE identified 6 genes, and RF identified 24 genes. A Venn diagram of the three algorithms gave HOTAIR, H19, MIR155HG, and NKILA as the overlapping genes. The ROC curves showed that each of these four lncRNAs had an area under the curve (AUC) greater than 0.7. RT-PCR showed that, compared with the normal control group, the relative expression of HOTAIR, H19, and MIR155HG was elevated and that of NKILA was decreased in the PBMC of OA patients (all P<0.01), consistent with the bioinformatics predictions. Pearson analysis showed that the selected lncRNAs were correlated with clinical immunoinflammatory indicators.
Conclusion HOTAIR, H19, MIR155HG, and NKILA can be used as molecular markers for the clinical diagnosis of OA and are correlated with clinical immunoinflammatory indicators.
Keywords:
- Osteoarthritis /
- Long non-coding RNA /
- Machine learning strategy /
- Diagnostic marker /
- Immune inflammation
Figure 1. Eliminating the batch effect of the data with the ComBat function
A, The five datasets before normalization. B, The five datasets after normalization.
Figure 2. Volcano plot of differentially expressed lncRNAs in the five datasets
Black represents all differentially expressed lncRNAs, red represents lncRNAs with log2FC>0, and green represents lncRNAs with log2FC<0.
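The red/green colouring in Figure 2 depends only on the sign of log2FC. The following is a minimal Python sketch of that rule, using log2FC values copied from Table 3. The helper name `direction` is hypothetical, and no significance cutoff is applied because the caption does not state one.

```python
# log2FC values copied from Table 3 of this article.
log2fc = {"MIR155HG": 9.581, "HOTAIR": 2.321, "NKILA": -3.686, "H19": 2.216}

def direction(value: float) -> str:
    """Label a lncRNA by the sign of its log2 fold change (hypothetical helper)."""
    if value > 0:
        return "up (red)"
    if value < 0:
        return "down (green)"
    return "unchanged"

for gene, fc in log2fc.items():
    # 2 ** log2FC converts the log2 value back to a linear fold change.
    print(f"{gene}: log2FC={fc:+.3f}, fold change={2 ** fc:.2f}, {direction(fc)}")
```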
Figure 3. LASSO algorithm was used to screen out 14 lncRNAs
A, Each curve traces the coefficient of one independent variable; the vertical axis is the coefficient value, the lower horizontal axis is log(λ), and the upper horizontal axis is the number of non-zero coefficients in the model at that λ. B, The vertical axis is the binomial deviance, which can be interpreted as the magnitude of the model's error. The two dashed vertical lines mark two values of λ: the left one gives the lowest error and the right one gives a model with fewer features.
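The drop in the number of non-zero coefficients as λ grows (Figure 3A) can be illustrated with the soft-thresholding operator, which is the closed-form LASSO solution for one standardized predictor in a linear model. This is a minimal Python sketch under that simplification. The starting coefficients are made-up numbers, not values from the study, and the study itself used LASSO logistic regression, which has no such closed form.

```python
def soft_threshold(beta: float, lam: float) -> float:
    """Shrink beta toward zero by lam; return exactly 0.0 when |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

# Illustrative unpenalized coefficients (assumed, not from the article).
unpenalized = [2.4, -1.1, 0.6, -0.3, 0.05]

for lam in (0.0, 0.1, 0.5, 1.0, 2.0):
    shrunk = [soft_threshold(b, lam) for b in unpenalized]
    nonzero = sum(1 for b in shrunk if b != 0.0)
    print(f"lambda={lam:<4} non-zero coefficients={nonzero}")
```

Larger penalties zero out the weakest coefficients first, which is why the count on the upper axis of Figure 3A falls from left to right.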
Figure 4. Support vector machine recursive feature elimination (SVM-RFE) algorithm was used to screen out 6 key lncRNAs
A, SVM error. B, SVM accuracy. 5×CV denotes 5-fold cross-validation. The label 6-0.173 in Figure 4A indicates that the error rate with the six selected feature genes was 0.173. The label 6-0.827 in Figure 4B indicates that the accuracy with the six selected feature genes was 0.827.
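The two numbers in Figure 4 are complements (accuracy = 1 − error), and both come from 5-fold cross-validation. The sketch below shows only that bookkeeping in plain Python, not SVM-RFE itself. It assumes that all 261 samples (185 OA plus 76 controls) enter the cross-validation and that folds are contiguous; the source states neither, and real pipelines usually shuffle or stratify first. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

n = 185 + 76  # OA plus normal-control samples reported in the abstract
fold_sizes = [len(test) for _, test in k_fold_indices(n, k=5)]
print(fold_sizes)

cv_error = 0.173             # error rate shown in Figure 4A
cv_accuracy = 1 - cv_error   # complement, shown as 0.827 in Figure 4B
print(round(cv_accuracy, 3))
```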
Figure 5. Random forest (RF) algorithm was used to screen out 24 feature lncRNAs
A, Random forest prediction error versus the number of trees; the vertical axis (Error) is the error and the horizontal axis (trees) is the number of trees. The black, red, and green lines show how the false positive rate varies with the number of decision trees for all samples, samples from osteoarthritis patients, and samples from normal healthy people in the five datasets, respectively. B, The 24 genes sorted by importance.
Figure 6. Screening and validation of key lncRNAs
A, Venn diagram used to screen for the overlapping genes identified by the three algorithms. B, ROC curves validating diagnostic efficacy after the key lncRNAs were fitted into one variable.
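Both panels of Figure 6 reduce to simple operations: a three-way set intersection and an area under the ROC curve. The Python sketch below illustrates each. Only the four shared gene names come from the article. The `cand_*` entries are placeholders, the lists are shorter than the reported 14, 6, and 24 candidates, and the expression scores are invented for illustration.

```python
# Only the four shared names are from the article; "cand_*" are placeholders.
shared = {"HOTAIR", "H19", "MIR155HG", "NKILA"}
lasso = shared | {"cand_1", "cand_2", "cand_3"}
svm_rfe = shared | {"cand_2", "cand_4"}
rf = shared | {"cand_1", "cand_4", "cand_5"}

# Venn overlap: genes selected by all three algorithms.
overlap = lasso & svm_rfe & rf
print(sorted(overlap))

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy scores (assumed) for OA and normal-control samples.
oa_scores = [2.1, 1.8, 2.5, 1.2, 1.9]
nc_scores = [1.0, 1.3, 0.8, 1.9]
print(auc(oa_scores, nc_scores))
```

An AUC of 0.5 corresponds to chance-level ranking, so the abstract's threshold of 0.7 means each marker ranks an OA sample above a control sample in more than 70% of pairs.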
Figure 7. RT-PCR detection of the expression of the lncRNA molecular markers
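Relative expression from RT-PCR is commonly computed with the 2^(−ΔΔCt) method against a reference gene. The source does not state which formula was used, so the Python sketch below is an assumption, with GAPDH (listed in Table 2) taken as the reference. The Ct values are made up and the function name is hypothetical.

```python
def relative_expression(ct_target_case: float, ct_ref_case: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target gene, case vs control, by the 2^(-ddCt) method."""
    d_ct_case = ct_target_case - ct_ref_case   # normalize case to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl   # normalize control to reference gene
    dd_ct = d_ct_case - d_ct_ctrl
    return 2.0 ** (-dd_ct)

# Made-up Ct values for illustration only (target lncRNA vs GAPDH).
fold = relative_expression(ct_target_case=24.0, ct_ref_case=18.0,
                           ct_target_ctrl=26.0, ct_ref_ctrl=18.0)
print(fold)
```

A fold value above 1 corresponds to higher expression in the OA sample and a value below 1 to lower expression.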
Table 1. Information on the gene datasets

| Number | GEO dataset | Platform | NC | OA |
|---|---|---|---|---|
| 1 | GSE43270 | GPL8490 | 18 | 23 |
| 2 | GSE51588 | GPL13497 | 10 | 40 |
| 3 | GSE117999 | GPL20844 | 10 | 10 |
| 4 | GSE169077 | GPL96 | 5 | 6 |
| 5 | GSE48556 | GPL6947 | 33 | 106 |

NC: normal control; OA: osteoarthritis.
Table 2. Specific gene primer sequences

| Gene | Forward primer (5′→3′) | Reverse primer (5′→3′) |
|---|---|---|
| GAPDH | TTCCACCCATGGCAAATTCC | ATCTCGCTCCTGGAAGATGG |
| MIR155HG | GAGTGCTGAAGGCTTGCTGT | TTGAACATCCCAGTGACCAG |
| HOTAIR | GGAAAGATCCAAATGGGACC | CTAGGAATCAGCACGAAGCA |
| H19 | TGATGACGGGTGGAGGGGCT | TGATGTCGCCCTGTCTGCAC |
| NKILA | CTGTCGGGGACTGGTGTATT | AATACACCAGTCCCCGACAG |

GAPDH: glyceraldehyde-3-phosphate dehydrogenase; MIR155HG: MIR155 host gene; HOTAIR: HOX transcript antisense RNA; H19: H19 imprinted maternally expressed transcript; NKILA: NF-kappa B interacting lncRNA.
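Primer pairs such as those in Table 2 can be sanity-checked with a few lines of Python. The sketch below computes the GC fraction of each forward primer and tests whether the reverse primer is simply the reverse complement of the forward primer, in which case the two oligos would anneal to the same site. The sequences are copied from Table 2; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# (forward, reverse) primer pairs copied from Table 2.
primers = {
    "GAPDH": ("TTCCACCCATGGCAAATTCC", "ATCTCGCTCCTGGAAGATGG"),
    "MIR155HG": ("GAGTGCTGAAGGCTTGCTGT", "TTGAACATCCCAGTGACCAG"),
    "HOTAIR": ("GGAAAGATCCAAATGGGACC", "CTAGGAATCAGCACGAAGCA"),
    "H19": ("TGATGACGGGTGGAGGGGCT", "TGATGTCGCCCTGTCTGCAC"),
    "NKILA": ("CTGTCGGGGACTGGTGTATT", "AATACACCAGTCCCCGACAG"),
}

for gene, (fwd, rev) in primers.items():
    same_site = reverse_complement(fwd) == rev
    print(f"{gene}: GC(fwd)={gc_fraction(fwd):.2f}, "
          f"reverse is revcomp of forward: {same_site}")
```

Run on the sequences as printed, the check is true only for NKILA, so that pair may be worth verifying against the original publication.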
Table 3. Top 10 lncRNAs showing the most significant difference in their expression

| Index | GEO dataset | Gene | log2FC | P.Value | adj.P.Val |
|---|---|---|---|---|---|
| 1 | GSE51588, GSE117999, GSE48556 | MIR155HG | 9.581 | 4.44E-03 | 9.05E-02 |
| 2 | GSE51588, GSE117999, GSE48556 | HOTAIR | 2.321 | 6.44E-06 | 9.90E-04 |
| 3 | GSE48556, GSE169077 | NKILA | −3.686 | 1.46E-05 | 1.26E-02 |
| 4 | GSE43270, GSE51588, GSE117999, GSE48556 | H19 | 2.216 | 3.05E-05 | 1.34E-02 |
| 5 | GSE43270, GSE51588, GSE117999, GSE48556 | MEG3 | −3.033 | 3.01E-05 | 1.34E-02 |
| 6 | GSE48556 | LINC00973 | 2.146 | 3.76E-05 | 1.36E-02 |
| 7 | GSE51588, GSE117999, GSE48556 | C15orf54 | −2.013 | 8.44E-05 | 2.01E-02 |
| 8 | GSE117999 | MEG9 | 2.252 | 1.33E-04 | 2.43E-02 |
| 9 | GSE43270, GSE51588, GSE117999, GSE48556 | PART1 | 2.191 | 1.73E-03 | 6.23E-02 |
| 10 | GSE51588, GSE117999 | C3orf79 | 2.179 | 2.10E-03 | 6.67E-02 |
Table 4. Changes in immunoinflammatory indicators in the two groups

| Indicator | NC group (n=15) | OA group (n=30) | P |
|---|---|---|---|
| ESR/(mm/1 h) | 3.45±1.34 | 15.6±7.34 | <0.001 |
| CRP/(mg/L) | 0.73±0.56 | 8.3±4.24 | <0.001 |
| IgA/(g/L) | 1.68±0.22 | 3.73±1.25 | <0.001 |
| IgM/(g/L) | 1.04±0.12 | 1.25±0.65 | 0.654 |
| IgG/(g/L) | 11.47±3.45 | 13.79±6.44 | 0.545 |
| IgE/(IU/mL) | 19.49±9.45 | 70.56±15.56 | 0.013 |
| C3/(g/L) | 0.63±0.12 | 0.84±0.32 | 0.576 |
| C4/(g/L) | 0.11±0.11 | 0.76±0.89 | 0.021 |
| IL-6/(pg/mL) | 2.38±1.45 | 13.09±3.56 | 0.011 |

ESR: erythrocyte sedimentation rate; CRP: C-reactive protein; IgA: immunoglobulin A; IgM: immunoglobulin M; IgG: immunoglobulin G; IgE: immunoglobulin E; C3: complement 3; C4: complement 4; IL-6: interleukin 6. The other abbreviations are explained in the notes to Table 1.
Table 5. Pearson analysis of lncRNA molecular markers and immunoinflammatory indicators

| Indicator | H19 r | H19 P | MIR155HG r | MIR155HG P | NKILA r | NKILA P | HOTAIR r | HOTAIR P |
|---|---|---|---|---|---|---|---|---|
| ESR/(mm/1 h) | 0.044 | 0.816 | 0.355 | 0.052 | −0.425 | 0.021 | 0.345 | 0.054 |
| CRP/(mg/L) | 0.014 | 0.941 | 0.785 | <0.001 | −0.308 | 0.064 | 0.589 | 0.001 |
| IgA/(g/L) | 0.439 | 0.018 | 0.220 | 0.243 | −0.312 | 0.056 | 0.212 | 0.260 |
| IgM/(g/L) | 0.298 | 0.110 | 0.454 | 0.008 | −0.063 | 0.742 | 0.040 | 0.834 |
| IgG/(g/L) | 0.090 | 0.637 | 0.119 | 0.531 | −0.122 | 0.522 | 0.095 | 0.618 |
| IgE/(IU/mL) | 0.358 | 0.051 | 0.008 | 0.968 | −0.183 | 0.333 | 0.445 | 0.014 |
| C3/(g/L) | 0.035 | 0.856 | 0.212 | 0.260 | −0.194 | 0.304 | 0.214 | 0.247 |
| C4/(g/L) | 0.028 | 0.883 | 0.010 | 0.960 | −0.007 | 0.972 | 0.221 | 0.214 |
| IL-6/(pg/mL) | 0.061 | 0.749 | 0.610 | <0.001 | 0.650 | <0.001 | 0.492 | 0.006 |

ESR, CRP, IgA, IgM, IgG, IgE, C3, C4, and IL-6 denote the same as those in Table 4. H19, MIR155HG, NKILA, and HOTAIR denote the same as those in Table 2.
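Each r in Table 5 is a Pearson correlation coefficient, and its P value derives from a t statistic with n − 2 degrees of freedom. The Python sketch below implements both quantities with the standard library. The paired measurements are toy numbers, and n = 30 (the OA group) is an assumption, because the source does not state which samples entered the correlation analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Toy paired measurements (assumed): lncRNA expression vs an indicator.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
indicator = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(pearson_r(expression, indicator), 3))

# t statistic for the MIR155HG-CRP correlation in Table 5, assuming n = 30.
print(round(t_statistic(0.785, 30), 2))
```

Converting the t statistic to a two-sided P value requires the Student t distribution, which the standard library does not provide; a statistics package would be needed for that step.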
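This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to freely share (copy and redistribute in any medium or format) and adapt (remix, transform, or build upon) the papers published in this journal, provided that appropriate credit is given, a link to the license is provided, and any changes to the original are indicated; the material may not be used for commercial purposes. For details of the CC BY-NC 4.0 license, visit //creativecommons.org/licenses/by-nc/4.0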