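Identifying a consistent gene signature of schizophrenia from comprehensive transcriptomic datasets with a novel feature selection strategy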

Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data. Brief Bioinform. 2020 May 21;21(3):1058-1068. doi: 10.1093/bib/bbz049. (IF: 8.99)

Abstract

The etiology of schizophrenia (SCZ) is regarded as one of the most fundamental puzzles in current medical research, and its diagnosis is limited by the lack of objective molecular criteria. Although plenty of studies were conducted, SCZ gene signatures identified by these independent studies are found highly inconsistent. As one of the most important factors contributing to this inconsistency, the feature selection methods used currently do not fully consider the reproducibility among the signatures discovered from different datasets. Therefore, it is crucial to develop new bioinformatics tools of novel strategy for ensuring a stable discovery of gene signature for SCZ. In this study, a novel feature selection strategy (1) integrating repeated random sampling with consensus scoring and (2) evaluating the consistency of gene rank among different datasets was constructed. By systematically assessing the identified SCZ signature comprising 135 differentially expressed genes, this newly constructed strategy demonstrated significantly enhanced stability and better differentiating ability compared with the feature selection methods popular in current SCZ research. Based on a first-ever assessment on methods' reproducibility cross-validated by independent datasets from three representative studies, the new strategy stood out among the popular methods by showing superior stability and differentiating ability. Finally, 2 novel and 17 previously reported transcription factors were identified and showed great potential in revealing the etiology of SCZ. In sum, the SCZ signature identified in this study would provide valuable clues for discovering diagnostic molecules and potential targets for SCZ.   

Keywords: combined analysis; consistent gene signature; feature selection strategy; schizophrenia; transcriptomics.

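The diagnosis of complex diseases is largely limited by the lack of objective molecular criteria. Although many studies have screened for disease signature genes, the signatures identified by these independent studies are highly inconsistent. One important cause of this inconsistency is that the feature selection methods in current use do not fully consider the reproducibility of signatures discovered from different datasets. Developing bioinformatics tools built on new strategies is therefore crucial. Today's article proposes a novel feature selection strategy with higher stability and better classification ability. It was published in May in Briefings in Bioinformatics (IF: 8.99).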

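I. Abstract:
The etiology of schizophrenia (SCZ) is regarded as one of the most fundamental puzzles in current medical research, and its diagnosis is limited by the lack of objective molecular criteria. The authors therefore propose a new feature selection strategy that first combines repeated random sampling with consensus scoring and then evaluates the consistency of gene rank across different datasets, ultimately identifying an SCZ signature of 135 differentially expressed genes. Evaluation further showed that the new strategy offers superior stability and classification ability.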
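The "repeated random sampling plus consensus scoring" idea can be illustrated with a minimal sketch. This is not the authors' implementation: the ranker below is a simple mean-difference score standing in for RFE-SVM, and the names `rank_genes`, `consensus_signature`, `n_rounds`, `fraction` and `top_k`, as well as the toy data, are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def rank_genes(samples, labels):
    """Rank gene indices by a simple class-separation score (stand-in for RFE-SVM)."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        cases = [s[g] for s, y in zip(samples, labels) if y == 1]
        controls = [s[g] for s, y in zip(samples, labels) if y == 0]
        spread = pstdev(cases + controls) or 1.0  # avoid division by zero
        scores.append(abs(mean(cases) - mean(controls)) / spread)
    # Highest-scoring gene first
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def consensus_signature(samples, labels, n_rounds=50, fraction=0.8, top_k=2, seed=0):
    """Return, per gene, the fraction of random subsamples in which it ranked in the top_k."""
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(labels) if y == 1]
    ctrl_idx = [i for i, y in enumerate(labels) if y == 0]
    votes = [0] * len(samples[0])
    for _ in range(n_rounds):
        # Stratified subsample so both classes are present in every round
        chosen = (rng.sample(case_idx, max(2, int(fraction * len(case_idx))))
                  + rng.sample(ctrl_idx, max(2, int(fraction * len(ctrl_idx)))))
        sub_x = [samples[i] for i in chosen]
        sub_y = [labels[i] for i in chosen]
        for g in rank_genes(sub_x, sub_y)[:top_k]:
            votes[g] += 1
    return [v / n_rounds for v in votes]


# Toy data (hypothetical): 10 cases, 10 controls, 5 genes; only genes 0 and 1 differ by class
rng = random.Random(1)
labels = [1] * 10 + [0] * 10
samples = [
    [rng.gauss(6.0 * y, 1.0), rng.gauss(-6.0 * y, 1.0)]
    + [rng.gauss(0.0, 1.0) for _ in range(3)]
    for y in labels
]
consensus = consensus_signature(samples, labels)
```

Genes with a consensus score close to 1 are selected in nearly every subsample, so they are less likely to be artifacts of one particular draw of samples. A real analysis would swap a proper selector such as RFE-SVM into `rank_genes`.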

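II. Materials and Methods:
1. Data: nine SCZ datasets, obtained from GEO, HBB and SMRI.

2. Methods: z-score, RFE-SVM (recursive feature elimination with a support vector machine), PPI network construction, functional enrichment analysis, and transcription factor regulatory network construction.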
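This summary does not say at which step the z-score is applied. A common use, assumed here, is to standardize each gene within a dataset so that expression values from different platforms sit on a comparable scale before they are combined. The function name `zscore_genes` and the small matrix are hypothetical.

```python
from statistics import mean, pstdev


def zscore_genes(matrix):
    """Standardize each gene (column) to mean 0 and SD 1 across samples (rows)."""
    columns = list(zip(*matrix))
    scaled_cols = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        # A gene with zero variance carries no information; map it to all zeros
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    # Transpose back to samples x genes
    return [list(row) for row in zip(*scaled_cols)]


# Hypothetical dataset: 3 samples x 3 genes (the third gene is constant)
dataset = [[2.0, 10.0, 5.0], [4.0, 20.0, 5.0], [6.0, 30.0, 5.0]]
scaled = zscore_genes(dataset)
```

After scaling, the first two genes have identical standardized profiles even though their raw values differ tenfold, which is the property that makes cross-dataset comparison possible.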

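III. Results:
1. A new model for identifying consistent signature genes, built on RFE-SVM

Figure 1. Workflow of this study and the newly constructed feature selection strategy

2. Validating the stability of the consistent signature genes at five levels
2.1. Signatures from different sampling groups or datasets are highly stable

Figure 2. High stability of the signatures identified from (A) sampling groups and (B) independent datasets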
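The summary does not state which stability measure underlies Figure 2. One simple way to quantify how much signatures from different sampling groups or datasets agree, used here purely as an assumed illustration, is the mean pairwise Jaccard index. The gene labels `G1` to `G4` are placeholders, not genes from the paper.

```python
from itertools import combinations


def jaccard(sig_a, sig_b):
    """Jaccard index: shared genes divided by all genes in either signature."""
    a, b = set(sig_a), set(sig_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(signatures):
    """Average Jaccard index over every pair of signatures."""
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical signatures from three sampling groups
signatures = [["G1", "G2", "G3"], ["G2", "G3", "G4"], ["G1", "G2", "G3"]]
stability = mean_pairwise_overlap(signatures)
```

A value of 1 means every group produced the same gene set, and a value near 0 means the signatures barely overlap.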

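2.2. Disease relevance of the consistent SCZ genes

Table 1. Comparison between previously published articles and the consistent genes identified in this study

2.3. Construction of a protein-protein interaction network for the consistent signature genes

Figure 3. PPI network constructed from the 135-gene signature identified in this study

2.4. Functional enrichment analysis of the consistent signature genes

Table 1. Comparison between previously published articles and the consistent genes identified in this study

2.5. Cross-validation of the model's reproducibility

Table 2. Reproducibility of two commonly used feature selection methods

3. Construction of the transcription factor regulatory network

Figure 4. The two novel TFs (NF1 and GABP) identified in this study and the 17 DEGs they regulate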

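Summary:

Building on RFE-SVM, the article uses multiple datasets to construct a new strategy for identifying consistent signature genes. The authors validated the stability and effectiveness of the strategy through five lines of analysis and ultimately identified a set of signature genes closely associated with SCZ. The work offers valuable clues for SCZ diagnosis and also provides a new, highly stable method for identifying signature genes in other diseases.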

Reposted from 生信人
