Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data
Brief Bioinform (IF: 8.99). 2020 May 21;21(3):1058-1068. doi: 10.1093/bib/bbz049.
Abstract
The etiology of schizophrenia (SCZ) is regarded as one of the most fundamental puzzles in current medical research, and its diagnosis is limited by the lack of objective molecular criteria. Although many studies have been conducted, the SCZ gene signatures identified by these independent studies are highly inconsistent. One of the most important factors contributing to this inconsistency is that the feature selection methods currently in use do not fully consider the reproducibility of signatures discovered from different datasets. Therefore, it is crucial to develop new bioinformatics tools based on novel strategies to ensure the stable discovery of gene signatures for SCZ. In this study, a novel feature selection strategy was constructed that (1) integrates repeated random sampling with consensus scoring and (2) evaluates the consistency of gene rank among different datasets. In a systematic assessment of the identified SCZ signature, which comprises 135 differentially expressed genes, the newly constructed strategy demonstrated significantly enhanced stability and better differentiating ability than the feature selection methods popular in current SCZ research. In a first-ever assessment of the methods' reproducibility, cross-validated with independent datasets from three representative studies, the new strategy stood out among the popular methods by showing superior stability and differentiating ability. Finally, 2 novel and 17 previously reported transcription factors were identified and showed great potential for revealing the etiology of SCZ. In sum, the SCZ signature identified in this study provides valuable clues for discovering diagnostic molecules and potential targets for SCZ.
Keywords: combined analysis; consistent gene signature; feature selection strategy; schizophrenia; transcriptomics.
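To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of repeated random sampling with consensus scoring, followed by a simple cross-dataset consistency check. It is not the authors' implementation. The Welch t-statistic used for ranking, the top-k voting rule, the Jaccard overlap used as the consistency measure, the synthetic data, and all function and parameter names (`welch_t`, `consensus_scores`, `make_dataset`, `n_iter`, `frac`, `top_k`) are assumptions chosen for illustration only.

```python
import numpy as np

def welch_t(a, b):
    """Absolute Welch t-statistic per gene (columns) for two groups of samples (rows)."""
    va = a.var(axis=0, ddof=1) / a.shape[0]
    vb = b.var(axis=0, ddof=1) / b.shape[0]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb)

def consensus_scores(cases, controls, n_iter=100, frac=0.8, top_k=3, seed=0):
    """Fraction of random subsamples in which each gene ranks among the top_k."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(cases.shape[1])
    n_case = max(2, int(frac * cases.shape[0]))
    n_ctrl = max(2, int(frac * controls.shape[0]))
    for _ in range(n_iter):
        # Draw a random subset of samples from each group, without replacement.
        ca = cases[rng.choice(cases.shape[0], n_case, replace=False)]
        co = controls[rng.choice(controls.shape[0], n_ctrl, replace=False)]
        # Vote for the top_k genes with the largest statistic in this subsample.
        top = np.argsort(welch_t(ca, co))[-top_k:]
        counts[top] += 1
    return counts / n_iter

def make_dataset(rng, n=20, n_genes=50, planted=(0, 1, 2), shift=5.0):
    """Synthetic expression matrix (hypothetical data): planted genes are shifted in cases."""
    controls = rng.normal(size=(n, n_genes))
    cases = rng.normal(size=(n, n_genes))
    cases[:, list(planted)] += shift
    return cases, controls

# Two independent synthetic "datasets" sharing the same planted signal.
rng = np.random.default_rng(42)
scores_a = consensus_scores(*make_dataset(rng), seed=1)
scores_b = consensus_scores(*make_dataset(rng), seed=2)

# Signature per dataset: the genes with the highest consensus score.
selected_a = set(np.argsort(scores_a)[-3:].tolist())
selected_b = set(np.argsort(scores_b)[-3:].tolist())

# Cross-dataset consistency, measured here as the Jaccard overlap of the signatures.
overlap = len(selected_a & selected_b) / len(selected_a | selected_b)
print(sorted(selected_a), sorted(selected_b), overlap)
```

The idea behind the voting step is that a gene whose high rank depends on a few particular samples will drop out of the top list in many subsamples and receive a low consensus score, whereas a robust gene is selected almost every time. On real transcriptomic data the ranking statistic, the subsample fraction, and the signature size would all need to be chosen with care.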
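The diagnosis of complex diseases is limited to a large extent by the lack of objective molecular criteria. Although many studies have screened for disease signature genes, the signature genes identified by these independent studies are highly inconsistent. One important cause of this inconsistency is that the feature selection methods currently in use do not fully consider the reproducibility of signature genes discovered from different datasets. Developing bioinformatics tools based on novel strategies is therefore crucial. The article featured today proposes a novel feature selection strategy with higher stability and better classification ability. It was published in May in BRIEFINGS IN BIOINFORMATICS (IF: 8.99).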
Reposted from 生信人