Database | EPD: The Eukaryotic Promoter Database

EPD in 2020: enhanced data visualization and extension to ncRNA promoters. Nucleic Acids Res. 2020 Jan 8;48(D1):D65-D69. doi: 10.1093/nar/gkz1014. (Impact factor: 11.501)

Abstract

The Eukaryotic Promoter Database (EPD), available online at https://epd.epfl.ch, provides accurate transcription start site (TSS) information for promoters of 15 model organisms plus corresponding functional genomics data that can be viewed in a genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. Recent work has focused on the improvement of the EPD promoter viewers, which use the UCSC Genome Browser as visualization platform. Thousands of high-resolution tracks for CAGE, ChIP-seq and similar data have been generated and organized into public track hubs. Customized, reproducible promoter views, combining EPD-supplied tracks with native UCSC Genome Browser tracks, can be accessed from the organism summary pages or from individual promoter entries. Moreover, thanks to recent improvements and stabilization of ncRNA gene catalogs, we were able to release promoter collections for certain classes of ncRNAs from human and mouse. Furthermore, we developed automatic computational protocols to assign orphan TSS peaks to downstream genes based on paired-end (RAMPAGE) TSS mapping data, which enabled us to add nearly 9000 new entries to the human promoter collection. Since our last article in this journal, EPD was extended to five more model organisms: rhesus monkey, rat, dog, chicken and Plasmodium falciparum.

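A promoter is conceptually defined as a transcription start site (TSS) or a transcription initiation region. The Eukaryotic Promoter Database (EPD, https://epd.epfl.ch) was created in 1986 to provide accurate TSS annotation based on experimental evidence. EPD started out as a purely manually curated database of results published in journals. With the advent of next-generation sequencing, it began to incorporate promoter data derived from high-throughput transcript mapping data and high-quality gene annotation resources, and it has since extended its coverage to ncRNA promoters. The updated database was published in January 2020 in the well-known journal Nucleic Acids Research.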

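EPD provides accurate transcription start site (TSS) information for the promoters of 15 model organisms, together with the corresponding functional genomics data. These data can be viewed in the UCSC-based promoter genome browser, queried or analyzed through web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. In addition to the general viewer that applies to all 15 organisms covered by EPD, the database also offers specialized viewers for particular cell types or tissues.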
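As an illustration of the "subsequent analysis with other tools" step, here is a minimal Python sketch that reads promoter regions from BED-formatted text and tallies them by strand. It assumes a plain six-column BED layout (chrom, start, end, name, score, strand); the exact columns of an EPD export are not described in this article, and the sample records are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BedRecord:
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    name: str
    score: str   # kept as text, since some files use "." here
    strand: str

    @property
    def length(self) -> int:
        """Length of the region in base pairs."""
        return self.end - self.start


def parse_bed6(text: str) -> list[BedRecord]:
    """Parse six-column BED text, skipping blank, comment and header lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 BED columns, got: {line!r}")
        chrom, start, end, name, score, strand = fields[:6]
        records.append(BedRecord(chrom, int(start), int(end), name, score, strand))
    return records


# Invented sample data (hypothetical names and coordinates, not real EPD entries).
SAMPLE_BED = """\
track name="hypothetical_promoters"
chr1\t1000\t1060\tGENEA_1\t900\t+
chr1\t5000\t5060\tGENEB_1\t900\t-
chr2\t200\t260\tGENEC_1\t900\t+
"""

records = parse_bed6(SAMPLE_BED)
strand_counts = Counter(r.strand for r in records)
```

The same `parse_bed6` function can be applied to the contents of a downloaded file, for example `parse_bed6(open(path).read())`, provided the file follows the six-column layout assumed here.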

Home page of the EPD website

The UCSC Genome Browser

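Since January 2017, EPD has integrated promoters for chicken, dog, rat, rhesus monkey and Plasmodium falciparum, and has released new versions of the data sets for species including human, mouse, fruit fly and Arabidopsis. The current promoter entries are summarized in the table below. With the release of the Plasmodium falciparum promoter collection, EPD covers a human pathogen for the first time, which is an important step in a new direction.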

Organisms covered by EPD and the corresponding total numbers of promoters

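Because ncRNAs are a heterogeneous group of molecules that are transcribed by different polymerases, undergo a variety of post-transcriptional processing events and reside in different subcellular compartments, no database resource had so far integrated ncRNA promoters. To meet this need, EPD uses GENCODE biotype annotation as the classification standard for existing ncRNA data, and has ultimately integrated 2339 human and 3077 mouse ncRNA promoters into the database.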
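To make the idea of classifying genes by GENCODE biotype concrete, the sketch below groups gene records from GTF-formatted text by their `gene_type` attribute, which is the attribute GENCODE GTF files use for the biotype. This is only an illustration and not EPD's actual protocol: the article does not say which biotype classes were included, and simply excluding `protein_coding` is a simplification (GENCODE also defines pseudogene and other non-ncRNA biotypes). The sample records are invented.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(attr_field: str) -> dict[str, str]:
    """Turn 'gene_id "G1"; gene_type "lncRNA";' into a dict."""
    return dict(ATTR_RE.findall(attr_field))


def group_genes_by_biotype(gtf_text: str) -> dict[str, list[str]]:
    """Group gene names by biotype, skipping protein-coding genes."""
    groups = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Keep only well-formed 'gene' features (GTF has 9 columns).
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        biotype = attrs.get("gene_type")
        if biotype is None or biotype == "protein_coding":
            continue
        groups[biotype].append(attrs.get("gene_name", attrs.get("gene_id", "?")))
    return dict(groups)


def gtf_line(feature: str, attrs: str) -> str:
    """Build one invented GTF line for the example (hypothetical helper)."""
    return "\t".join(["chr1", "HYPOTHETICAL", feature, "100", "900", ".", "+", ".", attrs])


# Invented sample records, not real GENCODE entries.
SAMPLE_GTF = "\n".join([
    gtf_line("gene", 'gene_id "G1"; gene_type "lncRNA"; gene_name "LNC_A";'),
    gtf_line("transcript", 'gene_id "G1"; gene_type "lncRNA"; gene_name "LNC_A";'),
    gtf_line("gene", 'gene_id "G2"; gene_type "protein_coding"; gene_name "PC";'),
    gtf_line("gene", 'gene_id "G3"; gene_type "snRNA"; gene_name "SN_B";'),
    gtf_line("gene", 'gene_id "G4"; gene_type "lncRNA"; gene_name "LNC_C";'),
])

groups = group_genes_by_biotype(SAMPLE_GTF)
```

Grouping at the gene level first keeps the later step of assigning promoters per class separate from the parsing, so the set of accepted biotypes can be changed without touching the reader.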

Frequency and positional distribution of core promoter motifs relative to the TSSs of human coding and non-coding genes as defined in EPD
