EPD in 2020: enhanced data visualization and extension to ncRNA promoters
Impact factor: 11.501
Nucleic Acids Res. 2020 Jan 8;48(D1):D65-D69. doi: 10.1093/nar/gkz1014.
Abstract
The Eukaryotic Promoter Database (EPD), available online at https://epd.epfl.ch, provides accurate transcription start site (TSS) information for promoters of 15 model organisms plus corresponding functional genomics data that can be viewed in a genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. Recent work has focused on the improvement of the EPD promoter viewers, which use the UCSC Genome Browser as visualization platform. Thousands of high-resolution tracks for CAGE, ChIP-seq and similar data have been generated and organized into public track hubs. Customized, reproducible promoter views, combining EPD-supplied tracks with native UCSC Genome Browser tracks, can be accessed from the organism summary pages or from individual promoter entries. Moreover, thanks to recent improvements and stabilization of ncRNA gene catalogs, we were able to release promoter collections for certain classes of ncRNAs from human and mouse. Furthermore, we developed automatic computational protocols to assign orphan TSS peaks to downstream genes based on paired-end (RAMPAGE) TSS mapping data, which enabled us to add nearly 9000 new entries to the human promoter collection. Since our last article in this journal, EPD was extended to five more model organisms: rhesus monkey, rat, dog, chicken and Plasmodium falciparum.
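A promoter is conceptually defined as a transcription start site (TSS) or transcription initiation region. The Eukaryotic Promoter Database (EPD, https://epd.epfl.ch) was created in 1986 to provide accurate TSS annotation based on experimental evidence. Initially, EPD was a manually curated database of results published in journals. With the advent of next-generation sequencing, EPD also began to integrate promoter data derived from high-throughput transcript mapping data and high-quality gene annotation resources, and it has since extended its scope to ncRNA promoters. The updated database was published in January 2020 in the well-known journal Nucleic Acids Research.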
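EPD provides accurate transcription start site (TSS) information for the promoters of 15 model organisms, together with the corresponding functional genomics data. These data can be viewed in a UCSC-based promoter genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. In addition to the general-purpose viewer available for all 15 organisms covered by EPD, the database also offers specialized viewers for specific cell types or tissues.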
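As an illustration of the "subsequent analysis with other tools" step, the following minimal Python sketch reads promoter records in BED format and derives a strand-aware window around each TSS. It rests on several assumptions that are not confirmed by the article: that the export is a standard six-column BED file in which each feature marks a single TSS base, and that a window of 499 bases upstream and 100 bases downstream is wanted. The record names, the window sizes, and the helper names `parse_bed6` and `promoter_window` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Tss(NamedTuple):
    chrom: str
    pos0: int    # 0-based coordinate of the TSS base
    name: str
    strand: str  # "+" or "-"


def parse_bed6(text: str) -> List[Tss]:
    """Parse six-column BED text (assumed layout) into TSS records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.split()[:6]
        # BED intervals are 0-based and half-open: on the minus strand
        # the TSS is the last base of the interval.
        pos0 = int(start) if strand == "+" else int(end) - 1
        records.append(Tss(chrom, pos0, name, strand))
    return records


def promoter_window(tss: Tss, upstream: int = 499,
                    downstream: int = 100) -> Tuple[str, int, int]:
    """Return (chrom, start, end) in BED coordinates around the TSS.

    The window holds `upstream` bases before the TSS and `downstream`
    bases starting at the TSS itself, read along the gene's strand.
    """
    if tss.strand == "+":
        start, end = tss.pos0 - upstream, tss.pos0 + downstream
    else:
        start, end = tss.pos0 - downstream + 1, tss.pos0 + upstream + 1
    return tss.chrom, max(start, 0), end


# Made-up example records, not taken from EPD.
example = "chr1\t999\t1000\tGENE_A_1\t900\t+\nchr2\t1999\t2000\tGENE_B_1\t900\t-\n"
for record in parse_bed6(example):
    print(record.name, promoter_window(record))
```

The resulting intervals could then be passed to any tool that accepts BED coordinates, for example to extract sequences from a genome assembly.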
Home page of the EPD website
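Since January 2017, EPD has added promoters for chicken, dog, rat, rhesus monkey and Plasmodium falciparum (the malaria parasite), and has released new versions of the data sets for human, mouse, Drosophila, Arabidopsis and other species. The current promoter entries are summarized in the table below. With the release of the Plasmodium falciparum promoter collection, EPD covers a human pathogen for the first time, an important step in a new direction.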
Organisms covered by EPD and the corresponding total numbers of promoters
Frequency and positional distribution of core promoter motifs relative to the TSSs of human coding and non-coding genes as defined in EPD
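To make concrete what a positional distribution relative to the TSS means, here is a minimal Python sketch that counts where a motif starts in a set of promoter sequences aligned on their TSS. It is a simplification and not the method used by EPD: it looks for exact matches of a consensus string, whereas motif analyses of this kind normally score sequences with position weight matrices. The function names, the `tss_index` parameter and the toy sequences are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def motif_positions(seq: str, motif: str, tss_index: int) -> List[int]:
    """Return motif start positions relative to the TSS.

    `tss_index` is the 0-based index of the TSS base in `seq`.
    Promoter coordinates have no position 0: the TSS is +1 and the
    base immediately upstream of it is -1.
    """
    seq, motif = seq.upper(), motif.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        offset = i - tss_index
        positions.append(offset if offset < 0 else offset + 1)
        i = seq.find(motif, i + 1)  # step by one to catch overlapping matches
    return positions


def positional_distribution(seqs: Iterable[str], motif: str,
                            tss_index: int) -> Counter:
    """Count motif start positions over all TSS-aligned sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(motif_positions(seq, motif, tss_index))
    return counts


# Toy sequences (made up), each with the TSS at index 30.
toy = [
    "TATAAA" + "C" * 24 + "ACCCC",
    "CC" + "TATAAA" + "C" * 22 + "ACCCC",
]
print(sorted(positional_distribution(toy, "TATAAA", 30).items()))
```

Dividing each count by the number of sequences gives the motif frequency at each position, which is the kind of quantity such a figure plots.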