1.【Transcriptomics】A single-cell atlas of C. elegans embryonic development
A lineage-resolved molecular atlas of C. elegans embryogenesis at single cell resolution(CC-BY-ND 4.0)
C. elegans is an animal with few cells, but a striking diversity of cell types. Here, we characterize the molecular basis for their specification by profiling the transcriptomes of 84,625 single embryonic cells. We identify 284 terminal and pre-terminal cell types, mapping most single cell transcriptomes to their exact position in the invariant C. elegans lineage. We use these annotations to perform the first quantitative analysis of the relationship between lineage and the transcriptome for a whole organism. We find that a strong lineage-transcriptome correlation in the early embryo breaks down in the final two cell divisions as cells adopt their terminal fates and that most distinct lineages that produce the same anatomical cell type converge to a homogeneous transcriptomic state. Users can explore our data with a graphical application, VisCello.
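Schematic of the online database (Fig. S21 in the original paper)
2.【Epigenomics】Xiaole Liu (刘小乐) of Harvard University offers a new explanation for why transcription factors differ in regulatory range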
Determinants of transcription factor regulatory range
To characterize the genomic distances over which transcription factors (TFs) influence gene expression, we examined thousands of TF and histone modification ChIP-seq datasets and thousands of gene expression profiles. A model integrating these data revealed two classes of TF: one with short-range regulatory influence, the other with long-range regulatory influence. The two TF classes also had distinct chromatin-binding preferences and auto-regulatory properties. The regulatory range of a single TF bound within different topologically associating domains (TADs) depended on intrinsic TAD properties such as local gene density and G/C content, but also on the TAD chromatin state in specific cell types. Our results provide evidence that most TFs belong to one of these two functional classes, and that the regulatory range of long-range TFs is chromatin-state dependent. Thus, consideration of TF type, distance-to-target, and chromatin context is likely important in identifying TF regulatory targets and interpreting GWAS and eQTL SNPs.
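3.【Evolution】Spanish researchers find that non-coding regions transcribed from both strands in yeast are more likely to evolve into protein-coding genes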
Frequent birth of de novo genes in the compact yeast genome(CC-BY-NC-ND 4.0)
Evidence has accumulated that some genes originate directly from previously non-genic sequences, or de novo, rather than by the duplication or fusion of existing genes. However, how de novo genes emerge and eventually become functional is largely unknown. Here we perform the first study on de novo genes that uses transcriptomics data from eleven different yeast species, all grown identically in both rich media and in oxidative stress conditions. The genomes of these species are densely packed with functional elements, leaving little room for the co-option of genomic sequences into new transcribed loci. Despite this, we find that at least 213 transcripts (~5%) have arisen de novo in the past 20 million years of evolution of baker’s yeast, or approximately 10 new transcripts every million years. Nearly half of the total newly expressed sequences are generated from regions in which both DNA strands are used as templates for transcription, explaining the apparent contradiction between the limited ‘empty’ genomic space and high rate of de novo gene birth. In addition, we find that 40% of these de novo transcripts are actively translated and that at least a fraction of the encoded proteins are likely to be under purifying selection. This study shows that even in very highly compact genomes, de novo transcripts are continuously generated and can give rise to new functional protein-coding genes.
Figure 1 of the original paper
4.【Metagenomics】Metagenomic sequencing by Jillian Banfield (UC Berkeley) uncovers many overlooked huge phages in the environment
Clades of huge phage from across Earth’s ecosystems
Phage typically have small genomes and depend on their bacterial hosts for replication. DNA sequenced from many diverse ecosystems revealed hundreds of huge phage genomes, between 200 kbp and 716 kbp in length. Thirty-four genomes were manually curated to completion, including the largest phage genomes yet reported. Expanded genetic repertoires include diverse and new CRISPR-Cas systems, tRNAs, tRNA synthetases, tRNA modification enzymes, translation initiation and elongation factors, and ribosomal proteins. Phage CRISPR-Cas systems have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phage may repurpose bacterial CRISPR-Cas systems to eliminate competing phage. We phylogenetically define major clades of huge phage from human and other animal microbiomes, oceans, lakes, sediments, soils and the built environment. We conclude that their large gene inventories reflect a conserved biological strategy, observed over a broad bacterial host range and across Earth’s ecosystems.
5. Two UK Biobank papers
5.1 【Genomics】Exome sequencing and coding-variant analysis of nearly 50,000 individuals
Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank(CC-BY-NC-ND 4.0)
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data include 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.
5.2 【Genomics】Does alcohol consumption influence mate choice?
Alcohol consumption and mate choice in UK Biobank: comparing observational and Mendelian randomization estimates(CC-BY 4.0)
Alcohol use is correlated within spouse-pairs, but it is difficult to disentangle the effects of alcohol consumption on mate-selection from social factors or cohabitation leading to spouses becoming more similar over time. We hypothesised that genetic variants related to alcohol consumption may, via their effect on alcohol behaviour, influence mate selection. Therefore, in a sample of over 47,000 spouse-pairs in the UK Biobank we utilised a well-characterised alcohol related variant, rs1229984 in ADH1B, as a genetic proxy for alcohol use. We compared the phenotypic concordance between spouses for self-reported alcohol use with the association between an individual’s self-reported alcohol use and their partner’s rs1229984 genotype using Mendelian randomization. This was followed up by an exploration of the spousal genotypic concordance for the variant and an analysis determining if relationship length may be related to spousal alcohol behaviour similarities. We found strong evidence that both an individual’s self-reported alcohol consumption and rs1229984 genotype are associated with their partner’s self-reported alcohol use. The Mendelian randomization analysis found that each unit increase in an individual’s weekly alcohol consumption increased their partner’s alcohol consumption by 0.26 units (95% C.I. 0.15, 0.38; P=1.10×10⁻⁵). Furthermore, the rs1229984 genotype was concordant within spouse-pairs, suggesting that some spousal concordance for alcohol consumption existed prior to cohabitation. Although the SNP is strongly associated with ancestry, our results suggest that this concordance is unlikely to be explained by population stratification. Overall, our findings suggest that alcohol behaviour directly influences mate selection.
6.【Epigenomics】Keji Zhao (赵可吉) at NIH develops ACT-seq, a new sequencing method for chromatin modifications
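To make the contrast between the observational and Mendelian randomization estimates concrete, here is a minimal sketch of a single-variant Wald ratio estimator run on simulated data. It is not the authors' analysis code. The function names, the allele frequency, the genotype effect and the confounder strength are all made-up assumptions, and the simulated true effect of 0.26 is borrowed from the abstract purely for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def wald_ratio(genotype, exposure, outcome):
    """Wald ratio: (genotype -> outcome slope) / (genotype -> exposure slope)."""
    return ols_slope(genotype, outcome) / ols_slope(genotype, exposure)


def simulate_couples(n=20000, true_effect=0.26, seed=0):
    """Simulated couples; every parameter is illustrative, not from the study."""
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)  # unmeasured factor shared by both partners
        x = -2.0 * g + 1.5 * u + rng.gauss(0, 1)  # individual's consumption
        y = true_effect * x + 1.5 * u + rng.gauss(0, 1)  # partner's consumption
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


if __name__ == "__main__":
    g, x, y = simulate_couples()
    print("observational slope:", round(ols_slope(x, y), 2))
    print("Wald ratio estimate:", round(wald_ratio(g, x, y), 2))
```

In this toy setup the shared factor inflates the observational slope, while the Wald ratio should land close to the simulated effect because the genotype is generated independently of that factor.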
Mapping Histone Modifications in Low Cell Number and Single Cells Using Antibody-guided Chromatin Tagmentation (ACT-seq)(CC-BY-NC-ND 4.0)
Modern next-generation sequencing-based methods have empowered researchers to assay the epigenetic states of individual cells. Existing techniques for profiling epigenetic marks in single cells often require the use and optimization of time-intensive procedures such as drop fluidics, chromatin fragmentation, and end repair. Here we describe ACT-seq, a novel and streamlined method for mapping genome-wide distributions of histone tail modifications, histone variants, and chromatin-binding proteins in a small number of cells or in single cells. ACT-seq utilizes a fusion of Tn5 transposase to Protein A that is targeted to chromatin by a specific antibody, allowing chromatin fragmentation and sequence tag insertion specifically at genomic sites presenting the relevant antigen. The Tn5 transposase permits an index multiplexing strategy (iACT-seq), which enables construction of thousands of single-cell libraries in one day by a single researcher without the need for drop-based fluidics or visual sorting. We conclude that ACT-seq presents an attractive alternative to existing techniques for mapping epigenetic marks in single cells.
7.【Omics】University of Pennsylvania researchers map the genomic landscape of 261 childhood cancer patient-derived xenograft models
Genomic landscape of 261 childhood cancer patient-derived xenograft models(CC-BY 4.0)
Accelerating cures for children with cancer remains an immediate challenge due to extensive oncogenic heterogeneity between and within histologies, distinct molecular mechanisms evolving between diagnosis and relapsed disease, and limited therapeutic options. To systematically prioritize and rationally test novel agents in preclinical murine models, researchers within the Pediatric Preclinical Testing Consortium developed over 370 patient-derived xenografts (PDXs) from high-risk childhood cancers, many refractory to current standard-of-care treatments. Here, we genomically characterize 261 PDX models from 29 unique pediatric cancer malignancies, demonstrate faithful recapitulation of histologies and subtypes, and refine our understanding of relapsed disease. Expression and mutational signatures are used to classify tumors for TP53 and NF1 inactivation, as well as impaired DNA repair. We anticipate these data will serve as a resource for pediatric cancer drug development and guide rational clinical trial design for children with cancer.
8.【Omics】PGP-UK: an open-access platform for human multi-omics data
The Personal Genome Project-UK: an open access resource of human multi-omics data(CC-BY-ND 4.0)
Integrative analysis of multi-omics data is a powerful approach for gaining functional insights into biological and medical processes. Conducting these multifaceted analyses on human samples is often complicated by the fact that the raw sequencing output is rarely available under open access. The Personal Genome Project UK (PGP-UK) is one of few resources that recruits its participants under open consent and makes the resulting multi-omics data freely and openly available. As part of this resource, we describe the PGP-UK multi-omics reference panel consisting of ten genomic, methylomic and transcriptomic data sets. Specifically, we outline the data processing, quality control and validation procedures which were implemented to ensure data integrity and exclude sample mix-ups. In addition, we provide a REST API to facilitate the download of the entire PGP-UK dataset. The data are also available from two cloud-based environments, providing platforms for free integrated analysis. In conclusion, the genotype-validated PGP-UK multi-omics human reference panel described here provides a valuable new open access resource for integrated analyses in support of personal and medical genomics.
Figure 1 of the original paper
9.【Epigenomics】Silin Zhong (钟思林) at the Chinese University of Hong Kong resolves the tissue-specific 3D genomes of three crops
Tissue-specific Hi-C analyses of rice, foxtail millet and maize suggest non-canonical function of plant chromatin domains(CC-BY-NC-ND 4.0)
Chromatin is not randomly packaged in the nucleus, and its organization plays important roles in transcription regulation. Using in situ Hi-C, we have compared the 3D chromatin architectures of rice mesophyll and endosperm, foxtail millet bundle sheath and mesophyll, and maize bundle sheath, mesophyll and endosperm tissues. We have also profiled their DNA methylation, open chromatin, histone modification and gene expression to investigate whether chromatin structural dynamics are associated with changes in epigenomic features. We found that plant global A/B compartment partitions are stable across tissues, while local A/B compartments show tissue-specific dynamics that are associated with differential gene expression. Plant domains are largely stable across tissues, while rare domain border changes are often associated with gene activation. Genes inside plant domains are not conserved across species, and lack significant co-expression behavior unlike those in mammalian cells. When comparing syntenic gene pairs, we found that those maize genes involved in gene island chromatin loops have shorter genomic distances in smaller genomes without gene island loops such as rice and foxtail millet, suggesting that they have conserved functions. Our study revealed that the 3D configuration of the plant chromatin is also complex and dynamic with unique features that need to be further examined.
10.【Transcriptomics】Researchers at the University of Munich, Germany, systematically evaluate single-cell RNA-seq analysis pipelines
A Systematic Evaluation of Single Cell RNA-Seq Analysis Pipelines: Library preparation and normalisation methods have the biggest impact on the performance of scRNA-seq studies(CC-BY-NC-ND 4.0)
The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not been established yet. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3,000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
Bonus【ChemRxiv; structural bioinformatics】Pengyu Ren (任鹏宇) at the University of Texas offers a new take on classical potentials for modeling molecular interactions
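The combinatorial size of such a benchmark is easy to see by enumerating the step choices. The sketch below crosses the per-step counts given in the abstract (three mapping, four imputation, seven normalisation, four DE testing approaches). The step labels are placeholders rather than real tool names, and the way the authors arrive at their total of ~3,000 is not spelled out in the abstract, so the final multiplication is only one plausible reading.

```python
from itertools import product

# Number of alternatives per analysis step, as stated in the abstract.
# The generated labels (e.g. "mapping_1") are placeholders, not tool names.
STEP_OPTIONS = {
    "mapping": 3,
    "imputation": 4,
    "normalisation": 7,
    "de_testing": 4,
}


def enumerate_pipelines(step_options):
    """Return every combination of one choice per analysis step."""
    labelled = [
        [f"{step}_{i + 1}" for i in range(count)]
        for step, count in step_options.items()
    ]
    return list(product(*labelled))


pipelines = enumerate_pipelines(STEP_OPTIONS)
n_de_setups = 9  # nine DE setups, from the abstract

print(len(pipelines))                # combinations of analysis steps
print(len(pipelines) * n_de_setups)  # crossed with the DE setups (assumption)
```

Crossing the 336 step combinations with the nine DE setups gives 3,024, which is in line with the quoted ~3,000, although the abstract itself does not say that this is how the total was counted.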
AMOEBA+ Classical Potential for Modeling Molecular Interactions(CC BY-NC-ND 4.0)
Classical potentials based on isotropic and additive atomic charges have been widely used to model molecules in computers for the past few decades. The crude approximations in the underlying physics are hindering both their accuracy and transferability across chemical and physical environments. Here we present a new classical potential, AMOEBA+, to capture essential intermolecular forces, including permanent electrostatics, repulsion, dispersion, many-body polarization, short-range charge penetration and charge transfer, by extending the polarizable multipole-based AMOEBA (Atomic Multipole Optimized Energetics for Biomolecular Applications) model. For a set of common organic molecules, we show that AMOEBA+ with general parameters can reproduce both quantum mechanical interactions and energy decompositions according to the Symmetry-Adapted Perturbation Theory (SAPT). Additionally, a new water model developed based on the AMOEBA+ framework captures various liquid phase properties in molecular dynamics simulations while remaining consistent with SAPT energy decompositions, utilizing both ab initio data and experimental liquid properties. Our results demonstrate that it is possible to improve the physical basis of classical force fields to advance their accuracy and general applicability.
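For readers unfamiliar with the baseline being criticised, the sketch below shows the simple kind of pairwise potential built on isotropic, additive point charges: a Coulomb term plus a 12-6 Lennard-Jones term. This is not AMOEBA+ or AMOEBA; it omits polarization, multipoles, charge penetration and charge transfer entirely. The function names and the example parameters are illustrative assumptions.

```python
import math
from itertools import combinations

# Coulomb constant in kcal*Angstrom/(mol*e^2), a value commonly used in
# molecular mechanics codes.
COULOMB_CONSTANT = 332.0637


def pair_energy(r, q_i, q_j, epsilon, sigma):
    """Fixed-charge Coulomb + 12-6 Lennard-Jones energy for one atom pair.

    r in Angstrom, charges in units of e, epsilon in kcal/mol, sigma in Angstrom.
    """
    coulomb = COULOMB_CONSTANT * q_i * q_j / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones


def total_energy(coords, charges, epsilon, sigma):
    """Additive total: a plain sum over all atom pairs, no many-body terms."""
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        energy += pair_energy(r, charges[i], charges[j], epsilon, sigma)
    return energy


if __name__ == "__main__":
    # Three hypothetical atoms on a line, with made-up parameters.
    coords = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
    charges = [-0.4, 0.2, 0.2]
    print(total_energy(coords, charges, epsilon=0.15, sigma=3.2))
```

Because the total is a plain sum of pair terms with fixed charges, the environment cannot change how any one atom interacts, which is the limitation that many-body polarization terms are meant to address.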