Mouse breeding
Tissue processing and library construction for scRNA-seq(咨询测序公司)
scRNA-seq data pre-processing
Preprocessing of raw data was initially performed using Cell Ranger software pipeline (version 9.0.1)(Zheng et al. 2017). This step included reads alignment on human genome assembly GRCm39 using STAR(Dobin et al. 2013), counting of unique molecular identifier (UMI) and generating a digital gene-cell matrix. The gene expression matrix was then processed and analyzed by Seurat (version 5.3.1) package(Hao et al. 2024).
For each single-cell RNA-seq data, we created a Seurat object with the CreateSeuratObject function. All Seurat objects were merged to one and filtering steps as follows were applied for the merged data: (1) cells with fewer than 500 detected genes, more than 8000 detected genes, or with UMI counts exceeding 25,000 were excluded. (2) cells with percentage of mitochondrial UMI count larger than 15% were filtered out. (3) the genes detected in more than 3 cells were retained and all ribosomal genes were filtered out. (4)Doublets and multiplets were removed using DoubletFinder (v2.0.3)(McGinnis, Murrow, and Gartner 2019)
After quality control and filtering, each sample was processed separately for normalization and identification of highly variable genes. Gene expression matrices were normalized using Seurat’s NormalizeData function by dividing the UMI counts of each gene by the total UMIs per cell, multiplying by 10,000, and applying a log-transformation (ln(UMI-per-10,000 + 1)). Subsequently, 2,000 highly variable genes were identified in each dataset and used for principal component analysis (PCA). To correct for batch effects across samples, data integration was performed using the Harmony algorithm (Korsunsky et al., 2019) with the RunHarmony function from the Harmony package (v1.2.4) based on the first 40 principal components. The harmonized data were then used to construct a shared nearest-neighbor graph with FindNeighbors (dims = 1:40) and clustered using FindClusters (resolution = 0.8). Finally, nonlinear dimensionality reduction and visualization were conducted using RunUMAP (dims = 1:40, umap.method = “umap-learn”).
cell type annotation
After unsupervised clustering, all major dental-related cell types were identified based on well-established marker genes, i.e., Runx2, Postn, Mfap4 for dental follicle cells; Krt5, Dsc3, Pitx2 for dental epithelium cells; Dlx6os1, Plppr1, Rspo4 for dental papilla cells; Enam, Amelx, Vwde for ameloblasts; Cdh5, Pecam1, Emcn for endothelial cells; Csf1r, Cd68, Fcer1g for immune cells; Rgs5, Pdgfrb, Notch3 for perivascular cells; Plp1, Sox10, Mbp for Schwann cells; and Bglap, Phex, Ifitm5 for odontoblasts. Cluster-specific marker genes were identified by using the FindAllMarkers function with the parameter setting ‘logfc.threshold = 0.25,min.pct=0.1’.
After further rounds of unsupervised subclustering of the major cell clusters, these cell populations were subclustered into functional phenotypes using the same approach as previous described. The subclusters were annotated by selecting a specific signature gene while considering both the observed expression pattern and established conventions.
Signature score calculation
irGSEA(Fan et al. 2024) is an R package designed to assess the outcomes of various gene set scoring methods and facilitate comprehensive visualization of results. We calculated geneSet enrichment scores of cells using the irGSEA.score function and visualized via the irGSEA.density.scatterplot function
Pathway analysis
With the aim of summarizing the process of biological-term classification and the enrichment analysis of pathways, we applied ClusterProfiler(Wu et al. 2021) for GO enrichment of a given gene set. Gene Ontology (GO) annotates genes to biological processes, molecular functions, and cellular components in a directed acyclic graph structure.
DecoupleR (v2.1.4)(Badia-I-Mompel et al. 2022) is an R package containing a collection of methods adapted for biological activity estimation in single-cell data. PROGENy(Holland, Szalai, and Saez-Rodriguez 2020) is a comprehensive resource containing a curated collection of 14 pathways and their target genes, with weights for each interaction. To calculate pathway enrichment scores in different malignant ductal cell subtypes, we run the multivariate linear model (mlm) method using the function run_mlm with default parameters. For each cell in our dataset, it fitted a linear model that predicted the observed gene expression based on pathway-gene interaction weights. Once fitted, the obtained t-values of the slopes were the scores. If it is positive, we interpreted that the pathway is active and if it is negative we interpreted that it is inactive. Then, we calculated the mean scores of cells in each cell subtypes as their pathway enrichment scores
Pseudotime trajectory analysis
Monocle2 (v2.38.0) (Trapnell et al. 2014)was used to infer the state transition of the cell types and subtypes. The unique molecular identifier count matrix of the cells was used to create the CellDataSet object and then filter out the genes expressed in fewer than ten cells. Genes with q < 0.01 were identified as DEGs using the differentialGeneTest function and sorted according to q value using the setOrderingFilter function. The pseudotime trajectory was constructed using the DDRTree algorithm using default parameters. The dynamic expression changes of selected marker genes in pseudotime were visualized using the plot_genes_in_pseudotime and plot_pseudotime_heatmap functions.
Pseudotime analysis was conducted using the R package Monocle3 (v1.4.26)(Cao et al. 2019). The integrated Seurat object, along with the associated metadata, was imported into Monocle3. The new_cell_data_set function was utilized to convert the Seurat object into a Monocle3 object. A total of 50 principal components were computed using the preprocess_cds function, with cells from different groups aligned using the parameters (alignment_group = ‘SampleID’). The UMAP was generated using the reduce_dimension function with parameters (umap.min_dist = 0.1, reduction_method = ‘UMAP’, umap. n_neighbours = 15). Each cell was assigned a pseudotime value using the order_cells function, with the root state determined on the basis of the predicted potency scores calculated by CytoTRACE2 (v1.1.0)(Kang et al. 2025).
RNA velocity and estimation of the fate of cells
All the scRNA-seq data were used for the RNA velocity analysis. Firstly, the spliced and unspliced UMIs were recounted by the python package velocyto(La Manno et al. 2018). Then we normalized the count matrices by the library size and kept the genes detected in over 20 cells in both spliced and unspliced matrices, selected HVGs, log-transformed the normalized spliced and unspliced counts by the scvelo.pp.filter_and_normalize function from scvelo package(v0.2.5)(Bergen et al. 2020). Duplicate cells were moved by the scvelo.pp.remove_duplicate_cells function and moments for velocity estimation was computed by scvelo.pp.moment function with the parameter setting ‘n_pcs=30, n_neighbors=30’. The RNA velocity was estimated using the scvelo.tl.velocity function and the velocity graph was built with the cosine similarity using the scvelo.tl.velocity_graph function. The RNA velocities were projected into the UMAP coordinates with the scvelo.pl.velocity_embedding_stream function for visualization.
To further infer the fate of dental_epithelium cells, the subpopulations were subjected to the partition-based graph abstraction method PAGA using the function scv.tl.paga to further assess the most likely trajectories of cell progression. Different to what was described in the previous part, we run the dynamical model to obtain the full transcriptional dynamics of splicing kinetics. Finally, we estimated fate combining these two methods.
Cellular interaction analysis by CellChat
We used Cellchat(v2.2.0)(Jin, Plikus, and Nie 2025) to infer cell-cell interaction between each two cell subclusters. Initially, a CellChat object was established by grouping defined clusters. The “CellChatDB.mouse” ligand‒receptor interaction database was used for analysis without additional supplementation, and all preprocessing steps were performed using default parameters. The functions computeCommunProb and computeCommunProbPathway were employed to infer the network of each ligand‒receptor pair and signaling pathway individually. Visualization was achieved through a hierarchy plot, circle plot, and heatmap, each serving as a distinct form of visualization.
Badia-I-Mompel, P., Vélez Santiago, J., Braunger, J., Geiss, C., Dimitrov, D., Müller-Dott, S., Taus, P., Dugourd, A., Holland, C. H., Ramirez Flores, R. O., & Saez-Rodriguez, J. (2022). decoupleR: ensemble of computational methods to infer biological activities from omics data. Bioinformatics Advances, 2(1). https://doi.org/10.1093/BIOADV/VBAC016
Bergen, V., Lange, M., Peidli, S., Wolf, F. A., & Theis, F. J. (2020). Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology, 38(12), 1408–1414. https://doi.org/10.1038/s41587-020-0591-3
Cao, J., Spielmann, M., Qiu, X., Huang, X., Ibrahim, D. M., Hill, A. J., Zhang, F., Mundlos, S., Christiansen, L., Steemers, F. J., Trapnell, C., & Shendure, J. (2019). The single-cell transcriptional landscape of mammalian organogenesis. Nature 2019 566:7745, 566(7745), 496–502. https://doi.org/10.1038/s41586-019-0969-x
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., & Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15–21. https://doi.org/10.1093/BIOINFORMATICS/BTS635
Fan, C., Chen, F., Chen, Y., Huang, L., Wang, M., Liu, Y., Wang, Y., Guo, H., Zheng, N., Liu, Y., Wang, H., & Ma, L. (2024). irGSEA: the integration of single-cell rank-based gene set enrichment analysis. Briefings in Bioinformatics, 25(4). https://doi.org/10.1093/BIB/BBAE243
Hao, Y., Stuart, T., Kowalski, M. H., Choudhary, S., Hoffman, P., Hartman, A., Srivastava, A., Molla, G., Madad, S., Fernandez-Granda, C., & Satija, R. (2024). Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nature Biotechnology, 42(2), 293–304. https://doi.org/10.1038/s41587-023-01767-y
Holland, C. H., Szalai, B., & Saez-Rodriguez, J. (2020). Transfer of regulatory knowledge from human to mouse for functional genomics analysis. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1863(6), 194431. https://doi.org/10.1016/J.BBAGRM.2019.194431
Jin, S., Plikus, M. V., & Nie, Q. (2025). CellChat for systematic analysis of cell–cell communication from single-cell transcriptomics. Nature Protocols, 20(1), 180–219. https://doi.org/10.1038/S41596-024-01045-4;SUBJMETA
Kang, M., Gulati, G. S., Brown, E. L., Qi, Z., Avagyan, S., Armenteros, J. J. A., Gleyzer, R., Zhang, W., Steen, C. B., D’Silva, J. P., Schwab, J., Clarke, M. F., Chaudhuri, A. A., & Newman, A. M. (2025). Improved reconstruction of single-cell developmental potential with CytoTRACE 2. Nature Methods 2025 22:11, 22(11), 2258–2263. https://doi.org/10.1038/s41592-025-02857-2
La Manno, G., Soldatov, R., Zeisel, A., Braun, E., Hochgerner, H., Petukhov, V., Lidschreiber, K., Kastriti, M. E., Lönnerberg, P., Furlan, A., Fan, J., Borm, L. E., Liu, Z., van Bruggen, D., Guo, J., He, X., Barker, R., Sundström, E., Castelo-Branco, G., … Kharchenko, P. V. (2018). RNA velocity of single cells. Nature, 560(7719), 494–498. https://doi.org/10.1038/s41586-018-0414-6
McGinnis, C. S., Murrow, L. M., & Gartner, Z. J. (2019). DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems, 8(4), 329-337.e4. https://doi.org/10.1016/j.cels.2019.03.003
Trapnell, C., Cacchiarelli, D., Grimsby, J., Pokharel, P., Li, S., Morse, M., Lennon, N. J., Livak, K. J., Mikkelsen, T. S., & Rinn, J. L. (2014). The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature Biotechnology 2014 32:4, 32(4), 381–386. https://doi.org/10.1038/nbt.2859
Wu, T., Hu, E., Xu, S., Chen, M., Guo, P., Dai, Z., Feng, T., Zhou, L., Tang, W., Zhan, L., Fu, X., Liu, S., Bo, X., & Yu, G. (2021). clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation, 2(3). https://doi.org/10.1016/j.xinn.2021.100141
Zheng, G. X. Y., Terry, J. M., Belgrader, P., Ryvkin, P., Bent, Z. W., Wilson, R., Ziraldo, S. B., Wheeler, T. D., McDermott, G. P., Zhu, J., Gregory, M. T., Shuga, J., Montesclaros, L., Underwood, J. G., Masquelier, D. A., Nishimura, S. Y., Schnall-Levin, M., Wyatt, P. W., Hindson, C. M., … Bielas, J. H. (2017). Massively parallel digital transcriptional profiling of single cells. Nature Communications, 8(1), 14049. https://doi.org/10.1038/ncomms14049
如果觉得我的文章对您有用,请随意打赏。你的支持将鼓励我继续创作!
