Visium HD空间数据分析、可视化以及整合 (2)
## 引言Visium HD数据是通过在2微米 x 2微米的网格中标记的、具有空间模式的寡核苷酸生成的。由于这种高分辨率下的数据较为稀疏,因此相邻的网格会被合并,形成8微米和16微米的分辨率。虽然10x推荐使用8微米网格的数据进行分析,但Seurat允许同时加载多种不同分辨率的数据,并将它们存储在一个对象中作为多个检测。
在[本文](https://satijalab.org/seurat/articles/visiumhd_analysis_vignette "Source")中,概述了Seurat支持的一些空间分析工作流程,特别是针对Visium HD数据的分析,包括:
- 无监督聚类
- 识别空间组织区域
- 选择特定的空间区域进行分析
- 与单细胞RNA测序(scRNA-seq)数据的整合
- 对比不同细胞类型在空间上的分布
分析重点是一个来自小鼠大脑的Visium HD数据集。
## 选取特定解剖区域进行分析
研究人员可能需要对特定区域进行细分或提取,以便于进行更深入的分析。例如,我们这里利用 CreateSegmentation 函数根据坐标信息创建了一个分割掩模,它能够从整个数据集中区分出大脑皮层和海马区域,接着通过 Overlay 函数来确定位于这些特定区域的细胞。
相关的坐标数据可以下载获取,研究人员可以利用 SpatialDimPlot 函数并设置 interactive=TRUE 参数,在探索自己的数据集时动态识别这些解剖边界。
```R
cortex.coordinates <- as.data.frame(read.csv("/brahms/lis/visium_hd/final_mouse/cortex-hippocampus_coordinates.csv"))
cortex <- CreateSegmentation(cortex.coordinates)
object[["cortex"]] <- Overlay(object[["slice1.008um"]], cortex)
cortex <- subset(object, cells = Cells(object[["cortex"]]))
```
## 整合单细胞RNA测序数据
Seurat v5 版本新增了对鲁棒细胞类型分解(Robust Cell Type Decomposition, RCTD)的支持,这是一种在提供单细胞RNA测序数据作为参照时,用于从空间数据集中解析点级别数据的计算方法。RCTD已被证实能够为包括SLIDE-seq、Visium以及10x Xenium原位空间平台在内的多种技术来源的空间数据提供准确的细胞类型注释。在Visium HD数据上,RCTD同样展现出了良好的性能。
要执行RCTD分析,我们首先需要从GitHub上安装spacexr软件包,该包内嵌了RCTD功能。在实际操作RCTD时,我们按照RCTD用户指南中的步骤进行。
```R
if (!requireNamespace("spacexr", quietly = TRUE)) {
devtools::install_github("dmcable/spacexr", build_vignettes = FALSE)
}
library(spacexr)
```
鲁棒细胞类型分解(RCTD)技术使用单细胞RNA测序数据集作为参照,并以空间数据集作为待查询对象。在此,我们选用了来自Allen Brain Atlas的小鼠单细胞RNA测序数据集作为参照,该数据集可以在这里下载。为了确保分析的可行性,参照的单细胞RNA测序数据集已被精简至20万个细胞,同时移除了那些数量少于25个的稀有细胞类型。
我们以皮层的Visium HD数据作为空间数据查询对象。为了提高计算效率,我们对空间查询数据集进行了草图化处理,然后运用RCTD技术对这些草图化的皮层细胞进行去卷积分析并进行细胞类型注释,最终将这些注释结果映射回整个皮层数据集。
```R
# sketch the cortical subset of the Visium HD dataset
DefaultAssay(cortex) <- "Spatial.008um"
cortex <- FindVariableFeatures(cortex)
cortex <- SketchData(
object = cortex,
ncells = 50000,
method = "LeverageScore",
sketched.assay = "sketch"
)
DefaultAssay(cortex) <- "sketch"
cortex <- ScaleData(cortex)
cortex <- RunPCA(cortex, assay = "sketch", reduction.name = "pca.cortex.sketch", verbose = T)
cortex <- FindNeighbors(cortex, reduction = "pca.cortex.sketch", dims = 1:50)
cortex <- RunUMAP(cortex, reduction = "pca.cortex.sketch", reduction.name = "umap.cortex.sketch", return.model = T, dims = 1:50, verbose = T)
# load in the reference scRNA-seq dataset
ref <- readRDS("/brahms/satijar/allen_scRNAseq_ref.Rds")
Idents(ref) <- "subclass_label"
counts <- ref[["RNA"]]$counts
cluster <- as.factor(ref$subclass_label)
nUMI <- ref$nCount_RNA
levels(cluster) <- gsub("/", "-", levels(cluster))
cluster <- droplevels(cluster)
# create the RCTD reference object
reference <- Reference(counts, cluster, nUMI)
counts_hd <- cortex[["sketch"]]$counts
cortex_cells_hd <- colnames(cortex[["sketch"]])
coords <- GetTissueCoordinates(cortex)
# create the RCTD query object
query <- SpatialRNA(coords, counts_hd, colSums(counts_hd))
# run RCTD
RCTD <- create.RCTD(query, reference, max_cores = 28)
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet")
# add results back to Seurat object
cortex <- AddMetaData(cortex, metadata = RCTD@results$results_df)
# project RCTD labels from sketched cortical cells to all cortical cells
cortex$first_type <- as.character(cortex$first_type)
cortex$first_type <- "Unknown"
cortex <- ProjectData(
object = cortex,
assay = "Spatial.008um",
full.reduction = "pca.cortex",
sketched.assay = "sketch",
sketched.reduction = "pca.cortex.sketch",
umap.model = "umap.cortex.sketch",
dims = 1:50,
refdata = list(full_first_type = "first_type")
)
DefaultAssay(object) <- "Spatial.008um"
# we only ran RCTD on the cortical cells
# set labels to all other cells as "Unknown"
object[[]][, "full_first_type"] <- "Unknown"
object$full_first_type <- cortex$full_first_type
Idents(object) <- "full_first_type"
# now we can spatially map the location of any scRNA-seq cell type
# start with Layered (starts with L), excitatory neurons in the cortex
cells <- CellsByIdentities(object)
excitatory_names <- sort(grep("^L.* CTX", names(cells), value = TRUE))
p <- SpatialDimPlot(object, cells.highlight = cells, cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T, ncol = 4)
p
```
![](https://s2.loli.net/2024/07/15/cHPRYJ7FjAwCfNE.png)
我们现在可以寻找各个 bin 的 scRNA-seq 标签与其组织域身份(由 BANKSY 分配)之间的关联。通过询问兴奋性神经元细胞属于哪些域,我们可以将 BANKSY 簇重命名为神经元层:
```R
plot_cell_types <- function(data, label) {
p <- ggplot(data, aes(x = get(label), y = n, fill = full_first_type)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(aes(label = ifelse(n >= min_count_to_show_label, full_first_type, "")), position = position_stack(vjust = 0.5), size = 2) +
xlab(label) +
ylab("# of Spots") +
ggtitle(paste0("Distribution of Cell Types across ", label)) +
theme_minimal()
}
cell_type_banksy_counts <- object[[]] %>%
dplyr::filter(full_first_type %in% excitatory_names) %>%
dplyr::count(full_first_type, banksy_cluster)
min_count_to_show_label <- 20
p <- plot_cell_types(cell_type_banksy_counts, "banksy_cluster")
p
```
![](https://s2.loli.net/2024/07/15/BfkutGqoPadQ23E.png)
根据该图,我们现在可以将细胞(即使它们不是兴奋性神经元)分配给各个神经元层。
```R
Idents(object) <- "banksy_cluster"
object$layer_id <- "Unknown"
object$layer_id <- "Layer 2/3"
object$layer_id <- "Layer 4"
object$layer_id <- "Layer 5"
object$layer_id <- "Layer 6"
```
最终,我们能够展示不同细胞类型在空间上的分布情况,并探究它们主要分布在大脑皮层的哪些层级。例如,与兴奋性神经元不同,皮层中的抑制性(GABA能)中间神经元并不是只局限于特定的皮层层级,尽管它们确实表现出一定的分布倾向。
在之前对STARmap数据的研究中,我们发现与先前研究结果一致,SST和PV类型的中间神经元主要分布在第4至第6层,而VIP和Lamp5类型的中间神经元则更多地出现在第2/3层。这些发现是基于一种能够捕捉单细胞特征的原位成像技术。现在,我们想要探究在基于Visium HD技术的点阵数据中,是否能够得到一致的结论。
```R
# set ID to RCTD label
Idents(object) <- "full_first_type"
# Visualize distribution of 4 interneuron subtypes
inhibitory_names <- c("Sst", "Pvalb", "Vip", "Lamp5")
cell_ids <- CellsByIdentities(object, idents = inhibitory_names)
p <- SpatialDimPlot(object, cells.highlight = cell_ids, cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T, ncol = 4)
p
```
![](https://s2.loli.net/2024/07/15/jvOlTSgrxEPtKwI.png)
```R
# create barplot to show proportions of cell types of interest
layer_table <- table(object$full_first_type, object$layer_id)
neuron_props <- reshape2::melt(prop.table(layer_table), margin = 1)
ggplot(neuron_props, aes(x = Var1, y = value, fill = Var2)) +
geom_bar(stat = "identity", position = "fill") +
labs(x = "Cell type", y = "Proportion", fill = "Layer") +
theme_classic()
```
![](https://s2.loli.net/2024/07/15/r8aIes3DiCEjT7z.png)
在 Visium HD 数据集中,我们重现了之前通过原位成像技术所识别的结果。这证实了 Visium HD 的 8um 分辨率网格,尽管不等同于真正的单细胞分辨率,但仍然能够准确捕捉到 scRNA-seq 技术定义的细胞类型。我们强烈推荐用户对任何不符合预期或令人惊讶的生物学结果进行额外的验证。
## 无监督聚类分析
以小鼠肠道为例, 我们通过一个简短的示例,展示了在小鼠小肠(FFPE)的第二个 Visium HD 数据集上应用草图聚类工作流程的过程。我们识别了不同的细胞聚类,展示了它们在空间上的分布,并报告了每个聚类中表达量最高的基因标记。
```R
localdir <- "/brahms/lis/visium_hd/Visium_HD_Public_Data/HD_public_data/Visium_HD_Mouse_Small_Intestine/outs"
object <- Load10X_Spatial(data.dir = localdir, bin.size = 8)
DefaultAssay(object) <- "Spatial.008um"
object <- NormalizeData(object)
object <- FindVariableFeatures(object)
object <- ScaleData(object)
object <- SketchData(
object = object,
ncells = 50000,
method = "LeverageScore",
sketched.assay = "sketch"
)
DefaultAssay(object) <- "sketch"
object <- FindVariableFeatures(object)
object <- ScaleData(object)
object <- RunPCA(object, assay = "sketch", reduction.name = "pca.sketch")
object <- FindNeighbors(object, assay = "sketch", reduction = "pca.sketch", dims = 1:50)
object <- FindClusters(object, cluster.name = "seurat_cluster.sketched", resolution = 3)
object <- RunUMAP(object, reduction = "pca.sketch", reduction.name = "umap.sketch", return.model = T, dims = 1:50)
object <- ProjectData(
object = object,
assay = "Spatial.008um",
full.reduction = "full.pca.sketch",
sketched.assay = "sketch",
sketched.reduction = "pca.sketch",
umap.model = "umap.sketch",
dims = 1:50,
refdata = list(seurat_cluster.projected = "seurat_cluster.sketched")
)
Idents(object) <- "seurat_cluster.projected"
DefaultAssay(object) <- "Spatial.008um"
p1 <- DimPlot(object, reduction = "umap.sketch", label = F) + theme(legend.position = "bottom")
p2 <- SpatialDimPlot(object, label = F) + theme(legend.position = "bottom")
p1 | p2
```
![](https://s2.loli.net/2024/07/15/mS73Lhol2kx8zrQ.png)
我们分别可视化每个簇的位置:
```R
Idents(object) <- "seurat_cluster.projected"
cells <- CellsByIdentities(object, idents = c(1, 5, 18, 26))
p <- SpatialDimPlot(object, cells.highlight = cells, cols.highlight = c("#FFFF00", "grey50"), facet.highlight = T, combine = T) + NoLegend()
p
```
![](https://s2.loli.net/2024/07/15/gUTdAtPOZBG52Mw.png)
```R
DefaultAssay(object) <- "Spatial.008um"
Idents(object) <- "seurat_cluster.projected"
object_subset <- subset(object, cells = Cells(object[["Spatial.008um"]]), downsample = 1000)
DefaultAssay(object_subset) <- "Spatial.008um"
Idents(object_subset) <- "seurat_cluster.projected"
object_subset <- BuildClusterTree(object_subset, assay = "Spatial.008um", reduction = "full.pca.sketch", reorder = T)
markers <- FindAllMarkers(object_subset, assay = "Spatial.008um", only.pos = TRUE)
markers %>%
group_by(cluster) %>%
dplyr::filter(avg_log2FC > 1) %>%
slice_head(n = 5) %>%
ungroup() -> top5
object_subset <- ScaleData(object_subset, assay = "Spatial.008um", features = top5$gene)
p <- DoHeatmap(object_subset, assay = "Spatial.008um", features = top5$gene, size = 2.5) + theme(axis.text = element_text(size = 5.5)) + NoLegend()
p
```
![](https://s2.loli.net/2024/07/15/lWyQARLfib5q2xe.png)
页:
[1]