Permutation-based selection in the melanoma data

[1]:
import os
import pandas as pd
import numpy as np
import anndata as ann

import spatialdm as sdm
from spatialdm.datasets import dataset
import spatialdm.plottings as pl

import matplotlib.pyplot as plt
print("SpatailDM version: %s" %sdm.__version__)
SpatailDM version: 0.0.7

The melanoma dataset from Thrane, et al. was publicly available, and we obtained raw counts, pre-processed log counts, and spatial coordinates from Matt Stone (Git repository). For easier reuse, we included them in an anndata object which can be loaded directly in SpatialDM Python package.

[3]:
adata = dataset.melanoma()
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/anndata/_core/anndata.py:121: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
The anndata object contains the following:
log-transformed expression in adata.X,
raw expression in adata.raw,
cell types computed by RCTD in adata.obs,
and spatial coordinates in adata.obsm['spatial']

SpatialDM object and preprocessing

We generate a suitable weight matrix for the SpatialDM object at first. Here considering the scale of the spatial coordinates and spot-spot distance (200 micrometers here), we set radial basis kernel parameter l = 1.2, and trimmed all weights < 0.2 (cutoff) to match the normal range of CCC (200 micrometers, 1 spot away from the sender cell here)

Note Alternative to the cutoff parameter, we can set n_neighbors to around 8 to restrain the range of CCC.

[5]:
sdm.weight_matrix(adata, l=1.2, cutoff=0.2, single_cell=False) # weight_matrix by rbf kernel
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/sklearn/utils/validation.py:70: FutureWarning: Pass n_neighbors=6 as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
  warnings.warn(f"Pass {args_msg} as keyword args. From version "

Note

If the scale of the spatial coordinates is larger (e.g. several thousand in the intestinal example) or smaller, or when the spot-spot distance varies (which is common for different platforms), the l can be very different. It’s a crucial step to determine a suitable l to match the biological context. We recommend the following plotting to check the resulting weight matrix from the previous step. If needed, users can run weight_matrix iteratively to decide the most optimal l and cutoff.

[6]:
# visualize range of interaction
import matplotlib.pyplot as plt
plt.figure(figsize=(3,3))

# plt.scatter(adata.obsm['spatial'][:,0], adata.obsm['spatial'][:,1],
#             c=adata.obsp['weight'][50])

plt.scatter(adata.obsm['spatial'][:,0], adata.obsm['spatial'][:,1],
            c=adata.obsp['weight'].A[50])
[6]:
<matplotlib.collections.PathCollection at 0x7fd4b64386d0>
_images/melanoma_10_1.png

Next step is to extract valid LR pairs from the database (by default, we use CellChatDB). Filtering out sparsely expressed ligand or receptor (e.g. 3) can help us get more interpretable results later.

[7]:
sdm.extract_lr(adata, 'human', min_cell=3)      # find overlapping LRs from CellChatDB

Global and local selection (permutation approach)

It is a crucial step to identify dataset-specific interacting LR pairs (global selection) for ensuring quality analysis and reliable interpretation of the putative CCC. With high computation efficiency, SpatialDM can complete global selection in seconds.

[13]:
import time
start = time.time()
sdm.spatialdm_global(adata, 1000, specified_ind=None, method='both', nproc=1)     # global Moran selection
sdm.sig_pairs(adata, method='permutation', fdr=True, threshold=0.1)     # select significant pairs
print("%.3f seconds" %(time.time()-start))
100%|██████████| 1000/1000 [00:06<00:00, 155.27it/s]
8.346 seconds

We used fdr corrected global p-values and a threshold FDR < 0.1 (default) to determine which pairs to be included in the following local identification steps. There are 103 pairs being selected in this data.

[14]:
print(adata.uns['global_res'].selected.sum())
adata.uns['global_res'].sort_values(by='fdr').head()
103
[14]:
Ligand0 Ligand1 Ligand2 Receptor0 Receptor1 Receptor2 z_pval z perm_pval fdr selected
CSPG4_ITGA2_ITGB1 CSPG4 None None ITGA3 ITGB1 None 1.372040e-06 4.689101 0.0 0.0 True
PECAM1_PECAM1 PECAM1 None None PECAM1 None None 2.792191e-52 15.170039 0.0 0.0 True
SEMA3D_NRP2_PLXNA1 SEMA3D None None NRP2 PLXNA1 None 1.267594e-04 3.658679 0.0 0.0 True
HLA-C_CD8A HLA-C None None CD8A None None 7.035725e-07 4.823990 0.0 0.0 True
COL6A1_ITGA1_ITGB1 COL6A1 None None ITGA1 ITGB1 None 8.586635e-06 4.298790 0.0 0.0 True

Local selection is then carried out for the selected 103 pairs to identify where the LRI take place, in a single-spot resolution.

[15]:
start = time.time()
sdm.spatialdm_local(adata, n_perm=1000, method='both', specified_ind=None, nproc=1)     # local spot selection
sdm.sig_spots(adata, method='permutation', fdr=False, threshold=0.1)     # significant local spots
print("%.3f seconds" %(time.time()-start))
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/anndata/_core/raw.py:139: FutureWarning: X.dtype being converted to np.float32 from int64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  return anndata.AnnData(
100%|██████████| 1000/1000 [00:00<00:00, 3934.37it/s]
100%|██████████| 1000/1000 [00:00<00:00, 4645.87it/s]
1.115 seconds

By default, local selection is performed for all selected pairs in the previous step.

Note Apply an array of integer indices of selected pairs to select_num to run local selection in selected pairs

The global and local results are easily accessible through global_res and local_perm_p or local_z_p

Visualize pair(s)

SpatialDM provides plotting utilities for general summary of global results:

[17]:
pl.global_plot(adata, pairs=['SPP1_CD44', 'CSF1_CSF1R', 'VEGFA_VEGFR2', 'ANGPTL4_SDC2'],
               figsize=(6,5), cmap='RdGy_r', vmin=-1.5, vmax=2)
_images/melanoma_25_0.png

Known melanoma-related pairs were all observed in the selected pairs. It would then be meaningful to investigate the spatial context of their interactions.

[18]:
# visualize known melanoma pairs
pl.plot_pairs(adata, ['CSF1_CSF1R', 'VEGFA_VEGFR2', 'CXCL12_CXCR4'], marker='s')
_images/melanoma_27_0.png
_images/melanoma_27_1.png
_images/melanoma_27_2.png

Spatial Clustering of Local Spots

SpatialDE allows clustering of spatially auto-correlated genes. Here, we repurposed SpatialDE to identify spatially auto-correlated interactions by using binary local selection status as input.

[19]:
import NaiveDE
import SpatialDE

Filter out sparse interactions with fewer than 3 identified interacting spots. Cluster into 6 patterns.

[20]:
# SpatialDE code
bin_spots = adata.uns['selected_spots'].astype(int)[adata.uns['local_stat']['n_spots']>2]
print(bin_spots.shape[0], " pairs used for spatial clustering")
103  pairs used for spatial clustering
[21]:
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1, user_api='blas'):
    results = SpatialDE.run(adata.obsm['spatial'], bin_spots.transpose())

    histology_results, patterns = SpatialDE.aeh.spatial_patterns(
        adata.obsm['spatial'], bin_spots.transpose(), results, C=3, l=3, verbosity=1)
iter 0, ELBO: -2.99e+10
iter 1, ELBO: -1.57e+10, delta_ELBO: 1.42e+10
iter 2, ELBO: -1.57e+10, delta_ELBO: 4.65e+03
iter 3, ELBO: -1.57e+10, delta_ELBO: 1.29e+03
iter 4, ELBO: -1.57e+10, delta_ELBO: 3.45e+00
iter 5, ELBO: -1.57e+10, delta_ELBO: 4.57e+00
iter 6, ELBO: -1.57e+10, delta_ELBO: 1.31e+00
iter 7, ELBO: -1.57e+10, delta_ELBO: 1.17e+00
iter 8, ELBO: -1.57e+10, delta_ELBO: 3.26e+01
iter 9, ELBO: -1.57e+10, delta_ELBO: 2.56e+01
iter 10, ELBO: -1.57e+10, delta_ELBO: 2.93e-01
iter 11, ELBO: -1.57e+10, delta_ELBO: 4.65e-04
Converged on iter 11
[22]:
plt.figure(figsize=(9,8))
for i in range(3):
    plt.subplot(2, 2, i + 2)
    plt.scatter(adata.obsm['spatial'][:,0], adata.obsm['spatial'][:,1], marker = 's',c=patterns[i], s=35);
    plt.axis('equal')
    pl.plt_util('Pattern {} - {} genes'.format(i, histology_results.query('pattern == @i').shape[0] ))
plt.savefig('mel_DE_clusters.pdf')
_images/melanoma_34_0.png

Note For detail parameter settings like C and l, please refer to SpatialDE tutorial

SpatialDM provides compute_pathway function to group a list of interactions based on the pathways they belong. The input can be a dictionary of several lists.

[23]:
dic=dict()
for i in histology_results.sort_values('pattern').pattern.unique():
    dic['Pattern_{}'.format(i)]=histology_results.query('pattern == @i').sort_values('membership')['g'].values

Save the anndata file to retrieve later

[25]:
adata_out = adata.copy()
adata_out.uns['local_stat']['local_permI'] = 'None'
adata_out.uns['local_stat']['local_permI_R'] = 'None'

sdm.write_spatialdm_h5ad(adata_out, filename='mel_sdm_out.h5ad')
# sdm.read_spatialdm_h5ad(filename='mel_sdm_out.h5ad')

In the dot plot of the lymphoid-associated pattern, we identified CD23 and other immune-related pathways.

[28]:
pl.dot_path(adata, dic=dic, figsize=(6,12))
_images/melanoma_41_0.png
_images/melanoma_41_1.png
_images/melanoma_41_2.png

SpatialDM provides chord_celltype and other chord diagrams to visualize the aggregated cell types (or interactions)

[30]:
adata.obsm['celltypes'] = adata.obs[adata.obs.columns]
pl.chord_celltype(adata, pairs=['FCER2A_CR2'], ncol=1, min_quantile=0.01)
# save='mel_FCER2A_CR2_ct.svg'
[30]:
Column(
id = '1110', …)

alternative DE input with Moran R values

[16]:
# local_sum = adata.uns['selected_spots'].copy()
# local_sum=local_sum.transpose()
# local_sum[local_sum.columns] =(adata.uns['local_stat']['local_I'] + adata.uns['local_stat']['local_I'])

# results = SpatialDE.run(adata.obsm['spatial'], local_sum)

# histologyesults, patterns = SpatialDE.aeh.spatial_patterns(adata.obsm['spatial'], local_sum,
#                                                              results, C=3, l=5,
#                                                              verbosity=1)
# plt.figure(figsize=(9,8))
# for i in range(3):
#     plt.subplot(2, 2, i + 1)
#     plt.scatter(adata.obsm['spatial'][:,0], adata.obsm['spatial'][:,1], marker = 's',c=patterns[i], s=35);
#     plt.axis('equal')
#     pl.plt_util('Pattern {} - {} genes'.format(i, histologyesults.query('pattern == @i').shape[0] ))

Consistency between permutation and z-score approaches

As an alternative to permutation method, we derived the first and second moments of the null distribution to analytically obtain a z-score and its according p-values, which are highly consistent with the permutation method.

[31]:
# global consistency
x = -np.log10(adata.uns['global_res'].z_pval.values)
y = -np.log10(adata.uns['global_res'].perm_pval)

x = np.where(x>3, 3, x)
y = np.where(np.isinf(y), 3, y)
pl.corr_plot(x, y, method='spearman')
plt.xlabel('-log10(p) (z-test)')
plt.ylabel('-log10(p) (permutation)')
plt.title('Global p-value correlation (Permutation vs. z-score)')
plt.savefig('mel_perm_z_corr.pdf')
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/pandas/core/arraylike.py:364: RuntimeWarning: divide by zero encountered in log10
  result = getattr(ufunc, method)(*inputs, **kwargs)
_images/melanoma_48_1.png
[32]:
# local consistency
plt.figure(figsize=(4,4))
x=(adata.uns['local_z_p']<0.1).sum(1).values
y=(adata.uns['local_perm_p']<0.1).sum(1).values
pl.corr_plot(x, y, method='pearson')
plt.title('Correlation of number of selected \nlocal spots (Permutation vs. z-score)')
plt.savefig('mel_local_cor.pdf')
_images/melanoma_49_0.png

Cell type prediction with local Moran

[34]:
from sklearn.linear_model import LinearRegression

Note The prediction results may differ by different annotation accuracy and potentially other factor. Here we generated the cell type decomposition results using RCTD.

[35]:
X = adata.uns['local_perm_p'].values
y=adata.obs.values

reg = LinearRegression().fit(X.T, y)
y_pred = reg.predict(X.T)
pl.corr_plot(np.hstack(y),np.hstack(y_pred), method='pearson')
plt.xlabel('observed cell type weights')
plt.ylabel('predicted cell type weights')
plt.title('linear regression prediction')
# plt.savefig('linear_regression_celltype.pdf')
[35]:
Text(0.5, 1.0, 'linear regression prediction')
_images/melanoma_53_1.png