Differential analyses (z-score approach)

[1]:
import os
import pandas as pd
import numpy as np
import anndata as ann

import spatialdm as sdm
from spatialdm.datasets import dataset
import spatialdm.plottings as pl
print("SpatailDM version: %s" %sdm.__version__)
SpatailDM version: 0.0.7

z-score selection in batch

[2]:
data=["A1","A2","A3","A4","A6","A7","A8","A9"]

The intestine dataset from Corbett, et al. was publicly available on STAR-FINDer, from which we obtained raw counts and spatial coordinates, and log-transformed to normalize on the spot-level. Rawcounts (.raw), logcounts (.X), cell types (.obs), and spatial coordinates (.obsm['spatial']) have been included in the corresponding anndata object which can be loaded directly in SpatialDM Python package.

[4]:
A1 = dataset.A1()
A2 = dataset.A2()
A3 = dataset.A3()
A4 = dataset.A4()
A6 = dataset.A6()
A7 = dataset.A7()
A8 = dataset.A8()
A9 = dataset.A9()

Note

The dataset includes:
1) 2 adult colon samples from the same donor A, with IBD or colon cancer (A1, A2);
2) 2 replicates of 12-PCW TI from fetus donor D (A6, A7);
3) 3 12-PCW colon samples, 2 replicates from donor D (A8, A9), another from donor B (A3);
4) 1 19-PCW colon sample from donor C (A4).
[5]:
samples = [A1,A2,A3,A4,A6,A7,A8,A9]

Considering the scale of the spatial coordinates and spot-spot distance of 100 micrometers, l will be set to 75 here. The parameters here should be determined to match the context of CCC.

[9]:
for adata in samples:
    adata.obsm['spatial'] = adata.obsm['spatial'].values
    sdm.weight_matrix(adata, l=75, cutoff=0.2, single_cell=False) # weight_matrix by rbf kernel
    sdm.extract_lr(adata, 'human', min_cell=3)      # find overlapping LRs from CellChatDB
    sdm.spatialdm_global(adata, 1000, specified_ind=None, method='z-score', nproc=1)     # global Moran selection
    sdm.sig_pairs(adata, method='z-score', fdr=True, threshold=0.1)     # select significant pairs
    sdm.spatialdm_local(adata, n_perm=1000, method='z-score', specified_ind=None, nproc=1)     # local spot selection
    sdm.sig_spots(adata, method='z-score', fdr=False, threshold=0.1)
    print(' done!')
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/sklearn/utils/validation.py:70: FutureWarning: Pass n_neighbors=6 as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
  warnings.warn(f"Pass {args_msg} as keyword args. From version "
100%|██████████| 1000/1000 [09:45<00:00,  1.71it/s]
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/spatialdm/utils.py:130: RuntimeWarning: invalid value encountered in true_divide
  X = X/X.max()
100%|██████████| 1000/1000 [00:14<00:00, 69.16it/s]
100%|██████████| 1000/1000 [00:06<00:00, 162.48it/s]
 done!

filtered non-expressed LR pairs.

Save selection results in batch

differential test on 6 colon samples (adult vs. fetus)

[10]:
from spatialdm.diff_utils import *

First step is to concatenate all selection results, resulting in a p_df storing p-values across each sample before fdr correction, a tf_df indicating whether each pair is selected in each sample, and a zscore_df which stores z-scores. These will be essential for the differential analyses later.

[11]:
concat=concat_obj(samples, data, 'human', 'z-score', fdr=False)
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/anndata/_core/anndata.py:1828: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")
[12]:
concat.uns['p_df'].head()
[12]:
A1 A2 A3 A4 A6 A7 A8 A9
VSIR_IGSF11 0.425453 0.395138 0.598898 0.315757 0.599201 0.327638 0.666588 0.760361
EFNA5_EPHA7 0.510958 0.774411 0.096287 0.039663 0.003395 0.006287 0.058307 0.411695
EFNA5_EPHB2 0.306913 0.104032 0.111116 0.000113 0.079636 0.026080 0.766981 0.515436
EFNB1_EPHA4 0.270509 0.002146 0.019308 0.008792 0.002633 0.008091 0.357686 0.182043
EFNB1_EPHB2 0.428187 0.218473 0.994771 0.010403 0.005470 0.001725 0.217220 0.498809
[13]:
concat.uns['tf_df'].head()
[13]:
A1 A2 A3 A4 A6 A7 A8 A9
VSIR_IGSF11 False False False False False False False False
EFNA5_EPHA7 False False False True True True False False
EFNA5_EPHB2 False False False True False True False False
EFNB1_EPHA4 False True True True True True False False
EFNB1_EPHB2 False False False True True True False False
[14]:
concat.uns['zscore_df'].head()
[14]:
A1 A2 A3 A4 A6 A7 A8 A9
VSIR_IGSF11 0.187962 0.265951 -0.250496 0.479598 -0.251280 0.446444 -0.430511 -0.707466
EFNA5_EPHA7 -0.027471 -0.753451 1.303003 1.754614 2.707015 2.495596 1.569141 0.223187
EFNA5_EPHB2 0.504619 1.258905 1.220614 3.688469 1.407522 1.941812 -0.728942 -0.038703
EFNB1_EPHA4 0.611274 2.855859 2.068248 2.374266 2.790333 2.404807 0.364652 0.907606
EFNB1_EPHB2 0.180993 0.777361 -2.560322 2.311493 2.544579 2.924481 0.781615 0.002986

Whole-interactome clustering revealed the dendrogram relationships that are close to the sample kinship

[15]:
import seaborn as sns
sns.clustermap(1-concat.uns['p_df'])
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/seaborn/matrix.py:649: UserWarning: Clustering large matrix with scipy. Installing `fastcluster` may give better performance.
  warnings.warn(msg)
[15]:
<seaborn.matrix.ClusterGrid at 0x7f9c2b7df8e0>
_images/differential_test_intestine_20_2.png

Next, we subset the 6 colon samples (2 adult vs. 4 fetus for differential analyses.

Use 1 and 0 to label different conditions.

Note z-score differential testing will also support differential test among 3 or more conditions, provided with sufficient samples.

[16]:
conditions = np.hstack((np.repeat([1],2), np.repeat([0],4)))
subset = ['A1', 'A2', 'A3', 'A4', 'A8', 'A9']

By differential_test, likelihood ratio test will be performed, and the differential p-values are stored in uns['p_val'], corresponding fdr values in uns['diff_fdr'], and difference in average zscores in uns['diff'].

[17]:
differential_test(concat, subset, conditions)
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/statsmodels/regression/linear_model.py:903: RuntimeWarning: divide by zero encountered in log
  llf = -nobs2*np.log(2*np.pi) - nobs2*np.log(ssr / nobs) - nobs2
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/spatialdm/diff_utils.py:109: RuntimeWarning: invalid value encountered in double_scalars
  LR_statistic[i] = -2 * (reduced_ll - full_ll)

Later, we can focus on differential pairs within 2 contrasting conditions. By default, pairs with z-score difference greater than 30% quantile and a corrected differential significance (.diff_fdr) smaller than 0.1 are selected, while different parameters can be applied in diff_quantile1, diff_quantile2, and fdr_co, respectively.

differential_volcano allows easy check for target pair(s) in whether they are differential among conditions and in which condition it’s specific.

[18]:
group_differential_pairs(concat, 'adult', 'fetus')
[19]:
pl.differential_dendrogram(concat)
[19]:
<seaborn.matrix.ClusterGrid at 0x7f9871e74f10>
_images/differential_test_intestine_30_1.png
[20]:
pl.differential_volcano(concat, legend=['adult specific', 'fetus specific'])
_images/differential_test_intestine_31_0.png

Note

CEACAM1_CEACAM5 is sparsely identified in A3 in addition to two adult slices, but still considered adult-specific by differential analyses. NRG4 was found in human breast milk, and its oral supplementation can protect against inflammation in the intestine. PLG_F2RL1 is sparsely and exclusively identified in fetal samples with consistent cell type enrichment. Please refer to our manuscript for more discussions.

[27]:
pl.differential_volcano(concat, pairs=['CEACAM1_CEACAM5', 'NRG4_ERBB2_ERBB4', 'PLG_F2RL1'],
        legend=['adult specific', 'fetus specific'], )
_images/differential_test_intestine_33_0.png
It will also be interpretable to analyze condition-specific pairs on a pathway level.
Here we identified an inflammatory microenvironment in adult colon samples and more development-related signatures in the fetal colon samples.
[25]:
pl.dot_path(concat, 'adult_specific', cut_off=2, figsize=(5,15))
_images/differential_test_intestine_35_0.png
[26]:
pl.dot_path(concat, 'fetus_specific', cut_off=2, figsize=(4,8))
_images/differential_test_intestine_36_0.png
EGF pathway is highly enriched in adult samples.
We can use plotting functions to visualize some pairs like TGFA_EGFR and EGF_EGFR
[37]:
A1.uns['geneInter'].loc[(A1.uns['geneInter'].pathway_name=='EGF') \
                          &(A1.uns['geneInter'].index.isin(A1.uns['selected_spots'].index))]
[37]:
interaction_name pathway_name agonist antagonist co_A_receptor co_I_receptor evidence annotation interaction_name_2
EREG_ERBB2_ERBB4 EREG_ERBB2_ERBB4 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling EREG - (ERBB2+ERBB4)
EREG_EGFR_ERBB2 EREG_EGFR_ERBB2 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling EREG - (EGFR+ERBB2)
EREG_EGFR EREG_EGFR EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling EREG - EGFR
HBEGF_ERBB2_ERBB4 HBEGF_ERBB2_ERBB4 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling HBEGF - (ERBB2+ERBB4)
HBEGF_EGFR_ERBB2 HBEGF_EGFR_ERBB2 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling HBEGF - (EGFR+ERBB2)
HBEGF_EGFR HBEGF_EGFR EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling HBEGF - EGFR
AREG_EGFR_ERBB2 AREG_EGFR_ERBB2 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling AREG - (EGFR+ERBB2)
AREG_EGFR AREG_EGFR EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling AREG - EGFR
TGFA_EGFR_ERBB2 TGFA_EGFR_ERBB2 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling TGFA - (EGFR+ERBB2)
TGFA_EGFR TGFA_EGFR EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling TGFA - EGFR
EGF_EGFR_ERBB2 EGF_EGFR_ERBB2 EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling EGF - (EGFR+ERBB2)
EGF_EGFR EGF_EGFR EGF NaN NaN NaN NaN KEGG: hsa04012 Secreted Signaling EGF - EGFR
[38]:
# visualize 3 pairs in a pattern
pl.plot_pairs(A1, ['TGFA_EGFR','EGF_EGFR'], cmap='Wistia')
_images/differential_test_intestine_39_0.png
_images/differential_test_intestine_39_1.png
[41]:
# double check if the last column is cell type or not
A1.obsm['celltypes'] = A1.obs[A1.obs.columns[:-1]]

pl.chord_celltype(A1, pairs=['TGFA_EGFR','EGF_EGFR'], ncol=2, min_quantile=0.01)
# save='A1_EGF.svg'
[41]:
Column(
id = '1429', …)

Note

MAPK and RAL are reported upstream and downstream effectors of EGF, respectively. We examine if they express at the identified spots Please refer to our manuscript for more discussions.

[42]:
A1.obs['AREG_selected'] = np.where(A1.uns['selected_spots'].loc['AREG_EGFR'].values, 'selected','nonselected')
[43]:
import scanpy as sc
sc.pl.violin(A1, ['MAPK3','RALA', 'RALB'], 'AREG_selected', stripplot=False)
_images/differential_test_intestine_43_0.png

Cross-replicate consistency

Thanks to the multi-replicate setting, we could validate the consistency between biological / technical replicates

Global consistency

[29]:
I_df = pd.DataFrame(pd.Series(sample.uns['global_I'], index=sample.uns['global_res'].index) for sample in samples)
I_df = I_df.transpose()
I_df.columns = data

Extract the Global R from each replicate.

[30]:
from hilearn import *
_df = I_df.loc[:, ['A1', 'A2']].dropna()
corr_plot(_df.A1.values, _df.A2.values)
plt.xlabel('A1 (adult rep 1)')
plt.ylabel('A2 (adult rep 2)')
plt.title('Moran\'s I between replicates')
[30]:
Text(0.5, 1.0, "Moran's I between replicates")
_images/differential_test_intestine_49_1.png
[31]:
_df = I_df.loc[:, ['A8', 'A9']].dropna()
corr_plot(_df.A8.values, _df.A9.values)
plt.xlabel('A8 (fetus rep 1)')
plt.ylabel('A9 (fetus rep 2)')
plt.title('Moran\'s I between replicates')
[31]:
Text(0.5, 1.0, "Moran's I between replicates")
_images/differential_test_intestine_50_1.png
[32]:
_df = I_df.loc[:, ['A2', 'A9']].dropna()
corr_plot(_df.A2.values, _df.A9.values)
plt.xlabel('A2 (adult rep 1)')
plt.ylabel('A9 (fetus rep 2)')
plt.title('Moran\'s I between non-biological replicates')
[32]:
Text(0.5, 1.0, "Moran's I between non-biological replicates")
_images/differential_test_intestine_51_1.png

From the above results, Global R is consistent between replicates.

Local consistency & selected cell type consistency

We can also assess the local selection consistency by comparing the cell type compositions of selected spots

[33]:
fetus_celltypes = A3.obs.columns
PLG_df = pd.DataFrame(fetus_celltypes, index=fetus_celltypes)

PLG_F2RL1 is sparsely detected in fetus samples, it will be interesting to see whether such selections are consistent.

[34]:
pair='PLG_F2RL1'
for d,sample in zip(data,samples):
    sample.celltype = locals()['{}'.format(d)].obs
    if pair in sample.uns['local_z_p'].index:
        ct=sample.celltype[sample.uns['local_z_p'].loc[pair].values<0.1].sum(0).sort_values(ascending=False)
        ct=ct[ct>0]
        PLG_df['{}'.format(d)]=ct
    else:
        PLG_df['{}'.format(d)] = 0

Group spot weights by selecte cell type vs other cell types

[35]:
PLG_df.pop(0)
other=PLG_df.loc[~PLG_df.index.isin(['Distal Enterocytes','Distal Mature Enterocytes'])].sum(0)
df =pd.concat((PLG_df.loc[PLG_df.index.isin(['Distal Enterocytes','Distal Mature Enterocytes'])],
    pd.DataFrame(other, columns=['other cell types']).transpose()))

df = df.transpose()
df['sample']=df.index

PLG_F2RL1 is enriched in Enterocytes in 12-PCW colons, but not found in 19-PCW and adult samples. The two 12-PCW TI samples also have consistent PLG_F2RL1 enrichment, but the signaling celltypes are different from colon samples.

[36]:
plt.figure()
ax=df.sort_index().plot(x='sample',
        kind='bar',
        stacked=False,
        title='cell type weights in PLG_F2RL1 selected spot',
        color={'Distal Mature Enterocytes': [0.5       , 1.        , 0.65808859, 1.        ],
              'Distal Enterocytes': [0.06331813, 0.62857143, 0.        , 1.        ],
#                'neuron':[0.99848342, 0.875     , 0.99522066, 1.        ],
              'other cell types': 'grey'})
ax.set_xticklabels(df.sort_index().index, rotation=0)
[36]:
[Text(0, 0, 'A1'),
 Text(1, 0, 'A2'),
 Text(2, 0, 'A3'),
 Text(3, 0, 'A4'),
 Text(4, 0, 'A6'),
 Text(5, 0, 'A7'),
 Text(6, 0, 'A8'),
 Text(7, 0, 'A9')]
<Figure size 432x288 with 0 Axes>
_images/differential_test_intestine_61_2.png