Differential analyses (z-score approach)

[1]:

import os
import pandas as pd
import numpy as np
import anndata as ann

import spatialdm as sdm
from spatialdm.datasets import dataset
import spatialdm.plottings as pl
print("SpatailDM version: %s" %sdm.__version__)

SpatailDM version: 0.0.7

z-score selection in batch

[2]:

data=["A1","A2","A3","A4","A6","A7","A8","A9"]

The intestine dataset from Corbett, et al. was publicly available on STAR-FINDer, from which we obtained raw counts and spatial coordinates, and log-transformed to normalize on the spot-level. Rawcounts (.raw), logcounts (.X), cell types (.obs), and spatial coordinates (.obsm['spatial']) have been included in the corresponding anndata object which can be loaded directly in SpatialDM Python package.

[4]:

A1 = dataset.A1()
A2 = dataset.A2()
A3 = dataset.A3()
A4 = dataset.A4()
A6 = dataset.A6()
A7 = dataset.A7()
A8 = dataset.A8()
A9 = dataset.A9()

Note

The dataset includes:
1) 2 adult colon samples from the same donor A, with IBD or colon cancer (A1, A2);
2) 2 replicates of 12-PCW TI from fetus donor D (A6, A7);
3) 3 12-PCW colon samples, 2 replicates from donor D (A8, A9), another from donor B (A3);
4) 1 19-PCW colon sample from donor C (A4).

[5]:

samples = [A1,A2,A3,A4,A6,A7,A8,A9]

Considering the scale of the spatial coordinates and spot-spot distance of 100 micrometers, l will be set to 75 here. The parameters here should be determined to match the context of CCC.

[9]:

for adata in samples:
    adata.obsm['spatial'] = adata.obsm['spatial'].values
    sdm.weight_matrix(adata, l=75, cutoff=0.2, single_cell=False) # weight_matrix by rbf kernel
    sdm.extract_lr(adata, 'human', min_cell=3)      # find overlapping LRs from CellChatDB
    sdm.spatialdm_global(adata, 1000, specified_ind=None, method='z-score', nproc=1)     # global Moran selection
    sdm.sig_pairs(adata, method='z-score', fdr=True, threshold=0.1)     # select significant pairs
    sdm.spatialdm_local(adata, n_perm=1000, method='z-score', specified_ind=None, nproc=1)     # local spot selection
    sdm.sig_spots(adata, method='z-score', fdr=False, threshold=0.1)
    print(' done!')

/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/sklearn/utils/validation.py:70: FutureWarning: Pass n_neighbors=6 as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
  warnings.warn(f"Pass {args_msg} as keyword args. From version "
100%|██████████| 1000/1000 [09:45<00:00,  1.71it/s]
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/spatialdm/utils.py:130: RuntimeWarning: invalid value encountered in true_divide
  X = X/X.max()
100%|██████████| 1000/1000 [00:14<00:00, 69.16it/s]
100%|██████████| 1000/1000 [00:06<00:00, 162.48it/s]

 done!

filtered non-expressed LR pairs.

Save selection results in batch

differential test on 6 colon samples (adult vs. fetus)

[10]:

from spatialdm.diff_utils import *

First step is to concatenate all selection results, resulting in a p_df storing p-values across each sample before fdr correction, a tf_df indicating whether each pair is selected in each sample, and a zscore_df which stores z-scores. These will be essential for the differential analyses later.

[11]:

concat=concat_obj(samples, data, 'human', 'z-score', fdr=False)

/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/anndata/_core/anndata.py:1828: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")

[12]:

concat.uns['p_df'].head()

[12]:

	A1	A2	A3	A4	A6	A7	A8	A9
VSIR_IGSF11	0.425453	0.395138	0.598898	0.315757	0.599201	0.327638	0.666588	0.760361
EFNA5_EPHA7	0.510958	0.774411	0.096287	0.039663	0.003395	0.006287	0.058307	0.411695
EFNA5_EPHB2	0.306913	0.104032	0.111116	0.000113	0.079636	0.026080	0.766981	0.515436
EFNB1_EPHA4	0.270509	0.002146	0.019308	0.008792	0.002633	0.008091	0.357686	0.182043
EFNB1_EPHB2	0.428187	0.218473	0.994771	0.010403	0.005470	0.001725	0.217220	0.498809

[13]:

concat.uns['tf_df'].head()

[13]:

	A1	A2	A3	A4	A6	A7	A8	A9
VSIR_IGSF11	False	False	False	False	False	False	False	False
EFNA5_EPHA7	False	False	False	True	True	True	False	False
EFNA5_EPHB2	False	False	False	True	False	True	False	False
EFNB1_EPHA4	False	True	True	True	True	True	False	False
EFNB1_EPHB2	False	False	False	True	True	True	False	False

[14]:

concat.uns['zscore_df'].head()

[14]:

	A1	A2	A3	A4	A6	A7	A8	A9
VSIR_IGSF11	0.187962	0.265951	-0.250496	0.479598	-0.251280	0.446444	-0.430511	-0.707466
EFNA5_EPHA7	-0.027471	-0.753451	1.303003	1.754614	2.707015	2.495596	1.569141	0.223187
EFNA5_EPHB2	0.504619	1.258905	1.220614	3.688469	1.407522	1.941812	-0.728942	-0.038703
EFNB1_EPHA4	0.611274	2.855859	2.068248	2.374266	2.790333	2.404807	0.364652	0.907606
EFNB1_EPHB2	0.180993	0.777361	-2.560322	2.311493	2.544579	2.924481	0.781615	0.002986

Whole-interactome clustering revealed the dendrogram relationships that are close to the sample kinship

[15]:

import seaborn as sns
sns.clustermap(1-concat.uns['p_df'])

/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/seaborn/matrix.py:649: UserWarning: Clustering large matrix with scipy. Installing `fastcluster` may give better performance.
  warnings.warn(msg)

[15]:

<seaborn.matrix.ClusterGrid at 0x7f9c2b7df8e0>

_images/differential_test_intestine_20_2.png

Next, we subset the 6 colon samples (2 adult vs. 4 fetus for differential analyses.

Use 1 and 0 to label different conditions.

Note z-score differential testing will also support differential test among 3 or more conditions, provided with sufficient samples.

[16]:

conditions = np.hstack((np.repeat([1],2), np.repeat([0],4)))
subset = ['A1', 'A2', 'A3', 'A4', 'A8', 'A9']

By differential_test, likelihood ratio test will be performed, and the differential p-values are stored in uns['p_val'], corresponding fdr values in uns['diff_fdr'], and difference in average zscores in uns['diff'].

[17]:

differential_test(concat, subset, conditions)

/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/statsmodels/regression/linear_model.py:903: RuntimeWarning: divide by zero encountered in log
  llf = -nobs2*np.log(2*np.pi) - nobs2*np.log(ssr / nobs) - nobs2
/home/yoyo/miniconda2/envs/CC/lib/python3.9/site-packages/spatialdm/diff_utils.py:109: RuntimeWarning: invalid value encountered in double_scalars
  LR_statistic[i] = -2 * (reduced_ll - full_ll)

Later, we can focus on differential pairs within 2 contrasting conditions. By default, pairs with z-score difference greater than 30% quantile and a corrected differential significance (.diff_fdr) smaller than 0.1 are selected, while different parameters can be applied in diff_quantile1, diff_quantile2, and fdr_co, respectively.

differential_volcano allows easy check for target pair(s) in whether they are differential among conditions and in which condition it’s specific.

[18]:

group_differential_pairs(concat, 'adult', 'fetus')

[19]:

pl.differential_dendrogram(concat)

[19]:

<seaborn.matrix.ClusterGrid at 0x7f9871e74f10>

_images/differential_test_intestine_30_1.png

[20]:

pl.differential_volcano(concat, legend=['adult specific', 'fetus specific'])

_images/differential_test_intestine_31_0.png

Note

CEACAM1_CEACAM5 is sparsely identified in A3 in addition to two adult slices, but still considered adult-specific by differential analyses. NRG4 was found in human breast milk, and its oral supplementation can protect against inflammation in the intestine. PLG_F2RL1 is sparsely and exclusively identified in fetal samples with consistent cell type enrichment. Please refer to our manuscript for more discussions.

[27]:

pl.differential_volcano(concat, pairs=['CEACAM1_CEACAM5', 'NRG4_ERBB2_ERBB4', 'PLG_F2RL1'],
        legend=['adult specific', 'fetus specific'], )

_images/differential_test_intestine_33_0.png

It will also be interpretable to analyze condition-specific pairs on a pathway level.

Here we identified an inflammatory microenvironment in adult colon samples and more development-related signatures in the fetal colon samples.

[25]:

pl.dot_path(concat, 'adult_specific', cut_off=2, figsize=(5,15))

_images/differential_test_intestine_35_0.png

[26]:

pl.dot_path(concat, 'fetus_specific', cut_off=2, figsize=(4,8))

_images/differential_test_intestine_36_0.png

EGF pathway is highly enriched in adult samples.

We can use plotting functions to visualize some pairs like TGFA_EGFR and EGF_EGFR

[37]:

A1.uns['geneInter'].loc[(A1.uns['geneInter'].pathway_name=='EGF') \
                          &(A1.uns['geneInter'].index.isin(A1.uns['selected_spots'].index))]

[37]:

	interaction_name	pathway_name	agonist	antagonist	co_A_receptor	co_I_receptor	evidence	annotation	interaction_name_2
EREG_ERBB2_ERBB4	EREG_ERBB2_ERBB4	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	EREG - (ERBB2+ERBB4)
EREG_EGFR_ERBB2	EREG_EGFR_ERBB2	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	EREG - (EGFR+ERBB2)
EREG_EGFR	EREG_EGFR	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	EREG - EGFR
HBEGF_ERBB2_ERBB4	HBEGF_ERBB2_ERBB4	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	HBEGF - (ERBB2+ERBB4)
HBEGF_EGFR_ERBB2	HBEGF_EGFR_ERBB2	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	HBEGF - (EGFR+ERBB2)
HBEGF_EGFR	HBEGF_EGFR	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	HBEGF - EGFR
AREG_EGFR_ERBB2	AREG_EGFR_ERBB2	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	AREG - (EGFR+ERBB2)
AREG_EGFR	AREG_EGFR	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	AREG - EGFR
TGFA_EGFR_ERBB2	TGFA_EGFR_ERBB2	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	TGFA - (EGFR+ERBB2)
TGFA_EGFR	TGFA_EGFR	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	TGFA - EGFR
EGF_EGFR_ERBB2	EGF_EGFR_ERBB2	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	EGF - (EGFR+ERBB2)
EGF_EGFR	EGF_EGFR	EGF	NaN	NaN	NaN	NaN	KEGG: hsa04012	Secreted Signaling	EGF - EGFR

[38]:

# visualize 3 pairs in a pattern
pl.plot_pairs(A1, ['TGFA_EGFR','EGF_EGFR'], cmap='Wistia')

_images/differential_test_intestine_39_0.png

_images/differential_test_intestine_39_1.png

[41]:

# double check if the last column is cell type or not
A1.obsm['celltypes'] = A1.obs[A1.obs.columns[:-1]]

pl.chord_celltype(A1, pairs=['TGFA_EGFR','EGF_EGFR'], ncol=2, min_quantile=0.01)
# save='A1_EGF.svg'

[41]:

Column(

id = '1429', …)

Note

MAPK and RAL are reported upstream and downstream effectors of EGF, respectively. We examine if they express at the identified spots Please refer to our manuscript for more discussions.

[42]:

A1.obs['AREG_selected'] = np.where(A1.uns['selected_spots'].loc['AREG_EGFR'].values, 'selected','nonselected')

[43]:

import scanpy as sc
sc.pl.violin(A1, ['MAPK3','RALA', 'RALB'], 'AREG_selected', stripplot=False)

_images/differential_test_intestine_43_0.png

Cross-replicate consistency

Thanks to the multi-replicate setting, we could validate the consistency between biological / technical replicates

Global consistency

[29]:

I_df = pd.DataFrame(pd.Series(sample.uns['global_I'], index=sample.uns['global_res'].index) for sample in samples)
I_df = I_df.transpose()
I_df.columns = data

Extract the Global R from each replicate.

[30]:

from hilearn import *
_df = I_df.loc[:, ['A1', 'A2']].dropna()
corr_plot(_df.A1.values, _df.A2.values)
plt.xlabel('A1 (adult rep 1)')
plt.ylabel('A2 (adult rep 2)')
plt.title('Moran\'s I between replicates')

[30]:

Text(0.5, 1.0, "Moran's I between replicates")

_images/differential_test_intestine_49_1.png

[31]:

_df = I_df.loc[:, ['A8', 'A9']].dropna()
corr_plot(_df.A8.values, _df.A9.values)
plt.xlabel('A8 (fetus rep 1)')
plt.ylabel('A9 (fetus rep 2)')
plt.title('Moran\'s I between replicates')

[31]:

Text(0.5, 1.0, "Moran's I between replicates")

_images/differential_test_intestine_50_1.png

[32]:

_df = I_df.loc[:, ['A2', 'A9']].dropna()
corr_plot(_df.A2.values, _df.A9.values)
plt.xlabel('A2 (adult rep 1)')
plt.ylabel('A9 (fetus rep 2)')
plt.title('Moran\'s I between non-biological replicates')

[32]:

Text(0.5, 1.0, "Moran's I between non-biological replicates")

_images/differential_test_intestine_51_1.png

From the above results, Global R is consistent between replicates.

Local consistency & selected cell type consistency

We can also assess the local selection consistency by comparing the cell type compositions of selected spots

[33]:

fetus_celltypes = A3.obs.columns
PLG_df = pd.DataFrame(fetus_celltypes, index=fetus_celltypes)

PLG_F2RL1 is sparsely detected in fetus samples, it will be interesting to see whether such selections are consistent.

[34]:

pair='PLG_F2RL1'
for d,sample in zip(data,samples):
    sample.celltype = locals()['{}'.format(d)].obs
    if pair in sample.uns['local_z_p'].index:
        ct=sample.celltype[sample.uns['local_z_p'].loc[pair].values<0.1].sum(0).sort_values(ascending=False)
        ct=ct[ct>0]
        PLG_df['{}'.format(d)]=ct
    else:
        PLG_df['{}'.format(d)] = 0

Group spot weights by selecte cell type vs other cell types

[35]:

PLG_df.pop(0)
other=PLG_df.loc[~PLG_df.index.isin(['Distal Enterocytes','Distal Mature Enterocytes'])].sum(0)
df =pd.concat((PLG_df.loc[PLG_df.index.isin(['Distal Enterocytes','Distal Mature Enterocytes'])],
    pd.DataFrame(other, columns=['other cell types']).transpose()))

df = df.transpose()
df['sample']=df.index

PLG_F2RL1 is enriched in Enterocytes in 12-PCW colons, but not found in 19-PCW and adult samples. The two 12-PCW TI samples also have consistent PLG_F2RL1 enrichment, but the signaling celltypes are different from colon samples.

[36]:

plt.figure()
ax=df.sort_index().plot(x='sample',
        kind='bar',
        stacked=False,
        title='cell type weights in PLG_F2RL1 selected spot',
        color={'Distal Mature Enterocytes': [0.5       , 1.        , 0.65808859, 1.        ],
              'Distal Enterocytes': [0.06331813, 0.62857143, 0.        , 1.        ],
#                'neuron':[0.99848342, 0.875     , 0.99522066, 1.        ],
              'other cell types': 'grey'})
ax.set_xticklabels(df.sort_index().index, rotation=0)

[36]:

[Text(0, 0, 'A1'),
 Text(1, 0, 'A2'),
 Text(2, 0, 'A3'),
 Text(3, 0, 'A4'),
 Text(4, 0, 'A6'),
 Text(5, 0, 'A7'),
 Text(6, 0, 'A8'),
 Text(7, 0, 'A9')]

<Figure size 432x288 with 0 Axes>

_images/differential_test_intestine_61_2.png