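The packages this program depends on are badly outdated, so for convenience we installed it with Docker on the GPU node. If you do not have login access to the GPU queue, contact the computing platform team to have it enabled.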
ssh gpu002 -q gpu.q
nvidia-docker run --rm -e NVIDIA_VISIBLE_DEVICES=all -v /sibcb2/bioinformatics/Projects/EPEE:/home epee:gpu python /home/script/run_epee.py -h
usage: run_epee.py [-h] -a CONDITIONA -b CONDITIONB -na NETWORKA -nb NETWORKB
                   [-o OUTPUT] [-reg1 LREGULARIZATION] [-reg2 GREGULARIZATION]
                   [-s STEP] [-c CONDITIONING] [-r RUNS] [-i ITERATIONS]
                   [-ag AGGREGATION] [-n NORMALIZE] [-m MODEL] [-v VERBOSE]
                   [-eval EVALUATE] [-pr PREFIX] [-w] [-mp] [-null] [-d SEED]
                   [-p PERTURB] [-sg]
optional arguments:
-h, --help show this help message and exit
-a CONDITIONA, --conditiona CONDITIONA
RNA-seq data for Condition A
-b CONDITIONB, --conditionb CONDITIONB
RNA-seq data for Condition B
-na NETWORKA, --networka NETWORKA
Network for condition A
-nb NETWORKB, --networkb NETWORKB
Network for condition B
-o OUTPUT, --output OUTPUT
output directory
-reg1 LREGULARIZATION, --lregularization LREGULARIZATION
lasso regularization parameter
-reg2 GREGULARIZATION, --gregularization GREGULARIZATION
graph contrained regularization parameter
-s STEP, --step STEP optimizer learning-rate
-c CONDITIONING, --conditioning CONDITIONING
Weight for the interactions not known
-r RUNS, --runs RUNS Number of independent runs
-i ITERATIONS, --iterations ITERATIONS
Number of iterations
-ag AGGREGATION, --aggregation AGGREGATION
Method for aggregating runs. Default: "sum" Valid
options: {"mean", "median", "sum"}
-n NORMALIZE, --normalize NORMALIZE
Weight normalization strategy. Default:"minmax" Valid
options: {"minmax", "log", "log10", "no"}
-m MODEL, --model MODEL
Model regularization choice. Default: "epee-gcl" Valid
options: {"epee-gcl","epee-l","no-penalty"
-v VERBOSE, --verbose VERBOSE
logging info levels 10, 20, or 30
-eval EVALUATE, --evaluate EVALUATE
Evaluation mode available for Th1, Th2, Th17, Bmem,
COAD, and AML
-pr PREFIX, --prefix PREFIX
Add prefix to the log
-w, --store_weights Store all the inferred weights
-mp, --multiprocess multiprocess the calculation of perturb and regulator
scores
-null, --null Generate null scores by label permutation
-d SEED, --seed SEED Starting seed number
-p PERTURB, --perturb PERTURB
True label perturb scores. Required when running
permutations for null model
-sg, --shuffle_genes Generate null scores by gene permutation
Running the program requires two kinds of input files:
Gene expression profiles:
head data/COAD_tumor.txt
gene value
A1BG 6.6
A1CF 0
A2BP1 0
A2LD1 6.6
A2ML1 1
A2M 14
A4GALT 8.4
A4GNT 1.1
AAA1 0
head data/COAD_normal.txt
gene value
A1BG 6.3
A1CF 0
A2BP1 0
A2LD1 6.1
A2ML1 0.68
A2M 16
A4GALT 8.6
A4GNT 2.3
AAA1 0
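Before launching a run, it can help to confirm that both expression files parse cleanly and share genes. The following is a minimal sketch, assuming the files are whitespace-delimited with one header line and one value per gene, as in the excerpts above. `read_expression` and `shared_genes` are hypothetical helper names and are not part of EPEE.

```python
import io


def read_expression(handle):
    """Parse a header line followed by 'gene value' rows into a dict.

    Assumption: whitespace-delimited, exactly two fields per data row.
    """
    header = handle.readline().split()
    if len(header) < 2:
        raise ValueError("expected a header such as 'gene value'")
    values = {}
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        values[fields[0]] = float(fields[1])
    return values


def shared_genes(a, b):
    """Return the sorted gene names present in both conditions."""
    return sorted(set(a) & set(b))


# Small in-memory samples copied from the excerpts above.
tumor = read_expression(io.StringIO("gene value\nA1BG 6.6\nA1CF 0\nA2M 14\n"))
normal = read_expression(io.StringIO("gene value\nA1BG 6.3\nA1CF 0\nA2M 16\n"))
print(shared_genes(tumor, normal))
```

For real files, replace the `io.StringIO` objects with `open("data/COAD_tumor.txt")` and `open("data/COAD_normal.txt")`.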
Network files can be downloaded from the following page: FANTOM5_individual_networks.tar and Network_compendium.zip.
zcat 20_gastrointestinal_system.txt.gz |head
FOXO3 MBP 2.16216012E-3
ALX1 CD209 2.06986338E-3
ZIC4 PDLIM7 1.25342086E-2
PAX8 NKD2 1.42917932E-3
MAFF PRC1 3.17588671E-2
DBX1 LGALSL 5.93835673E-2
HOXC10 IRS1 3.02063335E-4
TCF12 FUCA2 9.88682006E-3
TFAP2A PLCD1 8.69630824E-4
ZSCAN4 GCDH 2.7480217E-3
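A quick way to inspect a network file is to count its edges and distinct genes. The sketch below assumes that each row holds three whitespace-delimited fields and that they mean regulator, target gene, and edge weight, in that order; the column meaning is an assumption, since the page does not state it. `read_network` and `summarize` are hypothetical helper names.

```python
import gzip
import io


def read_network(handle):
    """Parse three-field rows into a list of (regulator, target, weight).

    Assumption: column order is regulator, target, weight.
    """
    edges = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        regulator, target, weight = fields
        edges.append((regulator, target, float(weight)))
    return edges


def summarize(edges):
    """Count edges, distinct regulators, and distinct targets."""
    regulators = {regulator for regulator, _, _ in edges}
    targets = {target for _, target, _ in edges}
    return {"edges": len(edges), "regulators": len(regulators), "targets": len(targets)}


# Build a tiny gzip-compressed sample in memory from the rows shown above.
sample = (
    "FOXO3 MBP 2.16216012E-3\n"
    "ALX1 CD209 2.06986338E-3\n"
    "ZIC4 PDLIM7 1.25342086E-2\n"
)
compressed = gzip.compress(sample.encode("utf-8"))
with gzip.open(io.BytesIO(compressed), "rt") as handle:
    edges = read_network(handle)
print(summarize(edges))
```

To read the real file, pass its path to `gzip.open(path, "rt")` instead of the in-memory buffer.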
Run the program as follows:
nvidia-docker run --rm -e NVIDIA_VISIBLE_DEVICES=all \
-v /sibcb2/bioinformatics/Projects/EPEE:/home \
-v /sibcb2/bioinformatics/iGenome/FANTOM5_Network/Network_compendium/Tissue-specific_regulatory_networks_FANTOM5-v1:/network \
epee:gpu python /home/script/run_epee.py \
-a /home/data/COAD_tumor.txt \
-b /home/data/COAD_normal.txt \
-na /network/32_high-level_networks/20_gastrointestinal_system.txt.gz \
-nb /network/32_high-level_networks/20_gastrointestinal_system.txt.gz \
-o /home/res2/ \
-pr COAD
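When the same run has to be repeated for several datasets, the command above can be assembled programmatically. The following is a sketch only: `build_epee_command` is a hypothetical helper, it uses just the flags and mount points shown in the command above, and it prints the command rather than executing it.

```python
import shlex


def build_epee_command(project_dir, network_dir, cond_a, cond_b,
                       network_a, network_b, output, prefix):
    """Return the docker command as an argument list (nothing is executed)."""
    return [
        "nvidia-docker", "run", "--rm",
        "-e", "NVIDIA_VISIBLE_DEVICES=all",
        "-v", f"{project_dir}:/home",        # project directory mounted at /home
        "-v", f"{network_dir}:/network",     # network directory mounted at /network
        "epee:gpu", "python", "/home/script/run_epee.py",
        "-a", cond_a,
        "-b", cond_b,
        "-na", network_a,
        "-nb", network_b,
        "-o", output,
        "-pr", prefix,
    ]


network = "/network/32_high-level_networks/20_gastrointestinal_system.txt.gz"
command = build_epee_command(
    project_dir="/sibcb2/bioinformatics/Projects/EPEE",
    network_dir=(
        "/sibcb2/bioinformatics/iGenome/FANTOM5_Network/Network_compendium/"
        "Tissue-specific_regulatory_networks_FANTOM5-v1"
    ),
    cond_a="/home/data/COAD_tumor.txt",
    cond_b="/home/data/COAD_normal.txt",
    network_a=network,
    network_b=network,
    output="/home/res2/",
    prefix="COAD",
)
print(shlex.join(command))
# To launch it on the GPU node: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path ever contains spaces.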
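The original authors removed the test data, so we extracted COAD tumor and adjacent normal tissue data from TCGA and saved them as COAD_tumor.txt and COAD_normal.txt. The run produces four result files under scores/, alongside log.txt and the loss files under model/: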
res/COAD_epee-gcl_0.01_0.01_0/
|-- log.txt
|-- model
|   |-- loss1_arr_y1.txt
|   |-- loss2_arr_y2.txt
|   `-- loss_runs.txt
`-- scores
    |-- all_perturb_scores.txt
    |-- all_regulator_scores.txt
    |-- perturb_scores.txt
    `-- regulator_scores.txt
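To confirm that a run finished and wrote everything listed in the tree above, the output directory can be checked against the expected file names. This is a minimal sketch; `missing_outputs` is a hypothetical helper, and the expected list is copied from the tree rather than taken from EPEE's documentation.

```python
import tempfile
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_OUTPUTS = [
    "log.txt",
    "model/loss1_arr_y1.txt",
    "model/loss2_arr_y2.txt",
    "model/loss_runs.txt",
    "scores/all_perturb_scores.txt",
    "scores/all_regulator_scores.txt",
    "scores/perturb_scores.txt",
    "scores/regulator_scores.txt",
]


def missing_outputs(run_dir):
    """Return the expected files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (run_dir / rel).is_file()]


# Demonstration on a temporary directory that lacks the last expected file.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "COAD_epee-gcl_0.01_0.01_0"
    for rel in EXPECTED_OUTPUTS[:-1]:
        path = run_dir / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_missing = missing_outputs(run_dir)
print(demo_missing)
```

An empty list means every file in the tree is present.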
Of these, regulator_scores.txt lists the top-ranked transcription factors:
gene score
0 HOXC13 0.6924611278809607
1 TP73 0.6397721986286342
2 PITX2 0.5111408117227256
3 POU4F1 0.4897183245047927
4 POU6F2 0.48886519484221935
5 NKX6-1 0.48417521407827735
6 DMBX1 0.4784999266266823
7 ONECUT1 0.4712283406406641
8 BHLHA15 0.46911837393417954
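The score table can be loaded and ranked with a few lines of code. The sketch below assumes the layout shown above: a header line, then whitespace-delimited rows whose last two fields are the gene name and its score, with an optional leading row index. `read_regulator_scores` and `top_regulators` are hypothetical helper names.

```python
import io


def read_regulator_scores(handle):
    """Parse rows into (gene, score) pairs, ignoring any leading row index."""
    handle.readline()  # skip the 'gene score' header
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((fields[-2], float(fields[-1])))
    return rows


def top_regulators(rows, n=3):
    """Return the n rows with the highest scores, highest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


# In-memory sample using rows from the table above, deliberately out of order.
sample = io.StringIO(
    "gene score\n"
    "2 PITX2 0.5111408117227256\n"
    "0 HOXC13 0.6924611278809607\n"
    "1 TP73 0.6397721986286342\n"
)
rows = read_regulator_scores(sample)
top = top_regulators(rows, n=2)
for gene, score in top:
    print(f"{gene}\t{score:.4f}")
```

For a real run, open `scores/regulator_scores.txt` inside the output directory instead of the in-memory sample.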