edger_analysis.r: differential gene expression analysis with edgeR

Usage:

$ Rscript $scriptdir/edger_analysis.r -h
usage: /work/my_stad_immu/scripts/edger_analysis.r [-h] -i filepath -m
                                                   filepath -t treatname
                                                   --control CONTROL --case
                                                   CASE [-f fdr] [-c fc]
                                                   [-s size] [-a alpha]
                                                   [-X x.lab] [-Y y.lab]
                                                   [-T title] [-H height]
                                                   [-W width] [-o path]
                                                   [-p prefix]

edgeR analysis : https://www.omicsclass.com/article/1506

optional arguments:
  -h, --help            show this help message and exit
  -i filepath, --input filepath
                        input read count file [required]
  -m filepath, --metadata filepath
                        metadata file , required
  -t treatname, --treatname treatname
                        treat colname in group file, required
  --control CONTROL     set control group name required
  --case CASE           set case group name required
  -f fdr, --fdr fdr     set fdr threshold [default 0.05]
  -c fc, --fc fc        set fold change threshold [default 2]
  -s size, --size size  point size [optional, default: 0.7]
  -a alpha, --alpha alpha
                        point transparency [0-1] [optional, default: 1]
  -X x.lab, --x.lab x.lab
                        the label for x axis [optional, default: log2FC]
  -Y y.lab, --y.lab y.lab
                        the label for y axis [optional, default: -log10(FDR)]
  -T title, --title title
                        the label for main title [optional, default: Volcano]
  -H height, --height height
                        the height of pic inches [default 5]
  -W width, --width width
                        the width of pic inches [default 5]
  -o path, --outdir path
                        output file directory [default
                        /work/my_stad_immu/05.enrich]
  -p prefix, --prefix prefix
                        out file name prefix [default Volcano]



Parameter description:

-i  Input gene expression matrix file. It must contain raw read counts:

ID	TCGA-B7-A5TK-01A-12R-A36D-31	TCGA-BR-7959-01A-11R-2343-13	TCGA-IN-8462-01A-11R-2343-13	TCGA-BR-A4CR-01A-11R-A24K-31	TCGA-CG-4443-01A-01R-1157-13	TCGA-KB-A93J-01A-11R-A39E-31	TCGA-BR-4371-01A-01R-1157-13
TSPAN6	5951403628343484253720274749
TNMD	3401018
DPM1	4672433017254370652330944415
SCYL3	1260205770214839241451982
C1orf112	5239921721400234733958
FGR	1249112728514856941208
CFH	12831114355387995457121891795
FUCA2	5896785732085625152775303290
GCLC	2682550914479323642252652418
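Because the input must hold raw counts rather than normalized values such as FPKM or TPM, it can help to check the matrix before running the script. The following is a minimal sketch, not part of edger_analysis.r. It assumes a tab-separated file with gene IDs in the first column, and the function name `find_non_count_values` and the toy data are hypothetical.

```python
import csv
import io


def find_non_count_values(handle):
    """Return (gene, sample, value) triples that are not non-negative integers."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)      # first cell is the ID column label
    samples = header[1:]
    problems = []
    for row in reader:
        gene, values = row[0], row[1:]
        for sample, value in zip(samples, values):
            # str.isdigit() is False for decimals ("3.7"), negatives ("-3")
            # and empty cells, none of which are valid raw counts.
            if not value.isdigit():
                problems.append((gene, sample, value))
    return problems


# Toy matrix (made-up values): geneY has a non-integer entry.
toy = "ID\tsampleA\tsampleB\ngeneX\t10\t0\ngeneY\t3.7\t25\n"
problems = find_non_count_values(io.StringIO(toy))
print(problems)
```

An empty list suggests the file looks like a count matrix. Any reported triple points to a cell worth inspecting before the analysis.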


-m  Path to the metadata file with the sample grouping information. The first column must match the sample names in the expression file:


barcode	subtype.hclust	StromalScore	ImmuneScore	ESTIMATEScore	TumourPurity
TCGA-B7-A5TK-01A-12R-A36D-31	S1	1026.057	2386.835	3412.892	0.448276
TCGA-BR-7959-01A-11R-2343-13	S2	1130.722	729.402	1860.124	0.638667
TCGA-IN-8462-01A-11R-2343-13	S2	112.2318	683.9349	796.1667	0.750581
TCGA-BR-A4CR-01A-11R-A24K-31	S2	-1060.35	-766.618	-1826.97	0.943814
TCGA-CG-4443-01A-01R-1157-13	S2	-261.577	-258.629	-520.206	0.8635
TCGA-KB-A93J-01A-11R-A39E-31	S1	-202.255	1605.12	1402.865	0.688838
TCGA-BR-4371-01A-01R-1157-13	S2	-828.231	711.3379	-116.893	0.832147
TCGA-IN-A6RO-01A-12R-A33Y-31	S2	-1406.57	68.58307	-1337.98	0.917683
TCGA-HU-A4H3-01A-21R-A251-31	S2	-619.208	538.7225	-80.4854	0.829171
TCGA-RD-A8MV-01A-11R-A36D-31	S1	113.4127	2309.647	2423.06	0.572976
TCGA-VQ-A91X-01A-12R-A414-31	S2	-1845.85	-590.017	-2435.87	0.969545
TCGA-D7-8575-01A-11R-2343-13	S2	-206.112	1392.799	1186.687	0.711491
TCGA-BR-4257-01A-01R-1131-13	S1	861.029	1676.148	2537.177	0.559167
TCGA-BR-8485-01A-11R-2402-13	S1	373.0961	1110.516	1483.612	0.680198
TCGA-BR-4370-01A-01R-1157-13	S1	1300.495	1802.327	3102.822	0.488483
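Since the first metadata column has to correspond to the sample names in the count matrix, a quick comparison of the two sets of names can catch mismatches early. This is a minimal sketch under that assumption. The helper `compare_sample_names` is hypothetical, the barcodes are a small subset of those shown in this article, and how edger_analysis.r itself handles unmatched samples is not described here.

```python
def compare_sample_names(count_samples, metadata_samples):
    """Report sample names present in one file but missing from the other."""
    count_set, meta_set = set(count_samples), set(metadata_samples)
    return {
        "only_in_counts": sorted(count_set - meta_set),
        "only_in_metadata": sorted(meta_set - count_set),
    }


# Column names from the count matrix header (ID column excluded).
count_samples = [
    "TCGA-B7-A5TK-01A-12R-A36D-31",
    "TCGA-BR-7959-01A-11R-2343-13",
    "TCGA-IN-8462-01A-11R-2343-13",
]
# Values from the first ("barcode") column of the metadata file.
metadata_samples = [
    "TCGA-B7-A5TK-01A-12R-A36D-31",
    "TCGA-BR-7959-01A-11R-2343-13",
    "TCGA-IN-A6RO-01A-12R-A33Y-31",
]

report = compare_sample_names(count_samples, metadata_samples)
print(report)
```

Both lists in the report being empty means every sample appears in both files.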


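-t subtype.hclust --case S1 --control S2: specifies the grouping column in the metadata file and the names of the two groups to compare. If a group name contains spaces, wrap it in quotes, for example "Stage IA".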
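The reason for the quotes is that the shell splits a command line on whitespace before the script ever sees it. The short sketch below uses Python's standard `shlex.split`, which follows POSIX shell splitting rules, to show how an unquoted and a quoted group name are tokenized. It only illustrates shell behaviour and is not part of edger_analysis.r.

```python
import shlex

# Without quotes the group name is split into two separate arguments.
unquoted = shlex.split("--case Stage IA --control S2")
# With quotes the group name arrives as a single argument.
quoted = shlex.split('--case "Stage IA" --control S2')

print(unquoted)  # ['--case', 'Stage', 'IA', '--control', 'S2']
print(quoted)    # ['--case', 'Stage IA', '--control', 'S2']
```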


--fdr 0.01 --fc 2: sets the thresholds for calling differentially expressed genes: significance (FDR) and fold change.
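To make the two thresholds concrete, here is a minimal sketch of how such a filter is commonly applied. It rests on assumptions that this article does not confirm: that --fc is a fold change on the linear scale (so 2 corresponds to |log2FC| >= 1), that the FDR comparison is strict, and that a positive log2FC means higher expression in the case group. The function `classify_gene` and the toy values are hypothetical.

```python
import math


def classify_gene(log2_fc, fdr, fdr_cutoff=0.01, fc_cutoff=2.0):
    """Label a gene as up, down, or not_significant (assumed filter logic)."""
    log2_cutoff = math.log2(fc_cutoff)  # a fold change of 2 gives a cutoff of 1
    if fdr < fdr_cutoff and log2_fc >= log2_cutoff:
        return "up"
    if fdr < fdr_cutoff and log2_fc <= -log2_cutoff:
        return "down"
    return "not_significant"


# Toy results (made-up values): gene -> (log2FC, FDR)
toy_results = {
    "geneA": (2.3, 0.0004),   # large increase, significant
    "geneB": (-1.5, 0.002),   # large decrease, significant
    "geneC": (3.0, 0.2),      # large change, but FDR too high
    "geneD": (0.4, 0.0001),   # significant, but change too small
}
labels = {gene: classify_gene(*values) for gene, values in toy_results.items()}
print(labels)
```

A gene has to pass both thresholds to be labelled up or down, so tightening either --fdr or --fc can only shrink the list.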

Usage example:

Rscript $scriptdir/edger_analysis.r  -i ../01.TCGA_download/TCGA-STAD_gene_expression_Counts.tsv \
    --fdr 0.01 --fc 2 \
  -m ../03.TIME/metadata.group.tsv -t subtype.hclust   --case S1 --control  S2 -p S1_vs_S2

Results:

Volcano plot:

[Figure: volcano plot]

Script download and usage course: https://study.163.com/course/introduction/1211864801.htm?share=1&shareId=1030291076


References:

Robinson MD, McCarthy DJ, Smyth GK (2010). “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics, 26(1), 139-140. doi: 10.1093/bioinformatics/btp616.

McCarthy DJ, Chen Y, Smyth GK (2012). “Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation.” Nucleic Acids Research, 40(10), 4288-4297. doi: 10.1093/nar/gks042.

  • Published: 2021-06-25 11:21
  • Category: Transcriptomics

Author: omicsgene
