Usage guide for HISAT2, a transcriptome alignment tool

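The standard RNA-seq analysis workflow has largely moved from the TopHat + Cufflinks combination to HISAT + StringTie. A protocol for the newer combination is given in the Nature Protocols paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown".

We start with the aligner, HISAT, which is considerably faster and more accurate than TopHat.

Its usage is as follows: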

hisat2 [options]* -x <ht2-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [-S <sam>]

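<ht2-idx> Index filename prefix (without the trailing .X.ht2)

<m1> Read 1 (mate 1) files, paired with the files in <m2> (gzip- and bzip2-compressed files are supported)

<m2> Read 2 (mate 2) files, paired with the files in <m1> (gzip- and bzip2-compressed files are supported)

<r> Files with unpaired reads (gzip- and bzip2-compressed files are supported)

<SRA accession number> Reads are taken directly from NCBI SRA; separate multiple accession numbers with commas

<sam> Output file for the alignments in SAM format (default: standard output)

<m1>, <m2>, <r> each accept a comma-separated list of files and may be given more than once, e.g. '-U file1.fq,file2.fq -U file3.fq'.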
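
The usage line above can be sketched as a small Python helper that assembles the argument list without running anything. The index prefix and file names are made-up placeholders, and the helper name `hisat2_command` is hypothetical; only the flags `-x`, `-1`, `-2`, `-U` and `-S` come from the usage text.

```python
import shlex


def hisat2_command(index_prefix, sam_out, mates1=None, mates2=None, unpaired=None):
    """Build an argument list for hisat2 following the usage line."""
    cmd = ["hisat2", "-x", index_prefix]
    if mates1 or mates2:
        if not (mates1 and mates2) or len(mates1) != len(mates2):
            raise ValueError("-1 and -2 need the same number of files")
        # Several files go in as one comma-separated list without whitespace.
        cmd += ["-1", ",".join(mates1), "-2", ",".join(mates2)]
    if unpaired:
        cmd += ["-U", ",".join(unpaired)]
    if len(cmd) == 3:
        raise ValueError("no read files given")
    cmd += ["-S", sam_out]
    return cmd


# Placeholder paths, for illustration only.
paired = hisat2_command("index/genome", "sample.sam",
                        mates1=["sample_R1.fq.gz"], mates2=["sample_R2.fq.gz"])
print(shlex.join(paired))
```

The resulting list can be passed to `subprocess.run(paired, check=True)` on a machine where hisat2 is installed; `shlex.join` (Python 3.8+) only prints the equivalent shell command.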

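Options (defaults in parentheses):

Input:

-q Input files are FASTQ .fq/.fastq (default)

--qseq Input files are in Illumina's qseq format

-f Input files are (multi-)FASTA .fa/.mfa

-r Input files contain raw sequences, one per line

-c <m1>, <m2>, <r> are sequences themselves, not files

-s/--skip <int> Skip the first <int> reads/pairs in the input (none)

-u/--upto <int> Stop after the first <int> reads/pairs (no limit)

-5/--trim5 <int> Trim <int> bases from the 5'/left end of reads (0)

-3/--trim3 <int> Trim <int> bases from the 3'/right end of reads (0)

--phred33 Quality values are encoded as Phred+33 (default)

--phred64 Quality values are encoded as Phred+64

--int-quals Quality values are given as space-separated integers

--sra-acc SRA accession number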
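
To make the difference between --phred33 and --phred64 concrete, here is a minimal sketch of how a quality string maps to Phred scores: each character's ASCII code minus the offset. The quality string is made up, and `looks_like_phred33` is only a rough heuristic of my own for picking the flag, not something hisat2 does.

```python
def decode_quals(qual_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def error_probability(phred):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def looks_like_phred33(qual_strings):
    """Heuristic: characters below ';' (ASCII 59) do not occur in the
    +64 encodings, so their presence points to Phred+33."""
    return any(ord(ch) < 59 for q in qual_strings for ch in q)


quals = "II5!"  # made-up quality string
print(decode_quals(quals))          # [40, 40, 20, 0]
print(looks_like_phred33([quals]))  # True
```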

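Alignment:

--n-ceil <func> Function giving the maximum number of non-A/C/G/T characters allowed in an alignment (L,0,0.15)

--ignore-quals Ignore the quality values and treat every base as Phred 30 (off)

--nofw Do not align the forward (original) version of the read (off)

--norc Do not align the reverse-complement version of the read (off)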

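Spliced alignment:

--pen-cansplice <int> Penalty for a canonical splice site (0)

--pen-noncansplice <int> Penalty for a non-canonical splice site (12)

--pen-canintronlen <func> Penalty function for long introns with canonical splice sites (G,-8,1)

--pen-noncanintronlen <func> Penalty function for long introns with non-canonical splice sites (G,-8,1)

--min-intronlen <int> Minimum intron length (20)

--max-intronlen <int> Maximum intron length (500000)

--known-splicesite-infile <path> File listing known splice sites

--novel-splicesite-outfile <path> Write the splice sites found during alignment to <path>

--novel-splicesite-infile <path> File listing novel splice sites to use

--no-temp-splicesite disable the use of splice sites found

--no-spliced-alignment Disable spliced alignment

--rna-strandness <string> Strand-specific information for the RNA library (unstranded)

--tmo Report only alignments within the known transcriptome

--dta Report alignments tailored for transcript assemblers

--dta-cufflinks Report alignments tailored specifically for Cufflinks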
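
As an illustration of how the spliced-alignment options combine for a HISAT2 + StringTie run, the sketch below assembles a typical option set. The helper `spliced_options` and all file names are hypothetical. The accepted --rna-strandness values (F or R for single-end, FR or RF for paired-end) follow the HISAT2 manual, and the splice-site file is assumed to have been prepared beforehand, for example with the hisat2_extract_splice_sites.py script shipped with HISAT2.

```python
import shlex

# F/R apply to single-end reads, FR/RF to paired-end reads.
VALID_STRANDNESS = {"F", "R", "FR", "RF"}


def spliced_options(strandness=None, known_sites=None, dta=True, max_intronlen=None):
    """Collect spliced-alignment flags; anything left as None keeps the default."""
    opts = []
    if dta:
        opts.append("--dta")  # alignments tailored for transcript assemblers
    if strandness is not None:
        if strandness not in VALID_STRANDNESS:
            raise ValueError(f"unexpected --rna-strandness value: {strandness!r}")
        opts += ["--rna-strandness", strandness]
    if known_sites is not None:
        opts += ["--known-splicesite-infile", known_sites]
    if max_intronlen is not None:
        opts += ["--max-intronlen", str(max_intronlen)]
    return opts


# Placeholder index prefix and file names.
cmd = (["hisat2", "-x", "index/genome"]
       + spliced_options(strandness="RF", known_sites="splicesites.txt")
       + ["-1", "sample_R1.fq.gz", "-2", "sample_R2.fq.gz", "-S", "sample.sam"])
print(shlex.join(cmd))
```

Which strandness value is right depends on the library preparation kit, so treat "RF" here as an example rather than a recommendation.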

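Scoring:

--ma <int> Match bonus (0 for --end-to-end, 2 for --local)

--mp <int>,<int> Maximum and minimum mismatch penalties; lower quality = lower penalty <2,6>

--sp <int>,<int> max and min penalties for soft-clipping; lower qual = lower penalty <1,2>

--np <int> Penalty for non-A/C/G/T characters in the read or reference (1)

--rdg <int>,<int> Read gap open and extend penalties (5,3)

--rfg <int>,<int> Reference gap open and extend penalties (5,3)

--score-min <func> Minimum acceptable alignment score as a function of read length (L,0.0,-0.2)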
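
Several options above take a <func> value such as L,0.0,-0.2. In the notation used by the HISAT2 and Bowtie 2 manuals, the first letter selects the function type (C constant, L linear, S square root, G natural log), followed by the constant term and the coefficient. For a 100-base read, the default --score-min therefore works out to 0 + (-0.2 × 100) = -20. The sketch below only evaluates the notation; the name `eval_func` is hypothetical, and any rounding or clamping that hisat2 applies internally is not modelled.

```python
import math


def eval_func(spec, x):
    """Evaluate a <func> specification such as 'L,0.0,-0.2' at x."""
    parts = spec.split(",")
    kind = parts[0]
    const = float(parts[1])
    # A constant function has no coefficient, so default it to 0.
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":
        return const
    if kind == "L":
        return const + coeff * x
    if kind == "S":
        return const + coeff * math.sqrt(x)
    if kind == "G":
        return const + coeff * math.log(x)
    raise ValueError(f"unknown function type: {kind!r}")


read_length = 100
print(eval_func("L,0.0,-0.2", read_length))  # default --score-min
print(eval_func("L,0,0.15", read_length))    # default --n-ceil
```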

Reporting:

(default) Search for multiple alignments and report only the best one

-k <int> Report up to <int> alignments per read

-a/--all Report all alignments

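Paired-end:

--fr/--rf/--ff Expected orientation of the -1/-2 mates: fw/rev, rev/fw, fw/fw (--fr)

--no-mixed Do not report unpaired alignments for paired reads

--no-discordant Do not report discordant alignments for paired reads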

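Output:

-t/--time Print the wall-clock time taken by the search phases

--un <path> Write unpaired reads that did not align to <path>

--al <path> Write unpaired reads that aligned at least once to <path>

--un-conc <path> Write pairs that did not align concordantly to <path>

--al-conc <path> Write pairs that aligned concordantly at least once to <path>

(Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g. --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)

--quiet Print nothing to standard error except serious errors

--met-file <path> Write metrics to the file <path> (off)

--met-stderr Write metrics to standard error (off)

--met <int> Report internal counters and metrics every <int> seconds (1)

--no-head Do not write header lines to the SAM file

--no-sq Do not write @SQ header lines to the SAM file

--rg-id <text> Set the read group ID

--rg <text> Add read group information to the @RG header line

--omit-sec-seq put '*' in SEQ and QUAL fields for secondary alignments.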
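
The naming rule for the compressed variants of --un, --al, --un-conc and --al-conc, together with the read group options, can be sketched as follows. Both helper names and all values are hypothetical. The "TAG:value" form passed to --rg follows the SAM @RG header convention (for example SM for the sample name).

```python
READ_OUTPUT_KINDS = {"un", "al", "un-conc", "al-conc"}
COMPRESSION_SUFFIX = {None: "", "gzip": "-gz", "bzip2": "-bz2"}


def read_output_option(kind, path, compress=None):
    """Return e.g. ['--un-conc-gz', path] for kind='un-conc', compress='gzip'."""
    if kind not in READ_OUTPUT_KINDS:
        raise ValueError(f"unknown read output kind: {kind!r}")
    if compress not in COMPRESSION_SUFFIX:
        raise ValueError(f"unknown compression: {compress!r}")
    return [f"--{kind}{COMPRESSION_SUFFIX[compress]}", path]


def read_group_options(group_id, **fields):
    """Return --rg-id plus one --rg TAG:value pair per keyword argument."""
    opts = ["--rg-id", group_id]
    for tag, value in fields.items():
        opts += ["--rg", f"{tag}:{value}"]
    return opts


# Placeholder names, for illustration only.
print(read_output_option("un-conc", "unaligned.fq.gz", compress="gzip"))
print(read_group_options("run1", SM="sampleA", PL="ILLUMINA"))
```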

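Performance:

-o/--offrate <int> Override the offrate of the index

-p/--threads <int> Number of alignment threads (1)

--reorder Force the reads in the SAM output to appear in the same order as in the input

--mm Use memory-mapped I/O for the index, so that several processes can share it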

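Other:

--qc-filter Filter out reads that fail the QSEQ quality filter

--seed <int> Seed for the random number generator (0)

--non-deterministic Seed the random number generator arbitrarily instead of from read attributes

--remove-chrname Remove 'chr' from reference names in the alignments

--add-chrname Add 'chr' to reference names in the alignments

--version Print version information

-h/--help Print the usage message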

The defaults are fine for most options; adjust individual settings according to the needs of your analysis.

If you are interested in transcriptome data analysis, you can take my course:
《有参转录组数据分析》 (Reference-based Transcriptome Data Analysis)

Published 2018-07-13 10:48
Category: Transcriptome
