Building a QIIME 2 classifier from the SILVA database

https://forum.qiime2.org/t/processing-filtering-and-evaluating-the-silva-database-and-other-reference-sequence-data-with-rescript/15494


Building the database with the RESCRIPt tool


qiime rescript get-silva-data \
    --p-version '138' \
    --p-target 'SSURef_NR99' \
    --p-include-species-labels \
    --o-silva-sequences silva-138-ssu-nr99-seqs.qza \
    --o-silva-taxonomy silva-138-ssu-nr99-tax.qza

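This command automatically downloads the SILVA sequences clustered at 99% similarity (NR99) together with their taxonomy. Because of network problems it often fails with an error.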

In that case, download the files provided on the official QIIME 2 website directly: https://docs.qiime2.org/2020.8/data-resources/

wget -c https://data.qiime2.org/2020.8/common/silva-138-99-seqs.qza
wget -c https://data.qiime2.org/2020.8/common/silva-138-99-tax.qza
ln -s silva-138-99-tax.qza silva-138-ssu-nr99-tax.qza
ln -s silva-138-99-seqs.qza silva-138-ssu-nr99-seqs.qza
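
Since the download step is the one that tends to go wrong, a cheap sanity check before continuing may save time. A .qza artifact is a ZIP archive, so it should be non-empty and start with the bytes "PK". The helper name looks_like_qza is made up for this note. This is only a rough sketch: it catches empty files or an HTML error page saved under the .qza name, but not a truncated download.

```shell
# Return 0 if the file is non-empty and starts with the ZIP signature "PK".
# looks_like_qza is a hypothetical helper name, not part of QIIME 2.
looks_like_qza() {
    [ -s "$1" ] || return 1
    [ "$(head -c 2 "$1")" = "PK" ]
}

# Example use on the two downloaded files (commented out so that pasting
# this block has no side effects):
# for f in silva-138-99-seqs.qza silva-138-99-tax.qza; do
#     looks_like_qza "$f" && echo "ok: $f" || echo "check download: $f"
# done
```

For a full check, qiime tools validate can be run on each artifact.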

After that, follow this tutorial to build your own classifier: https://forum.qiime2.org/t/processing-filtering-and-evaluating-the-silva-database-and-other-reference-sequence-data-with-rescript/15494


# Remove sequences that contain 5 or more ambiguous bases (IUPAC-compliant ambiguity codes) or any homopolymer that is 8 or more bases in length

qiime rescript cull-seqs \
    --i-sequences silva-138-ssu-nr99-seqs.qza \
    --o-clean-sequences silva-138-ssu-nr99-seqs-cleaned.qza

# Filter by length, per taxonomic group
qiime rescript filter-seqs-length-by-taxon \
    --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy silva-138-ssu-nr99-tax.qza \
    --p-labels Archaea Bacteria Eukaryota \
    --p-min-lens 900 1200 1400 \
    --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza
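
The values given to --p-labels and --p-min-lens are paired by position: Archaea 900, Bacteria 1200, Eukaryota 1400. The toy awk filter below only illustrates that rule on a made-up tab-separated table (id, domain, length). It is not how RESCRIPt is implemented. It assumes that a sequence exactly at the threshold is kept and that rows matching no label pass through unchanged.

```shell
# Illustration only: apply per-domain minimum lengths to a tab-separated
# table of "id <TAB> domain <TAB> length" read from standard input.
# filter_by_domain_length is a hypothetical name used for this sketch.
filter_by_domain_length() {
    awk -F '\t' '
        BEGIN {
            # Same label/threshold pairs as in the command above
            min["Archaea"]   = 900
            min["Bacteria"]  = 1200
            min["Eukaryota"] = 1400
        }
        # Keep a row if its domain has no threshold, or its length meets it
        !($2 in min) || $3 >= min[$2]
    '
}

# Example with invented rows: only "b" and "d" are printed.
printf 'a\tArchaea\t899\nb\tArchaea\t900\nc\tBacteria\t1199\nd\tEukaryota\t1500\n' \
    | filter_by_domain_length
```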

# Dereplicate the sequences
qiime rescript dereplicate \
    --i-sequences silva-138-ssu-nr99-seqs-filt.qza  \
    --i-taxa silva-138-ssu-nr99-tax.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --o-dereplicated-taxa silva-138-ssu-nr99-tax-derep-uniq.qza

# Build the full-length classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads  silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138-ssu-nr99-classifier.qza
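
Training on the full-length reference can take a long time, so when re-running these notes it helps to skip steps whose output is already there. A minimal sketch of such a guard follows; run_if_missing is a name invented for this note, and it only checks that the output file exists and is non-empty, not that it is valid.

```shell
# Run a command only if its output file is missing or empty.
# Usage: run_if_missing OUTPUT_FILE COMMAND [ARGS...]
# (run_if_missing is a hypothetical helper, not a QIIME 2 command.)
run_if_missing() {
    out=$1
    shift
    if [ -s "$out" ]; then
        echo "skipping, $out already exists" >&2
        return 0
    fi
    "$@"
}

# Example (commented out so that pasting this block does not start training):
# run_if_missing silva-138-ssu-nr99-classifier.qza \
#     qiime feature-classifier fit-classifier-naive-bayes \
#         --i-reference-reads silva-138-ssu-nr99-seqs-derep-uniq.qza \
#         --i-reference-taxonomy silva-138-ssu-nr99-tax-derep-uniq.qza \
#         --o-classifier silva-138-ssu-nr99-classifier.qza
```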



## Primer-specific classifier 1 (515F/806R)
# Extract the amplicon region
qiime feature-classifier extract-reads \
    --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --p-f-primer GTGYCAGCMGCCGCGGTAA \
    --p-r-primer GGACTACNVGGGTWTCTAAT \
    --p-n-jobs 2 \
    --p-read-orientation 'forward' \
    --o-reads silva-138-ssu-nr99-seqs-515f-806r.qza
# Dereplicate
qiime rescript dereplicate \
    --i-sequences silva-138-ssu-nr99-seqs-515f-806r.qza \
    --i-taxa silva-138-ssu-nr99-tax-derep-uniq.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-138-ssu-nr99-seqs-515f-806r-uniq.qza \
    --o-dereplicated-taxa  silva-138-ssu-nr99-tax-515f-806r-derep-uniq.qza
# Build the classifier
qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-138-ssu-nr99-seqs-515f-806r-uniq.qza \
    --i-reference-taxonomy silva-138-ssu-nr99-tax-515f-806r-derep-uniq.qza \
    --o-classifier silva-138-ssu-nr99-515f-806r-classifier.qza
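
The primers passed to extract-reads contain IUPAC degenerate codes (Y = C or T, M = A or C, N = any base, and so on). To see which concrete sequences a primer stands for, it can be expanded into a regular expression. The function name iupac_to_regex is made up for this note; it is a small sed sketch for inspection only and plays no part in the QIIME 2 commands.

```shell
# Expand IUPAC degenerate nucleotide codes into bracket expressions.
# iupac_to_regex is a hypothetical helper used only for illustration.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g' \
        -e 's/W/[AT]/g'  -e 's/K/[GT]/g'  -e 's/M/[AC]/g' \
        -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
        -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

iupac_to_regex GTGYCAGCMGCCGCGGTAA     # prints GTG[CT]CAGC[AC]GCCGCGGTAA
iupac_to_regex GGACTACNVGGGTWTCTAAT    # prints GGACTAC[ACGT][ACG]GGGT[AT]TCTAAT
```

The result can be passed to grep -E to check whether a given sequence contains the primer site.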


## Primer-specific classifier 2 (338F/806R)
# 338F (5′-ACTCCTACGGGAGGCAGCAG-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′)
# Extract the amplicon region
qiime feature-classifier extract-reads \
    --i-sequences silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --p-f-primer ACTCCTACGGGAGGCAGCAG \
    --p-r-primer GGACTACHVGGGTWTCTAAT \
    --p-n-jobs 2 \
    --p-read-orientation 'forward' \
    --o-reads silva-138-ssu-nr99-seqs-338f-806r.qza
# Dereplicate
qiime rescript dereplicate \
    --i-sequences silva-138-ssu-nr99-seqs-338f-806r.qza \
    --i-taxa silva-138-ssu-nr99-tax-derep-uniq.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-138-ssu-nr99-seqs-338f-806r-uniq.qza \
    --o-dereplicated-taxa  silva-138-ssu-nr99-tax-338f-806r-derep-uniq.qza
# Build the classifier
qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-138-ssu-nr99-seqs-338f-806r-uniq.qza \
    --i-reference-taxonomy silva-138-ssu-nr99-tax-338f-806r-derep-uniq.qza \
    --o-classifier silva-138-ssu-nr99-338f-806r-classifier.qza
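
The two primer-specific sections repeat the same three commands with different primers and file names. One possible way to wrap them in a shell function is sketched here. The function name build_primer_classifier and its arguments are made up for this note; every qiime flag and file-name pattern is copied from the commands above.

```shell
# Build a primer-specific classifier from the dereplicated full-length reference.
# Usage: build_primer_classifier NAME FORWARD_PRIMER REVERSE_PRIMER
# (hypothetical wrapper; NAME is only used to label the output files)
build_primer_classifier() {
    name=$1
    fwd=$2
    rev=$3
    prefix=silva-138-ssu-nr99

    # 1. Extract the amplicon region
    qiime feature-classifier extract-reads \
        --i-sequences "${prefix}-seqs-derep-uniq.qza" \
        --p-f-primer "$fwd" \
        --p-r-primer "$rev" \
        --p-n-jobs 2 \
        --p-read-orientation 'forward' \
        --o-reads "${prefix}-seqs-${name}.qza" || return 1

    # 2. Dereplicate the extracted reads
    qiime rescript dereplicate \
        --i-sequences "${prefix}-seqs-${name}.qza" \
        --i-taxa "${prefix}-tax-derep-uniq.qza" \
        --p-rank-handles 'silva' \
        --p-mode 'uniq' \
        --o-dereplicated-sequences "${prefix}-seqs-${name}-uniq.qza" \
        --o-dereplicated-taxa "${prefix}-tax-${name}-derep-uniq.qza" || return 1

    # 3. Train the classifier
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "${prefix}-seqs-${name}-uniq.qza" \
        --i-reference-taxonomy "${prefix}-tax-${name}-derep-uniq.qza" \
        --o-classifier "${prefix}-${name}-classifier.qza"
}

# Example calls (commented out; each one runs the full three-step build):
# build_primer_classifier 515f-806r GTGYCAGCMGCCGCGGTAA GGACTACNVGGGTWTCTAAT
# build_primer_classifier 338f-806r ACTCCTACGGGAGGCAGCAG GGACTACHVGGGTWTCTAAT
```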




