Building the index fails with errors: there are many mRNA-related error messages, and some of the output files are 0 KB.

Error: discarding overlapping duplicate mRNA feature (98631638-98635833) with ID=PenpuB30077125.g

Error: discarding overlapping duplicate mRNA feature (98637068-98639230) with ID=PenpuB30077126.g

Error: discarding overlapping duplicate mRNA feature (98654990-98657285) with ID=PenpuB30077127.g

Error: discarding overlapping duplicate mRNA feature (98670390-98671158) with ID=PenpuB30077128.g

Error: discarding overlapping duplicate mRNA feature (98713041-98715893) with ID=PenpuB30077129.g

Error: discarding overlapping duplicate mRNA feature (98722904-98724406) with ID=PenpuB30077130.g

Error: discarding overlapping duplicate mRNA feature (98789258-98799732) with ID=PenpuB30077131.g

Error: discarding overlapping duplicate mRNA feature (98741933-98747930) with ID=PenpuB30077132.g

Error: discarding overlapping duplicate mRNA feature (98724757-98725253) with ID=PenpuB30077133.g

Error: discarding overlapping duplicate mRNA feature (98758742-98759650) with ID=PenpuB30077134.g

Error: discarding overlapping duplicate mRNA feature (98799840-98803342) with ID=PenpuB30077135.g

Error: discarding overlapping duplicate mRNA feature (98723981-98730275) with ID=PenpuB30077136.g

Error: discarding overlapping duplicate mRNA feature (98748659-98749969) with ID=PenpuB30077137.g

Error: discarding overlapping duplicate mRNA feature (98732668-98734245) with ID=PenpuB30077138.g

Error: discarding overlapping duplicate mRNA feature (98735987-98736634) with ID=PenpuB30077139.g
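The "discarding overlapping duplicate mRNA feature" messages (most likely printed by gffread while parsing the annotation) indicate that more than one mRNA record carries the same ID. Below is a minimal sketch that lists every mRNA ID used more than once, together with its coordinates, so the duplicates can be inspected or renamed instead of being silently discarded. It assumes a tab-separated GFF3 file with `ID=` attributes, and `duplicate_ids` is a hypothetical helper, not part of any tool in this log.

```python
from collections import defaultdict


def duplicate_ids(gff_lines, feature_type="mRNA"):
    """Return {ID: [(seqid, start, end), ...]} for every ID that is used by
    more than one feature of `feature_type` in tab-separated GFF3 lines."""
    seen = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ##directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = {}
        for field in cols[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep:
                attrs[key] = value
        if "ID" in attrs:
            seen[attrs["ID"]].append((cols[0], int(cols[3]), int(cols[4])))
    return {fid: locs for fid, locs in seen.items() if len(locs) > 1}
```

For example, `duplicate_ids(open("new.gff"))` returns an empty dict when every mRNA ID in the file is unique.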

build ANNOVAR index

RUN CMD: gtfToGenePred -genePredExt elephant_grass_genomenew.gtf unknown_refGene.txt

invalid gffGroup detected on line: chrB6 MAKER exon 127393070 127393425 0.000000 - . transcript_id "PenpuB60000010.g.72"; 

GFF/GTF group PenpuB60000010.g.72 on chrB6+, this line is on chrB6-, all group members must be on same seq and strand
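gtfToGenePred requires every line of a `transcript_id` group to lie on the same sequence and strand, and here `PenpuB60000010.g.72` has lines on both `chrB6+` and `chrB6-`. The log only reports this one transcript, so scanning the whole GTF first shows how many others are affected. Below is a minimal sketch, assuming a standard 9-column tab-separated GTF with `transcript_id "..."` attributes; `mixed_strand_transcripts` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches transcript_id "..." inside GTF column 9
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def mixed_strand_transcripts(gtf_lines):
    """Return {transcript_id: {(seqid, strand), ...}} for transcripts whose
    lines do not all lie on one sequence and strand."""
    groups = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            # Record every (sequence, strand) pair seen for this transcript
            groups[match.group(1)].add((cols[0], cols[6]))
    return {tid: locs for tid, locs in groups.items() if len(locs) > 1}
```

Transcripts it reports can be corrected or removed from the GTF before running `gtfToGenePred -genePredExt` again.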

RUN CMD: retrieve_seq_from_fasta.pl --format refGene --seqfile elepant_grass_genome.fa  unknown_refGene.txt --out unknown_refGeneMrna.fa

NOTICE: Reading region file unknown_refGene.txt ... Done with 4143 regions from 1 chromosomes

/public/home/majieyu/perl5/ppgwas/reseq/reseq_demo//scripts/index.sh: line 56: 30143 Killed                  retrieve_seq_from_fasta.pl --format refGene --seqfile $fa --outfile unknown_refGeneMrna.fa unknown_refGene.txt

INFO:    Cleaning up image...



2 answers

马杰宇

I looked up a fix online and deleted the mRNA lines from the GFF file, but now a new problem has appeared:

14:31:45.646 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/work/biosoft/picard/picard-3.0.0/picard.jar!/com/intel/gkl/native/libgkl_compression.so

[Sat Dec 16 14:31:45 CST 2023] CreateSequenceDictionary OUTPUT=elepant_grass_genome.dict REFERENCE=elepant_grass_genome.fa    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

[Sat Dec 16 14:31:45 CST 2023] Executing as majieyu@mgt02 on Linux 3.10.0-1127.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 19.0.1+10-21; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 3.0.0

/share/work/biosoft/picard/picard: line 66: 12030 Killed                  /share/work/biosoft/java/latest/bin/java -Xms512m -Xmx2g -jar /share/work/biosoft/picard/picard.jar CreateSequenceDictionary "R=elepant_grass_genome.fa" "O=elepant_grass_genome.dict"

awk: cmd. line:1: fatal: cannot open file `elepant_grass_genome.fa.fai' for reading (No such file or directory)

/public/home/majieyu/perl5/ppgwas/reseq/reseq_demo//scripts/index.sh: line 31: [: -gt: unary operator expected
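The awk call fails because `elepant_grass_genome.fa.fai` was never created. The `[: -gt: unary operator expected` message is what bash's `[` prints when the left operand of `-gt` expands to an empty string, presumably the value index.sh tried to compute from that .fai file. The usual way to create the file is `samtools faidx elepant_grass_genome.fa`. To show what it contains, here is a minimal sketch (hypothetical helper `fai_records`) that computes the five .fai columns (name, length, byte offset of the first base, bases per line, bytes per line) by streaming the FASTA without holding any sequence in memory; summing the lengths gives the genome size. It assumes each record uses a constant line width, as samtools itself requires.

```python
def fai_records(fasta_path):
    """Return (name, length, offset, linebases, linewidth) per sequence,
    i.e. the five columns of a samtools .fai index. Reads the file line by
    line and keeps no sequence data in memory."""
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                if name is not None:
                    records.append((name, length, offset, linebases, linewidth))
                # Sequence name = header text up to the first whitespace
                name = raw[1:].split()[0].decode()
                length, offset = 0, pos + len(raw)
                linebases = linewidth = 0
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0:
                    # The first sequence line defines the line layout
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records
```

For example, `sum(r[1] for r in fai_records("elepant_grass_genome.fa"))` gives the total number of bases.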

RNN CMD: bwa index elepant_grass_genome.fa

[bwa_index] Pack FASTA... /public/home/majieyu/perl5/ppgwas/reseq/reseq_demo//scripts/index.sh: line 38: 14922 Killed                  bwa index $fa


gtf file not provide, try get gtf from gff:

RUN CMD: gffread  new.gff -T -o new.gtf

build ANNOVAR index

RUN CMD: gtfToGenePred -genePredExt new.gtf unknown_refGene.txt

invalid gffGroup detected on line: chrB6 MAKER exon 127393070 127393425 0.000000 - . transcript_id "PenpuB60000010.g.72"; 

GFF/GTF group PenpuB60000010.g.72 on chrB6+, this line is on chrB6-, all group members must be on same seq and strand

RUN CMD: retrieve_seq_from_fasta.pl --format refGene --seqfile elepant_grass_genome.fa  unknown_refGene.txt --out unknown_refGeneMrna.fa

NOTICE: Reading region file unknown_refGene.txt ... Done with 4143 regions from 1 chromosomes

/public/home/majieyu/perl5/ppgwas/reseq/reseq_demo//scripts/index.sh: line 56: 18535 Killed                  retrieve_seq_from_fasta.pl --format refGene --seqfile $fa --outfile unknown_refGeneMrna.fa unknown_refGene.txt

omicsgene - Bioinformatics
Expertise: resequencing, genetic evolution, transcriptomics, GWAS

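Your computer does not have enough memory, so to keep it from freezing, the system automatically killed the jobs (these are the "Killed" lines in your log).

See this article: https://www.omicsclass.com/article/1413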
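Every failing step in the log ends with `Killed` (retrieve_seq_from_fasta.pl, Picard's java, `bwa index`). Bash prints that when a child process is terminated by SIGKILL, which on Linux is typically the kernel's out-of-memory killer, although a scheduler or administrator limit can send the same signal. Below is a minimal sketch, with a hypothetical helper `run_and_report`, showing how a wrapper script can recognise this case: Python's `subprocess` reports death by signal N as the negative return code -N, whereas bash reports it as exit status 128 + N (137 for SIGKILL).

```python
import signal
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments) and describe how it exited.

    On POSIX, subprocess sets returncode to -N when the child is killed
    by signal N, so SIGKILL shows up as -9 here (bash would print
    "Killed" and report exit status 137)."""
    proc = subprocess.run(cmd)
    if proc.returncode == -signal.SIGKILL:
        # Most often the Linux OOM killer; the kernel log (dmesg) or the
        # job scheduler's log usually confirms it.
        return "killed by SIGKILL (likely out of memory)"
    return f"exit status {proc.returncode}"
```

For example, `run_and_report(["bwa", "index", "elepant_grass_genome.fa"])` would make the cause explicit. The real fix remains running these steps where more memory is available.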
