Problem importing VCF files into a db with GATK

When importing VCF files into a db with GATK, I used the following command:

gatk --java-options "-Xmx50g" GenomicsDBImport \
  -L intervals.list --tmp-dir $tmpdir -R $REF --batch-size 5 \
  --reader-threads 1 --max-num-intervals-to-import-in-parallel 5 \
  --genomicsdb-workspace-path db --sample-name-map cohort.sample_map

The generated db folder contains these files:

drwx------ 4 root root  4096 Aug  4 16:55 Chr1$1$48169259
-rwx------ 1 root root     0 Aug  4 16:55 __tiledb_workspace.tdb
-rwx------ 1 root root 17408 Aug  4 16:55 vcfheader.vcf
-rwx------ 1 root root  3417 Aug  4 16:55 vidmap.json

It looks incomplete to me: callset.json, which the log says will be written, is not there.
The log output is as follows:
Using GATK jar /share/work/biosoft/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx50g -jar /share/work/biosoft/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar GenomicsDBImport -L intervals.list --tmp-dir /work/my_reseq/tmp -R /work/my_reseq/ref/Dongzao.fa --batch-size 5 --reader-threads 1 --max-num-intervals-to-import-in-parallel 5 --genomicsdb-workspace-path db --sample-name-map cohort.sample_map
14:56:31.470 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/work/biosoft/GATK/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:56:31.552 INFO  GenomicsDBImport - ------------------------------------------------------------
14:56:31.556 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.4.0.0
14:56:31.556 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
14:56:31.556 INFO  GenomicsDBImport - Executing as root@ed01a5f7cf62 on Linux v4.15.0-112-generic amd64
14:56:31.556 INFO  GenomicsDBImport - Java runtime: Java HotSpot(TM) 64-Bit Server VM v19.0.1+10-21
14:56:31.556 INFO  GenomicsDBImport - Start Date/Time: August 4, 2025 at 2:56:31 PM CST
14:56:31.557 INFO  GenomicsDBImport - ------------------------------------------------------------
14:56:31.557 INFO  GenomicsDBImport - ------------------------------------------------------------
14:56:31.557 INFO  GenomicsDBImport - HTSJDK Version: 3.0.5
14:56:31.558 INFO  GenomicsDBImport - Picard Version: 3.0.0
14:56:31.558 INFO  GenomicsDBImport - Built for Spark Version: 3.3.1
14:56:31.558 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:56:31.558 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:56:31.558 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:56:31.558 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:56:31.558 INFO  GenomicsDBImport - Deflater: IntelDeflater
14:56:31.558 INFO  GenomicsDBImport - Inflater: IntelInflater
14:56:31.559 INFO  GenomicsDBImport - GCS max retries/reopens: 20
14:56:31.559 INFO  GenomicsDBImport - Requester pays: disabled
14:56:31.559 INFO  GenomicsDBImport - Initializing engine
14:56:31.799 INFO  IntervalArgumentCollection - Processing 393332932 bp from intervals
14:56:31.817 INFO  GenomicsDBImport - Done initializing engine
14:56:32.064 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.4.4-ce4e1b9
14:56:32.065 INFO  GenomicsDBImport - Vid Map JSON file will be written to /work/my_reseq/4.snp_indel/GATK/db/vidmap.json
14:56:32.065 INFO  GenomicsDBImport - Callset Map JSON file will be written to /work/my_reseq/4.snp_indel/GATK/db/callset.json
14:56:32.065 INFO  GenomicsDBImport - Complete VCF Header will be written to /work/my_reseq/4.snp_indel/GATK/db/vcfheader.vcf
14:56:32.065 INFO  GenomicsDBImport - Importing to workspace - /work/my_reseq/4.snp_indel/GATK/db
14:56:32.565 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:32.565 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:32.565 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:32.565 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:32.565 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.775 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.781 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.801 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.801 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.967 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.976 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:56:33.983 INFO  GenomicsDBImport - Importing batch 1 with 5 samples
14:57:24.783 INFO  GenomicsDBImport - Shutting down engine
[August 4, 2025 at 2:57:24 PM CST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.89 minutes.
Runtime.totalMemory()=5838471168
htsjdk.samtools.SAMFormatException: Did not inflate expected amount
        at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:147)
        at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
        at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
        at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
        at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
        at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
        at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
        at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:241)
        at htsjdk.tribble.readers.TabixReader.readLine(TabixReader.java:215)
        at htsjdk.tribble.readers.TabixReader.access$300(TabixReader.java:48)
        at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:434)
        at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:46)
        at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:170)
        at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
        at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:950)
        at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$1$NoMnpIterator.next(GenomicsDBImport.java:941)
        at org.genomicsdb.importer.GenomicsDBImporterStreamWrapper.next(GenomicsDBImporterStreamWrapper.java:110)
        at org.genomicsdb.importer.GenomicsDBImporter.doSingleImport(GenomicsDBImporter.java:583)
        at org.genomicsdb.importer.GenomicsDBImporter.lambda$null$4(GenomicsDBImporter.java:733)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1589)
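Comparing the directory listing with the log makes the failure concrete: the log announces vidmap.json, callset.json and vcfheader.vcf, but callset.json never appeared. A minimal completeness check, assuming the file names printed in this log (plus the TileDB workspace marker) are the ones a finished workspace should contain; `check_gdb_workspace` is a hypothetical helper, not a GATK command:

```shell
#!/usr/bin/env bash
# check_gdb_workspace WORKSPACE_DIR
# Hypothetical helper, not part of GATK. Checks for the files that the
# GenomicsDBImport log says it writes (vidmap.json, callset.json,
# vcfheader.vcf) plus the __tiledb_workspace.tdb marker.
# Prints one line per missing file; returns 0 if all are present, 1 otherwise.
check_gdb_workspace() {
    local ws="$1"
    local missing=0
    local f
    for f in __tiledb_workspace.tdb vidmap.json callset.json vcfheader.vcf; do
        if [ ! -e "$ws/$f" ]; then
            echo "MISSING: $ws/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_gdb_workspace db` on the workspace listed above would report only `db/callset.json`. If anything is missing, delete the partial db directory before re-running, since by default GenomicsDBImport will not import into a workspace path that already exists.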


1 answer

Ti Amo

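This has little to do with the thread settings. Check whether the upstream g.vcf.gz files are intact and whether their index files are valid. If any chromosome is longer than 512 Mb, that can also cause indexing problems (tabix .tbi indexes cannot address coordinates beyond 2^29 bp). Alternatively, continue to the next step and see whether converting the db to a VCF produces a more specific error.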
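The checks suggested in this answer can be scripted. A sketch, assuming cohort.sample_map uses the usual two-column, tab-separated `sample_name<TAB>path` layout and that each g.vcf.gz sits next to a `.tbi` index (the reader in the stack trace is htsjdk's TabixReader). BGZF files are valid multi-member gzip files, so `gzip -t` re-inflates every block and checks its CRC and uncompressed length, which is the kind of mismatch "Did not inflate expected amount" reports. `check_sample_map` and `long_contigs` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of GATK.

# check_sample_map MAP
# MAP is assumed to hold one "sample_name<TAB>path/to/sample.g.vcf.gz"
# entry per line. Prints one line per problem; returns 1 if any were found.
check_sample_map() {
    local map="$1" bad=0 name path
    # "|| [ -n "$name" ]" keeps a final line that lacks a trailing newline.
    while IFS=$'\t' read -r name path || [ -n "$name" ]; do
        if [ -z "$name" ]; then
            continue                      # skip blank lines
        fi
        if [ ! -s "$path" ]; then
            echo "MISSING OR EMPTY: $path ($name)"
            bad=1
            continue
        fi
        # BGZF is multi-member gzip: gzip -t inflates every block and
        # verifies its CRC32 and uncompressed length.
        if ! gzip -t "$path" 2>/dev/null; then
            echo "CORRUPT: $path ($name)"
            bad=1
        fi
        # A missing index, or one older than the data file, is suspect.
        if [ ! -e "$path.tbi" ]; then
            echo "NO INDEX: $path.tbi ($name)"
            bad=1
        elif [ "$path.tbi" -ot "$path" ]; then
            echo "STALE INDEX: $path.tbi is older than $path ($name)"
            bad=1
        fi
    done < "$map"
    return "$bad"
}

# long_contigs FAI
# Prints name and length of contigs in a samtools faidx .fai file that are
# longer than 2^29 bp (536870912), the range a tabix .tbi index can address.
long_contigs() {
    awk -F'\t' '$2 > 536870912 { print $1 "\t" $2 }' "$1"
}
```

Typical use is `check_sample_map cohort.sample_map` followed by `long_contigs "$REF.fai"` (the .fai is produced by `samtools faidx`). Any g.vcf.gz flagged as corrupt has to be regenerated and re-indexed before importing again; a clean report does not rule out every cause, but it removes the most common one.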

Below is the GATK developers' reply about "htsjdk.samtools.SAMFormatException: Did not inflate expected amount", together with the eventual solution:

"Did not inflate expected amount" Error – GATK


Asked by 郭老师 (Teacher Guo) 5 days ago
