Genes in the TCGA database are identified by Ensembl IDs, but some analysis tools require gene symbols as input, so the Ensembl IDs must first be converted to gene symbols.
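TCGA expression files downloaded from GDC typically label genes with versioned Ensembl IDs (the ID followed by `.` and a version number), whereas the ENSEMBL keys in org.Hs.eg.db are unversioned. The suffix therefore has to be removed before conversion. The following is a minimal base-R sketch; the helper name `strip_ensembl_version` is hypothetical, and the version numbers in the example are made up for illustration.

```r
# Hypothetical helper: remove a trailing ".<digits>" version suffix from
# Ensembl gene IDs so they match the unversioned keys in org.Hs.eg.db.
# IDs that carry no version suffix are returned unchanged.
strip_ensembl_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Example input; the version numbers here are invented for illustration
tcga_ids <- c("ENSG00000000971.15", "ENSG00000001084.9", "ENSG00000001460")
clean_ids <- strip_ensembl_version(tcga_ids)
clean_ids  # expected: "ENSG00000000971" "ENSG00000001084" "ENSG00000001460"
```

The cleaned vector can then be passed to `bitr()` in place of the raw IDs.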
This post shows how to perform the conversion with clusterProfiler:
# Load the required packages
> library(clusterProfiler)
> library(org.Hs.eg.db)
# ID types that org.Hs.eg.db can convert between
> keytypes(org.Hs.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL" 
[10] "GENENAME"     "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"        
[19] "PFAM"         "PMID"         "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"       "UNIGENE"      "UNIPROT"     
# Example Ensembl gene IDs to convert
> test_id <- c("ENSG00000000971", "ENSG00000001084", "ENSG00000001460", "ENSG00000001461", "ENSG00000001626", "ENSG00000001630") 
# Convert the IDs with bitr()
> gene_ids <- bitr(test_id, fromType="ENSEMBL", toType=c("SYMBOL", "GENENAME"), OrgDb="org.Hs.eg.db")
'select()' returned 1:1 mapping between keys and columns
# Inspect the conversion result
> gene_ids
          ENSEMBL  SYMBOL                                            GENENAME
1 ENSG00000000971     CFH                                 complement factor H
2 ENSG00000001084    GCLC         glutamate-cysteine ligase catalytic subunit
3 ENSG00000001460   STPG1              sperm tail PG-rich repeat containing 1
4 ENSG00000001461  NIPAL3                       NIPA like domain containing 3
5 ENSG00000001626    CFTR cystic fibrosis transmembrane conductance regulator
6 ENSG00000001630 CYP51A1      cytochrome P450 family 51 subfamily A member 1
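`bitr()` drops IDs that it cannot map and issues a warning about them. When an ID has several matches, `select()` reports a 1:many mapping. For these reasons, the result usually needs some cleanup before it is joined back onto an expression matrix. The following is a minimal base-R sketch. It assumes a matrix with Ensembl IDs as row names and a `bitr()`-style data frame with `ENSEMBL` and `SYMBOL` columns. The function name `ensembl_to_symbol_matrix` is hypothetical. Keeping the highest-mean row when several IDs share a symbol is one common convention, chosen here for illustration and not part of clusterProfiler. The toy counts are invented.

```r
# Hypothetical helper: replace Ensembl row names with gene symbols.
# - keeps only the first symbol per Ensembl ID (handles 1:many mappings)
# - drops rows whose ID has no symbol in the mapping
# - if several rows share a symbol, keeps the row with the highest mean
ensembl_to_symbol_matrix <- function(expr, mapping) {
  mapping <- mapping[!duplicated(mapping$ENSEMBL), ]
  symbols <- as.character(mapping$SYMBOL[match(rownames(expr), mapping$ENSEMBL)])
  has_symbol <- !is.na(symbols)
  expr <- expr[has_symbol, , drop = FALSE]
  symbols <- symbols[has_symbol]
  # rank rows by mean expression, keep the first (highest) row per symbol,
  # then sort the kept positions to restore the original row order
  ord <- order(rowMeans(expr), decreasing = TRUE)
  kept <- sort(ord[!duplicated(symbols[ord])])
  out <- expr[kept, , drop = FALSE]
  rownames(out) <- symbols[kept]
  out
}

# Mapping copied from the bitr() output above
gene_ids <- data.frame(
  ENSEMBL = c("ENSG00000000971", "ENSG00000001084", "ENSG00000001460",
              "ENSG00000001461", "ENSG00000001626", "ENSG00000001630"),
  SYMBOL  = c("CFH", "GCLC", "STPG1", "NIPAL3", "CFTR", "CYP51A1")
)
# Invented toy counts; "UNMAPPED_ID_EXAMPLE" stands in for an ID bitr() dropped
expr <- matrix(c(10, 20,
                  5,  7,
                  0,  1,
                  3,  3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("ENSG00000000971", "ENSG00000001084",
                                 "ENSG00000001626", "UNMAPPED_ID_EXAMPLE"),
                               c("sample1", "sample2")))
expr_sym <- ensembl_to_symbol_matrix(expr, gene_ids)
rownames(expr_sym)  # expected: "CFH" "GCLC" "CFTR"
```

Summing or averaging duplicated rows is an equally valid alternative to keeping the highest-mean row. The right choice depends on the downstream analysis.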