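Hello! Before running genomic selection (GS), I need to split my materials into a training set and a prediction set. Is there code that can do this split randomly and quickly at a specified ratio and write the result to files? For example: 20% of individuals for training and 80% for prediction; 40% and 60%; 60% and 40%; 80% and 20%; and so on.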

Genomic selection (GS)

Category: Resequencing


1 answer

omicsgene - Bioinformatics 2025-08-08 17:07

Expertise: resequencing, genetic evolution, transcriptomics, GWAS

If you know R, the code below splits samples randomly at a given proportion:

R package used: caret

library(caret)
phenotypes <- read.table("phenotypes.txt", sep = "\t", header = TRUE)
head(phenotypes)
#sampleID weight gender
#ID1    150   F
#ID2    160   F
#ID3    290   M
#ID4    155   M

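# Ignoring groups: randomly draw 80% of the samples.
# createDataPartition() stratifies on y, so a column of unique sample IDs is not a
# suitable y (each ID is its own class); base R sample() is used here instead.
set.seed(123)  # fix the seed so the split is reproducible
n_train <- round(0.8 * nrow(phenotypes))
train_indices <- sample(seq_len(nrow(phenotypes)), size = n_train)
train_set <- phenotypes[train_indices, ]
test_set  <- phenotypes[-train_indices, ]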

# Taking groups into account: e.g. randomly draw 80% within each gender

train_indices <- createDataPartition(y = phenotypes$gender, p = 0.8, list = FALSE)
train_set <- phenotypes[train_indices, ]
test_set  <- phenotypes[-train_indices, ]
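
The question also asks for several ratios and for output files, which the snippet above does not cover. The following is a minimal base-R sketch of one way to do that, not a definitive implementation. The toy `phenotypes` table, the helper name `split_and_write`, the output directory, and the `train_XX.txt` / `test_XX.txt` file names are all illustrative assumptions; replace them with your own data and paths.

```r
# Toy phenotype table standing in for phenotypes.txt (hypothetical data).
phenotypes <- data.frame(
  sampleID = paste0("ID", 1:50),
  weight   = seq(150, 248, by = 2),
  gender   = rep(c("F", "M"), each = 25),
  stringsAsFactors = FALSE
)

# Split the table once at proportion p (fraction used for training) and
# write both sets as tab-delimited files. out_dir and the file-name
# pattern are illustrative choices, not a fixed convention.
split_and_write <- function(dat, p, out_dir = tempdir(), seed = 123) {
  set.seed(seed)                                   # reproducible split
  all_idx   <- seq_len(nrow(dat))
  train_idx <- sample(all_idx, size = round(p * nrow(dat)))
  test_idx  <- setdiff(all_idx, train_idx)         # everything not in training

  train_file <- file.path(out_dir, sprintf("train_%d.txt", round(p * 100)))
  test_file  <- file.path(out_dir, sprintf("test_%d.txt", round((1 - p) * 100)))

  write.table(dat[train_idx, ], train_file, sep = "\t",
              quote = FALSE, row.names = FALSE)
  write.table(dat[test_idx, ], test_file, sep = "\t",
              quote = FALSE, row.names = FALSE)

  c(train = train_file, test = test_file)          # return the paths written
}

# Training proportions from the question: 20%, 40%, 60%, 80%.
props <- c(0.2, 0.4, 0.6, 0.8)
files <- lapply(props, function(p) split_and_write(phenotypes, p))
```

Each element of `files` holds the paths of one training/prediction pair. To stratify by a grouping column such as gender, the `sample()` call could be swapped for `createDataPartition()` as in the answer above.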
