Implementing statistical tests in R --- the t-test

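Introduction

The t-test, also called Student's t-test, is mainly used when the sample size is small (for example n < 30), the population standard deviation σ is unknown, and the data come from a normally distributed population. It uses the theory of the t distribution to estimate how probable the observed difference is under the null hypothesis, and on that basis judges whether the difference between two means is significant.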

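1. Conditions of use

A population mean is known; a sample mean and the sample standard deviation can be obtained; the sample comes from a normal or approximately normal population.

Note: For a one-sample t-test, a reference value or population mean must be given together with a set of quantitative observations, and those observations must follow a normal distribution. For a paired-samples t-test, the differences within each pair must follow a normal distribution. For an independent-samples t-test, the individuals must be independent of one another, both groups must come from normally distributed populations, and the variances must be homogeneous. These conditions are required because the computed t statistic follows a t distribution only when they hold, and the t distribution is the theoretical basis of the test. Analysis of variance, covered later, has the same preconditions as the independent-samples t-test, namely normality and homogeneity of variance. (See "t检验和方差分析的前提条件及应用误区" on Baidu Wenku, linked at the end of this article, for a very detailed discussion.)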
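
The normality condition can be checked before running the test. The following is a minimal sketch, assuming the Shapiro-Wilk test (`shapiro.test` in base R) is an acceptable check; the source does not prescribe a particular normality test, and the 0.05 threshold is likewise an assumption. The data are the pond oxygen readings from the one-sample example in section 3.

```r
# Oxygen readings (mg/L) from the one-sample example in section 3
sites <- c(4.33, 4.62, 3.89, 4.14, 4.78, 4.64, 4.52, 4.55, 4.48, 4.26)

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
sw <- shapiro.test(sites)
print(sw)

# A p-value above the chosen level (0.05 here, an assumption) means that
# normality is not rejected; it does not prove the data are normal.
normal_ok <- sw$p.value > 0.05
normal_ok
```

For paired data the same check would be applied to the within-pair differences rather than to the two raw samples.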

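2. Types

One-sample t-test, and two-sample t-tests (the independent-samples t-test and the paired-samples t-test).

Note: the difference between the one-sample and the independent-samples t-test. The one-sample t-test (One-Sample T Test) compares the mean of the population from which the sample was drawn with a known population mean. The independent-samples t-test (Independent-Samples T Test) compares the means of two samples.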

3. R examples

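—————————#One-sample t-test#——————————————
# The long-term mean oxygen content of the water in a fish pond is 4.5 mg/L. Water was sampled at 10 points in the pond, and the measured oxygen contents (mg/L) were:
# 4.33,4.62,3.89,4.14,4.78,4.64,4.52,4.55,4.48,4.26. Does the oxygen content in this sample differ significantly from the long-term mean?
# (R is case-sensitive, so the vector name must match the one used in the t.test call.)
sites<-c(4.33,4.62,3.89,4.14,4.78,4.64,4.52,4.55,4.48,4.26)
t.test(sites,mu=4.5)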
          One Sample t-test
  
  data:  sites
  t = -0.93574, df = 9, p-value = 0.3738
alternative hypothesis: true mean is not equal to 4.5
95 percent confidence interval:
 4.230016 4.611984
sample estimates:
mean of x 
    4.421 
Since p = 0.3738 > 0.05, the oxygen content of the sampled water does not differ significantly from the long-term mean.
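
To make clear what `t.test` reports, the statistic, p-value and confidence interval can be recomputed from their textbook definitions. This is only an illustrative sketch; the variable names (`mu0`, `se`, `t_stat` and so on) are chosen here and are not part of the original post.

```r
sites <- c(4.33, 4.62, 3.89, 4.14, 4.78, 4.64, 4.52, 4.55, 4.48, 4.26)
mu0 <- 4.5                              # hypothesised population mean

n  <- length(sites)
se <- sd(sites) / sqrt(n)               # standard error of the mean
t_stat <- (mean(sites) - mu0) / se      # t = (xbar - mu0) / (s / sqrt(n))
df <- n - 1                             # degrees of freedom

# Two-sided p-value: probability in both tails of the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the mean
ci <- mean(sites) + c(-1, 1) * qt(0.975, df) * se

# Compare with the built-in function
res <- t.test(sites, mu = mu0)
all.equal(t_stat, unname(res$statistic))
all.equal(p_value, res$p.value)
```

Both comparisons should agree, which confirms that the printed t, df and p-value follow directly from the sample mean, the sample standard deviation and n.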

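—————————#Independent-samples t-test#——————————————
# There are two cases: the two population variances are homogeneous, or they are not.
#################Two samples with homogeneous variances
# One-month-old rats were fed either a high-protein or a low-protein diet. After 3 months the weight gain (g) of each group was measured:
# High-protein group: 134,146,106,119,124,161,107,83,113,129,97,123
# Low-protein group: 70,118,101,85,107,132,94
# Does weight gain differ significantly between the two diets?
High<-c(134,146,106,119,124,161,107,83,113,129,97,123)
Low<-c(70,118,101,85,107,132,94)
Group<-c(rep(1,12),rep(0,7))# 1 = High, 0 = Low
x<-c(High,Low)
DATA<-data.frame(x,Group)
DATA$Group<-as.factor(DATA$Group)
# Homogeneity of variance: bartlett.test
bartlett.test(x~Group,data=DATA)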
        Bartlett test of homogeneity of variances

data:  x by Group
Bartlett's K-squared = 0.0066764, df = 1, p-value = 0.9349

# Homogeneity of variance: var.test (F test)
var.test(x~Group,data=DATA)
 F test to compare two variances

data:  x by Group
F = 0.94107, num df = 6, denom df = 11, p-value = 0.9917
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.2425021 5.0909424
sample estimates:
ratio of variances 
          0.941066 

# Homogeneity of variance: leveneTest (also the default homogeneity test in SPSS)
library(car)
leveneTest(DATA$x,DATA$Group)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0088 0.9264
      17              
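# The first two tests compare the variances of the raw data and are sensitive to departures from normality. leveneTest (with its default center = median) instead runs a one-way ANOVA on the absolute deviations of each value from its group median, which makes it more robust; this is why statistical packages commonly use it.
# All three results indicate that the two independent samples have homogeneous variances, so an independent-samples t-test can be carried out.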
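
The Levene statistic can be reproduced in base R, which shows what the test actually measures. This is a sketch assuming the median-centred variant (the `center = median` shown in the output header above); the names `g`, `z` and `levene_tab` are illustrative.

```r
High <- c(134, 146, 106, 119, 124, 161, 107, 83, 113, 129, 97, 123)
Low  <- c(70, 118, 101, 85, 107, 132, 94)
x <- c(High, Low)
g <- factor(c(rep("High", length(High)), rep("Low", length(Low))))

# Absolute deviation of every observation from its own group median
z <- abs(x - ave(x, g, FUN = median))

# One-way ANOVA on those deviations: a large F means the spread differs
levene_tab <- anova(lm(z ~ g))
print(levene_tab)

F_levene <- levene_tab[["F value"]][1]
F_levene
```

The F value should match the 0.0088 printed by `leveneTest` above, on 1 and 17 degrees of freedom.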
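# Note: t.test() defaults to var.equal=FALSE, so the call below prints the Welch test.
# Add var.equal=TRUE to obtain the classical pooled-variance Student test instead.
t.test(High,Low,paired=FALSE)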
        Welch Two Sample t-test

data:  High and Low
t = 1.9319, df = 13.016, p-value = 0.07543
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.263671 40.597005
sample estimates:
mean of x mean of y 
 120.1667  101.0000 
The result (p = 0.07543 > 0.05) shows no significant difference in weight gain between rats fed the two diets.
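
Because the variance tests did not reject homogeneity, the classical pooled-variance test is also applicable here. The sketch below runs both versions side by side; `student` and `welch` are names chosen for illustration.

```r
High <- c(134, 146, 106, 119, 124, 161, 107, 83, 113, 129, 97, 123)
Low  <- c(70, 118, 101, 85, 107, 132, 94)

# Student's t-test: variances pooled, df = n1 + n2 - 2
student <- t.test(High, Low, var.equal = TRUE)

# Welch's t-test (R's default): variances not pooled, fractional df
welch <- t.test(High, Low)

print(student)

# Degrees of freedom used by each version
c(student_df = unname(student$parameter),
  welch_df   = unname(welch$parameter))
```

With these data the pooled test uses 17 degrees of freedom and also fails to reach significance at the 0.05 level, so the conclusion does not depend on which version is used.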

#################Two samples with unequal variances
# The iron content (mg/kg) of a feed was measured in two regions, Jia and Yi, with the following results:
# Jia: 5.9,3.8,6.5,18.3,18.2,16.1,7.6
# Yi: 7.5,0.5,1.1,3.2,6.5,4.1,4.7
# Does the iron content of this feed differ significantly between the two regions?
JIA<-c(5.9,3.8,6.5,18.3,18.2,16.1,7.6)
YI<-c(7.5,0.5,1.1,3.2,6.5,4.1,4.7)
Content<-c(JIA,YI)
Group<-c(rep(1,7),rep(2,7))# 1 = Jia, 2 = Yi
data<-data.frame(Content,Group)
data$Group<-as.factor(Group)

# Homogeneity of variance: bartlett.test
bartlett.test(Content~Group)
 Bartlett test of homogeneity of variances

data:  Content by Group
Bartlett's K-squared = 3.9382, df = 1, p-value = 0.0472

# Homogeneity of variance: var.test (F test)
var.test(Content~Group)
 F test to compare two variances

data:  Content by Group
F = 5.9773, num df = 6, denom df = 6, p-value = 0.04695
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
  1.02707 34.78643
sample estimates:
ratio of variances 
            5.9773 
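# Both tests reject homogeneity of variance (p < 0.05), so the unequal-variance (Welch) test is used, i.e. var.equal=FALSE.
# The two samples must be passed, not the group labels: t.test(Content,Group,...) would compare
# Content with the numeric codes 1 and 2 (giving "mean of y" = 1.5), which is not the intended test.
t.test(JIA,YI,paired=FALSE,var.equal=FALSE)
# Worked out by hand from the data above (rounded values, not printed R output):
# the group means are 10.914 (Jia) and 3.943 (Yi), and the Welch t is about 2.70 on about 7.95 df.
# This exceeds the two-sided 5% critical value (about 2.31), so 0.02 < p < 0.05,
# and the iron content of the feed differs significantly between the two regions.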
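
When the data are held in long format (one column of values, one of group labels), the groups have to be supplied through a formula or as two separate vectors; a label vector passed as the second argument would be treated as data. The sketch below uses the formula interface and also recomputes the Welch-Satterthwaite degrees of freedom by hand. The data frame `iron` and column `Region` are illustrative names, not from the original post.

```r
JIA <- c(5.9, 3.8, 6.5, 18.3, 18.2, 16.1, 7.6)
YI  <- c(7.5, 0.5, 1.1, 3.2, 6.5, 4.1, 4.7)

# Long format: one row per measurement
iron <- data.frame(
  Content = c(JIA, YI),
  Region  = factor(rep(c("Jia", "Yi"), each = 7))
)

# Formula interface: Content split by the two levels of Region
welch <- t.test(Content ~ Region, data = iron, var.equal = FALSE)
print(welch)

# Welch-Satterthwaite approximation of the degrees of freedom
v1 <- var(JIA) / length(JIA)
v2 <- var(YI) / length(YI)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(JIA) - 1) + v2^2 / (length(YI) - 1))
all.equal(df_welch, unname(welch$parameter))
```

The reported group means should be those of the Jia and Yi samples, and the hand-computed degrees of freedom should agree with the value `t.test` prints.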


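—————————#Paired-samples t-test#——————————————
# A study of the effect of water flushing on the spawning rate of grass carp recorded the spawning rate (%) before and after flushing:
# Before: 82.5,85.2,87.6,89.9,89.4,90.1,87.8,87.0,88.5,92.4
# After: 91.7,94.2,93.3,97.0,96.4,91.5,97.2,96.2,98.5,95.8
# Question: does the spawning rate of grass carp broodstock differ before and after flushing?
Before<-c(82.5,85.2,87.6,89.9,89.4,90.1,87.8,87.0,88.5,92.4)
After<-c(91.7,94.2,93.3,97.0,96.4,91.5,97.2,96.2,98.5,95.8)
t.test(Before,After,paired=TRUE)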
        Paired t-test

data:  Before and After
t = -7.8601, df = 9, p-value = 2.548e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.1949 -5.0851
sample estimates:
mean of the differences 
                  -7.14 
Since p = 2.548e-05 < 0.01, the spawning rate of grass carp broodstock differs very significantly before and after flushing.
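
A paired t-test is the same calculation as a one-sample t-test on the within-pair differences, tested against 0. The sketch below checks this equivalence; `d`, `paired` and `one_sample` are illustrative names.

```r
Before <- c(82.5, 85.2, 87.6, 89.9, 89.4, 90.1, 87.8, 87.0, 88.5, 92.4)
After  <- c(91.7, 94.2, 93.3, 97.0, 96.4, 91.5, 97.2, 96.2, 98.5, 95.8)

# Paired test on the two vectors
paired <- t.test(Before, After, paired = TRUE)

# One-sample test on the differences, H0: mean difference = 0
d <- Before - After
one_sample <- t.test(d, mu = 0)

# Same statistic, degrees of freedom and p-value
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
mean(d)
```

Because only the single set of differences enters the calculation, just one variance is estimated, which is why the paired test has no equal-variance option.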

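------------------------Notes---------------------------
1) A common question (professionals, don't laugh): why does the independent-samples t-test distinguish between equal and unequal variances, while the paired-samples and one-sample t-tests do not? The one-sample test uses a single sample, and the paired test reduces each pair to a single difference, so in both cases only one variance is estimated and there is no second variance to compare it with. Only the independent-samples test estimates two group variances, which may or may not be equal.
2) t.test(x,y,alternative=c("two.sided","less","greater"),mu=0,paired=FALSE,
var.equal=FALSE,conf.level=0.95......)
If only x is supplied, a test on the mean of a single normal population is performed; if both x and y are supplied, a test on the means of two populations is performed. alternative specifies the alternative hypothesis:
two.sided (the default) is a two-sided test, less is a one-sided test that the true mean (or difference) is less than mu, and greater is a one-sided test that it is greater than mu. mu is the value under the null hypothesis, μ0. If paired=TRUE, a paired test is performed;
x and y must then both be given and must have the same length. Missing values are removed by default (pair by pair if paired is TRUE). var.equal is a logical value:
var.equal=TRUE treats the two sample variances as equal, and var.equal=FALSE (the default) treats them as unequal (Welch test). conf.level is the confidence level, 1-α, usually 0.95.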
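
As an illustration of the `alternative` and `conf.level` arguments, the sketch below reuses the grass carp data to test the one-sided hypothesis that the spawning rate after flushing is greater than before, at a 99% confidence level. The choice of a one-sided hypothesis and of 0.99 is an assumption made for the example, not part of the original analysis.

```r
Before <- c(82.5, 85.2, 87.6, 89.9, 89.4, 90.1, 87.8, 87.0, 88.5, 92.4)
After  <- c(91.7, 94.2, 93.3, 97.0, 96.4, 91.5, 97.2, 96.2, 98.5, 95.8)

# Two-sided paired test (the default alternative)
two_sided <- t.test(After, Before, paired = TRUE)

# One-sided: H1 is that mean(After - Before) > 0, with a 99% interval
one_sided <- t.test(After, Before, paired = TRUE,
                    alternative = "greater", conf.level = 0.99)
print(one_sided)

# When the observed difference lies in the hypothesised direction,
# the one-sided p-value is half the two-sided one
all.equal(one_sided$p.value, two_sided$p.value / 2)

# A one-sided interval is unbounded on one side
one_sided$conf.int
```

Note that the order of x and y determines the sign of the difference, so `alternative = "greater"` here refers to After minus Before.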

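References

[1] Gu Zhifeng, Ye Naihao, Shi Yaohua. 实用生物统计学 (Practical Biostatistics) [M]. Beijing: Science Press, 2012.

[2] t检验和方差分析的前提条件及应用误区 (Preconditions and common misuses of the t-test and analysis of variance), Baidu Wenku

https://wenku.baidu.com/view/c3f1e06b5727a5e9846a6117.html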

Posted 2019-02-28 22:20