R: Common String-Handling Methods

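This post covers the basic string operations in R: matching, replacing, splitting, extracting substrings, and converting case. The examples use the following two character vectors: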

A = c("abcdgegh")
B = c("abcdgegh", "deghgabcd")

Matching

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)
## ignore.case controls whether matching is case-sensitive. The default (FALSE) is case-sensitive; set ignore.case = TRUE for case-insensitive matching.

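grep() returns the indices of the elements that match, or the matching elements themselves when value = TRUE. With invert = TRUE it returns the elements that do not match instead. grepl() returns a logical vector with one value per element. For example: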

> grep("ab",A)
[1] 1
> grep("ab",B,value=T)
[1] "abcdgegh"  "deghgabcd"
> grep("de",B,invert=F,value=T)
[1] "deghgabcd"
> grep("de",B,invert=T,value=T)
[1] "abcdgegh"
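
The effect of ignore.case, and of the fixed argument that appears in the signatures above, can be illustrated with a small sketch. The vector x is a hypothetical example and is not part of the original data:

```r
# Hypothetical sample vector, used only for illustration
x <- c("Apple", "apple pie", "a.b", "axb")

# Default matching is case-sensitive: only the lowercase "apple" matches
idx_default <- grep("apple", x)

# ignore.case = TRUE matches "Apple" as well
idx_nocase <- grep("apple", x, ignore.case = TRUE)

# In a regular expression "." matches any character, so "axb" matches too
hits_regex <- grep("a.b", x, value = TRUE)

# fixed = TRUE treats the pattern as a literal string
hits_fixed <- grep("a.b", x, value = TRUE, fixed = TRUE)

print(idx_default)
print(idx_nocase)
print(hits_regex)
print(hits_fixed)
```

Using fixed = TRUE is usually the safer choice when the pattern is plain text that may contain regex metacharacters such as "." or "(".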

## grepl(), like grep(), takes a regular expression, so "^" can be used to anchor the match to the start of the string

> grepl("ab",A)
[1] TRUE
> grepl("ab",B)
[1] TRUE TRUE
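
A minimal sketch of anchored matching on the same vector B follows. The "$" anchor and startsWith() are not mentioned in the original text; startsWith() is a base R function available from R 3.3.0 and matches a literal prefix rather than a regular expression:

```r
B <- c("abcdgegh", "deghgabcd")

# "^" anchors the pattern to the start of each string
starts_ab <- grepl("^ab", B)

# "$" anchors the pattern to the end of each string
ends_cd <- grepl("cd$", B)

# startsWith() compares a literal prefix, with no regex involved
starts_ab_literal <- startsWith(B, "ab")

print(starts_ab)          # TRUE FALSE
print(ends_cd)            # FALSE TRUE
print(starts_ab_literal)  # TRUE FALSE
```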

Matching and replacing

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)
chartr(old, new, x)

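sub() replaces only the first match in each string, gsub() replaces every match, and chartr() translates individual characters. All three return the modified strings. Because "ab" occurs only once in each element of B, sub() and gsub() give the same result here: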

> sub("ab","AB",B)
[1] "ABcdgegh"  "deghgABcd"

> gsub("ab","AB",B)
[1] "ABcdgegh"  "deghgABcd"

> chartr("a","A",B)
[1] "Abcdgegh"  "deghgAbcd"
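
Since the pattern occurs only once per element in the examples above, the difference between sub() and gsub() does not show. A short sketch with a hypothetical string s, in which "ab" occurs twice, makes it visible:

```r
# Hypothetical string in which the pattern occurs twice
s <- "abcabc"

first_only <- sub("ab", "AB", s)    # replaces the first match only
all_matches <- gsub("ab", "AB", s)  # replaces every match

# chartr() maps characters one-to-one: a -> x, b -> y, c -> z
translated <- chartr("abc", "xyz", s)

print(first_only)   # "ABcabc"
print(all_matches)  # "ABcABc"
print(translated)   # "xyzxyz"
```

Note that chartr() requires old and new to describe a character-by-character mapping, so it is not a substitute for pattern-based replacement.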

Splitting and extracting substrings

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
substr(x, start, stop)
substring(text, first, last = 1000000L)

For example:

> strsplit(A,"d")
[[1]]
[1] "abc"  "gegh"
> unlist(strsplit(A,"d"))
[1] "abc"  "gegh"

> substr(A,1,3)
[1] "abc"
> substr(B,1,3)
[1] "abc" "deg"

> substring(A,1,last=4)
[1] "abcd"
> substring(B,1,last=4)
[1] "abcd" "degh"
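
A few further cases are sketched below, under the assumption that they are the ones most likely to trip up a reader: splitting on a regex metacharacter, assigning into a substring, and passing vectors of positions to substring(). The input "a.b.c" is a hypothetical example:

```r
A <- "abcdgegh"

# split is a regular expression by default, so "." matches every character;
# fixed = TRUE splits on a literal dot instead
parts_regex <- strsplit("a.b.c", ".")[[1]]
parts_fixed <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

# substr() can also be used on the left-hand side to overwrite characters
modified <- A
substr(modified, 1, 3) <- "XYZ"

# substring() recycles vectors of start and end positions
windows <- substring(A, 1:3, 3:5)

# Omitting last extracts through to the end of the string
tail_part <- substring(A, 5)

print(parts_fixed)  # "a" "b" "c"
print(modified)     # "XYZdgegh"
print(windows)      # "abc" "bcd" "cdg"
print(tail_part)    # "gegh"
```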

Case conversion

## Convert to upper case

toupper(x)

## Convert to lower case

tolower(x)

## Convert case according to the upper argument (lower case by default)

casefold(x, upper = FALSE)

For example:

> toupper(A)
[1] "ABCDGEGH"
> tolower(toupper(A))
[1] "abcdgegh"
> casefold(A,upper=T)
[1] "ABCDGEGH"
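
One common use of these functions is comparing strings without regard to case. The sketch below uses hypothetical inputs and also shows the default behaviour of casefold():

```r
# With the default upper = FALSE, casefold() behaves like tolower()
lowered <- casefold("ABCDGEGH")

# Normalising both sides to lower case gives a case-insensitive comparison
same_ignoring_case <- tolower("R Language") == tolower("r language")

print(lowered)             # "abcdgegh"
print(same_ignoring_case)  # TRUE
```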

In addition, the stringr package is dedicated to string handling and provides many more string operations.
package: stringr
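
As a rough sketch, assuming the stringr package is installed, the base R calls shown earlier have close stringr counterparts. The selection below is illustrative and not exhaustive, and the string "abcabc" is a hypothetical input:

```r
A <- "abcdgegh"
B <- c("abcdgegh", "deghgabcd")

# Run only if stringr is available (assumption: installed from CRAN)
if (requireNamespace("stringr", quietly = TRUE)) {
  detected <- stringr::str_detect(B, "^ab")                   # like grepl()
  replaced <- stringr::str_replace_all("abcabc", "ab", "AB")  # like gsub()
  pieces <- stringr::str_split(A, "d")[[1]]                   # like strsplit()
  prefix <- stringr::str_sub(B, 1, 3)                         # like substr()
  upper <- stringr::str_to_upper(A)                           # like toupper()

  print(detected)
  print(replaced)
  print(pieces)
  print(prefix)
  print(upper)
}
```

The stringr functions take the string as the first argument and the pattern second, which is the reverse of grep() and gsub().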

Posted on 2018-05-31 14:28
Category: R
