Regex; eliminate all punctuation except
I have the following regex, which splits on any whitespace or punctuation character. How can I ... from
    X <- "I'm not that good at regex yet, but am getting better!"
    strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)

    [1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
    [10] "," "" "but" "am" "getting" "better" "!"
It isn't clear to me what result you want, but you can use a negated class like the one in this answer.
    strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]

    [1] "I'm" "not" "that" "good" "at" "regex" "yet,"
    [8] "but" "am" "getting" "better" "!"
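To see why the double negative in that class works, it can help to list the characters it matches on its own. The following is a minimal sketch, assuming a made-up sample string `s` (not from the question): `[^,'[:^punct:]]` excludes the comma, the apostrophe, and everything that is not punctuation, so only punctuation other than `,` and `'` is left.

```r
# Hypothetical sample string, chosen only because it mixes several
# punctuation characters with letters.
s <- "a.b,c'd;e!"

# [:^punct:] is the PCRE negated POSIX class, so perl = TRUE is needed.
matched <- regmatches(s, gregexpr("[^,'[:^punct:]]", s, perl = TRUE))[[1]]
print(matched)
```

The comma and the apostrophe do not appear in `matched`, which is why they stay attached to their words when the class is used inside the lookahead.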
If the next character to the right is
    [[:space:]]|(?=(?![',])[[:punct:]])
                   ^^^^^^^^
See the regex demo.
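Details

- [[:space:]] - any whitespace character
- | - or
- (?=(?![',])[[:punct:]]) - a positive lookahead that requires, immediately to the right of the current position, a punctuation character that is not ' or , (the nested negative lookahead (?![',]) rejects those two, so any other punctuation character qualifies).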
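The effect of the nested negative lookahead can be checked by comparing what `[[:punct:]]` matches with and without the `(?![',])` guard. This is only an illustrative sketch: the variable names `all_punct` and `split_punct` are introduced here, and the guard is tested as a consuming pattern rather than inside the outer lookahead.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Every punctuation character in the string, in order of appearance.
all_punct <- regmatches(X, gregexpr("[[:punct:]]", X, perl = TRUE))[[1]]

# The same class, guarded by the negative lookahead that rejects ' and ,
split_punct <- regmatches(X, gregexpr("(?![',])[[:punct:]]", X, perl = TRUE))[[1]]

print(all_punct)
print(split_punct)
```

Only the characters left in `split_punct` can trigger a split, so the apostrophe in "I'm" and the comma after "yet" are left alone.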
See the online R demo:
    X <- "I'm not that good at regex yet, but am getting better!"
    strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)

    [[1]]
    [1] "I'm" "not" "that" "good" "at" "regex" "yet,"
    [8] "but" "am" "getting" "better" "!"
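If the set of punctuation to keep may change, the pattern can be built from a vector of characters. The helper below is a hypothetical sketch, not part of the original answer: `split_keep` and its `keep` argument are names made up here, and it assumes `keep` contains only characters that need no escaping inside a bracket expression (so not `]`, `^`, `-` or `\`).

```r
# Hypothetical helper: split on whitespace and on punctuation, except for
# the punctuation characters listed in `keep`.
# Assumption: the characters in `keep` are safe to paste into [...] unescaped.
split_keep <- function(x, keep = c("'", ",")) {
  pattern <- paste0(
    "[[:space:]]|(?=(?![", paste(keep, collapse = ""), "])[[:punct:]])"
  )
  strsplit(x, pattern, perl = TRUE)[[1]]
}

X <- "I'm not that good at regex yet, but am getting better!"
tokens <- split_keep(X)
print(tokens)
```

With the default `keep`, the generated pattern is the same string as the one used above, so the tokens should match the output already shown.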