How can I capture the results of splitting a string in elisp?

I'm working in elisp, and I have a string that represents a list of items. The string looks like

"apple orange 'tasty things' 'my lunch' zucchini 'my dinner'"

I'm trying to split it into

("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")

This is a familiar problem. What's stopping me isn't the regex so much as the specifics of elisp.

What I want to do is run a loop like this:

(while (> (length my-string) 0) do-work)

where do-work is:

apply the regex \\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\) to my-string
append \\1 to my list of results
rebind my-string to \\2
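The loop described above can be written with built-in functions alone. The following is a minimal sketch and does not come from the thread. my-split-loop is a hypothetical name. The regex is an assumption: it drops the quotes and treats any run of characters other than whitespace or a quote as a word. The regex is anchored with \\` at the front of the remaining string. On each pass the sketch collects the item, then continues with the substring after (match-end 0).

```elisp
(defun my-split-loop (my-string)
  "Split MY-STRING into bare words and '...' items, without the quotes.
Hypothetical helper following the do-work loop from the question."
  (let ((re "\\`[[:space:]]*\\(?:'\\([^']*\\)'\\|\\([^[:space:]']+\\)\\)")
        (result '()))
    ;; Keep going while text remains and an item matches at its front.
    (while (and (> (length my-string) 0)
                (string-match re my-string))
      ;; "Append \1": group 1 is a quoted item, group 2 a bare word.
      (push (or (match-string 1 my-string) (match-string 2 my-string))
            result)
      ;; "Rebind my-string to the rest": drop everything matched so far.
      (setq my-string (substring my-string (match-end 0))))
    ;; Items were pushed in reverse order.
    (nreverse result)))
```

With the sample string, (my-split-loop "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'") returns ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner"). An unterminated quote simply ends the loop without reporting an error. Each substring call copies the remainder, which is fine for short strings. For long ones, passing a start index to string-match avoids the copying.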

However, I can't figure out how to get split-string or replace-regexp-in-string to do this.

How can I split this string into values I can work with?

(Alternatively: "Which built-in Emacs function haven't I found yet?")

Something like this, but without regular expressions:

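(require 'cl-lib)

(defun parse-quotes (string)
  "Split STRING on spaces, treating '...' as a single item.
A backslash escapes the next character."
  (let ((i 0) result current quotep escapedp word)
    (while (< i (length string))
      (setq current (aref string i))
      (cond
       ;; An unquoted space ends the current word.
       ((and (char-equal current ?\s)
             (not quotep))
        (when word (push word result))
        (setq word nil escapedp nil))
       ;; An unescaped quote outside a quoted item opens one.
       ((and (char-equal current ?\')
             (not escapedp)
             (not quotep))
        (setq quotep t escapedp nil))
       ;; An unescaped quote inside a quoted item closes it.
       ((and (char-equal current ?\')
             (not escapedp))
        (push word result)
        (setq quotep nil word nil escapedp nil))
       ;; A backslash escapes the next character; two backslashes give one.
       ((char-equal current ?\\)
        (when escapedp (push current word))
        (setq escapedp (not escapedp)))
       ;; Any other character belongs to the current word.
       (t (setq escapedp nil)
          (push current word)))
      (cl-incf i))
    (when quotep
      (error "Unbalanced quotes at %d"
             (- (length string) (length word))))
    ;; Keep the last word if the string did not end with a space or quote.
    (when word (push word result))
    ;; Each word and RESULT itself were built in reverse order.
    (mapcar (lambda (x) (cl-coerce (reverse x) 'string))
            (reverse result))))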

(parse-quotes "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")

(parse-quotes "apple orange 'tasty thing\\'s' 'my lunch' zucchini 'my dinner'")
("apple" "orange" "tasty thing's" "my lunch" "zucchini" "my dinner")

(parse-quotes "apple orange 'tasty things' 'my lunch zucchini 'my dinner'")
;; Debugger entered--Lisp error: (error "Unbalanced quotes at 52")

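Bonus: it also lets you escape quotes with "\". It reports unbalanced quotes, meaning the end of the string is reached without finding the match for an opening quote.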

Comment: You might want to look at split-string-and-unquote.
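split-string-and-unquote (defined in subr.el) splits on whitespace and reads double-quoted substrings as Lisp strings. It therefore does not handle the single quotes in this input directly. The following is a minimal sketch that swaps the quote characters first. It assumes the items contain no apostrophes, double quotes or backslashes. my-split-with-unquote is a hypothetical name.

```elisp
(defun my-split-with-unquote (string)
  "Hypothetical wrapper around `split-string-and-unquote'.
Assumes STRING contains no apostrophes, double quotes or backslashes."
  ;; Turn every ' into " so each quoted item becomes a Lisp string
  ;; literal that split-string-and-unquote can read.
  (split-string-and-unquote (subst-char-in-string ?\' ?\" string)))

(my-split-with-unquote "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```

The swap is the fragile part. An apostrophe in a word such as thing's would become a double quote and break the split.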

Here is a simple way to implement your algorithm with a temporary buffer. I don't know whether there is a way to do it with replace-regexp-in-string or split-string.

(defun my-split (string)
  (with-temp-buffer
    ;; Insert the string plus a trailing space, so that the final
    ;; (.+) group always has at least one character to match.
    (insert string " ")
    ;; Go back to the beginning of the buffer.
    (goto-char (point-min))
    (let ((result nil))
      ;; Search for the regexp (and just return nil if nothing is found).
      (while (re-search-forward "\\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\)" nil t)
        ;; (match-string 1) is "\\1"; append it after the current list.
        (setq result (append result (list (match-string 1))))
        ;; Go back to the beginning of the second part ("\\2").
        (goto-char (match-beginning 2)))
      result)))

Example:

(my-split "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'") ==> ("apple" "orange" "'tasty things'" "'my lunch'" "zucchini" "'my dinner'")
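The example output above keeps the quotes around multi-word items, but the question asks for them to be removed. The following is a minimal variant that drops them. It assumes every item is either a run of alphanumerics or text between single quotes. my-split-unquoted is a hypothetical name. It relies on re-search-forward leaving point at the end of each match, so it needs no "rest" group and no goto-char.

```elisp
(defun my-split-unquoted (string)
  "Split STRING into words and '...' items, dropping the quotes.
Hypothetical variant of `my-split'."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((result '()))
      ;; re-search-forward leaves point after the match, so the next
      ;; search simply continues from there.
      (while (re-search-forward "'\\([^']*\\)'\\|\\([[:alnum:]]+\\)" nil t)
        ;; Group 1 is the text inside quotes, group 2 a bare word.
        (push (or (match-string-no-properties 1)
                  (match-string-no-properties 2))
              result))
      (nreverse result))))
```

On the sample string this returns ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner"). With an unterminated quote, the quoted alternative fails and the remaining text is split into bare words without any error.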

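If you manipulate strings a lot, you should install the s.el library through the package manager; it provides a large set of string utility functions under a consistent API. For this task you need the function s-match, whose optional third argument is the start position. Then you need a suitable regex; try: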

(concat "\\b[a-z]+\\b" "\\|" "'[a-z ]+'")

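Here \\| means: match either a sequence of letters forming a word (\\b is a word boundary) or a sequence of letters and spaces enclosed in quotes. Then use a loop: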

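;; let s = the given string, r = the regex above
(require 'cl-lib)
(require 's)
;; s-match returns only the matched text, so string-match (same regex,
;; same start) recovers where that match began; the next search then
;; resumes right after it instead of at start + (length match).
(cl-loop for start = 0 then (+ (string-match r s start) (length match))
         for match = (car (s-match r s start))
         while match
         collect match)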

For educational purposes, I also implemented the same thing as a recursive function:

;; cl-labels is cl-lib's version of Common Lisp's labels
;; (local, possibly recursive, function definitions)
(cl-labels
    ((i (start result)
        ;; s-match searches from START
        (let ((match (car (s-match r s start))))
          (if match
              ;; recursive call, resuming right after this match
              (i (+ (string-match r s start) (length match))
                 (cons match result))
            ;; push/nreverse idiom
            (nreverse result)))))
  ;; call the recursive helper function
  (i 0 '()))

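Because Emacs Lisp does not guarantee tail-call optimization, running this on long input can exceed max-lisp-eval-depth. So you can rewrite it with the cl-do* macro: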

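(cl-do* ((start 0)
         (match (car (s-match r s start)) (car (s-match r s start)))
         (result '()))
    ((not match) (reverse result))
  (push match result)
  ;; Resume right after this match: string-match (same regex, same
  ;; start) gives the position where s-match found it.
  (setq start (+ (string-match r s start) (length match))))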
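For comparison, the same iteration can be sketched with only built-ins (string-match, match-string, match-end) and a plain while loop. This version needs neither s.el nor cl-lib and involves no recursion. my-collect-matches is a hypothetical name. string-match sets the match data, so the loop can resume exactly at (match-end 0) instead of computing the next start from the match length.

```elisp
(defun my-collect-matches (regexp string)
  "Return every non-overlapping match of REGEXP in STRING, in order.
Hypothetical helper using only built-in functions."
  (let ((start 0)
        (result '()))
    ;; string-match returns the match position (or nil) and sets the
    ;; match data; the length check keeps START a valid index.
    (while (and (<= start (length string))
                (string-match regexp string start))
      (push (match-string 0 string) result)
      ;; Step past an empty match, which would otherwise repeat forever.
      (setq start (if (= (match-end 0) (match-beginning 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse result)))
```

With r as above, (my-collect-matches r s) yields the same list as the loop versions. For example, (my-collect-matches "[0-9]+" "a1b22c333") returns ("1" "22" "333").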