How can I capture the results of splitting a string in elisp?
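I'm working in elisp, and I have a string that represents a list of items. The string looks like

```elisp
"apple orange 'tasty things' 'my lunch' zucchini 'my dinner'"
```

I'm trying to split it into

```elisp
("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```

This is a familiar problem. My obstacle is not the regex so much as the specifics of elisp.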
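What I'd like to do is run a loop like this:

- `(while (> (length my-string) 0) do-work)`
- apply the regex `\\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\)` to `my-string`
- append `\\1` to my list of results
- rebind `my-string` to `\\2`

However, I can't figure out how to get …

How can I split this string into values I can use?

(Or: "which built-in Emacs function for this have I not found yet?")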
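The loop described in the question can be written directly against the string with `string-match`, `match-string` and `substring`, with no buffer involved. The sketch below is one hedged way to do it; `my-split-by-rebinding` is a hypothetical name, and the regex is adjusted (anchored at the start with `\\``, with the quote characters kept outside the capture group) so that items come out without their quotes. Instead of capturing the rest as `\\2`, it takes everything after `(match-end 0)`.

```elisp
(defun my-split-by-rebinding (my-string)
  "Split MY-STRING into bare words and single-quoted items.
Hypothetical helper following the loop from the question: match an item
at the front of MY-STRING, collect it, then rebind MY-STRING to the rest."
  (let ((re (concat "\\`[[:space:]]*"            ; leading blanks
                    "\\(?:'\\([^']*\\)'"         ; group 1: text inside quotes
                    "\\|\\([[:alnum:]]+\\)\\)")) ; group 2: a bare word
        result)
    ;; Stops when nothing is left, or at the first character that starts
    ;; neither a quoted item nor a word.
    (while (string-match re my-string)
      ;; Collect the quoted text if group 1 matched, else the bare word.
      (push (or (match-string 1 my-string) (match-string 2 my-string))
            result)
      ;; Rebind MY-STRING to whatever follows the match (the "\\2" step).
      (setq my-string (substring my-string (match-end 0))))
    ;; `push' built the list back to front.
    (nreverse result)))

(my-split-by-rebinding
 "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```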
Something similar, but without regular expressions:

```elisp
(require 'cl-lib)

(defun parse-quotes (string)
  (let ((i 0) result current quotep escapedp word)
    (while (< i (length string))
      (setq current (aref string i))
      (cond
       ;; An unquoted space ends the current word.
       ((and (char-equal current ?\s) (not quotep))
        (when word (push word result))
        (setq word nil escapedp nil))
       ;; An unescaped quote outside a quoted item opens one.
       ((and (char-equal current ?\') (not escapedp) (not quotep))
        (setq quotep t escapedp nil))
       ;; An unescaped quote inside a quoted item closes it.
       ((and (char-equal current ?\') (not escapedp))
        (push word result)
        (setq quotep nil word nil escapedp nil))
       ;; A backslash escapes the next character; two in a row give one
       ;; literal backslash.
       ((char-equal current ?\\)
        (when escapedp (push current word))
        (setq escapedp (not escapedp)))
       ;; Any other character belongs to the current word.
       (t (setq escapedp nil)
          (push current word)))
      (cl-incf i))
    (when quotep
      (error "Unbalanced quotes at %d" (- (length string) (length word))))
    ;; Keep a trailing unquoted word.
    (when word (push word result))
    ;; Each word is a reversed list of characters; turn it back into a string.
    (mapcar (lambda (x) (cl-coerce (reverse x) 'string))
            (reverse result))))

(parse-quotes "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")

(parse-quotes "apple orange 'tasty thing\\'s' 'my lunch' zucchini 'my dinner'")
;; ("apple" "orange" "tasty thing's" "my lunch" "zucchini" "my dinner")

(parse-quotes "apple orange 'tasty things' 'my lunch zucchini 'my dinner'")
;; Debugger entered--Lisp error: (error "Unbalanced quotes at 52")
```
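Bonus: it also lets you escape quotes with a backslash (`\`), and it reports unbalanced quotes (when it reaches the end of the string without finding the match for an opening quote).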
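On the "which built-in function?" part of the question: Emacs does ship `split-string-and-unquote`, but it only understands Lisp-style double-quoted items. Assuming the items never contain double quotes or backslashes of their own, one hedged workaround is to swap the single quotes for double quotes first. `my-split-via-builtin` is a hypothetical name; the two built-ins it calls are real.

```elisp
(defun my-split-via-builtin (string)
  "Split STRING with the built-in `split-string-and-unquote'.
Assumes STRING contains no double quotes or backslashes of its own."
  ;; `split-string-and-unquote' reads "..." items with the Lisp reader,
  ;; so turn every single quote into a double quote first.
  (split-string-and-unquote (subst-char-in-string ?\' ?\" string)))

(my-split-via-builtin
 "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```

Because the quoted parts go through the Lisp reader, a backslash inside an item would be treated as an escape, and an unterminated quote makes the reader signal an error rather than a friendly message.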
You might want to look at …

Here is a simple way to implement your algorithm using a temporary buffer. I don't know whether there is a way to do it using …

```elisp
(defun my-split (string)
  (with-temp-buffer
    ;; insert the string in a temporary buffer, plus a trailing space so
    ;; that the last item is followed by something for \\(.+\\) to match
    (insert string " ")
    (goto-char (point-min)) ; go back to the beginning of the buffer
    (let ((result nil))
      ;; search for the regexp (and just return nil if nothing is found)
      (while (re-search-forward "\\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\)" nil t)
        ;; (match-string 1) is "\\1"; append it after the current list
        (setq result (append result (list (match-string 1))))
        ;; go back to the beginning of the second part
        (goto-char (match-beginning 2)))
      result)))
```
Example:

```elisp
(my-split "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; ==> ("apple" "orange" "'tasty things'" "'my lunch'" "zucchini" "'my dinner'")
```
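As the example output shows, `my-split` keeps the quote characters around quoted items, whereas the question asks for them without. One possible variant (the name `my-split-unquoted` is hypothetical) moves the quotes outside the capture groups, so no trailing space or "rest" group is needed, and collects with `push`/`nreverse` instead of `append`, which copies the whole list on every iteration.

```elisp
(defun my-split-unquoted (string)
  "Like `my-split', but strip the quotes from quoted items."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (result)
      ;; Group 1 is the text inside the quotes; group 2 is a bare word.
      ;; Each search continues from the end of the previous match.
      (while (re-search-forward "'\\([^']*\\)'\\|\\([[:alnum:]]+\\)" nil t)
        (push (or (match-string-no-properties 1)
                  (match-string-no-properties 2))
              result))
      ;; `push' builds the list in reverse, so reverse it once at the end.
      (nreverse result))))

(my-split-unquoted
 "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```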
If you do a lot of string manipulation, you should install … through the package manager.

```elisp
(concat "\\b[a-z]+\\b" "\\|" "'[a-z ]+'")
```

```elisp
;; let s = given string, r = regex
(require 'cl-lib)
(cl-loop for start = 0 then (match-end 0) ; resume where the last match ended
         while (string-match r s start)
         collect (match-string 0 s))
```
For educational purposes, I also implemented the same thing as a recursive function:

```elisp
;; cl-labels is cl-lib's version of Common Lisp's local (recursive)
;; function definition macro; this assumes lexical-binding is enabled
(cl-labels ((i (start result)
              ;; string-match searches from start
              (if (string-match r s start)
                  ;; recursive call, resuming where this match ended
                  (i (match-end 0) (cons (match-string 0 s) result))
                ;; push/nreverse idiom
                (nreverse result))))
  ;; recursive helper function
  (i 0 '()))
```

Because Emacs Lisp does not optimize tail calls, running this on a long input with many matches can exceed `max-lisp-eval-depth`. You can therefore rewrite it as a `do` loop instead:

```elisp
(cl-do ((start 0 (match-end 0)) ; resume where the previous match ended
        (result '()))
    ((not (string-match r s start)) (nreverse result))
  (push (match-string 0 s) result))
```