How can I make a Haskell parser from a list of words?
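I'm a Haskell beginner, using Attoparsec to find colour expressions in a text. For example, I want to be able to match both "light blue-green" and "light blue green". But of course I need a general solution that works for any string like that. So I've been thinking it would look something like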
```haskell
"light" >> sep >> "blue" >> sep >> "green"
  where
    sep = inClass "\n\r -"
```
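In other words, I think I need a way to turn a list of words into a parser that puts a separator between each pair of words. Something like: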
```haskell
import qualified Data.Text as T
import Data.Attoparsec.Text

-- | Makes a parser from a list of words, accepting
-- spaces, newlines, and hyphens as separators.
wordListParser :: [T.Text] -> Parser
wordListParser wordList = -- Some magic here
```
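Or maybe I'm thinking about this entirely the wrong way, and there's an easier approach?

Edit: this minimal non-working example feels like it's almost there: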
```haskell
{-# LANGUAGE OverloadedStrings #-}

import Replace.Attoparsec.Text
import Data.Attoparsec.Text as AT
import qualified Data.Text as T
import Control.Applicative (empty)

wordListParser :: [T.Text] -> Parser T.Text
wordListParser (w:ws) = string w >> satisfy (inClass " -") >> wordListParser ws
wordListParser [w] = string w
wordListParser [] = empty  -- or whatever the empty parser is

main :: IO ()
main = parseTest (wordListParser (T.words "light green blue")) "light green-blue"
```
I think it can be run with something like:
```
stack runhaskell ThisFile.hs --package attoparsec replace-attoparsec text
```
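This is what I would do, assuming you have a data type for your colours; if you don't, just substitute whatever you're using. `parseColourGen` takes a space-separated name and builds a parser that accepts its words separated by one or more legal separators.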
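```haskell
{-# LANGUAGE OverloadedStrings #-}  -- lets the string literals in the word list be Text

import Prelude hiding (concat, words)

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.List (intersperse)
import Data.Text (concat, pack, singleton, Text, words)

data Colour = LightBlue | DarkBlue | VibrantRed deriving Show

-- Parse the words of a colour name, with one or more separators between them.
parseColourGen :: Text -> Parser [Text]
parseColourGen = sequence . intersperse (mempty <$ many1 legalSep) . fmap string . words

-- Try each (name, colour) pair in turn; the list must not be empty (foldl1).
parseColour :: [(Text, Colour)] -> Parser Colour
parseColour = foldl1 (<|>) . fmap (\(text, colour) -> colour <$ parseColourGen text)

-- A single separator character. The hyphen comes last so that inClass
-- treats it as a literal rather than as a range.
legalSep :: Parser Text
legalSep = singleton <$> satisfy (inClass "\n\r -")
```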
You can then define your word list and supply it to `parseColour`, like so:
```haskell
wordList :: [(Text, Colour)]
wordList = [("light blue", LightBlue), ("dark blue", DarkBlue), ("vibrant red", VibrantRed)]
```
This way you can configure all your colours and their corresponding names in one place, and then run the parser like this:
```
> parse (parseColour wordList) $ pack "vibrant-red"
Done "" VibrantRed
```
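As a variation on the table-driven approach above, here is a minimal sketch that swaps `foldl1 (<|>)` for attoparsec's `choice`, which fails cleanly on an empty table instead of raising an error, and runs the parser with `parseOnly`. The names `colourName`, `colourParser`, `colourTable`, and `parseColourText` are made up for illustration, and accepting runs of separators via `takeWhile1` is an assumption about what you want.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, choice, inClass, parseOnly, string, takeWhile1)
import Data.List (intersperse)
import Data.Text (Text)
import qualified Data.Text as T

data Colour = LightBlue | DarkBlue | VibrantRed
  deriving (Eq, Show)

-- | Match the words of one colour name, with a run of separator
-- characters between neighbouring words. (Hypothetical helper.)
colourName :: Text -> Parser ()
colourName name = sequence_ (intersperse sep (map word (T.words name)))
  where
    word w = () <$ string w
    -- Hyphen last, so inClass reads it as a literal and not as a range.
    sep    = () <$ takeWhile1 (inClass " \n\r-")

-- | Try every table entry in order; an empty table simply fails to parse.
colourParser :: [(Text, Colour)] -> Parser Colour
colourParser table = choice [colour <$ colourName name | (name, colour) <- table]

-- | Example table, mirroring the word list above.
colourTable :: [(Text, Colour)]
colourTable =
  [ ("light blue", LightBlue)
  , ("dark blue", DarkBlue)
  , ("vibrant red", VibrantRed)
  ]

-- | Run the parser on a piece of text, returning an error message on failure.
parseColourText :: Text -> Either String Colour
parseColourText = parseOnly (colourParser colourTable)
```

Entries are tried in order, so if one colour name is a prefix of another (say "blue" and "blue green"), the longer name should come first in the table.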
Edit
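After the edit to your question, I think I understand better what you want. FWIW, I still prefer the solution above, but here is how to fix your last code block. Two things go wrong in it: the patterns `(w:ws)` and `[w]` overlap, so the `[w]` clause has to come first or it is never reached; and `a >> b` discards the result of `a`, so the parser would only return the last word. Binding each result in `do` notation and returning their concatenation fixes the second problem.

Here is what your code looks like with both fixes: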
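```haskell
-- Also needs: import Data.Text (Text, append, singleton)
wordListParser :: [Text] -> Parser Text
wordListParser [w] = string w      -- single word: must come before (w:ws)
wordListParser (w:ws) = do
  a <- string w                    -- the word itself
  b <- satisfy (inClass " -")      -- exactly one separator character
  c <- wordListParser ws           -- the remaining words
  return (a `append` (singleton b) `append` c)  -- singleton :: Char -> Text
wordListParser [] = empty
```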
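The same parser can also be written without explicit recursion, by interspersing the separator between the word parsers, which is close to what the question originally asked for. This is only a sketch: `sep` and `matchWords` are hypothetical names, and the separator set (space or hyphen) is copied from the question's example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (empty)
import Data.Attoparsec.Text (Parser, inClass, parseOnly, satisfy, string)
import Data.List (intersperse)
import qualified Data.Text as T

-- | A single separator character: a space or a hyphen.
sep :: Parser T.Text
sep = T.singleton <$> satisfy (inClass " -")

-- | Parse the given words in order, with exactly one separator between
-- neighbours, and return everything that was matched.
wordListParser :: [T.Text] -> Parser T.Text
wordListParser [] = empty  -- an empty word list matches nothing
wordListParser ws = T.concat <$> sequence (intersperse sep (map string ws))

-- | Hypothetical helper: split the first argument into words and try the
-- resulting parser on the second argument.
matchWords :: T.Text -> T.Text -> Either String T.Text
matchWords ws = parseOnly (wordListParser (T.words ws))
```

For example, `matchWords "light green blue" "light green-blue"` should give back the matched text, separators included.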
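One last thing: your current implementation won't parse Windows line breaks (`\r\n`), because the separator matches exactly one character and a Windows line break is two.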
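If line breaks should count as separators, one possible approach is to accept a run of separator characters instead of a single one, so that `\r\n` is consumed as one separator. A minimal sketch, assuming that is the behaviour you want; `sepRun` and `lightBlue` are hypothetical names used only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, inClass, parseOnly, string, takeWhile1)
import qualified Data.Text as T

-- | One or more separator characters: space, newline, carriage return,
-- or hyphen. The hyphen goes last so inClass reads it literally.
sepRun :: Parser T.Text
sepRun = takeWhile1 (inClass " \n\r-")

-- | Hypothetical two-word parser, just to exercise sepRun.
-- It returns only the last word, because *> discards earlier results.
lightBlue :: Parser T.Text
lightBlue = string "light" *> sepRun *> string "blue"
```

The trade-off is that this is more permissive: inputs such as "light--blue" or "light - blue" are accepted too.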
I'm not familiar with attoparsec, but you could use a recursive solution:
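```haskell
-- Assumes the question's imports (Data.Text as T, Data.Attoparsec.Text,
-- Control.Applicative (empty)). Returns only the last word, since >>
-- discards the results of the earlier parsers.
wordListParser :: [T.Text] -> Parser T.Text
wordListParser []     = empty
wordListParser [w]    = string w  -- attoparsec's literal-text parser is `string`
wordListParser (w:ws) =
  string w >> satisfy (inClass "\n\r -") >> wordListParser ws
```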