关于算法：Project Euler No. 14 Haskell

Project Euler No. 14 Haskell

我正在尝试解决ProjectEuler的第14个问题(http://projectEuler.net/problem=14)，我使用haskell达到了一个死胡同。

现在，我知道数字可能足够小，我可以做一个蛮力，但这不是我练习的目的。我试图记住Map Integer (Bool, Integer)型Map的中间结果，其含义如下：

1
2
3
4

- the first Integer (the key) holds the number
- the Tuple (Bool, Interger) holds either (True, Length) or (False, Number)
where Length = length of the chain
Number = the number before him

前任：

1
2
3
4
5
6
7
8
9
10
11
12

for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
My map should contain :
13 - (True, 10)
40 - (False, 13)
20 - (False, 40)
10 - (False, 20)
5 - (False, 10)
16 - (False, 5)
8 - (False, 16)
4 - (False, 8)
2 - (False, 4)
1 - (False, 2)

号

现在，当我搜索另一个数字如40时，我知道链上有(10 - 1) length等。现在，如果我搜索10，不仅要告诉我10的长度是(10 - 3) length并更新地图，还要更新20，40，以防它们仍然存在(错误，u)

我的代码：

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

import Data.Map as Map

solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs = solve' xs Map.empty
where
solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
solve' [] table = table
solve' (x:xs) table =
case Map.lookup x table of
Nothing -> countF x 1 (x:xs) table
Just (b, _) ->
case b of
True -> solve' xs table
False -> {-WRONG-} solve' xs table

f :: Integer -> Integer
f x
| x `mod` 2 == 0 = x `quot` 2
| otherwise = 3 * x + 1

countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
countF n cnt (x:xs) table
| n == 1 = solve' xs (Map.insert x (True, cnt) table)
| otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table

checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
checkMap n rez table =
case Map.lookup n table of
Nothing -> Map.insert n (False, rez) table
Just _ -> table

在-错误部分，我们应该像下面的示例一样更新所有值：

1
2
3
4
5
6
7
8
9
10
11
12
13

--We are looking for 10:
10 - (False, 20)
|
V {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
20 - (False, 40) ^
| |
V update 20 => 20 - (True, 10 - 1 - 1)
40 - (False, 13) ^
| |
V update 40 => 40 - (True, 10 - 1)
13 - (True, 10) ^
| |
---------------------------

。

问题是我不知道在一个函数中是否可以做两件事，比如更新一个数字并继续递归。在C类语言中，我可以做一些类似(伪代码)的事情：

1
2
3
4
5
6
7
8
9
10
11
12

void f(int n, tuple(b,nr), int &length, table)
{
if(b == False) f (nr, (table lookup nr), 0, table);
// the bool is true so we got a length
else
{
length = nr;
return;
}
// Since this is a recurence it would work as a stack, producing the right output
table update(n, --cnt);
}

最后一条指令可以工作，因为我们通过引用发送cnt。我们也知道它会在某一点结束，cnt不应小于1。

相关讨论

最简单的优化(如您所确定的)是记忆化。您尝试自己创建一个记忆系统，但是遇到了如何存储记忆值的问题。有一些以可维护的方式来实现这一点的解决方案，例如使用状态monad或starray。然而，有一个更简单的解决方案来解决你的问题-使用haskell现有的记忆。默认情况下，haskell会记住常量值，因此如果您创建一个存储collatz值的值，它将被自动记忆！

一个简单的例子是下面的斐波那契定义：

1
2
3

fib :: Int -> Integer
fib n = fibValues !! n where
fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)

fibValues是一个[Integer]，因为它只是一个常量，所以被记忆化了。然而，这并不意味着所有的事情都会立刻被记忆化，因为这是一个内讧列表，永远不会结束。相反，这些值只在需要时计算，因为haskell是懒惰的。

因此，如果你对你的问题做了类似的事情，你将在没有大量工作的情况下获得记忆。但是，使用上面这样的列表在您的解决方案中不会很好地工作。这是因为collatz算法使用许多不同的值来获得给定数字的结果，所以使用的容器需要随机访问才能有效。显而易见的选择是一个数组。

1	collatzMemoized :: Array Integer Int

号

接下来，我们需要用正确的值填充数组。我将编写这个函数，假设存在一个计算任意n的collatz值的collatz函数。另外，请注意数组的大小是固定的，因此需要使用一个值来确定要记忆的最大数目。我将使用一百万，但任何值都可以使用(这是内存/速度的折衷)。

1 2	collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where maxNumberToMemroize = 1000000

这很简单，listArray是有界的，并且该范围内所有collatz值的列表都是给它的。记住，这不会直接计算所有collatz值，因为这些值比较懒。

现在，可以编写collatz函数。最重要的是，如果要检查的数字在其界限内，则只检查collatzMemoized数组：

1
2
3
4
5
6
7
8
9
10

collatz :: Integer -> Int
collatz 1 = 1
collatz n
| inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
| otherwise = 1 + collatz nextValue
where
nextValue = case n of
1 -> 1
n | even n -> n `div` 2
| otherwise -> 3 * n + 1

。

在ghci中，您现在可以看到记忆化的有效性。试试collatz 200000。完成大约需要2秒钟。但是，如果您再次运行它，它将立即完成。

最后，可以找到解决方案：

1 2	maxCollatzUpTo :: Integer -> (Integer, Int) maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n]) where

然后打印：

1	main = print $ maxCollatzUpTo 1000000

。

如果运行main，结果将在大约10秒内打印出来。

现在，这种方法的一个小问题是它使用了大量的堆栈空间。它在ghci中可以很好地工作(在堆栈空间方面似乎使用更灵活)。但是，如果编译它并尝试运行可执行文件，它将崩溃(堆栈空间溢出)。因此，要运行这个程序，在编译时必须指定更多。这可以通过在编译选项中添加-with-rtsopts='K64m'来实现。这会将堆栈增加到64MB。

现在可以编译和运行程序：

1	> ghc -O3 --make -with-rtsopts='-K6m' problem.hs

。

运行./problem将在不到一秒钟的时间内得到结果。

相关讨论

你要做的是记忆化，这是一种很难的方式，尝试在Haskell中编写一个命令式程序。借鉴David Eisenstat的解决方案，我们将按照J_Random_Hacker的建议进行解决：

1
2
3
4
5

collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)

为此，动态编程解决方案是用在表中查找内容来替换递归。让我们做一个函数，在这里我们可以替换递归调用：

1
2
3
4
5

collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
| n == 1 = 1
| even n = 1 + r (n `div` 2)
| otherwise = 1 + r (3*n + 1)

现在我们可以将递归算法定义为

1 2	collatzLength :: Integer -> Integer collatzLength = collatzLengthDef collatzLength

。

现在，我们还可以创建一个表版本(它需要一个数字作为表大小，并返回一个使用该大小的表计算的collatzlength函数)：

1
2
3
4
5
6
7
8
9
10
11
12
13

-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array $ map (\x -> (x, f x)) $ range bounds

collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
where
bounds = (1, n)
table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
collatzLengthTableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (collatzLengthDef collatzLengthTableLookup) x

。

这是通过将collatzlength定义为表查找来实现的，其中表是函数的定义，但是递归调用被表查找替换。表查找函数检查函数的参数是否在表中所列的范围内，并返回函数的定义。我们甚至可以将此功能用于建立这样的功能：

1
2
3
4
5
6
7
8
9
10

tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
where
table = buildTable bounds (definition tableLookup)
tableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (definition tableLookup) x

collatzLengthTabled n = tableRange (1, n) collatzLengthDef

你只需要确保

1 2	let memoized = collatzLengthTabled 10000000 ... memoized ...

。

所以在内存中只构建了一个表。

我手边没有haskell编译器，所以我为任何损坏的代码道歉。

没有记忆，就有一个功能

1
2
3
4
5

collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)

对于memoization，类型签名是

1	memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)

号

因为memoCL接收一个表作为输入，并将更新后的表作为输出。memoCL需要做的是用let形式截获递归调用的返回，并插入新的结果。

1
2
3
4
5
6
7
8

-- table must have an initial entry for 1

memoCL table n = case Map.lookup n table of
Just m -> (table, m)
Nothing -> let (table', m) = memoCL table (collatzStep n) in (Map.insert n (1 + m) table', 1 + m)

collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1

有时你会对上面的成语感到厌烦。现在是单子的时候了。

我记得在Haskell中发现动态编程算法的Memoisation非常违反直觉，而且我已经做了一段时间了，但希望下面的技巧对您有用。

但首先，我不太了解您当前的DP方案，尽管我怀疑它可能非常低效，因为它似乎需要为每个答案更新许多条目。(a)我不知道如何在Haskell中这样做，(b)您不需要这样做来有效地解决问题；-)

我建议使用以下方法：首先构建一个普通的递归函数，它为一个输入数字计算正确的答案。(提示：它将有一个类似于collatzLength :: Int -> Int的签名。)当您让这个函数工作时，只需使用关联列表用array函数延迟定义元素的数组的定义替换它的定义，并替换对数组查找的函数的所有递归调用(例如collatzLength 42将成为collatzLength ! 42。这将以必要的顺序自动填充数组！所以您的"顶级"collatzLength对象现在实际上是一个数组，而不是一个函数。

正如我上面所建议的，我将使用数组而不是map数据类型来保存dp表，因为您需要存储从1到1000000的所有整数索引的值。

我最终修改了"错误"部分，以便在调用mark x (b, n) [] xs table时执行它应该执行的操作，其中

1
2
3
4
5
6
7
8
9
10
11
12
13
14

mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
mark crtElem (b, n) list xs table
| b == False = mark n (findElem n table) (crtElem:list) xs table
| otherwise = continueWith n list xs table

continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
continueWith _ [] xs table = solve' xs table
continueWith cnt (y:ys) xs table = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)

findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
findElem n table =
case Map.lookup n table of
Nothing -> (False, 0)
Just (b, nr) -> (b, nr)

但是，有比这1更好(更不冗长)的答案

既然我们在研究递归方案，这里有一个给你。

让我们考虑函数n(a，b，x)=a+b*x，它是一个b流，最后一个元素是a。

1
2
3
4
5
6
7
8
9
10
11

{-# LANGUAGE DeriveFunctor
, TypeFamilies
, TupleSections #-}

import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int

data N a b x = Z a | S b x deriving (Functor)

此流对于几种迭代都很方便。首先，我们可以使用它来表示collatz序列中的一系列整数：

1
2
3
4
5
6

type instance Base Int64 = N Int Int64

instance Foldable Int64 where
project 1 = Z 1
project x | odd x = S x $ 3*x+1
project x = S x $ x `div` 2

号

这只是一个代数，而不是初始代数，因为转换不是同构(相同的整数链是2*x和(x-1)/3的一个链的一部分)，但这足以表示固定点基Int64 Int64。

有了这个定义，cata将把链子输入给给它的代数，你可以用它来构造一个链子长度的整数的备忘图。最后，变形可以使用它来生成一系列解决不同大小问题的解决方案：

1
2
3
4
5
6
7

problems = ana (uncurry $ cata . phi) (M.empty, 1) where
phi :: M.Map Int64 Int ->
Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
phi m (Z v) = found m 1 v
phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
M.lookup x m

"~before"(cons…)表示延迟模式匹配。在需要这些值之前，我们不会触及模式。如果不进行懒惰的模式匹配，它将始终构建整个链，并且使用映射将是无用的。如果x的链长不在映射中，那么使用惰性模式匹配，我们只构造值v'和m'。

helper函数构造(int，chain-length)对的流：

1 2	found m x v = Cons (x, v) (m, x+1) notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)

。

现在只需考虑前999999个问题，并找出链最长的问题：

1	main = print $ maximumBy (compare `on` snd) $ take 999999 problems

这比基于数组的解决方案工作得慢，因为映射查找是映射大小的对数，但此解决方案不是固定大小。不过，它还是在5秒钟内完成。

也许你会发现我是如何解决这个问题的。虽然它可能不是地球上最有效的东西，但它的功能相当强大。

您可以在这里找到代码：https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/problem014.hs

P.S.：免责声明：我做项目欧拉练习是为了学习哈斯克尔，所以解决方案的质量可能会有争议。