Bash Script To Find Dollar Words Not As Fast As Was Hoping
    #!/bin/bash
    #370101 total words in Words.txt

    line=$(cat line.txt)

    function wordcheck {
        letter=({a..z})
        i=0
        while [ "$i" -le 25 ]
        do
            occurences["$i"]=$(echo $word | grep ${letter["$i"]} -o | wc -l)
            ((i++))
        done
        ((line++))
    }

    until [ "$line" -ge "370102" ]
    do
        word=$(sed -n "$line"p Words.txt)
        wordcheck
        echo "$line" > line.txt
        x=0
        while [ "$x" -le '25' ]
        do
            y=$((x+1))
            charsum["$x"]=$((${occurences[x]} * $y))
            ((x++))
        done
        wordsum=0
        for n in ${charsum[@]}
        do
            (( wordsum += n ))
        done
        tput el
        printf "Word #"
        printf "$(($line - 1))"
        if [ "$wordsum" = '100' ]
        then
            echo $word >> DollarWords.txt
            printf "\n"
            printf "$word\n"
            printf '$$$DOLLAR WORD$$$\n'
        else
            printf " Not A Dollar Word $word\n"
            tput cuu1
        fi
    done
    $ wc -l words.txt
    370101 words.txt
    c=0
    while IFS= read -r word; do
        (( c+=1 ))
    done <words.txt
    echo "$c"
    # prints 370,101
    # leading space: the length of the prefix before a letter equals its value
    lcl=' abcdefghijklmnopqrstuvwxyz'
    ucl=' ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    while IFS= read -r word; do
        ws=0
        for (( i=0; i<${#word}; i++ )); do
            ch=${word:i:1}
            if [[ "$ch" == [a-z] ]]; then
                x="${lcl%%$ch*}"
                (( ws += ${#x} ))
            elif [[ "$ch" == [A-Z] ]]; then
                x="${ucl%%$ch*}"
                (( ws += ${#x} ))
            fi
        done
        if (( ws==100 )); then
            echo "$word"
        fi
    done <words.txt
    abactinally
    abatements
    abbreviatable
    abettors
    abomasusi
    abreption
    ...
    zincifies
    zinkify
    zithern
    zoogleas
    zorgite
    import string

    # map each letter to its position in the alphabet (a/A=1 ... z/Z=26)
    lets = {k: v for v, k in enumerate(string.ascii_lowercase, 1)}
    lets.update({k: v for v, k in enumerate(string.ascii_uppercase, 1)})

    with open('/tmp/words.txt') as f:
        for word in f:
            word = word.strip()
            # non-letters count as 0
            if sum(lets.get(c, 0) for c in word) == 100:
                print(word)
    man bash csh dash ksh busybox find file sed tr gcc perl python make |
    tr '[:upper:][ \t]' '[:lower:]\n' | sort -u > Words.txt
    #!/bin/bash

    # make an Associative Array of the 26 letters and values.
    declare -A lval=\($(seq 26 | for i in [{a..z}] ; do read x ; echo $i=$x ; done)\)

    while read word
    do
        # skip words that contain a non-letter
        [[ ! "${word}" =~ ^[a-z]+$ ]] && continue

        sum=0

        # process ${word} one character at a time
        while read -n 1 char
        do
            # here string dumps a newline on the end of ${word}, so we'll
            # run a quick test to break out of the loop for a non-letter
            [[ "${char}" != [a-z] ]] && break

            sum=$(( sum + lval[${char}] ))

        # from the referenced SO link - see above - the solutions of interest
        # use process substitution and printf to pass the desired string into
        # the while loop; I've replaced this with the 'here' string and added
        # the test to break the loop when we see the newline character.

        #done < <(printf %s "${word}")
        done <<< "${word}"

        (( sum == 100 )) && \
        echo "${word}"

    done < Words.txt
- AGC's solution: 37 seconds
- The solution above w/ process substitution: 11 seconds
- The solution above w/ here string: 2.7 seconds
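- $(seq 26 | for/do/read/echo/done) : generates the list of strings [a]=1 [b]=2 ... [z]=26
- declare -A lval=\( $(seq...done) \) : declares lval as an associative array and loads the 26 entries ([a]=1 [b]=2 ... [z]=26)
- =~ : tests against a specific pattern; ^ marks the start of the string, $ marks the end of the string, [a-z] matches any character between a and z, and + means match 1 or more characters
- "${word}" =~ ^[a-z]+$ : evaluates to true if $word a) consists only of the letters a-z and b) has at least one letter
- ! : negates the pattern test; in this case I'm looking for any word that contains a non-letter character [NOTE: there are many ways to test for a specific pattern; this just happens to be the one I chose for this script]
- [[ ! "${word}" ... ]] && continue : if the word contains a non-letter, the test evaluates to true and (&&) we continue (i.e., we are not interested in this word, so skip to the next word; in other words, jump to the next iteration of the loop)
- while read -n 1 char : parses the input (in this case ${word}, passed in as a 'here' string) 1 character at a time, placing each character in the variable named 'char'
- [[ "${char}" != [a-z] ]] && break : another/different way to match a pattern; here we test the single-character $char variable to see whether it is a letter; if it is not a letter (i.e., the test evaluates to true) we break out of the current loop; if $char is a letter (a-z), processing continues with the next command in the loop (sum=... in this case)
- (( sum == 100 )) && \ echo "${word}" : another way to run a test; here we test whether the sum of the letters is 100; if that evaluates to true, we echo "${word}" [NOTE: the backslash (\) means the command continues on the next line]
- done <<< "${word}" : <<< is called a 'here' string; in this case it lets me pass the current string (${word}) as input to the while read -n 1 char loop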
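The pieces described above can be tried on a single word. This is only a minimal sketch, not the answer's script: the function name word_sum is hypothetical, and the lval array is filled with a plain for loop here instead of the seq pipeline.

```shell
#!/bin/bash
# Build the letter-to-value lookup: lval[a]=1 ... lval[z]=26
declare -A lval
n=0
for letter in {a..z}; do
    n=$(( n + 1 ))
    lval[$letter]=$n
done

# Hypothetical helper: print the letter sum of one lowercase word,
# or return 1 if the word contains anything other than a-z.
word_sum() {
    local word=$1 char sum=0
    # reject empty words and words with non-letters
    [[ ! "$word" =~ ^[a-z]+$ ]] && return 1
    # the here string appends a newline, so stop at the first non-letter
    while read -r -n 1 char; do
        [[ "$char" != [a-z] ]] && break
        sum=$(( sum + lval[$char] ))
    done <<< "$word"
    echo "$sum"
}

word_sum boundary                    # prints 100
word_sum "don't" || echo "skipped"   # prints skipped
```

Wrapping the test and the loop in a function keeps the per-word logic separate from the file-reading loop, which makes it easy to check a handful of words by hand before running it over the full list.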
    # make an Associative Array of the 26 letters and values.
    declare -A lval=\($(seq 26 | for i in [{a..z}] ; do read x; echo $i=$x ; done)\)

    # spew out 240,000 words from some man pages.
    man bash csh dash ksh busybox find file sed tr gcc perl python make |
    tr '[:upper:][ \t]' '[:lower:]\n' | sort -u |
    while read x ; do
        [ "$x" = "${x//[^a-z]/}" ] &&
        (( 100 == $(sed 's/./lval[&]+/g' <<< $x) 0 )) &&
        echo "$x"
    done | head
    accumulate
    activates
    addressing
    allmulti
    analysis
    applying
    augments
    backslashes
    bashopts
    boundary
    (( 100 == $( hexdump -ve '/1 "(%3i - 96) + " ' <<< $x ;) 86 ))
    (102 - 96) + (111 - 96) + (111 - 96) + ( 10 - 96) +
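The trailing 86 in the test is there because the here string appends a newline (byte 10), which contributes (10 - 96) = -86 to the expression; adding 86 cancels it. The following is a rough sketch of the same arithmetic without hexdump. It swaps in bash's printf '%d' with a leading single quote to get each character code, and the function name ascii_sum is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: sum (character code - 96) over every character.
# For lowercase a-z this is the letter's position in the alphabet.
ascii_sum() {
    local word=$1 sum=0 code i
    for (( i = 0; i < ${#word}; i++ )); do
        # a leading single quote makes printf emit the character's numeric code
        printf -v code '%d' "'${word:i:1}"
        sum=$(( sum + code - 96 ))
    done
    echo "$sum"
}

# f=102, o=111, o=111
ascii_sum foo

# the expression hexdump produces for "foo" plus newline, with 86 appended
echo $(( (102 - 96) + (111 - 96) + (111 - 96) + ( 10 - 96) + 86 ))
```

Both commands print the same number, which shows that the appended 86 only removes the newline's contribution.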
    while read x ; do
        [ "$x" = "${x//[^a-z]/}" ] &&
        (( 100 == $( hexdump -ve '/1 "(%3i - 96) + " ' <<< $x ;) 86 )) &&
        echo "$x"
    done < words.txt
    man bash csh dash ksh busybox find file sed tr gcc perl python make |
    tr '[:upper:][ \t]' '[:lower:]\n' | sort -u |
    egrep '^[a-z]+$' > words.txt
    paste words.txt <(hexdump -ve '/1 "%3i " ' < words.txt |
                      sed 's/ *[^12]10[^0-9] */\n/g;s/^ //;s/ $//' |
                      sed 's/ \+\|$/ + -96 + /g;s/ + $//'
                     ) |
    while read a b ; do
        (( 100 == $b )) && echo $a
    done
How it works: what is needed is a decdump (i.e. a decimal dump) that puts each word on a separate line. Because
    while read word; do
        ...
    done <Words.txt
    occurences["$i"]=$(echo $word | grep ${letter["$i"]} -o | wc -l)
    matches="${word//[^${letter[i]}]/}"
    occurences[i]="${#matches}"
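The two lines above replace the echo | grep -o | wc -l pipeline with pure parameter expansion, so no external process is started per letter. A small sketch of the idea in isolation, assuming a hypothetical helper named count_letter:

```shell
#!/bin/bash
# Hypothetical helper: count how often one letter occurs in a word
# without spawning grep or wc.
count_letter() {
    local word=$1 letter=$2
    # delete every character that is not the letter, then measure what is left
    local matches="${word//[^$letter]/}"
    echo "${#matches}"
}

count_letter banana a   # prints 3
count_letter banana z   # prints 0
```

In the original script this count runs 26 times per word, so avoiding three process launches each time is where most of the saving would come from.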
    awk '
    # initialize an array of character-to-number values
    BEGIN {
        # split our alphabet into an array: c[1]=a c[2]=b ... c[26]=z;
        # NOTE: assumes input is all lower case, otherwise we could either
        # add array values for upper case letters or modify processing to
        # convert all characters to lower case ...
        split("abcdefghijklmnopqrstuvwxyz", c, "")

        # build associative array to match letters w/ numeric values:
        # ord[a]=1 ord[b]=2 ... ord[z]=26
        for (i=1; i <= 26; i++) {
            ord[c[i]]=i
        }
    }

    # now process our file of words
    {
        # loop through words; just in case more than 1 word per line (ie, NF > 1)
        word=1
        while ( word <= NF ) {
            sum=0

            # split our word into an array of characters
            split($word, c, "")

            # loop through our array of characters
            for (i=1; i <= length($word); i++) {

                # if not a letter then break out of loop
                if ( c[i] !~ /[a-z]/ ) {
                    sum=999
                    break
                }

                # add letter to our running sum
                sum=sum + ord[c[i]]

                # if we go over 100 then break
                if ( sum >= 101 )
                    break
            } # end of character loop

            if ( sum == 100 )
                print $word

            word++
        } # end of word loop
    }' Words.txt
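- dawg's bash solution: 3 minutes 32 seconds (roughly 2x slower than on dawg's machine)
- The awk solution above: 3.5 seconds (it should be faster on just about anything other than my PC)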