How to remove a pattern from many files
This is my file:
```html
...
<!--START: Google Analytics --->
<script type="text/javascript" src="../src/goog/ga_body.js">
<!--END: Google Analytics --->
</body>
</html>
...
```
How do I delete the following block, including the marker lines themselves?
```html
<!--START: Google Analytics --->
<script type="text/javascript" src="../src/goog/ga_body.js">
<!--END: Google Analytics --->
```
The block should disappear completely. This is what should be left, i.e. the 4 lines are replaced with nothing:
```html
<nothing here 4 lines deleted>
</body>
</html>
```
I'm thinking of doing this in bash, so sed and awk are probably my best options, although Python might be better.
Edit 1: This is what I wrote earlier. It's probably pretty bad code, but I'll be finishing it off:
```shell
# Here I want to find 2 patterns and delete what's in between
# this example works
# these are the 2 patterns I want to find, Start and End
#have to use some escape characters here for this to show properly
# have to use for it to appear in this format
#<!-- Start of StatCounter Code for DoYourOwnSite -->
# text would go here
#<!-- End of StatCounter Code for DoYourOwnSite -->>

#b="<!-- Start of StatCounter Code for DoYourOwnSite -->"
#b2="<!-- End of StatCounter Code for DoYourOwnSite -->"
#p1="PATTERN-1"
#p2="PATTERN-2"
p1="<!-- Start of StatCounter Code for DoYourOwnSite -->"
p2="<!-- End of StatCounter Code for DoYourOwnSite -->"
fname="*.html"

# NOTE: this assigns the literal string "ls"; it does not count any files
num_of_files_pattern1=ls
#grep $p1 fname

echo "fname(s) to apply the sed to:"
echo $fname
echo "num_of_files_pattern1 is:"
echo $num_of_files_pattern1
echo "Pattern1 is equal to:"
echo $p1
echo "Pattern2 is equal to:"
echo $p2

#this is current dir where the script is
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "DIR is equal to:"
echo $DIR

#cd to the dir where I want to copy the files to:
cd "$DIR"

# this will find the pattern <\head> in all the .html files and place "This should appear before the closing head tag" before it
# it will also make a backup with .bak extension
#sed -i.bak '/<\\head>/i\This should appear before the closing head tag' *.html

echo "sed on the file"
# this does the head part
#sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
#sed "/$p1/,/$p2/d" *.txt # this works
#sed "/$p1/,/$p2/d" $fname # this works
sed -i.bak "/$p1/,/$p2/d" $fname # this works
```
Edit 2
This is what I ended up with, but there is a more robust answer below:
```shell
# ------------------------------------------------------------------
# [author] find2PatternsAndDeleteTextInBetween.sh
# Description
# Here I want to find 2 patterns and delete what's in between
# this example works
#
# EXAMPLE:
# this is the 2 patterns I want to find Start and End
# <!-- Start of StatCounter Code for DoYourOwnSite -->
# text would go here
# <!-- End of StatCounter Code for DoYourOwnSite -->>
#
# ------------------------------------------------------------------

p1="<!--START: Google Analytics --->"
p2="<!--END: Google Analytics --->"
fname=".html"

echo "fname(s) to apply the sed to:"
echo *"$fname"
echo -e "\n"
echo "Pattern1 is equal to:"
echo -e "$p1\n"
echo "Pattern2 is equal to:"
echo -e "$p2\n"
echo -e "PWD is: $PWD\n"

echo "sed on the file"
#sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
#sed "/$p1/,/$p2/d" *.txt # this works
#sed "/$p1/,/$p2/d" $fname # this works
sed -i.bak "/$p1/,/$p2/d" *"$fname" # this works
```
Consider:
```shell
$ awk '/<!--(START|END): Google Analytics --->/{f=!f;next} !f' file
...
</body>
</html>
...
```
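The awk command above only prints the result; standard awk has no in-place option (GNU awk 4.1+ offers `-i inplace` as an extension). A minimal sketch of applying it to many files anyway, assuming the `*.html` glob and `.bak` backup naming used elsewhere in this thread; the demo directory and `page.html` contents are made up for illustration:

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical): a throwaway directory with one sample file.
demo_dir=$(mktemp -d)
cd -- "$demo_dir" || exit
printf '%s\n' '<p>keep</p>' \
  '<!--START: Google Analytics --->' \
  '<script src="ga_body.js"></script>' \
  '<!--END: Google Analytics --->' \
  '</body>' '</html>' > page.html

shopt -s nullglob                  # an unmatched glob expands to nothing
for f in *.html; do
  cp -p -- "$f" "$f.bak" || exit   # keep a backup, like sed -i.bak does
  # f flips on every marker line; 'next' skips the marker itself;
  # '!f' prints only the lines outside the START..END block.
  awk '/<!--(START|END): Google Analytics --->/{f=!f;next} !f' "$f.bak" > "$f" || exit
done
cat page.html
```

Reading from the backup and redirecting into the original name avoids truncating the input while awk is still reading it.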
Or, with sed, editing the file in place and keeping a `.bak` backup:

```shell
$ sed -i'.bak' '/<!--START/,/<!--END/d' file
```
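In sed, the address range `/re1/,/re2/` selects everything from a line matching `re1` through the next line matching `re2`, both ends included, and `d` deletes those lines. One consequence worth checking before running it on many files: if a file has a START marker but no END marker, the range never closes and everything from START to the end of the file is deleted. A small demo on made-up input:

```shell
# Complete block: the START line, the lines between, and the END line are deleted.
good=$(printf '%s\n' a '<!--START x' b '<!--END x' c | sed '/<!--START/,/<!--END/d')

# Missing END marker: the range runs to end of input, so b and c are lost too.
bad=$(printf '%s\n' a '<!--START x' b c | sed '/<!--START/,/<!--END/d')

echo "good: $good"
echo "bad: $bad"
```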
If you have other lines with similar tags, add more of the marker text to the patterns.
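To illustrate why the short patterns can be too greedy, here is a sketch with a hypothetical second block (`Other Widget`) that uses the same `<!--START`/`<!--END` prefix. The short patterns delete both blocks; spelling out the full marker text deletes only the Google Analytics block:

```shell
# Hypothetical input: two blocks sharing the same marker prefix.
input=$(printf '%s\n' \
  '<!--START: Google Analytics --->' 'ga' '<!--END: Google Analytics --->' \
  '<!--START: Other Widget --->' 'widget' '<!--END: Other Widget --->')

# Short patterns: both ranges match, so everything is deleted.
loose=$(printf '%s\n' "$input" | sed '/<!--START/,/<!--END/d')

# Full marker text: only the Google Analytics range matches.
strict=$(printf '%s\n' "$input" |
  sed '/<!--START: Google Analytics --->/,/<!--END: Google Analytics --->/d')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```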
For multiple files, e.g. file1 .. file4:
```shell
$ for f in file{1..4}; do sed -i'.bak' '/<!--START/,/<!--END/d' "$f"; done
```
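The loop is not strictly necessary: `sed -i` accepts several file operands in one call, and with GNU sed `-i` implies `-s`, so each file is processed separately and a range cannot spill over from one file into the next. For files in subdirectories, `find` can hand them to sed in batches. A sketch assuming GNU sed and a made-up directory tree:

```shell
# Demo setup (hypothetical): two HTML files, one in a subdirectory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
for f in "$demo_dir/a.html" "$demo_dir/sub/b.html"; do
  printf '%s\n' '<!--START: Google Analytics --->' 'ga' \
    '<!--END: Google Analytics --->' '</html>' > "$f"
done

# Non-recursive equivalent of the loop above:
#   sed -i'.bak' '/<!--START/,/<!--END/d' file{1..4}

# Recursive: find passes all matching files to sed ({} +), each gets a .bak copy.
find "$demo_dir" -type f -name '*.html' \
  -exec sed -i'.bak' '/<!--START/,/<!--END/d' {} +
```

The `.bak` copies end in `.bak`, so a second run of the same `find` will not pick them up.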
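From the script in your question, it looks like you already know how to use sed to delete a range of lines, so here is a more robust version of that script: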
```shell
#!/usr/bin/env bash

# cd to the dir. in which this script is located.
# CAVEAT: Assumes that the script wasn't invoked through a *symlink*
#         located in a different dir.
cd -- "$(dirname -- "$BASH_SOURCE")" || exit

fpattern='*.html'  # specify source-file globbing pattern
shopt -s nullglob  # make sure that globbing expands to nothing if nothing matches
fnames=( $fpattern )  # expand to matching files and store in array
num_of_files_matching_pattern=${#fnames[@]}  # count matching files
(( num_of_files_matching_pattern > 0 )) || exit  # abort, if no files match

printf '%s %s\n' "Running from:" "$PWD"
printf '%s %s\n' "Pattern matching the files to process:" "$fpattern"
printf '%s %s\n' "# of matching files:" "$num_of_files_matching_pattern"

# Determine the range-endpoint-identifier-line regular expressions.
# CAVEAT: Make sure you escape any regular-expression metacharacters you want
#         to be treated as *literals*.
p1='^<!--START: Google Analytics --->$'
p2='^<!--END: Google Analytics --->$'

# Remove the range identified by its endpoints from all matching input files
# and save the original files with extension '.bak'
sed -i'.bak' "/$p1/,/$p2/d" "${fnames[@]}" || exit
```
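The `^...$` anchors in the script above mean a marker line must consist of the marker text and nothing else. An indented marker, or one with a trailing carriage return from Windows line endings, will not match, and sed then leaves the file unchanged without any error. The whitespace-tolerant variant below is my own assumption, not part of the answer; the indented input is made up:

```shell
# Hypothetical input: indented markers, as in hand-edited HTML.
input=$(printf '%s\n' '  <!--START: Google Analytics --->' '  ga' \
  '  <!--END: Google Analytics --->' '</html>')

# Anchored patterns as in the script: the whole line must be the marker.
p1='^<!--START: Google Analytics --->$'
p2='^<!--END: Google Analytics --->$'
anchored=$(printf '%s\n' "$input" | sed "/$p1/,/$p2/d")   # nothing deleted

# Variant (assumption): allow surrounding whitespace; [[:space:]] also
# matches a trailing \r from CRLF line endings.
p1='^[[:space:]]*<!--START: Google Analytics --->[[:space:]]*$'
p2='^[[:space:]]*<!--END: Google Analytics --->[[:space:]]*$'
tolerant=$(printf '%s\n' "$input" | sed "/$p1/,/$p2/d")   # block deleted

printf 'anchored:\n%s\ntolerant:\n%s\n' "$anchored" "$tolerant"
```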
As an aside: I suggest not using a suffix such as `.sh` in script filenames.
The shebang line inside the file is enough to tell the system which shell/interpreter to pass the script to.
Leaving the suffix off frees you to change the implementation later (e.g., to Python) without breaking existing programs that rely on the script.
In the case at hand, given that `bash` is actually assumed, `.sh` would be misleading, because it suggests a script that uses only `sh` features.
To determine the running script's true directory, even when the script is invoked via a symlink located in a different directory:
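If you can assume a Linux platform (or at least GNU `readlink`), use:

```shell
dirname -- "$(readlink -e -- "$BASH_SOURCE")"
```

Otherwise, a more elaborate solution involving a helper function is needed; see my answer.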
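Plugged into the `cd` line of the script, this removes the symlink caveat. A sketch assuming GNU `readlink`; the script name `remove-ga` and both temporary directories are hypothetical, and the script is run via `bash` so the demo does not depend on execute permissions:

```shell
# Demo setup (hypothetical): the real script in one dir, a symlink in another.
real_dir=$(mktemp -d)
link_dir=$(mktemp -d)
cat > "$real_dir/remove-ga" <<'EOF'
#!/usr/bin/env bash
# Resolve all symlinks first (GNU readlink -e), then take the directory part.
cd -- "$(dirname -- "$(readlink -e -- "$BASH_SOURCE")")" || exit
pwd
EOF
ln -s "$real_dir/remove-ga" "$link_dir/remove-ga"

# Invoked through the symlink, the script still changes to its real directory.
printed=$(bash "$link_dir/remove-ga")
echo "$printed"
```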