JS highlight matched content x times

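How to highlight X occurrences

In plain JS, how would I highlight a limited subset of matches in a body of text, so that each matched word is highlighted at most X times?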

var matches = new Array('fox', 'dog');
var MaxHighlights = 2;

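Original content

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.

Highlighted content

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.

For extra points I'd preferably only highlight one match per sentence.

Preferred highlighted content

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.

I was using this as the basis for my highlighting attempts: http://www.the-art-of-web.com/javascript/search-highlight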


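My solution uses replace() with word boundaries and the global modifier g in the pattern.

The advantage of replace() is that a callback function can be passed as the replacement. I hope you like it; I found it very interesting, as I haven't done much JS yet. So please correct any errors you find :)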

// test it
var WordsToMatch = new Array('fox', 'dog');

var MaxHighlights = 2;  // no limit = 0

var TestStr =
'The quick brown fox jumps over the lazy dog but the lazy dog is '+
'quick of the mark to catch the brown fox. In general the ' +
'fox versus the dog is not a good match.';

document.write(highlight(TestStr, WordsToMatch, MaxHighlights));

// --- JOHNNY 5's WORD HIGHLIGHTER ---

// highlight words in str using a callback function
function highlight (str, words, limit)
{
  for(var i = 0; i < words.length; i++)
  {
    // match each word case insensitive using word-boundaries
    var pattern = new RegExp("\\b" + words[i] +"\\b","gi");

    var j = 0;
    str = str.replace(pattern, function (w) {
      // <mark> is an assumed stand-in: the original highlight markup was lost
      j++; return ((limit <= 0) || (j <= limit)) ? "<mark>" + w + "</mark>" : w;
    });
  }

  return str;
}

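The callback function returns the highlighted match as the replacement until the limit is reached; after that it returns the match unchanged.

Output: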

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.
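The counting callback described above can also be sketched as a standalone function that runs outside the browser. This is only an illustration under stated assumptions: `escapeRegExp` and `highlightLimited` are hypothetical names, the `<mark>` tag is an assumed choice of highlight markup, and escaping the search words is an addition that the answer's code does not perform.

```javascript
// Escape regex metacharacters so a search word is matched literally
// (an added safeguard, not part of the answer's code).
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap at most `limit` occurrences of each word in <mark> tags.
// A limit of 0 or less means "no limit", as in the answer.
function highlightLimited(str, words, limit) {
  return words.reduce(function (text, word) {
    var pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
    var count = 0; // occurrences of this word seen so far
    return text.replace(pattern, function (match) {
      count++;
      return (limit <= 0 || count <= limit)
        ? "<mark>" + match + "</mark>"
        : match;
    });
  }, str);
}

var sample = "fox dog fox dog fox";
var result = highlightLimited(sample, ["fox", "dog"], 2);
console.log(result);
// <mark>fox</mark> <mark>dog</mark> <mark>fox</mark> <mark>dog</mark> fox
```

Because each word is processed in its own pass, a later search word that also occurs in the inserted markup (for example "mark") would be matched inside the tags, so this per-word approach suits short, known word lists best.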

EDIT: I see now that there are extra points as well...

For extra points I'd preferably only highlight one match per sentence.

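This was a bit more challenging, and I hope it works as it should in most cases. It is not that easy to determine what a sentence is. Well, I decided to keep it simple and treat the split sequence as a definable punctuation character (var sep_punct), followed by one or more whitespace characters, if an uppercase letter or digit lies ahead.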
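To see what this split rule does in isolation, here is a small sketch. The sample text and the variable names are made up for illustration; only the pattern itself is taken from the answer.

```javascript
// Punctuation, then whitespace, but only if an uppercase letter
// or a digit follows (checked with a lookahead, so it is not consumed).
var sentencePattern = /[.;?!]\s+(?=[A-Z0-9])/g;

var text = "The fox ran. The dog, i.e. the lazy one, slept! 2 hours later it woke.";

// The separators are collected first, because split() discards them.
var separators = text.match(sentencePattern) || [];
var sentences = text.split(sentencePattern);

console.log(sentences);
// [ 'The fox ran',
//   'The dog, i.e. the lazy one, slept',
//   '2 hours later it woke.' ]

// Re-insert each separator after its sentence to restore the original text.
var rejoined = sentences.map(function (sentence, i) {
  return sentence + (separators[i] || "");
}).join("");

console.log(rejoined === text); // true
```

Note that "i.e. the" does not split, because a lowercase letter follows the period. The same heuristic would miss a real sentence that starts with a lowercase letter.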

var WordsToMatch = new Array('fox', 'dog');

var TestStr =
'The quick brown fox jumps over the lazy dog but the lazy dog is '+
'quick of the mark to catch the brown fox. In general the ' +
'fox versus the dog is not a good match.';

// --- JOHNNY 5's FIRST WORD IN SENTENCE HIGHLIGHTER ---

// highlight first occurrence of word in each sentence
function higlight_first_w_in_sentence(str, words)
{
  // split the string at what we consider a sentence:
  // new sentences usually start with upper letters, maybe digits
  // split-sequence: sep_punct, followed by one or more whitespaces,
  // looking ahead for an upper letter or digit
  var sep_punct = '[.;?!]';

  // set the split-pattern, starting with sep_punct
  var pattern = new RegExp(sep_punct +"\\s+(?=[A-Z0-9])","g");

  // remember split-sequence
  var sep = str.match(pattern);

  // split str into sentences
  var snt = str.split(pattern);

  // check sentences split
  if((typeof snt != 'undefined') && (Object.prototype.toString.call(snt) === '[object Array]'))
  {
    // now we loop through the sentences...
    for(var i = 0; i < snt.length; i++)
    {
      // and match each word case insensitive using word-boundaries (zero-width)
      for(var j = 0; j < words.length; j++)
      {
        var pattern = new RegExp("\\b" + words[j] +"\\b","i");

        // and replace it with highlighted reference 0,
        // which is $& in JS regex (the part that matches the whole pattern);
        // <mark> is an assumed stand-in: the original highlight markup was lost
        snt[i] = snt[i].replace(pattern,"<mark>$&</mark>");
      }
    }

    // if separators, rejoin string
    if((typeof sep != 'undefined') && (Object.prototype.toString.call(sep) === '[object Array]') && (sep.length > 0) &&
       (typeof snt != 'undefined') && (Object.prototype.toString.call(snt) === '[object Array]') && (snt.length > sep.length)
      )
    {
      var ret ="";
      for(var j = 0; j < snt.length; j++)
      {
        if(j>0) {
          ret += (typeof sep[j-1] != 'undefined') ? sep[j-1] :"";
        }

        ret += snt[j];
      }

      return ret;
    }

    // if no separators
    return snt.join("");
  }

  // if failed
  return str;
}

document.write(higlight_first_w_in_sentence(TestStr, WordsToMatch));

Output:

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.


It's been a while since I last did JavaScript, so this code may be a bit rusty:

var matches = new Array('fox', 'dog');
var originalContent = 'The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.';

document.write(
                highlight(originalContent, matches, 2)
                + '' +
                preferredHighlight(originalContent, matches, 2)
);

function highlight(input, matches, max){
    var matchesStatistics = new Array();
    for(var i = 0, c = matches.length; i < c; i++){ // Performance: read the length only once
        matchesStatistics[matches[i]] = 0;
    }
    var re = new RegExp('\\b(?:' + matches.join('|') + ')\\b', 'g'); // Words regex
    var highlightedContent = input.replace(re, function(group0){
        matchesStatistics[group0]++;
        if(matchesStatistics[group0] > max){
            return group0;
        }else{
            // <mark> is an assumed stand-in: the original highlight markup was lost
            return '<mark>' + group0 + '</mark>';
        }
    });
    return highlightedContent;
}

function preferredHighlight(input, matches, max){
    var sentenceRe = new RegExp('[\\s\\S]*?(?:[.?!]|$)', 'g'); // Sentence regex
    var wordRe = new RegExp('\\b(?:' + matches.join('|') + ')\\b', 'g'); // Words regex

    var highlightedContent = input.replace(sentenceRe, function(sentence){
        var matchesStatistics = 0;
        var modifiedSentence = sentence.replace(wordRe, function(group0){
            matchesStatistics++;
            if(matchesStatistics > max){
                return group0;
            }else{
                // <mark> is an assumed stand-in: the original highlight markup was lost
                return '<mark>' + group0 + '</mark>';
            }
        });
        return modifiedSentence;
    });
    return highlightedContent;
}

Output:

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.

The quick brown fox jumps over the lazy dog but the lazy dog is quick of the mark to catch the brown fox. In general the fox versus the dog is not a good match.

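Explanation of the regexes

  • Words regex: we use .join('|') to join the array elements, so in this example our regex looks like \\b(?:fox|dog)\\b. We use \b so that only fox is matched and not firefox. The double escaping is needed because the pattern is built from a string literal. The g modifier is set, of course, to replace all matches.
  • Sentence regex: well, let's split it up:
    • [\\s\\S]*?: matches any character, zero or more times, ungreedy.
    • (?:[.?!]|$): matches ., ? or !, or the end of the input (no m flag is set, so $ does not match at line breaks).
    • g modifier: match all.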
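Both patterns can be tried on their own. The sketch below builds them the same way as the answer does. The sample strings and the filter that drops the empty match at the end of the input are added for illustration.

```javascript
var words = ["fox", "dog"];

// Words regex, built from a string, hence the doubled backslashes.
var wordRe = new RegExp("\\b(?:" + words.join("|") + ")\\b", "g");
console.log(wordRe.source); // \b(?:fox|dog)\b

// \b keeps "firefox" and "hotdog" from matching.
var found = "firefox fox hotdog dog".match(wordRe);
console.log(found); // [ 'fox', 'dog' ]

// Sentence regex: lazily consume anything up to ., ?, ! or the end of input.
var sentenceRe = new RegExp("[\\s\\S]*?(?:[.?!]|$)", "g");

// The pattern can also match the empty string at the very end of the input,
// so empty matches are filtered out here.
var sentences = "One fox. Two dogs! Three".match(sentenceRe).filter(function (s) {
  return s.length > 0;
});
console.log(sentences); // [ 'One fox.', ' Two dogs!', ' Three' ]
```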

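The highlight function:

The idea is to create an array that remembers how many matches we have had for each word. So when the following code runs in our example: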

var matchesStatistics = new Array();
for(var i = 0, c = matches.length; i < c; i++){ // Performance: read the length only once
    matchesStatistics[matches[i]] = 0;
}

We get an array that looks like this:

Array(
   "fox" => 0,
   "dog" => 0
)

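Then we match our words and use a function as the callback to check how many we have matched so far and whether the current match should be highlighted.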
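As a variation on this idea, the per-word counters can be kept in a plain object rather than an array, which is the more usual JavaScript structure for string keys. The following is a sketch only: `highlightWithCounts` is a hypothetical name, `<mark>` is an assumed highlight tag, the case-insensitive matching is an addition, and the words are assumed to contain no regex metacharacters.

```javascript
// Single pass over the text: one alternation regex, one counter per word.
function highlightWithCounts(input, words, max) {
  // Object.create(null) gives a map without inherited keys such as "constructor".
  var counts = Object.create(null);
  words.forEach(function (word) {
    counts[word.toLowerCase()] = 0;
  });

  var re = new RegExp("\\b(?:" + words.join("|") + ")\\b", "gi");

  return input.replace(re, function (match) {
    var key = match.toLowerCase(); // "Fox" and "fox" share one counter
    counts[key]++;
    return counts[key] > max ? match : "<mark>" + match + "</mark>";
  });
}

var out = highlightWithCounts("Fox fox fox dog", ["fox", "dog"], 2);
console.log(out);
// <mark>Fox</mark> <mark>fox</mark> fox <mark>dog</mark>
```

Since all words are handled in one replace() call, the inserted markup is never scanned again for further matches.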

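The preferredHighlight function:

Basically, we first match each sentence and then replace the words within each sentence. The number of highlighted words per sentence is limited here as well.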


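References:

  • Difference between using var and not using var in JavaScript
  • How to pass a variable to a regular expression in JavaScript?
  • [C] How many times will strlen() be called in this for loop?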