Print the matched string along with the first word of the line

Answer 1

Bash/grep version:

#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.

text_file="$1"
shift

for string; do
    # Find string in file. Process output one line at a time.
    # "--" keeps a string that begins with "-" from being read as an option.
    grep -- "$string" "$text_file" |
        while IFS= read -r line; do
            # Get the first word of the line.
            first_word="${line%% *}"
            # Remove special characters from the first word.
            first_word="${first_word//[^[:alnum:]]/}"

            # If the first word is the same as the string, don't print it twice.
            if [[ "$string" != "$first_word" ]]; then
                printf '%s\t' "$first_word"
            fi

            printf '%s\n' "$string"
        done
done

Call it like this:

./string-and-first-word.sh /path/to/file text thing try Better

Output:

This    text
Another thing
It  try
Better
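
The two parameter expansions that extract the first word can be tried in isolation. The following is only a small sketch: the helper name first_word_of and the sample lines are made up for illustration and are not part of the script.

```shell
#!/bin/bash
# first_word_of: hypothetical helper, not part of the script above.
first_word_of() {
    local line="$1"
    # Drop the longest suffix that starts with a space, i.e. keep
    # everything before the first space.
    local word="${line%% *}"
    # Delete every character that is not a letter or a digit.
    word="${word//[^[:alnum:]]/}"
    printf '%s\n' "$word"
}

first_word_of '"This is a text"'   # prints: This
first_word_of 'Better'             # prints: Better
```

A line without any space is left whole by the first expansion, which is why a single-word line still yields that word.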

Answer 2

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;

open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}

Save it as first-plus-word and run it as

perl first-plus-word file.txt text thing try Better

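It builds a regular expression from the words given on the command line. Each line is then matched against that regex; if it matches, the first word of the line is printed, and if the first word differs from the matched word, the matched word is printed as well.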
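
The regex-building step can be sketched in shell as well. This is a rough analogue under stated assumptions, not a port of the Perl script: join_words is a hypothetical helper, and because nothing is escaped here (the Perl version uses quotemeta), it assumes the words contain no regex metacharacters.

```shell
#!/bin/bash
# join_words: hypothetical helper that joins its arguments with "|",
# like the Perl  join '|', ...  line. No escaping is done, so the words
# are assumed to be free of regex metacharacters.
join_words() {
    local IFS='|'
    # "$*" joins the arguments with the first character of IFS.
    printf '%s\n' "$*"
}

regex="$(join_words text thing try Better)"
printf '%s\n' "$regex"    # prints: text|thing|try|Better
```

The resulting pattern could then be handed to something like grep -Ew "$regex" file.txt to select the matching lines, where -w plays roughly the role of the \b anchors; -w is available in GNU and BSD grep but is not required by POSIX.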

Answer 3

Here is an awk version:

awk '
  NR==FNR {a[$0]++; next;} 
  {
    gsub(/"/,"",$0);
    for (i=1; i<=NF; i++)
      if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
  }
  ' file2 file1

where file2 is the list of words (one per line) and file1 contains the phrases.
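
To make the roles of the two files concrete, here is a small self-contained run. The sample contents are invented for illustration (the original input file is not shown here), and first_and_match is a hypothetical wrapper name; the awk program is the one above with parentheses added around the conditional expression.

```shell
#!/bin/bash
# Hypothetical sample data; the names file1/file2 follow the answer.
dir="$(mktemp -d)"
trap 'rm -rf "$dir"' EXIT
printf '%s\n' text thing try Better > "$dir/file2"
printf '%s\n' 'This is a text' 'Another thing' \
    'It is better to try' 'Better not' > "$dir/file1"

# first_and_match WORDLIST PHRASES (hypothetical wrapper name)
first_and_match() {
    awk '
      NR==FNR { a[$0]++; next }     # first file: remember each listed word
      {
        gsub(/"/, "", $0)           # strip double quotes; fields are re-split
        for (i = 1; i <= NF; i++)
          if ($i in a) printf "%s\n", (i == 1 ? $i : $1 "\t" $i)
      }
    ' "$1" "$2"
}

first_and_match "$dir/file2" "$dir/file1"
```

With this sample data the run prints This/text, Another/thing and It/try as tab-separated pairs, followed by Better on its own. The lowercase "better" in the third phrase is not reported because the array lookup is case-sensitive.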

Answer 4

Try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This    text
Another thing
It      try
        Better

If the tab before Better is a problem, try the following:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This    text
Another thing
It      try
Better

The above was tested on GNU sed (called gsed on OSX). For BSD sed, some minor changes may be needed.

How it works

s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/

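This looks for a word, [[:alnum:]]+, followed by a whitespace character, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. If found, the whole match is replaced with the first word on the line (if any), a tab, and the matched word.

ta; b; :a; s/^\t//; p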

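If the substitute command made a substitution, meaning that one of your words was found on the line, the ta command tells sed to jump to label a. If not, b with no label branches to the end of the script, so sed moves on to the next line without printing anything. :a defines label a. So, if one of your words was found, we (a) run the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.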
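
The branching pattern is easier to see on a toy substitution. The sketch below is illustrative only: demo_branch and the foo/FOO input are made up, and the script is split across several -e options as an assumption that this avoids relying on how a given sed parses a semicolon after a label.

```shell
#!/bin/bash
# demo_branch: hypothetical helper showing the t / b / :label pattern.
# 1. s/foo/FOO/  succeeds only on lines that contain "foo".
# 2. ta          jumps to label a if that substitution succeeded.
# 3. b           (no label) ends processing of the current line;
#                with -n nothing is printed for it.
# 4. :a          defines label a.
# 5. p           prints the line; reached only via the jump in step 2.
demo_branch() {
    sed -n \
        -e 's/foo/FOO/' \
        -e 'ta' \
        -e 'b' \
        -e ':a' \
        -e 'p'
}

printf 'foo\nbar\n' | demo_branch    # prints: FOO
```

Only the line on which the substitution succeeded reaches p, which is the same mechanism the command above uses to strip the leading tab and print only the matching lines.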
