Bash: printing the result of an earlier stage of a pipeline

Question 1

You can do this with a while loop:

while read l; do
  [ ${#l} -gt 65 ] && \
    echo "$l" | langid --line | grep -q "is" && \
    echo "$l"
done <file

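- read l reads the input line by line and stores the current line in the variable $l.
- [ ${#l} -gt 65 ] tests whether the line contains more than 65 characters.
- echo "$l" | langid --line | grep -q "is" passes the line to langid and greps the result for the language code. Note the -q: grep stays silent. We only want to check whether the string is present, not print anything.
- echo "$l" prints the original line if the string was found.
- <file uses the contents of file as input.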

Edit: the loop above runs langid once per line, which is very slow. If you want it to run in a single pass (faster), use the following:

awk 'FNR==NR{a[NR]=$0}
  FNR!=NR&&$1~"is"{print a[FNR]}' \
<(sed -n '/^.\{65\}/p' file) \
<(sed -n '/^.\{65\}/p' file | langid --line)

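awk processes two "files":
- The output of sed -n '/^.\{65\}/p' file: all lines with 65 or more characters (the loop above tests for more than 65, so the two variants differ for lines of exactly 65 characters).
- The output of sed -n '/^.\{65\}/p' file | langid --line: the langid results for those same lines, produced in a single pass.
Inside awk:
- FNR==NR is true while the first "file" is being read.
- a[NR]=$0 fills an array with the lines, indexed by line number.
- FNR!=NR&&$1~"is" is true for lines of the second "file" whose first field contains the string is.
- print a[FNR] then prints the corresponding line from the previously built array a, which holds the original sentences.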
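
The pairing that awk performs here can also be sketched in Python, which may make the join by line number easier to follow. This is only an illustration and not part of the original answer: the function name pair_by_position and the sample data are made up, and the label strings are assumed to look like the tuples that langid --line prints.

```python
def pair_by_position(long_lines, labels, wanted="is"):
    """Return the lines whose label at the same position mentions `wanted`."""
    # Like a[NR]=$0: remember every line under its 1-based line number.
    remembered = {number: line for number, line in enumerate(long_lines, start=1)}
    selected = []
    # Like FNR!=NR && $1 ~ "is": walk the labels and test the first field only.
    for number, label in enumerate(labels, start=1):
        fields = label.split()
        if fields and wanted in fields[0]:
            # Like print a[FNR]: emit the original line, not the label.
            selected.append(remembered[number])
    return selected


if __name__ == "__main__":
    sample_lines = ["first long line", "second long line"]
    sample_labels = ["('en', -12.3)", "('is', -45.6)"]  # hypothetical labels
    print(pair_by_position(sample_lines, sample_labels))
```

The sketch assumes both inputs have the same number of entries in the same order, which is what the two sed commands in the awk version guarantee.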

Question 2

If your shell is bash, you can do this:

sed -n '/^.\{65\}/p' www.mbl.is | while IFS= read -r line ; do
   # classify one line per langid invocation
   LANGID=$(echo "$line" | langid --line)
   if [[ "$LANGID" =~ is ]] ; then
      echo "$line: $LANGID"
   fi
done

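But this will be very slow, because it runs multiple instances of langid (one per input line). You would probably be better off writing a Python script that imports langid, as described in the README on GitHub. As mentioned above, a simple loop that reads stdin and passes each line to langid.classify() would do.

My Python is very rusty and I don't have langid.py installed, so this is untested, but here is a very crude Python example: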

#! /usr/bin/python

import fileinput
import langid

for line in fileinput.input():
  if len(line) > 65:
    # classify() returns a (language, score) tuple
    result = langid.classify(line)
    if result[0] == 'is':
      print("%s: %s" % (line.rstrip("\n"), result))

It does pass the compile test, python -m py_compile langtest.py, but that is all I can say in its favour.

Added by 霜冻软件:

A much improved, and probably tested and working, version:

#! /usr/bin/python

import sys, codecs, re
from fileinput import input as file
from langid import classify

#Output STDOUT as UTF-8
sys.stdout = codecs.getwriter("utf8")(sys.stdout)
sys.stderr = codecs.getwriter("utf8")(sys.stderr)

#read lines from the files named as arguments (or from stdin) and process each line
for line in file():
    #check if line is greater than 65 characters
    if len(line) > 65:
        #determine the language of each line
        id = classify(line)
        #check if language is Icelandic
        if re.search('is', str(id)):
            #print the line and the langid classification 
            print line, ": ", id

There is also a more comprehensive Python script, which accepts arguments and has some extra features, available as a gist.
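
The scripts above use the Python 2 print statement. For readers on Python 3, the same loop could be sketched roughly as follows. This is an assumption-laden sketch rather than the gist mentioned above: the names icelandic_lines and main are invented for illustration, the classifier is passed in as a parameter so that the filtering logic can be tried without langid installed, and it is assumed to return a (language, score) tuple in the way langid.classify() does.

```python
import sys


def icelandic_lines(lines, classify, wanted="is", min_len=65):
    """Yield (line, result) for lines longer than min_len classified as `wanted`."""
    for raw in lines:
        line = raw.rstrip("\n")
        if len(line) > min_len:
            # classify is assumed to return a (language, score) tuple.
            result = classify(line)
            if result[0] == wanted:
                yield line, result


def main():
    # Imported here so icelandic_lines can be exercised without langid.
    from langid import classify

    # Read stdin line by line and print each match with its classification.
    for line, result in icelandic_lines(sys.stdin, classify):
        print(f"{line}: {result}")
```

Adding a call to main() at the end of the file turns it into a filter that reads stdin. Unlike the scripts above, the length check here ignores the trailing newline, so a line needs at least 66 visible characters to qualify.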

Answer