Root of the problem

Answer 1

find /tmp/test -name '*.txt' \
 -exec bash -c './thulac < "$(readlink -f "$1")" > "/mnt/tokenized/$(basename "$1")"' bash {} \;

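Use find to search for the files and run a command on each result. Wrapping the command in bash -c '...' lets you use several $() substitutions inside it.

readlink -f builds the full path of each result.

basename strips the directory part from each result.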
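A minimal sketch for trying the command without thulac: it assumes a hypothetical stand-in ./thulac that just copies stdin to stdout, uses temporary directories in place of /tmp/test and /mnt/tokenized, and passes the output directory to bash as an extra argument ($2). The file path reaches bash as $1 rather than being pasted into the script text, so a name containing a double space and a quote is handled.

```shell
#!/bin/sh
# Sketch: try the find + bash -c pattern on throwaway files.
# Assumption: ./thulac is replaced by a stand-in that copies stdin to stdout.
src=$(mktemp -d)    # stand-in for /tmp/test
dst=$(mktemp -d)    # stand-in for /mnt/tokenized
work=$(mktemp -d)   # directory holding the stand-in ./thulac

printf '#!/bin/sh\nexec cat\n' > "$work/thulac"
chmod +x "$work/thulac"

# A name with a double space and a single quote, which would break
# a script that had {} pasted into its text.
printf 'hello\n' > "$src/it's  here.txt"

cd "$work" || exit 1
# The path is handed to bash as $1 and the output directory as $2.
find "$src" -name '*.txt' -exec bash -c \
    './thulac < "$(readlink -f "$1")" > "$2/$(basename "$1")"' bash {} "$dst" \;

result=$(cat "$dst/it's  here.txt")
printf '%s\n' "$result"
cd / && rm -rf "$src" "$dst" "$work"
```

If everything works, it prints hello, the content of the test file copied through the stand-in.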

Answer 2

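When working with xargs, you should always test your solution with input that starts with '-' and contains a double space, ' and ", because xargs is notorious for handling these badly: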

mkdir -- '-"  '"'"
seq 10 > ./-\"\ \ \'/'-"  '"'".txt
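To see the problem concretely, the sketch below feeds xargs a name with a double space and a name shaped like the test file above, once with default parsing and once with -0. printf '[%s]\n' is used as the command so each argument it receives shows up in brackets; the sample names are made up.

```shell
#!/bin/sh
# Default xargs splits on blanks and treats quotes specially;
# with -0 each NUL-terminated item becomes exactly one argument.
spaced='a  b.txt'
quoted='-"  '"'"'.txt'    # same shape as the test file name: -"  '.txt

# Double space: without -0 the name is split into two arguments.
spaced_default=$(printf '%s\n' "$spaced" | xargs printf '[%s]\n')
spaced_nul=$(printf '%s\0' "$spaced" | xargs -0 printf '[%s]\n')

# Quotes: without -0 the unmatched " makes xargs fail; with -0 it is an ordinary character.
quoted_default=$(printf '%s\n' "$quoted" | xargs printf '[%s]\n' 2>/dev/null || true)
quoted_nul=$(printf '%s\0' "$quoted" | xargs -0 printf '[%s]\n')

printf '%s\n' "$spaced_default" "$spaced_nul" "$quoted_default" "$quoted_nul"
```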

Here is a solution using GNU Parallel:

find . -name "*.txt" -print0 | parallel -0 ./thulac '<' {} '>' {/}

The < and > must be quoted; otherwise they would be interpreted by the shell that starts parallel. We want them to be interpreted by the shell that parallel starts.

Answer 3

find /mnt/test -name "*.txt" -print0 -printf "%f\0" |
xargs -0 -n 2 bash -c 'shift $1; ./thulac < "$1" > "/mnt/tokenized/$2"' 2 1

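You also need to terminate the base name printed by -printf with a NUL, just as -print0 terminates the full pathname, so that when xargs takes the NUL-delimited list apart it splits it at the right places.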

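Otherwise, the base name of one file gets glued onto the full pathname of the next file, which is exactly what you observed when there was more than one file!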

Then you need to feed bash 2 arguments at a time (-n 2); otherwise it swallows as many arguments as xargs hands it, but only the first two reach your executable ./thulac.
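The effect of -n 2 can be checked without find or thulac. The sketch below pipes a hand-made NUL-separated stream (standing in for the output of find ... -print0 -printf "%f\0"; the paths are made up) into xargs and lets sh print the two positional parameters it sees.

```shell
#!/bin/sh
# Simulated output of: find ... -print0 -printf "%f\0"  (path, name, path, name)
stream() { printf '%s\0' /mnt/test/a.txt a.txt /mnt/test/b.txt b.txt; }

# With -n 2, sh is started once per (path, name) pair: $0 = path, $1 = name.
paired=$(stream | xargs -0 -n 2 sh -c 'printf "%s -> %s\n" "$0" "$1"')

# Without -n, one sh receives all four arguments, but the script only uses $0 and $1.
unpaired=$(stream | xargs -0 sh -c 'printf "%s -> %s\n" "$0" "$1"')

printf '%s\n---\n%s\n' "$paired" "$unpaired"
```

The second result contains only the first pair: the other two arguments reach sh but are never used.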

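A better option is to drop xargs and do all the work in find: handing xargs 2 arguments at a time starts one bash per file anyway, which takes away xargs's main advantage of batching many arguments into a single command. In this version, we pass the full pathname to bash and compute the file name in bash itself, instead of relying on find to do it.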

find /mnt/test -name "*.txt" -exec bash -c './thulac < "$1" \
  > "/mnt/tokenized/${1##*/}"' bash {} \;
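The ${1##*/} expansion deletes the longest prefix matching */, that is, everything up to the last slash. This sketch compares it with basename on a few made-up paths, using the same bash -c '...' bash PATH calling convention. The two only differ for paths ending in /, which the find command above does not print for its matches.

```shell
#!/bin/sh
# ${1##*/} removes the longest prefix matching "*/", i.e. everything up to the last slash.
# Made-up sample paths; each result is compared with basename.
out=$(
    for p in /mnt/test/test.txt "/mnt/test/a  b.txt" "./sub/it's.txt"; do
        # Same calling convention as the find command: the path arrives as $1.
        bash -c 'printf "%s|%s\n" "${1##*/}" "$(basename "$1")"' bash "$p"
    done
)
printf '%s\n' "$out"
```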

Root of the problem

1. Good case when only 1 file present
-print0  -printf '%f'

 /mnt/test/test.txt\0test.txt
 |-----------------|--------|

arg0 = /mnt/test/test.txt
arg1 = test.txt
bash -c 'thulac < $0 > /mnt/tokenized/$1'
thulac < /mnt/test/test.txt > /mnt/tokenized/test.txt

2. Error case when > 1 file present
-print0  -printf '%f'
/mnt/test/test.txt\0test.txt/mnt/test/test33.txt\0test33.txt
|-----------------|-----------------------------|----------|

arg0 = /mnt/test/test.txt
arg1 = test.txt/mnt/test/test33.txt
arg2 = test33.txt
bash -c 'thulac < $0 > /mnt/tokenized/$1'
thulac < /mnt/test/test.txt > /mnt/tokenized/test.txt/mnt/test/test33.txt
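Both diagrams can be reproduced without find: printf with a format that is reused for each (path, name) pair rebuilds the same byte stream (the paths are the ones from the diagram), and xargs -0 printf '[%s]\n' shows where xargs cuts it. The second stream adds the NUL after the name.

```shell
#!/bin/sh
# Rebuild the byte stream of  find ... -print0 -printf '%f'  for two files.
# printf reuses the format for each (path, name) pair, so no NUL follows the name.
broken=$(printf '%s\0%s' /mnt/test/test.txt test.txt /mnt/test/test33.txt test33.txt |
    xargs -0 printf '[%s]\n')

# Same stream with a NUL after the name, as -printf "%f\0" produces.
fixed=$(printf '%s\0%s\0' /mnt/test/test.txt test.txt /mnt/test/test33.txt test33.txt |
    xargs -0 printf '[%s]\n')

printf '%s\n---\n%s\n' "$broken" "$fixed"
```

The broken stream yields three arguments, the middle one being test.txt/mnt/test/test33.txt; the fixed one yields four.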

Fix

We saw that the mix-up happened because the '\0' delimiter was missing from -printf "%f".
So the correct way is:
find ... -print0 -printf "%f\0" | xargs ...
This ensures that the list is split at the right places and that the
sequence fullpath1\0file1\0fullpath2\0file2\0... is kept intact.
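An end-to-end sketch of the corrected pipeline, assuming GNU find (for -printf) and a hypothetical stand-in ./thulac that copies stdin to stdout. Temporary directories replace /mnt/test and /mnt/tokenized, and the output directory reaches bash through an environment variable DST (a name chosen for this example) instead of being pasted into the script.

```shell
#!/bin/sh
# End-to-end check of the corrected find | xargs pipeline on throwaway files.
# Assumptions: GNU find (for -printf) and a stand-in ./thulac that copies stdin to stdout.
src=$(mktemp -d)    # stand-in for /mnt/test
dst=$(mktemp -d)    # stand-in for /mnt/tokenized
work=$(mktemp -d)   # holds the stand-in ./thulac

printf '#!/bin/sh\nexec cat\n' > "$work/thulac"
chmod +x "$work/thulac"
printf 'one\n' > "$src/test.txt"
printf 'two\n' > "$src/test33.txt"

cd "$work" || exit 1
# DST is passed to bash through the environment, not pasted into the script text.
find "$src" -name "*.txt" -print0 -printf "%f\0" |
    DST=$dst xargs -0 -n 2 bash -c 'shift $1; ./thulac < "$1" > "$DST/$2"' 2 1

one=$(cat "$dst/test.txt")
two=$(cat "$dst/test33.txt")
printf '%s %s\n' "$one" "$two"
cd / && rm -rf "$src" "$dst" "$work"
```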

Now coming to the 'xargs' part, we write:
xargs -0 -n 2 bash -c '...' 2 1

Points to observe are the following:
   a) '-0' => arguments to xargs are taken to be NUL-separated.
   b) -n 2 => we feed 2 args at a time to bash from the total pool 
      delivered to xargs by find.
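   c) 2 1 is just a best practice to get around different shells' behavior
      regarding what counts as $0, $1, $2, ...: if the shell puts 2 in $0, then
      $1 is 1 and 'shift $1' drops it; if it puts 2 in $1, 'shift $1' drops both.
      Either way, $1 ends up as the full path and $2 as the file name. In your
      particular case, since you already know that $0 -> first arg and $1 -> 2nd
      arg, we could just as well have written what you did: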
    find ... | xargs -0 -n 2 bash -c './thulac < $0 > /mnt/tokenized/$1'
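The sketch below checks that shift $1 leaves the full path in $1 and the name in $2 under both conventions. The second call imitates a shell that would put the first extra argument in $1 by supplying a dummy $0; the path is made up.

```shell
#!/bin/sh
# Usual convention: $0=2, $1=1, so 'shift $1' is 'shift 1'.
normal=$(sh -c 'shift $1; printf "%s|%s\n" "$1" "$2"' 2 1 /mnt/test/a.txt a.txt)

# Imitated other convention: a dummy $0 pushes 2 into $1, so 'shift $1' is 'shift 2'.
other=$(sh -c 'shift $1; printf "%s|%s\n" "$1" "$2"' dummy 2 1 /mnt/test/a.txt a.txt)

printf '%s\n%s\n' "$normal" "$other"
```

Both lines print /mnt/test/a.txt|a.txt.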

Answer 4

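You didn't say exactly what your script needs to do, but assuming you want to pass every odd file as the first argument and every even file name as the second argument, here is how to do it portably: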

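t=$(mktemp)
find /tmp/test -name "*.txt" -exec sh -c '
    if [ -s "$1" ]
    then
        # second file of a pair: the first path is stored in the temp file
        ./thulac < "$(cat "$1")" > "/mnt/tokenized/${2##*/}"
        : > "$1"    # empty the temp file so the next file starts a new pair
    else
        # first file of a pair: remember its path in the temp file
        printf "%s" "$2" > "$1"
    fi' sh "$t" {} \;
rm "$t"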
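To see which files get paired without running thulac, this sketch replaces ./thulac with a printf of the pair and runs the same temp-file logic (including emptying the temp file after each pair) on four empty files in a temporary directory. Which files end up together depends on the order in which find returns them.

```shell
#!/bin/sh
# Pair consecutive find results through a temp file; printf stands in for ./thulac.
src=$(mktemp -d)    # hypothetical stand-in for /tmp/test
for f in a b c d; do : > "$src/$f.txt"; done

t=$(mktemp)
pairs=$(find "$src" -name "*.txt" -exec sh -c '
    if [ -s "$1" ]
    then
        # second file of a pair: the first path is stored in the temp file
        printf "%s -> %s\n" "$(cat "$1")" "${2##*/}"
        : > "$1"    # empty the temp file so the next file starts a new pair
    else
        # first file of a pair: remember its path
        printf "%s" "$2" > "$1"
    fi' sh "$t" {} \;)
rm -f "$t"
rm -rf "$src"
printf '%s\n' "$pairs"
```

With four files it prints two lines. Without the line that empties the temp file, every file after the first would be paired with the first one.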

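If you just want to pass the path and the file name of each file found, the answer is simpler, and it still uses only portable (POSIX) commands and syntax, i.e. it doesn't depend on bash, GNU find or GNU xargs: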

find /tmp/test -name "*.txt" -exec sh -c '
    ./thulac < "$1" > "/mnt/tokenized/$(basename "$1")"' sh {} \;

Note that {} only needs to be quoted if you use the fish shell, which is very unlikely.
