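I'm trying to assign the output of the following command (which selects a random line from a file) to a variable, but it doesn't work.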
givinv@87-109:~$ head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1
cower
givinv@87-109:~$
Below are the errors I get when I try to assign it to a variable.
givinv@87-109:~$ VARIA=`head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1`
bash: command substitution: line 1: unexpected EOF while looking for matching `)'
bash: command substitution: line 2: syntax error: unexpected end of file
bash: command substitution: line 1: syntax error near unexpected token `)'
bash: command substitution: line 1: ` + 1)) file | tail -1'
-l: command not found
givinv@87-109:~$
I even tried the same thing in a for loop, but that doesn't work either:
givinv@87-109:~$ for i in `head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1`;do echo $i ;done
bash: syntax error near unexpected token `<'
givinv@87-109:~$
Answer 1
It doesn't work because you are trying to nest unescaped backticks:
VARIA=`head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1`
Bash actually tries to run the following fragment as a single command first:
head -$((${RANDOM} %
which produces the first two errors:
$ VARIA=`head -$((${RANDOM} % `
bash: command substitution: line 1: unexpected EOF while looking for matching `)'
bash: command substitution: line 2: syntax error: unexpected end of file
Then it tries to run
wc -l < file` + 1)) file | tail -1`
which means it tries to evaluate
+ 1)) file | tail -1
(the part between the backticks), which gives you the next errors:
$ wc -l < file` + 1)) file | tail -1`
bash: command substitution: line 1: syntax error near unexpected token `)'
bash: command substitution: line 1: ` + 1)) file | tail -1'
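The same left-to-right pairing can be reproduced with harmless commands. This is a small sketch of my own (the echo arguments are made up for illustration, not taken from the question): the inner backticks close the first substitution instead of opening a nested one.

```shell
# Intended (but wrong) reading: an inner "echo two" nested inside an outer substitution.
# Actual parse: the substitution "echo one " runs first, the text "echo two" is left
# as literal words, and " echo three" runs as a second, separate substitution.
unescaped=$(echo `echo one `echo two` echo three`)
echo "$unescaped"   # prints: oneecho twothree
```

The literal words end up glued to the output of the substitutions next to them, which is why the result looks nothing like a nested command.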
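The remaining error in the question, -l: command not found, comes from what is left between the two substitutions: the line reads as the assignment VARIA=wc followed by the command -l. You can get around all of this by escaping the inner backticks: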
VARIA=`head -$((${RANDOM} % \`wc -l < file\` + 1)) file | tail -1`
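As a general rule, though, it is usually better not to use backticks at all. You should almost always use $() instead. It is more robust and can be nested indefinitely with a simpler syntax: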
VARIA=$(head -$((${RANDOM} % $(wc -l < file) + 1)) file | tail -1)
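To compare the two working forms side by side, here is a minimal sketch that runs them against a throwaway three-line file. The temporary file, its contents, and the fixed pick value (standing in for ${RANDOM} so the result is repeatable) are assumptions made for the illustration.

```shell
# Throwaway sample file (hypothetical contents)
tmpfile=$(mktemp)
printf '%s\n' alpha beta gamma > "$tmpfile"

# Fixed stand-in for ${RANDOM} so both forms select the same line
pick=1

# Escaped backticks: every inner backtick needs a backslash
with_backticks=`head -n $(( pick % \`wc -l < "$tmpfile"\` + 1 )) "$tmpfile" | tail -n 1`

# $(): nests without any escaping
with_dollar=$(head -n $(( pick % $(wc -l < "$tmpfile") + 1 )) "$tmpfile" | tail -n 1)

echo "$with_backticks $with_dollar"   # both forms select line 2 of the file
rm -f "$tmpfile"
```

With pick set to 1 and three lines in the file, the arithmetic gives 1 % 3 + 1 = 2, so both variables hold the second line.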
Answer 2
Just use this command:
VARIA=$(head -n "$((${RANDOM} % $(wc -l < test) + 1))" test | tail -n 1)
To assign the result of a command to a variable, we use $(...) (the older `...` form is harder to nest).
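If the one-liner is needed in more than one place, it can be wrapped in a small function. The sketch below is an illustration rather than part of the answer: the name random_line and the sample file are hypothetical, and it expects a non-empty file. Note also that bash's $RANDOM only yields values from 0 to 32767, so with this approach lines after the 32768th are never selected.

```shell
# Hypothetical helper: print one random line of the file given as $1
random_line() {
    local total
    total=$(wc -l < "$1")    # number of lines in the file
    head -n "$(( RANDOM % total + 1 ))" "$1" | tail -n 1
}

# Throwaway sample file (hypothetical contents)
sample=$(mktemp)
printf '%s\n' cower tower power > "$sample"

VARIA=$(random_line "$sample")
echo "$VARIA"
rm -f "$sample"
```

Quoting "$1" keeps the function working for file names that contain spaces.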
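Answer 3
As another option for reading a random line from a file (and assigning it to a variable), consider a simplified reservoir sampling method, converted to awk from thrig's Perl implementation, with Peter.O's seeding improvement: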
VARIA=$(awk -v seed=$RANDOM 'BEGIN { srand(seed) } { if (rand() * FNR < 1) { line=$0 } } END { print line }' /usr/share/dict/words)
Here is the awk script again, wrapped for readability:
awk -v seed=$RANDOM '
BEGIN {
    srand(seed)
}
{
    if (rand() * FNR < 1) {
        line=$0
    }
}
END {
    print line
}' /usr/share/dict/words
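Because awk's srand() seeds from the current time when it is given no argument, you will get the same value if you run the script more than once within the same second, unless you seed it with something else that is random; here I pass in bash's $RANDOM as the seed. I'm picking words from /usr/share/dict/words as a source of text.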
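The role of the seed is easy to check: with the same seed, the same awk binary makes the same choice every time. A minimal sketch, assuming a throwaway input file and an arbitrary fixed seed of 42 in place of $RANDOM (both are made up for the illustration, as is the pick_line helper name):

```shell
# Throwaway sample file (hypothetical contents)
words=$(mktemp)
printf '%s\n' one two three four five six seven eight > "$words"

# Hypothetical helper. $1: seed, $2: file. Same selection rule as the script above.
pick_line() {
    awk -v seed="$1" 'BEGIN { srand(seed) } rand() * FNR < 1 { line = $0 } END { print line }' "$2"
}

first=$(pick_line 42 "$words")
second=$(pick_line 42 "$words")
echo "$first $second"   # the two picks are identical
rm -f "$words"
```

This is why a changing seed such as $RANDOM matters: a fixed or time-based seed repeated within one second replays the same sequence of rand() values.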
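This method does not care how many lines the file has (my local copy has 479,828 lines) and needs no line count up front, so it should be quite flexible.
To see the program's math at work, I wrote a wrapper script that iterates over different line numbers and probabilities:
Demo script: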
#!/bin/sh
for lineno in 1 2 3 4 5 20 100
do
    echo "0 .. 0.99999 < ( 1 / FNR == " $(printf 'scale=2\n1 / %d\n' "$lineno" | bc) ")"
    for r in 0 0.01 0.25 0.5 0.99
    do
        result=$(printf '%f * %d\n' "$r" "$lineno" | bc)
        case $result in
            (0*|\.*) echo "Line $lineno: Result of probability $r * line $lineno is $result and is < 1, choosing line" ;;
            (*) echo "Line $lineno: Result of probability $r * line $lineno is $result and is >= 1, not choosing line" ;;
        esac
    done
    echo
done
The results are:
0 .. 0.99999 < ( 1 / FNR == 1.00 )
Line 1: Result of probability 0 * line 1 is 0 and is < 1, choosing line
Line 1: Result of probability 0.01 * line 1 is .010000 and is < 1, choosing line
Line 1: Result of probability 0.25 * line 1 is .250000 and is < 1, choosing line
Line 1: Result of probability 0.5 * line 1 is .500000 and is < 1, choosing line
Line 1: Result of probability 0.99 * line 1 is .990000 and is < 1, choosing line
0 .. 0.99999 < ( 1 / FNR == .50 )
Line 2: Result of probability 0 * line 2 is 0 and is < 1, choosing line
Line 2: Result of probability 0.01 * line 2 is .020000 and is < 1, choosing line
Line 2: Result of probability 0.25 * line 2 is .500000 and is < 1, choosing line
Line 2: Result of probability 0.5 * line 2 is 1.000000 and is >= 1, not choosing line
Line 2: Result of probability 0.99 * line 2 is 1.980000 and is >= 1, not choosing line
0 .. 0.99999 < ( 1 / FNR == .33 )
Line 3: Result of probability 0 * line 3 is 0 and is < 1, choosing line
Line 3: Result of probability 0.01 * line 3 is .030000 and is < 1, choosing line
Line 3: Result of probability 0.25 * line 3 is .750000 and is < 1, choosing line
Line 3: Result of probability 0.5 * line 3 is 1.500000 and is >= 1, not choosing line
Line 3: Result of probability 0.99 * line 3 is 2.970000 and is >= 1, not choosing line
0 .. 0.99999 < ( 1 / FNR == .25 )
Line 4: Result of probability 0 * line 4 is 0 and is < 1, choosing line
Line 4: Result of probability 0.01 * line 4 is .040000 and is < 1, choosing line
Line 4: Result of probability 0.25 * line 4 is 1.000000 and is >= 1, not choosing line
Line 4: Result of probability 0.5 * line 4 is 2.000000 and is >= 1, not choosing line
Line 4: Result of probability 0.99 * line 4 is 3.960000 and is >= 1, not choosing line
0 .. 0.99999 < ( 1 / FNR == .20 )
Line 5: Result of probability 0 * line 5 is 0 and is < 1, choosing line
Line 5: Result of probability 0.01 * line 5 is .050000 and is < 1, choosing line
Line 5: Result of probability 0.25 * line 5 is 1.250000 and is >= 1, not choosing line
Line 5: Result of probability 0.5 * line 5 is 2.500000 and is >= 1, not choosing line
Line 5: Result of probability 0.99 * line 5 is 4.950000 and is >= 1, not choosing line
0 .. 0.99999 < ( 1 / FNR == .05 )
Line 20: Result of probability 0 * line 20 is 0 and is < 1, choosing line
Line 20: Result of probability 0.01 * line 20 is .200000 and is < 1, choosing line
Line 20: Result of probability 0.25 * line 20 is 5.000000 and is >= 1, not choosing line
Line 20: Result of probability 0.5 * line 20 is 10.000000 and is >= 1, not choosing line
Line 20: Result of probability 0.99 * line 20 is 19.800000 and is >= 1, not choosing line
0 .. 0.99999 < ( 1 / FNR == .01 )
Line 100: Result of probability 0 * line 100 is 0 and is < 1, choosing line
Line 100: Result of probability 0.01 * line 100 is 1.000000 and is >= 1, not choosing line
Line 100: Result of probability 0.25 * line 100 is 25.000000 and is >= 1, not choosing line
Line 100: Result of probability 0.5 * line 100 is 50.000000 and is >= 1, not choosing line
Line 100: Result of probability 0.99 * line 100 is 99.000000 and is >= 1, not choosing line
The original formula:
rand() * FNR < 1
can be rewritten mathematically as:
rand() < 1 / FNR
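...which I find more intuitive, because it shows the value on the right-hand side shrinking as the line number grows. As the right-hand side of the inequality drops, the chance that rand() returns a value below it gets smaller and smaller.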
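The shrinking threshold is also what keeps the final pick uniform. As a supporting derivation (not part of the original answer), write N for the total number of lines, a symbol introduced here. Line k ends up as the result only if it is chosen when it is read, with probability 1/k, and then survives every later line j, each of which replaces it with probability 1/j:

```latex
P(\text{line } k \text{ is the final pick})
  = \frac{1}{k} \prod_{j=k+1}^{N} \left(1 - \frac{1}{j}\right)
  = \frac{1}{k} \prod_{j=k+1}^{N} \frac{j-1}{j}
  = \frac{1}{k} \cdot \frac{k}{N}
  = \frac{1}{N}
```

The product telescopes to k/N, so every line has the same 1/N chance regardless of its position in the file.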
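For each line number, I print a representation of the formula being tested: the range of rand() output and "1 divided by the line number". Then I iterate over some sample random values to see whether the line would be chosen given that random value.
A few sample cases are worth a look: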
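- On line 1, since rand() produces values in the range 0 <= rand() < 1, the result is always less than (1 / 1 == 1), so line 1 is always chosen.
- On line 2, you can see that the random value needs to be less than 0.50, meaning line 2 has a 50% chance of being chosen.
- On line 100, rand() now needs to produce a value less than 0.01 for the line to be chosen.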
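The cases above describe each line's chance of being chosen at the moment it is read; the overall result can also be checked empirically. The following sketch is not from the original answer: it replays the same rand() * n < 1 rule many times inside a single awk process for a hypothetical 4-line file, with an arbitrary fixed seed and trial count, and counts how often each line number ends up as the final choice.

```shell
# Replay the selection rule "trials" times for a file of "lines" lines.
# seed, lines and trials are arbitrary values chosen for this illustration.
counts=$(awk -v seed=12345 -v lines=4 -v trials=20000 '
BEGIN {
    srand(seed)
    for (t = 1; t <= trials; t++) {
        chosen = 0
        for (n = 1; n <= lines; n++)
            if (rand() * n < 1)      # same test as rand() * FNR < 1
                chosen = n           # a later line replaces the current choice
        count[chosen]++
    }
    # Print "line-number times-chosen" for every line
    for (n = 1; n <= lines; n++)
        printf "%d %d\n", n, count[n]
}')
echo "$counts"
```

Each of the four counts should land near 5000, a quarter of the trials, even though later lines are chosen less and less often when they are read.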