Speeding up text processing

Question

Answer 1

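You want to first extract the 36-line header from the file named input, then pick 60000 lines at random from the rest of the file, allowing the same line to be picked more than once. All output should go to a file named output.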

Using shuf from GNU coreutils:

#!/bin/sh

# Fetch header (36 first lines)
head -n 36 <input >output

# Scramble the other lines and pick 60000 (allowing for repeated lines)
tail -n +37 <input | shuf -r -n 60000 >>output

Or:

( head -n 36 <input; tail -n +37 <input | shuf -r -n 60000 ) >output
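To try the pipeline before running it on real data, here is a minimal sketch that applies the same commands to a small generated file. The temporary directory, the "header N" / "data N" line contents, and the sizes (100 data lines, 500 samples) are assumptions made for illustration only. GNU shuf is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: file contents and sizes below are made up for the demonstration.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

# Build a sample "input": 36 header lines followed by 100 data lines.
{
    i=1
    while [ "$i" -le 36 ]; do echo "header $i"; i=$((i + 1)); done
    i=1
    while [ "$i" -le 100 ]; do echo "data $i"; i=$((i + 1)); done
} >input

# Header first, then 500 data lines drawn at random with repetition (-r).
( head -n 36 <input; tail -n +37 <input | shuf -r -n 500 ) >output

# Sanity checks: total line count, and header copied through unchanged.
lines=$(wc -l <output)
echo "output has $((lines)) lines"
head -n 36 input >header.expected
head -n 36 output | cmp -s - header.expected && echo "header intact"
```

Asking for 500 samples from only 100 data lines shows why -r matters here: without it, shuf outputs each input line at most once, so it could never return more lines than it was given.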

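With GNU head, which leaves the input file stream positioned just after the last line it output, shuf can continue from where head finished reading (this may not work with some non-GNU head implementations):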

( head -n 36; shuf -r -n 60000 ) <input >output
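Whether the single-redirection form is safe on a given system can be checked with a small probe like the one below. This is a sketch under assumptions: the temporary file and the `result` variable are illustrative names, and the probe only covers regular (seekable) files. When the input is a pipe, head cannot move the offset back, so the tail -n +37 form is the safer choice there.

```shell
#!/bin/sh
# Sketch: does this system's head leave the shared input positioned after line N?
tmpfile=$(mktemp) || exit 1
printf '%s\n' one two three four five >"$tmpfile"

# head consumes two lines; cat then reads whatever remains on the same file descriptor.
rest=$( ( head -n 2 >/dev/null; cat ) <"$tmpfile" )
first_left=$(printf '%s\n' "$rest" | head -n 1)

if [ "$first_left" = "three" ]; then
    result=ok
    echo "head left the offset after line 2; the single-redirection form should work"
else
    result=readahead
    echo "head read ahead; use the tail -n +37 form instead"
fi
rm -f "$tmpfile"
```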
