Split a file based on a predefined set of lines

Answer 1

Here is a bash script. It assumes your input file is named infile and the ranges are stored one per line in a file named splits:

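i=1
for range in $(< splits); do   # each range is a line such as 1,2
  # first number of the range: print that line of infile into a new file
  sed -n "$(echo "$range" | cut -f1 -d, )p" infile > "file$i"
  # second number: append that line, keeping the listed order
  sed -n "$(echo "$range" | cut -f2 -d, )p" infile >> "file$i"
  ((i++))
done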

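This simply uses sed to print the lines given by each range and saves each result to a new file (the files created are named file1, file2, file3, and so on). sed is called twice so that the lines keep the order specified in the range. A single sed run reads the input once, top to bottom, so it always prints its matches in input order.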

Note that this simple script does no format or error checking, and any existing files named file1, file2, etc. will be overwritten.

A simplified alternative (by @muru) uses while read and lets bash split the ranges instead of cut:

i=1
while IFS=',' read -r n1 n2   # split each line of splits on the comma
do
    sed -n "$n1 p; $n2 p" infile > "file$i"
    ((i++))
done < splits

If the order of the lines in the output file matters (e.g. lines 5,4 != 4,5), that sed part must be split into two separate calls, as in the first script.
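The ordering difference can be modelled in a few lines of Python. This is only a sketch. `single_pass` and `per_number` are hypothetical helpers: the first mimics one `sed -n "5p; 4p"` run, and the second mimics one sed call per line number. Neither is part of the scripts above.

```python
# Model of the two sed strategies on a list of lines (hypothetical helpers,
# not part of either script above).

def single_pass(lines, wanted):
    """Like one `sed -n "Ap; Bp"` run: the input is scanned once, top to
    bottom, so matches come out in input order whatever order `wanted` has."""
    out = []
    for n, line in enumerate(lines, start=1):
        # sed runs every command on each line, so a number listed twice prints twice
        out.extend(line for w in wanted if w == n)
    return out


def per_number(lines, wanted):
    """Like one `sed -n "Np"` call per number: output follows `wanted`."""
    return [lines[n - 1] for n in wanted]


if __name__ == "__main__":
    sample = ["a", "b", "c", "d", "e"]
    print(single_pass(sample, [5, 4]))  # ['d', 'e']
    print(per_number(sample, [5, 4]))   # ['e', 'd']
```

Asking for 5,4 in a single pass still yields line 4 first. Only the per-number approach respects the listed order.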

Answer 2

The following Python script does the split:

#!/usr/bin/python3

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('file', type=argparse.FileType('r'))   # file to split
parser.add_argument('lines', type=argparse.FileType('r'))  # one comma-separated list of line numbers per line

args = parser.parse_args()

file_lines = list(args.file)

for i, l in enumerate(args.lines):
    r = l.rstrip().split(',')
    with open('file{}'.format(i+1), 'w') as f:
        for k in r:
            n = int(k)
            # Ignore lines out of range (0 or a negative number would otherwise wrap around to the end)
            if 1 <= n <= len(file_lines):
                f.write(file_lines[n - 1])

Simply call it like this:

./split.py file lines

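Here file is the abcdef... file and lines is the file holding the line ranges 1,2 ... (a line may even list more than two numbers, such as 1,6,3,18,5). The selected lines are written in the order they are listed.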
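To see what the script writes without creating any files, here is a rough sketch of its per-line selection loop as a pure function. The function name `split_by_spec` is hypothetical. The sample data assumes an input file whose six lines are a to f.

```python
# Sketch of the selection loop from split.py as a pure function
# (hypothetical name; no files are read or written here).

def split_by_spec(file_lines, spec_lines):
    """Return one list of selected lines per spec line, in listed order.

    Numbers outside 1..len(file_lines) are skipped.
    """
    outputs = []
    for spec in spec_lines:
        chosen = []
        for k in spec.rstrip().split(','):
            n = int(k)
            if 1 <= n <= len(file_lines):
                chosen.append(file_lines[n - 1])
        outputs.append(chosen)
    return outputs


if __name__ == "__main__":
    data = ["a\n", "b\n", "c\n", "d\n", "e\n", "f\n"]  # assumed input file
    spec = ["1,2\n", "1,6,3,18,5\n"]                     # assumed lines file
    for i, chosen in enumerate(split_by_spec(data, spec), start=1):
        print("file{}:".format(i), "".join(chosen).split())
    # file1: ['a', 'b']
    # file2: ['a', 'f', 'c', 'e']
```

Line 18 does not exist in the six-line sample, so it is silently skipped.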

Answer 3

Here is one way to do it in awk:

awk -F, 'NR==FNR {for (i=1; i<=NF; i++) a[$i]=FNR; next} FNR in a {print > ("outfile" a[FNR])}' index file

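It reads the index file and stores each of that file's line numbers (FNR) in an array indexed by the values listed on the line. It then reads the input file and uses each line's own line number to look up which output file number that line is written to. Lines are written as the input file is read, so each output file keeps the input order: a range of 5,4 still yields line 4 before line 5.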
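The two-pass idea (first build a map from line number to output file number, then route each input line through it) can be modelled in Python as follows. This is a sketch of the logic only, and `route_lines` is a hypothetical name. As with `a[$i]=FNR`, a number listed on several index lines ends up mapped to the last of them. Unlike any particular awk version, this sketch simply drops input lines that no index line mentions.

```python
# Python model of the awk NR==FNR two-pass idiom (hypothetical helper).

def route_lines(index_lines, input_lines):
    """Pass 1 maps each listed line number to the (1-based) index line it
    appears on; pass 2 sends each input line to that output number."""
    dest = {}  # plays the role of the awk array a[]; keys are strings, as in awk
    for fnr, idx in enumerate(index_lines, start=1):
        for field in idx.rstrip("\n").split(","):
            dest[field] = fnr  # a later index line overwrites an earlier one
    outputs = {}  # output file number -> lines, in input order
    for fnr, line in enumerate(input_lines, start=1):
        key = str(fnr)
        if key in dest:  # unlisted lines are skipped in this sketch
            outputs.setdefault(dest[key], []).append(line)
    return outputs


if __name__ == "__main__":
    index = ["1,2\n", "5,4\n"]                  # assumed index file
    data = ["a\n", "b\n", "c\n", "d\n", "e\n"]  # assumed input file
    for num, lines in sorted(route_lines(index, data).items()):
        print("outfile{}:".format(num), "".join(lines).split())
    # outfile1: ['a', 'b']
    # outfile2: ['d', 'e']
```

The second output shows d before e even though the index line reads 5,4. This is the input-order behaviour of the awk approach.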

Answer 4

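Another bash solution. It assumes the input file is named input, the file with the line numbers is named pattern, and the output files are named output0, output1, and so on: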

#!/bin/bash
i=0 # set the output number to 0
while IFS=',' read -r -a columns; do # split each line of file `pattern` on ',' into the array $columns
    for column in "${columns[@]}"; do # for each member of the array $columns as $column
        sed -n "${column}p" input # print line number $column of `input`
    done > "output$i" # everything printed for this row goes to `output$i`
    ((i++)) # increment the output number
done < pattern
