Input files:
File: Article1.txt:
paragraph1 It is a long established fact that a reader will......
paragraph2 It is a long established fact that a reader will......
paragraph3 It is a long established fact that a reader will......
File: Article2.txt:
It is a long established fact that a reader will......
It is a long established fact that a reader will......
It is a long established fact that a reader will......
File: Article3.txt:
Lorem Ipsum is simply dummy text of the printing.......
Lorem Ipsum is simply dummy text of the printing......
Lorem Ipsum is simply dummy text of the printing.......
Expected output:
File: example.csv:
column1 column2 column3
Article1 paragraph1 It is a...... paragraph2 It is a.......
Article2 paragraph1 It is a...... paragraph2 It is a.......
Article3 Lorem I....... Lorem I.......
Answer 1
Just a wild guess:
awk 'BEGINFILE { printf "%s",FILENAME}
{ printf ",%s",$0 ;}
ENDFILE { printf "\n" ;}' file1.txt file2.txt file3.txt
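This converts the files to CSV (but without quoting), with each file turned into a single line. Note that BEGINFILE and ENDFILE are GNU awk extensions, so this needs gawk.
Replace ",%s" with "\t%s" to use tabs as the separator instead.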
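If a paragraph can itself contain a comma or a double quote, unquoted output like the above becomes ambiguous. Below is a minimal sketch of the same one-row-per-file idea with quoting, assuming Python and its standard csv module are acceptable here. The function names rows_to_csv and file_to_row, and the choice to use the file name without its extension as the first field, are illustrative assumptions and not part of the answer above.

```python
import csv
import io
from pathlib import Path


def rows_to_csv(rows, delimiter=","):
    """Serialize rows (lists of strings) to CSV text, quoting fields only when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def file_to_row(path):
    """Build one row: the file name without extension, then every line of the file."""
    path = Path(path)
    # splitlines() drops the line terminators, so each line becomes one field.
    return [path.stem, *path.read_text(encoding="utf-8").splitlines()]
```

Something like print(rows_to_csv(file_to_row(p) for p in sys.argv[1:]), end="") would then write one row per file given on the command line; passing delimiter="\t" gives tab-separated output instead.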
Answer 2
First, merge all the text files:
cat Article1.txt Article2.txt Article3.txt > Result.txt
Then convert the text file to CSV (here <tab> stands for a literal tab character):
(echo "Col1;Col2;Col3" ; cat Result.txt) | sed 's/;/<tab>/g' > file.csv
Answer 3
#! /usr/bin/perl
use strict; use warnings;

my %files=(); my @files=(); my $currentfile=''; my $maxcols=1;

while(<>) {
  chomp;

  # a hash such as %files is inherently unordered, so store each
  # filename we process in @files, in the order that we see them.
  if ($currentfile ne $ARGV) {
    $currentfile = $ARGV ;
    push @files, $currentfile;
  };

  # choose between the entire input line or the first 20 chars:
  #push @{ $files{$currentfile} }, $_ ;
  push @{ $files{$currentfile} }, substr($_,0,20) . '...';

  # keep track of the largest number of columns in the %files
  # hash-of-arrays. in other words, the largest number of lines in any
  # input file.
  if (@{ $files{$currentfile} } > $maxcols) {
    $maxcols = @{ $files{$currentfile} }
  };
};

# header row: one column for the file name plus one per line
print join("\t", map {"column$_"} @{[1..$maxcols+1]} ),"\n";

foreach my $f (@files) {
  # strip the extension so the first column reads "Article1" rather
  # than "Article1.txt", matching the output shown below.
  (my $name = $f) =~ s/\.[^.\/]+$//;
  print join("\t",$name,@{ $files{$f} }),"\n";
}
Output:
column1 column2 column3 column4
Article1 paragraph1 It is a l... paragraph2 It is a l... paragraph3 It is a l...
Article2 It is a long establi... It is a long establi... It is a long establi...
Article3 Lorem Ipsum is simpl... Lorem Ipsum is simpl... Lorem Ipsum is simpl...
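Note: the output is tab-separated. The fields do not line up visually with the column headers because they are longer than the default tab width.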
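If the goal is only to inspect the result in a terminal, the tab-separated text can be padded with spaces so that the columns line up. The following is a rough sketch under that assumption, in Python; align_columns and its gap parameter are hypothetical names introduced for illustration, and the padded text is meant for display only, not as a replacement for the tab-separated file.

```python
def align_columns(tsv_text, gap=2):
    """Pad tab-separated fields with spaces so that columns line up visually."""
    rows = [line.split("\t") for line in tsv_text.splitlines()]
    ncols = max((len(row) for row in rows), default=0)
    # Width of each column is the length of its longest field.
    widths = [
        max(len(row[i]) for row in rows if i < len(row))
        for i in range(ncols)
    ]
    aligned = []
    for row in rows:
        cells = [cell.ljust(widths[i] + gap) for i, cell in enumerate(row)]
        # Drop the padding after the last field of each row.
        aligned.append("".join(cells).rstrip())
    return "\n".join(aligned)
```

On many systems, piping the file through column -t -s with a tab as the separator should give a similar result without any extra code.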