我有以下名为 OpenSimStats.txt 的文件:
TestreportsRootAgentCount=0agent(s)
TestreportsChildAgentCount=0childagent(s)
TestreportsGCReportedMemory=10MB(Global)
TestreportsTotalObjectsCount=0Object(s)
TestreportsTotalPhysicsFrameTime=0ms
TestreportsPhysicsUpdateFrameTime=0ms
TestreportsPrivateWorkingSetMemory=2144MB(Global)
TestreportsTotalThreads=0Thread(s)(Global)
TestreportsTotalFrameTime=89ms
TestreportsTotalEventFrameTime=0ms
TestreportsLandFrameTime=0ms
TestreportsLastCompletedFrameAt=25msago
TestreportsTimeDilationMonitor=1
TestreportsSimFPSMonitor=55.3333320617676
TestreportsPhysicsFPSMonitor=55.4766654968262
TestreportsAgentUpdatesPerSecondMonitor=0persecond
TestreportsActiveObjectCountMonitor=0
TestreportsActiveScriptsMonitor=0
TestreportsScriptEventsPerSecondMonitor=0persecond
TestreportsInPacketsPerSecondMonitor=0persecond
TestreportsOutPacketsPerSecondMonitor=0persecond
TestreportsUnackedBytesMonitor=0
TestreportsPendingDownloadsMonitor=0
TestreportsPendingUploadsMonitor=0
TestreportsTotalFrameTimeMonitor=18.18239402771ms
TestreportsNetFrameTimeMonitor=0ms
TestreportsPhysicsFrameTimeMonitor=0.0106373848393559ms
TestreportsSimulationFrameTimeMonitor=0.17440040409565ms
TestreportsAgentFrameTimeMonitor=0ms
TestreportsImagesFrameTimeMonitor=0ms
TestreportsSpareFrameTimeMonitor=18.1818199157715ms
TestreportsLastReportedObjectUpdates=0
TestreportsSlowFrames=1
我想将此文件转换为 CSV 文件,如下所示:
TestreportsRootAgentCount,TestreportsChildAgentCount,...,TestreportsSlowFrames
0,0,10,0,0...,1
我的意思是:
- 取出分隔符之前和之后的所有单词,在本例中分隔符为“=”
- 将定界符左侧的所有单词放在一行中,并用逗号分隔
- 在末尾插入新行
- 然后将分隔符 (
=
) 后面的任何内容 - 仅数字(数字后面不带单位或字符)放在另一行中,其中这些数字用逗号分隔。 - 然后插入新行
关于如何在 Linux shell 脚本中完成此操作有什么想法/建议吗?通过使用 sed 或 gawk?
答案1
OpenSim 启示的 9 条路径:
加上sed
一些 shell 魔法:
sed 's/=.*//' OpenSimStats.txt | paste -sd, >out.csv
sed 's/.*=//; s/[^0-9]*$//' OpenSimStats.txt | paste -sd, >>out.csv
使用sed
, 不使用 shell 魔法:
sed -n 's/=.*//; 1{ h; b; }; $! H; $ { x; s/\n/,/g; p; }' OpenSimStats.txt >out.csv
sed -n 's/.*=//; 1{ s/[0-9]*$//; h; b; }; s/[^0-9]*$//; $! H; $ { x; s/\n/,/g; p; }' OpenSimStats.txt >>out.csv
借助贝壳魔法和一点点sed
:
paste -sd, <(cut -d= -f1 OpenSimStats.txt) <(cut -d= -f2 OpenSimStats.txt | sed 's/[^0-9]*$//')
加上cut
一些 shell 魔法:
cut -d= -f1 OpenSimStats.txt | paste -sd, >out.csv
cut -d= -f2 OpenSimStats.txt | sed 's/[^0-9]*$//' | paste -sd, >>out.csv
使用 GNU datamash
:
sed 's/=/,/; s/[^0-9]*$//' OpenSimStats.txt | datamash -t, transpose
和perl
:
perl -lnE 's/\D+$//o;
($a, $b) = split /=/;
push @a, $a; push @b, $b;
END { $, = ","; say @a; say @b }' OpenSimStats.txt
和grep
:
grep -o '^[^=]*' OpenSimStats.txt | paste -sd, >out.csv
egrep -o '[0-9.]+' OpenSimStats.txt | paste -sd, >>out.csv
和bash
:
#! /usr/bin/env bash
line1=()
line2=()
while IFS='=' read -r a b; do
line1+=("$a")
[[ $b =~ ^[0-9.]+ ]]
line2+=("$BASH_REMATCH")
done <OpenSimStats.txt
( set "${line1[@]}"; IFS=,; echo "$*" ) >out.csv
( set "${line2[@]}"; IFS=,; echo "$*" ) >>out.csv
和awk
:
awk -F= '
NR==1 { a = $1; sub(/[^0-9]+$/, "", $2); b = $2; next }
{ a = a "," $1; sub(/[^0-9]+$/, "", $2); b = b "," $2 }
END { print a; print b }' OpenSimStats.txt
数据迷的第十条途径,csvtk
:
csvtk replace -d= -f 2 -p '\D+$' -r '' <OpenSimStats.txt | csvtk transpose
第 11 条奖励路径vim
:
:%s/\D*$//
:%s/=/\r/
qaq
:g/^\D/y A | normal dd
:1,$-1 s/\n/,/
"aP
:2,$-2 s/\n/,/
:d 1
:w out.csv
答案2
awk
来帮你:
awk -F= '{a[NR,1]=$1;a[NR,2]=$2}
END{
for(i=1; i<NR; i++){
printf a[i,1] ","
}
print a[i,1];
for(i=1; i<NR; i++){
printf "%s", a[i,2]+0
}
print a[i,2];
}' file
该数组由第一列中的键和第二列中的值a
填充。$1
$2
读取所有行后,循环遍历数组的所有元素两次以显示键和值。
答案3
这是一个 Perl 解决方案:
$ perl -F= -lae '$F[1]=~s/[^0-9]//g; push @h,$F[0]; push @l,$F[1];
END{print join ",",@h; print join ",",@l}' OpenSimStats.txt
TestreportsRootAgentCount,TestreportsChildAgentCount,TestreportsGCReportedMemory,TestreportsTotalObjectsCount,TestreportsTotalPhysicsFrameTime,TestreportsPhysicsUpdateFrameTime,TestreportsPrivateWorkingSetMemory,TestreportsTotalThreads,TestreportsTotalFrameTime,TestreportsTotalEventFrameTime,TestreportsLandFrameTime,TestreportsLastCompletedFrameAt,TestreportsTimeDilationMonitor,TestreportsSimFPSMonitor,TestreportsPhysicsFPSMonitor,TestreportsAgentUpdatesPerSecondMonitor,TestreportsActiveObjectCountMonitor,TestreportsActiveScriptsMonitor,TestreportsScriptEventsPerSecondMonitor,TestreportsInPacketsPerSecondMonitor,TestreportsOutPacketsPerSecondMonitor,TestreportsUnackedBytesMonitor,TestreportsPendingDownloadsMonitor,TestreportsPendingUploadsMonitor,TestreportsTotalFrameTimeMonitor,TestreportsNetFrameTimeMonitor,TestreportsPhysicsFrameTimeMonitor,TestreportsSimulationFrameTimeMonitor,TestreportsAgentFrameTimeMonitor,TestreportsImagesFrameTimeMonitor,TestreportsSpareFrameTimeMonitor,TestreportsLastReportedObjectUpdates,TestreportsSlowFrames
0,0,10,0,0,0,2144,0,89,0,0,25,1,553333320617676,554766654968262,0,0,0,0,0,0,0,0,0,1818239402771,0,00106373848393559,017440040409565,0,0,181818199157715,0,1
该-a
标志使perl
行为类似于并将由(此处为 an )awk
给出的字段分隔符上的每个输入行拆分到数组中。向每个调用添加 ,并且 是将在每行上运行的脚本。-F
=
@F
-l
\n
print
-e
$F[1]=~s/[^0-9]//g;
:从第二个字段中删除所有非数字字符(数组从 0 开始计数,$F[1]
第二个字段也是如此)。push @h,$F[0]; push @l,$F[1];
:将第一个字段推入数组@h
,将第二个字段(现在非数字已被删除)推入数组@l
。END{}
:在处理整个输入文件后执行一次。print join ",",@h;
:连接@h
数组,
并打印它。print join ",",@l
: 如上所述,但对于@l
.
答案4
我知道这个问题已经得到解答,我想我只需添加我的两分钱来说明如何在(丑陋的)bash 中实现这一点
最终结果
echo -e $(cut -d"=" -f1 OpenSimStats.txt | tr '\n' ',' | sed 's/,$/\\n/')$(sed -r 's/.*=([0-9]*).*/\1,/g' OpenSimStats.txt | tr -d '\n' | sed 's/,$//')
这将在一行中产生您要求的以下结果:
TestreportsRootAgentCount,TestreportsChildAgentCount,TestreportsGCReportedMemory,TestreportsTotalObjectsCount,TestreportsTotalPhysicsFrameTime,TestreportsPhysicsUpdateFrameTime,TestreportsPrivateWorkingSetMemory,TestreportsTotalThreads,TestreportsTotalFrameTime,TestreportsTotalEventFrameTime,TestreportsLandFrameTime,TestreportsLastCompletedFrameAt,TestreportsTimeDilationMonitor,TestreportsSimFPSMonitor,TestreportsPhysicsFPSMonitor,TestreportsAgentUpdatesPerSecondMonitor,TestreportsActiveObjectCountMonitor,TestreportsActiveScriptsMonitor,TestreportsScriptEventsPerSecondMonitor,TestreportsInPacketsPerSecondMonitor,TestreportsOutPacketsPerSecondMonitor,TestreportsUnackedBytesMonitor,TestreportsPendingDownloadsMonitor,TestreportsPendingUploadsMonitor,TestreportsTotalFrameTimeMonitor,TestreportsNetFrameTimeMonitor,TestreportsPhysicsFrameTimeMonitor,TestreportsSimulationFrameTimeMonitor,TestreportsAgentFrameTimeMonitor,TestreportsImagesFrameTimeMonitor,TestreportsSpareFrameTimeMonitor,TestreportsLastReportedObjectUpdates,TestreportsSlowFrames
0,0,10,0,0,0,2144,0,89,0,0,25,1,55,55,0,0,0,0,0,0,0,0,0,18,0,0,0,0,0,18,0,1
代码分解
以下是所发生事件的详细情况。
外回声
echo -e $()$()
回显首先执行的两个嵌套命令的结果,以便-e
打印的结果\n
将转换为最终结果中的换行符。
第一个命令
cut -d"=" -f1 OpenSimStats.txt | tr '\n' ',' | sed 's/,$/\\n/'
第一个嵌套命令。用作=
分隔符将所有文本提取为一列(一系列\n
- 分隔值)。将全部替换\n
为,
,然后再次将最后一个逗号替换为 a \n
(否则最后一个值后面将跟有一个逗号)。
该命令本身会产生以下输出:
TestreportsRootAgentCount,TestreportsChildAgentCount,TestreportsGCReportedMemory,TestreportsTotalObjectsCount,TestreportsTotalPhysicsFrameTime,TestreportsPhysicsUpdateFrameTime,TestreportsPrivateWorkingSetMemory,TestreportsTotalThreads,TestreportsTotalFrameTime,TestreportsTotalEventFrameTime,TestreportsLandFrameTime,TestreportsLastCompletedFrameAt,TestreportsTimeDilationMonitor,TestreportsSimFPSMonitor,TestreportsPhysicsFPSMonitor,TestreportsAgentUpdatesPerSecondMonitor,TestreportsActiveObjectCountMonitor,TestreportsActiveScriptsMonitor,TestreportsScriptEventsPerSecondMonitor,TestreportsInPacketsPerSecondMonitor,TestreportsOutPacketsPerSecondMonitor,TestreportsUnackedBytesMonitor,TestreportsPendingDownloadsMonitor,TestreportsPendingUploadsMonitor,TestreportsTotalFrameTimeMonitor,TestreportsNetFrameTimeMonitor,TestreportsPhysicsFrameTimeMonitor,TestreportsSimulationFrameTimeMonitor,TestreportsAgentFrameTimeMonitor,TestreportsImagesFrameTimeMonitor,TestreportsSpareFrameTimeMonitor,TestreportsLastReportedObjectUpdates,TestreportsSlowFrames
第二条命令
sed -r 's/.*=([0-9]*).*/\1,/g' OpenSimStats.txt | tr -d '\n' | sed 's/,$//'
第二个嵌套命令。删除每行上所需数字周围的所有文本,从而生成一列数字(\n
- 分隔的值)。全部替换\n
为,
,然后删除末尾的逗号。
其结果如下:
0,0,10,0,0,0,2144,0,89,0,0,25,1,55,55,0,0,0,0,0,0,0,0,0,18,0,0,0,0,0,18,0,1
顶部的单行只是将这三个部分组合成一行以产生最终结果。