根据特定分隔符解析 txt 文件,然后将其转换为 CSV 文件

根据特定分隔符解析 txt 文件,然后将其转换为 CSV 文件

我有以下名为 OpenSimStats.txt 的文件:

TestreportsRootAgentCount=0agent(s)
TestreportsChildAgentCount=0childagent(s)
TestreportsGCReportedMemory=10MB(Global)
TestreportsTotalObjectsCount=0Object(s)
TestreportsTotalPhysicsFrameTime=0ms
TestreportsPhysicsUpdateFrameTime=0ms
TestreportsPrivateWorkingSetMemory=2144MB(Global)
TestreportsTotalThreads=0Thread(s)(Global)
TestreportsTotalFrameTime=89ms
TestreportsTotalEventFrameTime=0ms
TestreportsLandFrameTime=0ms
TestreportsLastCompletedFrameAt=25msago
TestreportsTimeDilationMonitor=1
TestreportsSimFPSMonitor=55.3333320617676
TestreportsPhysicsFPSMonitor=55.4766654968262
TestreportsAgentUpdatesPerSecondMonitor=0persecond
TestreportsActiveObjectCountMonitor=0
TestreportsActiveScriptsMonitor=0
TestreportsScriptEventsPerSecondMonitor=0persecond
TestreportsInPacketsPerSecondMonitor=0persecond
TestreportsOutPacketsPerSecondMonitor=0persecond
TestreportsUnackedBytesMonitor=0
TestreportsPendingDownloadsMonitor=0
TestreportsPendingUploadsMonitor=0
TestreportsTotalFrameTimeMonitor=18.18239402771ms
TestreportsNetFrameTimeMonitor=0ms
TestreportsPhysicsFrameTimeMonitor=0.0106373848393559ms
TestreportsSimulationFrameTimeMonitor=0.17440040409565ms
TestreportsAgentFrameTimeMonitor=0ms
TestreportsImagesFrameTimeMonitor=0ms
TestreportsSpareFrameTimeMonitor=18.1818199157715ms
TestreportsLastReportedObjectUpdates=0
TestreportsSlowFrames=1

我想将此文件转换为 CSV 文件,如下所示:

TestreportsRootAgentCount,TestreportsChildAgentCount,...,TestreportsSlowFrames
0,0,10,0,0...,1

我的意思是:

  1. 取出分隔符之前和之后的所有单词,在本例中分隔符为“=”
  2. 将定界符左侧的所有单词放在一行中,并用逗号分隔
  3. 在末尾插入新行
  4. 然后将分隔符 ( =) 后面的任何内容 - 仅数字(数字后面不带单位或字符)放在另一行中,其中这些数字用逗号分隔。
  5. 然后插入新行

关于如何在 Linux shell 脚本中完成此操作有什么想法/建议吗?通过使用 sed 或 gawk?

答案1

OpenSim 启示的 9 条路径:

加上sed一些 shell 魔法:

sed 's/=.*//' OpenSimStats.txt | paste -sd, >out.csv
sed 's/.*=//; s/[^0-9]*$//' OpenSimStats.txt | paste -sd, >>out.csv

使用sed, 不使用 shell 魔法:

sed -n 's/=.*//; 1{ h; b; }; $! H; $ { x; s/\n/,/g; p; }' OpenSimStats.txt >out.csv
sed -n 's/.*=//; 1{ s/[0-9]*$//; h; b; }; s/[^0-9]*$//; $! H; $ { x; s/\n/,/g; p; }' OpenSimStats.txt >>out.csv

借助贝壳魔法和一点点sed

paste -sd, <(cut -d= -f1 OpenSimStats.txt) <(cut -d= -f2 OpenSimStats.txt | sed 's/[^0-9]*$//')

加上cut一些 shell 魔法:

cut -d= -f1 OpenSimStats.txt | paste -sd, >out.csv
cut -d= -f2 OpenSimStats.txt | sed 's/[^0-9]*$//' | paste -sd, >>out.csv

使用 GNU datamash

sed 's/=/,/; s/[^0-9]*$//' OpenSimStats.txt | datamash -t, transpose

perl

perl -lnE 's/\D+$//o;
    ($a, $b) = split /=/;
    push @a, $a; push @b, $b;
    END { $, = ","; say @a; say @b }' OpenSimStats.txt

grep

grep -o '^[^=]*' OpenSimStats.txt | paste -sd, >out.csv
egrep -o '[0-9.]+' OpenSimStats.txt | paste -sd, >>out.csv

bash

#! /usr/bin/env bash
line1=()
line2=()
while IFS='=' read -r a b; do
    line1+=("$a")
    [[ $b =~ ^[0-9.]+ ]]
    line2+=("$BASH_REMATCH")
done <OpenSimStats.txt
( set "${line1[@]}"; IFS=,; echo "$*" ) >out.csv
( set "${line2[@]}"; IFS=,; echo "$*" ) >>out.csv

awk

awk -F= '
    NR==1 { a = $1; sub(/[^0-9]+$/, "", $2); b = $2; next }
    { a = a "," $1; sub(/[^0-9]+$/, "", $2); b = b "," $2 }
    END { print a; print b }' OpenSimStats.txt

数据迷的第十条途径,csvtk:

csvtk replace -d= -f 2 -p '\D+$' -r '' <OpenSimStats.txt | csvtk transpose

第 11 条奖励路径vim

:%s/\D*$//
:%s/=/\r/
qaq
:g/^\D/y A | normal dd
:1,$-1 s/\n/,/
"aP
:2,$-2 s/\n/,/
:d 1
:w out.csv

答案2

awk来帮你:

awk -F= '{a[NR,1]=$1;a[NR,2]=$2}
         END{
            for(i=1; i<NR; i++){
                printf a[i,1] ","
            }
            print a[i,1]; 
            for(i=1; i<NR; i++){
                printf "%s", a[i,2]+0
            } 
            print a[i,2];
        }' file

该数组由第一列中的键和第二列中的值a填充。$1$2

读取所有行后,循环遍历数组的所有元素两次以显示键和值。

答案3

这是一个 Perl 解决方案:

$ perl -F= -lae '$F[1]=~s/[^0-9]//g; push @h,$F[0]; push @l,$F[1]; 
                  END{print join ",",@h; print join ",",@l}' OpenSimStats.txt 
TestreportsRootAgentCount,TestreportsChildAgentCount,TestreportsGCReportedMemory,TestreportsTotalObjectsCount,TestreportsTotalPhysicsFrameTime,TestreportsPhysicsUpdateFrameTime,TestreportsPrivateWorkingSetMemory,TestreportsTotalThreads,TestreportsTotalFrameTime,TestreportsTotalEventFrameTime,TestreportsLandFrameTime,TestreportsLastCompletedFrameAt,TestreportsTimeDilationMonitor,TestreportsSimFPSMonitor,TestreportsPhysicsFPSMonitor,TestreportsAgentUpdatesPerSecondMonitor,TestreportsActiveObjectCountMonitor,TestreportsActiveScriptsMonitor,TestreportsScriptEventsPerSecondMonitor,TestreportsInPacketsPerSecondMonitor,TestreportsOutPacketsPerSecondMonitor,TestreportsUnackedBytesMonitor,TestreportsPendingDownloadsMonitor,TestreportsPendingUploadsMonitor,TestreportsTotalFrameTimeMonitor,TestreportsNetFrameTimeMonitor,TestreportsPhysicsFrameTimeMonitor,TestreportsSimulationFrameTimeMonitor,TestreportsAgentFrameTimeMonitor,TestreportsImagesFrameTimeMonitor,TestreportsSpareFrameTimeMonitor,TestreportsLastReportedObjectUpdates,TestreportsSlowFrames
0,0,10,0,0,0,2144,0,89,0,0,25,1,553333320617676,554766654968262,0,0,0,0,0,0,0,0,0,1818239402771,0,00106373848393559,017440040409565,0,0,181818199157715,0,1

-a标志使perl行为类似于并将由(此处为 an )awk给出的字段分隔符上的每个输入行拆分到数组中。向每个调用添加 ,并且 是将在每行上运行的脚本。-F=@F-l\nprint-e

  • $F[1]=~s/[^0-9]//g;:从第二个字段中删除所有非数字字符(数组从 0 开始计数,$F[1]第二个字段也是如此)。
  • push @h,$F[0]; push @l,$F[1];:将第一个字段推入数组@h,将第二个字段(现在非数字已被删除)推入数组@l
  • END{}:在处理整个输入文件后执行一次。
  • print join ",",@h;:连接@h数组,并打印它。
  • print join ",",@l: 如上所述,但对于@l.

答案4

我知道这个问题已经得到解答,我想我只需添加我的两分钱来说明如何在(丑陋的)bash 中实现这一点

最终结果

echo -e $(cut -d"=" -f1 OpenSimStats.txt | tr '\n' ',' | sed 's/,$/\\n/')$(sed -r 's/.*=([0-9]*).*/\1,/g' OpenSimStats.txt | tr -d '\n' | sed 's/,$//')

这将在一行中产生您要求的以下结果:

TestreportsRootAgentCount,TestreportsChildAgentCount,TestreportsGCReportedMemory,TestreportsTotalObjectsCount,TestreportsTotalPhysicsFrameTime,TestreportsPhysicsUpdateFrameTime,TestreportsPrivateWorkingSetMemory,TestreportsTotalThreads,TestreportsTotalFrameTime,TestreportsTotalEventFrameTime,TestreportsLandFrameTime,TestreportsLastCompletedFrameAt,TestreportsTimeDilationMonitor,TestreportsSimFPSMonitor,TestreportsPhysicsFPSMonitor,TestreportsAgentUpdatesPerSecondMonitor,TestreportsActiveObjectCountMonitor,TestreportsActiveScriptsMonitor,TestreportsScriptEventsPerSecondMonitor,TestreportsInPacketsPerSecondMonitor,TestreportsOutPacketsPerSecondMonitor,TestreportsUnackedBytesMonitor,TestreportsPendingDownloadsMonitor,TestreportsPendingUploadsMonitor,TestreportsTotalFrameTimeMonitor,TestreportsNetFrameTimeMonitor,TestreportsPhysicsFrameTimeMonitor,TestreportsSimulationFrameTimeMonitor,TestreportsAgentFrameTimeMonitor,TestreportsImagesFrameTimeMonitor,TestreportsSpareFrameTimeMonitor,TestreportsLastReportedObjectUpdates,TestreportsSlowFrames
0,0,10,0,0,0,2144,0,89,0,0,25,1,55,55,0,0,0,0,0,0,0,0,0,18,0,0,0,0,0,18,0,1

代码分解

以下是所发生事件的详细情况。

外回声

echo -e $()$()

回显首先执行的两个嵌套命令的结果,以便-e打印的结果\n将转换为最终结果中的换行符。

第一个命令

cut -d"=" -f1 OpenSimStats.txt | tr '\n' ',' | sed 's/,$/\\n/'

第一个嵌套命令。用作=分隔符将所有文本提取为一列(一系列\n- 分隔值)。将全部替换\n,,然后再次将最后一个逗号替换为 a \n(否则最后一个值后面将跟有一个逗号)。

该命令本身会产生以下输出:

TestreportsRootAgentCount,TestreportsChildAgentCount,TestreportsGCReportedMemory,TestreportsTotalObjectsCount,TestreportsTotalPhysicsFrameTime,TestreportsPhysicsUpdateFrameTime,TestreportsPrivateWorkingSetMemory,TestreportsTotalThreads,TestreportsTotalFrameTime,TestreportsTotalEventFrameTime,TestreportsLandFrameTime,TestreportsLastCompletedFrameAt,TestreportsTimeDilationMonitor,TestreportsSimFPSMonitor,TestreportsPhysicsFPSMonitor,TestreportsAgentUpdatesPerSecondMonitor,TestreportsActiveObjectCountMonitor,TestreportsActiveScriptsMonitor,TestreportsScriptEventsPerSecondMonitor,TestreportsInPacketsPerSecondMonitor,TestreportsOutPacketsPerSecondMonitor,TestreportsUnackedBytesMonitor,TestreportsPendingDownloadsMonitor,TestreportsPendingUploadsMonitor,TestreportsTotalFrameTimeMonitor,TestreportsNetFrameTimeMonitor,TestreportsPhysicsFrameTimeMonitor,TestreportsSimulationFrameTimeMonitor,TestreportsAgentFrameTimeMonitor,TestreportsImagesFrameTimeMonitor,TestreportsSpareFrameTimeMonitor,TestreportsLastReportedObjectUpdates,TestreportsSlowFrames

第二条命令

sed -r 's/.*=([0-9]*).*/\1,/g' OpenSimStats.txt | tr -d '\n' | sed 's/,$//'

第二个嵌套命令。删除每行上所需数字周围的所有文本,从而生成一列数字(\n- 分隔的值)。全部替换\n,,然后删除末尾的逗号。

其结果如下:

0,0,10,0,0,0,2144,0,89,0,0,25,1,55,55,0,0,0,0,0,0,0,0,0,18,0,0,0,0,0,18,0,1

顶部的单行只是将这三个部分组合成一行以产生最终结果。

相关内容