Post-processing of a multi-column log file

I am post-processing a multi-column log file in the following format:

/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,6, -5.3300, 201.2781, 0,,  26,  8, 1, -0.2132
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,5, -5.2300, 230.0910, 0,,  26,  8, 1, -0.2092
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,4, -5.1500, 222.2095, 0,,  26,  8, 1, -0.2060
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,5, -5.0500, 201.1757, 0,,  26,  8, 1, -0.2020
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,2, -5.0200, 233.0833, 0,,  26,  8, 1, -0.2008
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,5, -4.9500, 203.5671, 0,,  26,  8, 1, -0.1980
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,4, -4.9500, 227.0462, 0,,  26,  8, 1, -0.1980
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,14, -4.7700, 231.9237, 0,,  26,  8, 1, -0.1908
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,5, -4.7200, 194.9009, 0,,  26,  8, 1, -0.1888
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,3, -4.6700, 217.3995, 0,,  26,  8, 1, -0.1868
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,1, -4.6400, 200.7227, 0,,  26,  8, 1, -0.1856
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -4.5900, 184.7898, 0,,  26,  8, 1, -0.1836
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,3, -4.5500, 215.7487, 0,,  26,  8, 1, -0.1820
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,3, -4.4500, 198.2857, 0,,  26,  8, 1, -0.1780
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,1, -4.4200, 204.6418, 0,,  26,  8, 1, -0.1768
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,6, -4.3700, 199.5359, 0,,  26,  8, 1, -0.1748
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,6, -4.3500, 232.3248, 0,,  26,  8, 1, -0.1740
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,3, -4.2700, 234.3468, 0,,  26,  8, 1, -0.1708
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,1, -4.2500, 195.9439, 0,,  26,  8, 1, -0.1700
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,7, -4.2400, 198.9363, 0,,  26,  8, 1, -0.1696
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,1, -4.1600, 208.6377, 0,,  26,  8, 1, -0.1664
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_01_lig_cne_420,3, -4.1500, 179.4341, 0,,  26,  8, 1, -0.1660
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,4, -4.1300, 233.9607, 0,,  26,  8, 1, -0.1652
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -4.1200, 189.5660, 0,,  26,  8, 1, -0.1648
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,1, -4.1100, 209.8679, 0,,  26,  8, 1, -0.1644
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,5, -4.1000, 213.5573, 0,,  26,  8, 1, -0.1640
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,1, -4.0700, 227.6124, 0,,  26,  8, 1, -0.1628
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,3, -4.0400, 209.6345, 0,,  26,  8, 1, -0.1616
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,4, -3.9700, 233.5914, 0,,  26,  8, 1, -0.1588
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,4, -3.9500, 223.9189, 0,,  26,  8, 1, -0.1580
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,1, -3.9000, 180.8133, 0,,  26,  8, 1, -0.1560
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,1, -3.9000, 224.1828, 0,,  26,  8, 1, -0.1560
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_02_lig_cne_420,1, -3.8800, 204.1735, 0,,  26,  8, 1, -0.1552
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -3.8500, 195.5399, 0,,  26,  8, 1, -0.1540
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,2, -3.8400, 227.9037, 0,,  26,  8, 1, -0.1536

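Note that columns 1 and 2 are separated by a comma (,), while the remaining columns are separated by a comma followed by a space (, ). From this log file I need to: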

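  1. replace everything in the first column (the long Unix-style path /Users/gleb/Desktop/scripts/...) with the corresponding line number (just N for the Nth line);
  2. delete columns 6-9 (the last four columns).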

The resulting log should contain the same number of lines, but only columns 1 (with the replacement!) through 5 (the last column being "0,").

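What I have managed so far is a sed substitution on the first column, but it only strips the path and does not put the corresponding line number in its place: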

sed -i '' -e 's|\/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/*.*/||' log.txt

Answer 1

gawk -F'^[^,]*,|, ' '{ print NR, $2, $3, $4, $5; }' OFS=', ' infile

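To skip the first N lines, add the condition NR > N to the awk program, so that those lines are not printed. For example, to skip only the first line: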

gawk -F'^[^,]*,|, ' 'NR> 1{ print NR, $2, $3, $4, $5; }' OFS=', ' infile

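In that case you also need to change NR to NR-1 so that the numbering starts at 1 rather than 2, or replace it with a separate counter variable, for example: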

gawk -F'^[^,]*,|, ' 'NR> 1{ print ++lineNumber, $2, $3, $4, $5; }' OFS=', ' infile

^[^,]*, matches everything from the start of the line up to and including the first comma;
", " matches a comma followed by a space.

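Both patterns are defined as the field separator (the alternatives are joined with |), and the corresponding fields are printed on that basis. Because the path itself is consumed as a separator, $1 is empty and the data starts at $2. NR is awk's built-in variable holding the current line number.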
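
The same idea can be sketched without putting the path pattern into the field separator: strip the path with sub() first, then split what is left on comma-space. This is only a minimal sketch that should work in any POSIX awk; the function name renumber_log is hypothetical and not part of the original answer.

```shell
# renumber_log FILE  (hypothetical helper name)
# Replace the leading path column with the line number and keep
# only the next four columns, joined by ", ".
renumber_log() {
    awk -F', ' '{
        sub(/^[^,]*,/, "")        # drop the path and the first comma; awk re-splits $0
        print NR, $1, $2, $3, $4  # $4 keeps its trailing comma, e.g. "0,"
    }' OFS=', ' "$1"
}
```

Called as renumber_log log.txt, it writes the renumbered lines to standard output and leaves log.txt untouched; redirect the output to a new file if you want to keep the result.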


Another option is to use cut and nl:

<infile cut -d',' -f2-6 |nl -w1 -s', '

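The cut command keeps the comma-delimited fields 2 through 6 (field 6 is empty, which is where the trailing comma comes from), and nl numbers the lines: -s', ' sets the separator placed between the number and the text, and -w1 sets the width of the number column to 1.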
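
To see what the pipeline does to a single record, here is a small sketch that wraps it in a function. The name number_columns and the shortened sample path in the comments are assumptions made for illustration only.

```shell
# number_columns FILE  (hypothetical helper name)
# Keep comma-delimited fields 2-6, then prefix every line with "N, ".
number_columns() {
    cut -d',' -f2-6 "$1" | nl -w1 -s', '
}

# For an input line such as (shortened, made-up path):
#   /tmp/demo/run_07,6, -5.3300, 201.2781, 0,,  26,  8, 1, -0.2132
# cut emits "6, -5.3300, 201.2781, 0," and nl turns the first line into:
#   1, 6, -5.3300, 201.2781, 0,
```

Note that nl by default numbers only non-empty lines, which is fine here because the log contains no blank lines.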
