I have files containing data. The data sometimes contains artifacts that I need to remove. Example lines look like this:
@@@@@@@@@@DK2018.4.24_0:0:0.200985,0.88,0.35,0.49,13.52,248.3
or like this:
\2017.9.12_0:0:0.152507,0.02,0.82,0.10,11.76,181.8
\2017.9.12_0:0:0.554122,0.18,0.93,0.04,11.76,191.1
\2017.9.12_0:0:0.654682,0.06,0.89,0.10,11.74,184.0
\2017.9.12_0:0:0.755092,0.00,0.89,0.06,11.77,180.5
\2017.9.12_0:0:0.855754,0.02,0.87,0.09,11.76,181.4
\2017.9.12_0:0:0.955123,0.13,0.80,0.23,11.77,189.8
\2017.9.12_0:0:1.055499,0.10,0.82,0.35,11.76,187.6
\2017.9.12_0:0:1.155970,0.18,0.81,0.40,11.74,192.9
\2017.9.12_0:0:1.256581,0.15,0.91,0.44,11.74,189.3
\2017.9.12_0:0:1.356065,0.26,0.78,0.46,11.72,198.7
\2017.9.12_0:0:1.456712,0.37,0.69,0.33,11.74,208.1
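In both cases, some unwanted characters appear in front of the date string. I need to remove these and keep everything else. Sometimes the artifacts are not in the date column but in one of the other columns.
I tried using sed like this: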
sed 's/[^0-9:_.,]*//g' dat.log > test.log
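The intent is to delete everything except digits, colons, underscores, dots, and commas. That part works well. The problem is that sed does not write the newlines back. I know it strips them while processing, but how do I get them restored when the line is written to the file?
Edit: added more lines to the example input, and added the output of the sed command: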
2017.9.12_0:0:0.051920,0.03,0.74,0.09,11.72,182.72017.9.12_0:0:0.152507,0.02,0.82,0.10,11.76,181.82017.9.12_0:0:0.253551,0.00,0.89,0.04,11.77,180.52017.9.12_0:0:0.353267,0.04,0.96,0.02,11.77,182.72017.9.12_0:0:0.453707,0.15,0.95,0.02,11.71,189.32017.9.12_0:0:0.554122,0.18,0.93,0.04,11.76,191.12017.9.12_0:0:0.654682,0.06,0.89,0.10,11.74,184.02017.9.12_0:0:0.755092,0.00,0.89,0.06,11.77,180.52017.9.12_0:0:0.855754,0.02,0.87,0.09,11.76,181.42017.9.12_0:0:0.955123,0.13,0.80,0.23,11.77,189.82017.9.12_0:0:1.055499,0.10,0.82,0.35,11.76,187.62017.9.12_0:0:1.155970,0.18,0.81,0.40,11.74,192.92017.9.12_0:0:1.256581,0.15,0.91,0.44,11.74,189.32017.9.12_0:0:1.356065,0.26,0.78,0.46,11.72,198.72017.9.12_0:0:1.456712,0.37,0.69,0.33,11.74,208.1
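Edit 2: It turned out that the Raspberry Pi the data came from had saved the files with Macintosh (CR-only) line endings. I don't know why, but converting them with
tr '\r' '\n' < macfile.txt > unixfile.txt
solved the problem.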
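For anyone seeing the same symptom, a quick check is to count the carriage returns and line feeds in the file. The following is only a sketch: the two sample records are copied from the question, the file names reuse the ones in the command above, and the counting approach is one assumption-free way among several to inspect line endings.

```shell
#!/bin/sh
# Build a two-record sample that uses a bare CR (\r) as the line terminator,
# mimicking the file described in the question.
printf '\\2017.9.12_0:0:0.152507,0.02,0.82,0.10,11.76,181.8\r\\2017.9.12_0:0:0.554122,0.18,0.93,0.04,11.76,191.1\r' > macfile.txt

# Count CR and LF bytes. A CR-only file has no LF at all, so a
# line-oriented tool such as sed treats the whole file as one long line.
cr_count=$(tr -cd '\r' < macfile.txt | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < macfile.txt | wc -c | tr -d ' ')
echo "CR: $cr_count, LF: $lf_count"

# Convert CR to LF first, then delete every character that is not wanted.
tr '\r' '\n' < macfile.txt | sed 's/[^0-9:_.,]//g' > unixfile.txt
```

Because the conversion runs before sed, each record arrives as its own line and the original sed expression can be used unchanged.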
Answer 1
Here it looks like you could do:
tr -cd '0-9:_.,\r\n' < file.in > file.out
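This deletes every character except the ones listed. \r and \n are part of the set because you want to keep the line delimiters in their original format.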
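As a small illustration of that command, the sketch below runs it on two CR-terminated records. The first record is taken from the question; the second, with a stray letter inside a data column, is made up for this example, and the file names are placeholders.

```shell
#!/bin/sh
# Sample input: junk before the date in record 1, junk inside a later
# column in record 2 (invented for illustration). Both end in CR.
printf '@@@@@@@@@@DK2018.4.24_0:0:0.200985,0.88,0.35,0.49,13.52,248.3\r2018.4.24_0:0:0.301001,0.8x8,0.35,0.49,13.52,248.3\r' > file.in

# -c complements the set and -d deletes, so every byte NOT listed is removed.
tr -cd '0-9:_.,\r\n' < file.in > file.out

# The CR terminators are still in file.out; convert them to LF only to
# display the records one per line.
tr '\r' '\n' < file.out
```

Note that the kept set contains no minus sign or letters, so if the real data can hold negative values or exponents such as 1e-3, those characters would have to be added to the set.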
Answer 2
Why not drop tr and do everything in the sed command:
sed 's/\(^\|^M\)[^0-9:_.,]*/\n/g; s/^\n//' file
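Write the carriage return (^M, \r, 0x0D) and the line feed (\n, 0x0A) in whatever form your sed accepts; this depends on your operating system and sed version. Here ^M stands for a literal carriage return, typed as Ctrl-V Ctrl-M in most shells, not a caret followed by M. The \| alternation is a GNU sed extension, and \n in the replacement may not be supported by other sed implementations.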
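If typing a literal ^M is awkward, or the sed in use lacks the GNU extensions, the same idea can be sketched in a more portable form. This is an illustrative variant rather than the command above: the variable name cr and the file names are assumptions, and like the original it only strips junk at the start of each record, not inside other columns.

```shell
#!/bin/sh
# Two CR-terminated sample records copied from the question.
printf '\\2017.9.12_0:0:0.152507,0.02,0.82,0.10,11.76,181.8\r\\2017.9.12_0:0:0.554122,0.18,0.93,0.04,11.76,191.1\r' > file

# Hold a literal carriage return in a variable. Command substitution
# strips only trailing newlines, so the CR is preserved.
cr=$(printf '\r')

# First expression: drop junk at the very start of the input.
# Second expression: replace each CR plus any junk after it with a newline,
# written as backslash-newline so it does not depend on \n in the replacement.
sed "s/^[^0-9:_.,]*//; s/${cr}[^0-9:_.,]*/\\
/g" file > file.clean
```

Some sed implementations append a final newline when the input lacks one, so file.clean may end with an extra empty line depending on the platform.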