I have a log file that contains many ^H (backspace) and ^M (carriage return) characters, because the process that generated it updates a text-based progress bar. When the file is viewed with cat, the control characters are rendered by the terminal and the output appears human-readable and concise. Here is an example of the output:
Epoch 11/120
4355/4355 [==============================] - ETA: 0s - loss: 0.0096
Epoch 00011: val_loss did not improve from 0.00992
4355/4355 [==============================] - 1220s 280ms/step - loss: 0.0096 - val_loss: 0.0100
However, the file itself is very large compared to the text that cat actually prints above (roughly 900 lines versus 70 MB). Here is a snippet of the raw text contained in the log file:
1/Unknown - 0s 81us/step - loss: 0.5337^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 2/Unknown - 1s 438ms/step - loss: 0.5299^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^
H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 3/Unknown - 1s 386ms/step - loss: 0.5286^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 4/Unknown - 1s 357ms/step - loss: 0.5289^H^H^H^H^H^H^H^H^H^
H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 5/Unknown - 2s 339ms/step - loss: 0.5277^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 6/Unknown - 2s 327ms/
step - loss: 0.5258^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 7/Unknown - 2s 318ms/step - loss: 0.5250^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^
H^H^H^H^H^M 8/Unknown - 2s 312ms/step - loss: 0.5260^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 9/Unknown - 3s 307ms/step - loss: 0.5265^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^
H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 10/Unknown - 3s 303ms/step - loss: 0.5257^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^M 11/Unknown - 3s 299ms/step - loss: 0.5258^H^H^H^
Essentially, I want to create a file that looks exactly like what cat produces. Here are some things I have tried, with little success:

tr -d '\b\r' < logfile > new_file removes all the control characters, but consequently leaves behind all the unwanted text.
cat logfile > new_file really just copies the file verbatim, without interpreting the special characters.
cat logfile | col -b > new_file comes very close, but does something strange on one of the repeated lines:
4355/4355 [==============================] - ETA: 0ss--loss::0.0096557
Epoch 00011: val_loss did not improve from 0.00992
4355/4355 [==============================] - 1220s 280ms/step - loss: 0.0096 - val_loss: 0.0100
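The tr failure mode can be reproduced on a miniature, made-up stand-in for the log (the filename mini.log is arbitrary):

```shell
# Two progress updates; the second overwrites the first with 16 backspaces
printf '1/10 - loss: 0.5\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\bdone\n' > mini.log

# Deleting the control bytes also keeps the text they were meant to erase
tr -d '\b\r' < mini.log
# prints: 1/10 - loss: 0.5done
```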
Any help would be greatly appreciated.
Thanks
Answer 1
Posting this as an answer for clarity.

As pointed out in the comments, in this case the command awk -F '\r' '{print $NF}' file works as expected, keeping only the text after the last carriage return on each line. It is not fully robust, though, as was also noted.
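As a quick sanity check on a made-up one-line sample (not the real log):

```shell
# Each \r restarts the visible line; with \r as the field separator,
# awk's $NF is exactly the text after the last carriage return
printf '1/10 a\r2/10 b\r3/10 c\n' | awk -F '\r' '{print $NF}'
# prints: 3/10 c
```

Note this drops everything before the last \r, so it does not handle lines that rely on backspaces alone.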
I have written a more robust solution in C++ below.
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// Rebuild a line the way a terminal would render it: backspace removes
// the previous character, carriage return resets the line.
string filter_string(const string &line, char bspace, char creturn) {
    string new_str;
    for (string::size_type i = 0; i < line.size(); ++i) {
        if (line[i] == bspace) {
            // Step back one character if the string is not empty
            if (!new_str.empty())
                new_str.pop_back();
        } else if (line[i] == creturn) {
            // Reset on carriage return
            new_str.clear();
        } else {
            new_str += line[i];
        }
    }
    return new_str;
}

int main(int argc, char *argv[]) {
    const char backspace = '\x08';
    const char creturn = '\r';

    if (argc != 2) {
        cerr << "USAGE: " << argv[0] << " [src]" << endl;
        return 1;
    }

    // Filter each line of the input file and print the result
    string line;
    ifstream infile(argv[1]);
    while (getline(infile, line)) {
        cout << filter_string(line, backspace, creturn) << endl;
    }
    return 0;
}
This iterates over every character in each line: if a ^H is present, the last character is popped off the string (provided it is not already empty), and if a ^M carriage return is present, the string is reset. The output is sent to stdout, from where it can be redirected to a file.
Answer 2
sed 's/.*\x0d//' logfile
seems to do what you are asking for.
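For example, on a made-up line with two carriage-return rewrites (note that the \x0d escape is a GNU sed extension; other sed implementations may need a literal carriage return instead):

```shell
# Greedy .* eats everything up to the LAST \r on the line
printf '1/10 a\r2/10 b\r3/10 c\n' | sed 's/.*\x0d//'
# prints: 3/10 c
```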
Note that col -b fails because it ignores whitespace (a space does not overwrite a character already written in that column):
$ echo $'--------\r1st try\r2nd \r3rd\n' | col -b
3rd-try-