Every day I receive data from a client, but I cannot read it. It is an encrypted file. If I open it directly with cat, less, or vi, the output is unreadable.
The data file shared by the client is file_name.ZIP.zip (120 MB). After extracting it, I again get zip files: file_name.ZIP.zip (120 MB) and file_name.ZIP (125 MB). Extracting once more gives file_name (4-5 GB), whose file type is application/octet-stream; charset=binary.
Note: the files I receive come in several different formats, such as binary, ISO8859, and so on.
Sample data:
$ hexdump -C file_name | head
00000000 40 40 40 40 60 60 40 40 40 40 40 40 40 40 40 40 |@@@@``@@@@@@@@@@|
00000010 40 40 40 40 40 00 00 00 00 00 00 00 00 00 00 00 |@@@@@...........|
00000020 00 00 00 00 00 60 60 40 40 40 40 40 40 60 60 40 |.....``@@@@@@``@|
00000030 40 40 40 40 40 40 40 40 40 40 00 00 00 00 00 00 |@@@@@@@@@@......|
00000040 00 00 00 00 00 00 00 00 00 00 60 60 40 40 40 40 |..........``@@@@|
00000050 40 40 60 60 40 40 40 40 40 40 40 40 40 40 00 00 |@@``@@@@@@@@@@..|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60 60 |..............``|
00000070 40 40 40 40 40 40 60 60 40 40 40 40 40 40 40 40 |@@@@@@``@@@@@@@@|
00000080 40 40 40 00 00 00 00 00 00 00 00 00 00 00 00 00 |@@@.............|
00000090 00 00 00 60 60 40 40 40 40 40 40 60 60 40 40 40 |...``@@@@@@``@@@|
Checking the file format/type:
$ file -bi file_name
application/octet-stream; charset=binary
After that, I tried to change the file format with iconv (iconv -l lists the available encodings):
iconv -f ascii -t utf-8 file_name > New_file_name.txt;
or
iconv -f ISO8859-1 -t utf-8 file_name -o New_file_name.txt;
How can I decode this file or view it in a human-readable format?
Answer 1
Using iconv on a non-text file is not appropriate.
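To see why, consider what a character-set conversion does with arbitrary bytes. The following is a minimal Python sketch (the sample bytes are made up for illustration and are not taken from the file). ISO-8859-1 assigns a character to every byte value, so the conversion always "succeeds" without making the data any more meaningful, while strict ASCII rejects any byte above 0x7F.

```python
# Made-up sample: a few bytes of the kind found in binary data.
sample = bytes([0x40, 0x40, 0x00, 0x00, 0xC1, 0xF0])

# ISO-8859-1 maps each of the 256 byte values to a character, so this
# never fails, but the result is the same bytes relabelled as text.
as_latin1 = sample.decode("iso-8859-1")
round_trip = as_latin1.encode("iso-8859-1")

# Strict ASCII only covers 0x00-0x7F, so 0xC1 triggers an error.
try:
    sample.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_latin1), round_trip == sample, ascii_ok)
```

Either way, the conversion says nothing about the structure of the data, which is why it does not help here.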
You can use a hex dump program to view the contents of a binary file.
$ hexdump -C binary.data | head
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 c0 0e 40 00 00 00 00 00 |..>.......@.....|
00000020 40 00 00 00 00 00 00 00 80 56 00 00 00 00 00 00 |@........V......|
00000030 00 00 00 00 40 00 38 00 08 00 40 00 1f 00 1e 00 |....@.8...@.....|
00000040 06 00 00 00 05 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000050 40 00 40 00 00 00 00 00 40 00 40 00 00 00 00 00 |@.@.....@.@.....|
00000060 c0 01 00 00 00 00 00 00 c0 01 00 00 00 00 00 00 |................|
00000070 08 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00 |................|
00000080 00 02 00 00 00 00 00 00 00 02 40 00 00 00 00 00 |..........@.....|
00000090 00 02 40 00 00 00 00 00 1c 00 00 00 00 00 00 00 |..@.............|
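If a hex dump tool is not available, or the dump needs to be produced from inside a script, something similar can be written in a few lines. The sketch below is a simplified imitation of the `hexdump -C` layout, not a reimplementation of it; the function name `hexdump_lines` is hypothetical, and the input is the first 16 bytes of the ELF example above.

```python
def hexdump_lines(data: bytes, width: int = 16):
    """Yield lines resembling `hexdump -C` output for `data`."""
    pad = width * 3 - 1  # width of a full row of "xx xx ... xx"
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes '.'.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        yield f"{offset:08x}  {hex_part:<{pad}}  |{text}|"


# First 16 bytes of the ELF header shown in the dump above.
header = bytes.fromhex("7f454c46020101000000000000000000")
for line in hexdump_lines(header):
    print(line)
```

For a real file, the bytes would come from something like `open(path, "rb").read(160)`, which reads only the first 160 bytes and so stays cheap even for a multi-gigabyte file.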
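In many cases this will not be immediately enlightening, and you will need to do one of two things:
- Obtain the specification of the file format, then obtain or write a decoder that presents the data in human-readable form.
- Use whatever knowledge you have of the content domain, together with deductive reasoning, to examine the binary content and work out (reverse engineer) its structure and meaning. This is usually hard work.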
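A first step in the second approach might be to count which byte values dominate a sample and then test a hypothesis against it. The sketch below is a minimal Python illustration, not a decoder for this file. The sample is the first 32 bytes of the question's hex dump. The choice of cp037 (an EBCDIC code page in which 0x40 is a space and 0x60 a hyphen) is only a guess prompted by the many 0x40 bytes; nothing in the question confirms the actual format.

```python
from collections import Counter

# First 32 bytes of the hex dump shown in the question.
row0 = "40404040" + "6060" + "40" * 10
row1 = "40" * 5 + "00" * 11
sample = bytes.fromhex(row0 + row1)

# Step 1: which byte values dominate? Repeated fill bytes often hint
# at padding, fixed-width fields, or a non-ASCII character set.
counts = Counter(sample)
for value, n in counts.most_common():
    print(f"0x{value:02x}: {n}")

# Step 2: test a hypothesis. ASSUMPTION: cp037 is one EBCDIC code page
# worth trying because it maps 0x40 to a space; other code pages or a
# packed/binary record layout are just as possible.
guess = sample.decode("cp037")
print(repr(guess))
```

If decoding a larger sample this way produces recognisable words or regular field patterns, that supports the hypothesis; if not, discard it and try another. The runs of 0x00 bytes suggest that at least part of each record is not plain text under any character set.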