Running 7z x on an archive gives me:
'20 ª.1 ¯® '$'\302\212''¨à®¢®£à ¤áª ï ã«.rtf' IMG_6527.JPG
''$'\302\212''¨à®¢®£à ¤áª ï, ¨áâ.doc' IMG_6532.JPG
''$'\302\204''®¯ ᮣ« 襨¥(3).doc' IMG_6542.JPG
''$'\302\204\302\212\302\217''.doc' IMG_6543.JPG IMG_6526.JPG
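Apparently some of the filenames use a different encoding, and 7z does not convert them to UTF-8 by default. How do I tell 7z to do the conversion?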
The only charset options I found:
-scc{UTF-8|WIN|DOS}: set the charset for console input/output
-scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}}: set the charset for list files
WIN, DOS and UTF-8 do not work. When I try to guess the charset with
7z -scsCP1251 l 26-08-2016_10-18-14.zip
7z gives a warning:
Unsupported charset: cp1251
unzip gets it right (the Cyrillic is converted):
'20 к.1 по Кировоградская ул.rtf' IMG_6532.JPG 'Доп соглашение(3).doc'
26-08-2016_10-18-14.zip IMG_6542.JPG 'Кировоградская, ист.doc'
IMG_6526.JPG IMG_6543.JPG
IMG_6527.JPG ДКП.doc
Additional information
- p7zip version:
15.14.1 (locale=ru_RU.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs AMD Phenom(tm) II X4 960T Processor (100FA0),ASM)
- Hex dump of the beginning of the archive (od -tx1z -Ax):
000000 50 4b 03 04 14 00 00 00 00 00 81 54 1a 49 7e 35  >PK.........T.I~5<
000010 fa 34 00 ec 00 00 00 ec 00 00 07 00 17 00 84 8a  >.4..............<
000020 8f 2e 64 6f 63 75 70 13 00 01 19 fd 45 54 d0 94  >..docup.....ET..<
000030 d0 9a d0 9f 2e 64 6f 63 00 00 00 00 d0 cf 11 e0  >.....doc........<
000040 a1 b1 1a e1 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000050 00 00 00 00 3e 00 03 00 fe ff 09 00 06 00 00 00  >....>...........<
000060 00 00 00 00 00 00 00 00 01 00 00 00 71 00 00 00  >............q...<
000070 00 00 00 00 00 10 00 00 73 00 00 00 01 00 00 00  >........s.......<
000080 fe ff ff ff 00 00 00 00 70 00 00 00 ff ff ff ff  >........p.......<
000090 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  >................<
*
000230 ff ff ff ff ff ff ff ff ff ff ff ff ec a5 c1 00  >................<
000240 07 80 19 04 00 00 f0 12 bf 00 00 00 00 00 00 10  >................<
000250 00 00 00 00 00 08 00 00 72 7b 00 00 0e 00 62 6a  >........r{....bj<
000260 62 6a 2a 16 2a 16 00 00 00 00 00 00 00 00 00 00  >bj*.*...........<
000270 00 00 00 00 00 00 00 00 19 04 16 00 34 8e 00 00  >............4...<
000280 48 7c 00 00 48 7c 00 00 4b 2c 00 00 00 00 00 00  >H|..H|..K,......<
000290 19 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0002a0 00 00 00 00 00 00 00 00 ff ff 0f 00 00 00 00 00  >................<
0002b0 00 00 00 00 ff ff 0f 00 00 00 00 00 00 00 00 00  >................<
0002c0 ff ff 0f 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0002d0 00 00 00 00 b7 00 00 00 00 00 3e 0e 00 00 00 00  >..........>.....<
0002e0 00 00 3e 0e 00 00 a0 1b 00 00 00 00 00 00 a0 1b  >..>.............<
0002f0 00 00 00 00 00 00 a0 1b 00 00 00 00 00 00 a0 1b  >................<
000300 00 00 00 00 00 00 a0 1b 00 00 14 00 00 00 00 00  >................<
000310 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 b4 1b  >................<
000320 00 00 00 00 00 00 b4 1b 00 00 00 00 00 00 b4 1b  >................<
000330 00 00 38 00 00 00 ec 1b 00 00 84 00 00 00 70 1c  >..8...........p.<
000340 00 00 34 00 00 00 b4 1b 00 00 00 00 00 00 b8 28  >..4............(<
000350 00 00 e6 01 00 00 a4 1c 00 00 00 00 00 00 a4 1c  >................<
000360 00 00 00 00 00 00 a4 1c 00 00 00 00 00 00 a4 1c  >................<
000370 00 00 00 00 00 00 a4 1c 00 00 00 00 00 00 d8 1d  >................<
000380 00 00 00 00 00 00 d8 1d 00 00 00 00 00 00 d8 1d  >................<
000390 00 00 00 00 00 00 43 28 00 00 02 00 00 00 45 28  >......C(......E(<
0003a0 00 00 00 00 00 00 45 28 00 00 00 00 00 00 45 28  >......E(......E(<
*
0003c0 00 00 00 00 00 00 45 28 00 00 00 00 00 00 9e 2a  >......E(.......*<
0003d0 00 00 a2 02 00 00 40 2d 00 00 da 00 00 00 45 28  >......@-......E(<
0003e0 00 00 2d 00 00 00 00 00 00 00 00 00 00 00 00 00  >..-.............<
0003f0 00 00 00 00 00 00 a0 1b 00 00 00 00 00 00 d8 1d  >................<
000400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000410 00 00 00 00 00 00 d8 1d 00 00 00 00 00 00 d8 1d  >................<
000420
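Answer 1
Depending on the encoding used to create the zip file, you may be able to prevent the unwanted translation by temporarily setting the locale to "C":
LC_ALL=C 7z x "$archive"
(This helped with a zip created by IZArc on Win7 containing the two sample filenames.)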
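However, for the archive in the question, the "file name" field holds the CP866 (DOS Cyrillic) encoding of "ДКП.doc" (84 8a 8f 2e 64 6f 63). The "extra" field uses the Info-ZIP Unicode Path extension (see section 4.6.9 of the Zip specification v6.3.4) to store the UTF-8 file name. unzip knows about this header and uses the UTF-8 name, ignoring the CP866 name.
7z does nothing with this extra field and uses only the CP866 name. Depending on the current locale, it may create the files with the exact name (the raw bytes 84 8a 8f) or, worse, treat the bytes as Unicode code points and expand each one to UTF-8 first (c2 84 c2 8a c2 8f).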
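To make the two interpretations concrete, here is a minimal sketch that feeds the seven name bytes from the hex dump through iconv. It assumes a glibc-style iconv that knows CP866 and ISO-8859-1. The function names raw_name, decode_cp866 and latin1_to_utf8 are made up for illustration and are not part of 7z or unzip.

```shell
#!/bin/bash
# Raw "file name" bytes from the local header: 84 8a 8f 2e 64 6f 63
# (written as octal escapes so that any POSIX printf accepts them).
raw_name() { printf '\204\212\217.doc'; }

# Interpretation 1: decode the bytes as CP866 and re-encode them as UTF-8.
decode_cp866() { iconv -f CP866 -t UTF-8; }

# Interpretation 2: treat every byte as a code point (that is, Latin-1)
# and encode it as UTF-8. This reproduces the mangled c2 84 c2 8a c2 8f form.
latin1_to_utf8() { iconv -f ISO-8859-1 -t UTF-8; }

raw_name | decode_cp866; echo              # readable Cyrillic name
raw_name | latin1_to_utf8 | od -An -tx1    # hex bytes of the mangled name
```

The second pipeline should show the same c2 84 c2 8a c2 8f prefix that appears as \302\204\302\212\302\217 in the directory listing from the question.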
One option is to use an external utility to alter the zip first:
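#!/bin/bash
# Work on a copy so the original archive stays untouched.
cp orig.zip renamed.zip
index=0
# zipinfo -1 prints one name per line; ziptool renames the entry at each zero-based index.
# IFS= and -r keep leading spaces and backslashes in the names intact.
zipinfo -1 orig.zip | while IFS= read -r name; do
    ziptool renamed.zip rename "$index" "$name"
    index=$((index+1))
done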
ziptool comes from libzip. zipinfo is distributed with Info-ZIP's unzip, so you probably already have it if you have used unzip.
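If the files have already been extracted with the doubly encoded names, the mangling can in principle be reversed name by name. This is only a sketch under assumptions: fix_name is a hypothetical helper, the archive's names are assumed to be CP866, and iconv's ISO-8859-1 is assumed to map all 256 byte values (as glibc's does).

```shell
#!/bin/bash
# fix_name is a hypothetical helper, not part of 7z, unzip or libzip.
# Step 1: UTF-8 -> ISO-8859-1 turns each U+0080..U+00FF code point back into one byte.
# Step 2: the recovered bytes are decoded as CP866 (an assumption about the archive).
fix_name() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 | iconv -f CP866 -t UTF-8
}

# Mangled form of the first entry as 7z wrote it: c2 84 c2 8a c2 8f 2e 64 6f 63
mangled=$(printf '\302\204\302\212\302\217.doc')
fix_name "$mangled"; echo
```

Pure ASCII names such as IMG_6527.JPG pass through unchanged, since ASCII is the same in all three encodings. Try it on a copy of the directory before renaming anything for real.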
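Answer 2
On the p7zip page I found a discussion thread with Igor Pavlov, the author of 7-Zip: "OEM charset problem in Linux". It is a twin of this Q&A, and this post says it all:
The -mcp switch probably doesn't work in p7zip. But -mcp does work in 7-Zip (the Windows version). So for now I don't know how to make it work for p7zip. Function: UString MultiByteToUnicodeString(const AString &srcString, UINT codePage) in CPP\Common\StringConvert.cpp
The post is dated April 18, 2016. I checked the latest p7zip release, from July, and the switch is still missing, at least from the documentation; I have not tested it.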
Answer 3
Answer 4
Use 7z -scs1251 l 26-08-2016_10-18-14.zip