Batch fixing the Excel 97-2003 error "file format and extension don't match"

Answer 1

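The error "file format and extension don't match" means the files are not real .xls files. Since Excel can still open them, they are probably in another supported format. My guess is .xlsx.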

You can test this with a hex editor (for example HxD) by checking the file signature, which sits in the first few bytes:

xlsx: the first 4 bytes are 50 4B 03 04
xls: the first 8 bytes are D0 CF 11 E0 A1 B1 1A E1
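If a hex editor is not at hand, the same check can be scripted. The following is a minimal sketch, not a definitive detector: the function name `sniff_format` is made up for illustration, and the HTML branch is only a heuristic added here (HTML has no fixed signature). Note that 50 4B 03 04 is the generic ZIP signature, so any ZIP-based file will be reported as "xlsx".

```python
from pathlib import Path

# The two signatures quoted above.
XLSX_MAGIC = bytes.fromhex("50 4B 03 04")              # ZIP container (xlsx)
XLS_MAGIC = bytes.fromhex("D0 CF 11 E0 A1 B1 1A E1")   # OLE2 compound file (xls)

# Assumption: typical openings of an HTML export; this is a heuristic only.
HTML_STARTS = (b"<html", b"<!doctype", b"<table", b"<meta")


def sniff_format(path):
    """Guess the real format of a file from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(512)
    if head.startswith(XLS_MAGIC):
        return "xls"
    if head.startswith(XLSX_MAGIC):
        return "xlsx"
    # Skip a UTF-8 byte order mark and leading whitespace before looking for a tag.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith(HTML_STARTS):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Report the guessed format of every .xls file in the current folder.
    for f in sorted(Path(".").glob("*.xls")):
        print(f.name, "->", sniff_format(f))
```

Anything reported as "unknown" is worth opening in a hex editor after all.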

To rename all files in the current folder at once, you can use the Command Prompt (CMD) and the following command:

ren *.xls *.xlsx
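The ren command renames every .xls file regardless of what it contains. A more cautious variant, sketched below in Python, renames only files that actually start with the xlsx signature. The function name `rename_mislabelled` is hypothetical, and the sketch assumes that skipping a file is preferable to overwriting an existing .xlsx of the same name.

```python
from pathlib import Path

ZIP_MAGIC = bytes.fromhex("50 4B 03 04")  # first 4 bytes of an xlsx file


def rename_mislabelled(folder):
    """Rename *.xls files that carry the xlsx signature; return the new paths."""
    renamed = []
    # sorted() materialises the listing first, so renaming does not disturb the loop.
    for src in sorted(Path(folder).glob("*.xls")):
        with open(src, "rb") as fh:
            if fh.read(4) != ZIP_MAGIC:
                continue  # genuine .xls or something else: leave it untouched
        dst = src.with_suffix(".xlsx")
        if dst.exists():
            continue  # never overwrite an existing file
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Calling `rename_mislabelled(".")` processes the current folder, like the ren command does.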

If your files' signature is not one of the above, please add it to your post. That will help identify the files.

Since the files are HTML, not xls or even xlsx, you can use VBA to batch-convert them in Excel.

The article "How to convert multiple xls files to xlsx files in Excel?" contains a script that worked for the poster:

Sub ConvertToXlsx()
    'Updateby Extendoffice
    Dim strPath As String
    Dim strFile As String
    Dim xWbk As Workbook
    Dim xSFD As FileDialog, xRFD As FileDialog
    Dim xSPath As String
    Dim xRPath As String

    ' Ask for the folder that holds the .xls files
    Set xSFD = Application.FileDialog(msoFileDialogFolderPicker)
    With xSFD
        .Title = "Please select the folder contains the xls files:"
        .InitialFileName = "C:\"
    End With
    If xSFD.Show <> -1 Then Exit Sub
    xSPath = xSFD.SelectedItems.Item(1)

    ' Ask for the folder that receives the converted files
    Set xRFD = Application.FileDialog(msoFileDialogFolderPicker)
    With xRFD
        .Title = "Please select a folder for outputting the new files:"
        .InitialFileName = "C:\"
    End With
    If xRFD.Show <> -1 Then Exit Sub
    xRPath = xRFD.SelectedItems.Item(1) & "\"

    strPath = xSPath & "\"
    strFile = Dir(strPath & "*.xls")
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    Do While strFile <> ""
        ' Dir("*.xls") can also return .xlsx/.xlsm files, so check the extension
        If LCase(Right(strFile, 4)) = ".xls" Then
            Set xWbk = Workbooks.Open(Filename:=strPath & strFile)
            ' Appending "x" turns name.xls into name.xlsx
            xWbk.SaveAs Filename:=xRPath & strFile & "x", _
                FileFormat:=xlOpenXMLWorkbook
            xWbk.Close SaveChanges:=False
        End If
        strFile = Dir
    Loop
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
End Sub

Answer 2

As the files appear to be HTML files (a crazy way to store data, in my opinion), you could look into methods that can scrape or parse web pages.

Stata

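I don't have Stata to play around with, but I found the readhtml package, whose readhtmltable function reads tables from web pages (I am not sure whether it can handle local HTML files).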

Python

Alternatively, you can use Python and pandas, which has a read_html function. I tried it on the F - A & N Islands_September.xls file you uploaded and it worked quite well.

For this to work, you need the lxml and pandas packages in your Python environment.

import pandas as pd 
# This reads in the 'xls' file (which is actually HTML)
df = pd.read_html(r"c:\path\to\F - A & N Islands_September.xls")
# The result is a list with length one, so get the actual DataFrame with
df = df[0]
# Show the first few rows:
df.head()

                  Unnamed: 0_level_0 Unnamed: 1_level_0                                                   Unnamed: 2_level_0 Unnamed: 3_level_0       District                                               
                  Unnamed: 0_level_1 Unnamed: 1_level_1                                                   Unnamed: 2_level_1 Unnamed: 3_level_1 _A & N Islands Nicobar North and Middle Andaman South Andaman
0  M1 [Ante Natal Care Services ANC]                1.1                    Total number of pregnant women Registered for ANC              TOTAL            NaN     NaN                      NaN           NaN
1  M1 [Ante Natal Care Services ANC]              1.1.1  Of which Number registered within first trimester (within 12 weeks)              TOTAL            NaN     NaN                      NaN           NaN
2  M1 [Ante Natal Care Services ANC]                1.2                        Number of Pregnant women registered under JSY              TOTAL            NaN     NaN                      NaN           NaN
3  M1 [Ante Natal Care Services ANC]                1.3   Number of pregnant women received 3 ANC check ups during pregnancy              TOTAL            NaN     NaN                      NaN           NaN
4  M1 [Ante Natal Care Services ANC]              1.4.1          Number of pregnant women given TT1 during current pregnancy              TOTAL            NaN     NaN                      NaN           NaN
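The output above has a two-level column header, with "Unnamed: ..." placeholders wherever a header cell was empty. If single-level column names are easier to work with, the labels can be flattened. This is only a sketch: `flatten_labels`, the " / " separator and the `col_N` fallback names are illustrative choices, not part of pandas or of the original answer.

```python
def flatten_labels(columns):
    """Join multi-level column labels into single strings.

    Levels that pandas filled in as 'Unnamed: ...' are dropped; if nothing
    is left, a positional fallback name such as 'col_0' is used instead.
    """
    flat = []
    for i, col in enumerate(columns):
        # Single-level labels arrive as plain values, multi-level ones as tuples.
        levels = col if isinstance(col, tuple) else (col,)
        parts = [str(p).strip() for p in levels
                 if not str(p).startswith("Unnamed:")]
        flat.append(" / ".join(parts) or f"col_{i}")
    return flat
```

Iterating over a DataFrame's multi-level columns yields tuples, so it should be possible to apply this with `df.columns = flatten_labels(df.columns)`.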

To batch-convert the files to CSV, you could do the following:

from pathlib import Path
import pandas as pd

# Assuming your 'XLS' files are in subfolder data next to your python script
data = Path(r"./data")
data.mkdir(parents=True, exist_ok=True)
results = Path(r"./results")
results.mkdir(parents=True, exist_ok=True)

# Loop over all XLS files
for f in data.glob("*.xls"):
    outfile = results / f.with_suffix('.csv').name
    df = pd.read_html(f)[0]
    df.to_csv(outfile, index=False)
