答案1
正如其他用户提到的,坏消息是您无法将 1900 年之前的日期存储为实际日期。
好消息是,您可以使用 PowerQuery 完成几乎所有工作。
您可以直接连接到网页并返回页面上表格中的数据(无需复制和粘贴)。您可以使用“数据”>“获取和转换数据”>“从 Web”,然后将维基百科页面的 URL 粘贴到文本框中:
这将允许您选择页面上的表格并将其拉入 PowerQuery,您可以在其中应用任意数量的步骤来准备数据。如果您不熟悉 M 语言,那么其中一些文本解析可能会非常复杂,因此我为您准备了一个查询。如果您使用数据>获取和转换数据>获取数据>从其他来源>空白查询>,然后在 Power Query 编辑器中单击主页选项卡上的高级编辑器按钮,则可以粘贴以下代码:
let
Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/List_of_American_Civil_War_battles")),
Data0 = Source{0}[Data],
#"Change data types" = Table.TransformColumnTypes(Data0,{{"Date", type text}, {"Battle", type text}, {"State", type text}, {"CWSAC", type text}, {"Outcome Victory", type text}, {"Outcome Notes", type text}}),
#"Filter out the headers" = Table.SelectRows(#"Change data types", each [Date] <> "Date"),
#"Duplicate the Date column" = Table.DuplicateColumn(#"Filter out the headers", "Date", "Date - Copy"),
#"Split the date column on ""-""" = Table.SplitColumn(#"Duplicate the Date column", "Date - Copy", Splitter.SplitTextByDelimiter("–", QuoteStyle.Csv), {"Date - Copy.1", "Date - Copy.2"}),
#"Trim the split date columns" = Table.TransformColumns(#"Split the date column on ""-""",{{"Date - Copy.2", Text.Trim, type text}, {"Date - Copy.1", Text.Trim, type text}}),
#"Convert the split date columns to text" = Table.TransformColumnTypes(#"Trim the split date columns",{{"Date - Copy.1", type text}, {"Date - Copy.2", type text}}),
#"Add Start Date Column" = Table.AddColumn(#"Convert the split date columns to text", "Start date", each if not Text.Contains([#"Date - Copy.1"],",") then
[#"Date - Copy.1"] & ", " & Text.End([#"Date - Copy.2"],4) else [#"Date - Copy.1"]),
#"Add End Date Column" = Table.AddColumn(#"Add Start Date Column", "End date", each if [#"Date - Copy.2"] = null then [Start date] else
if Value.Is(Value.FromText(
List.First(Text.Split([#"Date - Copy.2"]," "))), type text)
then [#"Date - Copy.2"] else List.First(Text.Split([Start date]," ")) & " " & [#"Date - Copy.2"]),
#"Remove the split date columns" = Table.RemoveColumns(#"Add End Date Column",{"Date - Copy.1", "Date - Copy.2"}),
#"Move the Start and End Dates to the beginning" = Table.ReorderColumns(#"Remove the split date columns",{"Date", "Start date", "End date", "Battle", "State", "CWSAC", "Outcome Victory", "Outcome Notes"}),
#"Add Start Year" = Table.AddColumn(#"Move the Start and End Dates to the beginning", "Start Year", each Number.FromText(Text.End([Start date],4))),
#"Add End Year" = Table.AddColumn(#"Add Start Year", "End Year", each Number.FromText(Text.End([End date],4))),
#"Add Start Month" = Table.AddColumn(#"Add End Year", "Start Month", each Text.Split([Start date]," "){0}),
#"Add End Month" = Table.AddColumn(#"Add Start Month", "End Month", each Text.Split([End date]," "){0}),
#"Add Start Day" = Table.AddColumn(#"Add End Month", "Start Day", each Number.FromText(Text.Replace(Text.Split([Start date]," "){1},",",""))),
#"Add End Day" = Table.AddColumn(#"Add Start Day", "End Day", each Number.FromText(Text.Replace(Text.Split([End date]," "){1},",",""))),
#"Reordered Columns" = Table.ReorderColumns(#"Add End Day",{"Date", "Start date", "Start Year", "Start Month", "Start Day", "End date", "End Year", "End Month", "End Day", "Battle", "State", "CWSAC", "Outcome Victory", "Outcome Notes"})
in
#"Reordered Columns"
将其粘贴到高级编辑器并关闭高级编辑器后,您应该看到如下内容:
从这里开始,我鼓励您单击查询设置中每个步骤旁边的小齿轮图标,以查看转换是如何完成的(如果您有兴趣学习的话!)。您会注意到我将日期范围分为开始日期和结束日期。我假设如果它不是一个范围,那么结束日期与开始日期相同(即战斗在一天内结束)。我还将每个日期分为其组成日、月和年,以防您想过滤这些项目。
准备就绪后,单击“主页>关闭并加载”将查询放入工作簿中。
祝你好运!