我有医院关于患者死亡的数据。数据排列不正确,数据如下链接所示
所有日期应位于 DOA(H 列)或 DOD(I 列)或 MB(J 列)列中,其余文本应排列在单独的列中。有谁能帮我清理这些数据,因为我有超过 5000 个观测值。 在此处输入链接描述
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| Sl.NO | District | State P No | Age In Years | Sex | Symptoms | Co-Morbidities | DOA | DOD | MB Date | Notes |
+=======+===========+=============+===============+=========+===================+====================================================+==================+=================================+============+=======================+
| 10 | X4 | 6553 | 53 | F | Fever | Cold | Cough | Thyroid disease | 10-06-2020 | 20-06-2020 |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 11 | X5 | 8872 | 62 | M | Fever | Diabetes Mellitus | 16-06-2020 | 16-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 12 | X5 | 8880 | 55 | M | Pneumonia | Respiratory distress Obese, Chronic Alcoholic | 18-06-2020 | 20-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 13 | X2 | 9149 | 70 | M | Loss of Appetite | Weakness, Hypertension | 18-06-2020 | 18-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 14 | X3 | 9150 | 46 | M | Weakness | Convulsions, Hypertension | 17-06-2020 | 18-06-2020 | 21-06-2020 | |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 15 | X4 | 7732 | 60 | Female | Fever | Cough | Breathlessness | uncontrolled Diabetes Mellitus | 17-06-2020 | 22-06-2020 |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 16 | X5 | 9237 | 90 | M | Asymptomatic | Hypertension | | 20-06-2020 | 22-06-2020 | Died at his residence |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
答案1
Power Query
您可以使用Windows Excel 2010+ 和 Office 365 Excel 中提供的获取所需的输出
- 选择原始表格中的某个单元格
Data => Get&Transform => From Table/Range
- 当 PQ UI 打开时,导航至
Home => Advanced Editor
- 记下代码第 2 行的表名称。
- 用以下代码替换现有代码M 代码以下
- 将粘贴代码第 2 行的表名更改为您的“真实”表名
- 检查任何注释以及窗口
Applied Steps
,以更好地理解算法和步骤
根据您的数据样本,我假设当出现错误时,这是因为有三种合并症被用逗号分开。
既然如此,我就测试一下该DOD
列是否包含日期。
- 如果 DOD 包含日期,则
- 将共病与 DOA 和 DOD 连接起来
- 从 MB 日期列获取 DOA 日期
- 从注释列获取 DOD 日期
- 将 MB 日期设置为空
如果该逻辑并不适用于所有人,或者存在错误,则可以很容易地进行更改。
Table.Group
所有处理魔法都发生在函数的参数中
M 代码
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
//Replace any blanks ("") with null for easier processing down the road
#"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Sl.NO", "District", "State P No", "Age In Years", "Sex", "Symptoms", "Co-Morbidities", "DOA", "DOD", "MB Date", "Notes"}),
//Group the table by the first six columns
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Sl.NO", "District", "State P No", "Age In Years", "Sex", "Symptoms"}, {
//Extract the columns to be processed using the logic shown
{"Co-Morbidities", each
if Value.Is([DOD]{0}, type datetime)
then [#"Co-Morbidities"]{0}
else Text.Combine({[#"Co-Morbidities"]{0},[DOA]{0}, [DOD]{0}},", "), type text},
{"DOA", each
if Value.Is([DOD]{0}, type datetime)
then DateTime.Date([DOA]{0})
else if [DOA]{0} = null then null
else DateTime.Date([#"MB Date"]{0}), type date},
{"DOD", each
if Value.Is([DOD]{0}, type datetime)
then DateTime.Date([DOD]{0})
else DateTime.Date([Notes]{0}), type date},
{"MB Date", each
if Value.Is([DOD]{0}, type datetime)
then DateTime.Date([#"MB Date"]{0})
else null, type date},
{"Notes", each
if Value.Is([Notes]{0}, type datetime)
then null
else [Notes]{0}}
})
in
#"Grouped Rows"
答案2
分阶段进行(因为您将不得不检查这一点)!这主要是因为您的示例仅显示 2 个日期,并且很可能缺少数据。好像这不像修复电子表格那么简单。
原始数据在 A2:k8。DOA 列 H 和 DOD 列 I。
Cell L2 =ISNUMBER(SEARCH("2020",H2))
Cell M2 =ISNUMBER(SEARCH("2020",I2))
正确:日期正确。错误:无日期。数据输入错误。
Cell N2 =IF(AND(NOT($L2),NOT($M2)),G2&", "&H2&", "&I2,"")
Cell O2 =IF(AND(NOT($L2),NOT($M2)),K2,IF(H2<>"",H2,""))
Cell P2 =IF(AND(NOT($L2),NOT($M2)),K2,I2)
Cell Q2 =IF(AND(NOT($L2),NOT($M2)),"",J2)
Cell R2 =IF(AND(NOT($L2),NOT($M2)),"",IF(K2<>"",K2,""))
如果正确,则使用现有数据,否则将不适当的数据移至适当的列。
进行备份。输入公式。检查、检查、检查。转换为值并删除坏列。