从列中提取 Excel 数据

从列中提取 Excel 数据

我有医院关于患者死亡的数据。数据排列不正确,数据如下链接所示

所有日期应位于 DOA(H 列)或 DOD(I 列)或 MB(J 列)列中,其余文本应排列在单独的列中。有谁能帮我清理这些数据,因为我有超过 5000 个观测值。 在此处输入链接描述

+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| Sl.NO | District  | State P No  | Age In Years  | Sex     | Symptoms          | Co-Morbidities                                     | DOA              | DOD                             | MB Date    | Notes                 |
+=======+===========+=============+===============+=========+===================+====================================================+==================+=================================+============+=======================+
| 10    | X4        | 6553        | 53            | F       | Fever             |  Cold                                              |  Cough           | Thyroid disease                 | 10-06-2020 | 20-06-2020            |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 11    | X5        | 8872        | 62            | M       | Fever             | Diabetes Mellitus                                  | 16-06-2020       | 16-06-2020                      | 21-06-2020 |                       |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 12    | X5        | 8880        | 55            | M       | Pneumonia         |  Respiratory distress Obese,   Chronic Alcoholic   | 18-06-2020       | 20-06-2020                      | 21-06-2020 |                       |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 13    | X2        | 9149        | 70            | M       | Loss of Appetite  |  Weakness, Hypertension                            | 18-06-2020       | 18-06-2020                      | 21-06-2020 |                       |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 14    | X3        | 9150        | 46            | M       | Weakness          |  Convulsions, Hypertension                         | 17-06-2020       | 18-06-2020                      | 21-06-2020 |                       |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 15    | X4        | 7732        | 60            | Female  | Fever             |  Cough                                             |  Breathlessness  | uncontrolled Diabetes Mellitus  | 17-06-2020 | 22-06-2020            |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+
| 16    | X5        | 9237        | 90            | M       | Asymptomatic      | Hypertension                                       |                  | 20-06-2020                      | 22-06-2020 | Died at his residence |
+-------+-----------+-------------+---------------+---------+-------------------+----------------------------------------------------+------------------+---------------------------------+------------+-----------------------+

答案1

Power Query您可以使用Windows Excel 2010+ 和 Office 365 Excel 中提供的获取所需的输出

  • 选择原始表格中的某个单元格
  • Data => Get&Transform => From Table/Range
  • 当 PQ UI 打开时,导航至Home => Advanced Editor
  • 记下代码第 2 行的表名称。
  • 用以下代码替换现有代码M 代码以下
  • 将粘贴代码第 2 行的表名更改为您的“真实”表名
  • 检查任何注释以及窗口Applied Steps,以更好地理解算法和步骤

根据您的数据样本,我假设当出现错误时,这是​​因为有三种合并症被用逗号分开。

既然如此,我就测试一下该DOD列是否包含日期。

  • 如果 DOD 包含日期,则
    • 将共病与 DOA 和 DOD 连接起来
    • 从 MB 日期列获取 DOA 日期
    • 从注释列获取 DOD 日期
    • 将 MB 日期设置为空

如果该逻辑并不适用于所有人,或者存在错误,则可以很容易地进行更改。

Table.Group所有处理魔法都发生在函数的参数中

M 代码

let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],

//Replace any blanks ("") with null for easier processing down the road
    #"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Sl.NO", "District", "State P No", "Age In Years", "Sex", "Symptoms", "Co-Morbidities", "DOA", "DOD", "MB Date", "Notes"}),

//Group the table by the first six columns
    #"Grouped Rows" = Table.Group(#"Replaced Value", {"Sl.NO", "District", "State P No", "Age In Years", "Sex", "Symptoms"}, {

//Extract the columns to be processed using the logic shown
        {"Co-Morbidities", each 
            if Value.Is([DOD]{0}, type datetime)
                then [#"Co-Morbidities"]{0}
                else Text.Combine({[#"Co-Morbidities"]{0},[DOA]{0}, [DOD]{0}},", "), type text},
        {"DOA", each 
            if Value.Is([DOD]{0}, type datetime)
                then DateTime.Date([DOA]{0})
                else if [DOA]{0} = null then null 
                else DateTime.Date([#"MB Date"]{0}), type date},
        {"DOD", each 
            if Value.Is([DOD]{0}, type datetime)
                then DateTime.Date([DOD]{0})
                else DateTime.Date([Notes]{0}), type date},
        {"MB Date", each 
                if Value.Is([DOD]{0}, type datetime)
                    then DateTime.Date([#"MB Date"]{0})
                    else null, type date},
        {"Notes", each 
                if Value.Is([Notes]{0}, type datetime)
                    then null 
                    else [Notes]{0}}
        })
in
    #"Grouped Rows"

结果
在此处输入图片描述

答案2

分阶段进行(因为您将不得不检查这一点)!这主要是因为您的示例仅显示 2 个日期,并且很可能缺少数据。好像这不像修复电子表格那么简单。

在此处输入图片描述

原始数据在 A2:k8。DOA 列 H 和 DOD 列 I。

Cell L2 =ISNUMBER(SEARCH("2020",H2))
Cell M2 =ISNUMBER(SEARCH("2020",I2))

正确:日期正确。错误:无日期。数据输入错误。

Cell N2 =IF(AND(NOT($L2),NOT($M2)),G2&", "&H2&", "&I2,"")
Cell O2 =IF(AND(NOT($L2),NOT($M2)),K2,IF(H2<>"",H2,""))
Cell P2 =IF(AND(NOT($L2),NOT($M2)),K2,I2)
Cell Q2 =IF(AND(NOT($L2),NOT($M2)),"",J2)
Cell R2 =IF(AND(NOT($L2),NOT($M2)),"",IF(K2<>"",K2,""))

如果正确,则使用现有数据,否则将不适当的数据移至适当的列。

进行备份。输入公式。检查、检查、检查。转换为值并删除坏列。

相关内容