现在我有一些以下格式的纵向数据:
ID | year_1b | 年_2 | 变量_1b | 变量_2 | ... |
---|---|---|---|---|---|
10001 | 2018 | 2020 | 1.1 | 1.2 | ... |
10002 | 2018 | 2020 | 0.3 | 0.5 | ... |
10002 | 2018 | 2020 | 1.8 | 2.6 | ... |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋱ |
为了在STATA中分析数据,我想将其重新组织成以下样式:
ID | 年 | 多变的 | ... |
---|---|---|---|
10001 | 2018 | 1.1 | ... |
10002 | 2018 | 0.3 | ... |
10002 | 2018 | 1.8 | ... |
⋮ | ⋮ | ⋮ | ⋮ |
10001 | 2020 | 1.2 | ... |
10002 | 2020 | 0.5 | ... |
10002 | 2020 | 2.6 | ... |
我在上面的例子中使用了一个变量。除 和 外year
,id
我有 1410 列这种“_1b”和“_2”格式,还有 1000 行。这些是 2018 年(变量名称以 _1b 结尾)和 2020 年(以 _2 结尾)向 1000 名受访者提出的 705 个调查问题(705 * 2 = 1410)。
是否有任何点击或 VBA 技巧可以用来快速重新组织数据?
非常感谢你的帮助!
答案1
可以在 Power Query 中使用以下方法完成此操作:使用 Power Query
- 在数据表中选择一些单元格
Data => Get&Transform => from Table/Range
或者from within sheet
- 当 PQ 编辑器打开时:
Home => Advanced Editor
- 记下表格姓名在第 2 行
- 将下面的 M 代码粘贴到您所看到的位置
- 将第 2 行的表名改回最初生成的表名。
- 阅读评论并探索
Applied Steps
以了解算法
let
//Change next line to reflect actual data source
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQwMDBU0lEyMjC0AFNGBkDKUM8QTBopxepA1BhhqDHQMwaTpnjUGOqBOXpmSrGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, year_1b = _t, year_2 = _t, Variable_1b = _t, Variable_2 = _t]),
//You can let PQ detect the types automatically
//Select all the columns, then Transform=>Detect Data Types
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"id", Int64.Type}, {"year_1b", Int64.Type}, {"year_2", Int64.Type}, {"Variable_1b", type number}, {"Variable_2", type number}}),
//Select the "id" column and unpivot the rest
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"id"}, "Attribute", "Value"),
//Since Variable and Year have matching suffixes, we can split on the underscore
// and then Group by the portion after the underscore
#"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv), {"Attribute.1", "Attribute.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Attribute.1", type text}, {"Attribute.2", type text}}),
//Pivot each subtable
//FillDown the "year" column and delete the rows with null in the Variable column
#"Grouped Rows" = Table.Group(#"Changed Type1", {"Attribute.2"}, {
{"Pivot", (t)=>
Table.SelectRows(
Table.FillDown(
Table.RemoveColumns(
Table.Pivot(
Table.AddIndexColumn(t,"Index"),
List.Distinct(t[Attribute.1],Comparer.OrdinalIgnoreCase),"Attribute.1","Value"),
{"Attribute.2","Index"}),
{"year"}),
each [Variable] <> null), type table[id=Int64.Type, year=Int64.Type, Variable=number]
}}),
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Attribute.2"}),
#"Expanded Pivot" = Table.ExpandTableColumn(#"Removed Columns", "Pivot", {"id", "year", "Variable"})
in
#"Expanded Pivot"