我正在尝试查找并识别具有匹配单词的相邻行。
例如:
| --Col A-- | -Col B- |
| That Car!! | [blank] |
| Blue Car$ | [blank] |
| Ponies^!_ | [blank] |
| Car Sales | [blank] |
| Detroit__ | [blank] |
| Banana^ | [blank] |
“那辆车!!”和“蓝色汽车$”是相邻的单元格,均包含单词“汽车”。
我不知道如何在 B 列中编写 =formula() 来识别匹配的单词。公式需要做的就是识别重复的单词(不区分大小写)并为第二行输出单词“BLANK”。
删除重复项将完成剩下的工作。
包含“汽车销售”的行将不会被识别。
我怎样才能做到这一点?
答案1
此 stackoverflow 答案包含一个自定义 VBA 函数来计算文本之间的相似度:https://stackoverflow.com/a/42657518/4721734
使用它,您可以编写类似以下公式:
=IF(Similarity(A2,A3)>=0.5,A3,"BLANK")
答案2
这是一个 Power Query 解决方案(它还将删除不需要的行)
Power Query 适用于 Windows Excel 2010+ 和 Office 365
- 在数据表中选择一些单元格
Data => Get&Transform => from Table/Range
- 当 PQ 编辑器打开时:
Home => Advanced Editor
- 记下表格姓名在第 2 行
- 将下面的 M 代码粘贴到您所看到的位置
- 将第 2 行的表名改回最初生成的表名。
- 阅读评论并探索
Applied Steps
以了解算法
M 代码
let
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//remove the non-alphabet characters and split into List of Words
#"Added Custom" = Table.AddColumn(#"Changed Type", "Remove non-alpha characters",
each Text.Split(Text.Combine(
List.Accumulate(Text.ToList([Column1]),{},
(state,current)=> if List.Contains({"A".."Z"," ","a".."z"},current) then
List.Combine({state,{current}}) else state))," ")),
//Add a column representing the previous row
ShiftedList = {null} & List.RemoveLastN(Table.Column(#"Added Custom", "Remove non-alpha characters"),1),
Custom1 = Table.ToColumns(#"Added Custom") & {ShiftedList},
prevRow = Table.FromColumns(Custom1, Table.ColumnNames(#"Added Custom") & {"Previous Row"}),
//Mark "BLANK" if previous row has the same word in the list of words
#"Added Custom1" = Table.AddColumn(prevRow, "Blank", each
try if List.ContainsAny([#"Remove non-alpha characters"],[Previous Row],Comparer.OrdinalIgnoreCase)
then "BLANK" else null otherwise null),
//filter out the non-null rows
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Blank] = null)),
//Remove unneeded columns
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Remove non-alpha characters", "Previous Row", "Blank"})
in
#"Removed Columns"