我正在寻找一种解决方案,用于比较两个带有文本的单元格,并仅提取两个单元格中匹配的部分。例如:
单元格 A1:“昨天我吃了一个苹果” 单元格 A2:“今天我吃了一个橘子”
我怎样才能将“我吃了一个”提取到另一个单元格中?
是否有人有一个我可以插入 VBA 的用户定义函数,或者是否有一种使用现有函数来执行此操作的奇特方法?
答案1
更新现在包括一个忽略单词列表。
WordSequence :在进行基本的单词解析后,根据单词的字符数寻找最长的匹配单词序列。返回在第一个参数中出现的序列。
用法:WordSequence(String1,String2 [,caseInsensitive As Boolean])
例 1 - OP 样本数据如下IgnoredWords = "a an"
。
=WordSequence("Yesterday I ate an apple", "today I ate an orange")
结果:I ate
示例 2 - 相同的示例数据,但IgnoredWords = ""
对代码进行了编辑(没有忽略的单词)。
=WordSequence("Yesterday I ate an apple", "today I ate an orange")
结果:I ate an
示例 3 - 结果可能根据分配给 String1 参数的字符串而有所不同。。B1
:
C2 显示字符串输入顺序对返回值的影响。B3
:C4 演示caseInsensitive := TRUE
(和输入顺序返回值)。
注意:示例 3IgnoredWords = "a an"
如下所示。在这种情况下,即使一个单元格或另一个单元格中缺少a
或,也会发生相同的匹配。此外,当或出现在 String1 中并且包含在匹配序列中时,返回结果将包括或。示例 1 的结果还说明不会出现在匹配单词序列的开头或结尾。这是因为它们不匹配任何内容,因为它们被忽略了。对于任何指定的标点符号也是如此。 an
a
an
IgnoredWords
' WordSequence (String1, String2 [, caseInsensitive As Boolean])
'
' Compares two String arguments for the longest common sequence
' of words. Longest is defined by the number of characters in each
' matched word in the sequence.
'
' Make the comparison case insensitive with an optional third
' argument set to True.
'
' Return the sequence of words from String1 along with inclusive
' text (white space, ignored words and specified punctuation).
'
' Authored by Ted Dillard
'
' Spaces, Tabs and the WordBoundary characters delimit the words
' in the comparison; only whole word matches are returned.
'
' Punctuation Indifference: Two mechanisms regulate matching based on
' specified punctuation characters. The matched sequence in String1
' is returned with any inclusive punctuation. Spaces, tabs and the
' defined punctuations are not counted when comparing match lengths.
' Any punctuation not specified in one of these two constants is
' treated as a character of the word (like the apostrophe in "I'll")
'
' Ignored Words: A list of words to ignore in matching. These words'
' lengths are not counted towards the longest match. Theses words
' will only be returned if they are between two matched words
' in String1, in which case they will be returned even if not in
' String2, because they were ignored.
'
Option Explicit
'
' IgnoredWords is a String of space separated words to ignore. Punctuation
' not listed in WordBoundary or IgnoredPunctuation can be in the words.
Private Const IgnoredWords = "a an" ' "" empty string ok
'
' (Two consecutive double quotes inside a quoted string puts one
' double quote into the string value.)
'
' WordBoundary characters are word delimiters and ignored in matching.
Private Const wordBoundary = "(){}[]:;<>,.?!"""
'
' IgnoredPunctuation are deleted before parsing words (ignored in matching).
' Therefore "'.-" means "we're"="were" and "123.456.7890"="123-456-7890"
Private Const IgnoredPunctuation = "-"
'
' WhiteSpace characters are used as word boundaries along with spaces.
Private Const WhiteSpace = vbTab & vbNewLine & vbLf & vbCr & vbCrLf
'
'-------------------------------------------------------------------------
' END of User Configurable Settings - the rest is code
'
Private Enum WordList
matchOnWords ' list first =0 also missing Optional Long parameter value
NonDeletion
AllPartials
End Enum
Private Type LongestMatch
start As Long
items As Long
End Type
Public Function WordSequence(ByVal sentence As String, ByVal sentence2 _
As String, Optional caseInsensitive As Boolean) As String
Dim matchOnWords() As String
Dim matchInfo As LongestMatch
'Optional Booleans are False by default. If Case_Insensitive is True the
'search will not be case sensitive. The default search is case sensitive.
'To make default case insensitive, change parameter name to caseSensitve
'and edit If condition to "Not caseSensitive"
WordSequence = sentence
If caseInsensitive Then
sentence = LCase(sentence)
sentence2 = LCase(sentence2)
End If
matchOnWords = getWords(sentence)
matchInfo = getLongestMatch(matchOnWords, getWords(sentence2))
If matchInfo.start = -1 Or matchInfo.items = 0 Then
WordSequence = ""
Else
WordSequence = getMatchedString(sentence, WordSequence, matchOnWords, _
matchInfo.start, matchInfo.items)
End If
End Function
Private Function getMatchedString(ByVal sentence As String, _
ByVal original As String, ByRef matchOnWords() As String, _
ByVal start As Long, ByVal items As Long) As String
Dim allPartialWords() As String
Dim allWords() As String
Dim begun As Boolean, ignoredWord As Boolean, wordBoundary As Boolean
Dim w As Long, i As Long, j As Long
' word list where all specified punctuations are used as word boundaries,
' [1] to deal with punctuation in the actual word but not the match words
allPartialWords = getWords(sentence, AllPartials)
allWords = getWords(sentence, NonDeletion) 'deals with ignored words
begun = False
ignoredWord = False
wordBoundary = True
i = 0: j = 0
For w = 0 To UBound(allPartialWords)
' make the beginning of the sentence be the beginning location of the
' first occurrence in the sentence where the current iterative word
' list element (partial word) is located removing preceding spaces,
' tabs and punctuation characters defined in punctuation constants.
sentence = Mid(sentence, InStr(sentence, allPartialWords(w)))
If Not begun And i = start Then 'Beginning of words match
begun = True
' delete any characters at the beginning of the original sentence
' that have already been removed from the sentence variable
original = Mid(original, Len(original) - Len(sentence) + 1)
End If
' remove the current partial word from the beginning of the sentence
sentence = Mid(sentence, Len(allPartialWords(w)) + 1)
If wordBoundary Then
' is entirety of all_word in ignored word list
If InStr(" " & IgnoredWords & " ", " " & allWords(j) & " ") Then
ignoredWord = True
End If
End If
' also remove from the beginning of all_word & match_word, along with
' [1] preceding ignored characters inclusive to this matchOnWord.
allWords(j) = Mid(allWords(j), InStr(allWords(j), _
allPartialWords(w)) + Len(allPartialWords(w)))
' ignored words are not part of match_on_words' list
If Not ignoredWord Then
matchOnWords(i) = Mid(matchOnWords(i), InStr(matchOnWords(i), _
allPartialWords(w)) + Len(allPartialWords(w)))
End If
If allWords(j) = "" Then ' all_word is consumed
wordBoundary = True ' no longer in a word,
ignoredWord = False ' so can no longer be in an ignored word
j = j + 1
Else ' part of the word is consumed, next time through dont test the
wordBoundary = False ' remaing part against ignored word list
End If
If matchOnWords(i) = "" Then ' match_on_word is consumed
i = i + 1 ' advance match_on_word iterator to next matched word
If begun Then
items = items - 1 'consumed word, decrement matched items count
If items = 0 Then ' consumed all matched words.
' original already starts at beginning of match.
' sentence had all matched partial words removed.
' remove rest of sentence characters from return value.
getMatchedString = Mid(original, 1, _
Len(original) - Len(sentence))
Exit Function
End If
End If
End If
Next w
getMatchedString = ""
End Function
Private Function getLongestMatch(ByRef words1() As String, _
ByRef words2() As String) As LongestMatch
Dim largestCharCnt As Long
largestCharCnt = 0
getLongestMatch.start = -1
getLongestMatch.items = 0
Dim i1 As Long, i2 As Long, i As Long, l As Long
For i1 = 0 To UBound(words1)
For i2 = 0 To UBound(words2)
If words1(i1) = words2(i2) Then
l = Len(words1(i1))
i = 1
Do While i1 + i <= UBound(words1)
If i2 + i > UBound(words2) Then Exit Do
If words1(i1 + i) <> words2(i2 + i) Then Exit Do
l = l + Len(words1(i1 + i))
i = i + 1
Loop
If l > largestCharCnt Then
largestCharCnt = l
getLongestMatch.start = i1
getLongestMatch.items = i
End If
End If
Next i2
Next i1
End Function
Private Function getWords(ByVal sentence As String, _
Optional listType As WordList) As String()
sentence = replaceChars(sentence, WhiteSpace, " ")
sentence = replaceChars(sentence, wordBoundary, " ")
If listType = matchOnWords Or listType = NonDeletion Then
sentence = replaceChars(sentence, IgnoredPunctuation, "")
Else ' listType = AllPartials
sentence = replaceChars(sentence, IgnoredPunctuation, " ")
End If
If listType = matchOnWords Then
' start & end as well for space delimiter-brackets' match
sentence = " " & sentence & " "
Dim w As Variant
' only match whole word using space delimiter-brackets' match
For Each w In Split(IgnoredWords)
sentence = Replace(sentence, " " & w & " ", " ")
Next w
End If
Do While InStr(sentence, " ") <> 0
sentence = Replace(sentence, " ", " ")
Loop
sentence = Trim(sentence)
getWords = Split(sentence)
End Function
Private Function replaceChars(ByVal source As String, ByVal chars As String, _
ByVal replacement As String) As String
Dim c As Long
For c = 1 To Len(chars)
source = Replace(source, Mid(chars, c, 1), replacement)
Next c
replaceChars = source
End Function
答案2
答案3
我想建议两个不同的公式,将提取“我吃了一个“来自单元格中的文本字符串。
公式A2、A3 和 A4, 在哪里“我吃了一个”,前面只有一个词。
=TRIM(MID(A2,FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),1))+1,FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),4))-FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),2))+2))
公式A7, 在哪里“我吃了一个”,前面有两个词。
=TRIM(MID(A7,FIND(CHAR(1),SUBSTITUTE(A7," ",CHAR(1),1))+6,FIND(CHAR(1),SUBSTITUTE(A7," ",CHAR(1),3))-FIND(CHAR(1),SUBSTITUTE(A7," ",CHAR(1),1))+2))
怎么运行的:
FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),4))
返回搜索字符串中 19
的Start Position
我吃了一个。
FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),4))-FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),2))+1
返回8
长度 我吃了一个(包括空格)。
FIND(CHAR(1),SUBSTITUTE(A2," ,CHAR(1),2))+2
返回14
并获取我吃了一个苹果。
FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),4))-FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),2))+2
返回9
减去14-9 = 5
、删除的我吃了一个从苹果,
得出公式:
=MID(C7,FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),1))+1,8)
返回,我吃了一个预期的答案。
編輯:
这个即兴的公式是提取文本字符串,我吃了一个从所有可能的组合如下所示。
=TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),MAX(99,FIND(" I",SUBSTITUTE(A1," ",REPT(" ",99)))-50),299))
怎么运行的:
- 公式在文本字符串中的单词之间插入大量空格,查找并提取预期的子字符串,TRIM 函数清除多余的空格。
SUBSTITUTE(A1," ",REPT(" ",99))
将每个单个空格替换为 99 个空格。
笔记, 99
只是一个任意数字,代表您需要提取的最长单词。
FIND(" I",SUBSTITUTE(A1," ",REPT(" ",99)))-50
FIND 返回 的位置
" I"
。减法
50
可以有效地将位置设置在感兴趣的子字符串前面的空格的中间。
笔记,在此公式中,计算的位置是366
。
以起始位置为单位,
MID
用于提取文本中的字符,从`中的文本99
开始366
A1
,再次填充空间。MAX
处理子字符串在文本中首先出现的问题,这里位置为负,并将MAX
其重置为1
。
注意:
- 根据需要调整公式中的单元格引用。