从两个单元格中提取部分匹配的文本

从两个单元格中提取部分匹配的文本

我正在寻找一种解决方案,用于比较两个带有文本的单元格,并仅提取两个单元格中匹配的部分。例如:

单元格 A1:“昨天我吃了一个苹果” 单元格 A2:“今天我吃了一个橘子”

我怎样才能将“我吃了一个”提取到另一个单元格中?

是否有人有一个我可以插入 VBA 的用户定义函数,或者是否有一种使用现有函数来执行此操作的奇特方法?

答案1

更新现在包括一个忽略单词列表。

WordSequence :在进行基本的单词解析后,根据单词的字符数寻找最长的匹配单词序列。返回在第一个参数中出现的序列。

用法:WordSequence(String1,String2 [,caseInsensitive As Boolean])

例 1 - OP 样本数据如下IgnoredWords = "a an"
=WordSequence("Yesterday I ate an apple", "today I ate an orange")
结果:I ate

示例 2 - 相同的示例数据,但IgnoredWords = ""对代码进行了编辑(没有忽略的单词)。
=WordSequence("Yesterday I ate an apple", "today I ate an orange")
结果:I ate an

示例 3 - 结果可能根据分配给 String1 参数的字符串而有所不同。。B1
: 电子表格中使用的 WordSequence 公式
C2 显示字符串输入顺序对返回值的影响。B3
:C4 演示caseInsensitive := TRUE(和输入顺序返回值)。

注意:示例 3IgnoredWords = "a an"如下所示。在这种情况下,即使一个单元格或另一个单元格中缺少a或,也会发生相同的匹配。此外,当或出现在 String1 中并且包含在匹配序列中时,返回结果将包括或。示例 1 的结果还说明不会出现在匹配单词序列的开头或结尾。这是因为它们不匹配任何内容,因为它们被忽略了。对于任何指定的标点符号也是如此。 anaanIgnoredWords

' WordSequence (String1, String2 [, caseInsensitive As Boolean])
'
'    Compares two String arguments for the longest common sequence
'    of words. Longest is defined by the number of characters in each
'    matched word in the sequence.
'
'    Make the comparison case insensitive with an optional third
'    argument set to True.
'
'    Return the sequence of words from String1 along with inclusive
'       text (white space, ignored words and specified punctuation).
'
' Authored by Ted Dillard
'
' Spaces, Tabs and the WordBoundary characters delimit the words
'    in the comparison; only whole word matches are returned.
'
' Punctuation Indifference:  Two mechanisms regulate matching based on
'    specified punctuation characters. The matched sequence in String1
'    is returned with any inclusive punctuation. Spaces, tabs and the
'    defined punctuations are not counted when comparing match lengths.
'    Any punctuation not specified in one of these two constants is
'    treated as a character of the word (like the apostrophe in "I'll")
'
' Ignored Words: A list of words to ignore in matching. These words'
'    lengths are not counted towards the longest match. Theses words
'    will only be returned if they are between two matched words
'    in String1, in which case they will be returned even if not in
'    String2, because they were ignored.
'
Option Explicit
'
' IgnoredWords is a String of space separated words to ignore. Punctuation
'    not listed in WordBoundary or IgnoredPunctuation can be in the words.
Private Const IgnoredWords = "a an"  ' "" empty string ok
'
' (Two consecutive double quotes inside a quoted string puts one
'    double quote into the string value.)
'
' WordBoundary characters are word delimiters and ignored in matching.
Private Const wordBoundary = "(){}[]:;<>,.?!"""
'
' IgnoredPunctuation are deleted before parsing words (ignored in matching).
'    Therefore "'.-" means "we're"="were" and "123.456.7890"="123-456-7890"
Private Const IgnoredPunctuation = "-"
'
' WhiteSpace characters are used as word boundaries along with spaces.
Private Const WhiteSpace = vbTab & vbNewLine & vbLf & vbCr & vbCrLf
'
'-------------------------------------------------------------------------
' END of User Configurable Settings - the rest is code
'
Private Enum WordList
    matchOnWords ' list first =0 also missing Optional Long parameter value
    NonDeletion
    AllPartials
End Enum
Private Type LongestMatch
    start As Long
    items As Long
End Type

Public Function WordSequence(ByVal sentence As String, ByVal sentence2 _
        As String, Optional caseInsensitive As Boolean) As String
    Dim matchOnWords() As String
    Dim matchInfo As LongestMatch

    'Optional Booleans are False by default. If Case_Insensitive is True the
    'search will not be case sensitive. The default search is case sensitive.
    'To make default case insensitive, change parameter name to caseSensitve
    'and edit If condition to "Not caseSensitive"
    WordSequence = sentence
    If caseInsensitive Then
        sentence = LCase(sentence)
        sentence2 = LCase(sentence2)
    End If

    matchOnWords = getWords(sentence)
    matchInfo = getLongestMatch(matchOnWords, getWords(sentence2))
    If matchInfo.start = -1 Or matchInfo.items = 0 Then
        WordSequence = ""
    Else
        WordSequence = getMatchedString(sentence, WordSequence, matchOnWords, _
                matchInfo.start, matchInfo.items)
    End If
End Function

Private Function getMatchedString(ByVal sentence As String, _
        ByVal original As String, ByRef matchOnWords() As String, _
        ByVal start As Long, ByVal items As Long) As String
    Dim allPartialWords() As String
    Dim allWords() As String
    Dim begun As Boolean, ignoredWord As Boolean, wordBoundary As Boolean
    Dim w As Long, i  As Long, j As Long

    ' word list where all specified punctuations are used as word boundaries,
    ' [1] to deal with punctuation in the actual word but not the match words
    allPartialWords = getWords(sentence, AllPartials)
    allWords = getWords(sentence, NonDeletion) 'deals with ignored words
    begun = False
    ignoredWord = False
    wordBoundary = True
    i = 0: j = 0
    For w = 0 To UBound(allPartialWords)
        ' make the beginning of the sentence be the beginning location of the
        '    first occurrence in the sentence where the current iterative word
        '    list element (partial word) is located removing preceding spaces,
        '    tabs and punctuation characters defined in punctuation constants.
        sentence = Mid(sentence, InStr(sentence, allPartialWords(w)))
        If Not begun And i = start Then 'Beginning of words match
            begun = True
            ' delete any characters at the beginning of the original sentence
            '    that have already been removed from the sentence variable
            original = Mid(original, Len(original) - Len(sentence) + 1)
        End If
        ' remove the current partial word from the beginning of the sentence
        sentence = Mid(sentence, Len(allPartialWords(w)) + 1)

        If wordBoundary Then
            ' is entirety of all_word in ignored word list
            If InStr(" " & IgnoredWords & " ", " " & allWords(j) & " ") Then
                ignoredWord = True
            End If
        End If
        ' also remove from the beginning of all_word & match_word, along with
        ' [1] preceding ignored characters inclusive to this matchOnWord.
        allWords(j) = Mid(allWords(j), InStr(allWords(j), _
                allPartialWords(w)) + Len(allPartialWords(w)))
         ' ignored words are not part of match_on_words' list
        If Not ignoredWord Then
            matchOnWords(i) = Mid(matchOnWords(i), InStr(matchOnWords(i), _
                    allPartialWords(w)) + Len(allPartialWords(w)))
        End If
        If allWords(j) = "" Then ' all_word is consumed
            wordBoundary = True  ' no longer in a word,
            ignoredWord = False  ' so can no longer be in an ignored word
            j = j + 1
        Else ' part of the word is consumed, next time through dont test the
            wordBoundary = False ' remaing part against ignored word list
        End If

        If matchOnWords(i) = "" Then ' match_on_word is consumed
            i = i + 1 ' advance match_on_word iterator to next matched word
            If begun Then
                items = items - 1 'consumed word, decrement matched items count
                If items = 0 Then ' consumed all matched words.
                    ' original already starts at beginning of match.
                    ' sentence had all matched partial words removed.
                    ' remove rest of sentence characters from return value.
                    getMatchedString = Mid(original, 1, _
                            Len(original) - Len(sentence))
                    Exit Function
                End If
            End If
        End If
    Next w
    getMatchedString = ""
End Function

Private Function getLongestMatch(ByRef words1() As String, _
        ByRef words2() As String) As LongestMatch
    Dim largestCharCnt As Long

    largestCharCnt = 0
    getLongestMatch.start = -1
    getLongestMatch.items = 0
    Dim i1 As Long, i2 As Long, i As Long, l As Long
    For i1 = 0 To UBound(words1)
        For i2 = 0 To UBound(words2)
            If words1(i1) = words2(i2) Then
                l = Len(words1(i1))
                i = 1
                Do While i1 + i <= UBound(words1)
                    If i2 + i > UBound(words2) Then Exit Do
                    If words1(i1 + i) <> words2(i2 + i) Then Exit Do
                    l = l + Len(words1(i1 + i))
                    i = i + 1
                Loop
                If l > largestCharCnt Then
                    largestCharCnt = l
                    getLongestMatch.start = i1
                    getLongestMatch.items = i
                End If
            End If
        Next i2
    Next i1
End Function

Private Function getWords(ByVal sentence As String, _
        Optional listType As WordList) As String()

    sentence = replaceChars(sentence, WhiteSpace, " ")
    sentence = replaceChars(sentence, wordBoundary, " ")

    If listType = matchOnWords Or listType = NonDeletion Then
        sentence = replaceChars(sentence, IgnoredPunctuation, "")
    Else ' listType = AllPartials
        sentence = replaceChars(sentence, IgnoredPunctuation, " ")
    End If
    If listType = matchOnWords Then
        ' start & end as well for space delimiter-brackets' match
        sentence = " " & sentence & " "
        Dim w As Variant
        ' only match whole word using space delimiter-brackets' match
        For Each w In Split(IgnoredWords)
            sentence = Replace(sentence, " " & w & " ", " ")
        Next w
    End If
    Do While InStr(sentence, "  ") <> 0
        sentence = Replace(sentence, "  ", " ")
    Loop
    sentence = Trim(sentence)
    getWords = Split(sentence)
End Function

Private Function replaceChars(ByVal source As String, ByVal chars As String, _
            ByVal replacement As String) As String
    Dim c As Long
    For c = 1 To Len(chars)
        source = Replace(source, Mid(chars, c, 1), replacement)
    Next c
    replaceChars = source
End Function

答案2

在此处输入图片描述 在此处输入图片描述

怎么运行的:

  1. 在以下位置写入搜索字符串M25。
  2. 查找搜索字符串的长度N25。

    =LEN($M$25)

  3. 要查找搜索字符串的位置,请使用O25& 向下填充。

    =SEARCH($M$25,L25,1)

  4. 最后写下这个来提取搜索字符串P25并填写。

    =MID(L25,O25,$N$25)

注意:

  1. 上面显示的方法最简单,可以从任意长度的文本中提取搜索字符串。

  2. 此方法可用于提取任何字符串,如我所示速度快的汽车还。

  3. 使用搜索找到位置,最终使公式能够处理区分大小写的字符串。

答案3

在此处输入图片描述

我想建议两个不同的公式,将提取“我吃了一个“来自单元格中的文本字符串。

  • 公式A2、A3 和 A4, 在哪里“我吃了一个”,前面只有一个词。

    =TRIM(MID(A2,FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),1))+1,FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),4))-FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),2))+2))
    
  • 公式A7, 在哪里“我吃了一个”,前面有两个词。

    =TRIM(MID(A7,FIND(CHAR(1),SUBSTITUTE(A7," ",CHAR(1),1))+6,FIND(CHAR(1),SUBSTITUTE(A7," ",CHAR(1),3))-FIND(CHAR(1),SUBSTITUTE(A7," ",CHAR(1),1))+2))
    

怎么运行的:

FIND(CHAR(1),SUBSTITUTE(A2," ",CHAR(1),4))

返回搜索字符串中 19Start Position我吃了一个

FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),4))-FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),2))+1

返回8长度 我吃了一个(包括空格)。

FIND(CHAR(1),SUBSTITUTE(A2," ,CHAR(1),2))+2

返回14并获取我吃了一个苹果

FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),4))-FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),2))+2

返回9减去14-9 = 5、删除的我吃了一个苹果,

得出公式:

=MID(C7,FIND(CHAR(1),SUBSTITUTE(C7," ",CHAR(1),1))+1,8)

返回,我吃了一个预期的答案。

編輯:

这个即兴的公式是提取文本字符串,我吃了一个所有可能的组合如下所示。

在此处输入图片描述

=TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),MAX(99,FIND(" I",SUBSTITUTE(A1," ",REPT(" ",99)))-50),299))

怎么运行的:

  • 公式在文本字符串中的单词之间插入大量空格,查找并提取预期的子字符串,TRIM 函数清除多余的空格。

SUBSTITUTE(A1," ",REPT(" ",99))

将每个单个空格替换为 99 个空格。

笔记, 99只是一个任意数字,代表您需要提取的最长单词。

    FIND(" I",SUBSTITUTE(A1," ",REPT(" ",99)))-50
  • FIND 返回 的位置" I"

  • 减法50可以有效地将位置设置在感兴趣的子字符串前面的空格的中间。

笔记,在此公式中,计算的位置是366

  • 以起始位置为单位,MID用于提取文本中的字符,从`中的文本99开始366

  • A1,再次填充空间。

  • MAX处理子字符串在文本中首先出现的问题,这里位置为负,并将MAX其重置为1

注意:

  • 根据需要调整公式中的单元格引用。

相关内容