How do I output the most common string in a text file?

Question

This is what I came up with in PowerShell. Please let me know what you think.

 $database = Get-Content -Path c:\temp\database.txt
 $MyArrayList = New-Object -TypeName "System.Collections.ArrayList"

 foreach ($line in $database) {
     [Int32]$OutNumber = 0

     # Skip blank lines and lines that mention "database"
     if ($line -match "database" -or [String]::IsNullOrWhiteSpace($line)) {
         continue
     }

     # Each line is cleaned by at most one rule, so it is added exactly once
     if ([Int32]::TryParse($line.Substring(0, 1), [ref]$OutNumber)) {
         # Line starts with a digit: drop the first two characters
         $tmp = $line.Substring(2).Trim()
     }
     elseif ($line -match 'Number') {
         # Line contains "Number": keep the text after the first colon
         $tmp = $line.Substring($line.IndexOf(":") + 1).Trim()
     }
     else {
         $tmp = $line
     }

     # [void] keeps the index returned by Add() out of the output
     [void]$MyArrayList.Add($tmp)
 }

 $MyArrayList | Group-Object

This is my output:

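 Count Name                         Group
 ----- ----                         -----
     1 A book about abc.            {A book about abc.}
     2 A paper about abc.           {A paper about abc., A paper about abc.}
     3 A book about xyz.            {A book about xyz., A book about xyz., A book about xyz.}
     2 An article about abc.        {An article about abc., An article about abc.}
     1 A book about xyz includ...   {A book about xyz included.}
     1 An article about xyz incl... {An article about xyz included.}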
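
The grouped output lists every distinct string with its count, but it does not yet single out the most common one. A minimal sketch of that last step follows, assuming the cleaned lines are already available as an array of strings. The function name `Get-MostCommonString` and the sample lines are illustrative only and are not part of the original script. Note that `Group-Object` compares strings case-insensitively by default.

```powershell
# Hypothetical helper: returns the group(s) with the highest count.
# More than one group is returned when several strings tie for first place.
function Get-MostCommonString {
    param(
        [string[]]$InputString
    )

    # -NoElement drops the Group column, keeping only Count and Name
    $groups = $InputString | Group-Object -NoElement

    # Highest count across all groups
    $max = ($groups | Measure-Object -Property Count -Maximum).Maximum

    $groups | Where-Object { $_.Count -eq $max }
}

# Illustrative sample data standing in for the cleaned lines
$sample = @(
    'A book about abc.'
    'A paper about abc.'
    'A paper about abc.'
    'A book about xyz.'
    'A book about xyz.'
    'A book about xyz.'
    'An article about abc.'
    'An article about abc.'
)

Get-MostCommonString -InputString $sample |
    ForEach-Object { '{0} (seen {1} times)' -f $_.Name, $_.Count }
```

If ties do not matter, piping the groups through `Sort-Object -Property Count -Descending | Select-Object -First 1` is a shorter alternative that returns exactly one group.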

