How to delete everything except an alphanumeric pattern?

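My file contains a lot of junk and special characters. I want to keep only a specific alphanumeric pattern, such as AB123456789, and ignore everything else. I just want to extract that keyword, i.e. the two letters "AB" followed by 9 digits.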

Sample input:

[{"u_affected_cis":"m324nkj43nkj3n4kj34n","number":"hhggjjiiijjjf","akdsfj_skdfj":"","as_group":"1,324kj3k4j3k4jk34","order":"","__status":"success","phase":"gfhgh","cmdb_ci":"0989iujlkj","u_benefit_organization":"","u_creating_group":"luiy98798yukuh","work_notes_list":"","priority":"4","u_tier4_location":" ","review_date":"","u_mf_batch_inst_opdoc_move":"","u_requesting_group":"kjhljlkjhlkuh098709kjh","business_duration":"","number":"AB123456789","requested_by":tgfgtf878789khgo7869876ff9007da158c","u_temp","change_plan":"","asd_def":"2023-02-10 11:58:21","implementation_plan":"","short_description":"data","u_alternate_programmer_work_number":"","work_start":"","u_assignment_group_updated":"","yy_uhggfjk":"","fds":"change_request","close_by":"abcdef","start_date":"2023-02-10"}]

Sample output:

AB123456789

Answer 1

If your actual input is valid JSON, you'd be better off using a JSON-aware tool such as jq:

jq -r '.[0].number'

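(I say "if" because the input you posted isn't valid JSON: a double quote is missing, and one of the keys has no value attached. I assume the damage happened while you were preparing the question.)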
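For illustration, here is Answer 1's filter run against a small, hypothetical repaired excerpt of the input (valid JSON, most keys dropped). It also shows a jq alternative that doesn't depend on the key name. One detail matters for this sample: the object contains two "number" keys, and jq keeps the last one when parsing, which is why `.[0].number` yields AB123456789 rather than the first value. This is a sketch that assumes jq is installed.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of the sample, repaired into valid JSON.
# Note the duplicate "number" key: jq keeps the LAST value it sees.
json='[{"number":"hhggjjiiijjjf","u_temp":"","number":"AB123456789","close_by":"abcdef"}]'

# Answer 1's filter: the "number" key of the first array element.
printf '%s\n' "$json" | jq -r '.[0].number'

# Key-independent alternative: visit every value recursively (..),
# keep only strings, and select those that are exactly AB + 9 digits.
printf '%s\n' "$json" | jq -r '.. | strings | select(test("^AB[0-9]{9}$"))'
```

Both commands print AB123456789 for this excerpt; the second keeps working if the value moves to a different key.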

Answer 2

A bit of sed should do the job:

sed -e '/AB[0-9]\{9\}/!d' -e 's/.*\(AB[0-9]\{9\}\).*/\1/'
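A quick way to see how the two sed expressions behave is to feed them a few hypothetical lines. The first expression deletes lines without the pattern; the second reduces each remaining line to the captured match. Because the leading `.*` is greedy, only the last match on a line survives. `grep -o`, shown for comparison, prints every match on its own line (`-o` is supported by GNU and BSD grep but is not in POSIX).

```shell
#!/usr/bin/env bash
# Hypothetical test input: lines 1 and 3 contain the pattern, line 2 does not.
input='junk "number":"AB123456789" more junk
no match on this line
first AB111111111 then AB222222222'

# Answer 2's sed: drop non-matching lines, then keep only the match.
# The greedy leading .* means the LAST match on each line wins.
printf '%s\n' "$input" | sed -e '/AB[0-9]\{9\}/!d' -e 's/.*\(AB[0-9]\{9\}\).*/\1/'
# prints:
# AB123456789
# AB222222222

# grep -o prints every match, one per line; -E enables the {9} interval.
printf '%s\n' "$input" | grep -oE 'AB[0-9]{9}'
# prints:
# AB123456789
# AB111111111
# AB222222222
```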

Answer 3

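If your file always has the same number of fields and your pattern always appears in the same position (here, field 72 when the line is split on double quotes), you can use a simple awk: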

awk -F "\"" '{print $72}' input-file.txt

Pattern matching doesn't seem to suit your case, because the same pattern (AF123456789) appears at the beginning of the file.

I hope this helps.
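If you're not sure which field the value lands in, a small awk loop can report it instead of counting by hand. This sketch feeds in a trimmed, hypothetical one-line sample; run against the real file, the number it reports is the one to put in `print $N`. The length check replaces a `{9}` interval because some awk implementations (older mawk, for example) don't support interval expressions.

```shell
#!/usr/bin/env bash
# Hypothetical one-line stand-in for input-file.txt.
sample='[{"number":"hhggjjiiijjjf","order":"","number":"AB123456789","close_by":"abcdef"}]'

# Split on double quotes and print "field-number: value" for every field
# that is exactly "AB" followed by 9 digits (11 characters in total).
printf '%s\n' "$sample" | awk -F '"' '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^AB[0-9]+$/ && length($i) == 11)
      print i ": " $i
}'
# prints: 12: AB123456789
```

Each key/value pair occupies four fields (key, `:`, value, `,`), so the value of the n-th pair sits in field 4n. In the full sample the AB value belongs to the 18th pair, which is how it ends up in field 72.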

Answer 4

I created these files to replicate what you're doing on a smaller scale:

┌─[root@Fedora]─[~/stack_exchange]─[03:38 pm]
└─[$]› ls
1234fnjfck   CA123456789      EA123456789  HA123456789  KA123456789   NA123456789  QA123456789  TA123456789              VA123456789  YA123456789
AA123456789  DA123456789      FA123456789  IA123456789  LA123456789  OA123456789  RA123456789  testing-please-delete-me  WA123456789  ZA123456789
BA123456789  DELETE1234  GA123456789  JA123456789  MA123456789  PA123456789  SA123456789  UA123456789              XA123456789

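Store the pattern in a variable as a regular expression; an if statement inside a for ... in loop can then test each file name against it and pick out the files that don't match: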

┌─[root@Fedora]─[~/stack_exchange]─[04:07 pm]
└─[$]› pattern="^[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$"

[$]› for i in *; do if ! [[ $i =~ $pattern ]]; then echo "$i does not match!"; fi; done
1234fnjfck does not match!
DELETE1234 does not match!
testing-please-delete-me does not match!

So, to delete them:

[$]› for i in *; do if ! [[ $i =~ $pattern ]]; then rm -f -- "$i"; fi; done

Result:

[$]› ls
AA123456789  CA123456789  EA123456789  GA123456789  IA123456789  KA123456789  MA123456789  OA123456789  QA123456789  SA123456789  UA123456789  WA123456789  YA123456789
BA123456789  DA123456789  FA123456789  HA123456789  JA123456789  LA123456789  NA123456789  PA123456789  RA123456789  TA123456789  VA123456789  XA123456789  ZA123456789
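A self-contained sketch of the same loop, run in a throwaway directory created with `mktemp -d` so nothing real is deleted. The file names are hypothetical. `[A-Z]{2}[0-9]{9}` is the interval form of the spelled-out pattern above; `^AB[0-9]{9}$` would be the stricter choice if only names starting with AB should survive. Looping over a glob instead of `$(ls ...)` keeps names with spaces intact.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory, removed automatically when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical files: two that match, three that don't.
touch AB123456789 ZA123456789 1234fnjfck DELETE1234 'name with spaces'

pattern='^[A-Z]{2}[0-9]{9}$'

# The glob hands each name to the loop whole, spaces included.
for f in *; do
  if ! [[ $f =~ $pattern ]]; then
    rm -f -- "$f"   # "--" protects against names that start with "-"
  fi
done

ls
# prints:
# AB123456789
# ZA123456789
```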