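My file also has a lot of junk and special characters. I want to keep one specific alphanumeric pattern and ignore everything else. For example, from AB123456789 I only want to extract this keyword, i.e. the two letters "AB" followed by 9 digits.
Sample input: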
[{"u_affected_cis":"m324nkj43nkj3n4kj34n","number":"hhggjjiiijjjf","akdsfj_skdfj":"","as_group":"1,324kj3k4j3k4jk34","order":"","__status":"success","phase":"gfhgh","cmdb_ci":"0989iujlkj","u_benefit_organization":"","u_creating_group":"luiy98798yukuh","work_notes_list":"","priority":"4","u_tier4_location":" ","review_date":"","u_mf_batch_inst_opdoc_move":"","u_requesting_group":"kjhljlkjhlkuh098709kjh","business_duration":"","number":"AB123456789","requested_by":tgfgtf878789khgo7869876ff9007da158c","u_temp","change_plan":"","asd_def":"2023-02-10 11:58:21","implementation_plan":"","short_description":"data","u_alternate_programmer_work_number":"","work_start":"","u_assignment_group_updated":"","yy_uhggfjk":"","fds":"change_request","close_by":"abcdef","start_date": "2023-02-10"}]
Sample output:
AB123456789
Answer 1
If your actual input is valid JSON, you would be better off with a JSON-aware tool such as jq:
jq -r '.[0].number'
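(I say "if" because the input you posted is not valid JSON: it is missing double quotes, and one of the keys has no value attached. I assume the damage happened while you were preparing the question.)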
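If the file does turn out to be valid JSON, the same idea can be extended to every object in the array, with the pattern from the question applied as a final filter. This is only a sketch: it assumes jq is installed, and the function name `extract_numbers` and the sample data are made up for illustration.

```shell
# Hypothetical helper: read a JSON array of objects on stdin and print every
# "number" value that is exactly AB followed by 9 digits.
# Assumes valid JSON; the sample posted in the question would make jq fail.
extract_numbers() {
  jq -r '.[].number // empty' | grep -E '^AB[0-9]{9}$'
}

# Made-up, valid sample data; skipped quietly when jq is not installed.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' '[{"number":"hhggjjiiijjjf"},{"number":"AB123456789"},{"order":""}]' |
    extract_numbers
fi
```

With jq available, this should print AB123456789; objects without a "number" key are skipped by `// empty`.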
Answer 2
A bit of sed should do the job:
sed -e '/AB[0-9]\{9\}/!d' -e 's/.*\(AB[0-9]\{9\}\).*/\1/'
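To spell out what the two expressions do: the first deletes every line that does not contain the pattern, and the second replaces a matching line with just the captured match. Below is a small sketch on made-up input (the wrapper name `extract_ab` is hypothetical). Because the leading `.*` is greedy, a line holding several IDs yields only the last one.

```shell
# Hypothetical wrapper around the sed command from this answer.
extract_ab() {
  sed -e '/AB[0-9]\{9\}/!d' -e 's/.*\(AB[0-9]\{9\}\).*/\1/'
}

# Made-up input: one line without the pattern, one line with it.
printf '%s\n' \
  '"number":"hhggjjiiijjjf","order":""' \
  '#$%"number":"AB123456789","requested_by":tgf' |
  extract_ab
# prints: AB123456789
```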
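Answer 3
If your file always has the same number of fields and your pattern always appears in the same position (e.g. field 72 when splitting on double quotes), you can use a simple awk: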
awk -F "\"" '{print $72}' input-file.txt
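Splitting on double quotes puts keys, separators and values into separate fields, so the field number has to be worked out once for a given layout. One way to find it is to print every field together with its index; the helper name `list_fields` and the short sample line are made up for illustration.

```shell
# Hypothetical helper: number every double-quote-delimited field of the first line.
list_fields() {
  awk -F '"' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}

# Made-up, shortened sample; show only the field that holds the ID.
printf '%s\n' '[{"order":"","number":"AB123456789","u_temp"}]' |
  list_fields | grep 'AB[0-9]'
# prints the index and the value separated by a tab: 8	AB123456789
```

The index printed this way is the number to use in `print $N`. It stays valid only as long as every line has the same number of quoted strings before the value.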
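Pattern matching does not seem suitable for you, because the same pattern also appears at the beginning of the file (AF123456789).
I hope this answer helps you.
Answer 4
I created these files to replicate what you are doing on a smaller scale: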
┌─[root@Fedora]─[~/stack_exchange]─[03:38 pm]
└─[$]› ls
1234fnjfck CA123456789 EA123456789 HA123456789 KA123456789 NA123456789 QA123456789 TA123456789 VA123456789 YA123456789
AA123456789 DA123456789 FA123456789 IA123456789 LA123456789 OA123456789 RA123456789 testing-please-delete-me WA123456789 ZA123456789
BA123456789 DELETE1234 GA123456789 JA123456789 MA123456789 PA123456789 SA123456789 UA123456789 XA123456789
With the pattern stored in a regex variable, an if statement inside a for-in loop can pick out the files that do not match it:
┌─[root@Fedora]─[~/stack_exchange]─[04:07 pm]
└─[$]› pattern="^[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$"
[$]› for i in *; do if ! [[ $i =~ $pattern ]]; then echo "$i does not match!"; fi; done
1234fnjfck does not match!
DELETE1234 does not match!
testing-please-delete-me does not match!
So, to delete them:
[$]› for i in *; do if ! [[ $i =~ $pattern ]]; then rm -f -- "$i"; fi; done
Result:
[$]› ls
AA123456789 CA123456789 EA123456789 GA123456789 IA123456789 KA123456789 MA123456789 OA123456789 QA123456789 SA123456789 UA123456789 WA123456789 YA123456789
BA123456789 DA123456789 FA123456789 HA123456789 JA123456789 LA123456789 NA123456789 PA123456789 RA123456789 TA123456789 VA123456789 XA123456789 ZA123456789
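A slightly more defensive variant of the same idea takes the directory as an argument, iterates over a glob, and only lists the candidates so they can be reviewed before anything is removed. As a sketch, it swaps the `[[ =~ ]]` regex for a `case` glob pattern so that it also runs in plain sh; the function name `list_nonmatching` and the demo directory are made up for illustration.

```shell
# Hypothetical helper: print the names in directory $1 that are NOT
# two uppercase letters followed by nine digits.
# Depending on the locale, [A-Z] may be looser than expected; LC_ALL=C is strict.
list_nonmatching() {
  for f in "$1"/*; do
    [ -e "$f" ] || continue          # empty directory: the glob stays literal
    name=${f##*/}                    # strip the directory part
    case $name in
      [A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;  # keep
      *) printf '%s\n' "$name" ;;
    esac
  done
}

# Demo on a throwaway directory with made-up file names.
demo=$(mktemp -d)
touch "$demo/AA123456789" "$demo/DELETE1234" "$demo/testing-please-delete-me"
list_nonmatching "$demo"   # lists DELETE1234 and testing-please-delete-me
rm -rf "$demo"
```

Once the list looks right, each printed name can be passed to `rm -f --` with the directory prefixed, which keeps the destructive step separate from the matching step.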