I have a comma-separated string in which some elements are quoted and may themselves contain commas. For example:
issuer=C = US, O = "DigiCert, Inc.", CN = DigiCert High Assurance TLS Hybrid ECC SHA256 2020 CA1
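I want to extract the individual elements, ignoring the comma inside the quoted value ("DigiCert, Inc.").
The script should be POSIX-compliant and run on non-GNU systems.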
Answer 1
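Given that you don't want a general solution, i.e. you are looking for a hack rather than something robust, this is fairly hacky, but it produces the correct output, at least if the sample input you gave is the most complex case you can reasonably expect to encounter: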
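#!/usr/bin/env bash
set -o posix
# Keep only the "Issuer:" lines, then print the value of the O element: either a
# double-quoted string (which may contain commas) or a run of non-comma characters.
grep '^[[:blank:]]*Issuer:' |
sed -Ee 's/^.* O[[:blank:]]*=[[:blank:]]*("[^"]*"|[^",]*),.*/\1/'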
Even as hacks go, I'm sure this could be improved if anyone needs it to be.
The code above is nearly POSIX-compliant and runs on my non-GNU system.
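One caveat with the script above: sed prints every line it reads, so an Issuer line that has no O element is echoed whole instead of being dropped. In addition, the grep step looks for the "Issuer:" lines of the cert.pem bundle rather than the "issuer=" form shown in the question. A minimal variant, assuming the input is already the single "issuer=..." line (the function name extract_o is hypothetical), keeps the same regular expression but uses sed -n with the p flag so that only lines where the substitution matched are printed:

```shell
#!/bin/sh
# extract_o (hypothetical name): print the O element of each input line.
# -n suppresses sed's automatic printing; the trailing p prints a line only
# when the substitution matched, so lines without an O element are dropped.
extract_o() {
    sed -n -E -e 's/^.* O[[:blank:]]*=[[:blank:]]*("[^"]*"|[^",]*),.*/\1/p'
}

# The question's sample line, fed straight in (no grep step needed).
printf '%s\n' 'issuer=C = US, O = "DigiCert, Inc.", CN = DigiCert High Assurance TLS Hybrid ECC SHA256 2020 CA1' |
    extract_o    # prints "DigiCert, Inc." with the quotes included
```

Like the original, the pattern requires a comma after the value, so an O element at the very end of a line is not matched (and, with -n, nothing is printed for that line).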
$ grep -w Issuer: /usr/local/etc/ssl/cert.pem | head -5; \
echo '...'; grep -w Issuer: /usr/local/etc/ssl/cert.pem | tail -5
Issuer: C = ES, O = FNMT-RCM, OU = AC RAIZ FNMT-RCM
Issuer: C = ES, O = FNMT-RCM, OU = Ceres, organizationIdentifier = VATES-Q2826004J, CN = AC RAIZ FNMT-RCM SERVIDORES SEGUROS
Issuer: CN = ACCVRAIZ1, OU = PKIACCV, O = ACCV, C = ES
Issuer: C = IT, L = Milan, O = Actalis S.p.A./03358520967, CN = Actalis Authentication Root CA
Issuer: C = US, O = AffirmTrust, CN = AffirmTrust Commercial
...
Issuer: C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust ECC Certification Authority
Issuer: C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
Issuer: C = US, O = "VeriSign, Inc.", OU = VeriSign Trust Network, OU = "(c) 1999 VeriSign, Inc. - For authorized use only", CN = VeriSign Class 1 Public Primary Certification Authority - G3
Issuer: C = US, O = "VeriSign, Inc.", OU = VeriSign Trust Network, OU = "(c) 1999 VeriSign, Inc. - For authorized use only", CN = VeriSign Class 2 Public Primary Certification Authority - G3
Issuer: C = US, OU = www.xrampsecurity.com, O = XRamp Security Services Inc, CN = XRamp Global Certification Authority
$ ./test.sh < /usr/local/etc/ssl/cert.pem | head -5; \
echo '...'; ./test.sh < /usr/local/etc/ssl/cert.pem | tail -5
FNMT-RCM
FNMT-RCM
ACCV
Actalis S.p.A./03358520967
AffirmTrust
...
The USERTRUST Network
The USERTRUST Network
"VeriSign, Inc."
"VeriSign, Inc."
XRamp Security Services Inc
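For the general case in the question (every element, not just O), one way to improve on the hack is to scan the string one character at a time and split only on commas that are outside double quotes. Below is a minimal sketch in POSIX sh and awk, assuming quotes are plain " characters that are never escaped or nested; the function name split_dn is hypothetical. The "issuer=" prefix is removed first with the POSIX ${var#prefix} parameter expansion.

```shell
#!/bin/sh
# split_dn (hypothetical name): print each comma-separated element of $1 on
# its own line, treating commas inside double quotes as part of the element.
# Assumption: quotes are plain " characters, never escaped or nested.
split_dn() {
    printf '%s\n' "$1" | awk '
    {
        field = ""; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                inq = !inq          # toggle the quoted state, keep the quote
                field = field c
            } else if (c == "," && !inq) {
                emit(field)         # an unquoted comma ends the element
                field = ""
            } else {
                field = field c
            }
        }
        emit(field)                 # the last element has no trailing comma
    }
    function emit(s) {
        sub(/^[ \t]+/, "", s)       # trim leading blanks
        sub(/[ \t]+$/, "", s)       # trim trailing blanks
        print s
    }'
}

s='issuer=C = US, O = "DigiCert, Inc.", CN = DigiCert High Assurance TLS Hybrid ECC SHA256 2020 CA1'
split_dn "${s#issuer=}"
```

For the question's sample line this prints three lines: C = US, then O = "DigiCert, Inc.", then CN = DigiCert High Assurance TLS Hybrid ECC SHA256 2020 CA1. A single element can then be picked out with, for example, split_dn "${s#issuer=}" | sed -n 's/^O = //p', which assumes the " = " spacing shown in the sample.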