File name: hello1
cassandraDriver.contactPoints=10.65.4.203,10.65.4.220
cassandraDriver.port=9042
File name: hello2
cassandraDriver.contactPoints=10.65.4.203
cassandraDriver.port=80
onprem.cassandra.contactPoints= 10.135.83.48
cassandraDriver.port=8080
onprem.cassandra.contactPoints:10.5.14.20
Expected output:
host port filename
10.65.4.203 9042 hello1,hello2
10.65.4.220 9042 hello1
10.135.83.48 8080 hello2
10.5.14.20 hello2
Identical host/port pairs should not be repeated.
Below is the script I wrote to pull out the hosts and ports:
#!/bin/bash
stty -echo
trap 'stty echo' EXIT   # restore terminal echo however the script exits
if [[ $# -ne 1 ]]; then
    printf "\nPlease call '%s <repo name>' to run this command!\n\n" "$0" >&2
    exit 1
fi
#echo "Cloning the repository $1"
git clone ww.abc.com
cd "$1" || exit 1
#echo "Checking the files ... "
# -z with read -d '' keeps file names that contain spaces intact
git ls-files -z | while IFS= read -r -d '' file;
do
    #echo " --$file -- ";
    grep -P '((?<=[^0-9.]|^)[1-9][0-9]{0,2}(\.([0-9]{0,3})){3}(?=[^0-9.]|$)|(http|ftp|https|ftps|sftp)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?|\.port|\.host|contact-points|contactPoints|\.uri)' "$file" | grep '^[^#]' | awk '{split($0,a,"="); print a[2]}';
done | awk '!_[$0]++'
#echo "Done."
cd ..
#rm -rf "$1"
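Can anyone help me get the expected output?
Answer 1
Your requirements aren't clear, but this may be what you're trying to do (using GNU awk for arrays of arrays):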
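$ cat tst.awk
BEGIN { FS="[[:space:]]*[,:][[:space:]]*"; OFS="\t" }
# Strip the "key=" or "key:" prefix, leaving only the value list
{ sub(/^[^=:]*[=:][[:space:]]*/,"") }
# Odd-numbered lines hold the hosts: save them and move on
NR%2 { split($0,hosts); next }
# Even-numbered lines hold the ports: record each host/port/file combination
{
    for (i in hosts) {
        host = hosts[i]
        for (j=1; j<=NF; j++) {
            exists[host][$j][FILENAME]
        }
    }
}
# Print one row per host/port pair, listing the files it appeared in
END {
    print "host", "port", "filename"
    for (host in exists) {
        for (port in exists[host]) {
            printf "%s%s%s%s", host, OFS, port, OFS
            sep = ""
            for (filename in exists[host][port]) {
                printf "%s%s", sep, filename
                sep = ","
            }
            print ""
        }
    }
}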
$ awk -f tst.awk hello1 hello2
host port filename
10.12.17.18 8934 hello2
10.5.14.20 1234 hello1,hello2
10.5.14.20 8934 hello2
10.5.67.8 8934 hello2
10.11.12.203 1234 hello1
$ awk -f tst.awk hello1 hello2 | column -s$'\t' -t
host          port  filename
10.12.17.18   8934  hello2
10.5.14.20    1234  hello1,hello2
10.5.14.20    8934  hello2
10.5.67.8     8934  hello2
10.11.12.203  1234  hello1
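The question's hello2 does not strictly alternate host and port lines: it has three contactPoints lines but only two port lines. For input laid out like that, a minimal sketch is to pair each list of hosts with the next port line in the same file. The function name `extract_endpoints` is hypothetical, and the key patterns (`contactPoints`/`contact-points` for hosts, `.port` for ports) are assumptions drawn from the sample files, not from any specification.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not confirmed by the question):
#   - host lines have a key ending in "contactPoints" or "contact-points"
#   - port lines have a key ending in ".port"
#   - key and value are separated by the first "=" or ":"
#   - hosts pair with the next port line in the same file; hosts with no
#     following port are printed with an empty port
# Rows come out in first-seen order; each file is listed once per row.
extract_endpoints() {
  awk '
    # Record every pending host with "port", tagged with the current file.
    function flush(port,   i, key) {
      for (i = 1; i <= npending; i++) {
        key = pending[i] "\t" port
        if (!(key in files)) { order[++nkeys] = key; files[key] = cur }
        else if (!((key, cur) in seen)) files[key] = files[key] "," cur
        seen[key, cur] = 1
      }
      npending = 0
    }
    # New file: hosts left over from the previous file never got a port.
    FNR == 1 { flush(""); cur = FILENAME }
    # Skip comments and blank lines.
    /^[[:space:]]*#/ || /^[[:space:]]*$/ { next }
    {
      if (!match($0, /[=:]/)) next
      key = substr($0, 1, RSTART - 1); val = substr($0, RSTART + 1)
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", key)
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", val)
      if (key ~ /(contactPoints|contact-points)$/) {
        n = split(val, h, /[[:space:]]*,[[:space:]]*/)
        for (i = 1; i <= n; i++) if (h[i] != "") pending[++npending] = h[i]
      } else if (key ~ /\.port$/) {
        flush(val)
      }
    }
    END {
      flush("")
      print "host\tport\tfilename"
      for (i = 1; i <= nkeys; i++) print order[i] "\t" files[order[i]]
    }
  ' "$@"
}
```

Run on the question's two files with `extract_endpoints hello1 hello2 | column -s$'\t' -t`, this sketch prints one row per host/port pair. Because hello2 pairs 10.65.4.203 with port 80, that host gets a separate `80 hello2` row next to its `9042 hello1` row, and 10.5.14.20 gets an empty port since no port line follows it in hello2. Passing more files (for example, the output of `git ls-files`) merges identical host/port pairs into a single row with a comma-separated file list.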