How do I extract the file name from a URL using AWK?

I have an awk script that works as follows.

Raw input text:

date:
  1.0.1: http://example.com/1.0.1.tgz
  1.0.2: http://example.com/1.0.2.tgz
  1.0.3: http://example.com/1.0.3.tgz
  1.0.4: http://example.com/1.0.4.tgz
  1.0.5: http://example.com/1.0.5.tgz
  1.0.6: http://example.com/1.0.6.tgz
  1.0.7: http://example.com/1.0.7.tgz
  1.0.8: http://example.com/1.0.8.tgz
  1.0.9: http://example.com/1.0.9.tgz
  1.0.10: http://example.com/1.0.10.tgz

awk converts it into this HTML table:

<table>
    <thead>
        <tr>
            <th>ver</th>
            <th>link</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>1.0.1</td>
            <td><a href="http://example.com/1.0.1.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.2</td>
            <td><a href="http://example.com/1.0.2.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.3</td>
            <td><a href="http://example.com/1.0.3.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.4</td>
            <td><a href="http://example.com/1.0.4.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.5</td>
            <td><a href="http://example.com/1.0.5.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.6</td>
            <td><a href="http://example.com/1.0.6.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.7</td>
            <td><a href="http://example.com/1.0.7.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.8</td>
            <td><a href="http://example.com/1.0.8.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.9</td>
            <td><a href="http://example.com/1.0.9.tgz">download</a></td>
        </tr>
        <tr>
            <td>1.0.10</td>
            <td><a href="http://example.com/1.0.10.tgz">download</a></td>
        </tr>
    </tbody>
</table>

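I want to replace the "download" text in the table with the file name from the link. How should I modify the script? Here is the existing awk script.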

#!/usr/bin/env -S gawk -f

BEGIN {
    print "<table>"
    print "\t<thead>"
    print "\t\t<tr>"
    print "\t\t\t<th>ver</th>"
    print "\t\t\t<th>link</th>"
    print "\t\t</tr>"
    print "\t</thead>"
    print "\t<tbody>"
}

match($0, /^ +(.*): (.*)$/, r) {
    print "\t\t<tr>"
    printf "\t\t\t<td>%s</td>\n", r[1]
    printf "\t\t\t<td><a href=\"%s\">download</a></td>\n", r[2]
    print "\t\t</tr>"
}

END {
    print "\t</tbody>"
    print "</table>"
}

I'm a beginner and would appreciate any help. Thanks in advance for any suggestions!

Answer 1

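Try replacing the printf that prints the download link (around line 18 of your script) with the line below. Note that it assumes the file name is always the version followed by .tgz.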

printf "\t\t\t<td><a href=\"%s\">%s.tgz</a></td>\n", r[2], r[1]
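
If the file name does not always follow the version-plus-.tgz pattern, one option is to take the last "/"-separated piece of the URL instead. The following is only a minimal sketch, not the asker's script: the shell function name link_rows is hypothetical, and it assumes each data line has the shape "  <version>: <url>" with no spaces inside the URL. It uses only split() and sub(), so it should run in any POSIX awk.

```shell
# Hypothetical helper: reads "  <version>: <url>" lines on stdin
# and prints one table row per line.
link_rows() {
    awk '
    /^ +[^ ]+: / {
        ver = $1
        sub(/:$/, "", ver)            # strip the trailing colon: "1.0.1:" -> "1.0.1"
        n = split($2, parts, "/")     # break the URL into "/"-separated pieces
        print "\t\t<tr>"
        printf "\t\t\t<td>%s</td>\n", ver
        printf "\t\t\t<td><a href=\"%s\">%s</a></td>\n", $2, parts[n]
        print "\t\t</tr>"
    }'
}

printf '  1.0.1: http://example.com/1.0.1.tgz\n' | link_rows
```

Here parts[n] is the last piece of the URL, so the link text follows whatever file name the URL actually ends with.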

Answer 2

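Just add a third capture group to the match() regular expression to hold the file name, then print it on the appropriate line. (The three-argument form of match() is a GNU awk extension, so this needs gawk.)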

$ cat tst.awk
BEGIN {
    print "<table>"
    print "\t<thead>"
    print "\t\t<tr>"
    print "\t\t\t<th>ver</th>"
    print "\t\t\t<th>link</th>"
    print "\t\t</tr>"
    print "\t</thead>"
    print "\t<tbody>"
}

match($0, /^ +(.*): (.*\/([^/]+))$/, r) {
    print "\t\t<tr>"
    printf "\t\t\t<td>%s</td>\n", r[1]
    printf "\t\t\t<td><a href=\"%s\">%s</a></td>\n", r[2], r[3]
    print "\t\t</tr>"
}

END {
    print "\t</tbody>"
    print "</table>"
}

$ awk -f tst.awk data.text
<table>
        <thead>
                <tr>
                        <th>ver</th>
                        <th>link</th>
                </tr>
        </thead>
        <tbody>
                <tr>
                        <td>1.0.1</td>
                        <td><a href="http://example.com/1.0.1.tgz">1.0.1.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.2</td>
                        <td><a href="http://example.com/1.0.2.tgz">1.0.2.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.3</td>
                        <td><a href="http://example.com/1.0.3.tgz">1.0.3.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.4</td>
                        <td><a href="http://example.com/1.0.4.tgz">1.0.4.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.5</td>
                        <td><a href="http://example.com/1.0.5.tgz">1.0.5.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.6</td>
                        <td><a href="http://example.com/1.0.6.tgz">1.0.6.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.7</td>
                        <td><a href="http://example.com/1.0.7.tgz">1.0.7.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.8</td>
                        <td><a href="http://example.com/1.0.8.tgz">1.0.8.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.9</td>
                        <td><a href="http://example.com/1.0.9.tgz">1.0.9.tgz</a></td>
                </tr>
                <tr>
                        <td>1.0.10</td>
                        <td><a href="http://example.com/1.0.10.tgz">1.0.10.tgz</a></td>
                </tr>
        </tbody>
</table>
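
If gawk is not available, the closest portable analogue to the capture group is the two-argument match(), which sets RSTART and RLENGTH in every POSIX awk. The snippet below is a rough sketch of that idea only: the function name url_basename is made up for illustration, and it assumes each input line is a bare URL that does not end in "/".

```shell
# Hypothetical helper: prints the last path segment of each URL read from stdin.
url_basename() {
    awk '{
        # "[^/]+$" matches the run of non-slash characters at the end of the line;
        # match() records where it starts (RSTART) and how long it is (RLENGTH).
        if (match($0, "[^/]+$"))
            print substr($0, RSTART, RLENGTH)
    }'
}

printf 'http://example.com/1.0.10.tgz\n' | url_basename
```

The same match()/substr() pair could be applied to the URL field inside the main rule in place of r[3].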
