Deleting a file from a tar archive through a pipe without extracting it: what is the correct syntax?

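I am using GNU tar to process tar files (Docker image layers) in order to modify some jars inside them. This is what I am doing:

  • Save the image to disk as a tar
  • Extract it, so that each layer ends up in its own directory
  • Go into each layer, where I have a layer.tar, a json and a VERSION file
  • Iterate over all */*.jar files in layer.tar, looking for certain class files
  • If I find them, extract the jar with its file tree structure, delete the class files from it, and put it back into layer.tar, overwriting the original jar
  • Pack every layer back into a new tar, load it with docker load and push it later (not done yet)

I wrote a script for this and it almost does the job, except that layer.tar ends up with two jars side by side: one that still contains the classes to be removed, and one that does not.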

#!/bin/bash

# tar needs find to package without ".". u for update, c for create
function pack_all_without_period() {
    find $1 -printf "%P\n" -type f -o -type l -o -type d | sudo tar -$3vf $2 --no-recursion -C $1 -T -
}

if [ -z $1 ]; then
    printf "Save the image as tar, extract, and enter each layer to remove the vulnerable classes(JMSAppender/SocketServer/SimpleSocketServer)\nPlease provide the image name. \n"
    exit 1
fi
dir="log4j-1.x-fix"
image_tar=amq-image-to-fix.tar
if [ ! -d $dir ]; then 
    mkdir $dir
fi
# save image to tar
docker save $1 -o $image_tar
# extract tar
tar xf $image_tar -C $dir
# each layer is extracted to a folder, each folder has a "layer.tar". 
# Go into each folder, extract `layer.tar`, and use `jar` to remove the classes
# and package them back to `layer.tar` (-a to append), and delete the extracted folders.
# at last, package all layers + manifest.json and so back into another tar, WITHOUT COMPRESSION
cd $dir
# enter layer and exit
for layer in */; do
    echo Processing layer $layer
    cd $layer
    # tar does not support overwrite, as tape cannot be overwritten; so I wanted to remove the original jar from tar, 
    # then append it back with tar -u/-A/-r; but then I found tar --delete is extremely slow(by design)
    # so at last I have to extract all files and package them back
    mkdir temp
    sudo tar --extract --directory=temp --file layer.tar --wildcards "*.jar"   # file tree is preserved, so package them back is easy
    if [[ $? -eq 0 ]]; then 
        for f in $(find . -mindepth 2 -name "*.jar" -not -type l -printf "%P\n"); do # exclude jolokia.jar(link)
            sudo jar -tvf $f | grep -E "(*JMSAppender*.class|*SocketServer.class|*log4j*.class)"
            if [[ $? -eq 0 ]]; then
                echo Found classes in $f
                read -p "Do you want to remove these classes? (Y/N) " option
                if [[ $option == 'Y' || $option == 'y' ]]; then
                    echo Removing class file from $f
                    sudo zip -d $f "*JMSAppender.class" "*SocketServer.class" "*SimpleSocketServer.class"
                    ######### here I need to delete the original jar with the classes I just deleted, but I don't know how ############
                else continue
                fi
            else
                continue
            fi

        done
        # append folders to tar, without leading "."
        echo Appending modified folders to layer.tar anew
        pack_all_without_period temp layer.tar r
    fi
    sudo rm -r $(find . -maxdepth 1 -mindepth 1 -type d -print)
    cd .. # back to $dir
done
cd ..

# tar will always include a folder "." as root. This function get rid of it, so the archive
# only contains the content of the folder
# compress will preserve ownership and group by default; and to extract while preserving the same info,
# we use '--same-owner', which is used by default when using sudo. 
# again, append all layers and files to new tar, without leading "."
echo after processing all layers, we are at $(pwd)
pack_all_without_period $dir amq-image-fixed.tar c
sudo rm -Irv $dir $image_tar




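But I found that:

  1. tar can only append, it cannot overwrite. So I changed my approach: first delete the original jar from layer.tar, then append the modified one.
  2. Then I found that tar --delete some/path/foo.tar does not work in the form tar --file xxx --delete path-to-jar. The GNU tar documentation claims that --delete works in a pipe with stdin and stdout (https://www.gnu.org/software/tar/manual/html_node/delete.html), but what is the correct syntax? I tried these, and neither worked: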
    sudo tar tf layer.tar $f | sudo tar --delete #not deleting
    sudo tar xf layer.tar --exclude $f | sudo tar cf layer.tar -T -  # create tar of size 0

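Some other notes:

  • I do not want to extract all the files, because each layer contains files under /usr and /boot that I do not want to touch. My jars are basically all somewhere under /opt (not 100% sure).
  • I need to preserve ownership, timestamps and so on. That is why I use sudo (though I am not sure it actually achieves that).

I run the script like this: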

./remove-log4j-1.x-classes.sh registry.access.redhat.com/jboss-amq-6/amq63-openshift:1.4-44.1638430186

Please help, thanks!

Edit: I have now tried:

tar tf layer.tar -O | tar f - --delete $f > layer-new.tar

or

zcat -f layer.tar | tar f - --delete $f > layer-new.tar

but both fail with the following error:

tar: opt/amq/lib/optional/log4j-1.2.17.redhat-1.jar: Not found in archive
tar: Exiting with failure status due to previous errors

Answer 1

I first checked which version of tar I had:

$ tar --version
tar (GNU tar) 1.29
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

Then I went to the GNU tar page and downloaded the latest version, which is currently 1.34:

https://ftp.gnu.org/gnu/tar/tar-latest.tar.gz

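It is a very well organized repository, because it also contains the tests under /tests. There I found several test cases whose names start with delete, and in delete02.at I found the correct syntax. Not surprisingly, its comment says that deleting a member with the archive on stdin was not working correctly. In practice the syntax works with both tar 1.29 and 1.34, so you can skip installing 1.34: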


# Deleting a member with the archive from stdin was not working correctly.

AT_SETUP([deleting a member from stdin archive])
AT_KEYWORDS([delete delete02])

AT_TAR_CHECK([
genfile -l 3073 -p zeros --file 1
cp 1 2
cp 2 3
tar cf archive 1 2 3
tar tf archive
cat archive | tar f - --delete 2 > archive2
echo separator
tar tf archive2],
[0],
[1
2
3
separator
1
3
])

AT_CLEANUP

So the syntax is:

cat tar_archive | tar f - --delete <filename_to_delete> > another_archive

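You use cat to output the contents of the tarball and pipe (|) it into tar itself. tar reads the archive from stdin (the -, which is now the pipe from cat), deletes the member, and writes the result to stdout, which you redirect (>) to another file. Afterwards you can rename this new file to the name of the original archive you want to replace. You cannot edit the archive in place, however.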
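As a minimal sketch of the steps just described, the following reproduces the delete02 scenario without genfile. It assumes GNU tar; the file names and the temporary directory are made up for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Work in a throwaway directory (hypothetical demo data).
work=$(mktemp -d)
cd "$work"

printf 'one\n' > 1
printf 'two\n' > 2
printf 'three\n' > 3
tar cf archive.tar 1 2 3

# "f -" makes tar read the archive from stdin; with --delete the archive
# minus member "2" is written to stdout, which is redirected to a new file.
cat archive.tar | tar f - --delete 2 > archive-new.tar

# There is no in-place edit, so swap the new archive in afterwards.
mv archive-new.tar archive.tar

# The listing should now show only members 1 and 3.
tar tf archive.tar
```

Because the new archive is written to a separate file first, the original stays intact if the delete step fails.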

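If you do want to install it, use ./configure && sudo make && sudo make install. Note that this does not replace tar 1.29 under /bin; the new binary is installed as /usr/local/bin/tar, because ./configure defaults to the /usr/local prefix.

So the complete script is now: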
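Since two tar binaries may then coexist, a script can pick one explicitly. This is only a sketch: the variable name tar_bin is hypothetical, and it assumes a locally built tar, if present, lives at /usr/local/bin/tar as described above.

```shell
#!/bin/bash

# Prefer a locally built tar if one exists, otherwise use the one on PATH.
if [ -x /usr/local/bin/tar ]; then
    tar_bin=/usr/local/bin/tar
else
    tar_bin=$(command -v tar)
fi

echo "using $tar_bin"
# Print the version banner so the choice is visible in the logs.
"$tar_bin" --version | head -n 1
```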

#!/bin/bash

tar=/usr/local/bin/tar # or tar=/bin/tar, the syntax is the same

# tar needs find to package without ".". u for update, c for create
function pack_all_without_period() {
    find $1 -printf "%P\n" -type f -o -type l -o -type d | sudo $tar -$3f $2 --no-recursion -C $1 -T -
}

if [ -z $1 ]; then
    printf "Save the image as tar, extract, and enter each layer to remove the vulnerable classes(JMSAppender/SocketServer/SimpleSocketServer)\nPlease provide the image name. \n"
    exit 1
fi
dir="fix"
image_tar=amq-image-to-fix.tar
if [ ! -d $dir ]; then 
    mkdir $dir
fi
# save image to tar
docker save $1 -o $image_tar
# extract tar
$tar xf $image_tar -C $dir
# each layer is extracted to a folder, each folder has a "layer.tar". 
# Go into each folder, extract `layer.tar`, and use `jar` to remove the classes
# and package them back to `layer.tar` (-a to append), and delete the extracted folders.
# at last, package all layers + manifest.json and so back into another tar, WITHOUT COMPRESSION
cd $dir
# enter layer and exit
for layer in */; do
    echo Processing layer $layer
    cd $layer
    # tar does not support overwrite, as tape cannot be overwritten; so for each modified jar
    # the original is removed from layer.tar with --delete through a pipe, and the new copy
    # is appended back with -r. Only the jars are extracted, not the whole layer.
    sudo $tar --extract --directory=. --file layer.tar --wildcards "*.jar"   # file tree is preserved, so package them back is easy
    if [[ $? -eq 0 ]]; then 
        for f in $(find . -mindepth 1 -name "*.jar" -not -type l -printf "%P\n"); do # exclude jolokia.jar(link)
            sudo jar -tvf $f | grep -E "(*JMSAppender*.class|*SocketServer.class|*log4j*.class)"
            if [[ $? -eq 0 ]]; then
                echo Found classes in $f
                read -p "Do you want to remove these classes? (Y/N) " option
                if [[ $option == 'Y' || $option == 'y' ]]; then
                    echo Removing class file from $f
                    sudo zip -d $f "*JMSAppender.class" "*SocketServer.class" "*SimpleSocketServer.class"
                    ######### here the correct syntax, finally #########
                    # drop the old jar from layer.tar via a pipe, then append the modified one
                    cat layer.tar | $tar f - --delete "$f" > layer-new.tar
                    sudo mv layer-new.tar layer.tar
                    $tar -rf layer.tar "$f"
                else continue
                fi
            else
                continue
            fi

        done
        
        sudo rm -r $(find . -maxdepth 1 -mindepth 1 -type d -print)
    fi
    cd .. # back to $dir
done

cd ..

# tar will always include a folder "." as root. This function get rid of it, so the archive
# only contains the content of the folder
# compress will preserve ownership and group by default; and to extract while preserving the same info,
# we use '--same-owner', which is used by default when using sudo. 
# again, append all layers and files to new tar, without leading "."
echo after processing all layers, we are at $(pwd)
pack_all_without_period $dir amq-image-fixed.tar c
sudo rm -Irv $dir $image_tar
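The delete-then-append step at the heart of the script can also be isolated in a small function, which makes it easier to try out on a scratch archive. This is a sketch rather than part of the original script: replace_member is a hypothetical helper name, the demo paths are invented, sudo is left out, and GNU tar is assumed.

```shell
#!/bin/bash
set -euo pipefail

# replace_member ARCHIVE MEMBER  (hypothetical helper)
# Removes MEMBER from ARCHIVE through a pipe, swaps the result in,
# then appends the current on-disk copy of MEMBER with -r.
replace_member() {
    local archive=$1 member=$2
    local tmp="$archive.new"
    cat "$archive" | tar f - --delete "$member" > "$tmp"
    mv "$tmp" "$archive"
    tar -rf "$archive" "$member"
}

# Demo on throwaway data standing in for a layer with two jars.
work=$(mktemp -d)
cd "$work"
mkdir -p opt/lib
printf 'old\n' > opt/lib/a.jar
printf 'keep\n' > opt/lib/b.jar
tar cf layer.tar opt/lib/a.jar opt/lib/b.jar

# Simulate the jar being modified on disk, then replace it in the archive.
printf 'new\n' > opt/lib/a.jar
replace_member layer.tar opt/lib/a.jar

# List the members and print the archived copy of the replaced file.
tar tf layer.tar
tar xOf layer.tar opt/lib/a.jar
```

With set -e and pipefail, a failed --delete (for example a member path that does not match the archive) stops the function before the original archive is overwritten.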
