BusyBox tar, or splitting a large file into multiple smaller files


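I am running Linux 3.4.8 on an at91sam9g20.

I want to take a large record and split it into multiple files. I have tried several approaches, but none of them seems to work. For example: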

tar -c -M --tape-length=102400 --file=disk1.tar mytest.tar.gz
tar: invalid option -- M
BusyBox v1.20.2 (2012-09-24 16:21:25 CEST) multi-call binary.

Usage: tar -[cxthvO] [-X FILE] [-T FILE] [-f TARFILE] [-C DIR] [FILE]...

Create, extract, or list files from a tar file

Operation:
        c       Create
        x       Extract
        t       List
        f       Name of TARFILE ('-' for stdin/out)
        C       Change to DIR before operation
        v       Verbose
        O       Extract to stdout
        h       Follow symlinks
        exclude File to exclude
        X       File with names to exclude
        T       File with names to include

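BusyBox appears to ship a cut-down version of tar that does not accept some options.

When I try split, I get the following:

/:# split
-sh: split: not found

Is there a way to split a large file into multiple files using the BusyBox command set?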

Currently defined functions:
        [, [[, addgroup, adduser, ar, arping, ash, awk, basename, blkid,
        bunzip2, bzcat, cat, catv, chattr, chgrp, chmod, chown, chroot, chrt,
        chvt, cksum, clear, cmp, cp, cpio, crond, crontab, cut, date, dc, dd,
        deallocvt, delgroup, deluser, devmem, df, diff, dirname, dmesg, dnsd,
        dnsdomainname, dos2unix, du, dumpkmap, echo, egrep, eject, env,
        ether-wake, expr, false, fdflush, fdformat, fgrep, find, fold, free,
        freeramdisk, fsck, fuser, getopt, getty, grep, gunzip, gzip, halt,
        hdparm, head, hexdump, hostid, hostname, hwclock, id, ifconfig, ifdown,
        ifup, inetd, init, insmod, install, ip, ipaddr, ipcrm, ipcs, iplink,
        iproute, iprule, iptunnel, kill, killall, killall5, klogd, last, less,
        linux32, linux64, linuxrc, ln, loadfont, loadkmap, logger, login,
        logname, losetup, ls, lsattr, lsmod, lsof, lspci, lsusb, lzcat, lzma,
        makedevs, md5sum, mdev, mesg, microcom, mkdir, mkfifo, mknod, mkswap,
        mktemp, modprobe, more, mount, mountpoint, mt, mv, nameif, netstat,
        nice, nohup, nslookup, od, openvt, passwd, patch, pidof, ping,
        pipe_progress, pivot_root, poweroff, printenv, printf, ps, pwd, rdate,
        readlink, readprofile, realpath, reboot, renice, reset, resize, rm,
        rmdir, rmmod, route, run-parts, runlevel, sed, seq, setarch,
        setconsole, setkeycodes, setlogcons, setserial, setsid, sh, sha1sum,
        sha256sum, sha512sum, sleep, sort, start-stop-daemon, strings, stty,
        su, sulogin, swapoff, swapon, switch_root, sync, sysctl, syslogd, tail,
        tar, tee, telnet, test, tftp, time, top, touch, tr, traceroute, true,
        tty, udhcpc, umount, uname, uniq, unix2dos, unlzma, unxz, unzip,
        uptime, usleep, uudecode, uuencode, vconfig, vi, vlock, watch,
        watchdog, wc, wget, which, who, whoami, xargs, xz, xzcat, yes, zcat

Answer 1

You can use BusyBox's dd applet with its bs, count, and skip parameters to split a large file into multiple chunks.

Part of the dd man page from BusyBox:

dd [if=FILE] [of=FILE] [ibs=N] [obs=N] [bs=N] [count=N] [skip=N]
[seek=N] [conv=notrunc|noerror|sync|fsync]

       Copy a file with converting and formatting

               if=FILE         Read from FILE instead of stdin
               of=FILE         Write to FILE instead of stdout
               bs=N            Read and write N bytes at a time
               ibs=N           Read N bytes at a time
               obs=N           Write N bytes at a time
               count=N         Copy only N input blocks
               skip=N          Skip N input blocks
               seek=N          Skip N output blocks
               conv=notrunc    Don't truncate output file
               conv=noerror    Continue after read errors
               conv=sync       Pad blocks with zeros
               conv=fsync      Physically write data out before finishing

So basically you would do something like this:

$ dd if=bigfile of=part.0 bs=1024 count=1024 skip=0
$ dd if=bigfile of=part.1 bs=1024 count=1024 skip=1024
$ dd if=bigfile of=part.2 bs=1024 count=1024 skip=2048

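For each part.X file, dd writes up to count * bs bytes, skipping the first skip blocks (that is, skip * bs bytes) of the input file.

A very basic one-liner (combining the seq, xargs, and dd applets in BusyBox) might look like this: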

seq 0 19 | xargs -n1 sh -c 'dd if=bigfile of=part.$0 bs=1024 count=1024 skip=$(expr $0 \* 1024)'

This produces up to 20 part.X files of 1048576 bytes each.

Example of splitting bigfile:

$ ls -l
total 2940
-rw-rw-r-- 1 user user 3000000 Apr 27 13:21 bigfile

$ seq 0 19 | xargs -n1 sh -c 'dd if=bigfile of=part.$0 bs=1024 count=1024 skip=$(expr $0 \* 1024)'
1024+0 records in
1024+0 records out
1024+0 records in
1024+0 records out
881+1 records in
881+1 records out
0+0 records in
0+0 records out
[...]

$ ls -l
total 5968
-rw-rw-r-- 1 user user 3000000 Apr 27 13:21 bigfile
-rw-rw-r-- 1 user user 1048576 Apr 27 13:43 part.0
-rw-rw-r-- 1 user user 1048576 Apr 27 13:43 part.1
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.10
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.11
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.12
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.13
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.14
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.15
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.16
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.17
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.18
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.19
-rw-rw-r-- 1 user user  902848 Apr 27 13:43 part.2
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.3
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.4
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.5
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.6
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.7
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.8
-rw-rw-r-- 1 user user       0 Apr 27 13:43 part.9
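
The zero-byte parts in the listing above carry no data and can be deleted. A minimal sketch of one way to do that with plain shell tests follows; the scratch directory and the dummy part files are assumptions made only so the example runs on its own.

```shell
# Demo in a scratch directory: two parts with data, two empty ones.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo data > part.0
echo data > part.1
: > part.2
: > part.3

# Remove every empty part.* file; [ -s FILE ] is true only for non-empty files.
for f in part.*; do
    [ -s "$f" ] || rm -- "$f"
done

ls part.*
```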

Restoring is easily done with cat (or with dd again, using the seek parameter). The zero-byte files can be skipped:

$ cat part.0 part.1 part.2 > bigfile.res
$ diff bigfile bigfile.res
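
The dd-based restore mentioned above could look roughly like the sketch below. It assumes the parts were created with bs=1024 count=1024 as in the earlier commands; the scratch directory and the random 3000000-byte test file are illustrative. seek=N skips N output blocks, and conv=notrunc stops dd from truncating the output file, which matters if the parts are written out of order.

```shell
# Self-contained demo: create a test file and split it as shown earlier.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=bigfile bs=1000 count=3000 2>/dev/null
for i in 0 1 2; do
    dd if=bigfile of=part.$i bs=1024 count=1024 skip=$((i * 1024)) 2>/dev/null
done

# Reassemble: write each part at its original block offset in the output.
: > bigfile.res
for i in 0 1 2; do
    dd if=part.$i of=bigfile.res bs=1024 seek=$((i * 1024)) conv=notrunc 2>/dev/null
done

cmp bigfile bigfile.res && echo "restored file matches"
```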

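Depending on your needs, rather than using seq with a fixed count, you may want to calculate the actual size of the large file and do everything in a shell script.

Answer 2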
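
A minimal sketch of such a script, using only applets from the list in the question (wc, dd, mktemp) plus shell arithmetic. The function name split_file, its arguments, and the demo file are hypothetical and exist only for illustration. It reads the file size with wc -c and creates exactly as many parts as are needed, so no empty files are left behind.

```shell
#!/bin/sh
# split_file INPUT PREFIX [CHUNK_KB]  (hypothetical helper, not a BusyBox applet)
# Writes PREFIX.0, PREFIX.1, ... and prints the number of parts created.
split_file() {
    input=$1
    prefix=$2
    chunk_kb=${3:-1024}                 # chunk size in KiB, default 1 MiB

    size=$(wc -c < "$input")            # total size in bytes
    chunk_bytes=$((chunk_kb * 1024))
    # Ceiling division: number of chunks needed to cover the whole file.
    parts=$(( (size + chunk_bytes - 1) / chunk_bytes ))

    i=0
    while [ "$i" -lt "$parts" ]; do
        dd if="$input" of="$prefix.$i" bs=1024 count="$chunk_kb" \
           skip=$((i * chunk_kb)) 2>/dev/null || return 1
        i=$((i + 1))
    done
    echo "$parts"
}

# Demo on a throwaway 3000000-byte file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=bigfile bs=1000 count=3000 2>/dev/null
n=$(split_file bigfile part 1024)
echo "created $n parts"
```

With the 3000000-byte example file this creates part.0 through part.2 only; the last part holds the remaining 902848 bytes.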

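I believe you can download a statically linked BusyBox binary for ARM-based microcontrollers: http://www.busybox.net/downloads/binaries/latest/

As far as I know, all the binaries provided there include all the utilities described here: http://www.busybox.net/downloads/BusyBox.html

So you would be able to use the split utility in a very simple way.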
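
For illustration, a sketch of what that could look like, assuming the downloaded binary really does include the split applet and that it accepts -b with a byte count (as standard split does). The scratch directory, the test file, and the part. prefix are assumptions for the example.

```shell
# Self-contained demo in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/urandom of=bigfile bs=1000 count=3000 2>/dev/null

# Split into 1048576-byte pieces named part.aa, part.ab, part.ac, ...
split -b 1048576 bigfile part.

# Reassemble; the shell expands part.* in sorted order.
cat part.* > bigfile.res
cmp bigfile bigfile.res && echo "restored file matches"
```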
