How to view a log file larger than 10GB?

Question 1

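This script splits a text file into a given number of sections, without splitting any text line across sections. It can be used where there is only enough free space to hold one section at a time. It works by copying sections of the source file starting from the end, then truncating the source file to free the space. So if you have a 1.8GB file and 0.5GB of free space, you need at least 4 sections (or more if you want smaller output files). The last section to be processed (section 1, the start of the file) is simply renamed, because there is no need to copy it. After splitting, the source file no longer exists (there would be no room for it anyway).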
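The section count in that example is a ceiling division. A minimal sketch of the arithmetic, assuming equal-sized sections and ignoring the few bytes by which line-boundary adjustment can enlarge a section (`min_sections` is a hypothetical helper, not part of the script):

```shell
# Smallest number of sections such that one section fits in the free space.
# Both arguments must be integers in the same unit (for example MB).
min_sections() {   # usage: min_sections FILE_SIZE FREE_SPACE
    local size=$1 free=$2
    echo $(( (size + free - 1) / free ))   # integer ceiling of size / free
}

# Example: a 1.8GB file with 0.5GB free, expressed in MB.
#   min_sections 1800 500     # prints 4
```

If the result leaves almost no headroom, it is safer to ask for one or two sections more.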

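The main part is an awk script (wrapped in Bash), which only works out the section sizes (including adjusting each section boundary to coincide with a newline). It uses the system() function to invoke dd, truncate and mv to do all the heavy lifting.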
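One step of that process (find the next newline after a target offset, copy the tail, truncate the source) might look like the sketch below. It assumes GNU dd and truncate, and `cut_tail` is a hypothetical name used only for illustration; the real script does the same work from awk.

```shell
# Move everything after the first newline at or beyond OFFSET into OUT,
# then shrink FILE so that the moved bytes are released.
cut_tail() {   # usage: cut_tail FILE OFFSET OUT
    local file=$1 offset=$2 out=$3 len cut
    # Probe up to 8192 bytes at OFFSET and count the bytes up to and
    # including the next newline (lines longer than the probe are not handled).
    len=$(dd if="$file" bs=8192 count=1 skip="$offset" iflag=skip_bytes status=none |
          head -n 1 | wc -c)
    cut=$(( offset + len ))
    # Copy the tail of the file, starting on a line boundary.
    dd if="$file" of="$out" bs=1M skip="$cut" iflag=skip_bytes status=none
    # Drop the copied bytes from the source to free the space.
    truncate -s "$cut" "$file"
}
```

Calling this repeatedly with decreasing offsets, and finally renaming what is left of the source, gives the overall behaviour described above.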

$ bash --version
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
$ awk --version
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
$ dd --version
dd (coreutils) 8.28
$ truncate --version
truncate (GNU coreutils) 8.28

The script takes one to four arguments:

./splitBig Source nSect Dest Debug

Source: is the filename of the file to be split into sections.

nSect: is the number of sections required (default 10).

Dest: is a printf() format used to generate the names of the sections.
Default is Source.%.3d, which appends serial numbers (from .001 up) to the source name.
Section numbers correspond to the original order of the source file.

Debug: generates some diagnostics (default is none).
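To see what a given Dest format expands to before running the split, the format can be tried with the shell's own printf. This is only an illustration (`section_name` is a hypothetical helper); the script itself expands the format with awk's sprintf, which treats `%d` conversions the same way.

```shell
# Expand a Dest-style printf format for section number N.
section_name() {   # usage: section_name FORMAT N
    # The format string is deliberately taken from the argument.
    printf "$1" "$2"
}

# Examples:
#   section_name 'leipzig1M.txt.%.3d' 1    # leipzig1M.txt.001
#   section_name 'Tuesday.%1d.log' 3       # Tuesday.3.log
```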

Test results:

$ mkdir TestDir
$ cd TestDir
$ 
$ cp /home/paul/leipzig1M.txt ./
$ ls -s -l
total 126608
126608 -rw-rw-r-- 1 paul paul 129644797 Aug 27 15:54 leipzig1M.txt
$ 
$ time ../splitBig leipzig1M.txt 5

real    0m0.780s
user    0m0.045s
sys 0m0.727s
$ ls -s -l
total 126620
25324 -rw-rw-r-- 1 paul paul 25928991 Aug 27 15:56 leipzig1M.txt.001
25324 -rw-rw-r-- 1 paul paul 25929019 Aug 27 15:56 leipzig1M.txt.002
25324 -rw-rw-r-- 1 paul paul 25928954 Aug 27 15:56 leipzig1M.txt.003
25324 -rw-rw-r-- 1 paul paul 25928977 Aug 27 15:56 leipzig1M.txt.004
25324 -rw-rw-r-- 1 paul paul 25928856 Aug 27 15:56 leipzig1M.txt.005
$ 
$ rm lei*
$ cp /home/paul/leipzig1M.txt ./
$ ls -s -l
total 126608
126608 -rw-rw-r-- 1 paul paul 129644797 Aug 27 15:57 leipzig1M.txt
$ time ../splitBig leipzig1M.txt 3 "Tuesday.%1d.log" 1
.... Section   3 ....
#.. findNl: dd bs=8192 count=1 if="leipzig1M.txt" skip=86429864 iflag=skip_bytes status=none
#.. system: dd bs=128M if="leipzig1M.txt" skip=86430023 iflag=skip_bytes of="Tuesday.3.log" status=none
#.. system: truncate -s 86430023 "leipzig1M.txt"
.... Section   2 ....
#.. findNl: dd bs=8192 count=1 if="leipzig1M.txt" skip=43214932 iflag=skip_bytes status=none
#.. system: dd bs=128M if="leipzig1M.txt" skip=43214997 iflag=skip_bytes of="Tuesday.2.log" status=none
#.. system: truncate -s 43214997 "leipzig1M.txt"
.... Section   1 ....
#.. system: mv "leipzig1M.txt" "Tuesday.1.log"

real    0m0.628s
user    0m0.025s
sys 0m0.591s
$ ls -s -l
total 126612
42204 -rw-rw-r-- 1 paul paul 43214997 Aug 27 15:58 Tuesday.1.log
42204 -rw-rw-r-- 1 paul paul 43215026 Aug 27 15:58 Tuesday.2.log
42204 -rw-rw-r-- 1 paul paul 43214774 Aug 27 15:58 Tuesday.3.log
$
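A quick sanity check on the listings above is that the section sizes add up to the original 129644797 bytes: 43214997 + 43215026 + 43214774 = 129644797, and the five-way split sums to the same figure. A small sketch for checking this on real output files (`total_bytes` is a hypothetical helper):

```shell
# Total size in bytes of the given section files.
total_bytes() {   # usage: total_bytes FILE...
    cat -- "$@" | wc -c
}

# Example (compare against the size of the original file):
#   total_bytes Tuesday.*.log
```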

The script:

#! /bin/bash --

export LC_ALL="C"   # exported so that awk also runs in the C locale and length() counts bytes

splitFile () {  #:: (inFile, Pieces, outFmt, Debug)

    local inFile="${1}" Pieces="${2}" outFmt="${3}" Debug="${4}"

    local Awk='
BEGIN {
    SQ = "\042"; szLine = 8192; szFile = "128M";
    fmtLine = "dd bs=%d count=1 if=%s skip=%d iflag=skip_bytes status=none";
    fmtFile = "dd bs=%s if=%s skip=%d iflag=skip_bytes of=%s status=none";
    fmtClip = "truncate -s %d %s";
    fmtName = "mv %s %s";
}

function findNl (fIn, Seek, Local, cmd, lth, txt) {

    cmd = sprintf (fmtLine, szLine, SQ fIn SQ, Seek);
    if (Db) printf ("#.. findNl: %s\n", cmd);
    cmd | getline txt; close (cmd);
    lth = length (txt);
    if (lth == szLine) printf ("#### Line at %d will be split\n", Seek);
    return ((lth == szLine) ? Seek : Seek + lth + 1);
}

function Split (fIn, Size, Pieces, outFmt, Local, n, seek, cmd) {

    for (n = Pieces; n > 1; n--) {
        if (Db) printf (".... Section %3d ....\n", n);
        seek = int (Size * ((n - 1) / Pieces));
        seek = findNl( fIn, seek);
        cmd = sprintf (fmtFile, szFile, SQ fIn SQ, seek,
            SQ sprintf (outFmt, n) SQ);
        if (Db) printf ("#.. system: %s\n", cmd);
        system (cmd);
        cmd = sprintf (fmtClip, seek, SQ fIn SQ);
        if (Db) printf ("#.. system: %s\n", cmd);
        system (cmd);
    }
    if (Db) printf (".... Section %3d ....\n", n);
    cmd = sprintf (fmtName, SQ fIn SQ, SQ sprintf (outFmt, n) SQ);
    if (Db) printf ("#.. system: %s\n", cmd);
    system (cmd);
}

{ Split( inFile, $1, Pieces, outFmt); }
'
    stat -L -c "%s" "${inFile}" | awk -v inFile="${inFile}" \
        -v Pieces="${Pieces}" -v outFmt="${outFmt}" \
        -v Db="${Debug}" -f <( printf '%s' "${Awk}" )
}

#### Script body starts here.

    splitFile "${1}" "${2:-10}" "${3:-${1}.%.3d}" "${4}"

Question 2

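It is not at all clear to me what you are trying to achieve. As far as I can tell, the question you are asking is the one in the title, "How to view a log file larger than 10GB?", and the body of the question contains only your own thoughts and ideas about what might work.

So, answering the only question I can find, one option is to use a pager such as less: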

less 10GBlogfile

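The documentation (man less) lists the keys you can use, or once you have started it you can press h for help, which shows a list of keys and their associated actions. For starters, G goes to the last line, the cursor keys (including PageUp and PageDown) move around, / searches for a regular expression, n/N searches for the next/previous match, and q quits the pager.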
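For a file this size it can also help to start at the end, or to page only the tail. A small sketch, in which `peek_tail` is a hypothetical helper and 10GBlogfile is the placeholder name used above:

```shell
# Print the last LINES lines of a large file (default 1000).
# tail works backwards from the end of a regular file, so it does not
# have to read the whole log first.
peek_tail() {   # usage: peek_tail FILE [LINES]
    tail -n "${2:-1000}" -- "$1"
}

# Interactive use (not run here):
#   less -n +G 10GBlogfile            # -n: no line-number bookkeeping, +G: open at the end
#   peek_tail 10GBlogfile 5000 | less
```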

Question 3

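I know you have already tried split -n 20, but have you considered split -n 20 --filter 'grep <whatever> or something'? This splits the original file into chunks and pipes each chunk separately to whatever command you want.

Splitting should not be very expensive, especially if you use split --bytes=100M, since it is basically just a seek and a read/write. However, I am not sure how it handles variable-length encodings such as UTF-8. If you know the data is ASCII, it is quite safe. Otherwise, you would be better off with something like split --line-bytes=<size>, but that has to parse much more of the data, which could be expensive.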
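As a concrete sketch of the --filter idea, the function below counts matching lines chunk by chunk and adds up the results. It assumes GNU split; `count_in_chunks` and the PATTERN variable are hypothetical names, and `-n l/N` is used instead of plain `-n N` so that no line is cut in half.

```shell
# Count lines matching PATTERN by feeding each chunk of FILE to grep.
count_in_chunks() {   # usage: count_in_chunks FILE PATTERN [CHUNKS]
    # PATTERN travels through the environment so the filter needs no extra quoting.
    # "|| true" keeps split from reporting an error when a chunk has no match,
    # because grep exits with status 1 in that case.
    PATTERN="$2" split -n "l/${3:-20}" \
        --filter='grep -c -e "$PATTERN" || true' -- "$1" |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example:
#   count_in_chunks 10GBlogfile 'ERROR' 20
```

No chunk files are written to disk, since each chunk goes straight to the filter command.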

Question 4

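You appear to be trying to extract and analyse errors from a log. There is no generic answer to this problem: how you isolate the events related to a particular pattern in a log file depends entirely on the structure of the log file and on the nature of whatever generates it.

"You don't know how long the log for a particular user is"

Is there an explicit (username) or implicit (session ID, process ID, IP address) identifier? If not, it sounds like you need one, and you will then have to make multiple passes through the log file to:

identify the error instances, timestamps and user identifiers
capture the surrounding non-error events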
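The two passes listed above can be sketched with awk reading the same file twice. The log layout is purely an assumption made for illustration (field 1 a timestamp, field 2 a session identifier such as sid=abc, field 3 the level), and `error_sessions` is a hypothetical helper; a real log will need its own field positions or patterns.

```shell
# Print every line that belongs to a session which logged at least one ERROR.
error_sessions() {   # usage: error_sessions LOGFILE
    awk '
        # Pass 1 (first read of the file): remember sessions that hit an error.
        NR == FNR { if ($3 == "ERROR") bad[$2] = 1; next }
        # Pass 2 (second read): print all events of those sessions.
        $2 in bad
    ' "$1" "$1"
}

# Example:
#   error_sessions app.log | less
```

Only the set of failing identifiers is held in memory, so this stays workable on a very large file at the cost of reading it twice.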
