Parse a fixed-width file and look up the parsed values in an Oracle DB

Question

Try something like the script below as a starting point.

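BTW, you didn't say what kind of UNIX you're running, what shell you're using, or what version of awk you have. I'm going to assume you're running Linux (or some other system with GNU coreutils installed), bash, and GNU awk, plus the Info-ZIP version of unzip. If these assumptions are wrong, you'll have to adapt the bash+awk script to suit your system.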

$ cat process-zip-files.sh 
#!/bin/bash

# create a temporary directory
# mktemp is in GNU coreutils
td="$(mktemp -d)"

for zf in *.zip; do
  # unzip options: -qq = very quiet, -o = don't prompt for overwrite,
  # -d = directory to unzip files into.
  unzip -qq -o -d "$td" "$zf" '*.[0-9][0-9][0-9]'
done

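# Process each unzipped text file individually
# This awk script requires GNU awk.  The ENDFILE pattern
# is a GNU extension to awk.
# substr($0,20,15) extracts the 15-character field in columns 20-34.
awk '/^999 / { data[n++] = substr($0,20,15) };

     ENDFILE {
       out="";
       # walk the indices in order; "for (i in data)" order is unspecified
       for (i = 0; i < n; i++) { out = out data[i] "," };
       sub(/,$/,"",out);
       print "(" out ")";
       delete data;
       n = 0;
     }' "$td/"*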

# delete the temporary directory and everything in it
rm -rf "$td/"
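A note on the field position: the value sits in columns 20-34, which is 15 characters wide, and awk's substr() takes a 1-based start position and a length, not an end column. The small sketch below checks this on a made-up record (the record layout is an assumption based on the sample output) by comparing cut -c with substr(). The Perl scripts further down grab the same field with substr($_,19,15), because Perl offsets start at 0.

```shell
#!/bin/bash
# A made-up 999 record: "999 " plus 15 filler characters puts the
# value in columns 20-34, followed by some trailing text.
line=$(printf '999 %-15s%s%s' X 123456789012345 TRAILER)

# cut -c takes an inclusive, 1-based column range
by_cut=$(printf '%s\n' "$line" | cut -c20-34)

# awk substr() takes a 1-based start position and a length
by_awk=$(printf '%s\n' "$line" | awk '{ print substr($0, 20, 15) }')

# A length of 34 runs past column 34 and picks up the trailing text
too_long=$(printf '%s\n' "$line" | awk '{ print substr($0, 20, 34) }')

printf '%s\n' "$by_cut" "$by_awk" "$too_long"
# 123456789012345
# 123456789012345
# 123456789012345TRAILER
```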

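Note that this script assumes there is at least one 999 record in each text file. If that's not the case, you'll have to check in the ENDFILE block that the data array has at least one element; otherwise the script will output a line containing just () for those text files.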
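One way to do that check is to print only when at least one value was collected for the current file. The sketch below is a variant rather than a patch to the script above: it notices the start of each new file with FNR and NR (plain POSIX awk) instead of GNU awk's ENDFILE, so it should also run on non-GNU awks. The function name extract_ids is hypothetical, and the field position (columns 20-34) is assumed from the sample data.

```shell
#!/bin/bash
# Print "(id1,id2,...)" for each input file, skipping files that
# contain no "999 " records.
extract_ids() {
  awk '
    # print the values collected for the previous file, if there were any
    function flush(   out, j) {
      if (n > 0) {
        out = data[0]
        for (j = 1; j < n; j++) out = out "," data[j]
        print "(" out ")"
      }
      n = 0
    }
    FNR == 1 && NR > 1 { flush() }       # a new file has started
    /^999 / { data[n++] = substr($0, 20, 15) }
    END { flush() }                      # the last file
  ' "$@"
}
```

Usage would be extract_ids "$td/"* in place of the awk command in the script above.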

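This is a bare-bones script. It does no error checking or handling, and makes no attempt to deal with exceptions or complicated situations.

Sample output (after creating zip files containing your sample text):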
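One cheap improvement, if the script is extended, is to guarantee that the temporary directory is removed even when something fails partway through, using a trap on EXIT. This is a minimal sketch of that common pattern; with_tempdir and the TD variable are hypothetical names, not part of the script above.

```shell
#!/bin/bash
# Run a command with TD pointing at a fresh temporary directory, and
# delete that directory when the command finishes, whether it succeeds
# or fails.  The body is a ( ... ) subshell, so the EXIT trap fires as
# soon as the function returns.
with_tempdir() (
  TD="$(mktemp -d)" || exit 1
  export TD
  trap 'rm -rf "$TD"' EXIT
  "$@"
)

# The directory exists while the command runs and is gone afterwards.
dir=$(with_tempdir sh -c 'touch "$TD/marker"; echo "$TD"')
[ -e "$dir" ] && echo "still there" || echo "cleaned up"
```

Putting trap 'rm -rf "$td"' EXIT near the top of a script, right after mktemp, gives the same guarantee without a helper function.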

$ ./process-zip-files.sh 
(123456789012345,234567890123456)

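A better script would be written in perl. It would open a connection to the Oracle database (using the DBI database interface and DBD::Oracle).

Then it would open today's batch of zip files (using the Archive::Zip module) and process each .NNN text file inside them. Using the data from each text file, it would construct an SQL statement and send it to the Oracle DB.

That statement could be a search, insert, update, delete: anything you can normally do with SQL.

python would be another good implementation language. It also has library modules for working with zip files and with databases like Oracle.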

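There are ways to interact with Oracle and other SQL databases directly from bash or other shells... but the amount of quoting and escaping you have to do makes it an annoyingly tedious and frustrating programming exercise, prone to irritating little bugs. It's much easier to just learn the minimal subset of perl (or python) needed to get the job done... and once you've done it once, similar tasks will be easy in future.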
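To give a sense of the quoting involved, here is a minimal bash sketch that does nothing more than build the query text from a list of values. The function name build_in_query is hypothetical, and mytable/myfield are the same placeholder names used in the Perl example further down. Actually sending the result to Oracle through a command-line client would add yet another layer of quoting on top of this.

```shell
#!/bin/bash
# Hypothetical helper: build an IN (...) query from its arguments.
# "mytable" and "myfield" are placeholder names.
build_in_query() {
  if [ "$#" -eq 0 ]; then
    echo "build_in_query: no values given" >&2   # "in ()" is invalid SQL
    return 1
  fi
  local q="'" list="" v
  for v in "$@"; do
    v=${v//$q/$q$q}              # SQL escapes ' inside a literal as ''
    list+="${list:+,}$q$v$q"     # comma before every value but the first
  done
  printf 'select * from mytable where myfield in (%s)\n' "$list"
}

build_in_query 123456789012345 234567890123456
# prints: select * from mytable where myfield in ('123456789012345','234567890123456')
```

With DBI placeholders, as in the second Perl example, none of this escaping has to be written by hand.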

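It's nearly 2am now and I need to sleep, so I don't have time to write even a basic version of such a script at the moment. In any case, your question doesn't really give any details about what you want to do with the data in Oracle.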

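PS: judging from the two shell code samples you posted in your question, you seem to like doing things with one-liners. One-liners are useful, but they aren't always the best solution to a problem... and quite often they're a terrible one. Don't be afraid to write scripts in languages like awk or perl. Use them, either standalone or as part of a shell script; that's how unix and linux are meant to be used.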

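Trying to do data processing in the shell, with long and complicated pipelines of many commands, is almost certainly more difficult than writing a custom tool in awk or perl... and the shell pipeline is likely to be more fragile, as well as almost certainly orders of magnitude slower. For small data files and simple processing tasks, performance may not matter. For large amounts of data and/or complex processing, it can mean the difference between a run time of seconds and one of hours.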
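To make the comparison concrete, here are two ways to pull the 999 values out of a made-up record set: a three-stage pipeline and a single awk program. Both produce the same line here; the difference shows up when the logic grows (per-file grouping, quoting, empty-file checks), because that is easier to add to the awk program than to bolt on as extra pipeline stages. The input layout is assumed from the sample data.

```shell
#!/bin/bash
# Made-up input: two 999 records and one header line.
input=$(printf '999 %-15s%s\n' A 123456789012345 B 234567890123456; echo 'HDR header line')

# Three processes: select the records, cut out columns 20-34, join with commas
as_pipeline=$(printf '%s\n' "$input" | grep '^999 ' | cut -c20-34 | paste -s -d, -)

# One awk process doing the same selection, extraction and joining
as_awk=$(printf '%s\n' "$input" | awk '
  /^999 / { out = out (n++ ? "," : "") substr($0, 20, 15) }
  END     { print out }')

printf '%s\n' "$as_pipeline" "$as_awk"
# 123456789012345,234567890123456
# 123456789012345,234567890123456
```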

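Below are a couple of examples of doing the same thing in Perl with the Archive::Zip and/or DBI & DBD modules. These Perl scripts don't need a temporary directory to unzip the .zip files into, because they read the matching files directly from the .zip archives.

The first example just replicates the functionality of the bash + awk script: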

$ cat process-zip-files.pl
#!/usr/bin/perl

use strict;
use warnings;

use Archive::Zip;

# First arg is the source directory. defaults to ./
my $dir = shift // '.';

foreach my $zf (glob "$dir/*.zip") {
  # open the zip file
  my $zip = Archive::Zip->new($zf) or die "can't read $zf\n";

  # get the list of files ending with a dot and at least one digit
  my @txt = grep { /\.\d+$/ } $zip->memberNames();

  # iterate over each matching filename
  foreach my $f (@txt) {
    my @data = ();

    # Iterate over each line of the file ($f).  This code is fine
    # for smallish files, but it would be better to use the
    # Archive::Zip::MemberRead module for large files to avoid
    # reading the entire file into memory at once.
    foreach (split /\n/, $zip->contents($f)) {
      if (m/^999\s/) {
        # perl substr offsets start at 0, not 1.  So the
        # next line grabs 15 chars, starting from char 20
        # and adds the string to the @data array.
        push @data, substr($_,19,15);
      }
    };

    # Now do something with the data from this file
    @data = map { "'$_'" } @data; # quote each element of @data
    print "(", join(",",@data), ")\n";

  }  # end of current member file
} # end of current zipfile

$ ./process-zip-files.pl 
('123456789012345','234567890123456')

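Instead of just printing the data, you can interact with the database directly. I can only give vague and fairly useless examples here, because I don't know what your database table structure looks like, or what you actually want to do with the data extracted from the .NNN files.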

$ cat process-zip-files-sql.pl
#!/usr/bin/perl

use strict;
use warnings;

use Archive::Zip;
use Archive::Zip::MemberRead;
use DBI;

# First arg is the source directory. defaults to ./
my $dir = shift // '.';

# I don't have Oracle, and I couldn't be bothered setting up
# a database, table, and login account on mysql or postgres
# for this example, so I'll use SQLite.  Other databases are
# just as easy to connect to, but the connect() call will
# require other details like hostname, port, login, password,
# etc.
#
# Set up a database handle ($dbh) to the sqlite db called
# "notoracle.sqlite3":

my $dbname='notoracle.sqlite3';
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", "", "", { RaiseError => 1 })
  or die $DBI::errstr;

foreach my $zf (glob "$dir/*.zip") {
  my $zip = Archive::Zip->new($zf) or die "can't read $zf\n";
  my @txt = grep { /\.\d+$/ } $zip->memberNames();

  foreach my $f (@txt) {
    my @data = ();

    # This example uses Archive::Zip::MemberRead, just to show
    # how to use it.
    my $fh  = Archive::Zip::MemberRead->new($zip, $f);
    while (defined(my $l = $fh->getline())) {
      if ($l =~ m/^999\s/) {
        push @data, substr($l,19,15);
      }
    };
    $fh->close();

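    # Example 1: print matching records.  With IN you would need one
    # placeholder per element, so this example interpolates the values
    # instead; each one must be quoted, which $dbh->quote() does
    # (including escaping embedded quotes).  "in ()" is a syntax error,
    # so skip files with no matching records.

    if (@data) {
      my $values = join(",", map { $dbh->quote($_) } @data);
      my $sql = "select * from mytable where myfield in ($values)";
      # selectrow_array() would only return the first row
      foreach my $row (@{ $dbh->selectall_arrayref($sql) }) {
        print join(",", map { $_ // '' } @$row), "\n";
      }
    }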

    # Example 2 - using a placeholder ?, one element of @data
    # at a time.  There is no need to quote each element of
    # the @data array because placeholders handle quoting
    # automagically if and when required, depending on the data
    # type of the database field.

    my $sth = $dbh->prepare('select * from mytable where myfield = ?');
    foreach my $d (@data) {
      $sth->execute($d);    # bind $d to the ? placeholder and run the query
      while (my @row = $sth->fetchrow_array()) {
        print join(",", map { $_ // '' } @row), "\n";
      }
    }
  }  # end of current member file
} # end of current zipfile

$dbh->disconnect();

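There's no sample output for this example because it's a non-functional, conceptual example. For the same reason, this code is untested and may contain minor bugs. It does compile OK with perl -w -c process-zip-files-sql.pl, but there's no guarantee that it actually works or does anything useful.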

Answer 1