Extracting column names from table DDL

2024-6-10

text-processing

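I have a text file containing a list of table descriptions. I need to extract each table's columns and write them, together with the table name, to another CSV file.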

Example:

describe test_table
+-----------+------------+
| col_name  | data_type  |
+-----------+------------+
| Name      | string     |
| Age       | string     |
+-----------+------------+

I need to create a CSV file containing the following details:

test_table,Name,Age

Could you suggest how to do this?

Answer 1

Using any awk in any shell on every Unix box:

$ cat tst.awk
$1 == "describe" {
    out = $2
    next
}
/^[+]/ {
    mod = (++cnt % 3)
    if ( mod == 0 ) {
        print out
    }
    next
}
mod == 2 {
    out = out "," $2
}

$ awk -f tst.awk file
test_table,Name,Age

Answer 2

Here is an example of one way to do this in perl:

$ cat extract-column-names.pl
#!/usr/bin/perl -l

while(<>) {
  # Is the current line a "describe" line or are we at the End Of File?
  if (m/describe\s+(.*)/i || eof) {
    # Do we already have a table name and column names?
    if ($table && @columns) {
      print join(",", $table, @columns);
      # clear current @columns array
      @columns=();
    };
    # extract table name
    $table = $1;
    next;
  };

  # skip header lines, ruler lines, and empty lines
  next if (m/col_name|-\+-|^\s*$/);

  # extract column name with regex capture group
  if (m/^\|\s+(\S+)\s+\|/) { push @columns, $1 };
}

Sample input, with multiple table descriptions:

$ cat table.txt
describe test_table
+-----------+------------+
| col_name  | data_type  |
+-----------+------------+
| Name      | string     |
| Age       | string     |
+-----------+------------+

describe test_table2
+------------+------------+
| col_name   | data_type  |
+------------+------------+
| FirstName  | string     |
| MiddleName | string     |
| LastName   | string     |
+------------+------------+

Sample run:

$ ./extract-column-names.pl table.txt
test_table,Name,Age
test_table2,FirstName,MiddleName,LastName

By the way, this script can also process standard input (e.g. cat table.txt | ./extract-column-names.pl) and multiple filename arguments (e.g. ./extract-column-names.pl table1.txt table2.txt ... tableN.txt).

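Also, it would not be difficult to add the ability to extract the data_type for each column. This could be stored in a separate array (e.g. @types), or you could change the script to use a hash (with col_name as the key and data_type as the value). If you use a hash, however, remember that hashes are inherently unordered, so you would still need the @columns array to remember the order in which the columns appeared.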
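As a rough illustration of the data_type idea above, here is a minimal sketch written in awk (wrapped in a shell function) rather than perl. The function name extract_columns_with_types and the "name:type" output format are made up for this example and are not part of the original answer. It assumes the header row is literally col_name, as in the question's sample, and it appends each column to the output string as it is read, so no separate ordering array is needed.

```shell
# Sketch: print "table,col:type,..." for each "describe" block read from stdin.
# The function name and the "name:type" output format are illustrative choices.
extract_columns_with_types() {
    awk -F'|' '
        # Strip leading and trailing blanks from a table cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # Print the row collected so far, if any, then reset it.
        function emit() { if (out != "") print out; out = "" }

        # A "describe <table>" line starts a new output row.
        $1 ~ /^describe[ \t]/ {
            emit()
            out = trim(substr($1, 10))   # table name follows "describe "
            next
        }

        out == "" || !/^[|]/   { next }  # skip ruler lines, blanks, stray text
        trim($2) == "col_name" { next }  # skip the header row (assumed name)

        # Data row: $2 is the column name, $3 is its data type.
        { out = out "," trim($2) ":" trim($3) }

        END { emit() }
    '
}
```

Running extract_columns_with_types < table.txt on the sample input above should print test_table,Name:string,Age:string followed by test_table2,FirstName:string,MiddleName:string,LastName:string. Note that a type containing a comma, such as decimal(12,4), would need quoting before the result is valid CSV.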

One-liner version:

$ perl -lne 'if (m/describe\s+(.*)/i || eof) {if ($table && @columns) {print join(",", $table, @columns);@columns=()}$table = $1;next};next if (m/col_name|-\+-|^\s*$/);if (m/^\|\s+(\S+)\s+\|/) {push @columns, $1};' table.txt 
test_table,Name,Age
test_table2,FirstName,MiddleName,LastName

Answer 3

Preparing the test environment:

# Get all the tables
mysql -S /var/run/mysqld/mysqld.sock -D mysql -e 'SHOW TABLES;' > all_tables

# Get all the columns for each table
# (-I {} replaces the deprecated GNU xargs -i option)
xargs -a all_tables -I {} -- /bin/sh -c '
    printf "describe %s\\n" "$1"
    mysql -S /var/run/mysqld/mysqld.sock -D mysql -te "DESC $1"
' _Z_ {} >> tables

Here is an excerpt; the file looks like your sample data:

describe column_stats
+---------------+-----------------------------------------+------+-----+---------+-------+
| Field         | Type                                    | Null | Key | Default | Extra |
+---------------+-----------------------------------------+------+-----+---------+-------+
| db_name       | varchar(64)                             | NO   | PRI | NULL    |       |
| table_name    | varchar(64)                             | NO   | PRI | NULL    |       |
| column_name   | varchar(64)                             | NO   | PRI | NULL    |       |
| min_value     | varbinary(255)                          | YES  |     | NULL    |       |
| max_value     | varbinary(255)                          | YES  |     | NULL    |       |
| nulls_ratio   | decimal(12,4)                           | YES  |     | NULL    |       |
| avg_length    | decimal(12,4)                           | YES  |     | NULL    |       |
| avg_frequency | decimal(12,4)                           | YES  |     | NULL    |       |
| hist_size     | tinyint(3) unsigned                     | YES  |     | NULL    |       |
| hist_type     | enum('SINGLE_PREC_HB','DOUBLE_PREC_HB') | YES  |     | NULL    |       |
| histogram     | varbinary(255)                          | YES  |     | NULL    |       |
+---------------+-----------------------------------------+------+-----+---------+-------+
describe columns_priv
+-------------+----------------------------------------------+------+-----+---------------------+-------------------------------+
| Field       | Type                                         | Null | Key | Default             | Extra                         |
+-------------+----------------------------------------------+------+-----+---------------------+-------------------------------+
| Host        | char(60)                                     | NO   | PRI |                     |                               |
| Db          | char(64)                                     | NO   | PRI |                     |                               |
| User        | char(80)                                     | NO   | PRI |                     |                               |
| Table_name  | char(64)                                     | NO   | PRI |                     |                               |
| Column_name | char(64)                                     | NO   | PRI |                     |                               |
| Timestamp   | timestamp                                    | NO   |     | current_timestamp() | on update current_timestamp() |
| Column_priv | set('Select','Insert','Update','References') | NO   |     |                     |                               |
+-------------+----------------------------------------------+------+-----+---------------------+-------------------------------+

Parsing with awk:

awk -F'|' -v OFS=, '
    match($0, /^describe /) {
        tbl = substr($0, RSTART+RLENGTH)
        c = 3
        next
    } 

    /^[+]/ && c < 0 {print tbl}

    c-- <= 0 {
        gsub(/ */, "", $2)
        tbl = tbl OFS $2
    } 
' tables
