Turning unstructured records into structured records in a UNIX script

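The code described below needs to be developed on a UNIX system.

I have a file holding 20-40 GB of data in the format shown below. These are sample records.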

AUTO="text1" CAR="its mine" LORRY="commercial vehicle" CART="simple " BULLCART="same old one simple with bull" TRUCK="Multi purpose"
AUTO="text2" BUS="commercial vehicle" LORRY="its a vehicle" CART="without bull" BULLCART="with bull" TRUCK="Multi purpose"
AUTO="text3" BUS="commercial vehicle" CAR="Personal" LORRY="mini one ?" BULLCART="bull" TRUCK="Multi purpose"
AUTO="" CART="simple without bull" BULLCART="nothing spl with bull" TRUCK="Multi purpose"
AUTO="long text" BUS="commercial vehicle" CAR="jubel" BULLCART="" TRUCK="Multi purpose"
AUTO="message" CAR="others" LORRY="commercial vehicle" BULLCART="not null" TRUCK="Multi purpose"
AUTO="cleverwiz" BUS="commercial vehicle" CAR="yours" LORRY="max vehicle" CART="bull is there" TRUCK="Multi purpose"
AUTO="passengers only" BUS="commercial vehicle" CAR="ramsoh" LORRY="maintainable " CART="old one" BULLCART="simple with bull" 

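The field order is AUTO, BUS, CAR, LORRY, CART, BULLCART, TRUCK.

In the expected output, every missing field must be inserted with an empty value at its position: if CAR is missing, CAR="" is inserted as the third field; if LORRY is missing, LORRY="" is inserted as the fourth field.

In the first record, BUS is missing, so BUS="" must be inserted as the second field. The output will be: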

AUTO="text1" BUS="" CAR="its mine" LORRY="commercial vehicle" CART="simple " BULLCART="same old one simple with bull" TRUCK="Multi purpose"

In the 4th record, BUS, CAR, and LORRY are missing, so BUS="" CAR="" LORRY="" must be inserted. The output will be:

AUTO="" BUS="" CAR="" LORRY="" CART="simple without bull" BULLCART="nothing spl with bull" TRUCK="Multi purpose"

Answer 1

Some would say that handing out a ready-made solution is not a good idea, but since you asked, here you go:

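#!/bin/bash

# Field order: AUTO, BUS, CAR, LORRY, CART, BULLCART, TRUCK

# Print NAME="value" as found in the record, or NAME="" if the field is missing.
# $1 = record, $2 = field name.
# \b is a GNU sed extension; it keeps CART from matching inside BULLCART.
get_param() {
    local param
    # Quote "$1" so that spaces and characters such as ? are not expanded by the shell.
    param=$(printf '%s\n' "$1" | sed -nE 's/.*\b('"$2"'="[^"]*").*/\1/p')

    if [ -z "$param" ]; then
        param=$2'=""'
    fi
    printf '%s\n' "$param"
}

# Rebuild one record with all seven fields in the required order.
process_line() {
    local auto bus car lorry cart bullcart truck
    auto=$(get_param "$1" 'AUTO')
    bus=$(get_param "$1" 'BUS')
    car=$(get_param "$1" 'CAR')
    lorry=$(get_param "$1" 'LORRY')
    cart=$(get_param "$1" 'CART')
    bullcart=$(get_param "$1" 'BULLCART')
    truck=$(get_param "$1" 'TRUCK')
    printf '%s\n' "$auto $bus $car $lorry $cart $bullcart $truck"
}

# IFS= and -r keep surrounding whitespace and backslashes in each record intact.
while IFS= read -r LINE; do
    process_line "$LINE"
done < source.txt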
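
The loop above starts seven sed processes for every record, which is likely to be slow on a 20-40 GB file. As a rough sketch of a single-pass alternative (not part of the original answer), the same reordering could be done in one awk process. The function name normalize_records is hypothetical, and the sketch assumes that field names are uppercase letters only and that values never contain a double quote, as in the sample records.

```shell
#!/bin/sh
# Hypothetical helper: reads records on stdin, writes normalized records on stdout.
# Assumes field names are uppercase letters and values contain no double quotes.
normalize_records() {
    awk '
    BEGIN { n = split("AUTO BUS CAR LORRY CART BULLCART TRUCK", order, " ") }
    {
        split("", val)    # clear the values kept from the previous record
        rest = $0
        # Peel off NAME="value" pairs one at a time, left to right.
        while (match(rest, /[A-Z]+="[^"]*"/)) {
            pair = substr(rest, RSTART, RLENGTH)
            eq = index(pair, "=")
            val[substr(pair, 1, eq - 1)] = substr(pair, eq + 1)
            rest = substr(rest, RSTART + RLENGTH)
        }
        # Emit every field in the fixed order, using "" for missing ones.
        out = ""
        for (i = 1; i <= n; i++) {
            k = order[i]
            v = (k in val) ? val[k] : "\"\""
            out = out (i > 1 ? " " : "") k "=" v
        }
        print out
    }'
}
```

With the function defined, it could be run as normalize_records < source.txt > structured.txt, where structured.txt is only an example output name. Fields that are not in the fixed list are dropped, which matches the behavior of the script above.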
