Converting data from an LDIF file to CSV

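I need to extract selected attributes from the blocks of text that sit between blank lines in an LDIF (text) file and write them to a comma-separated CSV file, as in the following example.

Example:

LDIF file (input):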

<Blank Line>
AA: User11_Value1
BB: User11_Value2
CC: User11_Value3
DD: User11 Space Value4
<Blank Line>
AA: User22_Value1
BB: User22_Value2
CC: User22_Value3
DD: User22 Space Value4
<Blank Line>

Converted to CSV format (output):

AA,BB,DD
User11_Value1,User11_Value2,User11 Space Value4
User22_Value1,User22_Value2,User22 Space Value4

Answer 1

With Miller (http://johnkerl.org/miller/doc) and sed it is very short and simple:

sed 's/://g' input.txt | mlr --x2c cut -x -f CC

gives you

AA,BB,DD
User11_Value1,User11_Value2,User11 Space Value4
User22_Value1,User22_Value2,User22 Space Value4

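With sed I strip the colons to get one of Miller's native input formats (XTAB); --x2c then converts XTAB to CSV, and finally cut -x -f CC drops the CC field. Note that s/://g removes every colon on a line, including any that appear inside a value.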
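
The same block-per-record idea can also be sketched without Miller. The Python below is an illustration only and not part of the sed/mlr approach: it splits the input on blank lines, splits each line at the first colon, and writes the chosen columns with the standard csv module. The function name ldif_blocks_to_csv and its parameters are assumptions made for this example, and the sketch ignores Base64 values, folded lines and repeated attributes.

```python
import csv
import io


def ldif_blocks_to_csv(text, columns):
    """Convert blank-line-separated 'key: value' blocks to CSV text.

    Hypothetical helper for illustration; it handles only the simple
    single-line, single-valued case shown in the question.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)

    record = {}
    # The extra empty string guarantees that the last block is flushed.
    for line in text.splitlines() + [""]:
        if line.strip() == "":
            if record:
                writer.writerow([record.get(col, "") for col in columns])
                record = {}
            continue
        # Split at the first colon only, so colons inside values survive.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return out.getvalue()


sample = """
AA: User11_Value1
BB: User11_Value2
CC: User11_Value3
DD: User11 Space Value4

AA: User22_Value1
BB: User22_Value2
CC: User22_Value3
DD: User22 Space Value4
"""

print(ldif_blocks_to_csv(sample, ["AA", "BB", "DD"]), end="")
```

Selecting the columns to keep, rather than naming the one to drop, means attributes that were not asked for never reach the output.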

Answer 2

Here is a script that reads LDIF from STDIN and outputs CSV:

#!/bin/bash
#
# Converts LDIF data to CSV.
# Doesn't handle comments very well. Use -LLL with ldapsearch to remove them.
#
# 2010-03-07
# [email protected]
#

# Show usage if we don't have the right params
if [ "$1" == "" ]; then
    echo ""
    echo "Usage: cat ldif.txt | $0 <attributes> [...]"
    echo "Where <attributes> contains a list of space-separated attributes to include in the CSV. LDIF data is read from stdin."
    echo ""
    exit 99
fi

ATTRS="$*"

c=0
# -r keeps backslashes in values from being interpreted by read
while read -r line; do

    # Skip LDIF comments
    [ "${line:0:1}" == "#" ] && continue

    # If this line is blank then it's the end of this record, and the beginning
    # of a new one.
    if [ "$line" == "" ]; then

        output=""

        # Output the CSV record
        for i in $ATTRS; do
            eval data=\$RECORD_${c}_${i}
            output=${output}\"${data}\",
            unset RECORD_${c}_${i}
        done

        # Remove trailing ',' and echo the output
        # (quoted so that runs of spaces inside values are preserved)
        output=${output%,}
        echo "$output"

        # Increase the counter
        c=$(($c+1))
    fi

    # Separate attribute name/value at the colon (LDIF format)
    attr=${line%%:*}
    value=${line#*: }

    # Save all the attributes in variables for now (ie. buffer), because the data
    # isn't necessarily in a set order.
    for i in $ATTRS; do
        if [ "$attr" == "$i" ]; then
            eval RECORD_${c}_${attr}=\"$value\"
        fi
    done

done

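Answer 3

I find that simple scripts like these have some serious flaws:

  • They do not correctly handle Base64-encoded data, which LDIF uses for non-ASCII characters and octet strings
  • They do not correctly handle folded (wrapped) lines
  • The LDAP data model has multi-valued attributes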

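If you don't want to sort this out yourself after reading RFC 2849, I recommend implementing a short Python script using python-ldap's ldif submodule and the built-in csv module.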
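
To make the three issues listed above concrete, here is a minimal standard-library sketch. It is not the python-ldap approach recommended above, and the helper names unfold_lines and parse_entries are made up for this example. It covers only folded lines (a continuation line starts with a single space), Base64 values marked with "::", and repeated attributes. It assumes Base64 values are UTF-8 text and ignores other RFC 2849 features such as URL values (":<"), change records and the version line.

```python
import base64


def unfold_lines(text):
    """Join folded lines: a line starting with one space continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            # Drop the single leading space and append to the previous line.
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_entries(text):
    """Return a list of entries, each a dict mapping attribute -> list of values."""
    entries, current = [], {}
    # The extra empty string flushes the final entry.
    for line in unfold_lines(text) + [""]:
        if line == "":
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # comment line
        attr, _, rest = line.partition(":")
        if rest.startswith(":"):
            # "attr:: <base64>" marks a Base64-encoded value.
            # Assumption: the decoded bytes are UTF-8 text, not binary data.
            value = base64.b64decode(rest[1:].strip()).decode("utf-8")
        else:
            value = rest.lstrip(" ")
        # Keep every value, because attributes may repeat within an entry.
        current.setdefault(attr, []).append(value)
    return entries


sample = (
    "dn: cn=user1,dc=example,dc=com\n"
    "cn:: SsO8cmdlbg==\n"
    "description: a long value that is fol\n"
    " ded onto a second line\n"
    "mail: user1@example.com\n"
    "mail: u1@example.com\n"
)

for entry in parse_entries(sample):
    # Join repeated values with ';' so each attribute still fits one CSV cell.
    print({attr: ";".join(values) for attr, values in entry.items()})
```

Joining repeated values with a separator is only one possible choice; emitting one CSV row per value, or keeping only the first value, may suit other uses better.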
