Get the latest entry based on the dates in files

Could you please help with the problem below? I have tried a few different approaches but could not get the result I want.

userid.txt

user1
user2
user3
user4
user5

file1.txt

AmLogin server1 [03/Feb/2021:00:04:09 -0600] "11.11.11.11 uid=user1,ou=users,ou=company1,o=company"
AmLogin server1 [03/Feb/2021:00:05:11 -0600] "22.22.22.22 uid=user2,ou=users,ou=company1,o=company"
AmLogin server1 [03/Feb/2021:00:08:25 -0600] "33.33.33.33 uid=user3,ou=users,ou=company1,o=company"

file2.txt

AmLogin server2 [04/Feb/2021:00:01:09 -0600] "11.11.11.11 uid=user1,ou=users,ou=company1,o=company"
AmLogin server2 [04/Feb/2021:00:01:11 -0600] "22.22.22.22 uid=user2,ou=users,ou=company1,o=company"
AmLogin server2 [04/Feb/2021:00:01:25 -0600] "33.33.33.33 uid=user3,ou=users,ou=company1,o=company"
AmLogin server2 [04/Feb/2021:00:02:30 -0600] "11.11.11.11 uid=user1,ou=users,ou=company1,o=company"
AmLogin server2 [04/Feb/2021:00:05:20 -0600] "2.2.2.2 uid=user2,ou=people,dc=company2,dc=com"
AmLogin server5 [07/Feb/2021:00:02:30 -0600] "11.11.11.11 uid=user4,ou=People,ou=company1,o=company"
AmLogin server5 [08/Feb/2021:00:05:20 -0600] "2.2.2.2 uid=user5,ou=people,ou=employees,dc=company2,dc=com"

file3.txt

AmLogin server3 [05/Feb/2021:00:01:11 -0600] "22.22.22.22 uid=user2,ou=users,ou=company1,o=company"
AmLogin server3 [05/Feb/2021:00:01:25 -0600] "33.33.33.33 uid=user3,ou=users,ou=company1,o=company"
AmLogin server3 [05/Feb/2021:00:09:25 -0600] "33.33.33.33 uid=user3,ou=users,ou=company1,o=company"
AmLogin server3 [08/Dec/2020:00:11:44 -0600] "33.33.33.33 uid=user3,ou=users,ou=company1,o=company" "App1" [0002222000] [0] []
AmLogin server3 [09/Feb/2021:00:07:50 -0600] "33.33.33.33 uid=user3,ou=users,ou=company1,o=company" "App2" [0003455000] [0] []

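I want to get the most recent login of every user, with the details shown below. The files above are only a sample; we have 100 huge log files from which the data should be extracted. Note that user2 appears twice below: although the user ID is the same, these are two different users with different distinguished names (DNs). The distinguished name gives the full path of the user. For example, one user2 exists under ou=users,ou=company1,o=company and another user2 exists under ou=people,dc=company2,dc=com. user4 is under ou=People,ou=company1,o=company, and user5 is under ou=people,ou=employees,dc=company2,dc=com.

Also note that some entries (user3) have random text after the closing double quote ("App1" [0002222000] [0] []); it can be ignored.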

Expected output.txt

user1|04/Feb/2021:00:02:30|uid=user1,ou=users,ou=company1,o=company
user2|05/Feb/2021:00:01:11|uid=user2,ou=users,ou=company1,o=company
user2|04/Feb/2021:00:05:20|uid=user2,ou=people,dc=company2,dc=com
user3|09/Feb/2021:00:07:50|uid=user3,ou=users,ou=company1,o=company
user4|07/Feb/2021:00:02:30|uid=user4,ou=People,ou=company1,o=company
user5|08/Feb/2021:00:05:20|uid=user5,ou=people,ou=employees,dc=company2,dc=com

Or without the time, if that makes things easier.

Expected output.txt

user1|04/Feb/2021|uid=user1,ou=users,ou=company1,o=company
user2|05/Feb/2021|uid=user2,ou=users,ou=company1,o=company
user2|04/Feb/2021|uid=user2,ou=people,dc=company2,dc=com
user3|09/Feb/2021|uid=user3,ou=users,ou=company1,o=company
user4|07/Feb/2021|uid=user4,ou=People,ou=company1,o=company
user5|08/Feb/2021|uid=user5,ou=people,ou=employees,dc=company2,dc=com

I tried grepping all the files for each user name $i, but this takes a very long time:

grep $i file*.txt | tail -1

Answer 1

#!/bin/sh

while read -r user
  do
    for group in users people
      do
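        # GNU sed/grep assumed (-r, \S, \s, I flag). Match case-insensitively so
        # "ou=People" is found too, and stop the DN at its closing double quote.
        sed -nr "s/.*\[(\S+).*\s(uid=$user,ou=$group,[^\"]*).*/$user|\1|\2/Ip" file*.txt | sort -t\| -k2.8nr -k2.4Mr -k2.1nr -k2.13,2.20r | grep -im1 "|uid=$user,ou=$group,"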
    done
done < userid.txt
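
The sort keys in the pipeline above are character offsets into field 2, the timestamp in DD/Mon/YYYY:HH:MM:SS form: -k2.8nr is the year, -k2.4Mr the month name, -k2.1nr the day, and -k2.13,2.20r the time, all reversed so that the newest entry comes first. A minimal sketch to check this on a few rows; the sample rows and the LC_ALL=C setting are assumptions made for the illustration (-M relies on English month abbreviations):

```shell
#!/bin/sh
# Hypothetical sample rows in the user|timestamp|DN layout produced by sed.
sample='user3|05/Feb/2021:00:09:25|uid=user3,ou=users,ou=company1,o=company
user3|08/Dec/2020:00:11:44|uid=user3,ou=users,ou=company1,o=company
user3|09/Feb/2021:00:07:50|uid=user3,ou=users,ou=company1,o=company
user3|05/Feb/2021:00:01:25|uid=user3,ou=users,ou=company1,o=company'

# Newest first: year, then month name, then day, then HH:MM:SS.
newest=$(printf '%s\n' "$sample" |
  LC_ALL=C sort -t'|' -k2.8nr -k2.4Mr -k2.1nr -k2.13,2.20r |
  head -n1)

printf '%s\n' "$newest"
```

With these rows the 09/Feb/2021 entry should be printed, even though the Dec 2020 row has the alphabetically "later" month position in the file.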

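Edit:

If the log files are already sorted, iterate over each DN and take the last match with tail -n1. A first pass scans the logs for the users and generates another input file, userdn.txt, for the second pass.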

#!/bin/sh

# list of users (from logs)
grep -Fiwhf userid.txt file*.txt | grep -io 'uid=[^"]*' | sort --ignore-case -u > userdn.txt

# last login
while read -r user
  do
    # Stop the DN at its closing double quote so trailing text such as "App1" [...] is dropped.
    grep -Fiwh "$user" file*.txt | tail -n1 | sed -nr 's/.*\[(\S+).*\suid=([^,]+)([^"]*).*/\2|\1|uid=\2\3/p'
done < userdn.txt

Or via process substitution (bash only)

#!/bin/bash

while read -r user
  do
    # Stop the DN at its closing double quote so trailing text such as "App1" [...] is dropped.
    grep -Fiwh "$user" file*.txt | tail -n1 | sed -nr 's/.*\[(\S+).*\suid=([^,]+)([^"]*).*/\2|\1|uid=\2\3/p'
done < <(grep -Fiwhf userid.txt file*.txt | grep -io 'uid=[^"]*' | sort --ignore-case -u)

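If the log files are not sorted and time matters more than disk space, save time by sorting only once, in three steps:

- create userdn.txt
- sort the log files by date into one big file, bigfile.txt
- iterate over bigfile.txt for each DN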

#!/bin/sh

# list of users (from logs)
grep -Fiwhf userid.txt file*.txt | grep -io 'uid=[^"]*' | sort --ignore-case -u > userdn.txt

# merge log files
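# (the DN stops at its closing double quote, so trailing text is dropped; newest entries sort first)
grep -Fiwhf userdn.txt file*.txt | sed -nr 's/.*\[(\S+).*\suid=([^,]+)([^"]*).*/\2|\1|uid=\2\3/p' | sort -t\| -k2.8nr -k2.4Mr -k2.1nr -k2.13,2.20r > bigfile.txt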

# last login
while read -r user
  do
    grep -Fiwm1 "$user" bigfile.txt
done < userdn.txt

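This is still not a good solution, because in the end every log is processed several times, once per user. There must be another solution with awk, join, paste, uniq or something similar.

Ideally I would combine...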

tac file*.txt | grep -m1 -f userdn.txt

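...but that will not work, for two reasons:

- tac does not work as expected: it reverses each log separately and processes them serially, one after another.
- -m1 combined with -f does not search for every pattern (from the file); it stops after the first match of any pattern.

Even if it did, this would only work for logs that are already sorted :(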
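
Both limitations are easy to reproduce on throwaway files. A minimal sketch, assuming GNU tac and grep; the file names under the temporary directory are made up for the demonstration:

```shell
#!/bin/sh
# Throwaway demo files (hypothetical names and contents).
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/log1.txt"
printf 'b1\nb2\n' > "$tmp/log2.txt"
printf 'a\nb\n'   > "$tmp/patterns.txt"

# tac reverses each file on its own and keeps the file order,
# so the combined stream is not newest-first across files.
reversed=$(tac "$tmp/log1.txt" "$tmp/log2.txt" | tr '\n' ' ')
printf '%s\n' "$reversed"

# -m1 stops after the first matching line overall, not once per pattern,
# so only one of the two patterns ever produces output.
first=$(tac "$tmp/log1.txt" "$tmp/log2.txt" | grep -m1 -f "$tmp/patterns.txt")
printf '%s\n' "$first"

rm -rf "$tmp"
```

The first command lists the lines of log1.txt reversed, followed by the lines of log2.txt reversed; the second prints a single line, although both patterns occur in the input.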


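What you want is to process the log files in a single pass: read each line and write the result to another file, with the write handled by a function.

This function should:

- check whether the DN already exists
- compare the dates
- update the existing entry
- add new entries only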

#!/bin/bash

shopt -s extglob

# function compare date
compare () {
  [ -n "$2" ] || return 1

  # sort array
  for date in "$@"
    do
      echo "$date"
  done | sort -k1.8n -k1.4M -k1.1n -k1.13,1.20 | tail -n1

  return 0
}

# function write last_login.txt
update () {
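  # keep the working variables local so no value leaks from one call to the next
  local file=$1 line=$2 dn user date date1 date2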
  [ -n "$line" ] || return 1

  # string manipulation
  dn=${line#*\"}; dn=${dn%%\"*}; dn=${dn#*+([[:blank:]])}; [ -n "$dn" ] || return 1
  user=${dn%%,*}; user=${user#*=};
  date2=${line#*[}; date2=${date2%%]*}; date2=${date2%+([[:blank:]])*};

  [ -f "$file" ] && date1=$(grep -Fiwm1 "$dn" "$file" | cut -d\| -f2)
  if [ -n "$date1" ]
    then
      # DN already exists
      [ "$date1" = "$date2" ] && return 0
      date=$(compare "$date1" "$date2")
      if [ "$date" != "$date1" ]
        then
          # update existing entry
          sed -i "s;$user|$date1|$dn;$user|$date2|$dn;i" "$file"
      fi
    else
      # add new entries only
      echo "$user|$date2|$dn" >> "$file"
  fi

  return 0
}

# create last_login.txt
for file in file*.txt
  do
    [ -f "$file" ] || continue
    echo "processing $file"
    while IFS= read -r line
      do
        update last_login.txt "${line//;/,}"
    done < <(tac "$file")
done

# sort last_login.txt
echo -n "sorting... "
sort -o last_login.txt last_login.txt
echo "finished"

exit 0
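
The text above suspects that awk could do the job more directly. The following is a minimal single-pass sketch of that idea, not the answerer's script: it keeps the newest timestamp per DN in an awk array, so each log line is read once and the logs do not need to be sorted. Assumptions: the timestamp is the first [...] on the line, the DN runs from uid= to the next double quote, all entries share one time zone (the offset is ignored), DNs are compared case-insensitively, and the user list is not empty. The function name last_logins is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer).
# last_logins USERLIST LOG...: print user|timestamp|DN for the newest login per DN.
last_logins() {
  awk '
    BEGIN {
      # Map month abbreviations to two-digit numbers for a sortable key.
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
      for (i = 1; i <= n; i++) mon[m[i]] = sprintf("%02d", i)
    }
    # First file: the list of user IDs to report on.
    NR == FNR { if ($0 != "") wanted[tolower($0)] = 1; next }
    {
      # Timestamp: text between the first "[" and the following space.
      if (!match($0, /\[[^] ]+/)) next
      ts = substr($0, RSTART + 1, RLENGTH - 1)
      # DN: from "uid=" up to the closing double quote.
      if (!match($0, /uid=[^"]*/)) next
      dn = substr($0, RSTART, RLENGTH)
      user = dn; sub(/,.*/, "", user); sub(/^uid=/, "", user)
      if (!(tolower(user) in wanted)) next
      # Sortable string key YYYYMMDDHH:MM:SS built from DD/Mon/YYYY:HH:MM:SS.
      key = substr(ts, 8, 4) mon[substr(ts, 4, 3)] substr(ts, 1, 2) substr(ts, 13)
      id = tolower(dn)
      if (!(id in best) || key > best[id]) {
        best[id] = key
        out[id] = user "|" ts "|" dn
      }
    }
    END { for (id in out) print out[id] }
  ' "$@" | LC_ALL=C sort
}
```

It could be called as last_logins userid.txt file*.txt > last_login.txt. Memory use grows with the number of distinct DNs rather than with the size of the logs, which is the main reason to prefer an in-memory array over rewriting the output file for every line.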
