即使内容不同,如何找到重复的目录路径?

即使内容不同,如何找到重复的目录路径?

我到处搜索,但似乎除了一个(查找并列出重复的目录)我发现主题实际上涉及我的情况,但结果并不完全是我需要的。

编辑:这里有一些示例数据,可以帮助展示我想要完成的任务。下面是两组目录的列表。


idx1
idx1/defaultdb
idx1/defaultdb/thaweddb
idx1/defaultdb/colddb
idx1/defaultdb/db
idx1/defaultdb/db/rb_1558019513_1558019454_4_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1558019513_1558019454_4_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/rb_1541720372_1541194569_2_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1541720372_1541194569_2_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/rb_1558019538_1558019538_5_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1558019538_1558019538_5_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/db_1558019449_1558019418_3_9542F466-F8CA-49EB-8120-5409B813F147
idx1/defaultdb/db/db_1558019449_1558019418_3_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx1/defaultdb/db/rb_1558019389_1558018342_3_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1558019389_1558018342_3_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/rb_1558019898_1558019898_7_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1558019898_1558019898_7_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/db_1557947113_1557947083_0_9542F466-F8CA-49EB-8120-5409B813F147
idx1/defaultdb/db/db_1557947113_1557947083_0_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx1/defaultdb/db/rb_1549909440_1549908720_1_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1549909440_1549908720_1_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/test
idx1/defaultdb/db/rb_1558019813_1558019569_6_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1558019813_1558019569_6_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/rb_1558020652_1558020018_8_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1558020652_1558020018_8_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/db_1541720372_1541194569_2_9542F466-F8CA-49EB-8120-5409B813F147
idx1/defaultdb/db/db_1541720372_1541194569_2_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx1/defaultdb/db/GlobalMetaData
idx1/defaultdb/db/db_1558019873_1558019567_4_9542F466-F8CA-49EB-8120-5409B813F147
idx1/defaultdb/db/db_1558019873_1558019567_4_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx1/defaultdb/db/db_1558020619_1558019927_5_9542F466-F8CA-49EB-8120-5409B813F147
idx1/defaultdb/db/db_1558020619_1558019927_5_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx1/defaultdb/db/rb_1557960001_1557771284_0_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx1/defaultdb/db/rb_1557960001_1557771284_0_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx1/defaultdb/db/db_1558032446_1558018050_1_9542F466-F8CA-49EB-8120-5409B813F147
idx1/defaultdb/db/db_1558032446_1558018050_1_9542F466-F8CA-49EB-8120-5409B813F147/rawdata

idx2
idx2/defaultdb
idx2/defaultdb/thaweddb
idx2/defaultdb/colddb
idx2/defaultdb/db
idx2/defaultdb/db/db_1558019813_1558019569_6_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1558019813_1558019569_6_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/rb_1557947113_1557947083_0_9542F466-F8CA-49EB-8120-5409B813F147
idx2/defaultdb/db/rb_1557947113_1557947083_0_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx2/defaultdb/db/db_1558019513_1558019454_4_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1558019513_1558019454_4_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/rb_1558019449_1558019418_3_9542F466-F8CA-49EB-8120-5409B813F147
idx2/defaultdb/db/rb_1558019449_1558019418_3_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx2/defaultdb/db/db_1558019898_1558019898_7_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1558019898_1558019898_7_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/db_1558019538_1558019538_5_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1558019538_1558019538_5_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/rb_1541720372_1541194569_2_9542F466-F8CA-49EB-8120-5409B813F147
idx2/defaultdb/db/rb_1541720372_1541194569_2_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx2/defaultdb/db/db_1541720372_1541194569_2_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1541720372_1541194569_2_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/test
idx2/defaultdb/db/rb_1558032446_1558018050_1_9542F466-F8CA-49EB-8120-5409B813F147
idx2/defaultdb/db/rb_1558032446_1558018050_1_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx2/defaultdb/db/db_1557960001_1557771284_0_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1557960001_1557771284_0_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/db_1558019389_1558018342_3_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1558019389_1558018342_3_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/GlobalMetaData
idx2/defaultdb/db/5_9542F466-F8CA-49EB-8120-5409B813F147
idx2/defaultdb/db/5_9542F466-F8CA-49EB-8120-5409B813F147/rawdata
idx2/defaultdb/db/db_1549909440_1549908720_1_AB8C9371-027D-4FE0-B2F3-BAF93F106480
idx2/defaultdb/db/db_1549909440_1549908720_1_AB8C9371-027D-4FE0-B2F3-BAF93F106480/rawdata
idx2/defaultdb/db/rb_1558019873_1558019567_4_9542F466-F8CA-49EB-8120-5409B813F147
idx2/defaultdb/db/rb_1558019873_1558019567_4_9542F466-F8CA-49EB-8120-5409B813F147/rawdata

假设我有以下一个:

idx1/defaultdb/db/rb_1558019513_1558019454_4_AB8C9371-027D-4FE0-B2F3-BAF93F106480

我想检查idx2目录中是否defaultdb/db/rb_1558019513_1558019454_4_AB8C9371-027D-4FE0-B2F3-BAF93F106480存在,如果存在我想打印它。

最终目标是每个人完全的目录(目录没有子目录,我不想defaultdb显示而是子目录)在所有顶级目录中都是唯一的,是两个不同顶级目录中存在的子目录列表。从那里我将删除其中之一。


Edit2:这就是我当前的工作副本的样子。可能有一些我需要修复的错误。它接受当前路径中的目录,找到相同的路径名并删除第二个目录中的路径名。

#!/bin/bash
echo 'Chosen directories must reside in the current directory.'
echo 'This will find duplicate sub directories between the two and delete the ones in the second path.'
echo ''
read -p 'First directory to compare:' DIR1
read -p 'second directory to compare:' DIR2

depth="${DIR1//[^\/]}"
depth="${#depth}"
recurse='..'

for ((i=1; i<=depth; i++)) {
    recurse="${recurse}/.."
}

cd $DIR1; find . -type d > "$recurse"/list.txt; cd "$recurse"
cd $DIR2; find . -type d >> "$recurse"/list.txt; cd "$recurse"
echo 'Paths found:'
echo ''
awk 'seen[$1]++ {print $1}' list.txt | grep -v "db$" | grep -v "\.$"
echo ''
read -p 'Delete paths in ${DIR2}? (y/n)' bool
case 'y' in
    $bool)
    echo 'deleting:'
    awk 'seen[$1]++ {print $1}' list.txt | grep -v "db$" | grep -v "\.$"    
    cd $DIR2
    awk 'seen[$1]++ {print $1}' "$recurse"/list.txt | grep -v "db$" | grep -v "\.$" | xargs rmdir   
    echo ''
esac

答案1

我会选择这样的东西:

  1. 列出 idx1 端的目录:cd idx1/defaultdb; find . -type d > path/to/list.txt

  2. 列出 idx2 端的目录:cd idx2/defaultdb; find . -type d >> path/to/list.txt

  3. 查找重复项:awk 'seen[$1]++ {print $1}' path/to/list.txt

注意:

  • 这只是一个总体概念。仍需打磨才能成为脚本;-)
  • 这两个find命令必须实际写入同一个文件。相应地选择其路径。

相关内容