I'm having a bit of a blonde moment here.
Problem statement: I have timestamped data in a file that is updated continuously at arbitrary intervals, and I want to filter out only the data from the previous full hour (i.e. if the time now is 14:35, I mean the period 13:00-14:00; similarly, if it is 00:03, then 23:00-00:00). Once I have that data, I want to send it to a MySQL table. Once this works, it will be run automatically with crontab.
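To be concrete about the window, its boundaries can be computed with GNU date in epoch seconds roughly like this (illustrative variable names only; this is essentially what the script below does):

window_end=$(date -d "$(date +%F) $(date +%H):00:00" +%s)   # start of the current hour, e.g. 14:00:00
window_start=$((window_end - 3600))                         # one hour earlier, e.g. 13:00:00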
What I want to achieve: Ideally, I want to take the data for the last full hour and send it to a MySQL table. If there is nothing for that hour, I want it to send one message saying "there is no updates this hour". So essentially I see this as an if/else: if there is something new, insert it; otherwise report that there are no updates. But this doesn't work.
What I think is happening: The code checks the last full hour and sends that data to the MySQL table, but it also checks every other record; since the older records are not within the last hour, they fall into the "else" branch and I get a lot of "there is no updates this hour" rows. Question: how can I get past this?
Here is my code (I removed some things, so it may not be syntactically correct):
#!/bin/bash
#---------------------------------------------------------------------------------
# Adds all the transactions for the past full hour into a log file
# named log.txt
# input file: mqtt.csv
# output file: outfile.csv
#
#
#
#---------------------------------------------------------------------------------
current_time_hr=`date +%H` #Get the current hour only
current_time_date=`date +%F` #Get the full date YYYY-MM-DD
current_time_full=$current_time_date' '$current_time_hr':00:00' #Concatenate the date & current hour
current_time=`date -d "$current_time_full" +%s` #convert the time to seconds
echo "$current_time" #Print out
period=3600 #3600 seconds = 60 min = 1 hour
one_hr_before=$((current_time-period)) # subtract one hour in seconds - to get the time one hour ago in seconds
echo $one_hr_before #Print out
count=0 #Initialize the counter
log=log.txt
out_file_name=outfile.csv #Output file
rm -f "$out_file_name" #Remove the output file to avoid appending to old data
#Filter below
cat /DIR/SOME_DIR/mqtt.csv | grep SX | grep ',scan' | grep -v 'HDCU.*HDCU' | grep -v 'Sensor,Module' | grep '^[0-9]*,e,gf,STACKEXCHANGEe,HDCU.*,d,scan,.*,.*,.*,.*,.*,.*' | sed 's/,H[0-9][0-9]/,/g' | tail -n100 >>$out_file_name
cat $out_file_name | ( while read line
do
echo $line #Print out
#-----
# This is to convert $line into an array named "awk_var_array"
#----- ----- ----- ----- ----- ---------- ----- ---------- ----- -----
IFS=","
read -a awk_var_array <<< "${line}"
record_time=${awk_var_array[0]}
container=${awk_var_array[4]}
time_date=`date -d "@$record_time" +%F`' '`date -d "@$record_time" +%T`
no_array_less_one=$((${#awk_var_array[@]}-2))
#-----------
# Var is created & initialized; And is used to concatenate machine name parts
# in the consecutive fields before the direction (IN/OUT) field
#----------
var=""
k=$((${#awk_var_array[@]}-1)) # k is the last element of the array
j=11 # j is the element where the part name starts
while [ $j -lt $k ] # do a for a loop with while
do
echo "j:$j k:$k ${awk_var_array[$j]}"
new_var=${awk_var_array[$j]}
var=$var' '$new_var
j=$((j+1))
done
part_name=${var:1} #gets rid of the first character as this is a ' ';
echo -e "\t\t\t\t\t\t\tvar $part_name"
count=$((count+1)) #Increase counter
echo "c-o-u-n-t $count" #Print out
full_date=`date -d "@$record_time" +%F`' '`date -d "@$record_time" +%T`
echo "FD:$full_date"
scan_id=${awk_var_array[9]}
dir=${awk_var_array[$k]}
dir=`echo $dir | sed 's/*//g'`
echo "FD:$full_date, SID:$scan_id, PartName:$part_name, DIR:$dir"
if [ "$record_time" -lt "$current_time" ] && [ "$record_time" -ge "$one_hr_before" ] #Bouncries for permissible data count
then
dbq="INSERT INTO mytable.table (time, part_number, direction, stock_point_name, scan_id, ignored) VALUES ('$time_date', '$part_name', '$dir', '$container', '$scan_id', '0');"
else
dbq="INSERT INTO mytable.table (part_number) VALUES ('there is no updates this hour');"
fi
mysql -uUSER -p'PASSWORD' << EOF
$dbq
EOF
done
echo "count: $count" #Print out
time_stamp_date=`date +%F` #Get date
time_stamp_time=`date +%T` #Get time
time_stamp=$time_stamp_date' '$time_stamp_time #Concatenate current DATE TIME, then print to the log file
printf "%20s;%8d;records added\n" "$time_stamp" "$count" >>$log )
Answer 1
You only need to update your logic a little: inside the loop, count only the lines that were actually added. Then, outside the loop, insert the "nothing to do" message only if no data was added.
Here it is in pseudocode:
counter = 0
for each line
if date in range
insert data
increment counter
else
# do nothing
fi
done
if counter > 0
# some lines were imported, nothing more to do
else
insert 'no updates' message
fi
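Applied to your script, that could look roughly like the sketch below. It assumes $current_time, $one_hr_before and $out_file_name are set as in your script, and it elides the per-line parsing that builds $time_date, $part_name, $dir, $container and $scan_id:

count=0
while IFS=',' read -r -a awk_var_array
do
    record_time=${awk_var_array[0]}
    # ... build time_date, part_name, dir, container, scan_id as in your script ...
    if [ "$record_time" -lt "$current_time" ] && [ "$record_time" -ge "$one_hr_before" ]
    then
        # only records inside the last full hour generate an INSERT and bump the counter
        mysql -uUSER -p'PASSWORD' <<EOF
INSERT INTO mytable.table (time, part_number, direction, stock_point_name, scan_id, ignored)
VALUES ('$time_date', '$part_name', '$dir', '$container', '$scan_id', '0');
EOF
        count=$((count+1))
    fi
    # records outside the window are simply skipped: no "else" INSERT
done < "$out_file_name"

if [ "$count" -eq 0 ]
then
    # nothing fell inside the window, so insert the marker row exactly once
    mysql -uUSER -p'PASSWORD' <<EOF
INSERT INTO mytable.table (part_number) VALUES ('there is no updates this hour');
EOF
fi

Note that reading the file with done < "$out_file_name" instead of piping it through cat keeps $count visible after the loop, so the final check runs in the same shell.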
Opening a new database connection for every single line is very inefficient. If you need to import a sizeable chunk of data, you should refactor the code further: in the main loop, build a .sql file containing all the INSERT statements, then execute them all at once right after the loop (ideally inside a transaction). Or, instead of generating INSERTs, produce the data in a format suitable for LOAD DATA INFILE.
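A sketch of the batched variant, under the same assumptions as above (the file name inserts.sql is just illustrative, and the per-line parsing is again elided):

sql_file=inserts.sql
echo "START TRANSACTION;" > "$sql_file"
count=0
while IFS=',' read -r -a awk_var_array
do
    record_time=${awk_var_array[0]}
    # ... build time_date, part_name, dir, container, scan_id as before ...
    if [ "$record_time" -lt "$current_time" ] && [ "$record_time" -ge "$one_hr_before" ]
    then
        # append one INSERT statement per matching record instead of calling mysql here
        echo "INSERT INTO mytable.table (time, part_number, direction, stock_point_name, scan_id, ignored) VALUES ('$time_date', '$part_name', '$dir', '$container', '$scan_id', '0');" >> "$sql_file"
        count=$((count+1))
    fi
done < "$out_file_name"
if [ "$count" -eq 0 ]
then
    echo "INSERT INTO mytable.table (part_number) VALUES ('there is no updates this hour');" >> "$sql_file"
fi
echo "COMMIT;" >> "$sql_file"
mysql -uUSER -p'PASSWORD' < "$sql_file"   # one connection, one transaction

For the LOAD DATA INFILE route you would instead write only the selected fields out as a CSV file and load that file with a single statement.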