I'm having a bit of a blonde moment here.
Problem statement: I have timestamped data in a file that is updated continuously at arbitrary intervals, and I want to filter out only the data from the previous full hour (i.e. if the time now is 14:35, I mean the period 13:00-14:00; similarly, if it is 00:03, then 23:00-00:00). Once I have that data, I want to send it to a MySQL table. Once this works, it will be run automatically with crontab.
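To be concrete about the window, its boundaries can be computed with GNU date in epoch seconds roughly like this (illustrative variable names only; this is essentially what the script below does):

window_end=$(date -d "$(date +%F) $(date +%H):00:00" +%s)   # start of the current hour, e.g. 14:00:00
window_start=$((window_end - 3600))                         # one hour earlier, e.g. 13:00:00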
What I want to achieve: Ideally, I want to take the data for the last full hour and send it to a MySQL table. If there is nothing for that hour, I want it to send one message saying "there is no updates this hour". So essentially I see this as an if/else: if there is something new, insert it; otherwise report that there are no updates. But this doesn't work.
What I think is happening: The code checks the last full hour and sends that data to the MySQL table, but it also checks every other record; since the older records are not within the last hour, they fall into the "else" branch and I get a lot of "there is no updates this hour" rows. Question: how can I get past this?
Here is my code (I removed some things, so it may not be syntactically correct):
#!/bin/bash
#---------------------------------------------------------------------------------
# Adds all the transactions for the past full hour into a log file
# named log.txt
# input file: mqtt.csv
# output file: outfile.csv
#
#
#
#---------------------------------------------------------------------------------
current_time_hr=`date +%H` #Get the current hour only
current_time_date=`date +%F` #Get the full date YYYY-MM-DD
current_time_full=$current_time_date' '$current_time_hr':00:00' #Concatenate the date & current hour
current_time=`date -d "$current_time_full" +%s` #convert the time to seconds
echo "$current_time" #Print out
period=3600 #3600 seconds = 60 min = 1 hour
one_hr_before=$((current_time-period)) # subtract one hour in seconds - to get the time one hour ago in seconds
echo $one_hr_before #Print out
count=0 #Initialize the counter
log=log.txt
out_file_name=outfile.csv #Output file
rm -f "$out_file_name" #Remove the output file to avoid appending to old data
#Filter below
cat /DIR/SOME_DIR/mqtt.csv | grep SX | grep ',scan' | grep -v 'HDCU.*HDCU' | grep -v 'Sensor,Module' | grep '^[0-9]*,e,gf,STACKEXCHANGEe,HDCU.*,d,scan,.*,.*,.*,.*,.*,.*' | sed 's/,H[0-9][0-9]/,/g' | tail -n100 >>$out_file_name
cat $out_file_name | ( while read line
do
echo $line #Print out
#-----
# This is to convert $line into an array named "awk_var_array"
#----- ----- ----- ----- ----- ---------- ----- ---------- ----- -----
IFS=","
read -a awk_var_array <<< "${line}"
record_time=${awk_var_array[0]}
container=${awk_var_array[4]}
time_date=`date -d "@$record_time" +%F`' '`date -d "@$record_time" +%T`
no_array_less_one=$((${#awk_var_array[@]}-2))
#-----------
# Var is created & initialized; And is used to concatenate machine name parts
# in the consecutive fields before the direction (IN/OUT) field
#----------
var=""
k=$((${#awk_var_array[@]}-1)) # k is the last element of the array
j=11 # j is the element where the part name starts
while [ $j -lt $k ] # do a for a loop with while
do
echo "j:$j k:$k ${awk_var_array[$j]}"
new_var=${awk_var_array[$j]}
var=$var' '$new_var
j=$((j+1))
done
part_name=${var:1} #gets rid of the first character as this is a ' ';
echo -e "\t\t\t\t\t\t\tvar $part_name"
count=$((count+1)) #Increase counter
echo "c-o-u-n-t $count" #Print out
full_date=`date -d "@$record_time" +%F`' '`date -d "@$record_time" +%T`
echo "FD:$full_date"
scan_id=${awk_var_array[9]}
dir=${awk_var_array[$k]}
dir=`echo $dir | sed 's/*//g'`
echo "FD:$full_date, SID:$scan_id, PartName:$part_name, DIR:$dir"
if [ "$record_time" -lt "$current_time" ] && [ "$record_time" -ge "$one_hr_before" ] #Bouncries for permissible data count
then
dbq="INSERT INTO mytable.table (time, part_number, direction, stock_point_name, scan_id, ignored) VALUES ('$time_date', '$part_name', '$dir', '$container', '$scan_id', '0');"
else
dbq="INSERT INTO mytable.table (part_number) VALUES ('there is no updates this hour');"
fi
mysql -uUSER -p'PASSWORD' << EOF
$dbq
EOF
done
echo "count: $count" #Print out
time_stamp_date=`date +%F` #Get date
time_stamp_time=`date +%T` #Get time
time_stamp=$time_stamp_date' '$time_stamp_time #Concatenate current DATE TIME, then print to the log file
printf "%20s;%8d;records added\n" "$time_stamp" "$count" >>$log )
Answer 1
You only need to update your logic a little: inside the loop, count only the lines that were actually added. Then, outside the loop, insert the "nothing to do" message only if no data was added.
Here it is in pseudocode:
counter = 0
for each line
if date in range
insert data
increment counter
else
# do nothing
fi
done
if counter > 0
# some lines were imported, nothing more to do
else
insert 'no updates' message
fi
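Applied to your script, that could look roughly like the sketch below. It assumes $current_time, $one_hr_before and $out_file_name are set as in your script, and it elides the per-line parsing that builds $time_date, $part_name, $dir, $container and $scan_id:

count=0
while IFS=',' read -r -a awk_var_array
do
    record_time=${awk_var_array[0]}
    # ... build time_date, part_name, dir, container, scan_id as in your script ...
    if [ "$record_time" -lt "$current_time" ] && [ "$record_time" -ge "$one_hr_before" ]
    then
        # only records inside the last full hour generate an INSERT and bump the counter
        mysql -uUSER -p'PASSWORD' <<EOF
INSERT INTO mytable.table (time, part_number, direction, stock_point_name, scan_id, ignored)
VALUES ('$time_date', '$part_name', '$dir', '$container', '$scan_id', '0');
EOF
        count=$((count+1))
    fi
    # records outside the window are simply skipped: no "else" INSERT
done < "$out_file_name"

if [ "$count" -eq 0 ]
then
    # nothing fell inside the window, so insert the marker row exactly once
    mysql -uUSER -p'PASSWORD' <<EOF
INSERT INTO mytable.table (part_number) VALUES ('there is no updates this hour');
EOF
fi

Note that reading the file with done < "$out_file_name" instead of piping it through cat keeps $count visible after the loop, so the final check runs in the same shell.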
Opening a new database connection for every single line is very inefficient. If you need to import a sizeable chunk of data, you should refactor the code further: in the main loop, build a .sql file containing all the INSERT statements, then execute them all at once right after the loop (ideally inside a transaction). Or, instead of generating INSERTs, produce the data in a format suitable for LOAD DATA INFILE.
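A sketch of the batched variant, under the same assumptions as above (the file name inserts.sql is just illustrative, and the per-line parsing is again elided):

sql_file=inserts.sql
echo "START TRANSACTION;" > "$sql_file"
count=0
while IFS=',' read -r -a awk_var_array
do
    record_time=${awk_var_array[0]}
    # ... build time_date, part_name, dir, container, scan_id as before ...
    if [ "$record_time" -lt "$current_time" ] && [ "$record_time" -ge "$one_hr_before" ]
    then
        # append one INSERT statement per matching record instead of calling mysql here
        echo "INSERT INTO mytable.table (time, part_number, direction, stock_point_name, scan_id, ignored) VALUES ('$time_date', '$part_name', '$dir', '$container', '$scan_id', '0');" >> "$sql_file"
        count=$((count+1))
    fi
done < "$out_file_name"
if [ "$count" -eq 0 ]
then
    echo "INSERT INTO mytable.table (part_number) VALUES ('there is no updates this hour');" >> "$sql_file"
fi
echo "COMMIT;" >> "$sql_file"
mysql -uUSER -p'PASSWORD' < "$sql_file"   # one connection, one transaction

For the LOAD DATA INFILE route you would instead write only the selected fields out as a CSV file and load that file with a single statement.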