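First, I am not a NAS expert, nor a Unix expert. Second, I have read the other RSS downloader question, but rsstail is C source code and tvrss is a GUI program. I also had a look at 网易 (NetEase), but I cannot sort by title there.
I bought a Zyxel NSA310 and thought it would be a good idea to install an RSS reader/downloader on it. There is a broadcatcher on it, but it does not store the RSS items, and I do not need the extra logic of starting downloads based on RSS items.
It should download RSS feeds and store the content in readable files. When I come home from travelling, I want to see all RSS items of those feeds, not just the latest 30 or so. Since I connect to my NAS with PuTTY, I cannot use any GUI program. As for the output, I do not know what to look for, but plain text would be enough. Sorting the RSS items by title and by date would make sense to me.
To give you an idea of what is on the NAS: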
root@NAS:~# uname -a
Linux NAS 2.6.31.8 #4 Fri Jul 20 18:10:42 CST 2012 armv5tel GNU/Linux
root@NAS:~# busybox
BusyBox v1.17.2 (2012-07-20 17:55:00 CST) multi-call binary.
Copyright (C) 1998-2009 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.
Usage: busybox [function] [arguments]...
or: function [arguments]...
BusyBox is a multi-call binary that combines many common Unix
utilities into a single executable. Most people will create a
link to busybox for each function they wish to use and BusyBox
will act like whatever it was invoked as.
Currently defined functions:
[, [[, acpid, addgroup, adduser, adjtimex, arp, ash, awk, basename,
beep, bzip2, cat, catv, chat, chgrp, chmod, chown, chroot, chrt, chvt,
cksum, clear, cmp, cp, crond, crontab, cryptpw, cttyhack, cut, date,
dd, deallocvt, delgroup, deluser, depmod, devmem, df, diff, dirname,
dmesg, dnsdomainname, du, echo, ed, egrep, eject, env, expand, expr,
false, fgrep, flock, free, freeramdisk, fsck, fsync, getopt, getty,
grep, gunzip, gzip, halt, head, hexdump, hostname, hwclock, id,
ifconfig, init, insmod, install, ionice, ip, ipaddr, ipcrm, ipcs,
iplink, iproute, iprule, kill, killall, less, linuxrc, ln, logger,
login, losetup, ls, lsmod, lspci, lsusb, lzop, lzopcat, man, microcom,
mkdir, mke2fs, mkfs.ext2, mknod, mkpasswd, mkswap, mktemp, modinfo,
modprobe, more, mountpoint, mv, netstat, nslookup, openvt, passwd,
pidof, ping, ping6, pivot_root, poweroff, printenv, printf, ps, pwd,
raidautorun, rdev, readahead, readlink, reboot, reset, resize, rev, rm,
rmdir, rmmod, route, rtcwake, script, scriptreplay, sed, seq, setfont,
setlogcons, setsid, sh, sha256sum, sha512sum, showkey, sleep, sort,
split, start-stop-daemon, strings, su, swapoff, swapon, sync, tac,
tail, tar, tee, telnet, telnetd, test, time, timeout, top, touch, tr,
traceroute, traceroute6, true, tty, ttysize, ubiattach, ubidetach,
udhcpc, udpsvd, uname, unexpand, uniq, unlzop, unxz, unzip, uptime,
usleep, vi, volname, wall, wc, which, who, whoami, xargs, xz, xzcat,
yes, zcat, zcip
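I have neither cc/gcc nor make. I also cannot find a package installer.
Google found me things written in Python, such as feedreader, but those require distutils, which I do not have and do not seem to be able to install.
The only solution I can think of right now is wget combined with crontab. However, I would have to do all the parsing in a shell script (as I know neither Python nor Perl). That seems inconvenient.
What can I do/try? Any help is much appreciated.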
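Answer 1
Here is what seems to work for me. I have been running it from crontab for a while and it works well. I think it supports both RSS and Atom. However, if you intend to use it, treat it as a starting point. If you tweak it and the changes are useful, feel free to come back and tell the community.
The result is a directory structure in which the feed name is the directory name and the individual entries are written as files. The files contain the unchanged content, with a link to the source at the bottom. Each file name consists of the item title and its timestamp, and the file's date is set to that timestamp. That way you can easily sort the entries by name or by time.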
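As a rough illustration of that last point, here is a minimal sketch of two helpers for browsing one feed directory over an SSH session. The function names are made up for this example, and it assumes the files were written below ./data (the outDir used by the script) with their timestamps set as described.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original scripts.

# Newest entry first: relies on each file's modification time
# having been set to the item's publication date.
list_by_time() {
    ls -t "$1"
}

# Alphabetical by title: file names start with the item title.
list_by_title() {
    ls "$1"
}
```

For example, `list_by_time "./data/Some feed" | head` would show the ten most recent entries of a feed directory named "Some feed" (a placeholder name).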
rss-parser.awk:
# Strip everything up to and including the first ">" (the opening tag).
function removeFirstTag(str) {
    sub("[^>]*>", "", str)
    return str
}

# Strip a CDATA wrapper, keeping only the text inside it.
function extractCdata(str) {
    gsub(".*<!\\[CDATA\\[|]]>.*", "", str)
    return str
}

function cleanup(str) {
    return extractCdata(removeFirstTag(str))
}

BEGIN {
    written = 0
    existed = 0
    if (feedTitle) {
        system("mkdir -p '" outDir "/" feedTitle "'")
    }
}

/<\/description>|<\/content>/ {
    indescr = 0
}

/<\/summary>/ {
    insummary = 0
}

# End of an item: write it to a file unless that file already exists.
/<\/item>|<\/entry>/ {
    gsub(/\//, "-", title)
    fname = outDir "/" feedTitle "/" title " - " dateSec
    # Escape $ and " so the name survives the double-quoted shell commands below.
    fnameEnc = fname
    gsub(/\$/, "\\$", fnameEnc)
    gsub(/"/, "\\\"", fnameEnc)
    if (!description) {
        description = summary
    }
    if (link != "") {
        description = description "\nLink: " link
    } else if (guid != "") {
        description = description "\nLink: " guid
    }
    # stat exits with 1 when the file does not exist yet.
    write = system(binPath "/stat \"" fnameEnc "\" -c \"%u\" 1> /dev/null 2>&1")
    if (write == 1) {
        print description > fname
        # Close the file so long feeds do not run out of open file handles.
        close(fname)
        system(binPath "/touch \"" fnameEnc "\" -d \"" date "\"")
        written = written + 1
    } else {
        existed = existed + 1
    }
}

# Continuation lines of a multi-line description.
{
    if (indescr == 1) {
        description = description "\n" $0
    }
}

# The first title seen is the feed title, unless one was passed in with -v.
/<title[ >]/ {
    if (!feedTitle) {
        feedTitle = cleanup($0)
        system("mkdir -p '" outDir "/" feedTitle "'")
    } else {
        title = cleanup($0)
    }
}

# Start of a new item: reset the per-item fields.
/<item[ >]|<entry[ >]/ {
    title = ""
    description = ""
    summary = ""
    date = ""
    link = ""
    guid = ""
}

/<description[ >]|<content[ >]/ {
    indescr = 1
    description = cleanup($0)
}

/<summary[ >]/ {
    insummary = 1
    summary = cleanup($0)
}

/<link[ >]/ {
    link = cleanup($0)
    if (index(link, "magnet") == 1) {
        link = ""
    }
}

/<guid[ >]|<id[ >]/ {
    guid = cleanup($0)
}

/<pubDate[ >]|<published[ >]|<dc:date[ >]/ {
    date = cleanup($0)
    # Convert the date to seconds since the epoch; close the pipe afterwards.
    cmd = binPath "/date -d '" date "' +'%s'"
    cmd | getline dateSec
    close(cmd)
}

END {
    print "Done : " feedTitle " (new: " written "/old: " existed ")"
    if (existed == 0 && written != 0) {
        print "=== NOTE === : All entries are new. You may want to check this feed more often"
    }
}
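To get a feel for what cleanup() does to a single input line, the following sketch wraps the same two regular expressions in a small shell function. The name clean_line is made up for this example. It assumes the line has already been split so that it starts with its opening tag and no longer contains the closing tag, which is what the sed stage in rss-parser.sh produces.

```shell
#!/bin/sh
# Hypothetical helper for experimenting; not part of the original scripts.
# Prints the text content of one pre-split feed line.
clean_line() {
    printf '%s\n' "$1" | awk '
        # Same logic as removeFirstTag() in rss-parser.awk.
        function removeFirstTag(str) { sub("[^>]*>", "", str); return str }
        # Same logic as extractCdata() in rss-parser.awk.
        function extractCdata(str) { gsub(".*<!\\[CDATA\\[|]]>.*", "", str); return str }
        { print extractCdata(removeFirstTag($0)) }'
}
```

Calling `clean_line '<title><![CDATA[Some title]]>'` and `clean_line '<title>Some title'` should both print the bare title text.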
The file rss-parser.cfg consists of lines such as:
http://thatonesite.com/feed-with-nice-title
http://thatothersite.com/feed=Specific feed title is better
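The way such a line is taken apart can be sketched in isolation with plain shell parameter expansion. The function name split_cfg_line is made up for this example; the expansions mirror the ones used in rss-parser.sh.

```shell
#!/bin/sh
# Hypothetical helper: split one config line into $url and $name.
split_cfg_line() {
    line="$1"
    url=${line%%=*}    # everything before the first "="
    name=${line#*=}    # everything after the first "="
    # Without any "=", both expansions return the whole line,
    # which means no explicit feed title was given.
    if [ "$name" = "$url" ]; then
        name=""
    fi
}
```

One consequence of splitting at the first "=" is that a feed URL which itself contains "=" (for example in a query string) would be cut short, so such URLs would need a different separator or some extra handling.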
rss-parser.sh:
#!/bin/sh

# Download one feed and hand it to the awk parser.
parse() {
    url="$1"
    name="$2"
    echo "Start: $name ($url)"
    wget -q -U "$userAgent" -O rss-news.xml "$url"
    # Strip carriage returns, then start a new line before every tag.
    sed -e 's/\r//g' -e 's/<\([a-z/]\)/\n<\1/g' rss-news.xml | \
        $binPath/awk \
            -v "feedTitle=$name" \
            -v "binPath=$binPath" \
            -v "outDir=./data" \
            -f rss-parser.awk | \
        more
}

userAgent="Opera/9.80 (Windows NT 6.1; Win64; x64; U; de) Presto/2.10.289 Version/12.01"
binPath=/bin

# Work relative to the directory the script lives in.
cd "`dirname "$0"`"

if [ -z "$1" ]; then
    echo "================= `$binPath/date +"%d.%m.%Y %H:%M"`"
    start=`$binPath/date +%s`
    while read line
    do
        # Skip empty lines and comment lines starting with "#".
        if [ ! -z "$line" -a "${line:0:1}" != "#" ]; then
            url=${line%%=*}
            name=${line#*=}
            if [ "$name" = "$url" ]; then
                name=""
            fi
            parse "$url" "$name"
        fi
    done < "rss-parser.cfg"
    echo "================= Took `$binPath/date -d "$start seconds ago" +%s` seconds"
else
    parse "$1" "$2"
fi
Without arguments, it parses rss-parser.cfg. Otherwise the arguments are:
./rss-parser.sh [<URL> [<feed name>]]
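Since the script is meant to be run from crontab, a possible crontab entry can be generated like this. It is only a sketch: the function name cron_line, the hourly schedule and the log file name rss-parser.log are assumptions for this example, and the install directory has to be replaced by wherever the scripts actually live.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs the parser
# once an hour, at minute 0, and appends its output to a log file.
# $1 = directory containing rss-parser.sh (an assumption, adjust as needed)
cron_line() {
    printf '0 * * * * %s/rss-parser.sh >> %s/rss-parser.log 2>&1\n' "$1" "$1"
}

# To install it next to any existing entries, something like this
# should work on systems with a standard crontab command:
#   ( crontab -l 2>/dev/null; cron_line /path/to/rss ) | crontab -
```

Redirecting the output to a log keeps the "Done" and "=== NOTE ===" lines available for later inspection.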
I hope it is useful to someone...