RSS reader/downloader on a NAS

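First, I am neither a NAS expert nor a Unix expert. Second, I have read the other RSS downloader question, but rsstail is C source code and tvrss is a GUI program. I also looked at NetEase (网易), but I could not get it to sort by title.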

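I bought a Zyxel NSA310 and thought it would be a good idea to install an RSS reader/downloader on it. It comes with a broadcatcher, but that does not store the RSS items, and I do not need the extra logic of starting downloads based on RSS items.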

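It should download RSS feeds and store the content in readable files. When I come home from a trip, I want to see all RSS items of those feeds, not just the latest 30 or so. Since I connect to my NAS with PuTTY, I cannot use any GUI program. As for the output, I do not know what to look for, but plain text would be enough. Sorting the RSS items by title and by date would make sense to me.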

To give you an idea of what is on the NAS:

root@NAS:~# uname -a
Linux NAS 2.6.31.8 #4 Fri Jul 20 18:10:42 CST 2012 armv5tel GNU/Linux
root@NAS:~# busybox
BusyBox v1.17.2 (2012-07-20 17:55:00 CST) multi-call binary.
Copyright (C) 1998-2009 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.

Usage: busybox [function] [arguments]...
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, acpid, addgroup, adduser, adjtimex, arp, ash, awk, basename,
        beep, bzip2, cat, catv, chat, chgrp, chmod, chown, chroot, chrt, chvt,
        cksum, clear, cmp, cp, crond, crontab, cryptpw, cttyhack, cut, date,
        dd, deallocvt, delgroup, deluser, depmod, devmem, df, diff, dirname,
        dmesg, dnsdomainname, du, echo, ed, egrep, eject, env, expand, expr,
        false, fgrep, flock, free, freeramdisk, fsck, fsync, getopt, getty,
        grep, gunzip, gzip, halt, head, hexdump, hostname, hwclock, id,
        ifconfig, init, insmod, install, ionice, ip, ipaddr, ipcrm, ipcs,
        iplink, iproute, iprule, kill, killall, less, linuxrc, ln, logger,
        login, losetup, ls, lsmod, lspci, lsusb, lzop, lzopcat, man, microcom,
        mkdir, mke2fs, mkfs.ext2, mknod, mkpasswd, mkswap, mktemp, modinfo,
        modprobe, more, mountpoint, mv, netstat, nslookup, openvt, passwd,
        pidof, ping, ping6, pivot_root, poweroff, printenv, printf, ps, pwd,
        raidautorun, rdev, readahead, readlink, reboot, reset, resize, rev, rm,
        rmdir, rmmod, route, rtcwake, script, scriptreplay, sed, seq, setfont,
        setlogcons, setsid, sh, sha256sum, sha512sum, showkey, sleep, sort,
        split, start-stop-daemon, strings, su, swapoff, swapon, sync, tac,
        tail, tar, tee, telnet, telnetd, test, time, timeout, top, touch, tr,
        traceroute, traceroute6, true, tty, ttysize, ubiattach, ubidetach,
        udhcpc, udpsvd, uname, unexpand, uniq, unlzop, unxz, unzip, uptime,
        usleep, vi, volname, wall, wc, which, who, whoami, xargs, xz, xzcat,
        yes, zcat, zcip

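I have no cc/gcc and no make. I also cannot find a package installer.

Google found me tools written in Python, such as feedreader, but those need distutils, which I do not have and do not seem to be able to install.

The only solution I can think of right now is wget combined with crontab. However, I would then have to do all the parsing in a shell script (because I know neither Python nor Perl). That seems inconvenient.

What can I do or try? Any help is much appreciated.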

Answer 1

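Here is what seems to work for me. I have had it running from crontab for a while, and it works well. I think it supports both RSS and Atom. However, if you intend to use it, treat it as a starting point. If you tweak it and the changes are useful, feel free to come back and tell the community.

The result is a directory structure in which the feed name is the directory name and the individual entries are written as files. The files contain the unaltered content, with a link to the source at the bottom. Each file name consists of the item title and its timestamp, and the file's date is set to that timestamp. That way you can easily sort the entries by name or by time.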
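
As a rough illustration of that layout, the sketch below builds a throwaway feed directory and lists it both ways. The feed name, entry titles, and timestamps are made up for the demo; the real script writes to ./data/<feed title>/.

```shell
#!/bin/sh
# Hypothetical demo data in a temporary directory (not the script's real ./data).
demo=$(mktemp -d)
feed="$demo/Example feed"
mkdir -p "$feed"

# File names follow the "<title> - <timestamp>" pattern described above.
echo "older entry" > "$feed/Alpha - 1000000000"
echo "newer entry" > "$feed/Beta - 1100000000"

# Set the modification times, as the parser does with touch -d
# (illustrative dates, chosen only so that Beta is newer than Alpha).
touch -d "2001-09-09 12:00:00" "$feed/Alpha - 1000000000"
touch -d "2004-11-09 12:00:00" "$feed/Beta - 1100000000"

by_name=$(ls "$feed")      # sorted by title
by_time=$(ls -t "$feed")   # sorted by date, newest first

echo "By name:"; echo "$by_name"
echo "By time:"; echo "$by_time"

rm -rf "$demo"
```

Because the date lives in the file's modification time, plain ls options are enough and no extra index file is needed.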

rss-parser.awk

function removeFirstTag(str) {
    sub("[^>]*>", "", str)
    return str
}
function extractCdata(str) {
    gsub(".*<!\\[CDATA\\[|]]>.*", "", str)
    return str
}
function cleanup(str) {
    return extractCdata(removeFirstTag(str))
}

BEGIN {
    written = 0
    existed = 0
    if (feedTitle) {
        system("mkdir -p '" outDir "/" feedTitle "'")
    }
}

/<\/description>|<\/content>/ {
    indescr = 0
}
/<\/summary>/ {
    insummary = 0
}
/<\/item>|<\/entry>/ {
    gsub(/\//, "-", title)
    fname = outDir "/" feedTitle "/" title " - " dateSec
    fnameEnc = fname
    gsub(/\$/, "\\$", fnameEnc)
    gsub(/"/, "\\\"", fnameEnc)
    if (!description) {
        description = summary
    }
    if (link != "") {
        description = description "\nLink: " link
    } else if (guid != "") {
        description = description "\nLink: " guid
    }

    write = system(binPath "/stat \"" fnameEnc "\" -c \"%u\" 1> /dev/null 2>&1")
    if (write == 1) {
        print description > fname
        system(binPath "/touch \"" fnameEnc "\" -d \"" date "\"")
        written = written + 1
    } else {
        existed = existed + 1
    }
}

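{
    # Collect continuation lines of multi-line description/summary elements.
    if (indescr == 1) {
        description = description "\n" $0
    }
    if (insummary == 1) {
        summary = summary "\n" $0
    }
}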

/<title[ >]/ {
    if (!feedTitle) {
        feedTitle = cleanup($0)
        system("mkdir -p '" outDir "/" feedTitle "'")
    } else {
        title = cleanup($0)
    }
}
/<item[ >]|<entry[ >]/ {
    title = ""
    description = ""
    summary = ""
    date = ""
    link = ""
    guid = ""
}
/<description[ >]|<content[ >]/ {
    indescr = 1
    description = cleanup($0)
}
/<summary[ >]/ {
    insummary = 1
    summary = cleanup($0)
}
/<link[ >]/ {
    link = cleanup($0)
    if (index(link, "magnet") == 1) {
        link = ""
    }
}
/<guid[ >]|<id[ >]/ {
    guid = cleanup($0)
}
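/<pubDate[ >]|<published[ >]|<dc:date[ >]/ {
    date = cleanup($0)
    # Convert the date to epoch seconds; close the command so its pipe is released.
    cmd = binPath "/date -d '" date "' +'%s'"
    cmd | getline dateSec
    close(cmd)
}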

END {
    print "Done : " feedTitle " (new: " written "/old: " existed ")"
    if (existed == 0 && written != 0) {
        print "=== NOTE === : All entries are new. You may want to check this feed more often"
    }
}
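
To see what the helper functions at the top of the script do, here is a small sketch that runs them on two made-up title lines, one plain and one wrapped in CDATA. The function bodies are copied from the script above; the clean_titles wrapper and the sample input are only for illustration.

```shell
#!/bin/sh
# Wrapper (hypothetical name) around the same helper functions as rss-parser.awk.
clean_titles() {
    awk '
    function removeFirstTag(str) { sub("[^>]*>", "", str); return str }
    function extractCdata(str) { gsub(".*<!\\[CDATA\\[|]]>.*", "", str); return str }
    function cleanup(str) { return extractCdata(removeFirstTag(str)) }
    { print cleanup($0) }
    '
}

# Each input line starts with an opening tag, as it does after the sed step
# in rss-parser.sh has put every tag on its own line.
printf '%s\n' '<title>Plain title' '<title><![CDATA[Wrapped title]]>' | clean_titles
```

Both lines should come out as bare text, since removeFirstTag drops everything up to the first ">" and extractCdata strips the CDATA wrapper if there is one.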

The file rss-parser.cfg consists of lines like the following:

http://thatonesite.com/feed-with-nice-title
http://thatothersite.com/feed=Specific feed title is better
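
Each line is either a bare URL or URL=title. The sketch below shows how such a line can be split with shell parameter expansion, the same way rss-parser.sh does it; the function name split_cfg_line is made up for this example.

```shell
#!/bin/sh
# Split one config line into URL and optional feed title.
split_cfg_line() {
    line="$1"
    url=${line%%=*}    # everything before the first "="
    name=${line#*=}    # everything after the first "="
    # Without any "=", both expansions return the whole line: no title given.
    if [ "$name" = "$url" ]; then
        name=""
    fi
    printf 'url=%s name=%s\n' "$url" "$name"
}

split_cfg_line "http://thatonesite.com/feed-with-nice-title"
split_cfg_line "http://thatothersite.com/feed=Specific feed title is better"
```

One consequence of this format is that the split happens at the first "=", so a feed URL that itself contains "=" (for example in a query string) would be cut short and would need a different separator.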

rss-parser.sh

#!/bin/sh

parse() {
    url="$1"
    name="$2"
    echo "Start: $name ($url)"
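    wget -q -U "$userAgent" -O rss-news.xml "$url"
    # Strip carriage returns, then put every tag on its own line so the
    # awk script can match one tag per line.
    tr -d '\r' < rss-news.xml | \
        sed -e 's/<\([a-z/]\)/\n<\1/g' | \
        $binPath/awk \
            -v "feedTitle=$name" \
            -v "binPath=$binPath" \
            -v "outDir=./data" \
            -f rss-parser.awk | \
        more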
}

userAgent="Opera/9.80 (Windows NT 6.1; Win64; x64; U; de) Presto/2.10.289 Version/12.01"
binPath=/bin
cd "`dirname "$0"`"
if [ -z "$1" ]; then
    echo "================= `$binPath/date +"%d.%m.%Y %H:%M"`"
    start=`$binPath/date +%s`
    while read line
    do
        if [ ! -z "$line" -a "${line:0:1}" != "#" ]; then
            url=${line%%=*}
            name=${line#*=}
            if [ "$name" == "$url" ]; then
                name=""
            fi
            parse "$url" "$name"
        fi
    done < "rss-parser.cfg"
    # Elapsed time via shell arithmetic; does not depend on date -d understanding relative times.
    end=`$binPath/date +%s`
    echo "================= Took $((end - start)) seconds"
else
    parse "$1" "$2"
fi

Without arguments, it parses rss-parser.cfg. Otherwise the arguments are:

./rss-parser.sh [<URL> [<feed name>]]
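
The awk parser matches one tag per line, which is why the script runs the downloaded XML through sed first. Here is a minimal sketch of that splitting step on a made-up one-line feed fragment; the function name split_tags is hypothetical, and "\n" in the replacement is a GNU/BusyBox sed extension rather than POSIX.

```shell
#!/bin/sh
# Insert a newline before every "<" that is followed by a lowercase letter or "/".
split_tags() {
    sed -e 's/<\([a-z/]\)/\n<\1/g'
}

# Made-up feed fragment; the real input is the file fetched by wget.
printf '%s\n' '<item><title>Hello</title><link>http://example.com/1</link></item>' | split_tags
```

Since the pattern only matches lowercase letters and "/", a CDATA marker ("<![CDATA[") stays on the line of its enclosing tag, which is what the cleanup function in the awk script relies on. Tags that start with an uppercase letter would not be split by this pattern.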

I hope it is useful to someone...