How can I highlight text in a PDF and extract it with links back to the source?

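For my research, I want to highlight text and have it copied automatically as a note, together with a link to the exact location in the PDF document. Sente and Skim both copy a snippet to the notes section when text in a PDF is highlighted. However, once the snippets are pasted into another program, they no longer have links. Papers2 also lets me extract notes, but again without links. All of these programs also add unnecessary headings and extra metadata to each note.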

Automator cannot even extract annotations from Preview correctly.

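The most important requirement is that the notes I paste or extract must link to a location inside the PDF. Which program or script would let me do this?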

Answer 1

Open AppleScript Editor and save this script as an application (choose Application as the file format) at /Applications/skimnoteopener.app:

on open location u
    -- Split the URL on "=" and "&"; the values are then text items 2, 4, 6 and 8
    set text item delimiters to {"=", "&"}
    -- Percent-decode the file path by turning each %XX into \xXX for printf
    do shell script "x=" & quoted form of text item 2 of u & ";printf \"${x//\\%/\\x}\""
    set f to POSIX file result
    set p to (text item 4 of u as integer) -- page number
    set s to (text item 6 of u as integer) -- first character of the range
    set e to (text item 8 of u as integer) -- last character of the range
    tell application "Skim"
        open f
        tell document 1
            set current page to page p
            set selection to characters s thru e of text of page p
        end tell
        activate
    end tell
end open location

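Then run this command in Terminal:

defaults write /Applications/skimnoteopener.app/Contents/Info.plist CFBundleURLTypes '({CFBundleURLName=skimnoteopener;CFBundleURLSchemes=(skimnoteopener);})'

The application should be registered immediately as the default handler for the skimnoteopener URL scheme.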
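To see how the handler takes a link apart, here is a minimal Python sketch of the same parsing. It is only an illustration, not part of the setup: the function name parse_link and the sample path are hypothetical, and urllib's unquote stands in for the printf trick used in the AppleScript.

```python
import re
from urllib.parse import unquote

def parse_link(url):
    """Split a skimnoteopener:// link the way the AppleScript handler does."""
    # Splitting on "=" and "&" puts the values at text items 2, 4, 6 and 8
    # (1-based in AppleScript), which are indices 1, 3, 5 and 7 here.
    items = re.split(r"[=&]", url)
    path = unquote(items[1])  # undo the %XX encoding of the file path
    page, start, end = (int(items[i]) for i in (3, 5, 7))
    return path, page, start, end

# Hypothetical example link
print(parse_link("skimnoteopener://?file=%2FUsers%2Fme%2Fa%20b%2Epdf&page=3&start=10&end=20"))
```

Splitting on "=" and "&" is safe only because the exporting scripts percent-encode every non-alphanumeric character in the path, so neither delimiter can appear inside the file value.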

You can then use this script to export the highlight notes:

do shell script "osascript -e 'tell application \"Skim\"
selection of (notes of document 1 where (its type is highlight note))
end'|tr , \\\\n|awk '{print $2,$4}'"
set ranges to paragraphs of result

set out to ""
tell application "Skim"
    set f to do shell script "ruby -e 'print ARGV[0].gsub(/[^A-Za-z0-9]/){\"%%%02X\"%$&.ord}' " & quoted form of POSIX path of (get file of document 1)
    set i to 1
    repeat with n in (notes of document 1 where (its type is highlight note))
        set {s, e} to words of item i of ranges
        set p to index of page of n
        set out to out & "<a href=skimnoteopener://?file=" & f & "&amp;page=" & p & "&amp;start=" & s & "&amp;end=" & e
        set out to out & ">" & p & "</a> " & my escapexml(text of n) & "<br>" & linefeed
        set i to i + 1
    end repeat
end tell

do shell script "printf %s " & quoted form of out & "|textutil -inputencoding UTF-8 -format html -convert rtf -stdin -stdout|LC_CTYPE=UTF-8 pbcopy"

on replace(input, search, replace)
    set text item delimiters to search
    set ti to text items of input
    set text item delimiters to replace
    ti as text
end replace

on escapexml(input)
    replace(replace(replace(input, "&", "&amp;"), "<", "&lt;"), ">", "&gt;")
end escapexml

This script copies the notes to the clipboard as rich text. To save them to an RTF file instead, replace -stdout|LC_CTYPE=UTF-8 pbcopy with -output /path/to/file.rtf.
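The link-building part of the export script can be sketched in Python as follows. This is a hedged illustration rather than a replacement for the AppleScript: the names parse_range, encode_path and build_link are hypothetical, and the sample range string assumes Skim reports a selection as text of the form "characters 10 thru 20 of page 3 of document 1", which is what the awk fields $2 and $4 imply. Unlike the Ruby one-liner, the sketch encodes non-ASCII characters byte by byte as UTF-8.

```python
import string

# Bytes that are left unencoded: ASCII letters and digits only
SAFE = set((string.ascii_letters + string.digits).encode("ascii"))

def parse_range(spec):
    """Return (start, end) from 'characters S thru E of ...' (fields 2 and 4)."""
    words = spec.split()
    return int(words[1]), int(words[3])

def encode_path(path):
    """Percent-encode every byte that is not an ASCII letter or digit."""
    return "".join(chr(b) if b in SAFE else "%%%02X" % b
                   for b in path.encode("utf-8"))

def build_link(path, page, start, end):
    """Assemble a skimnoteopener:// URL in the layout the handler expects."""
    return "skimnoteopener://?file=%s&page=%d&start=%d&end=%d" % (
        encode_path(path), page, start, end)

# Hypothetical sample values
start, end = parse_range("characters 10 thru 20 of page 3 of document 1")
print(build_link("/Users/me/a b.pdf", 3, start, end))
```

Encoding everything except letters and digits is stricter than a URL needs, but it guarantees that "=" and "&" never occur inside the file value, which is what lets the handler split the link with simple delimiters.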

Here is another script, which copies the text currently selected in Skim as a link:

tell application "Skim"
    set f to POSIX path of (get file of document 1)
    set p to index of current page of document 1
    set t to selection of document 1 as text
end tell
tell (do shell script "osascript -e 'tell app \"Skim\" to selection of document 1'")
    set s to word 2
    set e to word 4
end tell
do shell script "printf %s \"<a href=skimnoteopener://?file=$(ruby -e 'print ARGV[0].gsub(/[^A-Za-z0-9]/){\"%%%02X\"%$&.ord}' " & quoted form of f & ")&page=" & p & "&start=" & s & "&end=" & e & ">$(printf %s " & quoted form of t & "|sed 's/&/\\&amp;/g;s/</\\&lt;/g;s/>/\\&gt;/g')</a>\"|textutil -inputencoding UTF-8 -format html -convert rtf -stdin -stdout|LC_CTYPE=UTF-8 pbcopy"
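The HTML that this script hands to textutil can be sketched in Python as below. It is an illustrative assumption-laden sketch: html_link is a hypothetical name, the sample URL is made up, and the href is quoted and its ampersands escaped here, whereas the shell version leaves the attribute unquoted.

```python
from html import escape

def html_link(url, text):
    """Wrap text in an anchor pointing at url, escaping both for HTML."""
    # quote=False escapes only &, < and >, like the sed expression in the script
    body = escape(text, quote=False)
    # The attribute value is quoted, so quotes in the URL are escaped as well
    href = escape(url, quote=True)
    return '<a href="%s">%s</a>' % (href, body)

# Hypothetical sample link and selected text
print(html_link("skimnoteopener://?file=%2Fa%2Epdf&page=1&start=5&end=9", "x < y & z"))
```

Escaping the selected text matters because a stray "<" or "&" in the PDF text would otherwise be read as markup when the HTML is converted to RTF.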
