Is there a way to detect whether a PDF is an Adobe Portfolio PDF?


I'm trying to prevent users from uploading these files. They aren't really a different file type, since technically they are still PDFs.

I tried pdfinfo:

$ pdfinfo portfolio-sample.pdf 
Title:          Sample PDF Portfolio
Subject:        Adobe Acrobat XI
Keywords:       adobe, acrobat, xi, pdf, portfolio, sample
Creator:        Adobe Acrobat Pro 10.1.3
Producer:       Adobe Acrobat Pro 10.1.3
CreationDate:   Thu Jun 21 15:03:15 2012 EDT
ModDate:        Fri Sep 28 17:49:50 2012 EDT
Tagged:         yes
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          1
Encrypted:      no
Page size:      504 x 360 pts
Page rot:       0
File size:      3600732 bytes
Optimized:      no
PDF version:    1.7

and exiftool:

$ exiftool -a -G1 portfolio-sample.pdf 
[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : portfolio-sample.pdf
[System]        Directory                       : .
[System]        File Size                       : 3.4 MB
[System]        File Modification Date/Time     : 2019:08:05 15:23:05-04:00
[System]        File Access Date/Time           : 2019:08:05 15:25:41-04:00
[System]        File Inode Change Date/Time     : 2019:08:05 15:23:10-04:00
[System]        File Permissions                : rw-rw-r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.7
[PDF]           Linearized                      : No
[PDF]           Create Date                     : 2012:06:21 15:03:15-04:00
[PDF]           Creator                         : Adobe Acrobat Pro 10.1.3
[PDF]           Keywords                        : adobe, acrobat, xi, pdf, portfolio, sample
[PDF]           Modify Date                     : 2012:09:28 17:49:50-04:00
[PDF]           Producer                        : Adobe Acrobat Pro 10.1.3
[PDF]           Subject                         : Adobe Acrobat XI
[PDF]           Title                           : Sample PDF Portfolio
[PDF]           Language                        : en
[PDF]           Tagged PDF                      : Yes
[PDF]           Page Count                      : 1
[XMP-x]         XMP Toolkit                     : Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03
[XMP-xmp]       Modify Date                     : 2012:09:28 17:49:50-04:00
[XMP-xmp]       Create Date                     : 2012:06:21 15:03:15-04:00
[XMP-xmp]       Metadata Date                   : 2012:09:28 17:49:50-04:00
[XMP-xmp]       Creator Tool                    : Adobe Acrobat Pro 10.1.3
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Title                           : Sample PDF Portfolio
[XMP-dc]        Creator                         : 
[XMP-dc]        Description                     : Adobe Acrobat XI
[XMP-dc]        Subject                         : adobe, acrobat, xi, pdf, portfolio, sample
[XMP-xmpMM]     Document ID                     : uuid:2d7598db-3b0a-4510-bc0a-4ac1c570a3fa
[XMP-xmpMM]     Instance ID                     : uuid:153f73de-3b2a-4d04-ab31-bb46ec3a5b79
[XMP-pdf]       Producer                        : Adobe Acrobat Pro 10.1.3
[XMP-pdf]       Keywords                        : adobe, acrobat, xi, pdf, portfolio, sample

but neither output shows anything that marks the file as an Adobe Portfolio PDF.

Answer 1

You can use the Python module python-poppler and check whether the document has embedded files:

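from poppler import load_from_file

pdf_document = load_from_file("portfolio-sample.pdf")

# A portfolio stores its component files as embedded files, so this check flags it,
# but it also flags ordinary PDFs that merely carry attachments.
if pdf_document.has_embedded_files():
    print("PDF has embedded files (possibly an Adobe Portfolio)")
else:
    print("PDF has no embedded files, so it is unlikely to be a portfolio")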
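Because has_embedded_files() is also true for any PDF with ordinary attachments, a narrower check may help. In the PDF 1.7 specification (ISO 32000-1), what Acrobat calls a Portfolio is a "collection", signalled by an optional /Collection entry in the document catalog. pdfinfo and exiftool do not show it because they report the Info dictionary and XMP metadata, not the catalog. Below is a minimal sketch, assuming the third-party pypdf package (pip install pypdf), which the answer above does not use. The helper names catalog_is_portfolio and is_pdf_portfolio are hypothetical.

```python
"""Sketch: flag PDF portfolios by looking for /Collection in the document catalog.

Assumes the third-party pypdf package for parsing files; the function names
are illustrative, not part of any library.
"""
from collections.abc import Mapping
from typing import Any


def catalog_is_portfolio(catalog: Mapping[str, Any]) -> bool:
    # A portfolio (a "collection" in the PDF spec) has a /Collection entry in
    # the document catalog. Embedded files alone do not make a portfolio.
    return "/Collection" in catalog


def is_pdf_portfolio(path: str) -> bool:
    # Imported lazily so catalog_is_portfolio() can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Indexing the trailer resolves the indirect /Root reference to the catalog dict.
    catalog = reader.trailer["/Root"]
    return catalog_is_portfolio(catalog)
```

For example, is_pdf_portfolio("portfolio-sample.pdf") can be called in an upload handler, and the upload rejected when it returns True. Keeping the catalog test in a separate function makes it easy to reuse with another parser that exposes the catalog as a dictionary.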
    
    
