How do I make the video and audio timecodes start at zero?

2024-6-18

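This question probably requires some knowledge of the matroska format. When I mux video and audio together, the audio packets start at timecode 0 and the video starts at timecode 7. Should 30 fps video start at timecode 0, or at timecode 33 (for time 0.033 s; timecodes are in milliseconds)? Can I make both audio and video start at timecode 0?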
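For reference, here is a minimal sketch of where 30 fps frames would land on a millisecond timecode grid if the first frame sat at 0. The helper name `frame_timecode_ms` and the round-half-up rule are assumptions made for illustration; the muxer's actual rounding may differ.

```python
from fractions import Fraction

def frame_timecode_ms(n, fps=30):
    """Millisecond timecode of frame n, rounding half up (an assumed rounding rule)."""
    exact = Fraction(n * 1000, fps)
    return int(exact + Fraction(1, 2))  # floor(x + 0.5) for non-negative x

timecodes = [frame_timecode_ms(n) for n in range(5)]
print(timecodes)               # [0, 33, 67, 100, 133]
print(frame_timecode_ms(90))   # 3000, the frame at the 3-second mark
```

Under these assumptions the first frame would be at 0 and the second at 33, so a first video timecode of 7 looks like an offset added on top of this grid.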

You don't have to read what follows; the rest is just additional information.

I tried adding the filters [v]setpts=N/(30*TB)[v];[a]asetpts=N/SR/TB[a] as the last two filters, but nothing changed.
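To make the two expressions concrete, the sketch below evaluates them in Python. In setpts, N is the frame index and TB is the time base in seconds; in asetpts, N is the number of samples consumed so far and SR is the sample rate. The function names and the millisecond time base are assumptions for illustration only.

```python
from fractions import Fraction

def video_pts(n, tb, fps=30):
    """setpts=N/(30*TB): N is the frame index, TB the time base in seconds."""
    return Fraction(n) / (fps * tb)

def audio_pts(samples_consumed, tb, sample_rate=48000):
    """asetpts=N/SR/TB: N is the number of samples consumed so far."""
    return Fraction(samples_consumed) / sample_rate / tb

tb = Fraction(1, 1000)  # assumed millisecond time base
print(video_pts(0, tb), video_pts(3, tb))    # 0 100
print(audio_pts(0, tb), audio_pts(960, tb))  # 0 20 (960 samples = 20 ms at 48 kHz)
```

Both expressions already give 0 for the first frame and the first sample, which would explain why adding them changes nothing if the 7 ms offset is introduced after the filter graph.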

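This is the full command I use. It does a two-pass vp8 encode of the video and a one-pass opus encode of the audio. It also cuts several 3-second clips out of the video and concatenates them.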

ffmpeg -threads 1 -i 480P_600K_71149981.mp4 -force_key_frames 00:00:03.000 -filter_complex [0:v]scale=320x180:force_original_aspect_ratio=decrease,fps=30[vid];[vid]split[vid][vid_copy];[vid_copy]trim=start=0:duration=3,setpts=PTS-STARTPTS[c0v];[0:a]atrim=start=0:duration=3,asetpts=PTS-STARTPTS[c0a];[vid]split[vid][vid_copy];[vid_copy]trim=start=90:duration=3,setpts=PTS-STARTPTS[c1v];[0:a]atrim=start=90:duration=3,asetpts=PTS-STARTPTS[c1a];[vid]split[vid][vid_copy];[vid_copy]trim=start=180:duration=3,setpts=PTS-STARTPTS[c2v];[0:a]atrim=start=180:duration=3,asetpts=PTS-STARTPTS[c2a];[vid]trim=start=270:duration=3,setpts=PTS-STARTPTS[c3v];[0:a]atrim=start=270:duration=3,asetpts=PTS-STARTPTS[c3a];[c0v][c0a][c1v][c1a][c2v][c2a][c3v][c3a]concat=n=4:v=1:a=1[v][a];[v]setpts=N/(30*TB)[v];[a]asetpts=N/SR/TB[a] -map [v] -map [a] -c:v vp8 -b:v 200k -crf 54 -profile:v 1 -an -pass 1 -passlogfile jump_passlogfile -f null NUL


ffmpeg -threads 1 -i 480P_600K_71149981.mp4 -force_key_frames 00:00:03.000 -filter_complex [0:v]scale=320x180:force_original_aspect_ratio=decrease,fps=30[vid];[vid]split[vid][vid_copy];[vid_copy]trim=start=0:duration=3,setpts=PTS-STARTPTS[c0v];[0:a]atrim=start=0:duration=3,asetpts=PTS-STARTPTS[c0a];[vid]split[vid][vid_copy];[vid_copy]trim=start=90:duration=3,setpts=PTS-STARTPTS[c1v];[0:a]atrim=start=90:duration=3,asetpts=PTS-STARTPTS[c1a];[vid]split[vid][vid_copy];[vid_copy]trim=start=180:duration=3,setpts=PTS-STARTPTS[c2v];[0:a]atrim=start=180:duration=3,asetpts=PTS-STARTPTS[c2a];[vid]trim=start=270:duration=3,setpts=PTS-STARTPTS[c3v];[0:a]atrim=start=270:duration=3,asetpts=PTS-STARTPTS[c3a];[c0v][c0a][c1v][c1a][c2v][c2a][c3v][c3a]concat=n=4:v=1:a=1[v][a];[v]setpts=N/(30*TB)[v];[a]asetpts=N/SR/TB[a] -map [v] -map [a] -c:v vp8 -b:v 200k -crf 54 -profile:v 1 -c:a libopus -b:a 32k -vbr on -compression_level 7 -ac 1 -ar 48000 -pass 2 -passlogfile jump_passlogfile -f webm -reserve_index_space 512 480P_600K_71149981_vthumb.webm

Here is 480P_600K_71149981.mp4, the file I am testing with.

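Here is the result, 480P_600K_71149981_vthumb.webm, and its text dump created with libwebm's webm_parser_demo.exe (slightly modified to print hex instead of decimal). The interesting parts of the text dump can be found by searching for "timecode:" and "Cluster".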

  Cluster                            header: [309, 311)  body: [311, 3487)
    Timecode: 0
    SimpleBlock                      header: [31a, 31c)  body: [31c, 323)
      track number: 2
      frames: 1
      timecode: 0
      lacing: 0 (none)
      flags: visible, key frame
      frame byte range: [320, 323)
    SimpleBlock                      header: [323, 325)  body: [325, 395)
      track number: 1
      frames: 1
      timecode: 7
      lacing: 0 (none)
      flags: visible, key frame
      frame byte range: [329, 395)
    SimpleBlock                      header: [395, 397)  body: [397, 39e)
      track number: 2
      frames: 1
      timecode: 15
      lacing: 0 (none)
      flags: visible, key frame
      frame byte range: [39b, 39e)
    SimpleBlock                      header: [39e, 3a0)  body: [3a0, 3bb)
      track number: 1
      frames: 1
      timecode: 28
      lacing: 0 (none)
      flags: visible
      frame byte range: [3a4, 3bb)
    SimpleBlock                      header: [3bb, 3bd)  body: [3bd, 3c4)
      track number: 2
      frames: 1
      timecode: 29
      lacing: 0 (none)
      flags: visible, key frame
      frame byte range: [3c1, 3c4)
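Because the dump tool was modified to print hex, the block timecodes above are most likely hex as well: 15 and 29 read as 0x15 = 21 and 0x29 = 41, which matches 20 ms opus frames. The sketch below pulls the per-track timecodes out of such a dump and converts them to decimal. The function name `block_timecodes` is hypothetical, and the hex interpretation is an assumption.

```python
import re
from collections import defaultdict

def block_timecodes(dump_text):
    """Map track number -> SimpleBlock timecodes in decimal ms, assuming hex input."""
    result = defaultdict(list)
    track = None
    for line in dump_text.splitlines():
        m = re.match(r"\s*track number:\s*([0-9a-fA-F]+)", line)
        if m:
            track = int(m.group(1), 16)
            continue
        # Lowercase "timecode:" is the block value; "Timecode:" belongs to the Cluster.
        m = re.match(r"\s*timecode:\s*([0-9a-fA-F]+)", line)
        if m and track is not None:
            result[track].append(int(m.group(1), 16))
            track = None
    return dict(result)

sample = """\
    Timecode: 0
      track number: 2
      timecode: 0
      track number: 1
      timecode: 7
      track number: 2
      timecode: 15
      track number: 1
      timecode: 28
      track number: 2
      timecode: 29
"""
print(block_timecodes(sample))  # {2: [0, 21, 41], 1: [7, 40]}
```

Block timecodes are relative to the enclosing Cluster timecode, which is 0 here, so the values can be read as absolute milliseconds for this cluster.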

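The source and destination videos have different fps, which might matter. There are setpts=PTS-STARTPTS filters, but strangely they do not affect the starting presentation timestamp.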

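This matters to me because I want to make sure -force_key_frames creates the key frame at the right time; later I am going to use the timecode to leave only 1 cue in the file. I just want to get this right and learn about video formats while I have the chance. I will write a video player for these files later, and I need them to be as easy to read and render as possible.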

With video only or audio only, the timecodes correctly start at 0. I tested with this command:

ffmpeg -i 480P_600K_71149981.mp4 -t 3 -an -reserve_index_space 512 out.webm

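Also, I wonder whether it is possible to have the video frame come before the audio frame, which seems like a more natural way to store them, although I am not sure about that. This may resolve itself once the timestamp issue is fixed.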

If the solution requires changing something in ffmpeg and recompiling, I am fine with that. I just need to find out what to change first.

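However, I don't know whether I should keep the audio ahead of the video and use -audio_preload. To seek in opus audio I have to replay SeekPreRoll milliseconds anyway (80 ms in my case). I don't care about that here: I am not going to seek within a 12-second video, I only need to quickly grab the key frame at the 3rd second as a thumbnail. So -audio_preload is not what I need here. There is also some further, rather confusing, information about opus muxing in matroska, which is probably not useful in this case: https://wiki.xiph.org/MatroskaOpus.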

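Wait, I think I have figured out why the video is delayed by 7 ms. With opus, the decoder has to decode CodecDelay worth of audio (6.5 ms here) and discard it before any audio can be heard, just to initialize the decoder. Hmm, now I wonder whether I can give this discarded audio a negative timestamp. According to https://www.matroska.org/technical/specs/index.html#simpleblock_structure the timecode here is a signed int16, so I probably can. But how do I tell ffmpeg to do that...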
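As a rough illustration of the two points above, the sketch below rounds a 6.5 ms CodecDelay (stored in nanoseconds in Matroska) to the millisecond grid and packs a SimpleBlock header with a negative relative timecode. The helper names are hypothetical, the round-half-up rule is an assumption, and the sketch only handles track numbers that fit in a 1-byte EBML integer.

```python
import struct

def codec_delay_ms(codec_delay_ns):
    """Round CodecDelay (nanoseconds) to whole milliseconds, half up (assumed)."""
    return (codec_delay_ns + 500_000) // 1_000_000

def simpleblock_header(track, rel_timecode, keyframe=True):
    """Build the 4-byte SimpleBlock header for a track number below 127."""
    if not 1 <= track < 127:
        raise ValueError("this sketch only handles 1-byte EBML track numbers")
    flags = 0x80 if keyframe else 0x00
    # 0x80 | track: track number as a 1-byte EBML variable-length integer.
    # ">h": big-endian signed 16-bit timecode, relative to the Cluster timecode.
    return struct.pack(">BhB", 0x80 | track, rel_timecode, flags)

delay = codec_delay_ms(6_500_000)
print(delay)                                # 7
print(simpleblock_header(2, -delay).hex())  # 82fff980
```

This only shows that the field can carry a negative offset. Whether a muxer or player accepts a block whose absolute time falls before 0 is not something the sketch establishes.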

Edit 1: here is the ffprobe output:

ffprobe 480P_600K_71149981_vthumb.webm -show_packets -select_streams v -read_intervals %+#5 -v 0

[PACKET]
codec_type=video
stream_index=0
pts=7
pts_time=0.007000
dts=7
dts_time=0.007000
duration=33
duration_time=0.033000
convergence_duration=N/A
convergence_duration_time=N/A
size=108
pos=897
flags=K_
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=40
pts_time=0.040000
dts=40
dts_time=0.040000
duration=33
duration_time=0.033000
convergence_duration=N/A
convergence_duration_time=N/A
size=23
pos=1020
flags=__
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=74
pts_time=0.074000
dts=74
dts_time=0.074000
duration=33
duration_time=0.033000
convergence_duration=N/A
convergence_duration_time=N/A
size=23
pos=1067
flags=__
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=107
pts_time=0.107000
dts=107
dts_time=0.107000
duration=33
duration_time=0.033000
convergence_duration=N/A
convergence_duration_time=N/A
size=23
pos=1114
flags=__
[/PACKET]
[PACKET]
codec_type=video
stream_index=0
pts=140
pts_time=0.140000
dts=140
dts_time=0.140000
duration=33
duration_time=0.033000
convergence_duration=N/A
convergence_duration_time=N/A
size=23
pos=1152
flags=__
[/PACKET]

ffprobe 480P_600K_71149981_vthumb.webm -show_packets -select_streams a -read_intervals %+#5 -v 0

[PACKET]
codec_type=audio
stream_index=1
pts=-7
pts_time=-0.007000
dts=-7
dts_time=-0.007000
duration=20
duration_time=0.020000
convergence_duration=N/A
convergence_duration_time=N/A
size=3
pos=888
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=1
pts=14
pts_time=0.014000
dts=14
dts_time=0.014000
duration=20
duration_time=0.020000
convergence_duration=N/A
convergence_duration_time=N/A
size=3
pos=1011
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=1
pts=34
pts_time=0.034000
dts=34
dts_time=0.034000
duration=20
duration_time=0.020000
convergence_duration=N/A
convergence_duration_time=N/A
size=3
pos=1049
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=1
pts=54
pts_time=0.054000
dts=54
dts_time=0.054000
duration=20
duration_time=0.020000
convergence_duration=N/A
convergence_duration_time=N/A
size=3
pos=1058
flags=K_
[/PACKET]
[PACKET]
codec_type=audio
stream_index=1
pts=74
pts_time=0.074000
dts=74
dts_time=0.074000
duration=20
duration_time=0.020000
convergence_duration=N/A
convergence_duration_time=N/A
size=3
pos=1096
flags=K_
[/PACKET]
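To compare the first timestamps of both streams without reading the listings by eye, a small parser over ffprobe's default [PACKET] output could look like the sketch below. The function name `first_pts` is hypothetical, and the sketch assumes the key=value layout shown above.

```python
def first_pts(ffprobe_text):
    """Return {codec_type: first integer pts} from ffprobe's [PACKET] sections."""
    first = {}
    fields = {}
    for line in ffprobe_text.splitlines():
        line = line.strip()
        if line == "[PACKET]":
            fields = {}
        elif line == "[/PACKET]":
            kind = fields.get("codec_type")
            pts = fields.get("pts", "")
            # Skip non-numeric values such as "N/A".
            if kind and kind not in first and pts.lstrip("-").isdigit():
                first[kind] = int(pts)
        elif "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value
    return first

sample = """\
[PACKET]
codec_type=video
pts=7
[/PACKET]
[PACKET]
codec_type=video
pts=40
[/PACKET]
[PACKET]
codec_type=audio
pts=-7
[/PACKET]
"""
print(first_pts(sample))  # {'video': 7, 'audio': -7}
```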

It is really strange that the first video timestamp is 7 while the first audio timestamp is -7.

In matroska (checked with webm_parser_demo.exe), the video timestamp is 7 and the audio timestamp is 0.

My goal is to make the first video timestamp 0 and the first audio timestamp -7 (all of the CodecDelay data).
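Putting the two views side by side, the numbers are consistent with ffprobe reporting the stored audio timecodes minus the rounded CodecDelay. The check below is only arithmetic on the values quoted in this post, and it assumes the dump timecodes are hex and that the 6.5 ms delay is rounded to 7 ms.

```python
CODEC_DELAY_MS = 7  # 6.5 ms CodecDelay rounded to the millisecond grid (assumption)

matroska_audio = [0x00, 0x15, 0x29]  # audio block timecodes from the dump, read as hex
ffprobe_audio = [-7, 14, 34]         # first audio pts values reported by ffprobe

shifted_audio = [t - CODEC_DELAY_MS for t in matroska_audio]
print(shifted_audio == ffprobe_audio)  # True

# The stated goal amounts to storing every timecode 7 ms earlier.
matroska_video = [0x07, 0x28]        # video block timecodes from the dump, read as hex
target_video = [t - CODEC_DELAY_MS for t in matroska_video]
print(target_video)                    # [0, 33]
```

If that reading is right, the audio already plays from -7 as far as ffprobe is concerned, and only the stored video timecodes carry the extra 7 ms.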

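Edit 2: I looked at some webm files with vorbis audio, and they do not start at 0 either. In them the audio starts at 0 and the video starts at 3. I don't know what is going on there; is that CodecDelay too?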
