我有一个 .MOV 文件,我想将其声音标准化。但是,当我运行以下命令时,我发现整个视频的音量都在变化(从单声道到立体声,并且 max_volume 级别各不相同):
ffmpeg -i video.avi -af "volumedetect" -f null /dev/null
我想规范化整个文件,但不知道怎么做。输出如下:
ffmpeg version N-50911-g9efcfbe Copyright (c) 2000-2013 the FFmpeg developers built on Mar 13 2013
21:26:48 with gcc 4.7.2 (GCC) configuration: --enable-gpl
--enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfi g --enable-frei0r --enable-gnutls --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libg sm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --ena ble-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-lib twolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxa vs --enable-libxvid --enable-zlib libavutil 52. 19.100 / 52. 19.100 libavcodec 55. 0.100 / 55. 0.100 libavformat 55. 0.100 / 55. 0.100 libavdevice 54. 4.100 / 54. 4.100 libavfilter 3. 45.103 /
3. 45.103 libswscale 2. 2.100 / 2. 2.100 libswresample 0. 17.102 / 0. 17.102 libpostproc 52. 2.100 / 52. 2.100 [mov,mp4,m4a,3gp,3g2,mj2 @ 020d9920] multiple edit list entries, a/v
desync might occur, patch welcome Input #0, mov,mp4,m4a,3gp,3g2,mj2,
from '.\IMG_2783.mov': Metadata:
major_brand : qt
minor_version : 0
compatible_brands: qt
creation_time : 2014-08-20 02:36:31 Duration: 00:06:06.43, start: 0.000000, bitrate: 10206 kb/s
Stream #0:0(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 78 kb/s
Metadata:
creation_time : 2014-08-20 02:36:32
handler_name : Core Media Data Handler
Stream #0:1(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720, 10116 kb/s, 30 fps, 30 tbr, 600 tbn, 12 00 tbc
Metadata:
creation_time : 2014-08-20 02:36:32
handler_name : Core Media Data Handler Output #0, null, to '/dev/nullclear': Metadata:
major_brand : qt
minor_version : 0
compatible_brands: qt
encoder : Lavf55.0.100
Stream #0:0(und): Video: rawvideo (I420 / 0x30323449), yuv420p, 1280x720, q=2-31, 200 kb/s, 90k tbn, 30 tbc
Metadata:
creation_time : 2014-08-20 02:36:32
handler_name : Core Media Data Handler
Stream #0:1(und): Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Metadata:
creation_time : 2014-08-20 02:36:32
handler_name : Core Media Data Handler Stream mapping: Stream #0:1 -> #0:0 (h264 -> rawvideo) Stream #0:0 -> #0:1 (aac ->
pcm_s16le) Press [q] to stop, [?] for help [null @ 03e023a0] Encoder
did not produce proper pts, making some up. Input stream #0:0 frame
changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100
fmt:fltp ch:1 chl:mono [Parsed_volumedetect_0 @ 03e02ea0] n_samples:
3346432 [Parsed_volumedetect_0 @ 03e02ea0] mean_volume: -36.2 dB
[Parsed_volumedetect_0 @ 03e02ea0] max_volume: -0.0 dB
[Parsed_volumedetect_0 @ 03e02ea0] histogram_0db: 8
[Parsed_volumedetect_0 @ 03e02ea0] histogram_1db: 18
[Parsed_volumedetect_0 @ 03e02ea0] histogram_2db: 38
[Parsed_volumedetect_0 @ 03e02ea0] histogram_3db: 58
[Parsed_volumedetect_0 @ 03e02ea0] histogram_4db: 94
[Parsed_volumedetect_0 @ 03e02ea0] histogram_5db: 58
[Parsed_volumedetect_0 @ 03e02ea0] histogram_6db: 80
[Parsed_volumedetect_0 @ 03e02ea0] histogram_7db: 196
[Parsed_volumedetect_0 @ 03e02ea0] histogram_8db: 222
[Parsed_volumedetect_0 @ 03e02ea0] histogram_9db: 236
[Parsed_volumedetect_0 @ 03e02ea0] histogram_10db: 404
[Parsed_volumedetect_0 @ 03e02ea0] histogram_11db: 522
[Parsed_volumedetect_0 @ 03e02ea0] histogram_12db: 766
[Parsed_volumedetect_0 @ 03e02ea0] histogram_13db: 738 Input stream
#0:0 frame changed from rate:44100 fmt:fltp ch:1 chl:mono to rate:44100 fmt:fltp ch:2 chl:stereo [Parsed_volumedetect_0 @ 03834340]
n_samples: 626688 [Parsed_volumedetect_0 @ 03834340] mean_volume:
-39.6 dB [Parsed_volumedetect_0 @ 03834340] max_volume: -15.5 dB [Parsed_volumedetect_0 @ 03834340] histogram_15db: 17
[Parsed_volumedetect_0 @ 03834340] histogram_16db: 19
[Parsed_volumedetect_0 @ 03834340] histogram_17db: 25
[Parsed_volumedetect_0 @ 03834340] histogram_18db: 41
[Parsed_volumedetect_0 @ 03834340] histogram_19db: 88
[Parsed_volumedetect_0 @ 03834340] histogram_20db: 231
[Parsed_volumedetect_0 @ 03834340] histogram_21db: 439 Input stream
#0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono [Parsed_volumedetect_0 @ 08f34b00]
n_samples: 192512 [Parsed_volumedetect_0 @ 08f34b00] mean_volume:
-19.7 dB [Parsed_volumedetect_0 @ 08f34b00] max_volume: 0.0 dB [Parsed_volumedetect_0 @ 08f34b00] histogram_0db: 2048 Input stream
#0:0 frame changed from rate:44100 fmt:fltp ch:1 chl:mono to rate:44100 fmt:fltp ch:2 chl:stereo [Parsed_volumedetect_0 @ 08f34aa0]
n_samples: 769024 [Parsed_volumedetect_0 @ 08f34aa0] mean_volume:
-28.2 dB [Parsed_volumedetect_0 @ 08f34aa0] max_volume: -0.1 dB [Parsed_volumedetect_0 @ 08f34aa0] histogram_0db: 15
[Parsed_volumedetect_0 @ 08f34aa0] histogram_1db: 29
[Parsed_volumedetect_0 @ 08f34aa0] histogram_2db: 72
[Parsed_volumedetect_0 @ 08f34aa0] histogram_3db: 102
[Parsed_volumedetect_0 @ 08f34aa0] histogram_4db: 158
[Parsed_volumedetect_0 @ 08f34aa0] histogram_5db: 163
[Parsed_volumedetect_0 @ 08f34aa0] histogram_6db: 299 Input stream
#0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono [Parsed_volumedetect_0 @ 08f34800]
n_samples: 96256 [Parsed_volumedetect_0 @ 08f34800] mean_volume: -65.0
dB [Parsed_volumedetect_0 @ 08f34800] max_volume: -37.3 dB
[Parsed_volumedetect_0 @ 08f34800] histogram_37db: 2
[Parsed_volumedetect_0 @ 08f34800] histogram_38db: 0
[Parsed_volumedetect_0 @ 08f34800] histogram_39db: 2
[Parsed_volumedetect_0 @ 08f34800] histogram_40db: 10
[Parsed_volumedetect_0 @ 08f34800] histogram_41db: 8
[Parsed_volumedetect_0 @ 08f34800] histogram_42db: 4
[Parsed_volumedetect_0 @ 08f34800] histogram_43db: 14
[Parsed_volumedetect_0 @ 08f34800] histogram_44db: 12
[Parsed_volumedetect_0 @ 08f34800] histogram_45db: 12
[Parsed_volumedetect_0 @ 08f34800] histogram_46db: 20
[Parsed_volumedetect_0 @ 08f34800] histogram_47db: 16 Input stream
#0:0 frame changed from rate:44100 fmt:fltp ch:1 chl:mono to rate:44100 fmt:fltp ch:2 chl:stereo [Parsed_volumedetect_0 @ 08f34800]
n_samples: 533504 [Parsed_volumedetect_0 @ 08f34800] mean_volume:
-43.9 dB [Parsed_volumedetect_0 @ 08f34800] max_volume: -23.4 dB [Parsed_volumedetect_0 @ 08f34800] histogram_23db: 47
[Parsed_volumedetect_0 @ 08f34800] histogram_24db: 453
[Parsed_volumedetect_0 @ 08f34800] histogram_25db: 685 Input stream
#0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono [Parsed_volumedetect_0 @ 08f34980]
n_samples: 98304 [Parsed_volumedetect_0 @ 08f34980] mean_volume: -16.4
dB [Parsed_volumedetect_0 @ 08f34980] max_volume: -6.2 dB
[Parsed_volumedetect_0 @ 08f34980] histogram_6db: 28
[Parsed_volumedetect_0 @ 08f34980] histogram_7db: 290 frame=10993
fps=817 q=0.0 Lsize=N/A time=00:06:38.32 bitrate=N/A video:687kB
audio:68764kB subtitle:0 global headers:0kB muxing overhead
-100.000031% [Parsed_volumedetect_0 @ 08f34920] n_samples: 13807616 [Parsed_volumedetect_0 @ 08f34920] mean_volume: -28.3 dB
[Parsed_volumedetect_0 @ 08f34920] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 08f34920] histogram_0db: 188
[Parsed_volumedetect_0 @ 08f34920] histogram_1db: 532
[Parsed_volumedetect_0 @ 08f34920] histogram_2db: 1078
[Parsed_volumedetect_0 @ 08f34920] histogram_3db: 1416
[Parsed_volumedetect_0 @ 08f34920] histogram_4db: 1948
[Parsed_volumedetect_0 @ 08f34920] histogram_5db: 2805
[Parsed_volumedetect_0 @ 08f34920] histogram_6db: 4288
[Parsed_volumedetect_0 @ 08f34920] histogram_7db: 6068
答案1
作为一名读过一些音频的人:
您可能需要通过诸如 atrim、标准化(重放增益以找到峰值)之类的方法分割这些不同的部分,然后与应用了相同处理的其他部分重新拼接在一起来解决这个问题。
如果动态范围低得多,你可以考虑使用诸如 compand 之类的东西来降低动态顶端的动态级别,从而提高整个文件的级别。Compand 可能很有用后无论如何都要正常化。
答案2
我认为你的问题已经得到解答这里。
不过,您的文件似乎有两个立体声音频流……一个是 AAC,另一个是 WAV。选择 WAV 流作为重新编码的源。