Desktop audio lags behind when recording microphone + desktop audio + screen with ffmpeg

I have written a script that uses ffmpeg to record the microphone, the desktop audio, and the screen:

DATE=`which date`
RESO=2560x1440
FPS=30
PRESET=ultrafast
DIRECTORY=$HOME/Video/
FILENAME=videocast`$DATE +%d%m%Y_%H.%M.%S`.mkv

ffmpeg -y -vsync 1 \
-f pulse -ac 2 -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor \
-f pulse -ac 1 -ar 25000 -i alsa_input.usb-0d8c_C-Media_USB_Headphone_Set-00-Set.analog-mono \
-filter_complex aresample=async=1,amix=duration=shortest,apad \
-f x11grab -r $FPS -s $RESO -i :0.0 \
-acodec libvorbis \
-vcodec libx264 -pix_fmt yuv420p -preset $PRESET -threads 0 \
$DIRECTORY$FILENAME

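Everything gets recorded, and the screen and the microphone audio stay in sync with each other, but the desktop audio lags badly.

It starts out in sync but gets worse over time during playback, in ffplay as well. It doesn't matter which application plays the sound: YouTube videos in the browser, desktop sounds, and Rhythmbox (playing a song for a few seconds, then stopping, waiting, and repeating) all go out of sync.

The terminal output complains about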

"ALSA lib pcm.c:7843:(snd_pcm_recover) overrun occurred22.73 bitrate=10384.5kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred" 

and similar messages, but I don't know what they mean.

The full terminal output is here:

ffmpeg version 2.0.1 Copyright (c) 2000-2013 the FFmpeg developers
  built on Aug 11 2013 14:52:28 with gcc 4.8.1 (GCC) 20130725 (prerelease)
  configuration: --prefix=/usr --disable-debug --disable-static --enable-avresample --enable-dxva2 --enable-fontconfig --enable-gpl --enable-libass --enable-libbluray --enable-libfreetype --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-libv4l2 --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxvid --enable-pic --enable-postproc --enable-runtime-cpudetect --enable-shared --enable-swresample --enable-vdpau --enable-version3 --enable-x11grab
  libavutil      52. 38.100 / 52. 38.100
  libavcodec     55. 18.102 / 55. 18.102
  libavformat    55. 12.100 / 55. 12.100
  libavdevice    55.  3.100 / 55.  3.100
  libavfilter     3. 79.101 /  3. 79.101
  libavresample   1.  1.  0 /  1.  1.  0
  libswscale      2.  3.100 /  2.  3.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  3.100 / 52.  3.100
Guessed Channel Layout for  Input Stream #0.0 : stereo
Input #0, pulse, from 'alsa_output.pci-0000_00_1b.0.analog-stereo.monitor':
  Duration: N/A, start: 0.014093, bitrate: 1536 kb/s
    Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Guessed Channel Layout for  Input Stream #1.0 : mono
Input #1, pulse, from 'alsa_input.usb-0d8c_C-Media_USB_Headphone_Set-00-Set.analog-mono':
  Duration: N/A, start: 0.006172, bitrate: 400 kb/s
    Stream #1:0: Audio: pcm_s16le, 25000 Hz, mono, s16, 400 kb/s
[x11grab @ 0x218a6e0] device: :0.0 -> display: :0.0 x: 0 y: 0 width: 2560 height: 1440
[x11grab @ 0x218a6e0] shared memory extension found
Input #2, x11grab, from ':0.0':
  Duration: N/A, start: 1379021580.184321, bitrate: N/A
    Stream #2:0: Video: rawvideo (BGR[0] / 0x524742), bgr0, 2560x1440, -2147483 kb/s, 30 tbr, 1000k tbn, 30 tbc
[libx264 @ 0x21ae560] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0x21ae560] profile Constrained Baseline, level 5.0
[libx264 @ 0x21ae560] 264 - core 133 r2339 585324f - H.264/MPEG-4 AVC codec - Copyleft 2003-2013 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=0:0:0 analyse=0:0 me=dia subme=0 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=0 intra_refresh=0 rc=crf mbtree=0 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=0
Output #0, matroska, to '/home/anders/Video/videocast12092013_23.33.00.mkv':
  Metadata:
    encoder         : Lavf55.12.100
    Stream #0:0: Audio: vorbis (libvorbis) (oV[0][0] / 0x566F), 25000 Hz, mono, fltp
    Stream #0:1: Video: h264 (libx264) (H264 / 0x34363248), yuv420p, 2560x1440, q=-1--1, 1k tbn, 30 tbc
Stream mapping:
  Stream #0:0 (pcm_s16le) -> aresample (graph 0)
  Stream #1:0 (pcm_s16le) -> amix:input1 (graph 0)
  amix (graph 0) -> Stream #0:0 (libvorbis)
  Stream #2:0 -> #0:1 (rawvideo -> libx264)
Press [q] to stop, [?] for help
ALSA lib pcm.c:7843:(snd_pcm_recover) overrun occurred22.73 bitrate=10384.5kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred3.22 bitrate=10423.3kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) overrun occurred25.25 bitrate=11011.0kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred5.76 bitrate=11013.7kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) overrun occurred27.25 bitrate=11175.4kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred7.76 bitrate=11168.7kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred8.24 bitrate=11176.4kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) overrun occurred55.48 bitrate=11243.8kbits/s    
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
frame=12871 fps= 30 q=-1.0 Lsize=  542369kB time=00:07:09.31 bitrate=10349.3kbits/s    
video:539762kB audio:2363kB subtitle:0 global headers:3kB muxing overhead 0.044476%
[libx264 @ 0x21ae560] frame I:52    Avg QP:15.46  size:725888
[libx264 @ 0x21ae560] frame P:12819 Avg QP:18.26  size: 40172
[libx264 @ 0x21ae560] mb I  I16..4: 100.0%  0.0%  0.0%
[libx264 @ 0x21ae560] mb P  I16..4:  2.6%  0.0%  0.0%  P16..4: 18.1%  0.0%  0.0%  0.0%  0.0%    skip:79.3%
[libx264 @ 0x21ae560] coded y,uvDC,uvAC intra: 57.8% 49.8% 25.3% inter: 8.9% 8.7% 2.2%
[libx264 @ 0x21ae560] i16 v,h,dc,p: 23% 29% 32% 16%
[libx264 @ 0x21ae560] i8c dc,h,v,p: 45% 28% 18%  9%
[libx264 @ 0x21ae560] kb/s:10306.26

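Please help me, I'm really close to getting this working!

Update: When I leave out filter_complex and the microphone, the desktop audio still goes out of sync, but by a smaller amount. Using copy instead of libvorbis doesn't change anything either.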

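Answer 1

I'm not sure this will solve your problem, but I have a script that doesn't have this issue. Comparing our two scripts, the only differences I see are:

  • My filter_complex only merges the two audio inputs (amerge)
  • I force the use of 4 threads
  • My audio codec is MP3 (libmp3lame)

I think the change of audio codec is the most relevant difference. I believe some audio codecs are interleaved with the video in a way that keeps them from drifting out of sync. Unfortunately, I'm not a video engineer, so I can't be sure.

Here is my script: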

#!/usr/bin/bash

# video information
INRES="1920x1080"
OUTRES="1280x720"
FPS="24"
QUAL="fast"
FILE_OUT="$1"

#audio information
PULSE_IN="alsa_input.pci-0000_00_1b.0.analog-stereo"
PULSE_OUT="alsa_output.pci-0000_00_1b.0.analog-stereo.monitor"

ffmpeg -f x11grab -s "$INRES" -r "$FPS" -i :0.0 \
    -f pulse -i "$PULSE_IN" -f pulse -i "$PULSE_OUT" \
    -filter_complex amerge \
    -vcodec libx264 -crf 30 -preset "$QUAL" -s "$OUTRES" \
    -acodec libmp3lame -ab 96k -ar 44100 -threads 4 -pix_fmt yuv420p \
    -f flv "$FILE_OUT"

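Answer 2

What is probably happening is that the desktop sound is being captured at the wrong sample rate. This can occur if PulseAudio misreports the sample rate.

After modifying the configuration file as suggested in the linked answer (uncomment the default-sample-rate setting in /etc/pulse/daemon.conf and set it to the correct value, probably 48000), you must restart your user's PulseAudio daemon: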

pulseaudio -k
pulseaudio -D
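
One way to check the result after the restart is to look at the sample spec PulseAudio reports for each source. The following is only a minimal sketch: it assumes pactl is installed and that "pactl list short sources" prints tab-separated columns with the sample spec (for example "s16le 2ch 48000Hz") in the fourth field. The name source_rates is a hypothetical helper, not part of PulseAudio.

```shell
# source_rates: read "pactl list short sources" output on stdin and
# print "<source name><TAB><sample rate in Hz>" for every source.
# Assumes the 4th tab-separated field looks like "s16le 2ch 48000Hz".
source_rates() {
    awk -F'\t' '{
        n = split($4, spec, " ")   # e.g. spec = ("s16le", "2ch", "48000Hz")
        rate = spec[n]
        sub(/Hz$/, "", rate)       # strip the unit suffix
        print $2 "\t" rate
    }'
}

# Only query PulseAudio if pactl is actually available on this machine.
if command -v pactl >/dev/null 2>&1; then
    pactl list short sources | source_rates
fi
```

If the rate shown for the .monitor source differs from the value set in daemon.conf, the setting has probably not taken effect.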

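The sample rate discrepancy becomes obvious if you play some music through the speakers so that it is captured by both the microphone and the PulseAudio monitor. Not only will the monitor stream start later, but a tenor will also turn into a baritone.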
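
To get a feel for the size of this effect, here is a rough calculation. It assumes, purely for illustration, that the monitor really delivers 48000 samples per second while the recording treats them as 44100 Hz. The 44100 figure is an assumption, not something taken from the question, and drift_seconds and pitch_shift_semitones are hypothetical helper names.

```shell
# drift_seconds TRUE_RATE ASSUMED_RATE ELAPSED
# Extra playback time (in seconds) that ELAPSED seconds of real audio take
# when samples produced at TRUE_RATE are played back at ASSUMED_RATE.
drift_seconds() {
    awk -v t="$1" -v a="$2" -v s="$3" 'BEGIN { printf "%.2f\n", s * t / a - s }'
}

# pitch_shift_semitones TRUE_RATE ASSUMED_RATE
# Pitch change in semitones caused by the same mismatch (negative = lower).
pitch_shift_semitones() {
    awk -v t="$1" -v a="$2" 'BEGIN { printf "%.2f\n", 12 * log(a / t) / log(2) }'
}

drift_seconds 48000 44100 60        # prints 5.31
pitch_shift_semitones 48000 44100   # prints -1.47
```

Under these assumed rates the lag grows by roughly 5 seconds per minute and the pitch drops by about one and a half semitones, which is the kind of steadily growing delay and lowered voice described above.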
