Text-to-speech in a script

Question 1

My suggestion is to drop wine and use the native Linux program pico2wave.

On Ubuntu 14.04, pico2wave is part of the libttspico-utils package.

The command looks like this:

pico2wave --wave=test.wav "$(cat filename.txt)"
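
If you convert files often, the command can be wrapped in a small function. The following is only a sketch: the function name txt2wav and the en-US default are assumptions, not part of the original answer; --lang and --wave are pico2wave's own options.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): convert one text file
# to a .wav file next to it and print the name of the file written.
# The optional second argument is the pico2wave language (assumed default: en-US).
txt2wav() {
    local txt="$1" lang="${2:-en-US}"
    local wav="${txt%.txt}.wav"      # notes.txt -> notes.wav
    pico2wave --lang="$lang" --wave="$wav" "$(cat "$txt")" && printf '%s\n' "$wav"
}
```

For example, txt2wav notes.txt en-GB would write notes.wav beside the input and print its path.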

Question 2

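If the command you already have works and you only need to automate it, you have a few options.

If you have a directory of files to convert, you can do the following: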


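#!/bin/bash
# Directory of files to convert, passed as the first argument
ARG=$1

for i in "${ARG}"/*; do
    # Write the speech next to each input file as <name>.wav
    wine ttsUtil.exe "${i}" -voice=nick -output="${i}.wav"
done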

Save this as a file (say, ttsconvert.sh) and make it executable:

chmod +x ttsconvert.sh

Now you can run the script by passing it the path to the directory of files to convert:

./ttsconvert.sh ~/path/to/stash/of/files
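
One caveat with the loop above: it feeds every file in the directory to ttsUtil.exe, so a second run would also pick up the .wav files produced by the first. A more defensive variant might look like this sketch; the function name convert_all, the restriction to *.txt, and the skip-if-newer check are assumptions added for illustration.

```shell
#!/bin/bash
# Sketch of a more defensive conversion loop (illustrative, not the original script).
convert_all() {
    local dir="$1" txt
    if [ ! -d "$dir" ]; then
        echo "usage: convert_all DIRECTORY" >&2
        return 2
    fi
    for txt in "$dir"/*.txt; do
        [ -f "$txt" ] || continue                # the glob matched nothing
        [ "$txt.wav" -nt "$txt" ] && continue    # output exists and is newer: skip
        wine ttsUtil.exe "$txt" -voice=nick -output="$txt.wav"
    done
}
```

Calling convert_all ~/path/to/stash/of/files would then convert only the text files that have no up-to-date .wav yet.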

If you would rather use it one file at a time, you can create your own launcher with a .desktop file.

For example, create a file named ttsconvert.desktop:


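[Desktop Entry]
# Version is the Desktop Entry Specification version, not the launcher's own
Version=1.0
# Type is a required key
Type=Application
Name=TTSConvert
Exec=wine ttsUtil.exe %U -voice=nick -output=speech.wav
Icon=multimedia-volume-control
MimeType=text/plain;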

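As root, place this file in /usr/share/applications. You should then be able to open text files with the new converter through the "Open With" option in the right-click menu. It gives no indication that it is working; a more elaborate script could show GUI notifications, but this isn't the place to go into all of that.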
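
As a rough idea of what such a notifying script could look like, here is a minimal sketch. It assumes notify-send (from the libnotify-bin package) is installed; the function name convert_with_notice and the choice to write the .wav next to the input file are illustrative, not part of the original answer.

```shell
#!/bin/bash
# Hypothetical wrapper around the wine command that reports progress
# through desktop notifications.
convert_with_notice() {
    local txt="$1"
    local wav="${txt}.wav"
    notify-send "TTSConvert" "Converting $txt ..."
    if wine ttsUtil.exe "$txt" -voice=nick -output="$wav"; then
        notify-send "TTSConvert" "Finished: $wav"
    else
        notify-send "TTSConvert" "Conversion failed: $txt"
        return 1
    fi
}
```

The launcher's Exec line could then point at a script that calls this function on its first argument, for instance with %f so that a single file path is passed.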

Question 3

I would try a SAPI5 command-line utility, for example: http://www.nirsoft.net/articles/speak_from_command_line.html

Also try: http://jampal.sourceforge.net/ptts.html

Question 4

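This is a kludge, but I have it working reliably. It scripts TTSApp.exe inside a Xephyr window by simulating mouse and keyboard input.

Install the Ubuntu packages: xserver-xephyr metacity xdotool libav-tools

If you want something other than the default speaking rate, add this to the beginning of each text file: <prosody rate="medium"><prosody rate="+36%"> and add the matching closing tags at the end: </prosody></prosody> (more XML options are described in chapters 8-9 of the AT&T Natural Voices System Developer's Guide).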
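
Adding the tags by hand gets tedious with many chapters. A minimal sketch of automating it follows; the function name wrap_prosody is hypothetical, the +36% default is simply the value quoted above, and text that itself contains < or & may need escaping, which this sketch does not attempt.

```shell
#!/bin/bash
# Hypothetical helper: print a text file wrapped in the prosody tags
# described above. The optional second argument overrides the rate.
wrap_prosody() {
    local txt="$1" rate="${2:-+36%}"
    printf '<prosody rate="medium"><prosody rate="%s">\n' "$rate"
    cat "$txt"
    printf '\n</prosody></prosody>\n'
}
```

For example, wrap_prosody chapter01.txt > chapter01.xml.txt writes a tagged copy and leaves the original untouched.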

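In the script, replace the i in key --delay 100 i with whatever key selects your preferred voice in TTSApp.exe.

To delete the source file after a successful conversion, uncomment the unlink() at the end of the script.

Run the script like this: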

find . -name 'chapter*.txt' -print0 |xargs -0 txt2ogg

Here is the Perl script txt2ogg (don't forget to chmod +x it):

#!/usr/bin/perl -CS -w
#
use strict;
use warnings;
use utf8;
#

my $homeDir=$ENV{HOME};
$homeDir .= '/' if(substr($homeDir,length($homeDir)-1,1) ne '/');

my $oldDir = `pwd`;
chomp($oldDir);
$oldDir .= '/' if(substr($oldDir,length($oldDir)-1,1) ne '/');
chdir($homeDir) or die($!);
system( q(Xephyr :4 -screen 600x480 >/dev/null 2>/dev/null &) ); # using the user's display works until you try to get other work done or the screensaver starts
system( q(DISPLAY=:4 metacity >/dev/null 2>/dev/null &) ); # xdotool needs a window manager
foreach(@ARGV) {
  s|^\./||;
  my $thisArg = $_;
  my $ttsIn = $oldDir.$thisArg; # make path absolute
  (my $ttsOut = $ttsIn) =~ s|\.[^\./]*$||; # strip file extension
  $ttsOut .= '.ogg';
  my $attempt = 0;
  my $errorCodes = ""; # list of codes for recoverable errors
  my $closeDialogCmd = q(export DISPLAY=:4; xdotool search --name "File Saved" windowactivate --sync %@ key space 2>/dev/null);
  my $ExitCmd = q(export DISPLAY=:4; xdotool search --name "SAPI5 TTSAPP" windowactivate --sync %@ windowkill 2>/dev/null);
  while(1) {
    print("\r$thisArg ... ");
    unlink("ttsin");
    unlink("ttsout.wav");
    unlink("ttsout.ogg");
    symlink($ttsIn,"ttsin") or die($!);
    #xdotool is sometimes too fast, even with ''--delay 100'', so BackSpace makes sure the full name gets entered
    my $stallLimit = 10;
    my $seconds = 0;
    my $priorWavSize = 0;
    my $stalledTime = 0;
    my $wavSize = 0;
    #start TTSApp.exe in the background
    system( q(DISPLAY=:4 wine "C:\Program Files\eSpeak\TTSApp.exe" 2>/dev/null >/dev/null &) );
    #in TTSApp.exe, enable XML; select proper voice; open "ttsin"; and save as "ttsout.wav"
    system( q(export DISPLAY=:4; xdotool search --sync --name "SAPI5 TTSAPP" windowactivate --sync %@ mousemove --window %@ 36 339 click 1 mousemove --window %@ 426 233 click 1 key --delay 100 i mousemove --window %@ 500 37 click 1 key --delay 100 BackSpace BackSpace t t s i n Return mousemove --window %@ 500 288 click 1 key --delay 100 BackSpace BackSpace t t s o u t Return 2>/dev/null >/dev/null) );
    while(1) { # wait for "File Saved" dialog
      sleep(2);
      $seconds += 2;
      #check if "File Saved" dialog exists yet
      last if(system( q(export DISPLAY=:4; xdotool search --name "File Saved" >/dev/null) ) == 0);
      my $wavSizeCmd = q(stat --printf '%s' ttsout.wav 2>/dev/null);
      $wavSize = `$wavSizeCmd`;
      $wavSize = 0 if(!defined($wavSize) or length($wavSize) == 0);
      if($wavSize <= $priorWavSize) {
        $stalledTime += 2;
        if($stalledTime >= $stallLimit) {
          $errorCodes .= " 282"; # TTSApp.exe not responding
          if(system($ExitCmd) != 0) { # kill TTSApp.exe and try again
            $errorCodes .= " 443"; # TTSApp.exe still not responding
            sleep(2);
            system($ExitCmd);
          }
          last;
        }
      } else {
        $stalledTime = 0;
      }
      $priorWavSize = $wavSize;
      print("\r$thisArg ...$wavSize bytes");
    }
    if(($stalledTime < $stallLimit)) { # above loop not stalled
      if($wavSize == 11639) {
        $errorCodes .= " 639"; # size of .wav is exactly the size for "Enter text you whish spoken here" in the default voice
      } else {
        last; # success
      }
    }
    if($attempt++ >= 5) {
      die("unable to process file with TTSApp.exe");
    }
  }
  #close "File Saved" dialog and exit TTSApp.exe
  if(system($closeDialogCmd) != 0) {
    $errorCodes .= " 934"; # closing dialog failed
    sleep(2);
    if(system($closeDialogCmd) != 0) {
      $errorCodes .= " 818"; # closing dialog failed again
      sleep(2);
      system($closeDialogCmd);
    }
  }
  if(system($ExitCmd) != 0) {
    $errorCodes .= " 245"; # closing TTSApp.exe failed
    sleep(2);
    if(system($ExitCmd) != 0) {
      $errorCodes .= " 871"; # closing TTSApp.exe failed again
      sleep(2);
      system($ExitCmd);
    }
  }
  print("\r$thisArg ... converting to .ogg  ");
  #''-qscale 0'' (24Kbps) has noticeable whisper-like overtones and ''1'' (30Kbps) and ''2'' (35Kbps) are quite close, so I decided on ''-qscale 1''
  system('cat ttsout.wav |avconv -i pipe:0 -codec:a libvorbis -qscale 1 ttsout.ogg 2>/dev/null >/dev/null') == 0 or die($!);
  unlink("ttsin");
  unlink("ttsout.wav");
  rename("ttsout.ogg",$ttsOut) or die($!);
  if(length($errorCodes) == 0) {
    print("\r$thisArg ... done                \n");
  } else {
    print("\r$thisArg ... done (recovered from: $errorCodes)            \n");
  }
  #unlink($ttsIn); # delete original only after .ogg is in place
}
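
The script relies on avconv from libav-tools for the final encoding step. On a system where only ffmpeg is installed, the same step could be done with either tool. A minimal shell sketch, assuming one of the two is on PATH (the function name wav_to_ogg is hypothetical and not part of the script above):

```shell
#!/bin/bash
# Hypothetical helper: encode a .wav file to Ogg Vorbis with whichever of
# avconv or ffmpeg is available, at the same quality setting as the script.
wav_to_ogg() {
    local wav="$1" ogg="$2" enc
    if command -v avconv >/dev/null 2>&1; then
        enc=avconv
    elif command -v ffmpeg >/dev/null 2>&1; then
        enc=ffmpeg
    else
        echo "neither avconv nor ffmpeg found" >&2
        return 127
    fi
    "$enc" -i "$wav" -codec:a libvorbis -qscale:a 1 "$ogg" >/dev/null 2>&1
}
```

Here -qscale:a 1 corresponds to the -qscale 1 setting chosen in the script; a call such as wav_to_ogg ttsout.wav ttsout.ogg would stand in for the avconv pipeline.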

Answer