Text-to-speech with Piper in Ubuntu 24 and Linux Mint 22

Piper is a text-to-speech engine based on neural nets that run locally. At the time of writing, it produces significantly better speech than traditional local systems such as Festival, Espeak, and Pico.

However, I couldn’t find instructions for integrating Piper with the speech dispatcher on Linux, so that it can be used as the TTS engine in Firefox. After some research, I figured out the required steps, which are documented below.

Step 1: Installing Piper

Piper is distributed as a Python package that can be installed using the standard package manager pip. The preferred approach is to install it in a virtual environment, which keeps it separate from other installed Python packages. I’ve installed it in the directory /opt:

sudo mkdir /opt/piper-tts
sudo chown "$USER" /opt/piper-tts
cd /opt/piper-tts
python3 -m venv venv
source venv/bin/activate
pip install --require-virtualenv piper-tts

We still need to install voices. We can first list the available ones:

python3 -m piper.download_voices

We can then install the ones we’re interested in, in a separate directory voices:

mkdir voices
cd voices
python3 -m piper.download_voices en_US-ryan-medium
python3 -m piper.download_voices en_US-john-medium
python3 -m piper.download_voices en_US-amy-medium
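
If you want several voices, the downloads above can be wrapped in a small loop. This is only a minimal sketch, not part of Piper: the function name download_voices and the variable PIPER_PYTHON are hypothetical names introduced here, and the default path assumes the virtual environment from Step 1.

```shell
# Hypothetical helper: download each named voice into the current directory.
# PIPER_PYTHON is an assumed variable; it defaults to the venv from Step 1.
PIPER_PYTHON="${PIPER_PYTHON:-/opt/piper-tts/venv/bin/python3}"

download_voices() {
  for voice in "$@"; do
    # Stop at the first failure, so a misspelled voice name is noticed.
    "$PIPER_PYTHON" -m piper.download_voices "$voice" || return 1
  done
}
```

From inside the voices directory, it could be called as: download_voices en_US-ryan-medium en_US-amy-medium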

We can install additional voices for other languages in the same way. In its simplest form, Piper runs as a command that reads a snippet of text aloud.

Testing Piper

We can already test the Piper command:

/opt/piper-tts/venv/bin/piper \
  -m en_US-ryan-medium \
  --data-dir /opt/piper-tts/voices \
  'This is a test.'

You should hear the speech through your speakers. Great first step!
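
If no sound comes out, it can help to write the speech to a WAV file instead and inspect that. The sketch below assumes that your version of the piper command accepts the -f option for an output file; the function name speak_to_file and the variable PIPER_BIN are hypothetical names introduced here.

```shell
# Hypothetical helper: synthesize text into a WAV file instead of playing it.
# PIPER_BIN is an assumed variable; it defaults to the venv from Step 1.
# Assumption: this version of piper accepts -f for the output file.
PIPER_BIN="${PIPER_BIN:-/opt/piper-tts/venv/bin/piper}"

speak_to_file() {
  # $1: voice name, $2: output WAV path, $3: text to synthesize
  # The "--" keeps a text that starts with a dash from being read as an option.
  "$PIPER_BIN" -m "$1" --data-dir /opt/piper-tts/voices -f "$2" -- "$3"
}
```

For example: speak_to_file en_US-ryan-medium /tmp/test.wav 'This is a test.'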

Step 2: Installing the Piper web server

Piper needs some time to start up, due to the size of its neural net. If you plan to have it synthesize a lot of speech, it’s recommended to run it as a web server. The server waits for text in the background, has it synthesized by the Piper engine, and sends the resulting audio stream back to the requester. The server is an optional extra of the piper-tts package, which we can install in the same virtual environment:

pip install 'piper-tts[http]'

The web server returns the synthesized speech as a WAV stream. We’ll be playing it with paplay, a command to play audio streams on the standard PulseAudio sound server. If you don’t have it, you can install it with apt:

sudo apt install pulseaudio-utils

Testing the Piper web server

We can now start the Piper web server, listening on the default port 5000. We add the debug option to get some logging output in the terminal:

/opt/piper-tts/venv/bin/python3 \
  -m piper.http_server \
  -m en_US-ryan-medium \
  --data-dir /opt/piper-tts/voices \
  --debug
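
Since the server needs a moment to load the voice, a script that talks to it may want to wait until it answers. The following is a rough sketch of such a check, to be run from a second terminal. The function name wait_for_server is hypothetical, and the default URL assumes the port 5000 used above.

```shell
# Hypothetical helper: poll until something answers HTTP at the given URL.
# $1: URL (default http://localhost:5000)
# $2: maximum number of attempts, one per second (default 30)
wait_for_server() {
  url="${1:-http://localhost:5000}"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # curl exits with 0 as soon as the server sends any HTTP response,
    # even an error status; it fails while the connection is refused.
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

For example: wait_for_server && echo 'Piper is ready.'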

We can then check that the server also works, by sending a speech command from a different terminal window. We use curl to send an HTTP POST request with a short JSON payload:

curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{ "voice": "en_US-ryan-medium", "length_scale": "1.0", "text": "This is a test." }' \
  -o - \
  localhost:5000 \
| paplay

You should hear the same speech as before, this time coming from the web server. Progress!
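
Writing the JSON by hand breaks as soon as the text contains a double quote or a backslash. As a sketch of one way around that, the payload can be built with Python's json module, which is available anyway since Piper needs Python. The function names piper_payload and speak_via_server are hypothetical; the fields and the address are the ones from the test above.

```shell
# Hypothetical helper: print a JSON payload for the Piper web server.
# $1: voice name, $2: text. json.dumps escapes quotes and backslashes.
piper_payload() {
  python3 -c 'import json, sys
print(json.dumps({"voice": sys.argv[1], "text": sys.argv[2]}))' "$1" "$2"
}

# Hypothetical helper: have the server synthesize the text, then play it.
# Assumes the server from the test above is listening on localhost:5000.
speak_via_server() {
  piper_payload "$1" "$2" |
    curl -s -X POST -H 'Content-Type: application/json' \
      --data-binary @- -o - localhost:5000 |
    paplay
}
```

For example: speak_via_server en_US-ryan-medium 'She said "hello".'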

Step 3: Setting up the speech dispatcher

The speech dispatcher is the standard shared interface for programs to synthesize speech. For example, Firefox calls the speech dispatcher when it reads web pages aloud.

The speech dispatcher supports various speech synthesizers. For Piper, we need to install a new configuration file piper-generic.conf:

sudo cp piper-generic.conf /etc/speech-dispatcher/modules

You still need to edit this file to match the voices and languages that you have downloaded.

We then need to edit the main configuration file /etc/speech-dispatcher/speechd.conf of the speech dispatcher to use this Piper configuration file, by adding these lines:

AddModule "piper" "sd_generic" "piper-generic.conf"
DefaultModule "piper"

This setup lets the speech dispatcher forward requests to the Piper web server.
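
If you script your setup, the two lines can be appended only when they are missing, so that running the script twice doesn't duplicate them. This is a minimal sketch with a hypothetical function name add_piper_module. It only looks for an existing AddModule line for "piper"; if your speechd.conf already contains an uncommented DefaultModule line for another module, it's safer to edit the file by hand.

```shell
# Hypothetical helper: append the Piper module lines to a speechd.conf file,
# unless an AddModule line for "piper" is already present.
# $1: path to the configuration file
add_piper_module() {
  conf="$1"
  if grep -q '^[[:space:]]*AddModule[[:space:]]*"piper"' "$conf"; then
    return 0
  fi
  printf '%s\n' \
    'AddModule "piper" "sd_generic" "piper-generic.conf"' \
    'DefaultModule "piper"' >> "$conf"
}
```

The real file is owned by root, so you would try this on a copy first, or call the function from a root shell.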

Testing the speech dispatcher

The speech dispatcher is managed by systemd. We can restart it with the new configuration:

systemctl --user restart speech-dispatcher

Assuming the Piper web server is still running from our previous test, we can now send our speech through the speech dispatcher, to the Piper web server, to Piper:

spd-say -y en_US-ryan-medium -r 0 'This is a test.'

You should again hear the same speech, this time through the speech dispatcher and the web server.
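
If you hear a different voice than expected, it's worth checking that the speech dispatcher has actually loaded the new module. spd-say can list its output modules with the option -O. The sketch below wraps that in a check; the function name piper_module_loaded and the variable SPD_SAY are hypothetical names, and the module name "piper" is the one we chose in speechd.conf.

```shell
# Hypothetical helper: succeed if the speech dispatcher lists a module "piper".
# SPD_SAY is an assumed variable, so that the command can be overridden.
SPD_SAY="${SPD_SAY:-spd-say}"

piper_module_loaded() {
  # -O lists the output modules known to the speech dispatcher.
  "$SPD_SAY" -O | grep -qw 'piper'
}
```

For example: piper_module_loaded || echo 'The piper module is not loaded.'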

Debugging problems with the speech dispatcher

Should anything fail, the systemd logs may contain more details:

journalctl --user -xeu speech-dispatcher.service
systemctl --user status speech-dispatcher.service

If that isn’t sufficient to resolve the problem, you can temporarily run the speech dispatcher in debug mode from the command line:

systemctl --user stop speech-dispatcher
systemctl --user disable speech-dispatcher
rm -fr /tmp/speechd-debug
/usr/bin/speech-dispatcher -s -D -t 0

The log files in the directory /tmp/speechd-debug then contain details about the speech commands.

After resolving any problems, you can let systemd manage the speech dispatcher again:

killall -9 speech-dispatcher
systemctl --user enable speech-dispatcher
systemctl --user start speech-dispatcher

Step 4: Setting up the Piper web server in systemd

So far, we’ve started the Piper web server manually. The proper approach is to have systemd start it automatically. We need to install two configuration files, piper-tts.service and piper-tts.socket:

sudo cp piper-tts.{service,socket} /etc/systemd/user

systemd will then start the server when speech synthesis is invoked for the first time.

Testing the Piper web server in systemd

Make sure the Piper web server isn’t running anymore, since systemd will be managing it now.

We can let systemd verify the configuration files:

systemd-analyze --user verify /etc/systemd/user/piper-tts.{service,socket}

We can then enable and start the service:

systemctl --user daemon-reload
systemctl --user enable piper-tts
systemctl --user start piper-tts

We can finally test the chain of the speech dispatcher (managed by systemd) to the Piper web server (also managed by systemd) to Piper:

spd-say -y en_US-ryan-medium -r 0 'This is a test.'

At this point, the speech dispatcher will delegate to Piper for all text-to-speech conversion.


Copyright © 2026 Eric Lafortune.