# FrugalVox

A tiny VoIP IVR framework by hackers and for hackers.

## Features

- Small and nimble: the kernel is a single Python 3 file (~270 SLOC), and the configuration is a single YAML file
- Hackable: the kernel is well-commented and not so big, all the actions are full-featured Python scripts
- Written with plain telephony in mind, supporting both out-of-band and in-band DTMF command detection, as well as DTMF audio clip generation
- Comes with PIN-based authentication and action access control out of the box (optional but recommended)
- Comes with TTS integration out of the box, configured for eSpeakNG by default (optional)
- Container-ready
- Released into public domain

## Limitations

- Only UDP transport is supported for now (pyVoIP limitation)
- Only a single SIP server registration per instance is supported by design (if you need to receive incoming calls via multiple SIP accounts, you must spin up multiple FrugalVox instances)
- The format for top-level action commands is hardcoded (see the workflow section of this README)

## Running FrugalVox as a host application

### Dependencies

- Python 3.8 or higher (3.10 recommended)
- pyVoIP 1.6.4, patched according to [this comment](https://github.com/tayler6000/pyVoIP/issues/107#issuecomment-1440231926) (also available as a `.whl` file in this repo)
- NumPy (mandatory, required for DTMF detection and generation)
- eSpeakNG (optional but used by the default TTS engine configuration)

For Python-side dependencies, just run `pip install -r requirements.txt` from the project directory. eSpeakNG (or another TTS engine of your choice, see the FAQ section) must be installed separately with your host OS package manager.

### Usage

Just run `python /path/to/fvx.py [your-config.yaml]`. Press Ctrl+C or otherwise terminate the process when you no longer need it.

Make sure your `python` command points to Python 3.8 or higher.


## Running FrugalVox in Docker

The Docker image encapsulates all the dependencies, including Python 3.10, three different TTS engines (see the FAQ section) and the patched pyVoIP package, but requires you to provide all the configuration and action scripts in a volume mounted from the host. In addition, the configuration file itself must be called `config.yaml`, since this is the only name the container looks for.

### Building

From the source directory, run `docker build -t frugalvox .` and the image called `frugalvox` will be built.

Alternatively, you can build the "slim" version of the image based on Alpine Linux, which only contains the Pico TTS engine. For the `x86_64` architecture, such an image weighs only around 180 MiB. To build it, run `docker build -t frugalvox:slim -f Dockerfile.slim .`

### Running from the command line

The command to run the `frugalvox` image locally is:

```
docker run -d --rm -v /path/to/configdir:/opt/config --name frugalvox frugalvox
```

Note that `/path/to/configdir` must be absolute. Use `$(pwd)` to get the current working directory if your configuration directory path is relative to it.

### Running from Docker Compose

Add this to your `compose.yaml`, replacing `$PWD/example-config` with your configuration directory path:

```yaml
services:
  # ... other services here ...
  frugalvox:
    image: frugalvox:latest
    container_name: frugalvox
    restart: on-failure:10
    volumes:
      - "$PWD/example-config:/opt/config"
```

Then, on the next `docker compose up -d` run, the FrugalVox container should be up. Note that you can attach this service to any network you already created in the Compose file, as long as it allows the containers to have Internet access.

## Typical FrugalVox workflow

Without auth:

1. User calls the IVR.
2. User is prompted for the command.
3. User enters the DTMF command in the form `[action ID]*[param1]*[param2]*...#`.
4. The IVR checks the action ID. If it exists, the corresponding action script is run. If it doesn't, an error message is played back to the user.
5. User is prompted for the next command, and so on.

With auth (recommended):

1. User calls the IVR.
2. User is prompted for the PIN (internal user ID) followed by the `#` key.
3. The IVR checks the PIN. If it doesn't exist in the user list, the caller is warned and the call is terminated. Otherwise, go to the next step.
4. User is prompted for the command.
5. User enters the DTMF command in the form `[action ID]*[param1]*[param2]*...#`.
6. The IVR checks the action ID. If it exists in the list of actions allowed for the user, the corresponding action script is run. If it doesn't, an error message is played back to the user.
7. User is prompted for the next command, and so on.

## Configuration

All FrugalVox configuration is done in a single YAML file that's passed to the kernel at startup. If no file is passed, FrugalVox will look for a `config.yaml` file in the current working directory.

The entire config is split into four sections of the YAML file: `sip`, `tts`, `clips` and `ivr`.

### SIP client configuration: `sip`

This section lets you configure which SIP server FrugalVox will connect to in order to start receiving incoming calls.
The fields are:

- `sip.host`: your SIP provider hostname or IP address
- `sip.port`: your SIP provider port number (usually 5060)
- `sip.transport`: optional and not used for now, added for forward compatibility with future versions (the only supported transport is currently `udp`)
- `sip.username`: your SIP account auth username (only the username itself, no domain or URI parts)
- `sip.password`: your SIP account auth password
- `sip.rtpPortLow` and `sip.rtpPortHigh`: your UDP port range for RTP communication, usually the default of 10000 and 20000 is fine

All the fields in this section, except `transport`, are currently mandatory. If unsure about `rtpPortLow` and `rtpPortHigh`, just leave the values provided in the example config.

### Text-to-speech engine settings: `tts`

This section allows you to configure your TTS engine so that FrugalVox can generate audio clips from your text. The fields are:

- `tts.cmd`: the TTS synth command template; just leave the default value there unless you want to switch to a TTS engine other than eSpeakNG
- `tts.phrases`: a dictionary where every key is the clip name and the value is the phrase text to be rendered to that clip on kernel start

### Static audio clips list: `clips`

This section determines which static audio clips are to be loaded into memory in addition to the synthesized voice. The clips must be in the unsigned 8-bit 8 kHz PCM WAV format. The fields are:

- `clips.dir`: path to the directory (relative to the configuration file directory) containing the audio clips to load
- `clips.files`: a dictionary mapping the clip names to the `.wav` file names in the `clips.dir` directory

Both fields are required, but if you're not planning on using any static audio, just set `clips.dir` to `'.'` and `clips.files` to `{}`.

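
The mandatory-field rules for the `sip` and `clips` sections described above can be checked before the kernel is even started. The following is only a minimal sketch under stated assumptions: `check_basic_config` is a hypothetical helper that is not part of FrugalVox, it expects the configuration as an already parsed dictionary, and the RTP port range comparison is an extra sanity check rather than a documented rule.

```python
# Hypothetical pre-flight check, not part of the fvx kernel.
REQUIRED_SIP_FIELDS = ("host", "port", "username", "password",
                       "rtpPortLow", "rtpPortHigh")
REQUIRED_CLIPS_FIELDS = ("dir", "files")


def check_basic_config(config):
    """Return a list of problems found in the `sip` and `clips` sections."""
    problems = []

    # Every sip field except the optional `transport` is mandatory.
    sip = config.get("sip") or {}
    for field in REQUIRED_SIP_FIELDS:
        if field not in sip:
            problems.append(f"sip.{field} is missing")

    # Extra sanity check (an assumption, not a documented rule):
    # the RTP port range should not be inverted.
    low, high = sip.get("rtpPortLow"), sip.get("rtpPortHigh")
    if isinstance(low, int) and isinstance(high, int) and low > high:
        problems.append("sip.rtpPortLow is greater than sip.rtpPortHigh")

    # Both clips fields are required, even when no static audio is used.
    clips = config.get("clips") or {}
    for field in REQUIRED_CLIPS_FIELDS:
        if field not in clips:
            problems.append(f"clips.{field} is missing")

    return problems


example = {
    "sip": {"host": "sip.example.com", "port": 5060, "username": "alice",
            "password": "secret", "rtpPortLow": 10000, "rtpPortHigh": 20000},
    "clips": {"dir": ".", "files": {}},  # no static audio in use
}
print(check_basic_config(example))  # prints []
```

An empty list means nothing obviously required is missing from these two sections. The host name and credentials in `example` are placeholders.
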

When naming the clips, a single limitation holds: a clip must not be named `dtmf`, because when the reference is passed to action scripts, the `clips['dtmf']` entry holds the audio clips generated for all 16 DTMF digits.

### IVR configuration: `ivr`

This section lets you set up PIN-based authentication, access control and, most importantly, the IVR actions themselves and the scripts that implement them.

These two fields are mandatory (although they can be left as empty arrays):

- `ivr.cmdpromptclips`: an array with a sequence of the clip names to play back to prompt the caller for a command
- `ivr.cmdfailclips`: an array with a sequence of the clip names to play back to alert the caller about an invalid command

Note that any clip name in this section can refer to both static and synthesized voice audio clips, since at this point they are all stored in the same place.

To turn the authentication part on and off, use the `ivr.auth` field. If and only if this field is set to `true`, the following fields are required:

- `ivr.authpromptclips`: an array with a sequence of the clip names to play back to prompt for the caller's PIN
- `ivr.authfailclips`: an array with a sequence of the clip names to play back to alert the caller about the invalid PIN before hanging up the call
- `ivr.users`: a dictionary that maps valid user IDs (PINs) to the lists (arrays) of action IDs they are authorized to run, or to the `'*'` string if the user is authorized to run all registered actions on this instance

Also, if the user is authenticated but not authorized to run a particular action, the same "invalid command" message sequence from `ivr.cmdfailclips` will be played back as if the command was not found. This is implemented by design as a security measure. FrugalVox itself will log a different message to the console though.

Finally, all the action mapping is done in the mandatory `ivr.actions` dictionary.
The key is the action ID (without parameters, i.e. commands `22*5#` and `22*44*99#` both mean an action with ID 22, just with different parameter lists) and the value is the path to the action script file, relative to the configuration file directory.

## Action scripts

An action script is a regular Python module file referenced in the `ivr.actions` section of the configuration YAML file. A single script may implement one or more actions based on the action ID. The module must implement the `run_action` call in order to work as an action script, as follows:

```
def run_action(action_id, params, call_obj, user_id, config, clips, calls):
    ...
```

where:

- `action_id` is a string action identifier,
- `params` is an array of string parameters to the action,
- `call_obj` is an instance of the `pyVoIP.VoIP.VoIPCall` class passed from the main FrugalVox call processing loop,
- `user_id` is the ID (PIN) string of the user running the action (if authentication is turned off, it's always `0000`),
- `config` is the dictionary containing the entire FrugalVox configuration object (as specified in the YAML),
- `clips` is the object containing all in-memory audio clips (in the unsigned 8-bit 8 kHz PCM format) ready to be fed into the `write_audio` method of the `call_obj` (with `clips['dtmf']` being a dictionary with the pre-rendered DTMF digits),
- `calls` is a dictionary with all currently active calls on the instance (keyed with the `call_obj.call_id` value).

_Protip:_ if we use `*` as the parameter separator and `#` as the command terminator, why are `action_id`, `user_id` and all the action parameters still treated as strings as opposed to numbers? Because `A`, `B`, `C` and `D` are still valid DTMF digits and can be legitimately used in the actions or their parameters. Of course, if you target normal phone users, you should avoid using the "extended" digits, but there still is a possibility to do so.
If you need to treat your action parameters or any IDs as numbers only, please do this yourself in your action script code.

The action script may import any other Python modules at your disposal, including the main `fvx.py` kernel to use its helper methods, and all the modules available in the configuration file directory (in case it differs from the default one). An example action script that implements three demonstration actions, `32` for echo test, `24` for caller ID readback and `22*[times]` for beep, is shipped in this repo at `example-config/actions/echobeep.py`.

### Useful methods, variables and objects exposed by the `fvx` kernel module

- `fvx.load_yaml(filename)`: a wrapper method to read a YAML file's contents into a Python variable (useful if your action scripts have their own configuration files)
- `fvx.load_audio(filename)`: a method to read a WAV PCM file into the audio buffer in memory, automatically resampling it if necessary
- `fvx.logevent(msg)`: a drop-in replacement for Python's `print` function that outputs a formatted log message with the timestamp
- `fvx.audio_buf_len`: the recommended length (in bytes) of a raw audio buffer to be sent to or received from the call object the action is operating on
- `fvx.emptybuf`: a buffer of empty audio data, `fvx.audio_buf_len` bytes long
- `fvx.detect_dtmf(buf)`: a method to detect a DTMF digit in the audio data buffer (see `example-config/actions/echobeep.py` for an example of how to use it correctly)
- `fvx.tts_to_buf(text, ttsconfig)`: a method to directly render your text into an audio data buffer (pass `config['tts']` as the second parameter if you don't want to change anything in the TTS settings)
- `fvx.tts_to_file(text, filename, ttsconfig)`: same as the `fvx.tts_to_buf` method but writes the result to a WAV PCM file
- `fvx.get_caller_addr(call_obj)`: a method to extract the caller's SIP address from a `VoIPCall` object (e.g.
the one passed to the action)
- `fvx.get_callee_addr(call_obj)`: a method to extract the destination SIP address from a `VoIPCall` object (e.g. the one passed to the action)
- `fvx.flush_input_audio(call_obj)`: a method to ensure no excess audio accumulates in the call audio buffer, recommended at the start of any action that processes incoming audio
- `fvx.playbuf(buf, call_obj)`: a method to properly play back any audio buffer to the call's line
- `fvx.kernelroot`: a string that contains the FrugalVox kernel directory path
- `fvx.configroot`: a string that contains the config file directory path

## FAQ

**How was this created?**

Initially, FrugalVox was created to answer a simple question: "given a VoIP provider with a DID number I pay for monthly anyway, and a cheap privacy-centric VPS already filled with other stuff, how can I combine these two things together to control various things on the Internet from a Nokia 1280 class feature phone without Internet access?"

**Why not Asterisk/FreeSWITCH/etc then? What's wrong with existing solutions?**

Nothing wrong at first glance, but... Asterisk's code base was, as of November 2016, as large as 1139039 SLOC. If you don't see a problem with that, I envy your innocence. Anyway, I doubt that any of those would be able to comfortably run on that cheap VPS or on my Orange Pi with 256 MB RAM. For my goals, it would be like hunting sparrows with ballistic missiles.

The FrugalVox kernel, on the other hand, is around 270 SLOC in 2023. Despite being written in Python, it is really frugal in terms of resource consumption, both while running and while writing and debugging its code. Yet, thanks to full exposure of the action scripts to the Python runtime, it can be as flexible as you want it to be.
Not to mention that such a small piece of code is much easier to audit, which makes it easier to discover and promptly mitigate any subtle errors or security vulnerabilities.

**So, is it a PBX or just a scriptable IVR system?**

I'd rather think of FrugalVox not as a turnkey solution, but as a framework. If you look at the `fvx.py` kernel alone, you'll see nothing but a scriptable IVR with user authentication and TTS integration. However, it's the action scripts that give such a system its meaning. FrugalVox, along with the underlying pyVoIP library, exposes all the tooling you need to create your own interactive menus, connect and bridge calls, dial into other FrugalVox instances or other services, and so on. It's a relatively simple building block that, while being useful alone, can also be used to build VoIP systems of arbitrary complexity when properly combined with other similar blocks.

**If it's meant to be flexible, why hardcode the top level command format?**

Because it is the only format that allows running actions with any number of parameters more or less efficiently from a simple phone keypad. The goal here isn't to replace something like VoiceXML, but to give the caller the ability to get to the action as quickly as possible. The `PIN#` and then `action_id*param1*param2*...#` sequence is as complex as it should be. Multi-level voice menus waste the caller's time, but you can implement them as well if you really need to.

**Does FrugalVox offer a way to fully replace DTMF commands with speech recognition?**

Currently, there is no such way, but you surely can integrate speech recognition into your action scripts. It is not an easy thing to do even in Python, and in no way frugal in terms of computing resources, but it definitely is possible. See the [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) module reference for more information.


**Why do you need a patched pyVoIP version instead of a vanilla one?**

Because vanilla pyVoIP 1.6.4 has a bug its maintainers don't even seem to recognize as a bug. Its `RTPClient` instance creates two sockets to communicate with the same host and port. As a result, when the client is behind a NAT and tries exchanging audio data using both `read_audio` and `write_audio` methods, only the latter works correctly because it's sending datagrams out to the server. Patching `RTPClient` to use a single socket made things work the way they should.

**I understand the importance of eSpeakNG but it sounds terrible even with MBROLA. Which other open source TTS engines can you recommend to use with FrugalVox?**

The first obvious choice would be ~~Festival~~ [Flite](https://github.com/festvox/flite). With an externally downloaded `.flitevox` voice, of course. It has a number of limitations: it only supports English and Indic languages and has no way to adjust the volume, but the output quality is definitely a bit better. If you use the Docker image of FrugalVox, Flite is also included, but you have to ship your own `.flitevox` files located somewhere inside your config directory.

The second obvious choice would be [Pico TTS](https://github.com/naggety/picotts), which is (or was) used as a built-in offline TTS engine in Android. It supports more European languages (besides two variants of English, there also are Spanish, German, French and Italian) but has a single voice per language and absolutely no parameters to configure. Also, it requires autotools to build but the process looks straightforward: `./autogen.sh && ./configure && make && sudo make install`. After this, we're interested in the `pico2wave` command. Please note that its current version has a bug retrieving the text from the command line, so we use an "echo to the pipe" approach. For your convenience, this engine also comes pre-installed in the FrugalVox Docker image.


The third (not so obvious) choice **might** be [Mimic 1](https://github.com/MycroftAI/mimic1), which is basically Flite on steroids. That's why, unlike Mimic 2 and 3, it still is pretty lightweight and suitable for our IVR purposes. It supports all the `.flitevox` voice files as well as the `.htsvoice` format. However, there is a "small" issue: currently, Mimic 1 only supports sideloading `.flitevox` files, not `.htsvoice` files, by specifying an arbitrary path in the `-voice` option. All HTS voices must be either compiled in or put into `$prefix/share/mimic/voices` (where `$prefix` usually is `/usr` or `/usr/local`) or the current working directory, and then referenced in the `-voice` option without the `.htsvoice` suffix. For me, this inconsistency kinda rules Mimic 1 out of the recommended options.

Another approach to the same problem would be to build the HTS Engine API and then a version of Flite 2.0 with its support, both sources taken from [this project page](https://hts-engine.sourceforge.net/). The build process is not so straightforward, but you should be left with a `flite_hts_engine` binary with a set of command line options totally different from the usual Flite or Mimic 1. If you understand how FrugalVox is configured to use Pico TTS, then you'll have no issues configuring it for `flite_hts_engine`. The voice output quality is debatable compared to the usual `.flitevox` packages, so I wouldn't include this in my recommended list either.

Alas, that looks like it. The great triad of lightweight and FOSS TTS engines consists of eSpeakNG, Flite with variations and Pico TTS. All other engines, not counting the online APIs, are too heavy to fit into the scenario. Of course, nothing prevents you from integrating them as well if you have enough resources. In that case, I'd recommend [Mimic 3](https://github.com/MycroftAI/mimic3) but that definitely is out of this FAQ's scope.


Note that for both Flite and Mimic 1 the output voice must support a sample rate that is divisible by 8000 Hz in order to sound correct. Since version 0.0.2, FrugalVox uses an internal resampler that has this limitation. A way to mitigate this in future versions is being investigated.

To recap, here are the example TTS configurations for all the reviewed engines:

eSpeakNG + MBROLA:

```yaml
tts:
  cmd: 'espeak -v us-mbrola-2 -a 70 -p 60 -s 130 -w %s "%s"' # parameter order: filename, text
  ...
```

Flite/Mimic 1:

```yaml
tts:
  cmd: 'flite -voice tts/cmu_us_rms.flitevox --setf int_f0_target_mean=100 --setf duration_stretch=1 -o %s -t "%s"' # parameter order: filename, text
  ...
```

Pico TTS:

```yaml
tts:
  cmd: OUTF=%s sh -c 'echo "%s" | pico2wave -l en-US -w $OUTF' # parameter order: filename, text
  ...
```

## Version history

- 0.0.2 (2023-02-28, current): fully got rid of SoX dependency, simplified TTS configuration
- 0.0.1 (2023-02-26): initial release

## Credits

Created by Luxferre in 2023.

Made in Ukraine.