init - frugalvox - A tiny VoIP IVR framework by hackers, for hackers

commit 8fb1f900364600fc19fbebe861b695fae44870dd
Author: Luxferre <lux@ferre>
Date:   Sun, 26 Feb 2023 00:19:35 +0200

init

Diffstat:
A .dockerignore  | 6 ++++++
A .gitignore  | 3 +++
A COPYING  | 26 ++++++++++++++++++++++++++
A Dockerfile  | 12 ++++++++++++
A README.md  | 279 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A example-config/actions/echobeep.py  | 38 ++++++++++++++++++++++++++++++++++++++
A example-config/clips/beep.wav  | 0 
A example-config/config.yaml  | 46 ++++++++++++++++++++++++++++++++++++++++++++++
A fvx.py  | 277 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A pyVoIP-1.6.4.patched-py3-none-any.whl  | 0 
A requirements.txt  | 3 +++

11 files changed, 690 insertions(+), 0 deletions(-)
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,6 @@
+__pycache__
+*.pyc
+private-config
+example-config
+compose.yaml
+README.md
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+__pycache__
+*.pyc
+private-config
diff --git a/COPYING b/COPYING
@@ -0,0 +1,26 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to <https://unlicense.org>
+
+
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,12 @@
+FROM python:3.10-slim-bullseye
+USER root
+WORKDIR /usr/src/app
+RUN sed -i -e's/ main/ main contrib non-free/g' /etc/apt/sources.list
+RUN apt-get update -y && apt-get install -y libttspico-utils espeak-ng espeak-ng-data espeak-ng-espeak flite sox
+COPY requirements.txt ./
+COPY pyVoIP-1.6.4.patched-py3-none-any.whl ./
+RUN pip install --no-cache-dir -r requirements.txt
+COPY fvx.py ./
+VOLUME /opt/config
+WORKDIR /opt/config
+ENTRYPOINT ["python", "-u", "/usr/src/app/fvx.py", "/opt/config/config.yaml"]
diff --git a/README.md b/README.md
@@ -0,0 +1,279 @@
+# FrugalVox
+
+A tiny VoIP IVR framework by hackers and for hackers.
+
+## Features
+
+- Small and nimble: the kernel is a single Python 3 file (~250 SLOC), and the configuration is a single YAML file
+- Hackable: the kernel is small and well-commented, and all the actions are full-featured Python scripts
+- Written with plain telephony in mind, supporting both out-of-band and in-band DTMF command detection
+- Comes with PIN-based authentication and action access control out of the box (optional but recommended)
+- Comes with TTS integration out of the box, configured for eSpeakNG by default (optional)
+- Container-ready
+- Released into public domain
+
+## Limitations
+
+- Only UDP transport is supported for now (pyVoIP limitation)
+- Only a single SIP server registration per instance is supported by design (if you need to receive incoming calls via multiple SIP accounts, you must spin up multiple FrugalVox instances)
+- The format for top-level action commands is hardcoded (see the workflow section of this README)
+
+## Running FrugalVox as a host application
+
+### Dependencies 
+
+- Python 3.8 or higher (3.10 recommended)
+- pyVoIP 1.6.4, patched according to [this comment](https://github.com/tayler6000/pyVoIP/issues/107#issuecomment-1440231926) (also available in a `.whl` file in this repo)
+- NumPy (mandatory, required for DTMF detection)
+- eSpeakNG and SoX (optional but required for the default TTS engine configuration)
+
+### Usage
+
+Just run `python /path/to/fvx.py [your-config.yaml]`. Press Ctrl+C or otherwise terminate the process when you don't need it.
+
+Make sure your `python` command is pointing to Python 3.8 or higher.
+
+## Running FrugalVox in Docker
+
+The Docker image encapsulates all the dependencies (including Python 3.10, three different TTS engines (see the FAQ section), SoX and the patched pyVoIP package) but requires you to provide all the configuration and action scripts in a volume mounted from the host. In addition to this, the configuration file itself must be called `config.yaml` since the container is only going to be looking for this name.
+
+### Building
+
+From the source directory, run: `docker build -t frugalvox .` and the image called `frugalvox` will be built.
+
+### Running from the command line
+
+The command to run the `frugalvox` image locally is:
+
+```
+docker run -d --rm -v /path/to/configdir:/opt/config --name frugalvox frugalvox
+```
+
+Note that `/path/to/configdir` must be an absolute path. Use the `$(pwd)` command substitution to get the current working directory if your configuration directory path is relative to it.
+
+### Running from Docker Compose
+
+Add this to your `compose.yaml`, replacing `$PWD/example-config` with your configuration directory path:
+
+```yaml
+services:
+  # ... other services here ...
+  frugalvox:
+    image: frugalvox:latest
+    container_name: frugalvox
+    restart: on-failure:10
+    volumes:
+      - "$PWD/example-config:/opt/config"
+```
+
+Then, on the next `docker compose up -d` run, the FrugalVox container should be up. Note that you can attach this service to any network you have already created in the Compose file, as long as it allows the containers to have Internet access.
+
+## Typical FrugalVox workflow
+
+Without auth:
+
+1. User calls the IVR.
+2. User is prompted for the command.
+3. User enters the DTMF command in the form `[action ID]*[param1]*[param2]*...#`.
+4. The IVR checks the action ID. If it exists, the corresponding action script is run. If it doesn't, an error message is played back to the user.
+5. User is prompted for the next command, and so on.
+
+With auth (recommended):
+
+1. User calls the IVR.
+2. User is prompted for the PIN (internal user ID) followed by `#` key.
+3. The IVR checks the PIN. If it doesn't exist in the user list, the caller is warned and the call is terminated. Otherwise, go to the next step.
+4. User is prompted for the command.
+5. User enters the DTMF command in the form `[action ID]*[param1]*[param2]*...#`.
+6. The IVR checks the action ID. If it exists in the list of actions allowed for the user, the corresponding action script is run. If it doesn't, an error message is played back to the user.
+7. User is prompted for the next command, and so on.
+
+## Configuration
+
+All FrugalVox configuration is done in a single YAML file that's passed to the kernel at startup. If no file is passed, FrugalVox will look for a `config.yaml` file in the current working directory.
+
+The entire config is done in four different sections of the YAML file: `sip`, `tts`, `clips` and `ivr`.
+
+### SIP client configuration: `sip`
+
+This section lets you configure which SIP server FrugalVox will connect to in order to start receiving incoming calls. The fields are:
+
+- `sip.host`: your SIP provider hostname or IP address
+- `sip.port`: your SIP provider port number (usually 5060)
+- `sip.transport`: optional, not used for now, added here for the forward compatibility with future versions (the only supported transport is now `udp`)
+- `sip.username`: your SIP account auth username (only the username itself, no domain or URI parts)
+- `sip.password`: your SIP account auth password
+- `sip.rtpPortLow` and `sip.rtpPortHigh`: your UDP port range for RTP communication, must match the port range opened in Docker if using the image, usually the default of 10000 and 20000 is fine
+
+All the fields in this section, except `transport`, are currently mandatory. If unsure about `rtpPortLow` and `rtpPortHigh`, just leave the values provided in the example config.
+
+### Text-to-speech engine settings: `tts`
+
+This section allows you to configure your TTS engine, for FrugalVox to be able to generate audio clips from your text. The fields are:
+
+- `tts.voice`: the name of the voice supported by your TTS engine (eSpeakNG by default)
+- `tts.rate`: words per minute speech rate of the voice
+- `tts.volume`: voice volume (0 to 200 for eSpeakNG)
+- `tts.pitch`: voice pitch (check with your TTS program which one is optimal for you, 60 is the default in the example config)
+- `tts.cmd`: a dictionary with the TTS synth and transcoder command templates, please just leave the default values there unless you want to switch to a different TTS engine other than eSpeakNG or a different encoder other than SoX
+- `tts.phrases`: a dictionary where every key is the clip name and the value is the phrase text to be rendered to that clip on the kernel start
+
+### Static audio clips list: `clips`
+
+This section determines which static audio clips are to be loaded into memory in addition to the synthesized voice. The clips must be in the unsigned 8-bit 8KHz PCM WAV format. The fields are:
+
+- `clips.dir`: path to the directory (relative to the configuration file directory) containing the audio clips to load
+- `clips.files`: a dictionary mapping the clip names to the `.wav` file names in the `clips.dir` directory 
+
+Both fields are required, but if you're not planning on using any static audio, just set the `clips.dir` value to `'.'` and `clips.files` to `{}`.
+
+### IVR configuration: `ivr`
+
+This section lets you set up PIN-based authentication, access control and, most importantly, IVR actions themselves and the scripts that implement them.
+
+These two fields are mandatory (although they can be left as empty arrays):
+
+- `ivr.cmdpromptclips`: an array with a sequence of the clip names to play back to prompt the caller for a command
+- `ivr.cmdfailclips`: an array with a sequence of the clip names to play back to alert the caller about an invalid command 
+
+Note that any clip name in this section can refer to both static and synthesized voice audio clips, since by this point they are all stored in the same place.
+
+To turn the authentication part on and off, use the `ivr.auth` field. If and only if this field is set to `true`, the following fields are required:
+
+- `ivr.authpromptclips`: an array with a sequence of the clip names to play back to prompt for the caller's PIN
+- `ivr.authfailclips`: an array with a sequence of the clip names to play back to alert the caller about the invalid PIN before hanging up the call
+- `ivr.users`: a dictionary that maps valid user IDs (PINs) to the lists (arrays) of action IDs they are authorized to run, or `'*'` string if the user is authorized to run all registered actions on this instance
+
+Also, if the user is authenticated but not authorized to run a particular action, the same "invalid command" message sequence from `ivr.cmdfailclips` will be played back as if the command was not found. This is implemented by design as a security measure. FrugalVox itself will log a different message to the console though.
+
+Finally, all the action mapping is done in the mandatory `ivr.actions` dictionary. The key is the action ID (without parameters, i.e. commands `22*5#` and `22*44*99#` both mean an action with ID 22, just with different parameter lists) and the value is the path to the action script file, relative to the configuration file directory.
+
+## Action scripts
+
+An action script is a regular Python module file referenced in the `ivr.actions` section of the configuration YAML file. A single script may implement one or more actions based on the action ID. The module must implement the `run_action` call in order to work as an action script, as follows:
+
+```
+def run_action(action_id, params, call_obj, user_id, config, clips, calls):
+    ...
+```
+
+where:
+
+- `action_id` is a string action identifier,
+- `params` is an array of string parameters to the action,
+- `call_obj` is an instance of `pyVoIP.VoIP.VoIPCall` class passed from the main FrugalVox call processing loop,
+- `user_id` is the ID (PIN) string of the user running the action (if authentication is turned off, it's always `0000`),
+- `config` is the dictionary containing the entire FrugalVox configuration object (as specified in the YAML),
+- `clips` is the object containing all in-memory audio clips (in the unsigned 8-bit 8KHz PCM format) ready to be fed into the `write_audio` method of the `call_obj`,
+- `calls` is a dictionary with all currently active calls on the instance (keyed with the `call_obj.call_id` value).
+
+_Protip:_ if we use `*` as parameter separator and `#` as command terminator, why are `action_id`, `user_id` and all the action parameters still treated as strings as opposed to numbers? Because `A`, `B`, `C` and `D` still are valid DTMF digits and can be legitimately used in the actions or their parameters. Of course, if you target normal phone users, you should avoid using the "extended" digits, but there still is a possibility to do so. If you need to treat your action parameters or any IDs as numbers only, please do this yourself in your action script code.
+
+The action script may import any other Python modules at your disposal, including the main `fvx.py` kernel to use its helper methods, and all the modules available in the configuration file directory (in case it differs from the default one). An example action script that implements three demonstration actions, `32` for echo test, `24` for caller ID readback and `22*[times]` for beep, is shipped in this repo at `example-config/actions/echobeep.py`.
+
+### Useful methods, variables and objects exposed by the `fvx` kernel module
+
+- `fvx.load_yaml(filename)`: a wrapper method to read a YAML file contents into a Python variable (useful if your action scripts have their own configuration files)
+- `fvx.load_audio(filename)`: a method to read a WAV PCM file into the audio buffer in memory (note that it must be unsigned 8-bit 8Khz in order to work with pyVoIP calls)
+- `fvx.logevent(msg)`: a drop-in replacement for Python's `print` function that outputs a formatted log message with the timestamp
+- `fvx.audio_buf_len`: the recommended length (in bytes) of a raw audio buffer to be sent to or received from the call object the action is operating on
+- `fvx.emptybuf`: a buffer of empty audio data, `fvx.audio_buf_len` bytes long
+- `fvx.detect_dtmf(buf)`: a method to detect a DTMF digit in the audio data buffer (see `actions/echobeep.py` for an example of how to use it correctly)
+- `fvx.tts_to_buf(text, ttsconfig)`: a method to directly render your text into an audio data buffer (pass `config['tts']` as the second parameter if you don't want to change anything in the TTS settings)
+- `fvx.tts_to_file(text, filename, ttsconfig)`: same as `fvx.tts_to_buf` method but writes the result to a WAV PCM file
+- `fvx.get_caller_addr(call_obj)`: a method to extract the caller's SIP address from a `VoIPCall` object (e.g. the one passed to the action)
+- `fvx.get_callee_addr(call_obj)`: a method to extract the destination SIP address from a `VoIPCall` object (e.g. the one passed to the action)
+- `fvx.flush_input_audio(call_obj)`: a method to ensure any excessive audio is not collected in the call audio buffer, recommended to use at the start of any actions that perform incoming audio processing
+- `fvx.playbuf(buf, call_obj)`: a method to properly play back any audio buffer to the call's line
+- `fvx.kernelroot`: a string that contains the FrugalVox kernel directory path
+- `fvx.configroot`: a string that contains the config file directory path
+
+## FAQ
+
+**How was this created?**
+
+Initially, FrugalVox was created to answer a simple question: "given a VoIP provider with a DID number I pay for monthly anyway, and a cheap privacy-centric VPS already filled with other stuff, how can I combine these two things together to control various things on the Internet from a Nokia 1280 class feature phone without the Internet access?"
+
+**Why not Asterisk/FreeSWITCH/etc then? What's wrong with existing solutions?**
+
+Nothing wrong at first glance, but... Asterisk's code base was, as of November 2016, as large as 1139039 SLOC. If you don't see a problem with that, I envy your innocence. Anyway, I doubt that any of those would be able to comfortably run on that cheap VPS or on my Orange Pi with 256 MB RAM. For my goals, it would be like hunting sparrows with ballistic missiles.
+
+FrugalVox kernel, on the other hand, is around 250 SLOC in 2023. Despite being written in Python, it is really frugal in terms of resource consumption, both while running and while writing and debugging its code. Yet, thanks to full exposure of the action scripts to the Python runtime, it can be as flexible as you want it to be. Not to mention that such a small piece of code is much easier to audit and discover and promptly mitigate any subtle errors or security vulnerabilities.
+
+**So, is it a PBX or just a scriptable IVR system?**
+
+I'd rather think of FrugalVox not as a turnkey solution, but as a framework. If you look at the `fvx.py` kernel alone, you'll see nothing but a scriptable IVR with user authentication and TTS integration. However, it's the action scripts that give such a system its meaning. FrugalVox, along with the underlying pyVoIP library, exposes all the tooling you need to create your own interactive menus, connect and bridge calls, dial into other FrugalVox instances or other services, and so on. It's a relatively simple building block that, while being useful alone, can also be used to build VoIP systems of arbitrary complexity when properly combined with other similar blocks.
+
+**If it's meant to be flexible, why hardcode the top level command format?**
+
+Because it is the only format that allows running actions with any number of parameters more or less efficiently using a simple phone keypad. The goal here isn't to replace something like VoiceXML, but to give the caller the ability to get to the action as quickly as possible. The `PIN#` and then `action_id*param1*param2*...#` sequence is as complex as it should be. Multi-level voice menus waste the caller's time, but you can implement them as well if you really need to.
+
+**Does FrugalVox offer a way to fully replace DTMF commands with speech recognition?**
+
+Currently, there is no such way, but you surely can integrate speech recognition into your action scripts. It is not an easy thing to do even in Python, and in no way frugal on computing resource consumption, but definitely is possible, see [SpeechRecognition](https://pypi.org/project/SpeechRecognition/) module reference for more information.
+
+**Why do you need a patched pyVoIP version instead of a vanilla one?**
+
+Because vanilla pyVoIP 1.6.4 has a bug its maintainers don't even seem to recognize as a bug. Its `RTPClient` instance creates two sockets to communicate with the same host and port. As a result, when the client is behind a NAT and tries exchanging audio data using both `read_audio` and `write_audio` methods, only the latter works correctly because it's sending datagrams out to the server. Patching `RTPClient` to only use the single socket made things work the way they should.
+
+**I understand the importance of eSpeakNG but it sounds terrible even with MBROLA. Which other open source TTS engines can you recommend to use with FrugalVox?**
+
+The first obvious choice would be ~~Festival~~ [Flite](https://github.com/festvox/flite). With an externally downloaded `.flitevox` voice, of course. It has a number of limitations: it supports only English and Indic languages and offers no way to adjust the volume, but the output quality is definitely a bit better. If you use the Docker image of FrugalVox, Flite is also included but you have to ship your own `.flitevox` files located somewhere inside your config directory. Also, current Flite versions already generate 16 KHz PCM files instead of 8 KHz, so the transcoder command still needs to be in place.
+
+The second obvious choice would be [Pico TTS](https://github.com/naggety/picotts) which is (or was) used as a built-in offline TTS engine in Android. It supports more European languages (besides two variants of English, there also are Spanish, German, French and Italian) but has a single voice per language and absolutely no parameters to configure. Also, it requires autotools to build but the process looks straightforward: `./autogen.sh && ./configure && make && sudo make install`. After this, we're interested in the `pico2wave` command. Please note that its current version has some bug retrieving the text from the command line, so we use an "echo to the pipe" approach. For your convenience, this engine also comes pre-installed in the FrugalVox Docker image.
+
+The third (not so obvious) choice **might** be [Mimic 1](https://github.com/MycroftAI/mimic1) which is basically Flite on steroids. That's why, unlike Mimic 2 and 3, it still is pretty lightweight and suitable for our IVR purposes. It supports all the `.flitevox` voice files as well as the `.htsvoice` format. However, there is a "small" issue: currently, Mimic 1 still only supports sideloading `.flitevox` and not `.htsvoice` files by specifying the arbitrary path into the `-voice` option, all HTS voices must be either compiled in or put into the `$prefix/share/mimic/voices` (where `$prefix` usually is `/usr` or `/usr/local`) or the current working directory, and then referenced in the `-voice` option without the `.htsvoice` suffix. For me, this inconsistency kinda rules Mimic 1 out of the recommended options. The [Mycroft 4.0](https://github.com/MycroftAI/mimic1/blob/development/voices/mycroft_voice_4.0.flitevox) voice though, which is shipped in the same Mimic 1 repo, still can be used with the vanilla Flite with no issues. 
+
+Another approach to the same problem would be to build the HTS Engine API and then a version of Flite 2.0 with its support, both sources taken from [this project page](https://hts-engine.sourceforge.net/). The build process is not so straightforward but you should be left with a `flite_hts_engine` binary with a set of command line options totally different from the usual Flite or Mimic 1. If you understand how FrugalVox is configured to use Pico TTS, then you'll have no issues configuring it for `flite_hts_engine`. The voice output quality is debatable compared to the usual `.flitevox` packages, so I wouldn't include this into my recommended list either.
+
+Alas, that looks like it. The great triad of lightweight and FOSS TTS engines consists of eSpeakNG, Flite with variations and Pico TTS. All other engines, not counting the online APIs, are too heavy to fit into the scenario. Of course, nothing prevents you from integrating them as well if you have enough resources. In that case, I'd recommend [Mimic 3](https://github.com/MycroftAI/mimic3) but that definitely is out of this FAQ's scope.
+
+To recap, here are all the example TTS configurations for all the reviewed engines:
+
+eSpeakNG + MBROLA:
+
+```yaml
+tts:
+  voice: 'us-mbrola-2'
+  rate: 130 # words per minute
+  volume: 70 # from 0 to 200
+  pitch: 60
+  cmd:
+    synth: 'espeak -v %s -a %d -p %d -s %d -w %s "%s"' # parameter order: voice, volume, pitch, rate, filename, text
+    transcode: 'sox %s -r 8000 -b 8 -c 1 -D %s' # parameter order: inputfile, outputfile
+  ...
+```
+
+Flite/Mimic 1:
+
+```yaml
+tts:
+  voice: 'tts/mycroft_voice_4.0.flitevox'
+  rate: 1 # Flite uses a factor instead of absolute value
+  volume: 0 # Flite doesn't support volume adjustment
+  pitch: 100 # Flite uses slightly different pitch scale
+  cmd:
+    synth: 'flite -voice %s --setf vol=%d --setf int_f0_target_mean=%d --setf duration_stretch=%d -o %s -t "%s"' # parameter order: voice, volume, pitch, rate, filename, text
+    transcode: 'sox %s -r 8000 -b 8 -c 1 -D %s' # parameter order: inputfile, outputfile
+  ...
+```
+
+Pico TTS:
+
+```yaml
+tts:
+  voice: 'en-US'
+  rate: 0 # Pico doesn't support it
+  volume: 0 # Pico doesn't support it
+  pitch: 0 # Pico doesn't support it
+  cmd:
+    synth: VOICE=%s UU=%d%d%d OUTF=%s sh -c 'echo "%s" | pico2wave -l $VOICE -w $OUTF' # parameter order: voice, volume, pitch, rate, filename, text
+    transcode: 'sox %s -r 8000 -b 8 -c 1 -D %s' # parameter order: inputfile, outputfile
+  ...
+```
+
+## Credits
+
+Created by Luxferre in 2023.
+
+Made in Ukraine.
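
The command format and access rules described in the README's workflow and `ivr` sections can be illustrated with a minimal sketch. The helper names `parse_command` and `is_authorized` are hypothetical and are not taken from `fvx.py`; the sketch only assumes what the README states: `*` separates fields, `#` terminates the command, every field stays a string, and `ivr.users` maps a PIN either to `'*'` or to a list of action IDs.

```python
def parse_command(raw):
    """Split a terminated DTMF command such as '22*44*99#' into (action_id, params).

    Hypothetical helper: all fields stay strings, since A-D are valid DTMF digits.
    """
    if not raw.endswith('#'):
        raise ValueError('command must be terminated with #')
    parts = raw[:-1].split('*')  # drop the terminator, then split on the separator
    return parts[0], parts[1:]


def is_authorized(users, pin, action_id):
    """Check a PIN against an ivr.users-style mapping (hypothetical helper)."""
    allowed = users.get(pin)
    if allowed is None:  # unknown PIN
        return False
    return allowed == '*' or action_id in allowed


users = {'3105': '*', '3246': ['32']}  # same shape as in the example config
action_id, params = parse_command('22*44*99#')
print(action_id, params, is_authorized(users, '3246', action_id))
```

With the example user list, PIN `3246` may run only action `32`, so the `22` command above would be rejected for that user, and according to the README the caller would hear the same `ivr.cmdfailclips` sequence as for an unknown command.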
diff --git a/example-config/actions/echobeep.py b/example-config/actions/echobeep.py
@@ -0,0 +1,38 @@
+# An example action script for FrugalVox
+# Implements three actions: echo test, caller address readback and parameterized beep
+
+import os, sys # example of using commonly available Python modules
+from pyVoIP.VoIP import CallState # example of using an installation-specific module
+from fvx import tts_to_buf, detect_dtmf, audio_buf_len, emptybuf, get_caller_addr, flush_input_audio, playbuf # example of using the FrugalVox kernel module (fvx)
+
+def run_action(action_id, params, call_obj, user_id, config, clips, calls):
+    if action_id == '32': # echo test: just enter 32#
+        playbuf(tts_to_buf('Entering the echo test, press pound to return', config['tts']), call_obj)
+        flush_input_audio(call_obj)
+        cache_digit = None # in-band digit cache
+        while call_obj.state == CallState.ANSWERED: # main event loop
+            audiobuf = call_obj.read_audio(audio_buf_len, True) # blocking audio buffer read
+            call_obj.write_audio(audiobuf) # echo the audio
+            digit = call_obj.get_dtmf() # get a single out-of-band DTMF digit
+            if digit == '' and audiobuf != emptybuf: # no out-of-band digit, try in-band detection
+                ib_digit = detect_dtmf(audiobuf)
+                if ib_digit != cache_digit:
+                    if ib_digit is None: # digit transmission ended
+                        digit = cache_digit # save the digit
+                        cache_digit = None  # reset the cache
+                    else: # digit transmission started
+                        cache_digit = ib_digit
+            if digit == '#':
+                playbuf(tts_to_buf('Echo test ended', config['tts']), call_obj)
+                return
+    elif action_id == '24': # Caller ID readback: enter 24#
+        playbuf(tts_to_buf('Your caller ID is %s' % get_caller_addr(call_obj), config['tts']), call_obj) # demonstration of the on-the-fly TTS
+    else: # beep command: 22*3# tells to beep 3 times
+        times = 1 # how many times we should beep
+        if len(params) > 0:
+            times = int(params[0])
+        if times > 10: # limit beeps to 10
+            times = 10
+        # send the beeps
+        playbuf((clips['beep']+(emptybuf*10)) * times, call_obj)
+        return
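
The in-band digit cache in the echo test above reports a digit only once its tone has ended, so a key held across many audio frames is counted once. The same logic can be sketched in isolation, without pyVoIP or audio data. The class `InbandDigitLatch` and the function `collect_digits` are hypothetical names introduced for this illustration, and each frame is assumed to be already reduced to a detected digit or `None`, as `detect_dtmf` is used in the script.

```python
class InbandDigitLatch:
    """Hypothetical helper mirroring the cache_digit logic of the echo test."""

    def __init__(self):
        self.cache = None  # digit whose tone is currently being heard

    def feed(self, detected):
        """Take one per-frame detection result; return a finished digit or None."""
        if detected == self.cache:  # no change: still silence or the same tone
            return None
        finished = None
        if detected is None:  # tone ended: report the cached digit
            finished = self.cache
            self.cache = None
        else:  # a tone started (or changed): remember it
            self.cache = detected
        return finished


def collect_digits(frames):
    """Run a sequence of per-frame detections through the latch."""
    latch = InbandDigitLatch()
    digits = []
    for detected in frames:
        done = latch.feed(detected)
        if done is not None:
            digits.append(done)
    return digits


print(collect_digits([None, '5', '5', '5', None, '#', '#', None]))
```

One consequence of this design, shared with the original loop, is that a digit is only emitted after at least one frame without a tone, so two different tones with no gap between them would be reported as the second digit only.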
diff --git a/example-config/clips/beep.wav b/example-config/clips/beep.wav
Binary files differ.
diff --git a/example-config/config.yaml b/example-config/config.yaml
@@ -0,0 +1,46 @@
+---
+# SIP client configuration for incoming calls
+sip:
+  host: sip.example.com
+  port: 5060
+  transport: udp # not used for now, UDP is the only option
+  username: 'exampleuser'
+  password: 'examplepass123'
+  rtpPortLow: 10000
+  rtpPortHigh: 20000
+
+# TTS engine configuration
+tts:
+  voice: 'us-mbrola-2'
+  rate: 130 # words per minute
+  volume: 70 # from 0 to 200
+  pitch: 60
+  cmd: # command templates, do not modify them unless you're fully changing the engine
+    synth: 'espeak -v %s -a %d -p %d -s %d -w %s "%s"' # parameter order: voice, volume, pitch, rate, filename, text
+    transcode: 'sox %s -r 8000 -b 8 -c 1 -D %s' # parameter order: inputfile, outputfile
+  phrases: # key is the clip name, value is the text
+    passprompt: 'Please enter your pin followed by pound after the beep.'
+    cmd: 'Please enter your command, ending with pound.'
+    invalidpass: 'Invalid pin, bye!'
+    nocmd: 'Command not found.'
+
+# static audio clips configuration
+clips:
+  dir: './clips'
+  files: # in addition to the ones generated from phrases
+    beep: 'beep.wav'
+
+# IVR configuration
+ivr:
+  auth: true # set to false to disable user PINs and go straight to actions (not recommended)
+  authpromptclips: [passprompt, beep] # sequence of authentication prompt audio clips
+  authfailclips: [invalidpass]
+  cmdfailclips: [nocmd]
+  cmdpromptclips: [cmd]
+  users: # user PINs and supported actions, '*' means all
+    '3105': '*'
+    '3246': ['32']
+  actions: # all registered actions: command => script filename
+    '32': './actions/echobeep.py'
+    '22': './actions/echobeep.py'
+    '24': './actions/echobeep.py'
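
The README lists which fields of this file are mandatory. A minimal sketch of a pre-flight check based on that list is shown below. The function `missing_fields` is hypothetical and not part of the kernel, and it works on an already parsed dictionary (for instance the result of `fvx.load_yaml`), so it needs no extra dependencies.

```python
def missing_fields(config):
    """Return dotted names of mandatory fields absent from a parsed config dict.

    Hypothetical helper; the field list follows the README's Configuration section.
    """
    required = {
        'sip': ['host', 'port', 'username', 'password', 'rtpPortLow', 'rtpPortHigh'],
        'clips': ['dir', 'files'],
        'ivr': ['cmdpromptclips', 'cmdfailclips', 'actions'],
    }
    missing = []
    for section, fields in required.items():
        block = config.get(section) or {}
        missing += ['%s.%s' % (section, f) for f in fields if f not in block]
    ivr = config.get('ivr') or {}
    if ivr.get('auth') is True:  # auth-related fields are required only with auth on
        for f in ('authpromptclips', 'authfailclips', 'users'):
            if f not in ivr:
                missing.append('ivr.%s' % f)
    return missing


example = {
    'sip': {'host': 'sip.example.com', 'port': 5060, 'username': 'exampleuser',
            'password': 'examplepass123', 'rtpPortLow': 10000, 'rtpPortHigh': 20000},
    'clips': {'dir': './clips', 'files': {'beep': 'beep.wav'}},
    'ivr': {'auth': True, 'authpromptclips': ['passprompt', 'beep'],
            'authfailclips': ['invalidpass'], 'cmdfailclips': ['nocmd'],
            'cmdpromptclips': ['cmd'], 'users': {'3105': '*'},
            'actions': {'32': './actions/echobeep.py'}},
}
print(missing_fields(example))
```

The check only looks for the presence of keys. It does not verify that the referenced clips or action scripts exist.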
diff --git a/fvx.py b/fvx.py
@@ -0,0 +1,277 @@
+#!/usr/bin/env python3
+
+# FrugalVox: experimental, straightforward, no-nonsense IVR framework on top of pyVoIP (patched) and TTS engines
+# Created by Luxferre in 2023, released into public domain
+# Deps: PyYAML, NumPy, espeak-ng/flite/libttspico, SoX, patched pyVoIP (see https://github.com/tayler6000/pyVoIP/issues/107#issuecomment-1440231926)
+# All configuration is in config.yaml
+
+import sys
+import os
+import signal
+import tempfile
+import yaml
+import wave
+import time
+from datetime import datetime # for logging
+import traceback # for logging
+import socket # for local IP detection
+import numpy as np # for in-band DTMF detection
+import importlib.util # for action modules import
+from pyVoIP.VoIP import VoIPPhone, InvalidStateError, CallState
+
+# global parameters
+progname = 'FrugalVox v0.0.1'
+config = {} # placeholder for config object
+configfile = './config.yaml' # default config yaml path (relative to the workdir)
+if len(sys.argv) > 1:
+    configfile = sys.argv[1]
+configfile = os.path.realpath(configfile)
+kernelroot = os.path.realpath(os.path.dirname(__file__)) # absolute path to the kernel directory
+configroot = os.path.dirname(configfile)
+sys.path.append(kernelroot) # make the kernel module findable
+if configroot != kernelroot:
+    sys.path.append(configroot) # make the modules in configuration directory findable
+audio_buf_len = 160 # analyze this amount of raw audio data bytes
+emptybuf = b'\x80' * audio_buf_len
+DTMF_TABLE = {
+    '1': [1209, 697],
+    '2': [1336, 697],
+    '3': [1477, 697],
+    'A': [1633, 697],
+    '4': [1209, 770],
+    '5': [1336, 770],
+    '6': [1477, 770],
+    'B': [1633, 770],
+    '7': [1209, 852],
+    '8': [1336, 852],
+    '9': [1477, 852],
+    'C': [1633, 852],
+    '*': [1209, 941],
+    '0': [1336, 941],
+    '#': [1477, 941],
+    'D': [1633, 941]
+}
+ivrconfig = None # placeholder for IVR auth config
+calls = {} # placeholder for all realtime call instances
+
+# helper methods
+
+def logevent(msg):
+    dts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
+    print('[%s] %s' % (dts, msg))
+
+def load_audio(fname): # load audio data from a WAV PCM file
+    f = wave.open(fname, 'rb')
+    frames = f.getnframes()
+    data = f.readframes(frames)
+    f.close()
+    return data
+
+def load_yaml(fname): # load an object from a YAML file
+    yf = open(fname, 'r')
+    yc = yf.read()
+    yf.close()
+    return yaml.safe_load(yc)
+
+def tts_to_file(text, fname, conf): # render the text to a file
+    fh, tname = tempfile.mkstemp('.wav', 'fvx-')
+    os.close(fh)
+    rate = int(conf['rate'])
+    volume = int(conf['volume'])
+    pitch = int(conf['pitch'])
+    ecmd = conf['cmd']['synth'] % (conf['voice'], volume, pitch, rate, tname, text)
+    os.system(ecmd) # render to the temporary file
+    # now, resample the synthesized file to unsigned 8-bit 8 kHz mono PCM
+    smpcmd = conf['cmd']['transcode'] % (tname, fname)
+    os.system(smpcmd)
+    os.remove(tname)
+
+def tts_to_buf(text, conf): # render the text directly to a buffer
+    fh, fname = tempfile.mkstemp('.wav', 'fvx-')
+    os.close(fh)
+    tts_to_file(text, fname, conf)
+    buf = load_audio(fname)
+    os.remove(fname)
+    return buf
+
+def get_caller_addr(call): # extract caller's SIP address from the call request headers
+    return call.request.headers['From']['address']
+
+def get_callee_addr(call): # extract destination SIP address from the call request headers
+    return call.request.headers['To']['address']
+
+def flush_input_audio(call): # clear the call's RTP input buffer
+    abuf = None
+    for i in range(625): # because 625 * 160 = 100000 (pyVoIP's internal buffer size)
+        abuf = call.read_audio(audio_buf_len, False)
+
+def playbuf(buf, call): # properly play audio buffer on the call
+    blen = len(buf) / 8000
+    call.write_audio(buf)
+    time.sleep(blen)
+
+def playclips(clipset, call): # properly play clips on the call
+    for clipname in clipset:
+        playbuf(clips[clipname], call)
+
+def hangup(call): # call hangup wrapper
+    global calls
+    if call.call_id in calls:
+        del calls[call.call_id]
+    try:
+        call.hangup()
+    except InvalidStateError:
+        pass
+    logevent('Call with %s terminated' % get_caller_addr(call))
+
+# in-band DTMF detector
+
+def isNumberInArray(array, number):
+    offset = 5
+    for i in range(number - offset, number + offset + 1): # symmetric tolerance window, upper bound included
+        if i in array:
+            return True
+    return False
+
+def detect_dtmf(buf): # Detect a DTMF digit in the audio buffer using FFT
+    data = np.frombuffer(buf, dtype=np.uint8)
+    ftdata = np.fft.fft(data)
+    ftlen = len(ftdata)
+    for i in range(ftlen):
+        ftdata[i] = int(np.absolute(ftdata[i]))
+    lb = 20 * np.average(ftdata) # lower bound for filtering
+    freqs = []
+    for i in range(ftlen):
+        if ftdata[i] > lb:
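+            freqs.append(i) # note: i is an FFT bin index, which equals Hz only when the buffer holds exactly 1 second of audio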
+    for d, fpair in DTMF_TABLE.items(): # Detect and return the digit
+        if isNumberInArray(freqs, fpair[0]) and isNumberInArray(freqs, fpair[1]):
+            return d
+
+# IVR command handler (for authenticated and authorized action runs)
+
+def command_handler(act, modulefile, call, userid):
+    global clips
+    global calls
+    global config
+    actid = act[0]
+    params = act[1:]
+    logevent('Running action %s from the module %s with params (%s)' % (actid, modulefile, ', '.join(params)))
+    (modname, ext) = os.path.splitext(os.path.basename(modulefile))
+    spec = importlib.util.spec_from_file_location(modname, modulefile)
+    actionmodule = importlib.util.module_from_spec(spec)
+    sys.modules[modname] = actionmodule
+    spec.loader.exec_module(actionmodule)
+    actionmodule.run_action(actid, params, call, userid, config, clips, calls)
+
+# main call handler
+
+def main_call_handler(call): # call object as the argument
+    global clips
+    global ivrconfig
+    global calls
+    calls[call.call_id] = call # register the call in the list
+    logevent('New incoming call from %s' % get_caller_addr(call))
+    try:
+        call.answer()
+        authdone = True
+        userid = '0000' # default for the unauthorized
+        actionsallowed = '*'
+        if ivrconfig['auth'] == True: # drop all permissions and prompt for the PIN
+            authdone = False
+            actionsallowed = {}
+            playclips(ivrconfig['authpromptclips'], call)
+        else: # prompt for the first command
+            playclips(ivrconfig['cmdpromptclips'], call)
+        cmdbuf = '' # command buffer
+        cache_digit = None # in-band digit cache
+        while call.state == CallState.ANSWERED: # main event loop
+            audiobuf = call.read_audio(audio_buf_len, False) # nonblocking audio buffer read
+            digit = call.get_dtmf() # get a single out-of-band DTMF digit
+            if digit == '' and audiobuf != emptybuf: # no out-of-band digit, try in-band detection
+                ib_digit = detect_dtmf(audiobuf)
+                if ib_digit != cache_digit:
+                    if ib_digit is None: # digit transmission ended
+                        digit = cache_digit # save the digit
+                        cache_digit = None  # reset the cache
+                    else: # digit transmission started
+                        cache_digit = ib_digit
+            if digit == '#': # end of the command
+                if authdone: # we're authenticated, let's authorize the action
+                    actionparts = cmdbuf.split('*')
+                    actionroot = actionparts[0]
+                    letthrough = False
+                    if actionsallowed == '*' or (actionroot in actionsallowed):
+                        letthrough = True
+                    if letthrough: # authorized
+                        if actionroot in ivrconfig['actions']: # command exists
+                            actionmod = os.path.realpath(os.path.join(configroot, ivrconfig['actions'][actionroot])) # resolve the action module file
+                            command_handler(actionparts, actionmod, call, userid) # pass control to the command handler along with the call instance
+                        else: # command doesn't exist, notify the caller
+                            playclips(ivrconfig['cmdfailclips'], call)
+                            logevent('Attempt to execute a non-existing action %s with the user ID %s' % (cmdbuf, userid))
+                    else: # notify the caller that the command doesn't exist and log the event
+                        playclips(ivrconfig['cmdfailclips'], call)
+                        logevent('Attempt to execute an unauthorized action %s with the user ID %s' % (cmdbuf, userid))
+                    playclips(ivrconfig['cmdpromptclips'], call) # prompt for the next command
+                    flush_input_audio(call)
+                else: # we expect the first command to be our user PIN
+                    if cmdbuf in ivrconfig['users']: # PIN found, confirm auth and prompt for the command
+                        authdone = True
+                        userid = cmdbuf
+                        actionsallowed = ivrconfig['users'][userid]
+                        playclips(ivrconfig['cmdpromptclips'], call) # prompt for the next command
+                    else: # PIN not found, alert the caller, log the failed entry and hang up
+                        playclips(ivrconfig['authfailclips'], call)
+                        logevent('Attempt to enter with invalid PIN %s' % cmdbuf)
+                        hangup(call)
+                cmdbuf = '' # clear command buffer
+            elif digit != '': # append the digit to the command buffer
+                cmdbuf += digit
+        hangup(call)
+    except InvalidStateError: # usually this means the call was hung up mid-action
+        hangup(call)
+    except SystemExit: # in case the service has been stopped or restarted
+        hangup(call)
+    except Exception as e:
+        print('Unknown error: ', sys.exc_info())
+        traceback.print_exc()
+        hangup(call)
+
+# signal handler for graceful process termination
+
+def sighandler(signum, frame):
+    global phone
+    logevent('Stopping the SIP client...')
+    phone.stop()
+    logevent('SIP client stopped, bye!')
+
+# entry point
+
+if __name__ == '__main__':
+    logevent('Starting %s' % progname)
+    config = load_yaml(configfile)
+    ivrconfig = config['ivr']
+    logevent('Configuration loaded from %s' % configfile)
+    clipDir = os.path.realpath(os.path.join(configroot, config['clips']['dir']))
+    logevent('Loading clips and compiling TTS phrases')
+    clips = config['clips']['files']
+    for k, fname in clips.items():
+        clips[k] = load_audio(os.path.join(clipDir, fname))
+    for pname, phrase in config['tts']['phrases'].items():
+        clips[pname] = tts_to_buf(phrase, config['tts'])
+    logevent('All clips loaded to memory buffers from %s' % clipDir)
+    logevent('Initializing SIP phone part')
+    sip = config['sip']
+    sipport = int(sip['port'])
+    localname = socket.gethostname()
+    localip = (([ip for ip in socket.gethostbyname_ex(localname)[2] if not ip.startswith('127.')] or [[(s.connect((sip['host'], sipport)), s.getsockname()[0], s.close()) for s in [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)]][0][1]]) + [None])[0]
+    if localip is None:
+        localip = socket.gethostbyname(localname)
+    logevent('Local IP detected: %s' % localip)
+    phone = VoIPPhone(sip['host'], sipport, sip['username'], sip['password'], myIP=localip, rtpPortLow=int(sip['rtpPortLow']), rtpPortHigh=int(sip['rtpPortHigh']), callCallback=main_call_handler)
+    # register the SIGINT and SIGTERM handlers to gracefully stop the phone instance
+    signal.signal(signal.SIGINT, sighandler)
+    signal.signal(signal.SIGTERM, sighandler)
+    phone.start()
+    logevent('SIP client started')
diff --git a/pyVoIP-1.6.4.patched-py3-none-any.whl b/pyVoIP-1.6.4.patched-py3-none-any.whl
Binary files differ.
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,3 @@
+numpy>=1.24.2
+PyYAML>=6.0
+./pyVoIP-1.6.4.patched-py3-none-any.whl
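
A note on `detect_dtmf` in `fvx.py` above: an FFT over `audio_buf_len = 160` samples at 8 kHz yields bins that are 8000 / 160 = 50 Hz wide, so a bin index has to be converted before it can be compared with the frequencies in `DTMF_TABLE`. The following is a minimal sketch of one way to do that, not the author's method. It assumes unsigned 8-bit 8 kHz mono PCM input; the names `freq_to_bin`, `detect_dtmf_sketch`, `make_tone` and the `min_level` threshold are hypothetical and chosen for illustration only.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz; assumed from the "8-bit 8 kHz mono PCM" comment in fvx.py
ROWS = [697, 770, 852, 941]       # low-group DTMF frequencies, Hz
COLS = [1209, 1336, 1477, 1633]   # high-group DTMF frequencies, Hz
KEYS = ['123A', '456B', '789C', '*0#D']  # KEYS[row][col], same layout as DTMF_TABLE

def freq_to_bin(freq, nsamples, rate=SAMPLE_RATE):
    """Return the index of the FFT bin closest to freq (hypothetical helper)."""
    return int(round(freq * nsamples / rate))

def detect_dtmf_sketch(buf, min_level=5.0):
    """Pick the strongest row and column tone; min_level is an untuned guess."""
    # center unsigned 8-bit samples around zero to remove the DC offset
    data = np.frombuffer(buf, dtype=np.uint8).astype(np.float64) - 128.0
    n = len(data)
    if n == 0:
        return None
    # magnitude spectrum, scaled so that a pure tone reads roughly as its amplitude
    mags = np.abs(np.fft.rfft(data)) / (n / 2)
    row_levels = [mags[freq_to_bin(f, n)] for f in ROWS]
    col_levels = [mags[freq_to_bin(f, n)] for f in COLS]
    r = int(np.argmax(row_levels))
    c = int(np.argmax(col_levels))
    if row_levels[r] < min_level or col_levels[c] < min_level:
        return None  # too quiet to be a key press
    return KEYS[r][c]

def make_tone(digit, nsamples=160, amp=40.0):
    """Synthesize a test buffer for one key (illustration only)."""
    for r, rowkeys in enumerate(KEYS):
        c = rowkeys.find(digit)
        if c >= 0:
            break
    t = np.arange(nsamples) / SAMPLE_RATE
    wave = amp * (np.sin(2 * np.pi * ROWS[r] * t) + np.sin(2 * np.pi * COLS[c] * t))
    return (wave + 128.0).round().astype(np.uint8).tobytes()

if __name__ == '__main__':
    print(detect_dtmf_sketch(make_tone('5')))
```

This sketch only checks the eight DTMF bins and does nothing to reject speech or noise, so a real detector would likely need a stricter test (for example, comparing the peaks against the rest of the spectrum) before trusting a digit.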

	frugalvox A tiny VoIP IVR framework by hackers, for hackers
	git clone git://git.luxferre.top/frugalvox.git