Initial upload - databankr - Casio Databank/Telememo record format encoder/decoder

commit d23672269e3ebcbf1361d022eb631591927e84f9
Author: Luxferre <lux@ferre>
Date:   Tue, 21 May 2024 14:51:05 +0300

Initial upload

Diffstat:
A README  | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A config.json  | 34 ++++++++++++++++++++++++++++++++++
A databankr.py  | 187 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

3 files changed, 359 insertions(+), 0 deletions(-)
diff --git a/README b/README
@@ -0,0 +1,138 @@
+Databankr: store arbitrary data inside Casio Databank/Telememo watches
+----------------------------------------------------------------------
+This is a Python utility program that allows you to encode arbitrary pieces
+of information into the format that could be entered into Casio watches with
+the databank/telememo function support, as well as retrieve and decode this
+information later. It supports both raw and hexadecimal data, as well as both
+file and standard input/output.
+
+== Usage ==
+
+The program can be run like this (as always, use -h flag to see live help):
+
+python3 databankr.py [enc/dec] [-t TYPE] [-i INPUT_FILE] [-o OUTPUT_FILE]
+                     [-c CONFIG] [-m MODULE] [-l EXPECTED_LENGTH]
+
+where the mode (enc/dec) parameter is mandatory, and the optional ones are:
+
+* -t: data type to be encoded or decoded (bin or hex, bin by default,
+      which means raw data)
+* -i: source input file path (default "-", which means stdin)
+* -o: result output file path (default "-", which means stdout)
+* -c: configuration file path (default "config.json" in the current dir)
+* -m: module configuration code according to your watch (default 2515-lat)
+* -l: expected decoded data length in bytes (default is 0 - no limits)
+
+In the encoding mode, the program outputs (to a file or the standard output)
+a set of double newline separated records that you need to enter into the
+databank/telememo function of your watch. Note that the number of characters
+is fixed for every model, so if a record name/number contains fewer of them,
+then you must fill the rest with spaces. The input to the encoding mode
+can also be a file or the standard input, and the data type flag (-t) defines
+whether this is a raw binary file or a hexadecimal string.
+
+In the decoding mode, the program expects a set of double newline separated
+databank/telememo records from a file or the standard input and outputs the
+reconstructed data into a file or the standard output according to the data
+type flag. By default, the output length is derived from the number of records
+and the amount of bits held by each of them, so the tail of the output may be
+padded with null bytes. If you know the exact byte count of the source data,
+you can pass the -l flag to trim the restored data to the desired length.
+
+If you choose to enter the data manually via the standard input, press Ctrl+D
+when done. This works in both modes.
+
+Keep in mind that the records are case-sensitive: you must enter the letters
+in exactly the same case as specified in the configuration section for the
+module of your choice. Most of the time, it will be upper case for the name
+parts of the records.
+
+== Configuration format ==
+
+The basic configuration file is shipped with Databankr and is suitable for
+several popular databank-enabled Casio models, but you can always extend it
+to support other ones if you know the structure of their records. The file is
+a normal JSON object where the keys are module identifiers (a key does not
+have to name every module it works with), and the values are module config
+objects. Each such object contains the following fields:
+
+* description: a human-readable module description displayed by Databankr
+* namelen: the name field length in a databank record (in characters)
+* numberlen: the number field length in a databank record (in characters)
+* alpha: the entire character set that can be entered into a name field
+* digit: the entire character set that can be entered into a number field
+* index: a subset of the "alpha" charset sorted alphabetically that's used
+         for record indexing; its length must be equal to the total amount
+         of records in the watch
+
+The namelen and numberlen fields are integers, all others are strings. The
+"index" field is necessary because all Databank/Telememo-enabled Casio watches
+utilize automatic sorting, so, to preserve the data order, the first character
+in the name part of each record is used to index the records rather than to
+store the data payload itself.
+
+== FAQ ==
+
+- How and why was this invented?
+
+Databankr started in early 2023 as a JavaScript library with a different name,
+Telememer, that only catered to Casios with 2747 and 5574 modules (like AW-80,
+AMW-870 and so on). It was created as an attempt to turn the Telememo function
+of these watches into a kind of universal storage for arbitrary binary data,
+as such storage is pretty much unhackable and only accessible to those who
+physically use the watch. Besides, a phone or even a paper notebook is much
+more likely to be stolen, searched or confiscated than a cheap wristwatch from
+Casio. Then, in mid-2024, Databankr was created in its current form of a CLI
+application written in Python 3, supporting several different Casio modules
+out of the box.
+
+- How much data can we store this way?
+
+The overall formula of bits per record looks like this:
+
+bits = floor(number_len * log2(digits)) + floor((name_len - 1) * log2(chars)),
+
+where "digits" is how many different digits we can enter into the number part,
+"chars" is how many different characters we can enter into the name part, and
+"number_len" and "name_len" are the lengths of the number and name fields (one
+name character is reserved for indexing). Multiplying this number by the
+amount of records gives the total storage. For example, with the default
+"2515-lat" module configuration (which corresponds to a Casio DB-36/DB-360
+watch set to English or Dutch language), we can store 95 bits per record,
+which translates to 2850 bits or 356 bytes in the entire databank.
+
+- What kind of information can I store in such limited space?
+
+If you happen to own an old and more advanced Casio Databank model (with 50,
+100 or even 150 records), you'll find even more possibilities (after creating
+your own configuration section for that model, of course). However, even 2850
+bits is still over 2048, which means you can store several cryptographic keys,
+important URLs and passwords (in an encrypted fashion) or other information
+that you don't need to glance at but must be able to recover when this Casio
+is the only place you store it. Besides databank capacity, the only real
+tradeoff is your own readiness to manually enter the records into the watch
+and then retype them into the program (or a file) whenever you need to recover
+the information.
+
+- What happens if I try to encode more data than can be stored?
+
+It will be truncated prior to converting to records. Only one record set is
+supported at the moment.
+
+- Which modules are supported as of now?
+
+Currently, Databankr comes with the configurations for the following modules:
+
+* 2747: Casio modules 2747 and 5574
+* 2515-lat: Casio module 2515, basic Latin characters
+* 2515-cyr: Casio module 2515, Cyrillic characters
+* 2515-por: Casio module 2515, Portuguese characters 
+
+Even though the program itself is considered complete, the configuration list
+is expected to grow in the future. Of course, everyone is encouraged to append
+their own configurations according to the "Configuration format" section.
+
+== Credits ==
+
+Created by Luxferre in 2024. Released into public domain with no warranties.
+
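Reviewer note: the capacity figures quoted in the README FAQ can be reproduced with a few lines of Python. This is only a sketch; `record_bits` is a hypothetical helper name that is not part of databankr.py, and it assumes the same floor-based calculation that `encode()` performs with `int(... * math.log2(...))`. The character sets are copied from the "2515-lat" section of config.json.

```python
import math

def record_bits(namelen: int, numberlen: int, alpha: str, digit: str) -> int:
    """Bits per record; the first name character is reserved for the index."""
    name_bits = int((namelen - 1) * math.log2(len(alpha)))   # floor
    number_bits = int(numberlen * math.log2(len(digit)))     # floor
    return name_bits + number_bits

# Values copied from the "2515-lat" module configuration
alpha = " ABCDEFGHIJKLMNOPQRSTUVWXYZ@!?'.:/+-0123456789"
digit = "-0123456789() "
index = "ABCDEFGHIJKLMNOPQRSTUVWXY12345"

bits = record_bits(8, 15, alpha, digit)
total_bits = bits * len(index)           # one record per index character
print(bits, total_bits, total_bits // 8)  # 95 2850 356
```

Each field is floored separately, so a record holds slightly less than the combined entropy of both fields; that is the price of converting the name and number parts independently.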
diff --git a/config.json b/config.json
@@ -0,0 +1,34 @@
+{
+  "2747": {
+    "description": "Casio 2747/5574 modules",
+    "namelen": 8,
+    "numberlen": 16,
+    "alpha": " ABCDEFGHIJKLMNOPQRSTUVWXYZ@!?',.;:()/+-0123456789",
+    "digit": " 0123456789()+-",
+    "index": "ABCDEFGHIJKLMNOPQRSTUVWXY12345"
+  },
+  "2515-lat": {
+    "description": "Casio 2515 module - basic Latin (English/Dutch)",
+    "namelen": 8,
+    "numberlen": 15,
+    "alpha": " ABCDEFGHIJKLMNOPQRSTUVWXYZ@!?'.:/+-0123456789",
+    "digit": "-0123456789() ",
+    "index": "ABCDEFGHIJKLMNOPQRSTUVWXY12345"
+  },
+  "2515-cyr": {
+    "description": "Casio 2515 module - Cyrillic",
+    "namelen": 8,
+    "numberlen": 15,
+    "alpha": " АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ@!?'.:/+-0123456789",
+    "digit": "-0123456789() ",
+    "index": "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЭЯ12345"
+  },
+  "2515-por": {
+    "description": "Casio 2515 module - Portuguese",
+    "namelen": 8,
+    "numberlen": 15,
+    "alpha": " AÁÀÂÃBCÇDEÉÊFGHIÍJKLMNOÓÔÕPQRSTUÚVWXYZ@!?'.:/+-0123456789",
+    "digit": "-0123456789() ",
+    "index": "ABCDEFGHIJKLMNOPQRSTUVWXY12345"
+  }
+}
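Reviewer note: anyone adding a module section may want to sanity-check it before use. The sketch below is an assumption-laden helper, not part of this commit: `check_module` is a hypothetical name, and the rules are inferred from the "Configuration format" section of the README (integer lengths, string charsets, index drawn from "alpha"). The uniqueness check is an extra assumption; it matters because the decoder looks characters up by their first position in the charset, so a repeated character could not be decoded unambiguously.

```python
def check_module(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in ("namelen", "numberlen"):
        if not isinstance(cfg.get(key), int) or cfg[key] < 1:
            problems.append(f"{key} must be a positive integer")
    for key in ("description", "alpha", "digit", "index"):
        if not isinstance(cfg.get(key), str) or not cfg[key]:
            problems.append(f"{key} must be a non-empty string")
    if problems:
        return problems  # type errors make the remaining checks meaningless
    for key in ("alpha", "digit", "index"):
        if len(set(cfg[key])) != len(cfg[key]):
            problems.append(f"{key} contains repeated characters")
    stray = sorted(set(cfg["index"]) - set(cfg["alpha"]))
    if stray:
        problems.append(f"index characters missing from alpha: {''.join(stray)}")
    return problems

# The "2747" section from config.json, copied verbatim
sample = {
    "description": "Casio 2747/5574 modules",
    "namelen": 8,
    "numberlen": 16,
    "alpha": " ABCDEFGHIJKLMNOPQRSTUVWXYZ@!?',.;:()/+-0123456789",
    "digit": " 0123456789()+-",
    "index": "ABCDEFGHIJKLMNOPQRSTUVWXY12345",
}
print(check_module(sample))                     # []
print(check_module(dict(sample, index="AB#")))  # one problem reported
```

The helper does not verify that "index" follows the watch's own sort order, since that order is not stated in the configuration itself.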
diff --git a/databankr.py b/databankr.py
@@ -0,0 +1,187 @@
+#!/usr/bin/env python3
+# Databankr: a CLI tool to encode arbitrary data to Casio Databank/Telememo 
+# watches and restore it from there
+# Created by Luxferre in 2024, released into public domain
+
+import sys, math, json, re
+
+# universal base conversion methods
+
+def to_base(number, base, charset):
+  if not number:
+    return charset[0]
+  res = ''
+  while number > 0:
+    res = charset[number % base] + res
+    number //= base
+  return res
+
+def from_base(number, base, charset):
+  if number == charset[0]:
+    return 0
+  res = 0
+  for c in number:
+    ind = charset.find(c)
+    if ind > -1:
+      res = res * base + ind
+    else:
+      return 0
+  return res
+
+# record preparation methods
+
+# create a padded field from a numeric value
+def val_to_field(n, charset, padlen):
+  return to_base(n, len(charset), charset).rjust(padlen, charset[0])
+
+# main encoding method
+def encode(data: bytes, config):
+  effective_namelen = config['namelen'] - 1
+  alphabase = len(config['alpha'])
+  numbase = len(config['digit'])
+  indexsize = len(config['index'])
+  # calculate record estimation
+  name_part_bits = int(effective_namelen * math.log2(alphabase))
+  number_part_bits = int(config['numberlen'] * math.log2(numbase))
+  record_bits = name_part_bits + number_part_bits # single record capacity
+  max_bits = record_bits * indexsize # overall databank capacity
+  # start processing
+  bitstr = ''.join(f'{c:08b}' for c in data) # create an aligned bitstring
+  bitlen = len(bitstr) # overall bitstring length
+  if bitlen > max_bits: # truncate the excess
+    bitlen = max_bits
+    bitstr = bitstr[:max_bits]
+  rec_len = int(math.ceil(bitlen / record_bits)) # message length in records
+  records = [] # list of lists
+  pos = 0 # current position tracker
+  for i in range(0, rec_len): # slice over the bitstring
+    namebin = bitstr[pos:pos+name_part_bits].ljust(name_part_bits, '0')
+    pos += name_part_bits
+    numbin = bitstr[pos:pos+number_part_bits].ljust(number_part_bits, '0')
+    pos += number_part_bits
+    # now we only got binary representation of both parts
+    # let's convert them to bigints and then to the actual records
+    namefield = config['index'][i] + val_to_field(int(namebin, 2), 
+                  config['alpha'], effective_namelen)
+    numfield = val_to_field(int(numbin, 2), 
+                  config['digit'], config['numberlen'])
+    records.append([namefield, numfield])
+  return records
+
+# main decoding method
+def decode(records, config, expected=0):
+  alphabase = len(config['alpha'])
+  numbase = len(config['digit'])
+  effective_namelen = config['namelen'] - 1
+  name_part_bits = int(effective_namelen * math.log2(alphabase))
+  number_part_bits = int(config['numberlen'] * math.log2(numbase))
+  bitstr = '' # bit string storage
+  for rec in records: # iterate over records
+    nameval = from_base(rec[0][1:], alphabase, config['alpha'])
+    numval = from_base(rec[1], numbase, config['digit'])
+    bitstr += format(nameval, '0b').zfill(name_part_bits)
+    bitstr += format(numval, '0b').zfill(number_part_bits)
+  # reconstruct the raw data from the bitstring and return it
+  datalen = int(math.ceil(len(bitstr)/8)) # estimate the data length in bytes
+  if expected > 0: # truncate if expected length is specified
+    datalen = expected
+  data = b'' # raw data placeholder
+  for i in range(0, datalen): # iterate over byte slices
+    ind = i << 3
+    data += int(bitstr[ind:ind+8].ljust(8, '0'), 2).to_bytes(1, 'big')
+  return data
+
+def auto_int(x): # helps to convert from any base natively supported in Python
+  return int(x, 0)
+
+if __name__ == '__main__': # main app start
+  from argparse import ArgumentParser
+  parser = ArgumentParser(description='Databankr: Casio Databank/Telememo record format encoder/decoder', epilog='(c) Luxferre 2024 --- No rights reserved <https://unlicense.org>')
+  parser.add_argument('mode', help='Operation mode (enc/dec)')
+  parser.add_argument('-t', '--type', type=str, default='bin', help='Data type (bin/hex, default bin)')
+  parser.add_argument('-i', '--input-file', type=str, default='-', help='Source input file (default "-", stdin)')
+  parser.add_argument('-o', '--output-file', type=str, default='-', help='Result output file (default "-", stdout)')
+  parser.add_argument('-c', '--config', type=str, default='config.json', help='Configuration JSON file path (default config.json in current working directory)')
+  parser.add_argument('-m', '--module', type=str, default='2515-lat', help='Module configuration code according to your watch (default 2515-lat)')
+  parser.add_argument('-l', '--expected-length', type=auto_int, default=0, help='Expected decoded data length in bytes (default 0 - no limits)')
+  args = parser.parse_args()
+
+  # detect the mode
+  if args.mode == 'enc':
+    flow = 'enc'
+  elif args.mode == 'dec':
+    flow = 'dec'
+  else:
+    print('Invalid mode! Please specify enc or dec!')
+    exit(1)
+
+  # load the configuration file
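+  try:
+    # the shipped config contains non-ASCII charsets, so force UTF-8
+    with open(args.config, encoding='utf-8') as f:
+      confdata = json.load(f)
+  except (OSError, ValueError):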
+    print('Config file missing or invalid!')
+    exit(1)
+
+  # load the module config
+  if args.module in confdata:
+    moduleconfig = confdata[args.module]
+    print('Loaded the configuration for %s' % moduleconfig['description'])
+  else:
+    print('Module configuration %s not found in the config file!' % args.module)
+    exit(1)
+
+  # load the input data
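+  try:
+    # always read bytes: encode() iterates over byte values and the
+    # decoding flow calls .decode('utf-8') on the input
+    if args.input_file == '-':
+      indata = sys.stdin.buffer.read()
+    else:
+      with open(args.input_file, mode='rb') as infd:
+        indata = infd.read()
+  except Exception: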
+    print('Error reading the input data!')
+    print(sys.exc_info())
+    exit(1)
+
+  # run the selected flow
+  if flow == 'enc': # encoding flow
+    if args.type == 'hex': # convert the input data if the type is hex
+      indata = bytes.fromhex(re.sub(r"[^0-9a-fA-F]", "" ,indata.decode('utf-8')))
+    records = encode(indata, moduleconfig)
+    outdata = ''
+    for rec in records: # separate each record with double newline
+      outdata += rec[0] + '\n' + rec[1] + '\n\n'
+  else: # decoding flow
+    # parse the records
+    rawrecs = indata.decode('utf-8').split('\n\n')
+    records = [] # records will be stored here
+    for pairstr in rawrecs:
+      if len(pairstr) > 0: # exclude empty records
+        pair = pairstr.split('\n') # get raw pair and then left-adjust
+        records.append([pair[0].ljust(moduleconfig['namelen'], ' '),
+                        pair[1].ljust(moduleconfig['numberlen'], ' ')])
+    # decode the records
+    outdata = decode(records, moduleconfig, args.expected_length)
+    if args.type == 'hex': # convert the output data if the type is hex
+      outdata = outdata.hex()
+  # now, write the output file
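+  try:
+    # both sinks are binary, so text output is encoded first
+    if isinstance(outdata, str): # records and hex dumps are text
+      outdata = outdata.encode('utf-8')
+    if args.output_file == '-':
+      sys.stdout.flush() # keep earlier print() output ahead of the data
+      sys.stdout.buffer.write(outdata)
+      sys.stdout.buffer.flush()
+    else:
+      with open(args.output_file, mode='wb') as outfd:
+        outfd.write(outdata)
+  except Exception: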
+    print('Error writing the output file!')
+    print(sys.exc_info())
+    exit(1)
+
+  print('\nOperation complete')
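Reviewer note: the field conversion at the heart of `encode()` and `decode()` is a plain positional-base round trip, which can be checked in isolation. The sketch below re-implements it under hypothetical names (`to_field`, `from_field`) so that it runs on its own; it is not the code from this commit. One deliberate difference: `from_field` raises `ValueError` on a character outside the charset, whereas `from_base` in databankr.py returns 0 in that case.

```python
def to_field(value: int, charset: str, width: int) -> str:
    """Write value in base len(charset), left-padded with the zero character."""
    base = len(charset)
    chars = []
    while value > 0:
        value, digit = divmod(value, base)
        chars.append(charset[digit])
    return "".join(reversed(chars)).rjust(width, charset[0])

def from_field(field: str, charset: str) -> int:
    """Inverse of to_field; str.index raises ValueError on an unknown character."""
    value = 0
    for ch in field:
        value = value * len(charset) + charset.index(ch)
    return value

# Number field of the "2515-lat" module: 15 characters over a 14-symbol charset,
# which holds floor(15 * log2(14)) = 57 bits
digit = "-0123456789() "
payload = (1 << 57) - 1              # largest 57-bit value
field = to_field(payload, digit, 15)

assert len(field) == 15              # still fits into the field
assert from_field(field, digit) == payload
assert to_field(0, digit, 15) == "-" * 15
```

Because the bit budget per field is floored, the largest payload always fits into the fixed field width, so no record can overflow.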

	git clone git://git.luxferre.top/databankr.git