tii

Tcl-based suite for working with the ii/idec protocol
git clone git://git.luxferre.top/tii.git

commit a86bb5a0e60a4f41f7bb7da75c5f4983deba0e4e
parent 5c1b6570df8d339553b025fedbb686a66bc485e2
Author: Luxferre <lux@ferre>
Date:   Wed, 23 Oct 2024 20:27:03 +0300

Migration to sqlite3 started

Diffstat:
M README       |  39 +++++++++++++--------------------------
M tiifetch.tcl |  64 +++++++++++++++++++++++++++++++++++++++-------------------------
M tiiview.tcl  | 134 ++++++++++++++++++++++++++-----------------------------------------------------
3 files changed, 96 insertions(+), 141 deletions(-)
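The diff that follows replaces the per-message files written by tiifetch.tcl with rows in a single `msg` table. Here is a minimal sketch of the split it performs. It assumes the payload has already been base64-decoded and converted from UTF-8, as in the diff. The procedure name `parsemessage` is hypothetical and not part of tii. It maps the lines of one ii/IDEC message onto the table's column names and returns a dict instead of running the INSERT, so it can be tried without SQLite3.

```tcl
# Hypothetical helper, not part of tii: split one decoded ii/IDEC message
# into a dict keyed like the columns of the `msg` table in the diff.
# Line layout (from the spec notes in the removed formatmessage proc):
#   0 tags, 1 echoarea, 2 timestamp, 3 msgfrom, 4 msgfrom_addr, 5 msgto,
#   6 subj, 7 empty line, 8.. message body
proc parsemessage {msgid mdata} {
  set msglines [split $mdata "\n"]
  # tags such as "ii/ok/repto/<id>" form key/value pairs once split on "/";
  # iterating over pairs tolerates an odd-length tag list without erroring
  set repto ""
  foreach {key value} [split [lindex $msglines 0] "/"] {
    if {$key eq "repto"} {set repto $value}
  }
  set row [dict create msgid $msgid repto $repto]
  # header fields are trimmed, as the new tiifetch.tcl code does
  foreach column {echoname timestamp msgfrom msgfromaddr msgto subj} \
          index {1 2 3 4 5 6} {
    dict set row $column [string trim [lindex $msglines $index]]
  }
  # keep the body as a Tcl list of lines, like the diff's lrange call;
  # tiiview.tcl's tiiflow helper takes a list of lines
  dict set row body [lrange $msglines 8 end]
  return $row
}

# usage with a made-up message and placeholder IDs
set sample "ii/ok/repto/parentid\ntest.echo\n1729700000\nalice\nnode,1\nAll\nHello\n\nFirst line\nSecond line"
set row [parsemessage newid $sample]
puts [dict get $row subj]            ;# prints: Hello
puts [dict get $row repto]           ;# prints: parentid
puts [llength [dict get $row body]]  ;# prints: 2
```

Walking the tag list in pairs is a choice of this sketch. The diff itself uses `dict exists` on the split tag list.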

diff --git a/README b/README
@@ -14,7 +14,7 @@ The tii repo consists of the following parts:
 * tiifetch.tcl: the core ii/IDEC message fetching library and CLI utility
 * tiipost.tcl: the core ii/IDEC message posting library and CLI utility
 * tiiview.tcl: the CLI viewer of the fetched ii/IDEC messages and conferences
-* tiidb: the (overridable) directory that contains all messages and echo lists
+* tii.db: the SQLite3 database that contains all messages and echo lists
 * config.txt: the set of parameters for all HTTP requests by the tii scripts
 * stations.txt: the list of stations to be auto-fetched by tiifetch when none
   of its command-line parameters is passed
@@ -28,7 +28,7 @@ Readiness status
 * tiifetch.tcl: ready/tested
 * tiipost.tcl: ready/tested
 * tiiview.tcl: ready/tested
-* tiidb (format): ready/tested
+* tii.db (format): ready/tested
 * config.txt (format/fields): ready/tested
 * stations.txt (format): ready/tested
 * auth.txt (format): ready/tested
@@ -42,12 +42,12 @@ developed.
 
 ### Fetching the messages (tiifetch.tcl): ###
 
-tclsh tiifetch.tcl [station_url] [echos] [dbdir]
+tclsh tiifetch.tcl [station_url] [echos] [dbfile]
 
-This command will fetch all messages into the dbdir ("tiidb" in the script dir
-by default) from the station_url (can be empty, see below) based on the echo
-conference names (can be delimited with slash /, comma (,) or semicolon (;))
-and create the corresponding file structure if it's missing.
+This command will fetch all messages into the dbfile ("tii.db" in the script
+dir by default) from the station_url (can be empty, see below) based on the
+echo conference names (can be delimited with slash /, comma (,) or semicolon
+(;)) and create the corresponding file structure if it's missing.
 
 Fetching is supported for the following station URL schemes and protocols:
 
@@ -68,7 +68,7 @@ station to temporarily stop fetching from it by prepending the # sign.
 
 ### Viewing the messages from CLI (tiiview.tcl): ###
 
-tclsh tiiview.tcl [echo_name] [filter_string] [line_width] [dbdir]
+tclsh tiiview.tcl [echo_name] [filter_string] [line_width] [dbfile]
 
 If the echo_name parameter is passed, this command will write all formatted
 messages from the coresponding echo conference to the standard output.
@@ -84,22 +84,20 @@ have no effect at all if you pass it.
 
 If the line_width parameter is omitted, the text reflows to 80 chars per line.
 
-If the dbdir is ommitted, it defaults to "tiidb" in the script directory.
+If the dbfile is ommitted, it defaults to "tii.db" in the script directory.
 
 This component is fully offline and can only work with a compatible message
-database that tiifetch.tcl can generate (see "Message database format").
+database that tiifetch.tcl can generate.
 
 The filter string can take one of the following basic forms:
 
-* h[number]: only take [number] messages from the head (start) of the list
-* t[number]: only take [number] messages from the tail (end) of the list
-* rh[number]: same as h but output messages from newest to oldest in the list
-* rt[number]: same as t but output messages from newest to oldest in the list
+* [number]: only take [number] messages from the head (start) of the list
+* r[number]: same but output messages from newest to oldest in the list
 
 If [number] is 0 then it means no message limit.
 The reverse operation is always done after limiting the results.
-e.g. rt50 will output 50 newest messages in the conference, starting from the
+e.g. r50 will output 50 newest messages in the conference, starting from the
 most recent one.
 The default basic value is h0, so no filter applied will mean outputting all
 messages from the oldest to the newest.
@@ -147,17 +145,6 @@ Any of the fields can be omitted, as well as the file itself.
 You can also use torsocks with any script invocation in order to fully cloak
 your originating IP address.
 
-Message database format
------------------------
-The tiidb format is based upon the official ii/IDEC developer recommendations
-and is fully plaintext-based, portable and very simple:
-
-* Message contents are stored decoded in the "msg/" subdirectory. The file
-  names are their plain 20-character hash IDs.
-* Echo contents are stored in the "echo/" subdirectory. The file names are the
-  conference names verbatim, containing newline-separated message IDs that
-  belong to those conferences, in order they were published there.
-  Every echo file ends with a blank line.
 
 FAQ
 ---
diff --git a/tiifetch.tcl b/tiifetch.tcl
@@ -1,7 +1,7 @@
 #!/usr/bin/env tclsh
 # tiifetch: fetch all data from an ii/idec station into the local text db
 # (see https://github.com/idec-net/new-docs/blob/master/protocol-en.md)
-# Usage: tiifetch.tcl [station_url] [echos] [db_dir]
+# Usage: tiifetch.tcl [station_url] [echos] [db_file]
 # The echo list should be delimited with slash (/), comma (,) or semicolon (;)
 # if no echos are specified (or "" is passed), then list.txt will be fetched
 # and then all missing echo content from it will be downloaded
@@ -9,11 +9,12 @@
 # tiidb directory in the program root with echoconfs and messages respectively
 # This component only fetches the messages, doesn't parse or display them
 # Supported protocols: HTTP, HTTPS, Gemini, Spartan, Gopher/Finger/Nex
-# Depends on Tcllib for URI parsing
+# Depends on Tcllib for URI parsing and SQLite3 for data storage
 # Created by Luxferre in 2024, released into public domain
 
 package require http
 package require uri
+package require sqlite3
 
 # autodetect TclTLS support and enable HTTPS request support if detected
 set tls_support 0
@@ -188,15 +189,18 @@ proc listcomp {a b} {
 }
 
 # main logic proc
-proc fetchiidb {url echos dbdir dolog} {
+proc fetchiidb {url echos dbfile dolog} {
   # trim the parameters
   set url [string trim $url]
   set echos [string trim $echos]
-  set dbdir [file normalize [string trim $dbdir]]
-  set echodir [file join $dbdir "echo"]
-  set msgdir [file join $dbdir "msg"]
-  # ensure that the necessary dirs exist
-  file mkdir $dbdir $echodir $msgdir
+  set dbfile [file normalize [string trim $dbfile]]
+  # prepare starting script
+  sqlite3 msgdb $dbfile
+  msgdb eval {
+    CREATE TABLE IF NOT EXISTS `msg` (`id` INTEGER PRIMARY KEY AUTOINCREMENT, `msgid` VARCHAR(20), `timestamp` INT, `echoname` VARCHAR(120),
+    `repto` TEXT, `msgfrom` TEXT, `msgfromaddr` TEXT, `msgto` TEXT, `subj` TEXT, `body` TEXT);
+  }
+
   # attempt to fetch the echolist if echos are empty
   if {$echos eq {}} {
     if {$dolog eq 1} {puts "Fetching echolist..."}
@@ -237,23 +241,18 @@ proc fetchiidb {url echos dbdir dolog} {
   # now, process the map we've built
   dict for {echoname msgids} $echomap {
     if {![string match *.* $echoname]} {continue}
+    if {[llength msgids] eq 0} {continue}
     # get the existing message IDs in the echo
-    set echofile [file join $echodir $echoname]
-    set oldmsgids ""
-    if [file exists $echofile] {
-      set oldmsgids [lmap s [split [readfile $echofile] "\n"] {string trim $s}]
-    }
+    set oldmsgids [msgdb eval {SELECT `msgid` FROM `msg` WHERE `echoname` = $echoname ORDER BY `id` ASC;}]
     # pre-filter the new message IDs to fetch
     set newmsgids [listcomp $msgids $oldmsgids]
-    # save the echo index file with all message IDs
-    set msgids [list {*}$oldmsgids {*}$newmsgids]
-    writefileln $echofile [string cat [string trimright [join $msgids "\n"]] "\n"]
     if {$dolog eq 1} {puts "Fetching [llength $newmsgids] new messages from $echoname..."}
     set idgroups ""
     set grcount 0
     set localcount 0
     foreach nmid $newmsgids { # iterate over new messages to group them
       if {$nmid ne ""} {
+        # insert new message ID to the echo mapping dict
         lappend idgroups $grcount $nmid
         incr localcount
         if {$localcount > $maxids} {
@@ -270,15 +269,30 @@ proc fetchiidb {url echos dbdir dolog} {
         set parts [split $bline ":"]
         if {[llength $parts] > 1} { # valid message
           set mid [lindex $parts 0]
-          set mdata [binary decode base64 [lindex $parts 1]]
-          writefileln [file join $msgdir $mid] [encoding convertfrom utf-8 $mdata]
+          set mdata [encoding convertfrom utf-8 [binary decode base64 [lindex $parts 1]]]
+          set msglines [split $mdata "\n"]
+          set replyto ""
+          set tags [split [lindex $msglines 0] "/"]
+          if {[dict exists $tags repto]} {
+            set replyto [dict get $tags repto]
+          } else {set replyto ""}
+          set echoarea [string trim [lindex $msglines 1]]
+          set timestamp [string trim [lindex $msglines 2]]
+          set msgfrom [string trim [lindex $msglines 3]]
+          set msgfromaddr [string trim [lindex $msglines 4]]
+          set msgto [string trim [lindex $msglines 5]]
+          set subj [string trim [lindex $msglines 6]]
+          set msgbody [string trimright [lrange $msglines 8 end]]
+          msgdb eval {INSERT INTO `msg` (`msgid`, `timestamp`, `echoname`, `repto`, `msgfrom`, `msgfromaddr`, `msgto`, `subj`, `body`)
+            VALUES ($mid, $timestamp, $echoarea, $replyto, $msgfrom, $msgfromaddr, $msgto, $subj, $msgbody);}
         }
       }
     }
   }
+  msgdb close
 }
 
-proc massfetch {echos dbdir dolog} {
+proc massfetch {echos db dolog} {
   global appdir
   if {$dolog eq 1} {puts "No ii/idec station URL specified, using stations.txt"}
   set stfile [file join $appdir "stations.txt"]
@@ -288,7 +302,7 @@ proc massfetch {echos dbdir dolog} {
     set station [string trim $station]
     if {$station ne "" && ![string match "#*" $station]} {
       if {$dolog eq 1} {puts "Fetching from $station"}
-      fetchiidb $station $echos $dbdir $dolog
+      fetchiidb $station $echos $db $dolog
     }
   }
 } else {
@@ -305,7 +319,7 @@ set appdir [file dirname $scriptpath]
 if [string match *app-tiifetch $appdir] {
   set appdir [file normalize [file join $appdir ".." ".." ".." ]]
 }
-set localdbdir [file join $appdir "tiidb"]
+set localdb [file join $appdir "tii.db"]
 
 # populate general HTTP configuration
 set cfgfile [file join $appdir "config.txt"]
@@ -324,19 +338,19 @@ if {[file exists $cfgfile]} {
 
 if {$argc > 0} {
   if {$argc > 2} {
-    set localdbdir [lindex $argv 2]
+    set localdb [lindex $argv 2]
   }
   puts "Fetching messages, please wait..."
   set sturl [string trim [lindex $argv 0]]
   if {$sturl eq ""} {
-    massfetch [lindex $argv 1] $localdbdir 1
+    massfetch [lindex $argv 1] $localdb 1
   } else {
-    fetchiidb $sturl [lindex $argv 1] $localdbdir 1
+    fetchiidb $sturl [lindex $argv 1] $localdb 1
   }
   puts "Messages fetched"
 } else {
   puts "Fetching messages, please wait..."
-  massfetch "" $localdbdir 1
+  massfetch "" $localdb 1
   puts "Messages fetched"
 }
 
diff --git a/tiiview.tcl b/tiiview.tcl
@@ -1,16 +1,9 @@
 #!/usr/bin/env tclsh
 # tiiview: view ii/idec messages from the local text db
-# Usage: tiiview.tcl [echo_name] [filter_string] [termwidth] [dbdir]
+# Usage: tiiview.tcl [echo_name] [filter_string] [termwidth] [dbfile]
 # Created by Luxferre in 2024, released into public domain
 
-# file read helper
-proc readfile {fname} {
-  set fp [open $fname r]
-  fconfigure $fp -encoding utf-8
-  set data [read $fp]
-  close $fp
-  return $data
-}
+package require sqlite3
 
 # basic text reflow helper
 # list in, string out
@@ -38,42 +31,6 @@ proc tiiflow {lines width} {
   return $outtext
 }
 
-# parse and pretty-print the found message
-proc formatmessage {msgdata msgid globalwidth} {
-  set globalline [string repeat = $globalwidth]
-  set hdrline [string repeat - $globalwidth]
-  set msglines [lmap s [split $msgdata "\n"] {string trimright $s}]
-  # parsing according to the spec, first 7 lines are:
-  # tags, echoarea, timestamp, msgfrom, msgfrom_addr, msgto, subj
-  # and then an empty line and the message body follows
-  set tags [split [lindex $msglines 0] "/"]
-  if {[dict exists $tags repto]} {
-    set replyto [dict get $tags repto]
-  } else {set replyto ""}
-  set echoarea [lindex $msglines 1]
-  set timestamp [lindex $msglines 2]
-  set msgfrom [lindex $msglines 3]
-  set msgfromaddr [lindex $msglines 4]
-  set msgto [lindex $msglines 5]
-  set subj [lindex $msglines 6]
-  set msgbody [tiiflow [lrange $msglines 8 end] $globalwidth]
-  set tz ""
-  set renderedts ""
-  catch { # because some servers don't provide timestamps
-    set renderedts [clock format $timestamp -format {%Y-%m-%d %H:%M:%S} -timezone $tz]
-  }
-  catch { # because pipe can be broken anytime
-    puts "\[$renderedts\] ii://$msgid"
-    puts "$echoarea - $msgfrom ($msgfromaddr) to $msgto"
-    if {$replyto ne ""} {
-      puts "Replied to: ii://$replyto"
-    }
-    puts "Subj: $subj"
-    puts $hdrline
-    puts "$msgbody$globalline\n"
-  }
-}
-
 # entry point
 set scriptpath [file normalize [info script]]
 set appdir [file dirname $scriptpath]
@@ -81,7 +38,7 @@ set appdir [file dirname $scriptpath]
 if [string match *app-tiiview $appdir] {
   set appdir [file normalize [file join $appdir ".." ".." ".." ]]
 }
-set localdbdir [file join $appdir "tiidb"]
+set localdb [file join $appdir "tii.db"]
 set echoname ""
 set filterstr ""
 set twidth 80
@@ -93,58 +50,55 @@ if {$argc > 0} {
     set twidth [expr {int([lindex $argv 2])}]
   }
   if {$argc > 3} {
-    set localdbdir [lindex $argv 3]
+    set localdb [lindex $argv 3]
   }
   set echoname [string trim [lindex $argv 0]]
 }
 if {$twidth < 20} {set twidth 80}
 if {$filterstr eq ""} {set filterstr "h0"}
-set msgdir [file join $localdbdir "msg"]
-set echodir [file join $localdbdir "echo"]
-if {$echoname eq ""} { # list the echodir
-  set echos [glob -tails -directory $echodir -nocomplain -types f "*.*"]
+# open the message db
+sqlite3 msgdb $localdb -readonly true
+if {$echoname eq ""} { # list the echonames
+  set echos [msgdb eval {SELECT DISTINCT `echoname` FROM `msg`;}]
   puts [join [lsort $echos] "\n"]
 } else { # fetch the actual contents
-  set echofile [file join $echodir $echoname]
-  if {[file exists $echofile]} {
-    set msglist [split [readfile $echofile] "\n"]
-    set filters [split $filterstr "/"]
-    set basicmod [string trim [lindex $filters 0]]
-    set filterregex {}
-    if {[llength $filters] > 1} {
-      set filterregex [string trim [lindex $filters 1]]
-    }
-    set doreverse 0
-    if {[string first r $basicmod] > -1} {set doreverse 1}
-    set dotail 0
-    if {[string first t $basicmod] > -1} {set dotail 1}
-    set numitems 0
-    if {[regexp {\d+} $basicmod foundnum]} {set numitems $foundnum}
-    # perform the element filtering
-    if {$numitems > 0} {
-      incr numitems -1
-      if {$dotail eq 1} {
-        set msglist [lrange $msglist end-$numitems end]
-      } else {
-        set msglist [lrange $msglist 0 $numitems]
-      }
-    }
-    if {$doreverse eq 1} {
-      set msglist [lreverse $msglist]
+  set filters [split $filterstr "/"]
+  set basicmod [string trim [lindex $filters 0]]
+  set filterregex {}
+  if {[llength $filters] > 1} {
+    set filterregex [string trim [lindex $filters 1]]
+  }
+  set doreverse 0
+  if {[string first r $basicmod] > -1} {set doreverse 1}
+  set numitems 0
+  if {[regexp {\d+} $basicmod foundnum]} {set numitems $foundnum}
+  set query {SELECT * FROM `msg` WHERE `echoname` = $echoname}
+  if {$filterregex ne {}} {
+    append query { AND (`body` LIKE $filterregex OR `subj` LIKE $filterregex) }
+  }
+  append query { ORDER BY `timestamp` }
+  if {$doreverse eq 1} {append query DESC} else {append query ASC}
+  if {$numitems > 0} {append query { LIMIT $numitems}}
+  append query ";"
+  msgdb eval $query msg {
+    set globalline [string repeat = $twidth]
+    set hdrline [string repeat - $twidth]
+    set tz ""
+    set renderedts ""
+    catch { # because some servers don't provide timestamps
+      set renderedts [clock format $msg(timestamp) -format {%Y-%m-%d %H:%M:%S} -timezone $tz]
     }
-    foreach msgid $msglist { # iterate over the list after filtering
-      set msgid [string trim $msgid]
-      if {$msgid ne ""} {
-        set msgfile [file join $msgdir $msgid]
-        if {[file exists $msgfile]} {
-          set msgdata [readfile $msgfile]
-          set pass 1
-          if {$filterregex ne {}} {
-            set pass [regexp -line -nocase -- $filterregex $msgdata]
-          }
-          if {$pass eq 1} {formatmessage $msgdata $msgid $twidth}
-        }
+    catch { # because pipe can be broken anytime
+      puts "\[$renderedts\] $msg(msgid)"
+      puts "$msg(echoname) - $msg(msgfrom) ($msg(msgfromaddr)) to $msg(msgto)"
+      if {$msg(repto) ne ""} {
+        puts "Replied to: $msg(repto)"
       }
+      puts "Subj: $msg(subj)"
+      puts $hdrline
+      puts "[tiiflow $msg(body) $twidth]\n\n$globalline\n"
     }
-  } else {puts "This echo conference doesn't exist in the local DB!"}
+  }
 }
+# close the db
+msgdb close
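With this commit, tiiview.tcl no longer slices a Tcl list of message IDs. Instead, the filter string becomes clauses of one SELECT against the `msg` table. Below is a minimal sketch of that translation as a standalone procedure, so it can be tried without a database. The name `buildquery` and the keys of the dict it returns are hypothetical, not part of tii. The rules follow the diff:

* digits in the basic part become a LIMIT when non-zero;
* an `r` anywhere in the basic part switches to descending order;
* text after `/` becomes a LIKE pattern.

Because the pattern now goes into `LIKE`, it takes SQL wildcards such as `%` rather than the regular expressions the removed code passed to `regexp`.

```tcl
# Hypothetical helper, not part of tii: translate a tiiview filter string
# ("[r][number]" optionally followed by "/pattern") into the SQL text that
# the new tiiview.tcl assembles. $echoname, $filterregex and $numitems stay
# literally in the SQL: the sqlite3 Tcl binding reads them as bound
# parameters from the caller's variables when the query is evaluated.
proc buildquery {filterstr} {
  if {$filterstr eq ""} {set filterstr "h0"}
  set filters [split $filterstr "/"]
  set basicmod [string trim [lindex $filters 0]]
  set pattern {}
  if {[llength $filters] > 1} {
    set pattern [string trim [lindex $filters 1]]
  }
  set numitems 0
  if {[regexp {\d+} $basicmod digits]} {
    # scan %d reads the digits as decimal, so "050" means 50
    scan $digits %d numitems
  }
  set query {SELECT * FROM `msg` WHERE `echoname` = $echoname}
  if {$pattern ne {}} {
    # LIKE uses SQL wildcards (% and _), not regular expressions
    append query { AND (`body` LIKE $filterregex OR `subj` LIKE $filterregex)}
  }
  append query { ORDER BY `timestamp`}
  if {[string first r $basicmod] > -1} {
    append query { DESC}
  } else {
    append query { ASC}
  }
  if {$numitems > 0} {append query { LIMIT $numitems}}
  append query ";"
  return [dict create query $query filterregex $pattern numitems $numitems]
}

set q [buildquery "r50"]
puts [dict get $q query]
# prints: SELECT * FROM `msg` WHERE `echoname` = $echoname ORDER BY `timestamp` DESC LIMIT $numitems;
```

To run such a query against tii.db, follow the same steps tiiview.tcl uses:

1. Open the file with `sqlite3 msgdb $localdb -readonly true`.
2. Set `echoname`, `filterregex` and `numitems` in the calling scope, taking the latter two from the returned dict.
3. Pass `[dict get $q query]` to `msgdb eval` together with an array name and a loop body.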