Headline:Clay-Pidgin: A TCP based message queuing for Tcl/Tk
Date:Saturday, November 23, 2019
Posted By:Sean Woods

I've been doing a lot of work in the past few weeks with developing a common architecture for my various Tcl/Tk projects. What they all have in common is the need for a low-rent system to access a collective data store. I'm going to explain a little about the technology, first, and then I'll get into the backstory of the applications I'm working on that are going to need them. And finally I'll defend why this was worth the effort. This blog post also spawned a few other discussions in my head:

The new library is a code module is called {Clay Pidgin}, it is distributed with my Clay library distribution. The concept is that all of the processes and threads in an application (our distributed computing project) exchange notes in a central database. While that database is sql based, the API restricts most interactions to getting data or putting data into individual records. It weights in at ~1000 lines of Tcl code.

What does Clay Pidgin do?

Clay Pidgin, at its heart, is a TCP interface to an Sqlite database. But it's a stylized interface designed to enforce a particular data model. In that data model, every row has an integer key. (Which you get for free from Sqlite anyway.) Tables fall into two different styles: containers and key/values. Each row in a container table is a "record". The other fields in the container table are the fields that are most commonly indexed or searched. Each row in at key/value table is a triplet of a rowid from the container table, a key, and a value for that key.

create table monster (
monsterid integer primary key,
uuid string unique,
name string,
type string,
armor_class string,
hitpoints string
);

create table monster_keyvalue (
monsterid integer REFERENCES monster(monsterid) ON UPDATE CASCADE,
field string,
value string,
primary key (monsterid,field) ON CONFLICT replace;
);

But what about links between records? Well each classification system has it's own rules for that sort if thing. A model for a human heredity would have one structure. A model for the organization of parts in an inventory would have a different structure. In essence, links are just another form of table. For a generic case we could just get away with:

create table link (
linkid integer primary key,
table_a string,
nodeid_a integer,
table_b string,
nodeid_b integer,
type string
);

create table link_keyvalue (
linkid integer REFERENCES monster(monsterid) ON UPDATE CASCADE,
field string,
value string,
primary key (linkid,field) ON CONFLICT replace;
);

Note that in this system the links themselves are database records in their own right. The concept is rich enough that I have a separate blog entry to cover the concept in-depth.

How is it used?

The concept is that each client maintains a connection to a server on an agreed upon port. If nobody is listening on that port yet, AND if the host is "localhost", the client will spawn a background thread to act as that server. Because the data is stored as an sqlite database, if that process exits, the next client that tries to connect will pick up the mantle and become the server itself. If the server isn't localhost, the client will throw up its hands. (More on why at: Database Recoverability: Single server Vs. Replication.)

Let's suppose I'm writing a database interface to a dataset for a popular role playing game. But in addition to the database with the rules and the monsters, I also want a database of the players, their characters, and a journal of their adventures so far.

First we connect to the database.

package require clay-pidgin
set DBFILE [file normalize ~/data/dnd.sqlite]
if {![file exists $DBFILE]} {
  # Use an outside tool to seed the database
  package require initrad
  initrad::generate_database $DBFILE us-en
}
::clay-pidgin::listener PIDGIN host localhost port 2929 filename $DBFILE

Then we add a few tables:

###
# Encyclopedia of races backgrounds and classes that store human readable information
# and
PIDGIN create_table lore {
  loreid   {type integer primary_key 1}
  uuid      {type guuid unique 1}
  type     {type string}
  name     {type string unique 1}
  rules    {type dictionary}
  entry    {type text}
} keyvalue_table lore_data

PIDGIN create_table monster {
  monsterid   {type integer primary_key 1}
  uuid        {type guuid unique 1}
  name        {type string}
  type        {type string}
  cr          {type integer}
  attribute   {type dict description {Dictionary of attributes like wis, dex, etc}}
  entry       {type text  description {Human readable}}
} keyvalue_table monster_data

Now the notation may seem, at a passing glance to be similar to the SQL notation for creating tables, there's a key difference. The notation for Pidgin's create table each field as a "key" in a virtual dictionary, and all of the properties for that field are also described in a dictionary. Flags like "primary key" are given as a boolean value rather than inferred by their presence or absence. Underneath the sheets, yes, this notation is ultimately converted to an Sqlite "create table" statement. But along the way we express extra information that we can introspect later. Thinks like datatype. Not "Integer" vs. "Varchar". Basically because Sqlite doesn't care. But meaningful thing to the program.

But what about non-container uses of "CREATE TABLE". Well they get their own notation that, yes, is converted to CREATE TABLE internally, but also includes data the rest of the API can use:

# List of what climes one might encounter a monster
# Basically a table, but with no key/value list
PIDGIN create_table monster_climate {
  monsterid   {type uuid references {monster uuid}}
  climate     {type select values {abyssal artic coastline desert forest marine underdark urban}}
} type index

# Table of what monsters are likely to be on the same side
# in an encounter as a subordinate
PIDGIN create_table monster_ally {
  monsterid   {type uuid references {monster uuid}}
  minionid    {type uuid references {monster uuid}}
  probability {type probability}
} type index

Note that we don't limit indexes to one table. The "References" just developer annotations. If we had a list of legendary monsters in lore we could develop an index like:

# Index lore entries by the type of monster
PIDGIN create_table monster_legendary {
  monsterid   {type uuid references {monster uuid}}
  loreid      {type uuid references {lore uuid}}
} type index primary_key {monsterid loreid} on_conflict replace

The developer is just signalling his or her intent, the Pidgin engine does the hard work of translating that into a proper implementation. For instance, Sqlite indexes integers far more efficiently than strings. For the purposes of our index, we may decide that internally we store the integer key, and then translate that to the uuid for external api calls. In this application, because data is found across disparate sources, and likely to be replicated multiple times, we chose to link by UUID, as the rows may be inserted in any order.

The process of adding data is done through an interaction with the container:

PIDGIN put table monster  select {uuid 23951adb-3f1d-4bd3-89c2-26e52bbc8620}  data {
    name "Chromatic Dragon"
    cr 10
    attribute {
      hitpoints     {10d20}
      armor_class   {25}
      int           {18}
    }
  }

PIDGIN put table lore  select {uuid 8a52e099-7050-4fc6-97ee-862edd0501d3}  data {
    name "Tiamat"
    entry {Tiamat, head goddess of the evil dragons.}
    reference https://en.wikipedia.org/wiki/Tiamat_(Dungeons_%26_Dragons)
  }
PIDGIN insert table monster_legendary {
  monsterid 23951adb-3f1d-4bd3-89c2-26e52bbc8620
  loreid    8a52e099-7050-4fc6-97ee-862edd0501d3
}

Basically, in the monster manual we need an entry for a Chromatic Dragon. But in reality, there is only one Chromatic Dragon, Tiamat. For record keeping purposes, we should note explicitly that the lore UUID for Tiamat should direct readers to the link to the monster manual Chromatic Dragon, and Vice Versa. We use different UUIDs just in case we decide, at some point, there may be another Chromatic Dragon.

Note that our database storage engine is largely uninvolved in interpreting the data. It just stores what the application asked it to store, and its on the application to interpret that data. PIDGIN is just ensuring that my little application can do this in a way that fits the scripting language I intend to use it in.

Most applications read far more data than they write, so how will this data be used? Let us imagine our lazy developer is writing a web application that allows his adventurers to fuse data from the RPG database with their own adventures. The next snippet will be from one of the Dungeon Master's blog entries.

title: {The Ababtat}
---BEGIN CONTENT---
if {[info commands PIDGIN] eq {}} {
  ::clay-pidgin::listener create PIDGIN host localhost port 2525 filename /var/log/iritrad.sqlite
}
proc LORE {string} {
  set row [PIDGIN query {uuid,name from lore where uuid=lower(:string) or lower(name)=lower(:string) limit 1}]
  return [link /wiki/[dict get $row uuid] [dict get $row name]]
}
para {In the times of old the wicked dragon queen [LORE Tiamat] decended with her army onto
the land of [LORE {Eldritch}] and laid to waste the [LORE {Fifth Elven Dynasty}]}

# Table of random encounters
set encounters {
  1 {
    monster    {Kobold}
    count      {10}
    boss       {Kobold War Chief}
    boss_count 1
  }
  ...
}

my tag H1 {Random Encounters}
set TABLE [my tag table]
dict for {roll info} $encounters {
  set ROW [$TABLE tag row]
  set header [$ROW column content $roll]
  set items 0
  if {[dict exists $info boss]} {
    $ROW column content [dict get $info boss_count]
    $ROW column content [MONSTER_INFO [dict get $boss]]
    incr items
    set ROW [$TABLE tag row]
  }
  incr items
  $ROW column content [dict get $info monster_count]
  $ROW column content [MONSTER_INFO [dict get $info monster]]
  if {[dict exists $info ally]} {
    set ROW [$TABLE tag row]
    $ROW column content [dict get $info ally_count]
    $ROW column content [MONSTER_INFO [dict get $ally]]
    incr items
  }
  $header configure rowspan $items
}

I realize that's a lot of code for a pretty meager example. But if your eyes glazed over, we are making a system by which a blog author can trivially call on a live database within the confines of a webview. Some understanding of my Cunieform Layout System is probably also required.

What is powerful is understanding that webview can be taking place inside an object that will be destroyed after this view is done, running side by side with a video server or a fossil mirror, and inside of a thread. We may or may not have a dedicated service to manage this database. But none of those details matter. At the same time we are not clogging up our Server's IO stream with a zillion and four connections to an sqlite database that will require some time to establish and disestablish.

Most importantly, this system allows multiple threads in the same process to access that database. The problem with sqlite is that while it's brilliant at managing locks across multiple processes, managing locks across multiple threads using a pre-compiled Tcl interface is, shall we say, tricky[1]. Yes, it can be done. But it requires a hell of a lot of mutexes on different levels of the program, and that selection of architecture may not fit well with other parts of the program that may use Sqlite for other purposes.

So far the example is read-only. The case for PIDGIN is far from persuasive when we are dealing with strictly read-only data. But let is say our web sessions need to WRITE. Let us take our RPG and start to device a way for players to edit their character sheets. Or.. for the DM to record the outcome of a battle, and award experience points accordingly. And worse, if the result of the battle is enough experience for a player to level up, and now both the DM and the player need to start editing the same data while also consulting the rule book database for guidance.

I'm not going to start coding that example. Ok, who am I kidding, I will be coding that example. Just not in this blog installment.

1: And yes, I've spoken with D. Richard Hipp about this personally.