Adding a DOM to tcllib's HTTPD module

Created: 2020-07-31 20:25

Last Modified: 2020-07-31 21:18

You may or may not know that I am the principal author of the httpd module for Tcllib. The project started as an effort to turn an older package, tclhttpd, into something that was object oriented and whose dispatch engine was friendlier to modify. What I ended up producing was a nice engine that utilizes TclOO and coroutines, both added in Tcl 8.6. But the internals were so radically different that I couldn't rightfully call it version 4.0. No site built around the old 3.5 version could use the code without throwing out all of its existing mods.

So instead of pitching this new server as a new version of tclhttpd, I simply consider it a successor to tclhttpd. Inspired, but not interoperable. But it does load nicely as a package, and it can be shipped with Tcllib.

Trying to explain web server internals is like trying to explain sex. There are people that actually do it, people that talk about doing it, and people who insist they know the best way to do it. These are usually three different populations of people.

So I'm going to focus on one aspect of web content generation.

When you read this page, your browser downloads a single Hypertext Markup Language (HTML) document. If you have ever tried to write a web page, you know that inside that document is a pile of tags. For my past projects, my server would assemble all of those tags in order. Like an assembly line.

append output <HTML>
append output {<HEAD><TITLE>MY PAGE</TITLE></HEAD>}
append output <BODY>
append output {<H1>THIS IS MY PAGE!</H1>}
append output {READ ITS CONTENT AND DESPAIR!}
append output {</BODY>}
append output {</HTML>}
return $output

This model works beautifully if your web server has all of the information about a document to be delivered right in front of it. And, in a perfect world, this would always be the case. But as luck would have it, practically every meaningful application that requires generating HTML documents on the fly is horrifically imperfect.

The bugaboo generally comes when designing site navigation. A good application adjusts what the users see by where they are in the maze of documents. Also, different users have different levels of access to different functions. Fleshing out what options to display, or alarms to throw on the screen, is generally a process of discovery. Now, one approach could be to load the data all in one go, and have your logic branch during the assembly line process:

set user [my <server> session get username]
append output <HTML>
if {$user eq "sean"} {
  append output {<HEAD><TITLE>MY PAGE</TITLE></HEAD>}
} else {
  append output {<HEAD><TITLE>SEAN'S PAGE</TITLE></HEAD>}
}
append output <BODY>
if {$user eq "sean"} {
  append output {<H1>THIS IS MY PAGE!</H1>}
  append output {READ ITS CONTENT AND DESPAIR!}
} else {
  append output {<H1>THIS IS THE PAGE OF SEAN!</H1>}
  append output {GET THE HELL OFF OF MY LAWN!}
}
append output {</BODY>}
append output {</HTML>}
return $output

I've managed to write many a website using a structure like that. And while it can get pretty tangled and nasty, it is workable. And the performance isn't too, too terrible.

The problem is that sometimes, late in the game, a later chunk of the code realizes something that would require shimming some code into an earlier part of the document. In many other languages, HTML pages are not built serially. They are constructed using something called the Document Object Model (DOM). In a DOM, each of the tags lives as a data structure that can be modified right up until the end. Only at the final step is the DOM converted into a stream of text.

set h [my <document> tag HEAD]
set title [$h tag TITLE]
set b [my <document> tag BODY]
set headline [$b tag H1]
set content [$b tag div]

set user [my <server> session get username]
if {$user eq "sean"} {
  $title      content {MY PAGE}
  $headline   content {THIS IS MY PAGE!}
  $content    content {READ ITS CONTENT AND DESPAIR!}
} else {
  $title      content {SEAN'S PAGE}
  $headline   content {THIS IS THE PAGE OF SEAN!}
  $content    content {GET THE HELL OFF OF MY LAWN!}
}

As you can see, with a DOM I do a lot less hand-grafting of tags, and a lot more making decisions and acting on them all in one go. For complex database projects this sort of flexibility is essential, as you may not want to hit the database multiple times across multiple decision trees.
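
For anyone curious how an interface like that could work underneath, here is a minimal sketch in TclOO. It is not the httpd module's actual code: the DomNode class and its render method are names invented for this illustration, and only the tag and content method names are borrowed from the example above. The point it tries to show is that nothing becomes text until render is called, so a late decision can still reach back into the head.

```tcl
package require Tcl 8.6

# Hypothetical node class, invented for illustration.
oo::class create DomNode {
  variable name text children

  constructor {tagname} {
    set name $tagname
    set text {}
    set children {}
  }

  # Create a child element and return its object command.
  method tag {tagname} {
    set child [DomNode new $tagname]
    lappend children $child
    return $child
  }

  # Store literal text. Nothing is emitted yet.
  method content {value} {
    set text $value
  }

  # Walk the tree and produce the final HTML string.
  # The node's own text is escaped first, then children follow.
  method render {} {
    set body [string map {& &amp; < &lt; > &gt;} $text]
    foreach child $children {
      append body [$child render]
    }
    return "<$name>$body</$name>"
  }
}

set doc   [DomNode new HTML]
set head  [$doc tag HEAD]
set title [$head tag TITLE]
set body  [$doc tag BODY]
set h1    [$body tag H1]

$h1 content {THIS IS MY PAGE!}
# The body is already filled in, yet the head can still change.
$title content {MY PAGE}

puts [$doc render]
# prints <HTML><HEAD><TITLE>MY PAGE</TITLE></HEAD><BODY><H1>THIS IS MY PAGE!</H1></BODY></HTML>
```

A real implementation would also need attributes, void tags, and cleanup of the node objects. This sketch leaves all of that out.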

Functionally, there is also something else important. Note that my code can accept literal strings without having to worry whether the string to be displayed will be interpreted as HTML markup characters. If I were to say "Tcl > everything", that greater-than sign could be interpreted as a premature end of a tag. Because databases normally hold data entered by people who have no idea it will ultimately end up online, you have to be very clear with your document about what is actually data and what is supposed to be layout code. A DOM helps because the content method makes explicit that what I intended was a value.
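
To make the data-versus-layout distinction concrete, here is a small sketch of the kind of escaping a content method could apply before the text reaches the page. The html_escape proc is a hypothetical helper written for this illustration, not something the httpd module is known to provide. It relies only on Tcl's built-in string map.

```tcl
# Hypothetical helper: swap HTML's special characters for entities.
# string map works in a single pass, so the "&" inside "&gt;"
# is not escaped a second time.
proc html_escape {value} {
  return [string map {& &amp; < &lt; > &gt; \" &quot;} $value]
}

set raw {Tcl > everything}

puts "<p>$raw</p>"                 ;# raw data pasted straight into markup
puts "<p>[html_escape $raw]</p>"   ;# prints <p>Tcl &gt; everything</p>
```

With the assembly line approach, every append has to remember to call a helper like this. With a DOM, the call can live in one place.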