Sobyk Needs a Database, not a Filing System

As I compare various solutions for how projects index packages, I am finding that the conventional technique of laying everything out in a neat file system and tagging files with funny names is not going to work for our purposes.

While a file system can be represented in a database, not every database can be represented by a file system. Projects try, believe me, they try. But eventually they all struggle with some esoterica that causes a near-total breakdown of law and order. And it's usually something random.

For Sobyk, I am looking more for the sort of system that a librarian would employ. Now we think: oh, a library database has to be easy! I mean, look at the card catalogue. Index them by subject, give me a field for the shelf, boom. And about half of you who know what I'm talking about just rolled your eyes. It's not that easy.

Yes, libraries track books (actually "manuscripts") by subject. But libraries ALSO track periodicals, which require a completely different sort of filing system. Truth be told, only non-fiction books are filed by subject. Some larger books contain multiple subjects and may be in the reference section. Fiction books technically have a "genre" rather than a subject, and one book may be classified in several genres. So for fiction we index by author instead.

And now you need a database of authors. Not all authors were purely fiction writers, and the interested patron may want to see their offerings in either filing system. Take, for example, Isaac Asimov. He was prolific in BOTH fiction and nonfiction, and across many different subjects. And not only do you need books by the authors, you need books about the authors. Biographies are a completely different filing system, yet again.

Larger libraries and book stores also track the publishers of books. Editions 1 through 5 may have been published by one company. Edition 6 could be from a completely different company. And depending on your luck, one of the two may not be in business anymore if you need a new copy.

But manuscripts and periodicals are also physical objects. Librarians need a database to track where the physical copies are. Are they on the shelf? Are they checked out to a patron? Were they water damaged in a flood and disposed of years ago?

While all of these pieces of information relate to one another, each requires a separate filing system. And as you can see (I hope), one object can be tracked by several filing systems at once.
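To make that concrete, here is a minimal sketch of one catalogue with several "filing systems" over the same works, using Python's built-in sqlite3 module. The table names, the helper functions, and the sample titles and author are all made up for illustration. This is not Sobyk's schema, just one way the idea could look.

```python
import sqlite3

# Hypothetical schema: one table of works, plus separate indexes
# (authors, subjects, physical copies) that all point back to it.
SCHEMA = """
CREATE TABLE work   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE work_author  (work_id   INTEGER REFERENCES work(id),
                           author_id INTEGER REFERENCES author(id));
CREATE TABLE work_subject (work_id INTEGER REFERENCES work(id),
                           subject TEXT NOT NULL);
CREATE TABLE copy (id INTEGER PRIMARY KEY,
                   work_id INTEGER REFERENCES work(id),
                   status TEXT NOT NULL);
"""


def build_catalog():
    """Create an in-memory catalogue with a little placeholder data."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO work VALUES (?, ?)",
                   [(1, "Example Novel"), (2, "Example Guide")])
    db.execute("INSERT INTO author VALUES (1, 'A. Writer')")
    # One author, filed under both fiction and non-fiction works.
    db.executemany("INSERT INTO work_author VALUES (?, ?)", [(1, 1), (2, 1)])
    # Only the non-fiction work gets subjects, and it gets two of them.
    db.executemany("INSERT INTO work_subject VALUES (?, ?)",
                   [(2, "chemistry"), (2, "history")])
    # Physical copies are tracked separately from the work itself.
    db.executemany("INSERT INTO copy (work_id, status) VALUES (?, ?)",
                   [(2, "on shelf"), (2, "checked out")])
    return db


def titles_by_author(db, name):
    rows = db.execute(
        "SELECT w.title FROM work w "
        "JOIN work_author wa ON wa.work_id = w.id "
        "JOIN author a ON a.id = wa.author_id "
        "WHERE a.name = ? ORDER BY w.title", (name,))
    return [row[0] for row in rows]


def titles_by_subject(db, subject):
    rows = db.execute(
        "SELECT w.title FROM work w "
        "JOIN work_subject ws ON ws.work_id = w.id "
        "WHERE ws.subject = ? ORDER BY w.title", (subject,))
    return [row[0] for row in rows]


def copy_statuses(db, title):
    rows = db.execute(
        "SELECT c.status FROM copy c "
        "JOIN work w ON w.id = c.work_id "
        "WHERE w.title = ? ORDER BY c.status", (title,))
    return [row[0] for row in rows]
```

The same row in the work table is reachable by author, by subject, and by physical copy, and none of those views has to know about the others. A directory tree forces you to pick one of them as the "real" layout.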

Software libraries are no different. If anything they are slightly more complex. At least when someone puts out a new edition of a book, you get a new ISBN, and a periodical has an ISSN plus an issue number. At best, a system for tracking packages may get a new version announcement. Usually though, some poor soul (or automation script) has to dig through several feeds to find the one that particular author frequents. Assuming he or she announces new versions at all. Some authors (side eye at GitHub) just check in changes, with no indication of which check-in is major or minor or even ready for exposure to daylight.

What is needed is a balance that allows a curator to clean up neglect while giving authors of packages a clear way to update their own information. This involves a lot of record keeping without actually modifying anything, which is something virtually every package manager out there completely fails at. Especially if it has to commit every change to a file system.
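One way to picture "record keeping without modifying anything" is an append-only ledger: authors and curators both add claims about a package, nothing is ever overwritten, and the current view is computed from the history. The sketch below is only an illustration. The names Claim and Ledger, the fields, and the "latest claim wins" rule are my assumptions here, not a settled design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """One immutable statement about a package (hypothetical structure)."""
    package: str
    field: str
    value: str
    source: str  # e.g. "author" or "curator"


class Ledger:
    """Append-only store: claims are added, never edited or deleted."""

    def __init__(self) -> None:
        self._claims: List[Claim] = []

    def record(self, claim: Claim) -> None:
        self._claims.append(claim)

    def history(self, package: str, field: str) -> List[Claim]:
        """Every claim ever made about this field, oldest first."""
        return [c for c in self._claims
                if c.package == package and c.field == field]

    def current(self, package: str, field: str) -> Optional[str]:
        """Assumed policy: the most recent claim wins, whoever made it."""
        claims = self.history(package, field)
        return claims[-1].value if claims else None
```

A short usage example, with a made-up package name:

```python
ledger = Ledger()
ledger.record(Claim("examplepkg", "version", "1.0", "author"))
ledger.record(Claim("examplepkg", "version", "1.1", "curator"))
print(ledger.current("examplepkg", "version"))       # the curator's later claim
print(len(ledger.history("examplepkg", "version")))  # both claims are still on record
```

A curator's cleanup and an author's update are the same kind of record, so either can be reviewed or superseded later without losing what came before.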

I'm not sure what the ultimate solution is. I have ideas, and I'll be hammering them out. On one hand I know the guts will not be simple. But with the right design the human interface should be pleasant. Well, it has to be pleasant. If it's not fun to work with, or at least good at achieving its stated goal, it will not be used at all.