Headline:Database Recoverability: Single Server vs. Replication
Date:Sunday, November 24, 2019
Posted By:Sean Woods

This entry started as an offshoot of another idea presented in: Clay-Pidgin: A TCP based message queuing for Tcl/Tk.

Why have remotes fail? Why not replicate the database to everyone? Because replication introduces more complexity than it's worth, not just in development time but in reliability as well. Database replication schemes are complicated, and they change a single point of failure into a steady stream of failures. For a single database server to fail, the process has to crash, the database itself has to become corrupt, or the server operating system has to crash in some irrecoverable way. Those are all pretty obvious, and we actually eliminate the process failure by allowing any process that accesses the database on localhost to promote itself to the server. In this day and age complete operating system failure is somewhat rare (or if it's not, why are you running on that platform?). Low-level database corruption is almost unheard of in SQLite, and easily mitigated with a backup strategy.
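To make "a backup strategy" concrete, here is a minimal sketch of one possible approach, not the scheme this project actually uses. It is written in Python rather than Tcl only to keep it short; the function name backup_and_verify and both file paths are hypothetical. It relies on SQLite's online backup API (exposed as Connection.backup in Python 3.7+) and on PRAGMA integrity_check.

```python
import sqlite3


def backup_and_verify(live_path, backup_path):
    """Copy the database at live_path to backup_path, then check the copy.

    Hypothetical helper for illustration; returns True when the copy
    passes SQLite's integrity check.
    """
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    try:
        # Connection.backup() drives SQLite's online backup API, so the
        # snapshot is consistent even while other connections are active.
        src.backup(dst)
        # integrity_check returns the single row "ok" when it finds no problems.
        result = dst.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        dst.close()
        src.close()
    return result == "ok"
```

Run something like this on a timer and keep a few generations of the output file. Recovery is then a matter of putting the most recent good copy back in place, which is the whole point of the single-server argument.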

Database replication, on the other hand, eliminates the hardware and corrupt-file causes of problems: our rarest causes of failure. In exchange it introduces a whole host of OTHER ways for the system to fail. Communication failures are far, far more common. All replication schemes demand a decent pipe between the replicated servers. Depending on your scheme, if a client can connect to any one of the replicating servers, we introduce the possibility of conflicts arising from normal operations. If we instead have one master and hot spares ready to take its place, is this really an advantage over having a single server with backup files?

As an exercise, search for "MySQL server failure." Now search for "MySQL replication failure." Note that the server failure results are generally installation problems, or black swan events that took out an entire operating system. The replication failure results all sound like an otherwise reliable system suddenly going rogue. Also note that recovery from a MySQL server failure is basically "grab new server/fix old server" and then "restore files from backup." Replication failure recovery is a religious rite to a hateful God worthy of Lovecraft.