Hacker News has a fun discussion today about an article on The Daily WTF called The Inner JSON Effect. It’s about how a lead-programmer mad genius built an entire web application on top of svn, not by using svn to track history, but by using a separate revision number for each individual function, and having each class contain nothing but a list of the revision numbers of each function it wants as a method. It is a hilarious example of what not to do. Obviously it’s fun to laugh at something as crazy as JDSL, but it’s even more interesting to try linking it to things we might find more normal. So a few provocative thoughts:
jsonapi? I work on a lot of Ember projects, where jsonapi is the blessed format for frontend/backend communication. It seems to me awfully verbose without adding a lot of new information. Just as the WTF story adds extra indirection where you could just define functions, so jsonapi makes you list out the attributes and references—even though in regular JSON they are right there. It feels to me like the worst of the old XML excesses. I admit this is a bit trollish, but I’ve yet to understand what I get from jsonapi. Maybe someone can explain it to me.
Ansible? I’m a Chef guy and haven’t tried ansible, and I know a lot of developers-with-sysadmin-chops love it for its simplicity. Personally I’m fearful of using something so restrictive. I have similar doubts about Puppet, although at least there I know there are good “escape hatches”. I want to be able to throw in an if
statement when I need to. Here I feel my sentiment is wrong though, since these are such successful projects. So I’m curious if anyone can articulate principles that make Ansible’s YAML “work” but not TDWTF webapp’s JSON.
And going the other direction, I think sometimes—not very often—you can get huge wins out of creatively building on top of an existing tool. But it is always a risk and requires a lot of good taste. A few examples:
I built a product with user-customizable questionnaires, and we knew up front the logic for branching/skipping/computations was going to be super complicated. I could tell it was heading toward what I call a “Turing-Complete Feature”. (Enterprise Workflow Engine anyone?) Instead of trying to model and store all the rules in a database, or writing a new language (you can see the looming Inner Platform Effect), we basically made each questionnaire a server-side underscore.js template, with a simple Javascript API to access information about the current context. The feature was done quickly and we had great flexibility building the actual surveys. Our surveys were built by an internal team of not-quite-programmers, so the developers could work on other things. We even had the potential of letting customers write their own surveys, since Javascript is the lingua franca of modern programming, with good security isolation. (After all your browser uses it to run untrusted code!) Many customers told us our advanced survey functionality is what set us apart from the competition.
Another project also has user-customizable forms, with something close to an EAV pattern in the database. We wanted to offer customers reporting capabilities, so we started talking about building an API. It would be for the existing “power users”, who were doing reports and analysis using R, SAS, Tableau, and similar tools. But the API was going to be high effort, slow, and hard to join data. So we decided our API would be . . . SQL! Of course you can’t share your real tables, for security but also because if those become a public API they can never change. Plus the EAV format made them too abstract to use easily. So instead we gave each customer a private database with one table for each of their forms. This lets them ask questions in a very direct way, using tools they already know, with great performance and flexibility. (Okay I am using the past tense but this one is not actually turned on for external customers yet. Still, using a derived, flattened database for SQL reporting is not so strange, even if young CRUD developers haven’t seen it before.)
Even the idea of building on top of a version control system is appealing. (Maybe not svn though! :-) I would love a self-hosted password manager with a Firefox plugin, where my passwords are stored (encrypted) in a git repository I keep on my own server. Not only would using git save the author time, but it would be like using a “standard protocol” rather than something proprietary. Same thing for a self-hosted bookmarking service.
I think all three of these “good” examples are a little audacious, but in my opinion they all work well. So are there any general guidelines that can help us make better decisions about when to make an unexpected swerve? So much of programming comes down to taste. That leads us into pointless vi-vs-emacs wars, but it also sets good programming apart from bad. How can we make that taste teachable? Or should it be entrusted only to the old and wise? Are there guidelines we can identify, so that we can discuss them, evaluate them, and make decisions about them? (Apparently one rule is don’t write your own TCP/IP stack. :-)
One principle of my own is a “risk budget”: you have to take risks—they are a fertile source of advantage—but you should have a list of them and not take too many. Risks can include: tight deadlines, unreasonable clients/management, new clients/management, unclear requirements, new tech, new-to-me tech, high scalability requirements, etc. One risk is “cutting out a layer” like in the Javascript survey API and the SQL reporting interface. I’m not saying it’s always a good idea, just that it can be. Building your own filesystem on top of NFS is another example of cutting out a layer. I can’t speak to how well that one worked out, but it’s an interesting idea.
What are some other guidelines people use? I’m especially interested in guidelines about “when to leave CRUD behind”, or when to make a choice that seems strange at first glance but pays off. If programming were chess, when do you sac a piece?
blog comments powered by Disqus Prev: Where to put Postgres json_each? Next: Postgres custom range types for Geoip