<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://illuminatedcomputing.com/">
  <id>https://illuminatedcomputing.com/</id>
  <title>Illuminated Computing</title>
  <updated>2026-01-02T00:00:00Z</updated>
  <link rel="alternate" href="https://illuminatedcomputing.com/" type="text/html"/>
  <link rel="self" href="https://illuminatedcomputing.com/atom.xml" type="application/atom+xml"/>
  <author>
    <name>Paul A. Jungwirth</name>
    <uri>https://illuminatedcomputing.com/</uri>
  </author>
  <entry>
    <id>tag:illuminatedcomputing.com,2026-01-02:/posts/2026/01/nvidia-geforce-rtx-5070-ti-on-linux/</id>
    <title type="html">GeForce RTX 5070 Ti on Linux (Debian 13)</title>
    <published>2026-01-02T00:00:00Z</published>
    <updated>2026-01-02T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2026/01/nvidia-geforce-rtx-5070-ti-on-linux/" type="text/html"/>
    <content type="html">
&lt;p&gt;I got a GeForce RTX 5070 Ti GPU for Christmas. I want to do some AI stuff with it. Maybe I’ll have something about that to post soon. But getting it to work on my system was really a chore! I’m running Debian 13 with a ROG STRIX B550-F motherboard (not the wifi version).&lt;/p&gt;

&lt;p&gt;After I put it in and tried to boot, I saw Grub’s selection screen, but almost immediately my monitor went black, and within 10 or 20 seconds it lost its connection entirely.&lt;/p&gt;

&lt;p&gt;By adding an extra &lt;code&gt;echo&lt;/code&gt; line after &lt;code&gt;initrd&lt;/code&gt; in grub I could see that at least that part was running. But it was dying soon after, and I had no messages to diagnose the problem. Doing something like &lt;code&gt;journalctl -b -1&lt;/code&gt; gave nothing (or rather, only info from an earlier boot).&lt;/p&gt;

&lt;p&gt;Adding modeset stuff to grub didn’t help. Adding &lt;code&gt;noacpi&lt;/code&gt; or &lt;code&gt;pci=noacpi&lt;/code&gt; didn’t help. Turning off ACPI altogether with &lt;code&gt;acpi=off&lt;/code&gt; let me boot though. I didn’t have working graphics, but I could Ctrl-Alt-F3 to a console and run commands.&lt;/p&gt;

&lt;p&gt;Even after installing nVidia’s driver, I didn’t have a working display. I followed &lt;a href="https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html"&gt;their instructions for the open-source driver&lt;/a&gt;, which gave me version 590.48.01. (Their old closed-source driver doesn’t support newer cards like the 5070 Ti.) This &lt;a href="https://www.reddit.com/r/debian/comments/1jyzgrx/the_correct_way_to_install_newer_nvidia_drivers/"&gt;reddit thread&lt;/a&gt; helped too. Basically I did this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apt-get install linux-headers-amd64
apt-get install linux-headers-$(uname -r)
wget https://developer.download.nvidia.com/compute/cuda/repos/debian13/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get install nvidia-open&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then I could run &lt;code&gt;nvidia-smi&lt;/code&gt;, and &lt;code&gt;lsmod | awk 'NR==1||/nvid/'&lt;/code&gt; showed &lt;code&gt;nvidia_uvm&lt;/code&gt;, &lt;code&gt;nvidia&lt;/code&gt;, and &lt;code&gt;drm&lt;/code&gt;. Also I could see from &lt;code&gt;mokutil --sb-state&lt;/code&gt; that I was running with SecureBoot disabled, which Google AI thinks is a good idea.&lt;/p&gt;

&lt;p&gt;But my display problems continued. When I ran &lt;code&gt;startx&lt;/code&gt; by hand I could see a segfault in &lt;code&gt;/var/log/Xorg.0.log&lt;/code&gt;. Later I got this to go away (perhaps it was adding grub commands to blacklist nouveau sooner?), and then X would start, and with &lt;code&gt;pstree&lt;/code&gt; I could see that xfce was running and even restoring commands from my session, like terminals and firefox. But the display wouldn’t switch: I was still looking at the console, with some messages printed by X.&lt;/p&gt;

&lt;p&gt;In the boot log I saw errors about not finding the DRM device, and before that there were messages about IRQ problems for the PCI device. There was a recommendation to use &lt;code&gt;pci=biosirq&lt;/code&gt;, but that didn’t help. (I think that hint is for really old systems anyway.) I figured probably &lt;code&gt;acpi=off&lt;/code&gt; was messing things up.&lt;/p&gt;

&lt;p&gt;I tried upgrading my motherboard BIOS. It was a few years out-of-date. I installed the latest version (3635, published 2025-11-06 with build date 2025-09-30). After unzipping, I renamed the file to &lt;code&gt;RSB550FG.CAP&lt;/code&gt;. Then I formatted an 8GB USB drive with &lt;code&gt;mkfs.vfat --mbr=yes -I -F 32 /dev/disk/by-id/usb-SanDisk_Cruzer_Fit_4C532000030816101033-0\:0&lt;/code&gt;. Then I copied the file to the root of the drive, ran &lt;code&gt;sync&lt;/code&gt;, and yanked the USB stick. With case power on but the computer off, I stuck the USB into the Flash BIOS port in the back and held down the Flash BIOS button for a few seconds. When I let go, the USB activity light started blinking. If I shaded the motherboard panel I could see that the Flash BIOS light was blinking too. So I let it run for 5 or 10 minutes, and when I checked the blinking had stopped. I booted, and the BIOS still worked!&lt;/p&gt;

&lt;p&gt;Somehow this made grub much slower. Every keystroke took a second or two to register, and I easily lost letters. I tried some BIOS tweaks, like turning off Legacy USB support. But the only thing that worked was changing &lt;code&gt;GRUB_GFXMODE&lt;/code&gt; to &lt;code&gt;1024x768&lt;/code&gt;. Actually &lt;code&gt;640x480&lt;/code&gt; was even better, but all the lines wrapped and it was unreadable. I’m pretty sure a lower resolution shouldn’t be needed. I think flashing the BIOS wiped all my old settings, so maybe there is some better way to restore grub’s performance, but I haven’t found it yet.&lt;/p&gt;
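
&lt;p&gt;For reference, that change looks roughly like this on Debian. It is only a sketch, assuming the stock &lt;code&gt;/etc/default/grub&lt;/code&gt; layout; your file will have other settings around it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# /etc/default/grub
# Resolution for grub's own menu. A lower value was faster for me.
GRUB_GFXMODE=1024x768&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After editing, run &lt;code&gt;update-grub&lt;/code&gt; as root to regenerate &lt;code&gt;/boot/grub/grub.cfg&lt;/code&gt;.&lt;/p&gt;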

&lt;p&gt;Anyway, the BIOS upgrade didn’t solve my problems. Allowing ACPI still caused a failure right after starting the kernel. I suspected that a proper fix, instead of the &lt;code&gt;acpi=off&lt;/code&gt; workaround, would do the trick. I found &lt;a href="https://medium.com/@mbonsign/complete-guide-for-nvidia-rtx-5070-ti-on-linux-with-pytorch-358454521f04"&gt;this article about using the 5070 Ti on Linux&lt;/a&gt;, and I started changing BIOS settings. The issue turned out to be “Above 4G Decoding” (under “Advanced &amp;gt; PCI Subsystem Settings”). For me it was disabled. When I enabled it, suddenly everything worked.&lt;/p&gt;

&lt;p&gt;That’s it. I haven’t tried installing &lt;code&gt;cuda-toolkit&lt;/code&gt; yet, but that’s next. I’ve had an AMD card for years. A year or two ago I got ROCm compiled, after much struggle, but pytorch would just segfault. I’m excited to play around with AI a bit now.&lt;/p&gt;

&lt;p&gt;Anyway, “Above 4G Decoding” was definitely not in Google AI’s many, many recommendations as I searched for solutions, so perhaps this blog entry will go into its next training and help someone out.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-10-23:/posts/2024/10/bison-shift-reduce-conflict/</id>
    <title type="html">Solving bison shift/reduce conflicts in Postgres</title>
    <published>2024-10-23T00:00:00Z</published>
    <updated>2024-10-23T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/10/bison-shift-reduce-conflict/" type="text/html"/>
    <content type="html">
&lt;p&gt;I had to fix some shift/reduce conflicts in the Postgres bison grammar recently.&lt;/p&gt;

&lt;p&gt;I’ve never done this before, so it was a learning experience. Maybe my story will help some other new Postgres contributor, or anyone struggling with this technology, which is central to computer science but seldom used by many day-to-day programmers.&lt;/p&gt;

&lt;p&gt;Back in 2018 I read &lt;em&gt;lex &amp;amp; yacc&lt;/em&gt; by Doug Brown, John R. Levine, and Tony Mason (second edition published in 1992). Levine’s 2009 &lt;em&gt;flex &amp;amp; bison&lt;/em&gt; would have been a more practical choice, but I liked getting some history too. Re-reading some parts of that book was very helpful. So were &lt;a href="https://www.gnu.org/software/bison/manual/html_node/Algorithm.html"&gt;the bison manual&lt;/a&gt; and &lt;a href="https://stackoverflow.com/questions/26188276/why-doesnt-prec-have-an-effect-in-this-bison-grammar?noredirect=1&amp;amp;lq=1"&gt;some&lt;/a&gt; &lt;a href="https://stackoverflow.com/questions/9716917/why-does-this-simple-grammar-have-a-shift-reduce-conflict?rq=3"&gt;StackOverflow&lt;/a&gt; &lt;a href="https://stackoverflow.com/questions/76244745/bison-nonassoc-vs-token"&gt;questions&lt;/a&gt;. Now I’m working through &lt;a href="https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools"&gt;The Dragon Book&lt;/a&gt;, and that would have made a great resource too. It’s easy to write some bison without going that deep, but if you get stuck it can be frustrating.&lt;/p&gt;

&lt;p&gt;I’ve been adding syntax from SQL:2011 for &lt;a href="https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf"&gt;application-time updates and deletes&lt;/a&gt;. If you have a &lt;code&gt;PERIOD&lt;/code&gt; or range column named &lt;code&gt;valid_at&lt;/code&gt;, you can say &lt;code&gt;UPDATE t FOR PORTION OF valid_at FROM '2024-01-01' TO '2024-02-01' SET foo = bar&lt;/code&gt;. (For more details you can &lt;a href="/pages/temporal-data-theory-and-postgres/"&gt;watch my talk&lt;/a&gt;.) The &lt;code&gt;FOR PORTION OF&lt;/code&gt; bounds don’t have to be just literals. You could also say, for example, &lt;code&gt;FROM current_time TO current_time + INTERVAL '1' HOUR&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Actually I picked that example on purpose. Did you know that intervals support this syntax: &lt;code&gt;INTERVAL '1:02:03' HOUR TO MINUTE&lt;/code&gt;? That means: Interpret the string as hours, and preserve the precision down to the minute. By default if you ask for &lt;code&gt;1:02:03 HOUR&lt;/code&gt; you get just an hour. But &lt;code&gt;TO MINUTE&lt;/code&gt; means you get 1 hour and 2 minutes. (You still lose the seconds.)&lt;/p&gt;
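
&lt;p&gt;You can try this yourself in &lt;code&gt;psql&lt;/code&gt;. A quick sketch (the comments restate the behavior described above; they are not captured output):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Only the hour survives.
SELECT INTERVAL '1:02:03' HOUR;

-- The hour and the minutes survive; the seconds are dropped.
SELECT INTERVAL '1:02:03' HOUR TO MINUTE;&lt;/code&gt;&lt;/pre&gt;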

&lt;p&gt;So how does &lt;a href="https://en.wikipedia.org/wiki/LR_parser"&gt;an LR(1) parser&lt;/a&gt; deal with &lt;code&gt;FOR PORTION OF valid_at FROM current_time + INTERVAL '1' HOUR TO MINUTE&lt;/code&gt;? When we consider the &lt;code&gt;TO&lt;/code&gt;, does it belong with the interval, or does it start the closing bound of the &lt;code&gt;FOR PORTION OF&lt;/code&gt;? This is a shift/reduce conflict. Bison can’t look ahead further than one token to guess what is correct—and I’m not sure it would help even if it could.&lt;/p&gt;

&lt;p&gt;When the next token is able to complete some piece of the grammar, called a “rule”, the parser “reduces”: it consumes those tokens and lets you run a custom “action” attached to your rule. Otherwise, bison shifts the token onto a stack of not-yet-reduced tokens, so that it can reduce a rule in the future. But bison needs to decide each time it sees a token whether to reduce or to shift.&lt;/p&gt;

&lt;p&gt;Here is the rule for &lt;code&gt;FOR PORTION OF&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;for_portion_of_clause:
  FOR PORTION OF ColId FROM a_expr TO a_expr
    {
      ForPortionOfClause *n = makeNode(ForPortionOfClause);
      n-&amp;gt;range_name = $4;
      n-&amp;gt;location = @4;
      n-&amp;gt;target_start = $6;
      n-&amp;gt;target_end = $8;
      $$ = n;
    }
;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first line is the name of the rule, so we can use it in higher-level contexts: the &lt;code&gt;UPDATE&lt;/code&gt; statement (and also &lt;code&gt;DELETE&lt;/code&gt;). The second line has the inputs we need to match to complete the rule. Some of them are “terminals”: keywords, identifiers, operators, literals, punctuation, etc. In this case they are &lt;code&gt;FOR&lt;/code&gt;, &lt;code&gt;PORTION&lt;/code&gt;, &lt;code&gt;OF&lt;/code&gt;, &lt;code&gt;FROM&lt;/code&gt;, and &lt;code&gt;TO&lt;/code&gt;. Some are “non-terminals”: further rules, here &lt;code&gt;ColId&lt;/code&gt; and &lt;code&gt;a_expr&lt;/code&gt;. Below all that is a block of C code: the action that gets run when we reduce. We call the name the “left side” of the rule, and the inputs the “right side” or “body”. Rules are also sometimes called “productions”. The &lt;em&gt;lex &amp;amp; yacc&lt;/em&gt; book doesn’t use that terminology, but the Dragon Book does, and so does the Postgres source code.&lt;/p&gt;

&lt;p&gt;So each bound is an &lt;code&gt;a_expr&lt;/code&gt;. The &lt;code&gt;a_expr&lt;/code&gt; rule is a complicated production with just about anything you can do in Postgres: a literal, a variable, a function call, an operator, a subquery, lots of weird SQL keywords, etc. Many contexts forbid some of these things, e.g. you can’t refer to a column in a &lt;code&gt;DEFAULT&lt;/code&gt; expression or a partition bound—but that is enforced during analysis, not by the grammar.&lt;/p&gt;

&lt;p&gt;To speak more precisely, when a non-terminal can match inputs in more than one way (like &lt;code&gt;a_expr&lt;/code&gt;), we should call &lt;em&gt;each alternative&lt;/em&gt; a rule or production. But in bison you commonly write the name once then separate each body with a pipe (&lt;code&gt;|&lt;/code&gt;), so all the rules share one name. There is not one &lt;code&gt;a_expr&lt;/code&gt; rule, but many: 68 by my count. But such terminological precision is rarely needed.&lt;/p&gt;

&lt;p&gt;Take our example, &lt;code&gt;FOR PORTION OF valid_at FROM current_time + INTERVAL '1' HOUR • TO MINUTE&lt;/code&gt;. I’ve added a dot to represent bison’s “cursor”. It is considering what to do with the &lt;code&gt;TO&lt;/code&gt;. We could reduce the &lt;code&gt;a_expr&lt;/code&gt; right now, leaving the &lt;code&gt;TO&lt;/code&gt; to become part of the &lt;code&gt;FOR PORTION OF&lt;/code&gt;. Or we could shift the &lt;code&gt;TO&lt;/code&gt; so that it eventually gets reduced as part of the &lt;code&gt;a_expr&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Actually it’s not about reducing the &lt;code&gt;a_expr&lt;/code&gt;, but reducing one of its many sub-rules, in this case the interval. An &lt;code&gt;a_expr&lt;/code&gt; can be a &lt;code&gt;c_expr&lt;/code&gt; (among other things), and a &lt;code&gt;c_expr&lt;/code&gt; can be an &lt;code&gt;AexprConst&lt;/code&gt; (among other things), and an &lt;code&gt;AexprConst&lt;/code&gt; can be a &lt;code&gt;ConstInterval Sconst opt_interval&lt;/code&gt; (among other things), and the &lt;code&gt;opt_interval&lt;/code&gt; is the problem, because it can optionally have a &lt;code&gt;TO&lt;/code&gt;. Here is that rule:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;opt_interval:
  YEAR_P
    { $$ = list_make1(makeIntConst(INTERVAL_MASK(YEAR), @1)); }
  | MONTH_P
    { $$ = list_make1(makeIntConst(INTERVAL_MASK(MONTH), @1)); }
  | DAY_P
    { $$ = list_make1(makeIntConst(INTERVAL_MASK(DAY), @1)); }
  | HOUR_P
    { $$ = list_make1(makeIntConst(INTERVAL_MASK(HOUR), @1)); }
  | MINUTE_P
    { $$ = list_make1(makeIntConst(INTERVAL_MASK(MINUTE), @1)); }
  | interval_second
    { $$ = $1; }
  | YEAR_P TO MONTH_P
    { ... }
  | DAY_P TO HOUR_P
    { ... }
  | DAY_P TO MINUTE_P
    { ... }
  | DAY_P TO interval_second
    { ... }
  | HOUR_P TO MINUTE_P
    { ... }
  | HOUR_P TO interval_second
    { ... }
  | MINUTE_P TO interval_second
    { ... }
  | /*EMPTY*/
    { $$ = NIL; }
;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(I’ve omitted most of the actions, since actions don’t affect bison’s choices.) The &lt;code&gt;opt_interval&lt;/code&gt; rule is what bison is trying to reduce.&lt;/p&gt;

&lt;p&gt;When you have a shift/reduce conflict, &lt;code&gt;make&lt;/code&gt; gives you an error like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/usr/bin/bison -d -o gram.c gram.y
gram.y: conflicts: 4 shift/reduce
gram.y: expected 0 shift/reduce conflicts
make[2]: *** [gram.c] Error 1
make[1]: *** [parser/gram.h] Error 2
make: *** [submake-generated-headers] Error 2&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That is not much to go on.&lt;/p&gt;

&lt;p&gt;By the way, do you see the &lt;code&gt;expected 0&lt;/code&gt;? A shift/reduce conflict doesn’t have to be fatal. Bison will default to shift. But this is a bit sketchy. It means you could accidentally write an ambiguous grammar that causes trouble later. So bison lets you declare how many conflicts you expect, and it only fails if it finds a different count. I like that for Postgres the expected conflict count is zero. For MariaDB &lt;a href="https://github.com/MariaDB/server/blob/4b6922a315fa5411665ac99c0b40fd7238093403/sql/sql_yacc.yy#L357-L361"&gt;it is 62&lt;/a&gt;.&lt;/p&gt;
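
&lt;p&gt;The declaration itself is a single directive in the grammar file’s prologue. In &lt;code&gt;gram.y&lt;/code&gt; it looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;%expect 0&lt;/code&gt;&lt;/pre&gt;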

&lt;p&gt;Anyway, we have four shift/reduce conflicts. Now what? Let’s ask Bison where they are. It can take a &lt;code&gt;-v/--verbose&lt;/code&gt; option to generate a “report file”. Since Postgres’s grammar lives in &lt;code&gt;gram.y&lt;/code&gt;, the report file is &lt;code&gt;gram.output&lt;/code&gt;. (Modern versions offer more control with &lt;code&gt;-r/--report&lt;/code&gt; and &lt;code&gt;--report-file&lt;/code&gt;, but macOS only supports &lt;code&gt;-v&lt;/code&gt;.) We aren’t running bison directly, but we can control things like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;make BISONFLAGS=-v&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That gives us a file with 5.5 million lines, but right at the top we see:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;State 1454 conflicts: 1 shift/reduce
State 1455 conflicts: 1 shift/reduce
State 1456 conflicts: 1 shift/reduce
State 1459 conflicts: 1 shift/reduce&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then if we search for &lt;code&gt;/^state 1454&lt;/code&gt; we see this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;state 1454

  2000 opt_interval: DAY_P .
  2005             | DAY_P . TO HOUR_P
  2006             | DAY_P . TO MINUTE_P
  2007             | DAY_P . TO interval_second

    TO  shift, and go to state 2670

    TO        [reduce using rule 2000 (opt_interval)]
    $default  reduce using rule 2000 (opt_interval)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So in this state, bison has four candidate rules it could eventually reduce, numbered 2000, 2005, 2006, 2007. Below that are possible valid tokens and what to do for each one. We see &lt;code&gt;TO&lt;/code&gt; twice, which is the problem. The square brackets highlight the conflict: they mark a transition that will never happen. (The &lt;code&gt;$default&lt;/code&gt; line means if the next token is &lt;em&gt;not&lt;/em&gt; a &lt;code&gt;TO&lt;/code&gt;, we can reduce and leave that token for some higher-level rule to match.) So this is how we know one half of the problem is &lt;code&gt;opt_interval&lt;/code&gt;. The other half is &lt;code&gt;for_portion_of_clause&lt;/code&gt;. Bison doesn’t tell us that, but (1) we just added it to a previously-working grammar (2) we can see that &lt;code&gt;TO&lt;/code&gt; is the issue, and that’s where we match a &lt;code&gt;TO&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is one of the four shift/reduce conflicts. The other three are also from &lt;code&gt;opt_interval&lt;/code&gt;, caused by &lt;code&gt;YEAR_P TO MONTH_P&lt;/code&gt;, &lt;code&gt;HOUR_P TO {MINUTE_P,interval_second}&lt;/code&gt;, and &lt;code&gt;MINUTE_P TO interval_second&lt;/code&gt;. Essentially it’s all one conflict, but we can hit the &lt;code&gt;TO&lt;/code&gt; after &lt;code&gt;DAY&lt;/code&gt;, &lt;code&gt;HOUR&lt;/code&gt;, &lt;code&gt;YEAR&lt;/code&gt;, or &lt;code&gt;MINUTE&lt;/code&gt;, so that’s four different states.&lt;/p&gt;

&lt;p&gt;We can use “precedence” to resolve such ambiguities. It’s just like elementary arithmetic: multiplication has higher precedence than addition. It is stickier. We do it first. But what does that mean in bison? Bison compares the precedence of the non-terminal rule it could reduce (&lt;code&gt;opt_interval&lt;/code&gt;) vs the precedence of the token it could shift (&lt;code&gt;TO&lt;/code&gt;). Rules don’t really have precedence themselves, but they get the precedence of their final terminal token.&lt;/p&gt;

&lt;p&gt;So in state 1454, if we give &lt;code&gt;DAY_P&lt;/code&gt; a different precedence than &lt;code&gt;TO&lt;/code&gt;, bison will know whether to reduce (rule 2000), or shift (and eventually reduce rule 2005, 2006, or 2007). If &lt;code&gt;DAY_P&lt;/code&gt; is higher, we’ll reduce. If &lt;code&gt;TO&lt;/code&gt; is higher, we’ll shift.&lt;/p&gt;

&lt;p&gt;Should we shift or reduce? The only answer is to shift. If we reduce by default, then users can &lt;em&gt;never&lt;/em&gt; say &lt;code&gt;INTERVAL '1' DAY TO HOUR&lt;/code&gt; (even in a completely different context). No amount of parens will make bison do otherwise. But if we shift, then this is a syntax error: &lt;code&gt;FOR PORTION OF valid_at FROM '2013-03-01'::timestamp + INTERVAL '1' HOUR TO '2014-01-01'&lt;/code&gt; (because after shifting the &lt;code&gt;TO&lt;/code&gt; bison is still trying to reduce &lt;code&gt;opt_interval&lt;/code&gt;), but this fixes it: &lt;code&gt;FOR PORTION OF valid_at FROM ('2013-03-01'::timestamp + INTERVAL '1' HOUR) TO '2014-01-01'&lt;/code&gt;. Users can get what they want by adding parens.&lt;/p&gt;

&lt;p&gt;So to shift, we give &lt;code&gt;TO&lt;/code&gt; a higher precedence than &lt;code&gt;YEAR_P&lt;/code&gt;, &lt;code&gt;DAY_P&lt;/code&gt;, &lt;code&gt;HOUR_P&lt;/code&gt;, and &lt;code&gt;MINUTE_P&lt;/code&gt;. By default a token has no precedence, but bison lets you make a list of declarations where &lt;em&gt;lower lines&lt;/em&gt; have &lt;em&gt;higher precedence&lt;/em&gt;. So for a long time my patch added this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;%nonassoc YEAR_P DAY_P HOUR_P MINUTE_P
%nonassoc TO&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Actually I had &lt;code&gt;MONTH_P&lt;/code&gt; in there too, but that isn’t needed because you can’t have &lt;code&gt;MONTH TO ...&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;But this is frowned upon. There is a comment right above my change that gave me a guilty conscience for at least a year, maybe a few:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/*
 * Sometimes it is necessary to assign precedence to keywords that are not
 * really part of the operator hierarchy, in order to resolve grammar
 * ambiguities.  It's best to avoid doing so whenever possible, because such
 * assignments have global effect and may hide ambiguities besides the one
 * you intended to solve.  (Attaching a precedence to a single rule with
 * %prec is far safer and should be preferred.)  If you must give precedence
 * to a new keyword, try very hard to give it the same precedence as IDENT.
 * If the keyword has IDENT's precedence then it clearly acts the same as
 * non-keywords and other similar keywords, thus reducing the risk of
 * unexpected precedence effects.
 */&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I knew I had to fix this before my patch would get accepted. Those two lines had to go.&lt;/p&gt;

&lt;p&gt;What is the &lt;code&gt;%prec&lt;/code&gt; approach suggested by the comment? I said above that a rule’s precedence comes from its last terminal token. But you can override a rule’s precedence by putting &lt;code&gt;%prec token_name&lt;/code&gt; after the right side. For example &lt;code&gt;a_expr&lt;/code&gt; has this rule:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| a_expr AT TIME ZONE a_expr      %prec AT
  { ... }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We’re saying we should reduce this rule with the precedence of &lt;code&gt;AT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I tried all kinds of &lt;code&gt;%prec&lt;/code&gt; placements that didn’t work. My mental model of bison’s process was too vague. The reason I’m writing this story is really to record the details and thought process that finally gave me the solution.&lt;/p&gt;

&lt;p&gt;For example, putting &lt;code&gt;%prec&lt;/code&gt; on &lt;code&gt;for_portion_of_clause&lt;/code&gt; doesn’t do any good, because the conflict is lower down than that, inside &lt;code&gt;opt_interval&lt;/code&gt;. That was counter-intuitive, because I knew that adding &lt;code&gt;for_portion_of_clause&lt;/code&gt; was what caused the problem. It’s what offers bison a way to reduce early and still have a place to use the &lt;code&gt;TO&lt;/code&gt;. But despite &lt;code&gt;for_portion_of_clause&lt;/code&gt; exerting influence, at the moment of decision we are in the middle of a different rule. It’s action-at-a-distance.&lt;/p&gt;

&lt;p&gt;Another breakthrough was realizing that the comparison is between a &lt;em&gt;rule&lt;/em&gt; (to reduce) and a &lt;em&gt;token&lt;/em&gt; (to shift). Within &lt;code&gt;opt_interval&lt;/code&gt; I kept trying to give low precedence to the rules without &lt;code&gt;TO&lt;/code&gt; and high precedence to the rules with it. But the comparison isn’t between two rules. It’s between a rule and a token. The token is &lt;code&gt;TO&lt;/code&gt; itself. There isn’t any way to give precedence to a &lt;em&gt;token&lt;/em&gt; with &lt;code&gt;%prec&lt;/code&gt;. That only modifies a rule. If &lt;code&gt;TO&lt;/code&gt; has an undefined precedence, there will always be a conflict. So I &lt;em&gt;did&lt;/em&gt; have to declare a precedence for &lt;code&gt;TO&lt;/code&gt;, but following the comment above I could give it the same precedence as &lt;code&gt;IDENT&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;%nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
      SET KEYS OBJECT_P SCALAR TO VALUE_P WITH WITHOUT PATH&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then the conflicting &lt;code&gt;opt_interval&lt;/code&gt; rules needed a lower precedence, to prevent reducing early. A low-precedence keyword we use a lot is &lt;code&gt;IS&lt;/code&gt;, so I did this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;opt_interval:
  YEAR_P                                %prec IS
    { ... }
  | DAY_P                               %prec IS
    { ... }
  | HOUR_P                              %prec IS
    { ... }
  | MINUTE_P                            %prec IS
    { ... }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we’ll shift the &lt;code&gt;TO&lt;/code&gt; and follow those rules that include it.&lt;/p&gt;

&lt;p&gt;Finally I had a solution!&lt;/p&gt;
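
&lt;p&gt;To see the same mechanics in isolation, here is a toy grammar with the same shape of conflict. It is only a sketch: the token and rule names (&lt;code&gt;bound&lt;/code&gt;, &lt;code&gt;expr&lt;/code&gt;, &lt;code&gt;unit&lt;/code&gt;, &lt;code&gt;NUM&lt;/code&gt;, &lt;code&gt;LOW&lt;/code&gt;) are made up for illustration and are not from &lt;code&gt;gram.y&lt;/code&gt;. &lt;code&gt;LOW&lt;/code&gt; is a placeholder token that exists only to carry a precedence, standing in for &lt;code&gt;IS&lt;/code&gt; above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;%token FROM TO HOUR MINUTE NUM

/* Later lines have higher precedence, so TO outranks LOW. */
%nonassoc LOW
%nonassoc TO

%%

bound: FROM expr TO expr ;

expr: NUM unit ;

unit: HOUR            %prec LOW   /* reducing here loses to shifting TO */
  | HOUR TO MINUTE
  | /* empty */
  ;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you save that as, say, &lt;code&gt;toy.y&lt;/code&gt; and run &lt;code&gt;bison -v toy.y&lt;/code&gt;, it should build cleanly. Take out the &lt;code&gt;%prec LOW&lt;/code&gt; and bison reports a shift/reduce conflict in the state after &lt;code&gt;HOUR&lt;/code&gt;, just like state 1454 above.&lt;/p&gt;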

&lt;p&gt;It’s worth considering another approach. We can also enforce precedence with the structure of our rules, without declaring an explicit higher/lower precedence for terminals. For example for simple arithmetic we could do this (from the Dragon Book, p. 49–50):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;expr: expr '+' term
  | expr '-' term
  | term
  ;

term: term '*' factor
  | term '/' factor
  | factor
  ;

factor: digit
  | '(' expr ')'
  ;&lt;/code&gt;&lt;/pre&gt;

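&lt;p&gt;For &lt;code&gt;n&lt;/code&gt; levels of precedence, we need &lt;code&gt;n+1&lt;/code&gt; different non-terminals: here two levels (additive and multiplicative) give us &lt;code&gt;expr&lt;/code&gt;, &lt;code&gt;term&lt;/code&gt;, and &lt;code&gt;factor&lt;/code&gt;. But I think this approach gets unwieldy quickly. And anyway was I going to rewrite the Postgres grammar to do this?&lt;/p&gt;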

&lt;p&gt;Postgres actually does this a bit already though. We’ve seen &lt;code&gt;a_expr&lt;/code&gt; and &lt;code&gt;c_expr&lt;/code&gt;. Of course there is also &lt;code&gt;b_expr&lt;/code&gt;. &lt;code&gt;b_expr&lt;/code&gt; is a more limited set of rules than &lt;code&gt;a_expr&lt;/code&gt;, and &lt;code&gt;c_expr&lt;/code&gt; is everything they have in common.&lt;/p&gt;

&lt;p&gt;We use &lt;code&gt;b_expr&lt;/code&gt; to solve some shift/reduce conflicts. For example a column’s &lt;code&gt;DEFAULT&lt;/code&gt; value can only take a &lt;code&gt;b_expr&lt;/code&gt;, because a &lt;code&gt;NOT&lt;/code&gt; would be a shift/reduce conflict: is it a &lt;code&gt;NOT NULL&lt;/code&gt; constraint on the column, or is it part of the &lt;code&gt;DEFAULT&lt;/code&gt; expression, e.g. &lt;code&gt;NOT LIKE&lt;/code&gt;? One rule that &lt;code&gt;b_expr&lt;/code&gt; accepts is &lt;code&gt;'(' a_expr ')'&lt;/code&gt;, so even in contexts like &lt;code&gt;DEFAULT&lt;/code&gt;, you can get whatever you want by wrapping your text in parentheses.&lt;/p&gt;

&lt;p&gt;So could I have saved myself a lot of trouble and made &lt;code&gt;FOR PORTION OF&lt;/code&gt; take &lt;code&gt;FROM b_expr TO b_expr&lt;/code&gt; instead? No, because the problem was inside &lt;code&gt;c_expr&lt;/code&gt;, which is shared by both rules.&lt;/p&gt;

&lt;p&gt;I probably could have invented a &lt;code&gt;d_expr&lt;/code&gt;, but that would have been a lot of work, only to produce a more tangled grammar that I expect no reviewer would have accepted.&lt;/p&gt;

&lt;p&gt;So that’s the story of how I fixed my four shift/reduce conflicts.&lt;/p&gt;

&lt;p&gt;But just when you think you’ve killed the zombie, he rises from the dead. Right around the same time, I realized my grammar was wrong. In SQL, you can give your table an alias when you &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt;. It can use &lt;code&gt;AS&lt;/code&gt; or not: &lt;code&gt;UPDATE tablename [[AS] t]&lt;/code&gt; and &lt;code&gt;DELETE FROM tablename [[AS] t]&lt;/code&gt;. I was putting &lt;code&gt;FOR PORTION OF&lt;/code&gt; &lt;em&gt;after&lt;/em&gt; the alias, but according to SQL:2011 it comes &lt;em&gt;before&lt;/em&gt;. I tried moving it, and I got . . . 30 shift/reduce conflicts!&lt;/p&gt;

&lt;p&gt;These looked really hairy: the problem was that &lt;code&gt;AS&lt;/code&gt; is optional and the alias can be nearly anything. It can’t be a &lt;em&gt;reserved&lt;/em&gt; keyword (unless you quote it), but many keywords are not reserved (per the standard), so there’s ambiguity there. Allowing &lt;code&gt;a_expr&lt;/code&gt;, which can be nearly anything, followed by an optional alias, which can be nearly anything, is bad news. I really thought I was in trouble.&lt;/p&gt;

&lt;p&gt;Could I just ignore the standard? I don’t think that would be acceptable, not in this matter. But it was tempting enough that I checked what MariaDB and IBM DB2 were doing. Somehow they were making it work. I should figure it out too.&lt;/p&gt;

&lt;p&gt;I think I took a walk, or maybe I slept on it, but I realized that we already have the same problem with &lt;em&gt;column&lt;/em&gt; aliases when you &lt;code&gt;SELECT&lt;/code&gt;. Each selected column is an &lt;code&gt;a_expr&lt;/code&gt;, and their aliases don’t require &lt;code&gt;AS&lt;/code&gt;. What was Postgres doing to make that work?&lt;/p&gt;

&lt;p&gt;I found this rule for &lt;code&gt;SELECT&lt;/code&gt;ing:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;target_el:  a_expr AS ColLabel { ... }
      | a_expr BareColLabel { ... }
      | a_expr { ... }
      | '*' { ... }
    ;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It turns out that &lt;code&gt;ColLabel&lt;/code&gt; allows anything (even reserved keywords!), but &lt;code&gt;BareColLabel&lt;/code&gt; is more restricted.&lt;/p&gt;

&lt;p&gt;So I could do something similar: when there is an &lt;code&gt;AS&lt;/code&gt;, permit everything, but otherwise only permit tokens that are conflict-free. In fact, to keep backward-compatibility, I could leave the old grammar rule for &lt;code&gt;UPDATE&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt; in place (each had only one), and only get more restrictive when &lt;code&gt;FOR PORTION OF&lt;/code&gt; is present. Maybe reviewers will ask me to change things, but at the moment my solution looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;opt_alias:
  AS ColId { ... }
  | BareColLabel { ... }
  | /* empty */ %prec UMINUS { $$ = NULL; }
;

UpdateStmt: opt_with_clause UPDATE relation_expr_opt_alias
  SET set_clause_list
  from_clause
  where_or_current_clause
  returning_clause
    { ... }
  | opt_with_clause UPDATE relation_expr
  for_portion_of_clause opt_alias
  SET set_clause_list
  from_clause
  where_or_current_clause
  returning_clause
    { ... }
;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I’m not sure I like using &lt;code&gt;BareColLabel&lt;/code&gt; for non-column aliases, but the existing &lt;code&gt;relation_expr_opt_alias&lt;/code&gt; uses &lt;code&gt;ColId&lt;/code&gt;, so maybe it’s okay.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;%prec&lt;/code&gt; is necessary to resolve a conflict with &lt;code&gt;USING&lt;/code&gt;, which is permitted by &lt;code&gt;BareColLabel&lt;/code&gt;, and also allowed in &lt;code&gt;DELETE FROM ... USING&lt;/code&gt;. If I added a separate list for bare &lt;em&gt;table&lt;/em&gt; labels, we could leave out &lt;code&gt;USING&lt;/code&gt; and not use &lt;code&gt;%prec&lt;/code&gt; here, but I don’t think maintaining another keyword list would be popular.&lt;/p&gt;

&lt;p&gt;That’s it! I’m happy that at 47 I can still work out the errors in my mental model of something and correct them. Hopefully by writing this down I won’t have to do it more than once. :-)&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-10-13:/posts/2024/10/postgres-replica-identity/</id>
    <title type="html">Postgres REPLICA IDENTITY</title>
    <published>2024-10-13T00:00:00Z</published>
    <updated>2024-10-13T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/10/postgres-replica-identity/" type="text/html"/>
    <content type="html">
&lt;p&gt;Both logical decoding and logical replication use a table’s &lt;a href="https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY"&gt;&lt;code&gt;REPLICA IDENTITY&lt;/code&gt;&lt;/a&gt;. This is a way to say which row was changed by an &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt;. In other words, it identifies the “old” row.&lt;/p&gt;

&lt;p&gt;Logical &lt;em&gt;decoding&lt;/em&gt; will use the replica identity to say which row was changed, if available, but if not then the information is simply omitted. No big deal.&lt;/p&gt;

&lt;p&gt;In logical &lt;em&gt;replication&lt;/em&gt;, the subscriber looks for the replica identity with each change, and it uses it to know which row to remove, either because it was replaced or because it was deleted. So you can always replicate inserts and truncates, but you can only replicate updates and deletes if you have an appropriate replica identity. In fact even on the publication side, Postgres will forbid changes without an appropriate replica identity, as we will see.&lt;/p&gt;

&lt;p&gt;Normally a table has &lt;code&gt;DEFAULT&lt;/code&gt; for its replica identity. This means it will use the table’s primary key (if present). You can’t set the replica identity when you create the table, but you can change it with &lt;code&gt;ALTER TABLE&lt;/code&gt;. Besides &lt;code&gt;DEFAULT&lt;/code&gt;, it can be &lt;code&gt;USING INDEX &amp;lt;index&amp;gt;&lt;/code&gt; or &lt;code&gt;FULL&lt;/code&gt; or &lt;code&gt;NOTHING&lt;/code&gt;. The replica identity is stored in &lt;code&gt;pg_class.relreplident&lt;/code&gt;.&lt;/p&gt;
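
&lt;p&gt;As a quick way to check the current setting, a catalog query along these lines should work. This is only a sketch, assuming a table named &lt;code&gt;t&lt;/code&gt;; the one-letter codes are &lt;code&gt;d&lt;/code&gt; for default, &lt;code&gt;n&lt;/code&gt; for nothing, &lt;code&gt;f&lt;/code&gt; for full, and &lt;code&gt;i&lt;/code&gt; for index:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Show the replica identity code for a (hypothetical) table t.
select relname, relreplident
from pg_class
where oid = 't'::regclass;&lt;/code&gt;&lt;/pre&gt;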

&lt;p&gt;I had a lot of questions about how each of these works. Mostly I wanted to know when things failed: creating/altering the table/publication, making the change, or receiving it. All my tests were done on Postgres 17 using the &lt;code&gt;REL_17_0&lt;/code&gt; tag after doing &lt;code&gt;make world &amp;amp;&amp;amp; make install-world&lt;/code&gt;. I wanted to replicate from one database to another in the same cluster, but &lt;a href="https://www.postgresql.org/docs/current/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-NOTES"&gt;that requires some (slightly) more complicated commands&lt;/a&gt;. To keep the SQL simple, I ran separate clusters, one with a database named &lt;code&gt;sender&lt;/code&gt;, the other with a database named &lt;code&gt;receiver&lt;/code&gt;. The sending cluster runs on 5432 to keep the receiver’s connection strings simpler.&lt;/p&gt;

&lt;p&gt;On the sender I set &lt;code&gt;wal_level&lt;/code&gt; like this (on a Mac):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sed -i '' -e '/wal_level/a\
wal_level=logical' ~/local/pgsql/data/postgresql.conf&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On Linux I think this would be just:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sed -i '/wal_level/awal_level=logical' ~/local/pgsql/data/postgresql.conf&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can find the SQL for each test in &lt;a href="https://github.com/pjungwir/replica-identity-tests"&gt;this github repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=""&gt;&lt;code&gt;NOTHING&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Let’s start with &lt;code&gt;NOTHING&lt;/code&gt;. This means there is no information about the old row. If we’re looking for failures, this is nice and simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q1: What happens if you create a publication for it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is allowed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q2: What happens if you add it to an existing publication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is allowed too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p;
CREATE PUBLICATION
sender=# alter publication p add table t;
ALTER PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q3: What happens if you create a &lt;code&gt;FOR ALL TABLES&lt;/code&gt; publication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And this is allowed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p for all tables;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q4: What happens if you set a table to &lt;code&gt;NOTHING&lt;/code&gt; when it already belongs to a publication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s allowed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# create publication p for table t;
CREATE PUBLICATION
sender=# alter table t replica identity nothing;
ALTER TABLE&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q5: What happens if you set a table to &lt;code&gt;NOTHING&lt;/code&gt; when you already have a &lt;code&gt;FOR ALL TABLES&lt;/code&gt; publication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s allowed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# create publication p for all tables;
CREATE PUBLICATION
sender=# alter table t replica identity nothing;
ALTER TABLE&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q6: What happens if you change an insert-only publication to an update publication, and it has a &lt;code&gt;NOTHING&lt;/code&gt; table?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s allowed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p for table t with (publish = insert);
CREATE PUBLICATION
sender=# alter publication p set (publish = 'insert,update,delete,truncate');
ALTER PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q7: What happens if you change an insert-only &lt;code&gt;FOR ALL TABLES&lt;/code&gt; publication to an update publication, and the database has a &lt;code&gt;NOTHING&lt;/code&gt; table?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s allowed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p for all tables with (publish = insert);
CREATE PUBLICATION
sender=# alter publication p set (publish = 'insert,update,delete,truncate');
ALTER PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q8: What happens if you update a &lt;code&gt;NOTHING&lt;/code&gt; table that is in a publication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s &lt;em&gt;not&lt;/em&gt; allowed!&lt;/p&gt;

&lt;p&gt;This fails on the publisher side, even if there is no subscription:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION
sender=# insert into t values ('a');
INSERT 0 1
sender=# update t set a = 'b';
ERROR:  cannot update table "t" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Q9: What happens if you delete from a &lt;code&gt;NOTHING&lt;/code&gt; table that is in a publication?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This fails the same way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity nothing;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION
sender=# insert into t values ('a');
INSERT 0 1
sender=# delete from t;
ERROR:  cannot delete from table "t" because it does not have a replica identity and publishes deletes
HINT:  To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So to summarize, Postgres doesn’t validate publications up front, but only when you try to push an update or delete on a &lt;code&gt;NOTHING&lt;/code&gt; table through them.&lt;/p&gt;
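
&lt;p&gt;The hint in those errors points at the way out. As a sketch (not one of the tests in the repo), giving the table from Q8 a usable identity should let the update go through. &lt;code&gt;FULL&lt;/code&gt; is chosen here only because &lt;code&gt;t&lt;/code&gt; has no key to use:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Continuing the Q8 setup: t is published and currently REPLICA IDENTITY NOTHING.
alter table t replica identity full;
-- With an identity in place, the publisher should no longer reject the update.
update t set a = 'b';&lt;/code&gt;&lt;/pre&gt;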

&lt;h3 id="_2"&gt;&lt;code&gt;FULL&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;FULL&lt;/code&gt; means all columns are combined to determine uniqueness. Of course that might &lt;em&gt;still&lt;/em&gt; not be unique. So what happens if you have two identical rows and change one of them but not the other?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q1: What happens if you have two identical records and you delete one?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the publisher:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity full;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On the subscriber:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;receiver=# create table t (a text);
CREATE TABLE
receiver=# create subscription s connection 'dbname=sender' publication p;
NOTICE:  created replication slot "s" on publisher
CREATE SUBSCRIPTION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Back on the publisher:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# insert into t values ('a');
INSERT 0 1
sender=# insert into t values ('a');
INSERT 0 1
sender=# select ctid, a from t;
 ctid  | a
-------+---
 (0,1) | a
 (0,2) | a
(2 rows)

sender=# delete from t where ctid = '(0,1)';
DELETE 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the receiver sees:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;receiver=# select * from t;
 a 
---
 a
(1 row)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So we didn’t lose both rows! I guess that’s because we are replicating a delete of one row. Which one we delete doesn’t matter, but we’ll only delete one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2: What happens if you have two identical records and you delete both?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the publisher:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity full;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On the subscriber:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;receiver=# create table t (a text);
CREATE TABLE
receiver=# create subscription s connection 'dbname=sender' publication p;
NOTICE:  created replication slot "s" on publisher
CREATE SUBSCRIPTION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Back on the publisher:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# insert into t values ('a');
INSERT 0 1
sender=# insert into t values ('a');
INSERT 0 1
sender=# delete from t;
DELETE 2&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the receiver sees:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;receiver=# select * from t;
 a 
---
(0 rows)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So both rows disappeared. That makes sense, because the receiver got two messages to delete a row like &lt;code&gt;('a')&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3: What happens if you have two identical records and you update one?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I assume updating one of two identical rows will work the same, but let’s check:&lt;/p&gt;

&lt;p&gt;On the publisher:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# alter table t replica identity full;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On the subscriber:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;receiver=# create table t (a text);
CREATE TABLE
receiver=# create subscription s connection 'dbname=sender' publication p;
NOTICE:  created replication slot "s" on publisher
CREATE SUBSCRIPTION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Back on the publisher:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# insert into t values ('a');
INSERT 0 1
sender=# insert into t values ('a');
INSERT 0 1
sender=# select ctid, a from t;
 ctid  | a
-------+---
 (0,1) | a
 (0,2) | a
(2 rows)

sender=# update t set a = 'b' where ctid = '(0,1)';
UPDATE 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the receiver sees:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;receiver=# select * from t;
 a 
---
 a
 b
(2 rows)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Yep!&lt;/p&gt;

&lt;p&gt;So with &lt;code&gt;FULL&lt;/code&gt; there are no errors on either the publisher side or the subscriber side. The disadvantage is that everything is slower: logical decoding sends more data, and the subscriber must compare all the columns for equality.&lt;/p&gt;

&lt;p&gt;I wonder if the subscriber will still use an index to apply changes if there is one? I haven’t tested that yet, but if I do I will put an update here.&lt;/p&gt;

&lt;h3 id="_3"&gt;&lt;code&gt;USING INDEX &amp;lt;index&amp;gt;&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;This lets us choose a &lt;code&gt;UNIQUE&lt;/code&gt; index, as long as none of its keys are nullable. Assuming the subscriber has the same uniqueness, it can quickly locate which rows to change. Here is the happy path:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text not null unique);
CREATE TABLE
sender=# alter table t replica identity using index t_a_key;
ALTER TABLE
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In that case, &lt;code&gt;pg_class.relreplident&lt;/code&gt; is set to &lt;code&gt;i&lt;/code&gt;, and &lt;code&gt;pg_index.indisreplident&lt;/code&gt; is set to true for that index. You can see which index is used with &lt;code&gt;\d&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# \d t
                Table "public.t"
 Column | Type | Collation | Nullable | Default 
--------+------+-----------+----------+---------
 a      | text |           | not null | 
Indexes:
    "t_a_key" UNIQUE CONSTRAINT, btree (a) REPLICA IDENTITY&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But what can we do to mess it up?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q1: What happens if you use a non-unique index?&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# create index idx_t_a on t (a);
CREATE INDEX
sender=# alter table t replica identity using index idx_t_a;
ERROR:  cannot use non-unique index "idx_t_a" as replica identity&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can’t even set the &lt;code&gt;REPLICA IDENTITY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2: What happens if you drop the index?&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text not null);
CREATE TABLE
sender=# create unique index idx_t_a on t (a);
CREATE INDEX
sender=# alter table t replica identity using index idx_t_a;
ALTER TABLE
sender=# drop index idx_t_a;
DROP INDEX
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Well you were able to drop it! The table’s &lt;code&gt;relreplident&lt;/code&gt; is still &lt;code&gt;i&lt;/code&gt;, but now there is no index with a true &lt;code&gt;pg_index.indisreplident&lt;/code&gt;.&lt;/p&gt;
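
&lt;p&gt;One way to see that mismatch is a catalog query like the following. It is a sketch, and the column alias is only illustrative. An &lt;code&gt;i&lt;/code&gt; next to a null index name is the state just described:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- List the table's replica identity code next to the index flagged for it, if any.
select c.relname,
       c.relreplident,
       i.indexrelid::regclass as replident_index
from pg_class c
left join pg_index i
  on i.indrelid = c.oid
 and i.indisreplident
where c.oid = 't'::regclass;&lt;/code&gt;&lt;/pre&gt;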

&lt;p&gt;Let’s see what happens if we try to use it. On the subscriber:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;receiver=# create table t (a text);
CREATE TABLE
receiver=# create subscription s connection 'dbname=sender' publication p;
NOTICE:  created replication slot "s" on publisher
CREATE SUBSCRIPTION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Back on the sender:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# insert into t values ('a');
INSERT 0 1
sender=# update t set a = 'b';
ERROR:  cannot update table "t" because it does not have a replica identity and publishes updates&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Oops! We were able to insert, but the update failed. Even on the sender, we weren’t able to make the change.&lt;/p&gt;

&lt;p&gt;I’m dreaming of some other trickery with &lt;code&gt;ALTER INDEX&lt;/code&gt;, but nothing that seems problematic is supported, like changing a unique index to a non-unique one.&lt;/p&gt;

&lt;p&gt;So like &lt;code&gt;NOTHING&lt;/code&gt;, failures happen when we update the table. But in addition there is validation before setting &lt;code&gt;pg_index.indisreplident&lt;/code&gt;. If the index is not appropriate, we’ll fail there.&lt;/p&gt;

&lt;h3 id="_4"&gt;&lt;code&gt;DEFAULT&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;DEFAULT&lt;/code&gt; setting means to use the table’s primary key, if it has one. So if the table has a primary key, you’re all good. But what if it doesn’t? Is it the same as &lt;code&gt;NOTHING&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;Well we are allowed to set things up:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text);
CREATE TABLE
sender=# create publication p for table t;
CREATE PUBLICATION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And on the receiver:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;receiver=# create table t (a text);
CREATE TABLE
receiver=# create subscription s connection 'dbname=sender' publication p;
NOTICE:  created replication slot "s" on publisher
CREATE SUBSCRIPTION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we make some changes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# insert into t values ('a');
INSERT 0 1
sender=# update t set a = 'b';
ERROR:  cannot update table "t" because it does not have a replica identity and publishes updates&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So again we can insert things, but not send updates. We fail before we even send over the change. In this test we didn’t even need to run anything on the receiver to cause the failure.&lt;/p&gt;

&lt;p&gt;Since the failure is so late, there is no difference between starting with a primary key and dropping it partway through. Likewise with other ideas we tried for &lt;code&gt;NOTHING&lt;/code&gt;, like adding the publication before the table or using &lt;code&gt;FOR ALL TABLES&lt;/code&gt;. Also changing an insert-only publication to an update publication is permitted, but then the next update command fails.&lt;/p&gt;

&lt;p&gt;Still let’s just try one simple case:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2: What happens if you drop the primary key from a table in a publication?&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# create table t (a text primary key);
CREATE TABLE
sender=# create publication p for table t;
CREATE PUBLICATION
sender=# alter table t drop constraint t_pkey;
ALTER TABLE
sender=# insert into t values ('a');
INSERT 0 1
sender=# update t set a = 'b';
ERROR:  cannot update table "t" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;No problem dropping the key, but then we get a failure making an update. Also this test shows that no subscription is required.&lt;/p&gt;

&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;So in all four cases, the validation happens when you update/delete a row and try to push the change through a &lt;code&gt;PUBLICATION&lt;/code&gt;. Your change gets rolled back, so there is no inconsistency. I’m happy about this, because it lowers the risk of tricking Postgres into publishing changes it shouldn’t. Also it makes things easier for replicating temporal tables. (Of course that was my ulterior motive!)&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-10-13:/posts/2024/10/logical-replication/</id>
    <title type="html">Postgres Logical Replication</title>
    <published>2024-10-13T00:00:00Z</published>
    <updated>2024-10-13T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/10/logical-replication/" type="text/html"/>
    <content type="html">
&lt;p&gt;Logical &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding.html"&gt;decoding&lt;/a&gt; is different from logical &lt;a href="https://www.postgresql.org/docs/current/logical-replication.html"&gt;replication&lt;/a&gt;. Logical replication is built on logical decoding.&lt;/p&gt;

&lt;h2 id="logical_decoding"&gt;Logical Decoding&lt;/h2&gt;

&lt;p&gt;Logical decoding exports the changes happening in your database. They are streamed through a &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS"&gt;replication slot&lt;/a&gt;. The slot keeps track of how far you’ve read, so that it doesn’t skip or repeat messages. (Btw how is it not “at least once” delivery?) Basically you are getting a stream of WAL records. But whereas a &lt;em&gt;physical&lt;/em&gt; replication slot gives you the exact binary of the WAL as it gets written, a logical replication slot includes an &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-output-plugin.html"&gt;&lt;em&gt;output plugin&lt;/em&gt;&lt;/a&gt; that encodes the WAL in a (hopefully) more accessible way.&lt;/p&gt;

&lt;p&gt;You can write output plugins yourself by implementing various callbacks to handle different kinds of database activity. The only essential callbacks are &lt;code&gt;LogicalDecodeBeginCB&lt;/code&gt;, &lt;code&gt;LogicalDecodeChangeCB&lt;/code&gt;, and &lt;code&gt;LogicalDecodeCommitCB&lt;/code&gt;, but there are many optional ones. Some of those let you implement two-phase commit (see e.g. &lt;a href="https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321"&gt;Kleppmann&lt;/a&gt; 352–359).&lt;/p&gt;

&lt;p&gt;You can also add an &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-writer.html"&gt;&lt;em&gt;output writer&lt;/em&gt;&lt;/a&gt;, but I’m not sure how it differs from a plugin. Somehow it only requires implementing three callbacks instead of many. I don’t see any way to “install” your writer or attach it to a slot. I’ll come back to this some other day.&lt;/p&gt;

&lt;p&gt;The most popular output plugin is &lt;a href="https://github.com/eulerto/wal2json"&gt;wal2json&lt;/a&gt;. When you read from the replication slot, you get each WAL record as a JSON object. Postgres also has a built-in &lt;code&gt;test_decoding&lt;/code&gt; output plugin (the default), which gives you the details as plain text.&lt;/p&gt;

&lt;p&gt;Postgres comes with a tool called &lt;code&gt;pg_recvlogical&lt;/code&gt; you can use to read from a replication slot. It can also create and drop slots. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pg_recvlogical --create-slot --if-not-exists --slot s --dbname paul
pg_recvlogical --start --slot s --dbname paul -f -&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you let that run then go into psql and issue some inserts/updates/deletes, you will see the messages getting sent through the slot.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;paul=# create table t (a text primary key);
CREATE TABLE
paul=# insert into t values ('a'), ('b');
INSERT 0 2
paul=# update t set a = 'aa' where a = 'a';
UPDATE 1
paul=# delete from t where a = 'b';
DELETE 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Our &lt;code&gt;pg_recvlogical&lt;/code&gt; command prints:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BEGIN 19715
COMMIT 19715
BEGIN 19716
table public.t: INSERT: a[text]:'a'
table public.t: INSERT: a[text]:'b'
COMMIT 19716
BEGIN 19717
table public.t: UPDATE: old-key: a[text]:'a' new-tuple: a[text]:'aa'
COMMIT 19717
BEGIN 19718
table public.t: DELETE: a[text]:'b'
COMMIT 19718&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you want JSON instead, you can create the slot with &lt;code&gt;-P wal2json&lt;/code&gt; (after installing it).&lt;/p&gt;

&lt;p&gt;The tricky part here is identifying which row changed for updates and deletes. Since &lt;code&gt;t&lt;/code&gt; had a primary key, Postgres used it, because it should uniquely identify a row on the other end. But you can do other things too, based on the table’s &lt;code&gt;REPLICA IDENTITY&lt;/code&gt;. You can set this to &lt;code&gt;DEFAULT&lt;/code&gt; (i.e. using the primary key if present), &lt;code&gt;NOTHING&lt;/code&gt;, &lt;code&gt;USING INDEX &amp;lt;index&amp;gt;&lt;/code&gt;, or &lt;code&gt;FULL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NOTHING&lt;/code&gt; is the same as &lt;code&gt;DEFAULT&lt;/code&gt; with no primary key: there is no identifying information available.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;USING INDEX&lt;/code&gt; takes a unique index with no nullable parts, and uses that.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FULL&lt;/code&gt; uses all the attributes of the row. It works for any table, but it’s less performant.&lt;/p&gt;

&lt;p&gt;For more details on &lt;code&gt;REPLICA IDENTITY&lt;/code&gt; you can read my article &lt;a href="/posts/2024/10/postgres-replica-identity/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s drop the table, create it without a primary key, and run the same commands again:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;paul=# create table t (a text);
CREATE TABLE
paul=# insert into t values ('a'), ('b');
INSERT 0 2
paul=# update t set a = 'aa' where a = 'a';
UPDATE 1
paul=# delete from t where a = 'b';
DELETE 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now the logical decoding output doesn’t have any old-row identifiers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BEGIN 19720
COMMIT 19720
BEGIN 19721
table public.t: INSERT: a[text]:'a'
table public.t: INSERT: a[text]:'b'
COMMIT 19721
BEGIN 19722
table public.t: UPDATE: a[text]:'aa'
COMMIT 19722
BEGIN 19723
table public.t: DELETE: (no-tuple-data)
COMMIT 19723&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Missing a &lt;code&gt;REPLICA IDENTITY&lt;/code&gt; is no problem for logical decoding. For logical &lt;em&gt;replication&lt;/em&gt;, though, Postgres will raise an error when you try to update or delete rows in a published table.&lt;/p&gt;

&lt;h2 id="clarification_of_streaming"&gt;Clarification of “Streaming”&lt;/h2&gt;

&lt;p&gt;In the Postgres docs, &lt;a href="https://www.postgresql.org/message-id/flat/CA%2BrenyULt3VBS1cRFKUfT2%3D5dr61xBOZdAZ-CqX3XLGXqY-aTQ%40mail.gmail.com"&gt;the word “streaming” can be misleading&lt;/a&gt;. Often it means &lt;em&gt;physical&lt;/em&gt; replication (originally the only kind of streaming replication we had), in contrast to logical—but not always.&lt;/p&gt;

&lt;p&gt;Sometimes “streaming replication” contrasts with &lt;a href="https://www.postgresql.org/docs/current/warm-standby.html"&gt;“log shipping”&lt;/a&gt;. It means the standby opens a connection to the primary and constantly pulls data via the &lt;a href="https://www.postgresql.org/docs/current/protocol-replication.html"&gt;streaming replication protocol&lt;/a&gt;. In log shipping, the primary has an &lt;code&gt;archive_command&lt;/code&gt; to copy WAL files where the standby can read them. (In fact you usually combine these methods.) That said, this page is at the same time equating streaming replication with physical. You &lt;em&gt;could&lt;/em&gt; set up a standby with logical replication, but that’s not what it’s describing.&lt;/p&gt;

&lt;p&gt;In another place, “streaming” means the streaming replication protocol in contrast to &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-sql.html"&gt;SQL commands&lt;/a&gt;.&lt;/p&gt;
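
&lt;p&gt;For comparison, here is roughly what that SQL interface looks like. This is a sketch: the slot name &lt;code&gt;demo_slot&lt;/code&gt; is made up, and &lt;code&gt;wal_level&lt;/code&gt; must already be &lt;code&gt;logical&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Create a logical slot that uses the test_decoding output plugin.
select * from pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Look at pending changes without consuming them
-- (pg_logical_slot_get_changes would consume them).
select * from pg_logical_slot_peek_changes('demo_slot', null, null);

-- Drop the slot when done, so it does not hold back WAL.
select pg_drop_replication_slot('demo_slot');&lt;/code&gt;&lt;/pre&gt;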

&lt;p&gt;Perhaps most often, “streaming replication” means “physical replication” in contrast to logical. For instance &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS"&gt;this note about logical replication slots&lt;/a&gt; should perhaps &lt;code&gt;s/streaming/physical/&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL also has streaming replication slots (see Section 26.2.5), but they are used somewhat differently there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Likewise the three opening paragraphs of the &lt;a href="https://www.postgresql.org/docs/current/runtime-config-replication.html"&gt;page for “Replication” configuration&lt;/a&gt; seem to contrast streaming replication with logical replication.&lt;/p&gt;

&lt;p&gt;So just watch out when you’re reading! More precise language would contrast “physical” with “logical”, and be clear that both are “streaming.” I’m hopeful from that mailing list thread so far that we might start moving the docs in that direction.&lt;/p&gt;

&lt;h2 id="logical_replication"&gt;Logical Replication&lt;/h2&gt;

&lt;p&gt;Logical replication is built on top of logical decoding and replication slots. But instead of working at such a low level, it replicates changes from one Postgres table to another (usually in another cluster). The publisher side uses a built-in output plugin named &lt;code&gt;pgoutput&lt;/code&gt;. To set it up, you use &lt;a href="https://www.postgresql.org/docs/current/sql-createpublication.html"&gt;&lt;code&gt;CREATE PUBLICATION&lt;/code&gt;&lt;/a&gt; on the sender and &lt;a href="https://www.postgresql.org/docs/current/sql-createsubscription.html"&gt;&lt;code&gt;CREATE SUBSCRIPTION&lt;/code&gt;&lt;/a&gt; on the receiver. Like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# CREATE PUBLICATION p FOR TABLE t1, t2;
. . .
receiver=# CREATE SUBSCRIPTION s CONNECTION 'host=yonder dbname=thisnthat' PUBLICATION p;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But first you’ll need to have tables in the receiving database named &lt;code&gt;t1&lt;/code&gt; and &lt;code&gt;t2&lt;/code&gt;, with at least enough columns to match their sources.&lt;/p&gt;

&lt;p&gt;There are lots of options. For the publication, you can create it &lt;code&gt;FOR ALL TABLES&lt;/code&gt; or &lt;code&gt;FOR TABLES IN SCHEMA s1, s2&lt;/code&gt;. You can publish a subset of event types (&lt;code&gt;insert&lt;/code&gt;, &lt;code&gt;update&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;truncate&lt;/code&gt;). You can publish a subset of each table’s columns. You can do special things with partitions. You can have a &lt;code&gt;WHERE&lt;/code&gt; clause to publish only certain rows (given per table in a &lt;code&gt;FOR TABLE&lt;/code&gt; list, not with &lt;code&gt;FOR ALL TABLES&lt;/code&gt; or &lt;code&gt;TABLES IN SCHEMA&lt;/code&gt;).&lt;/p&gt;
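
&lt;p&gt;A few of those options, sketched with made-up publication, table, and column names. This assumes &lt;code&gt;t1&lt;/code&gt; has a primary key &lt;code&gt;id&lt;/code&gt;; column lists, row filters, and &lt;code&gt;TABLES IN SCHEMA&lt;/code&gt; need Postgres 15 or later:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Publish only inserts.
create publication p_ins for table t1 with (publish = 'insert');

-- Publish a subset of columns (hypothetical columns id and name).
create publication p_cols for table t1 (id, name);

-- Publish only rows matching a filter.
create publication p_rows for table t1 where (id &amp;gt; 100);

-- Publish every table in the listed schemas.
create publication p_schemas for tables in schema s1, s2;&lt;/code&gt;&lt;/pre&gt;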

&lt;p&gt;The subscription side has lots of options too. See the docs above for details.&lt;/p&gt;

&lt;p&gt;Once you have a publication and subscription created, you will see an entry in the sender’s &lt;code&gt;pg_stat_replication&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sender=# select * from pg_stat_replication \gx
-[ RECORD 1 ]----+------------------------------
pid              | 44344
usesysid         | 10
usename          | paul
application_name | s
client_addr      | NULL
client_hostname  | NULL
client_port      | -1
backend_start    | 2024-10-13 11:37:18.361684-05
backend_xmin     | NULL
state            | streaming
sent_lsn         | 0/11DE0AB8
write_lsn        | 0/11DE0AB8
flush_lsn        | 0/11DE0AB8
replay_lsn       | 0/11DE0AB8
write_lag        | NULL
flush_lag        | NULL
replay_lag       | NULL
sync_priority    | 0
sync_state       | async
reply_time       | 2024-10-13 11:37:48.419264-05&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unlike physical replication, logical replication does not transfer DDL changes.&lt;/p&gt;

&lt;h3 id="synchronous_replication"&gt;Synchronous Replication&lt;/h3&gt;

&lt;p&gt;Both logical decoding and logical replication can be &lt;em&gt;synchronous&lt;/em&gt;. This means we can configure the primary to report successful commits only after hearing from the standby(s) that they were successful. (You can also do this with physical replication.) To enable this, first set &lt;code&gt;synchronous_standby_names&lt;/code&gt; to a comma-separated list of names you will wait for. Then set &lt;code&gt;synchronous_commit&lt;/code&gt; to &lt;code&gt;remote_apply&lt;/code&gt;, &lt;code&gt;on&lt;/code&gt;, &lt;code&gt;remote_write&lt;/code&gt;, or &lt;code&gt;local&lt;/code&gt;. (The default is &lt;code&gt;on&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;&lt;code&gt;remote_apply&lt;/code&gt; is the most thorough: the standbys must flush the commit record to disk and apply it (so that other connections can see it). &lt;code&gt;on&lt;/code&gt; also requires flushing to disk (e.g. WAL), but it needn’t be applied. &lt;code&gt;remote_write&lt;/code&gt; requires standbys to perform the write to disk, but they need not have confirmation from the OS that it’s been flushed. And &lt;code&gt;local&lt;/code&gt; will not wait for standbys. (This is sort of pointless if you put things in &lt;code&gt;synchronous_standby_names&lt;/code&gt;, but maybe it lets you disable synchronous replication temporarily without erasing that other setting.)&lt;/p&gt;

&lt;p&gt;The names to put in &lt;code&gt;synchronous_standby_names&lt;/code&gt; should match the connection’s &lt;code&gt;application_name&lt;/code&gt;, one of the parameters available when opening any connection. You can set it in the connection string if you like. Otherwise it defaults to the subscription name. (For physical replication, standbys omitting this from their connection string default to their &lt;code&gt;cluster_name&lt;/code&gt; or failing that &lt;code&gt;walsender&lt;/code&gt;.) You can see the &lt;code&gt;application_name&lt;/code&gt; in &lt;code&gt;pg_stat_replication&lt;/code&gt; above matches the subscription name.&lt;/p&gt;

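&lt;p&gt;If you list more than one standby, a plain comma-separated list is treated as &lt;code&gt;FIRST 1 (...)&lt;/code&gt;: the primary waits only for the highest-priority standby that is currently connected, and the others are potential synchronous standbys. To wait on several at once, use the &lt;code&gt;FIRST n (...)&lt;/code&gt; or &lt;code&gt;ANY n (...)&lt;/code&gt; forms of &lt;code&gt;synchronous_standby_names&lt;/code&gt;.&lt;/p&gt;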
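
&lt;p&gt;As a rough sketch of those settings, here is one way to wire up a single synchronous logical subscriber. The names &lt;code&gt;mysub&lt;/code&gt; and &lt;code&gt;mypub&lt;/code&gt;, the host, and the database are all made up for illustration; substitute your own.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-sql"&gt;-- On the subscriber (names and connection details are hypothetical).
-- application_name defaults to the subscription name, so setting it
-- explicitly here is only for clarity.
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=primary.example dbname=app application_name=mysub'
  PUBLICATION mypub;

-- On the primary: wait for the standby whose application_name is mysub,
-- and require it to apply each commit before we report success.
ALTER SYSTEM SET synchronous_standby_names = 'mysub';
ALTER SYSTEM SET synchronous_commit = 'remote_apply';
SELECT pg_reload_conf();

-- Still on the primary: sync_state should now say sync instead of async.
SELECT application_name, state, sync_state
FROM   pg_stat_replication;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since &lt;code&gt;synchronous_commit&lt;/code&gt; can also be changed per session or per transaction, you can relax it to &lt;code&gt;local&lt;/code&gt; for individual writes without touching &lt;code&gt;synchronous_standby_names&lt;/code&gt;.&lt;/p&gt;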

&lt;p&gt;If you are using synchronous logical replication, this is an &lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-synchronous.html#LOGICALDECODING-SYNCHRONOUS-OVERVIEW"&gt;important warning&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A synchronous replica receiving changes via logical decoding will work in the scope of a single database. Since, in contrast to that, &lt;code&gt;synchronous_standby_names&lt;/code&gt; currently is server wide, this means this technique will not work properly if more than one database is actively used.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to &lt;a href="https://github.com/postgres/postgres/commit/3cb828dbe26087e7754f49f3cfe3ed036d5af439"&gt;commit 3cb828dbe2&lt;/a&gt; the problem is a potential deadlock from locking catalog tables. The commit and the docs have the specific lock sequences that are dangerous, and they don’t seem hard to avoid.&lt;/p&gt;

&lt;p&gt;My first impression on reading the note was a different scenario though. Consider: you have two databases, both replicated with logical replication, but each to a different standby. You make a change in one database, and now you’re waiting for both standbys to confirm. But one never got the change so will never answer.&lt;/p&gt;

&lt;p&gt;But in fact Postgres is smarter than that and only waits for the standbys that are relevant. I tested these scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One database in the sender cluster replicates to a database in the receiver cluster, and I update a table in a different database in the sender cluster. (I am “actively using” it.)&lt;/li&gt;

&lt;li&gt;Two databases in the sender cluster each replicate to the same database in the receiver cluster.&lt;/li&gt;

&lt;li&gt;Two databases in the sender cluster each replicate to databases in different clusters.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id="logical_replication_combined_with_physical_replication_failover"&gt;Logical Replication combined with Physical Replication Failover&lt;/h3&gt;

&lt;p&gt;If your primary cluster has both a physical replication standby and logical replication subscribers, and then you fail over to the standby, you need a way to point the logical subscribers at the newly-promoted cluster (i.e. the former standby). You want to do that without losing data.&lt;/p&gt;

&lt;p&gt;If (before the failure) you set &lt;a href="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNC-REPLICATION-SLOTS"&gt;&lt;code&gt;sync_replication_slots&lt;/code&gt;&lt;/a&gt; to true on the physical standby, it will maintain the same slots the primary has, including keeping track of how far each has been read. That way when your logical subscribers connect to the new primary, they can resume where they left off.&lt;/p&gt;

&lt;p&gt;It is also a good idea to use &lt;a href="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONIZED-STANDBY-SLOTS"&gt;&lt;code&gt;synchronized_standby_slots&lt;/code&gt;&lt;/a&gt;, to make sure the logical subscribers don’t get &lt;em&gt;ahead&lt;/em&gt; of the physical standby.&lt;/p&gt;
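
&lt;p&gt;Here is a minimal sketch of how those pieces might fit together, based on my reading of the Postgres 17 docs. The subscription, publication, host, and slot names are hypothetical. Note that only slots created with the failover option are synchronized, and the standby also needs &lt;code&gt;hot_standby_feedback&lt;/code&gt;, a &lt;code&gt;primary_slot_name&lt;/code&gt;, and a &lt;code&gt;dbname&lt;/code&gt; in its &lt;code&gt;primary_conninfo&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-sql"&gt;-- On the logical subscriber: mark the slot as a failover slot.
-- (mysub, mypub, and the host are made-up names.)
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=primary.example dbname=app'
  PUBLICATION mypub
  WITH (failover = true);

-- On the physical standby: copy failover slots from the primary.
-- Assumes primary_slot_name is already set and primary_conninfo
-- already includes a dbname.
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();

-- On the primary: don't let logical subscribers get ahead of the
-- physical standby using this (hypothetical) physical slot.
ALTER SYSTEM SET synchronized_standby_slots = 'standby1_slot';
SELECT pg_reload_conf();

-- Back on the standby: check which slots have been synchronized.
SELECT slot_name, failover, synced
FROM   pg_replication_slots;&lt;/code&gt;&lt;/pre&gt;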
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-09-15:/posts/2024/09/benchmarking-temporal-foreign-keys/</id>
    <title type="html">Benchmarking Temporal Foreign Keys</title>
    <published>2024-09-15T00:00:00Z</published>
    <updated>2024-09-15T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/09/benchmarking-temporal-foreign-keys/" type="text/html"/>
    <content type="html">
&lt;p&gt;Way back in February Peter Eisentraut &lt;a href="https://www.postgresql.org/message-id/7bd1c8f9-a91a-41a3-990e-0f796ba692ec%40eisentraut.org"&gt;asked me&lt;/a&gt; if I’d tested the performance of &lt;a href="https://commitfest.postgresql.org/49/4308/"&gt;my patch to add temporal foreign keys to Postgres&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Have you checked that the generated queries can use indexes and have suitable performance? Do you have example execution plans maybe?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is a report on the tests I made. I gave &lt;a href="/posts/2024/08/benchbase-and-temporal-foreign-keys-pdxpug-talk/"&gt;a talk about this&lt;/a&gt; last month at &lt;a href="https://pdxpug.wordpress.com/"&gt;pdxpug&lt;/a&gt;, but this blog post will be easier to access, and I’ll focus just on the foreign key results.&lt;/p&gt;

&lt;h2 id="method"&gt;Method&lt;/h2&gt;

&lt;p&gt;As far as I know there are no published benchmark schemas or workflows for temporal data. Since the tables require start/end columns, you can’t use an existing benchmark like &lt;a href="https://www.tpc.org/tpch/"&gt;TPC-H&lt;/a&gt;. The tables built in to &lt;a href="https://www.postgresql.org/docs/current/pgbench.html"&gt;pgbench&lt;/a&gt; are no use either. I’m not even sure where to find a public dataset. The closest is something called “Incumben”, mentioned in the &lt;a href="https://www.zora.uzh.ch/id/eprint/62963/1/p433-dignos.pdf"&gt;“Temporal Alignment” paper&lt;/a&gt;. The authors say it has 85,857 entries for job assignments across 49,195 employees at the University of Arizona, but I can’t find any trace of it online. (I’ll update here if I hear back from them about it.)&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/pjungwir/benchbase/tree/temporal"&gt;a temporal benchmark of my own&lt;/a&gt; using &lt;a href="https://github.com/cmu-db/benchbase"&gt;CMU’s Benchbase framework&lt;/a&gt;. (Thanks to &lt;a href="https://markwkm.blogspot.com/"&gt;Mark Wong&lt;/a&gt; and &lt;a href="https://github.com/grantholly"&gt;Grant Holly&lt;/a&gt; for that recommendation!) It also uses employees and positions, both temporal tables with a &lt;code&gt;valid_at&lt;/code&gt; column (a &lt;code&gt;daterange&lt;/code&gt;). Each position has a reference to an employee, checked by a temporal foreign key. Primary and foreign keys have GiST indexes combining the integer part and the range part. Here is the DDL:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-sql"&gt;&lt;span class="class"&gt;CREATE&lt;/span&gt; &lt;span class="type"&gt;TABLE&lt;/span&gt; employees (
    id          &lt;span class="predefined-type"&gt;int&lt;/span&gt; GENERATED &lt;span class="keyword"&gt;BY&lt;/span&gt; &lt;span class="directive"&gt;DEFAULT&lt;/span&gt; &lt;span class="keyword"&gt;AS&lt;/span&gt; IDENTITY &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    valid_at    daterange &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    name        &lt;span class="predefined-type"&gt;text&lt;/span&gt; &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    salary      &lt;span class="predefined-type"&gt;int&lt;/span&gt; &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    &lt;span class="directive"&gt;PRIMARY&lt;/span&gt; &lt;span class="type"&gt;KEY&lt;/span&gt; (id, valid_at WITHOUT OVERLAPS)
);

&lt;span class="class"&gt;CREATE&lt;/span&gt; &lt;span class="type"&gt;TABLE&lt;/span&gt; positions (
    id          &lt;span class="predefined-type"&gt;int&lt;/span&gt; GENERATED &lt;span class="keyword"&gt;BY&lt;/span&gt; &lt;span class="directive"&gt;DEFAULT&lt;/span&gt; &lt;span class="keyword"&gt;AS&lt;/span&gt; IDENTITY &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    valid_at    daterange &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    name        &lt;span class="predefined-type"&gt;text&lt;/span&gt; &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    employee_id &lt;span class="predefined-type"&gt;int&lt;/span&gt; &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;,
    &lt;span class="directive"&gt;PRIMARY&lt;/span&gt; &lt;span class="type"&gt;KEY&lt;/span&gt; (id, valid_at WITHOUT OVERLAPS),
    &lt;span class="directive"&gt;FOREIGN&lt;/span&gt; &lt;span class="type"&gt;KEY&lt;/span&gt; (employee_id, PERIOD valid_at) &lt;span class="keyword"&gt;REFERENCES&lt;/span&gt; employees (id, PERIOD valid_at)
);
&lt;span class="class"&gt;CREATE&lt;/span&gt; &lt;span class="type"&gt;INDEX&lt;/span&gt; idx_positions_employee_id &lt;span class="keyword"&gt;ON&lt;/span&gt; positions &lt;span class="keyword"&gt;USING&lt;/span&gt; gist (employee_id, valid_at);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Naturally you can’t run that unless you’ve compiled Postgres with the temporal patches above.&lt;/p&gt;

&lt;p&gt;The benchmark has procedures that exercise foreign keys (update/delete employee, insert/update position). There are other procedures too: selecting one row, selecting many rows, inner join, outer join, semijoin, antijoin. I plan to add aggregates and set operations (union/except/intersect), as well as better queries for sequenced vs non-sequenced semantics. But right now the foreign key procedures are better developed than anything else. I also plan to change the SQL from rangetypes to standard SQL:2011 PERIODs, at least for non-Postgres RDBMSes. I’ll write more about all that later; this post is about foreign keys.&lt;/p&gt;

&lt;h3 id="_implementation"&gt;
&lt;code&gt;range_agg&lt;/code&gt; Implementation&lt;/h3&gt;

&lt;p&gt;Temporal foreign keys in Postgres are implemented like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-sql"&gt;&lt;span class="class"&gt;SELECT&lt;/span&gt; &lt;span class="integer"&gt;1&lt;/span&gt;
&lt;span class="keyword"&gt;FROM&lt;/span&gt;    (
  &lt;span class="class"&gt;SELECT&lt;/span&gt; pkperiodatt &lt;span class="keyword"&gt;AS&lt;/span&gt; r
  &lt;span class="keyword"&gt;FROM&lt;/span&gt;   [ONLY] pktable x
  &lt;span class="keyword"&gt;WHERE&lt;/span&gt;  pkatt1 = &lt;span class="error"&gt;$&lt;/span&gt;&lt;span class="integer"&gt;1&lt;/span&gt; [&lt;span class="keyword"&gt;AND&lt;/span&gt; ...]
  &lt;span class="keyword"&gt;AND&lt;/span&gt;    pkperiodatt &amp;amp;&amp;amp; &lt;span class="error"&gt;$&lt;/span&gt;n
&lt;span class="keyword"&gt;FOR&lt;/span&gt; &lt;span class="type"&gt;KEY&lt;/span&gt; SHARE &lt;span class="keyword"&gt;OF&lt;/span&gt; x
) x1
&lt;span class="keyword"&gt;HAVING&lt;/span&gt; &lt;span class="error"&gt;$&lt;/span&gt;n &amp;lt;&lt;span class="error"&gt;@&lt;/span&gt; range_agg(x1.r)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is very similar to non-temporal checks. The main difference is we use &lt;code&gt;range_agg&lt;/code&gt; to aggregate referenced records, since it may require their combination to satisfy the reference. For example if the employee got a raise in the middle of the position, neither employee record alone covers the position’s valid time:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/employee_position_fk.png" alt="Temporal foreign key"&gt;&lt;/p&gt;

&lt;p&gt;In our query, the &lt;code&gt;HAVING&lt;/code&gt; checks that the “sum” of the employee times covers the position time.&lt;/p&gt;
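
&lt;p&gt;To make that concrete, here is a small self-contained sketch you can run on stock Postgres 14 or later, no temporal patches required. The dates and the inline &lt;code&gt;VALUES&lt;/code&gt; list are invented for illustration: they stand in for two employee records split by a raise on November 1, and a position that spans both.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-sql"&gt;-- Two made-up employee records for the same id, split by a raise.
-- Neither covers the position alone, but their aggregate does.
SELECT 1
FROM (
  VALUES ('[2020-01-01,2020-11-01)'::daterange),
         ('[2020-11-01,2021-01-01)'::daterange)
) AS x1(r)
WHERE  x1.r &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange
HAVING '[2020-10-10,2020-12-12)'::daterange &amp;lt;@ range_agg(x1.r);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This returns one row, meaning the reference is satisfied. If you delete either &lt;code&gt;VALUES&lt;/code&gt; entry, the &lt;code&gt;HAVING&lt;/code&gt; fails and you get zero rows, which is how the foreign key check detects a violation.&lt;/p&gt;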

&lt;p&gt;A subquery is not logically required, but Postgres currently doesn’t allow &lt;code&gt;FOR KEY SHARE&lt;/code&gt; in a query with aggregations.&lt;/p&gt;

&lt;p&gt;I like this query because it works not just for rangetypes, but multiranges too. In fact we could easily support arbitrary types, as long as the user provides an opclass with an appropriate support function (similar to the &lt;code&gt;stratnum&lt;/code&gt; support function introduced for temporal primary keys). We would call that function in place of &lt;code&gt;range_agg&lt;/code&gt;. But how does it perform?&lt;/p&gt;

&lt;h3 id=""&gt;&lt;code&gt;EXISTS&lt;/code&gt; implementation&lt;/h3&gt;

&lt;p&gt;I compared this query with two others. The original implementation for temporal foreign keys appears on pages 128–129 of &lt;a href="https://www2.cs.arizona.edu/~rts/tdbbook.pdf"&gt;&lt;em&gt;Developing Time-Oriented Database Applications in SQL&lt;/em&gt; by Richard Snodgrass&lt;/a&gt;. I call this the “&lt;code&gt;EXISTS&lt;/code&gt; implementation”. Here is the SQL I used:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-sql"&gt;&lt;span class="class"&gt;SELECT&lt;/span&gt; &lt;span class="integer"&gt;1&lt;/span&gt;
&lt;span class="comment"&gt;-- There was a PK when the FK started:&lt;/span&gt;
&lt;span class="keyword"&gt;WHERE&lt;/span&gt; &lt;span class="keyword"&gt;EXISTS&lt;/span&gt; (
  &lt;span class="class"&gt;SELECT&lt;/span&gt;  &lt;span class="integer"&gt;1&lt;/span&gt;
  &lt;span class="keyword"&gt;FROM&lt;/span&gt;    [ONLY] &amp;lt;pktable&amp;gt;
  &lt;span class="keyword"&gt;WHERE&lt;/span&gt;   pkatt1 = &lt;span class="error"&gt;$&lt;/span&gt;&lt;span class="integer"&gt;1&lt;/span&gt; [&lt;span class="keyword"&gt;AND&lt;/span&gt; ...]
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(lower(pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
       &amp;lt;= COALESCE(lower(&lt;span class="error"&gt;$&lt;/span&gt;n), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(lower(&lt;span class="error"&gt;$&lt;/span&gt;n), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
       &amp;lt;  COALESCE(upper(pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
)
&lt;span class="comment"&gt;-- There was a PK when the FK ended:&lt;/span&gt;
&lt;span class="keyword"&gt;AND&lt;/span&gt; &lt;span class="keyword"&gt;EXISTS&lt;/span&gt; (
  &lt;span class="class"&gt;SELECT&lt;/span&gt;  &lt;span class="integer"&gt;1&lt;/span&gt;
  &lt;span class="keyword"&gt;FROM&lt;/span&gt;    [ONLY] &amp;lt;pktable&amp;gt;
  &lt;span class="keyword"&gt;WHERE&lt;/span&gt;   pkatt1 = &lt;span class="error"&gt;$&lt;/span&gt;&lt;span class="integer"&gt;1&lt;/span&gt; [&lt;span class="keyword"&gt;AND&lt;/span&gt; ...]
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(lower(pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
       &amp;lt;  COALESCE(upper(&lt;span class="error"&gt;$&lt;/span&gt;n), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(upper(&lt;span class="error"&gt;$&lt;/span&gt;n), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
       &amp;lt;= COALESCE(upper(pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
)
&lt;span class="comment"&gt;-- There are no gaps in the PK:&lt;/span&gt;
&lt;span class="comment"&gt;-- (i.e. there is no PK that ends early,&lt;/span&gt;
&lt;span class="comment"&gt;-- unless a matching PK record starts right away)&lt;/span&gt;
&lt;span class="keyword"&gt;AND&lt;/span&gt; &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="keyword"&gt;EXISTS&lt;/span&gt; (
  &lt;span class="class"&gt;SELECT&lt;/span&gt;  &lt;span class="integer"&gt;1&lt;/span&gt;
  &lt;span class="keyword"&gt;FROM&lt;/span&gt;    [ONLY] &amp;lt;pktable&amp;gt; &lt;span class="keyword"&gt;AS&lt;/span&gt; pk1
  &lt;span class="keyword"&gt;WHERE&lt;/span&gt;   pkatt1 = &lt;span class="error"&gt;$&lt;/span&gt;&lt;span class="integer"&gt;1&lt;/span&gt; [&lt;span class="keyword"&gt;AND&lt;/span&gt; ...]
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(lower(&lt;span class="error"&gt;$&lt;/span&gt;n), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
       &amp;lt;  COALESCE(upper(pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(upper(pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
       &amp;lt;  COALESCE(upper(&lt;span class="error"&gt;$&lt;/span&gt;n), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
  &lt;span class="keyword"&gt;AND&lt;/span&gt;     &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="keyword"&gt;EXISTS&lt;/span&gt; (
    &lt;span class="class"&gt;SELECT&lt;/span&gt;  &lt;span class="integer"&gt;1&lt;/span&gt;
    &lt;span class="keyword"&gt;FROM&lt;/span&gt;    [ONLY] &amp;lt;pktable&amp;gt; &lt;span class="keyword"&gt;AS&lt;/span&gt; pk2
    &lt;span class="keyword"&gt;WHERE&lt;/span&gt;   pk1.pkatt1 = pk2.pkatt1 [&lt;span class="keyword"&gt;AND&lt;/span&gt; ...]
            &lt;span class="comment"&gt;-- but skip pk1.pkperiodatt &amp;amp;&amp;amp; pk2.pkperiodatt&lt;/span&gt;
    &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(lower(pk2.pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
         &amp;lt;= COALESCE(upper(pk1.pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    &lt;span class="keyword"&gt;AND&lt;/span&gt;     COALESCE(upper(pk1.pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
         &amp;lt;  COALESCE(upper(pk2.pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
  )
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The main idea here is that we check three things: (1) the referencing row is covered in the beginning, (2) it is covered in the end, (3) in between, the referenced row(s) have no gaps.&lt;/p&gt;

&lt;p&gt;I made a few changes to the original:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can’t be a &lt;code&gt;CHECK&lt;/code&gt; constraint, since it references other rows.&lt;/li&gt;

&lt;li&gt;There is less nesting. The original is wrapped in a big &lt;code&gt;NOT EXISTS&lt;/code&gt; and looks for bad rows. Essentially it says “there are no invalid records.” In Postgres we check one referencing row at a time, and we give a result if it is valid. You could say we look for good rows. This also requires inverting the middle-layer &lt;code&gt;EXISTS&lt;/code&gt; and &lt;code&gt;NOT EXISTS&lt;/code&gt; predicates, and changing &lt;code&gt;OR&lt;/code&gt;s to &lt;code&gt;AND&lt;/code&gt;s. I’ve &lt;a href="https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/"&gt;often run into trouble with &lt;code&gt;OR&lt;/code&gt;&lt;/a&gt;, so this is probably fortunate.&lt;/li&gt;

&lt;li&gt;We have to “unwrap” the start/end times since they are stored in a rangetype. I could have used rangetype operators here, but I wanted to keep the adaptation as straightforward as possible, and the previous changes felt like a lot already. Unwrapping requires dealing with unbounded ranges, so I’m using plus/minus &lt;code&gt;Infinity&lt;/code&gt; as a sentinel. This is not perfectly accurate, since in ranges a null bound is “further out” than a plus/minus &lt;code&gt;Infinity&lt;/code&gt;. (Try &lt;code&gt;select '{(,)}'::datemultirange - '{(-Infinity,Infinity)}'::datemultirange&lt;/code&gt;.) But again, solving that was taking me too far from the original, and it’s fine for a benchmark.&lt;/li&gt;

&lt;li&gt;We need to lock the rows with &lt;code&gt;FOR KEY SHARE&lt;/code&gt; in the same way as above. We need to do this in each branch, since they may use different rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given the complexity, I didn’t expect this query to perform very well.&lt;/p&gt;

&lt;h3 id="_implementation_2"&gt;
&lt;code&gt;lag&lt;/code&gt; implementation&lt;/h3&gt;

&lt;p&gt;Finally there is an implementation in &lt;a href="https://github.com/xocolatl/periods"&gt;Vik Fearing’s &lt;code&gt;periods&lt;/code&gt; extension&lt;/a&gt;. This is a lot like the &lt;code&gt;EXISTS&lt;/code&gt; implementation, except to check for gaps it uses the &lt;code&gt;lag&lt;/code&gt; window function. Here is the SQL I tested:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-sql"&gt;&lt;span class="class"&gt;SELECT&lt;/span&gt;  &lt;span class="integer"&gt;1&lt;/span&gt;
&lt;span class="keyword"&gt;FROM&lt;/span&gt;    (
  &lt;span class="class"&gt;SELECT&lt;/span&gt;  uk.uk_start_value,
          uk.uk_end_value,
          NULLIF(LAG(uk.uk_end_value) &lt;span class="keyword"&gt;OVER&lt;/span&gt;
            (&lt;span class="keyword"&gt;ORDER&lt;/span&gt; &lt;span class="keyword"&gt;BY&lt;/span&gt; uk.uk_start_value), uk.uk_start_value) &lt;span class="keyword"&gt;AS&lt;/span&gt; x
  &lt;span class="keyword"&gt;FROM&lt;/span&gt;   (
    &lt;span class="class"&gt;SELECT&lt;/span&gt;  coalesce(lower(x.pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;-Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;) &lt;span class="keyword"&gt;AS&lt;/span&gt; uk_start_value,
            coalesce(upper(x.pkperiodatt), &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;Infinity&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;) &lt;span class="keyword"&gt;AS&lt;/span&gt; uk_end_value
    &lt;span class="keyword"&gt;FROM&lt;/span&gt;    pktable &lt;span class="keyword"&gt;AS&lt;/span&gt; x
    &lt;span class="keyword"&gt;WHERE&lt;/span&gt;   pkatt1 = &lt;span class="error"&gt;$&lt;/span&gt;&lt;span class="integer"&gt;1&lt;/span&gt; [&lt;span class="keyword"&gt;AND&lt;/span&gt; ...]
    &lt;span class="keyword"&gt;AND&lt;/span&gt;     x.pkperiodatt &amp;amp;&amp;amp; &lt;span class="error"&gt;$&lt;/span&gt;n
    &lt;span class="keyword"&gt;FOR&lt;/span&gt; &lt;span class="type"&gt;KEY&lt;/span&gt; SHARE &lt;span class="keyword"&gt;OF&lt;/span&gt; x
  ) &lt;span class="keyword"&gt;AS&lt;/span&gt; uk
) &lt;span class="keyword"&gt;AS&lt;/span&gt; uk
&lt;span class="keyword"&gt;WHERE&lt;/span&gt;   uk.uk_start_value &amp;lt; upper(&lt;span class="error"&gt;$&lt;/span&gt;n)
&lt;span class="keyword"&gt;AND&lt;/span&gt;     uk.uk_end_value &amp;gt;= lower(&lt;span class="error"&gt;$&lt;/span&gt;n)
&lt;span class="keyword"&gt;HAVING&lt;/span&gt;  &lt;span class="predefined"&gt;MIN&lt;/span&gt;(uk.uk_start_value) &amp;lt;= lower(&lt;span class="error"&gt;$&lt;/span&gt;n)
&lt;span class="keyword"&gt;AND&lt;/span&gt;     &lt;span class="predefined"&gt;MAX&lt;/span&gt;(uk.uk_end_value) &amp;gt;= upper(&lt;span class="error"&gt;$&lt;/span&gt;n)
&lt;span class="keyword"&gt;AND&lt;/span&gt;     array_agg(uk.x) FILTER (&lt;span class="keyword"&gt;WHERE&lt;/span&gt; uk.x &lt;span class="keyword"&gt;IS&lt;/span&gt; &lt;span class="keyword"&gt;NOT&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;) &lt;span class="keyword"&gt;IS&lt;/span&gt; &lt;span class="predefined-constant"&gt;NULL&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Again I had to make some adaptations to &lt;a href="https://github.com/xocolatl/periods/blob/328c1aaac731f44958b725fb02ca75186f501ce7/periods--1.2.sql#L2230-L2252"&gt;the original&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is less nesting, for similar reasons as before.&lt;/li&gt;

&lt;li&gt;We unwrap the ranges, much like the &lt;code&gt;EXISTS&lt;/code&gt; version. Again there is an &lt;code&gt;Infinity&lt;/code&gt;-vs-null discrepancy, but it is harder to deal with since the query uses null entries in the &lt;code&gt;lag&lt;/code&gt; result to indicate gaps.&lt;/li&gt;

&lt;li&gt;I couldn’t resist using &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; instead of &lt;code&gt;&amp;lt;=&lt;/code&gt; and &lt;code&gt;&amp;gt;=&lt;/code&gt; in the most-nested part to find relevant rows. The change was sufficiently obvious, and if it makes a difference it should speed things up, so it makes the comparison a bit more fair.&lt;/li&gt;
&lt;/ul&gt;
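
&lt;p&gt;If the &lt;code&gt;NULLIF(LAG(...))&lt;/code&gt; trick is hard to picture, here is a tiny standalone sketch of just the gap-detection part. The three date pairs are made up: the first two are adjacent, and there is a one-month gap before the third.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-sql"&gt;-- For each record, compare the previous record's end to this record's start.
-- NULLIF gives NULL when they match (no gap) and the previous end otherwise.
SELECT uk_start_value,
       uk_end_value,
       NULLIF(LAG(uk_end_value) OVER (ORDER BY uk_start_value),
              uk_start_value) AS x
FROM (
  VALUES (date '2020-01-01', date '2020-06-01'),
         (date '2020-06-01', date '2020-09-01'),
         (date '2020-10-01', date '2021-01-01')
) AS uk(uk_start_value, uk_end_value);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first row has no predecessor, so &lt;code&gt;x&lt;/code&gt; is null. The second row starts exactly where the first ends, so &lt;code&gt;x&lt;/code&gt; is null again. The third row gets &lt;code&gt;2020-09-01&lt;/code&gt;, a non-null value that marks the gap. The full query then uses &lt;code&gt;array_agg(...) FILTER (WHERE uk.x IS NOT NULL) IS NULL&lt;/code&gt; to require that no such markers exist.&lt;/p&gt;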

&lt;p&gt;I made a new branch rooted in my &lt;a href="https://github.com/pjungwir/postgresql/tree/valid-time"&gt;valid-time branch&lt;/a&gt;, and added &lt;a href="https://github.com/pjungwir/postgresql/tree/temporal-fk-comparison"&gt;an extra commit&lt;/a&gt; to switch between each implementation with a compile-time flag. By default we still use &lt;code&gt;range_agg&lt;/code&gt;, but instead you can say &lt;code&gt;‑DRI_TEMPORAL_IMPL_LAG&lt;/code&gt; or &lt;code&gt;‑DRI_TEMPORAL_IMPL_EXISTS&lt;/code&gt;. I installed each implementation in a separate cluster, listening on ports 5460, 5461, and 5462 respectively.&lt;/p&gt;

&lt;p&gt;I also included procedures in Benchbase to simply run the above queries as &lt;code&gt;SELECT&lt;/code&gt;s. Since we are doing quite focused microbenchmarking here, I thought that would be less noisy than doing the same DML for each implementation. It also means we can run a mix of all three implementations together: they use the same cluster, and if there is any noise on the machine it affects them all. If you look at my temporal benchmark code, you’ll see the same SQL but adapted for the &lt;code&gt;employees&lt;/code&gt;/&lt;code&gt;positions&lt;/code&gt; tables.&lt;/p&gt;

&lt;h2 id="hypothesis"&gt;Hypothesis&lt;/h2&gt;

&lt;p&gt;Here is the query plan for the &lt;code&gt;range_agg&lt;/code&gt; implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Aggregate
  Filter: ('[2020-10-10,2020-12-12)'::daterange &amp;lt;@ range_agg(x1.r))
  -&amp;gt;  Subquery Scan on x1
    -&amp;gt;  LockRows
      -&amp;gt;  Index Scan using employees_pkey on employees x
        Index Cond: ((id = 500) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It uses the index, and it all seems like what we’d want. It is not an &lt;code&gt;Index Only Scan&lt;/code&gt;, but that’s because we lock the rows. Non-temporal foreign keys are the same way. This should perform pretty well.&lt;/p&gt;

&lt;p&gt;Here is the query plan for the &lt;code&gt;EXISTS&lt;/code&gt; implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Result
  One-Time Filter: ((InitPlan 1).col1 AND (InitPlan 2).col1 AND (NOT (InitPlan 4).col1))
  InitPlan 1
  -&amp;gt;  LockRows
    -&amp;gt;  Index Scan using employees_pkey on employees x
      Index Cond: ((id = 500) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) &amp;lt;= '2020-10-10'::date) AND ('2020-10-10'::date &amp;lt; COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 2
  -&amp;gt;  LockRows
    -&amp;gt;  Index Scan using employees_pkey on employees x_1
      Index Cond: ((id = 500) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) &amp;lt; '2020-12-12'::date) AND ('2020-12-12'::date &amp;lt;= COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 4
  -&amp;gt;  LockRows
    -&amp;gt;  Index Scan using employees_pkey on employees pk1
      Index Cond: ((id = 500) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))
      Filter: (('2020-10-10'::date &amp;lt; COALESCE(upper(valid_at), 'infinity'::date)) AND (COALESCE(upper(valid_at), 'infinity'::date) &amp;lt; '2020-12-12'::date) AND (NOT EXISTS(SubPlan 3)))
      SubPlan 3
      -&amp;gt;  LockRows
        -&amp;gt;  Index Scan using employees_pkey on employees pk2
          Index Cond: (id = pk1.id)
          Filter: ((COALESCE(lower(valid_at), '-infinity'::date) &amp;lt;= COALESCE(upper(pk1.valid_at), 'infinity'::date)) AND (COALESCE(upper(pk1.valid_at), 'infinity'::date) &amp;lt; COALESCE(upper(valid_at), 'infinity'::date)))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That looks like a lot of work!&lt;/p&gt;

&lt;p&gt;And here is the plan for the &lt;code&gt;lag&lt;/code&gt; implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Aggregate
  Filter: ((array_agg(uk.x) FILTER (WHERE (uk.x IS NOT NULL)) IS NULL) AND (min(uk.uk_start_value) &amp;lt;= '2020-10-10'::date) AND (max(uk.uk_end_value) &amp;gt;= '2020-12-12'::date))
  -&amp;gt;  Subquery Scan on uk
    Filter: ((uk.uk_start_value &amp;lt; '2020-12-12'::date) AND (uk.uk_end_value &amp;gt;= '2020-10-10'::date))
    -&amp;gt;  WindowAgg
      -&amp;gt;  Sort
        Sort Key: uk_1.uk_start_value
        -&amp;gt;  Subquery Scan on uk_1
          -&amp;gt;  LockRows
            -&amp;gt;  Index Scan using employees_pkey on employees x
              Index Cond: ((id = 500) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This looks a lot like the &lt;code&gt;range_agg&lt;/code&gt; version. We still use our index. There is an extra &lt;code&gt;Sort&lt;/code&gt; step, but internally the &lt;code&gt;range_agg&lt;/code&gt; function must do much the same thing (if not something worse). Maybe the biggest difference (though a slight one) is aggregating twice.&lt;/p&gt;

&lt;p&gt;So I expect &lt;code&gt;range_agg&lt;/code&gt; to perform the best, with &lt;code&gt;lag&lt;/code&gt; a close second, and &lt;code&gt;EXISTS&lt;/code&gt; far behind.&lt;/p&gt;

&lt;p&gt;One exception may be a single referencing row that spans many referenced rows. If &lt;code&gt;range_agg&lt;/code&gt; is O(n&lt;sup&gt;2&lt;/sup&gt;), it should fall behind as the referenced rows increase.&lt;/p&gt;
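
&lt;p&gt;A quick, informal way to probe that worry on stock Postgres 14 or later is to aggregate a long run of adjacent one-day ranges and see how the time grows with the row count. This is only a sketch: the start date and the row count of 10,000 are arbitrary choices, and it exercises &lt;code&gt;range_agg&lt;/code&gt; alone, not the full foreign key check.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-sql"&gt;-- Build 10,000 adjacent one-day ranges and check that together
-- they cover the whole span. Turn on \timing in psql and try
-- different row counts to see how the cost scales.
SELECT daterange('2000-01-01'::date, '2000-01-01'::date + 10000)
       &amp;lt;@ range_agg(daterange('2000-01-01'::date + i,
                              '2000-01-01'::date + i + 1)) AS covered
FROM   generate_series(0, 9999) AS i;&lt;/code&gt;&lt;/pre&gt;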

&lt;h2 id="results"&gt;Results&lt;/h2&gt;

&lt;p&gt;I started by running a quick test on my laptop, an M2 MacBook Air with 16 GB of RAM. I tested the DML commands on each cluster, one after another. Then I checked the Benchbase summary file for the throughput. The results were what I expected:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/throughput-comparison-2024-07-28.png" alt="early results"&gt;&lt;/p&gt;

&lt;p&gt;Similarly, &lt;code&gt;range_agg&lt;/code&gt; had the best latency at the 25th, 50th, 75th, 90th, and 99th percentiles:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/latency-comparison-2024-07-28.png" alt="latency comparison"&gt;&lt;/p&gt;

&lt;p&gt;But the difference throughout is pretty small, and at the time my Benchbase procedures used a lot of &lt;code&gt;synchronized&lt;/code&gt; blocks to ensure there were few foreign key failures, and that kind of locking seemed like it might throw off the results. I needed to do more than this casual check.&lt;/p&gt;

&lt;p&gt;I ran all the later benchmarks on my personal desktop, running Ubuntu 22.04.&lt;/p&gt;
&lt;!--
TODO: fill in these details
cpu
ram
nvme
linux version
postgres 18devel
--&gt;
&lt;p&gt;It was hard to make things reproducible, but I wrote various scripts as I went, and I tried to capture results. The repo for all that is &lt;a href="https://github.com/pjungwir/benchmarking-temporal-tables"&gt;here&lt;/a&gt;. My pdxpug talk above contains some reflections about improving my benchmark methodology.&lt;/p&gt;

&lt;p&gt;I also removed the &lt;code&gt;synchronized&lt;/code&gt; blocks and dealt with foreign key failures a better way (by categorizing them as errors but not raising an exception).&lt;/p&gt;

&lt;p&gt;The first more careful tests used the direct &lt;code&gt;SELECT&lt;/code&gt; statements.&lt;/p&gt;

&lt;p&gt;Again, the 95th percentile latency was what I expected:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/95th-latency-comparison-half-invalid.png" alt="95% latency comparison"&gt;&lt;/p&gt;

&lt;p&gt;But the winner for mean latency was &lt;code&gt;EXISTS&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/mean-latency-comparison-half-invalid.png" alt="mean latency comparison"&gt;&lt;/p&gt;

&lt;p&gt;A clue was in the Benchbase output showing successful transactions vs errors. (The &lt;code&gt;Noop&lt;/code&gt; procedure is there so I can make the proportions 33/33/33/1 instead of 33/33/34.):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Completed Transactions:
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyRangeAgg/01      [72064] ********************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyLag/02           [71479] *******************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyExists/03        [71529] *******************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.Noop/04                         [ 4585] *****
Aborted Transactions:
&amp;lt;EMPTY&amp;gt;

Rejected Transactions (Server Retry):
&amp;lt;EMPTY&amp;gt;

Rejected Transactions (Retry Different):
&amp;lt;EMPTY&amp;gt;

Unexpected SQL Errors:
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyRangeAgg/01      [80861] ********************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyLag/02           [80764] *******************************************************************************
com.oltpbenchmark.benchmarks.temporal.procedures.CheckForeignKeyExists/03        [80478] *******************************************************************************&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;More than half of the transactions failed because of an invalid reference.&lt;/p&gt;

&lt;p&gt;And if we put one of those into &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;, we see that most of the plan was never executed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Result (actual time=0.034..0.035 rows=0 loops=1)
  One-Time Filter: ((InitPlan 1).col1 AND (InitPlan 2).col1 AND (NOT (InitPlan 4).col1))
  InitPlan 1
  -&amp;gt;  LockRows (actual time=0.033..0.033 rows=0 loops=1)
    -&amp;gt;  Index Scan using employees_pkey on employees x (actual time=0.033..0.033 rows=0 loops=1)
      Index Cond: ((id = 5999) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) &amp;lt;= '2020-10-10'::date) AND ('2020-10-10'::date &amp;lt; COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 2
  -&amp;gt;  LockRows (never executed)
    -&amp;gt;  Index Scan using employees_pkey on employees x_1 (never executed)
      Index Cond: ((id = 5999) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))
      Filter: ((COALESCE(lower(valid_at), '-infinity'::date) &amp;lt; '2020-12-12'::date) AND ('2020-12-12'::date &amp;lt;= COALESCE(upper(valid_at), 'infinity'::date)))
  InitPlan 4
  -&amp;gt;  LockRows (never executed)
    -&amp;gt;  Index Scan using employees_pkey on employees pk1 (never executed)
      Index Cond: ((id = 5999) AND (valid_at &amp;amp;&amp;amp; '[2020-10-10,2020-12-12)'::daterange))
      Filter: (('2020-10-10'::date &amp;lt; COALESCE(upper(valid_at), 'infinity'::date)) AND (COALESCE(upper(valid_at), 'infinity'::date) &amp;lt; '2020-12-12'::date) AND (NOT EXISTS(SubPlan 3)))
      SubPlan 3
      -&amp;gt;  LockRows (never executed)
        -&amp;gt;  Index Scan using employees_pkey on employees pk2 (never executed)
          Index Cond: (id = pk1.id)
          Filter: ((COALESCE(lower(valid_at), '-infinity'::date) &amp;lt;= COALESCE(upper(pk1.valid_at), 'infinity'::date)) AND (COALESCE(upper(pk1.valid_at), 'infinity'::date) &amp;lt; COALESCE(upper(valid_at), 'infinity'::date)))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this example, the beginning of the referencing range wasn’t covered, so Postgres never had to check the rest. Essentially the query is &lt;code&gt;a AND b AND c&lt;/code&gt;, so Postgres can short-circuit the evaluation as soon as it finds &lt;code&gt;a&lt;/code&gt; to be false. Using &lt;code&gt;range_agg&lt;/code&gt; or &lt;code&gt;lag&lt;/code&gt; doesn’t allow this, because an aggregate/window function has to run to completion to get a result.&lt;/p&gt;
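
&lt;p&gt;To make the short-circuit argument concrete, here is a toy Python sketch. It is not Postgres code, and the function names are made up for illustration. It counts how many checks run when the first condition fails, compared with an aggregate-style check that has to visit every input row before it can answer.&lt;/p&gt;

```python
# Toy model only: each "check" stands in for one InitPlan in the EXISTS query.
calls = []

def check(name, result):
    """Record that this check ran, then return its result."""
    calls.append(name)
    return result

def exists_style(start_covered, end_covered, has_gap):
    # Python's `and` stops at the first false operand,
    # much like the One-Time Filter skipping InitPlans 2 and 4.
    return (check("start", start_covered)
            and check("end", end_covered)
            and not check("gap", has_gap))

def aggregate_style(rows):
    # An aggregate must consume every row before it can produce a result.
    seen = [check("row", row) for row in rows]
    return all(seen)

calls.clear()
exists_style(False, True, False)
exists_calls = list(calls)      # only the first check ran

calls.clear()
aggregate_style([False, True, True])
aggregate_calls = list(calls)   # all three rows were visited

print(exists_calls, aggregate_calls)
```

&lt;p&gt;The first call records a single check, while the aggregate-style call records one per row, which is the same shape as the "never executed" nodes in the plan above.&lt;/p&gt;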

&lt;p&gt;As confirmation (a bit gratuitous to be honest), I ran the &lt;code&gt;EXISTS&lt;/code&gt; benchmark with this bpftrace script:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Count how many exec nodes per query were required,
// and print a histogram of how often each count happens.
// Run this for each FK implementation separately.
// My hypothesis is that the EXISTS implementation calls ExecProcNode far fewer times,
// but only if the FK is invalid.

u:/home/paul/local/bench-*/bin/postgres:standard_ExecutorStart {
  @nodes[tid] = 0
}
u:/home/paul/local/bench-*/bin/postgres:ExecProcNode {
  @nodes[tid] += 1
}
u:/home/paul/local/bench-*/bin/postgres:standard_ExecutorEnd {
  @calls = hist(@nodes[tid]);
  delete(@nodes[tid]);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For &lt;code&gt;EXISTS&lt;/code&gt; I got this histogram when there were no invalid references:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@calls:
[0]                    6 |                                                    |
[1]                    0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)            228851 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)                1 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               2 |                                                    |
[64, 128)              2 |                                                    |
[128, 256)             2 |                                                    |
[256, 512)             5 |                                                    |&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But with 50%+ errors I got this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@calls:
[0]                    6 |                                                    |
[1]                    0 |                                                    |
[2, 4)            218294 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)            183438 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[8, 16)              231 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               2 |                                                    |
[64, 128)              2 |                                                    |
[128, 256)             2 |                                                    |
[256, 512)             5 |                                                    |&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So more than half the time, Postgres ran the query with half the steps (maybe one-fourth).&lt;/p&gt;

&lt;p&gt;After tuning the random numbers to bring errors closer to 1%, I got results more like the original ones. Mean latency:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/mean-latency-comparison-mostly-valid.png" alt="mostly valid mean latency comparison"&gt;&lt;/p&gt;

&lt;p&gt;Median latency:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/median-latency-comparison-mostly-valid.png" alt="mostly valid median latency comparison"&gt;&lt;/p&gt;

&lt;p&gt;95th percentile latency:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-09/95th-latency-comparison-mostly-valid.png" alt="mostly valid 95% latency comparison"&gt;&lt;/p&gt;

&lt;h2 id="conclusions"&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;All three foreign key implementations have the query plans I expected. We use indexes where we should, etc.&lt;/p&gt;

&lt;p&gt;When most foreign key references are valid, &lt;code&gt;range_agg&lt;/code&gt; outperforms the other two implementations by a small but consistent amount. But with a large number of invalid references, &lt;code&gt;EXISTS&lt;/code&gt; is a lot faster.&lt;/p&gt;

&lt;p&gt;In most applications I’ve seen, foreign keys are used as guardrails, and we expect 99% of checks to pass (or more really). When using &lt;code&gt;ON DELETE CASCADE&lt;/code&gt; the situation is different, but these benchmarks are for &lt;code&gt;NO ACTION&lt;/code&gt; or &lt;code&gt;RESTRICT&lt;/code&gt;, and I don’t think &lt;code&gt;CASCADE&lt;/code&gt; affords the &lt;code&gt;EXISTS&lt;/code&gt; implementation the same shortcuts. So it seems right to optimize for the mostly-valid case, not the more-than-half-invalid case.&lt;/p&gt;

&lt;p&gt;These results are good news, because &lt;code&gt;range_agg&lt;/code&gt; is also more general: it supports multiranges and custom types.&lt;/p&gt;

&lt;h2 id="further_work"&gt;Further Work&lt;/h2&gt;

&lt;p&gt;There are more things I’d like to benchmark (and if I do I’ll update this post):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace separate start/end comparisons with range operators in the &lt;code&gt;EXISTS&lt;/code&gt; and &lt;code&gt;lag&lt;/code&gt; implementations. I just need to make sure they still pass all the tests when I do that.&lt;/li&gt;

&lt;li&gt;Correct the &lt;code&gt;Infinity&lt;/code&gt;-vs-null discrepancy.&lt;/li&gt;

&lt;li&gt;Monitor the CPU and disk activity under each implementation and compare the results. I don’t think I’ll see any difference in disk, but CPU might be interesting.&lt;/li&gt;

&lt;li&gt;Compare different scale factors (i.e. starting number of employees/positions).&lt;/li&gt;

&lt;li&gt;Compare implementations when an employee is chopped into many small records, and a single position spans all of them. If &lt;code&gt;range_agg&lt;/code&gt; is O(n&lt;sup&gt;2&lt;/sup&gt;) that should be worse than the sorting in the other options.&lt;/li&gt;

&lt;li&gt;Compare temporal foreign keys to non-temporal foreign keys (based on B-tree indexes, not GiST). I’m not sure yet how to do this in a meaningful way. Of course b-trees are faster in general, but how do I use them to achieve the same primary key and foreign key constraints? Maybe the best way is to create the tables without constraints, give them only b-tree indexes, and run the direct &lt;code&gt;SELECT&lt;/code&gt; statements, not the DML.&lt;/li&gt;
&lt;/ul&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-08-26:/posts/2024/08/benchbase-documentation/</id>
    <title type="html">Benchbase Documentation</title>
    <published>2024-08-26T00:00:00Z</published>
    <updated>2024-08-26T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/08/benchbase-documentation/" type="text/html"/>
    <content type="html">
&lt;p&gt;&lt;a href="https://github.com/cmu-db/benchbase"&gt;Benchbase&lt;/a&gt; is a framework from Carnegie Mellon for benchmarking databases. It comes with support for about 20 benchmarks and about as many DBMSes.&lt;/p&gt;

&lt;p&gt;Benchbase started life as &lt;a href="https://github.com/oltpbenchmark/oltpbench"&gt;OLTPBench&lt;/a&gt; and was introduced in &lt;a href="http://www.vldb.org/pvldb/vol7/p277-difallah.pdf"&gt;an academic paper&lt;/a&gt; from 2014.&lt;/p&gt;

&lt;p&gt;Using Benchbase over the last month, I found the documentation to be pretty shallow, so this is my effort to improve things. A lot of this material was covered in &lt;a href="/posts/2024/08/benchbase-and-temporal-foreign-keys-pdxpug-talk/"&gt;my pdxpug talk&lt;/a&gt; last week.&lt;/p&gt;

&lt;h1 id="running"&gt;Running&lt;/h1&gt;

&lt;p&gt;Benchbase is written in Java and uses Maven to build and run.&lt;/p&gt;

&lt;p&gt;Following their README, first you build a tarball for your DBMS like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;./mvnw clean package -P postgres&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then you expand the tarball and run a benchmark like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;cd target
tar xvzf benchbase-postgres.tgz
cd benchbase-postgres
java -jar benchbase.jar -b tpcc -c config/postgres/sample_tpcc_config.xml --create=true --load=true --execute=true&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;-b&lt;/code&gt; option says which benchmark you want to run.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-c&lt;/code&gt; option points to a config file (covered &lt;a href="#configuration"&gt;below&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--create&lt;/code&gt; option doesn’t run &lt;code&gt;CREATE DATABASE&lt;/code&gt;, but creates the schema for the benchmark.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--load&lt;/code&gt; option fills the schema with its starting data. The time for this is not included in the benchmark results.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;--execute&lt;/code&gt; option actually runs the benchmark. I often ran &lt;code&gt;‑‑create=true ‑‑load=true ‑‑execute=false&lt;/code&gt; to populate a database named e.g. &lt;code&gt;benchbase_template&lt;/code&gt;, then &lt;code&gt;createdb -T benchbase_template benchbase&lt;/code&gt; to make a quick copy, then &lt;code&gt;‑‑create=false ‑‑load=false ‑‑execute=true&lt;/code&gt; to run the benchmark. That helps iteration time a lot when you have a big load. But for higher-quality results you should do it all in one go, after running &lt;code&gt;initdb&lt;/code&gt;, as Melanie Plageman points out in one of her talks. (Sorry, I haven’t been able to find the reference again, but if I do I’ll put a link here.)&lt;/p&gt;

&lt;p&gt;If you are writing Java code for your own benchmark, then this one-liner is a lot faster than all that tarball stuff:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;./mvnw clean compile exec:java -P postgres -Dexec.args="-b tpcc -c config/postgres/sample_tpcc_config.xml --create=true --load=true --execute=true"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course you can skip the &lt;code&gt;clean&lt;/code&gt; and &lt;code&gt;compile&lt;/code&gt; if you like.&lt;/p&gt;

&lt;p&gt;Unfortunately the &lt;code&gt;exec:java&lt;/code&gt; target has been broken since 2023, but I submitted &lt;a href="https://github.com/cmu-db/benchbase/pull/548"&gt;a pull request&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id="configuration"&gt;Configuration&lt;/h1&gt;

&lt;p&gt;The benchmark behavior is controlled by the XML config file. The most complete docs are in the original OLTPBench repo’s &lt;a href="https://github.com/oltpbenchmark/oltpbench/wiki#workload-descriptor"&gt;Github wiki&lt;/a&gt;, although if you read the paper you’ll learn many other things you can control with this file. You can also look at a &lt;a href="https://github.com/cmu-db/benchbase/blob/main/config/postgres/sample_tpcc_config.xml"&gt;sample config file&lt;/a&gt; for your benchmark + database.&lt;/p&gt;

&lt;p&gt;The file begins with connection details like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;type&amp;gt;&lt;/span&gt;POSTGRES&lt;span class="tag"&gt;&amp;lt;/type&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;driver&amp;gt;&lt;/span&gt;org.postgresql.Driver&lt;span class="tag"&gt;&amp;lt;/driver&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;url&amp;gt;&lt;/span&gt;jdbc:postgresql://localhost:5432/benchbase?sslmode=disable&lt;span class="entity"&gt;&amp;amp;amp;&lt;/span&gt;ApplicationName=tpcc&lt;span class="entity"&gt;&amp;amp;amp;&lt;/span&gt;reWriteBatchedInserts=true&lt;span class="tag"&gt;&amp;lt;/url&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;username&amp;gt;&lt;/span&gt;admin&lt;span class="tag"&gt;&amp;lt;/username&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;password&amp;gt;&lt;/span&gt;password&lt;span class="tag"&gt;&amp;lt;/password&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;isolation&amp;gt;&lt;/code&gt; element controls the transaction isolation level:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;isolation&amp;gt;&lt;/span&gt;TRANSACTION_SERIALIZABLE&lt;span class="tag"&gt;&amp;lt;/isolation&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can ask to reconnect after a connection failure:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;reconnectOnConnectionFailure&amp;gt;&lt;/span&gt;true&lt;span class="tag"&gt;&amp;lt;/reconnectOnConnectionFailure&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I haven’t investigated exactly how that is used.&lt;/p&gt;

&lt;p&gt;You can also open a new connection for every transaction:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;newConnectionPerTxn&amp;gt;&lt;/span&gt;true&lt;span class="tag"&gt;&amp;lt;/newConnectionPerTxn&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;By default that is false, but you may want to make it true if you are focusing on your database’s connection overhead.&lt;/p&gt;

&lt;h2 id="loading"&gt;Loading&lt;/h2&gt;

&lt;p&gt;Here are some elements that apply to the loading step (not the actual benchmark run):&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;scalefactor&amp;gt;&lt;/span&gt;1&lt;span class="tag"&gt;&amp;lt;/scalefactor&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;batchsize&amp;gt;&lt;/span&gt;128&lt;span class="tag"&gt;&amp;lt;/batchsize&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each benchmark interprets &lt;code&gt;scalefactor&lt;/code&gt; in its own way. For TPC-C this is the number of warehouses. For Twitter you get 500 users and 20,000 tweets, multiplied by the &lt;code&gt;scalefactor&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then &lt;code&gt;batchsize&lt;/code&gt; just tells the loader how to combine insert statements, for a quicker load.&lt;/p&gt;

&lt;h2 id="execution"&gt;Execution&lt;/h2&gt;

&lt;p&gt;You also list all the “procedures” the benchmark is capable of (or just the ones you care about):&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;transactiontypes&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;transactiontype&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;NewOrder&lt;span class="tag"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;/transactiontype&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;transactiontype&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Payment&lt;span class="tag"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;/transactiontype&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;transactiontype&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;OrderStatus&lt;span class="tag"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;/transactiontype&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;transactiontype&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;Delivery&lt;span class="tag"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;/transactiontype&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;transactiontype&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;name&amp;gt;&lt;/span&gt;StockLevel&lt;span class="tag"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;/transactiontype&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;/transactiontypes&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each procedure is defined in a Java file.&lt;/p&gt;

&lt;p&gt;Besides &lt;code&gt;&amp;lt;name&amp;gt;&lt;/code&gt;, you can also include &lt;code&gt;&amp;lt;preExecutionWait&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;postExecutionWait&amp;gt;&lt;/code&gt; to give a delay in milliseconds before/after running the transaction. So this is one way to add “think time”.&lt;/p&gt;

&lt;p&gt;There is also a concept of “supplemental” procedures, but that is not controlled by the config file. Only the SEATS and AuctionMark benchmarks use it. From quickly scanning the code, I think it lets a benchmark define procedures without depending on the user to list them. They won’t be added to the normal transaction queue, but the benchmark can run them elsewhere as needed. For example SEATS uses its supplemental procedure to find out which airports/flights/etc were added in the load step, so it can use them.&lt;/p&gt;

&lt;p&gt;The top-level &lt;code&gt;&amp;lt;terminals&amp;gt;&lt;/code&gt; element controls the concurrency. This is how many simultaneous connections you want:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;terminals&amp;gt;&lt;/span&gt;1&lt;span class="tag"&gt;&amp;lt;/terminals&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But the real behavior comes from the &lt;code&gt;&amp;lt;works&amp;gt;&lt;/code&gt; element. This contains &lt;code&gt;&amp;lt;work&amp;gt;&lt;/code&gt; child elements, each one a “phase” of your benchmark. For example:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-xml"&gt;&lt;span class="tag"&gt;&amp;lt;works&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;work&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;time&amp;gt;&lt;/span&gt;60&lt;span class="tag"&gt;&amp;lt;/time&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;rate&amp;gt;&lt;/span&gt;10000&lt;span class="tag"&gt;&amp;lt;/rate&amp;gt;&lt;/span&gt;
        &lt;span class="tag"&gt;&amp;lt;weights&amp;gt;&lt;/span&gt;45,43,4,4,4&lt;span class="tag"&gt;&amp;lt;/weights&amp;gt;&lt;/span&gt;
    &lt;span class="tag"&gt;&amp;lt;/work&amp;gt;&lt;/span&gt;
&lt;span class="tag"&gt;&amp;lt;/works&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here we have one phase lasting 60 seconds.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;&amp;lt;weights&amp;gt;&lt;/code&gt; refer to the &lt;code&gt;&amp;lt;transactiontypes&amp;gt;&lt;/code&gt; above, in the same order. Each weight is a percentage giving the share of that procedure in the total transactions. They must add up to 100%.&lt;/p&gt;
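
&lt;p&gt;As a rough illustration of what the weights mean, here is a small Python sketch that validates them and picks procedures in proportion. This is not how Benchbase itself does it (Benchbase is Java, and I haven't copied its logic); the helper names &lt;code&gt;validate&lt;/code&gt; and &lt;code&gt;pick_procedure&lt;/code&gt; are hypothetical, and only the procedure names and weights come from the sample config.&lt;/p&gt;

```python
import random

# Procedure names and weights copied from the sample TPC-C config above.
procedures = ["NewOrder", "Payment", "OrderStatus", "Delivery", "StockLevel"]
weights = [45, 43, 4, 4, 4]

def validate(weights, procedures):
    """Check there is one weight per procedure and the total is exactly 100."""
    if len(weights) != len(procedures):
        raise ValueError("need one weight per transaction type")
    if sum(weights) != 100:
        raise ValueError("weights must add up to 100, got %d" % sum(weights))

def pick_procedure(rng):
    """Choose the next procedure at random, in proportion to its weight."""
    return rng.choices(procedures, weights=weights, k=1)[0]

validate(weights, procedures)
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = [pick_procedure(rng) for _ in range(10000)]
share = sample.count("NewOrder") / len(sample)
print("NewOrder share: %.2f" % share)
```

&lt;p&gt;Over many picks, the share of each procedure should settle near its weight, so &lt;code&gt;NewOrder&lt;/code&gt; ends up as roughly 45% of the sample.&lt;/p&gt;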

&lt;p&gt;The &lt;code&gt;&amp;lt;rate&amp;gt;&lt;/code&gt; gives the targeted transactions per second (per terminal). Mostly this is a way to slow things down, not to speed things up: it is another way to include “think time” in between transactions. If your run doesn’t achieve this rate, it’s not an error.&lt;/p&gt;

&lt;p&gt;Each phase can override the top-level concurrency with &lt;span style="white-space:nowrap"&gt;&lt;code&gt;&amp;lt;active_terminals&amp;gt;5&amp;lt;/active_terminals&amp;gt;&lt;/code&gt;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;Also you can let the phase start gradually with &lt;code&gt;&amp;lt;work arrival="poisson"&amp;gt;&lt;/code&gt;. The OLTP-Bench paper demonstrates this technique.&lt;/p&gt;

&lt;p&gt;In addition a benchmark may understand other XML elements. For example Twitter lets you give &lt;code&gt;&amp;lt;tracefile&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;tracefile2&amp;gt;&lt;/code&gt;, and the benchmark will use those to read tweet ids and user ids (respectively), which it will use as inputs for its transactions (but not every transaction type uses both).&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-08-26:/posts/2024/08/benchbase-and-temporal-foreign-keys-pdxpug-talk/</id>
    <title type="html">PDXPUG Talk: Benchbase and Temporal Foreign Keys</title>
    <published>2024-08-26T00:00:00Z</published>
    <updated>2024-08-26T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/08/benchbase-and-temporal-foreign-keys-pdxpug-talk/" type="text/html"/>
    <content type="html">
&lt;p&gt;Last Thursday I gave &lt;a href="https://illuminatedcomputing.com/pages/pdxpug2024-benchbase-and-temporal-foreign-keys/"&gt;a talk at PDXPUG about using Benchbase to compare the performance of temporal foreign keys&lt;/a&gt;. It was a lot of fun, and a really good turnout. There were even folks from Seattle and Bend. After listening for an hour, people stuck around and talked about databases and benchmarks for another two, then the last few holdouts went out for drinks for another hour and a half. At least half the audience were way more qualified to give the talk than me. To my surprise &lt;a href="http://smalldatum.blogspot.com/"&gt;Mark Callaghan&lt;/a&gt; was there, who has published database benchmarks non-stop for years.&lt;/p&gt;

&lt;p&gt;I had two major goals: &lt;a href="/posts/2024/08/benchbase-documentation/"&gt;to document how to use Benchbase&lt;/a&gt; and to report on comparing three implementations of temporal foreign keys. A couple minor goals were to share the start of a broader general-purpose benchmark for temporal databases and to talk about a benchmarking methodology, especially mistakes I made and how I tried to improve.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-07-17:/posts/2024/07/temporal-ops/</id>
    <title type="html">Temporal Ops</title>
    <published>2024-07-17T00:00:00Z</published>
    <updated>2024-07-17T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/07/temporal-ops/" type="text/html"/>
    <content type="html">
&lt;p&gt;One silver lining of &lt;a href="/posts/2024/07/temporal-reverted/"&gt;temporal primary &amp;amp; foreign keys getting reverted&lt;/a&gt; is I got to meet &lt;a href="https://github.com/hettie-d"&gt;Hettie Dombrovskaya&lt;/a&gt; and &lt;a href="https://www.red-gate.com/simple-talk/author/borisnovikov/"&gt;Boris Novikov&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’ve been working with them to write SQL for various temporal operations not covered by the SQL:2011 standard. There is no support there for outer joins, semijoins, antijoins, aggregates, or set operations (&lt;code&gt;UNION&lt;/code&gt;, &lt;code&gt;INTERSECT&lt;/code&gt;, &lt;code&gt;EXCEPT&lt;/code&gt;). As far as I know no one has ever shown how to implement those operations in SQL. I have queries so far for outer join, semijoin, and antijoin, and I’m planning to include aggregates based on &lt;a href="https://www.red-gate.com/simple-talk/databases/postgresql/making-temporal-databases-work-part-2-computing-aggregates-across-temporal-versions/"&gt;this article by Boris&lt;/a&gt;. The set operations look pretty easy to me, so hopefully I’ll have those soon too.&lt;/p&gt;

&lt;p&gt;If you’re interested, the repo is &lt;a href="https://github.com/pjungwir/temporal_ops"&gt;on Github&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-07-14:/posts/2024/07/raspberry-pi-sprinklers/</id>
    <title type="html">Debugging the Sprinkler System</title>
    <published>2024-07-14T00:00:00Z</published>
    <updated>2024-07-14T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/07/raspberry-pi-sprinklers/" type="text/html"/>
    <content type="html">
&lt;p&gt;Saturday I debugged the sprinklers.&lt;/p&gt;

&lt;p&gt;I thought I turned them on two weeks ago, and I heard &lt;em&gt;someone’s&lt;/em&gt; sprinklers outside my window that next Monday morning at 5 a.m., but after a week of 100-degree days it was clear ours weren’t doing their job. I had skipped my usual routine of checking each line, unearthing the sunken heads, and replacing what had failed. So now I had to deal with it.&lt;/p&gt;

&lt;p&gt;Somehow after living here for ten years I still found two new heads I had never seen before. Here is a map I’ve kept for years, maybe since our first summer:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-07/sprinkler-map.webp" alt="sprinkler map"&gt;&lt;/p&gt;

&lt;p&gt;It has every sprinkler head I’ve seen. Going by the rate I charge clients, that map is worth thousands of dollars.&lt;/p&gt;

&lt;p&gt;In the bottom corner is the box where the water comes in from the street. There are more boxes where valves let water into each line.&lt;/p&gt;

&lt;p&gt;One year I came across a buried water spigot in the middle of the grass. Then I lost it again.&lt;/p&gt;

&lt;p&gt;But this was a valuable spigot. It was over by our raised beds, where there is no other convenient water. You have to drag a hose from across the yard to water there. In 2022 I borrowed a neighbor’s metal detector. I still couldn’t find it. Finally I tore up the grass with a shovel, probing what must have been a 20’ x 20’ area, until finally I heard a metal clink. I extended the pipe and topped it with a copper rabbit spigot I won as a kid at the Redlands Garden Show for a potted cactus garden. I’ve carried that rabbit with me for 35 years, waiting for a chance to use it.&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-07/rabbit-spigot.webp" alt="rabbit spigot"&gt;&lt;/p&gt;

&lt;p&gt;That was two years ago. It’s on my map.&lt;/p&gt;

&lt;p&gt;So why is our grass dying?&lt;/p&gt;

&lt;p&gt;Naturally &lt;a href="https://github.com/pjungwir/raspi-sprinklers"&gt;I run our sprinklers off a raspberry pi&lt;/a&gt;. I set it up years ago, back in 2016. The controller that came with the house was dying. Two-thirds of the time when I tried to water line 12 or 13, line 4 or 5 would turn on instead. (Yes, we have 13 sprinkler lines. It’s a big yard.) Almost always it was off by 8, or sometimes 4: pretty clearly some loose wires. Why spend fifty bucks to replace it when I could spend days building my own? Look, at least there is no Kubernetes or CI pipeline, okay?&lt;/p&gt;

&lt;p&gt;There were raspi sprinkler products you could buy, and I think I saw an open source project, but that didn’t seem like fun. I wanted control and flexibility. I wanted power. I wanted Raspbian, Python, and cron.&lt;/p&gt;

&lt;p&gt;Here is my script, called &lt;code&gt;sprinkle&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env python

# sprinkle - Raspberry Pi sprinkler controller

import time
import RPi.GPIO as GPIO
import sys, signal

# Your sprinkler lines:
# Your sprinkler line 1 goes in array position 0,
# then sprinkler line 2 goes in array position 1,
# etc.
# Each value is the Raspi GPIO pin
# you will connect to that line.
# So if you say
#   sprinkler_lines = [6, 19]
# then you should connect pin 6 to sprinkler line 1,
# and pin 19 to sprinkler line 2.
# sprinkler_lines = [21, 20, 16, 12, 25, 24, 23, 26, 19, 13, 6, 5, 22]
sprinkler_lines = [23, 24, 25, 16, 12, 20, 21, 22, 5, 6, 13, 19, 26]

def usage(err_code):
    print("USAGE: sprinkle.py &amp;lt;sprinkler_line&amp;gt; &amp;lt;number_of_minutes&amp;gt;")
    sys.exit(err_code)

def int_or_usage(str):
    try:
        return int(str)
    except ValueError:
        usage(1)

if len(sys.argv) != 3:
    usage(1)

sprinkler_line    = int_or_usage(sys.argv[1])
number_of_minutes = int_or_usage(sys.argv[2])

if sprinkler_line &amp;lt; 1 or sprinkler_line &amp;gt; len(sprinkler_lines):
    print("I only know about sprinkler lines 1 to %d." % len(sprinkler_lines))
    sys.exit(1)

if number_of_minutes &amp;lt; 1 or number_of_minutes &amp;gt; 30:
    print("I don't want to run the sprinklers for %d minutes." % number_of_minutes)
    sys.exit(1)


def exit_gracefully(signal, frame):
    GPIO.cleanup()
    sys.exit(0)
signal.signal(signal.SIGINT, exit_gracefully)


active_pin = sprinkler_lines[sprinkler_line - 1]
GPIO.setmode(GPIO.BCM)
for pin in sprinkler_lines:
    GPIO.setup(pin, GPIO.OUT)
    GPIO.output(pin, False)
GPIO.output(active_pin, True)
time.sleep(60 * number_of_minutes)
GPIO.output(active_pin, False)
exit_gracefully(None, None)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That is a lot of code, but all it does is turn on one GPIO pin, sleep a while, then turn it off. Near the top you can see an array that maps sprinkler lines to GPIO pins. I kept the old sprinkler numbering, so it matches the notes the old owners left us. Array position &lt;code&gt;n&lt;/code&gt; means sprinkler line &lt;code&gt;n+1&lt;/code&gt;.&lt;/p&gt;
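
&lt;p&gt;The lookup is small enough to try without any hardware attached. This is only a sketch: &lt;code&gt;pin_for_line&lt;/code&gt; is a name made up for illustration and is not in the script above, though the pin list is copied from it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Same pin list as in sprinkle: position 0 holds the pin for line 1, and so on.
sprinkler_lines = [23, 24, 25, 16, 12, 20, 21, 22, 5, 6, 13, 19, 26]

def pin_for_line(line):
    """Return the GPIO pin for a 1-based sprinkler line (hypothetical helper)."""
    if line not in range(1, len(sprinkler_lines) + 1):
        raise ValueError("no such sprinkler line: %d" % line)
    return sprinkler_lines[line - 1]

print(pin_for_line(1))    # pin for the first line
print(pin_for_line(13))   # pin for the last line&lt;/code&gt;&lt;/pre&gt;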

&lt;p&gt;Then I have a higher-level script I run each morning out of cron, which does the front on even days and the back on odd. It logs when it starts and finishes, which has helped me a lot:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env python

# do-yard - Run sprinklers for the whole yard.
# We do the front yard on even days and the back yard on odd days.

import time
from subprocess import call

t = time.localtime()
if t.tm_yday % 2:
    print("%s: Starting the back" % time.strftime("%Y-%m-%d %H:%M:%S", t))
    # odd days we do the back yard:
    for line in [4, 5, 6, 7, 8, 12]:
        call(["/home/pi/sprinkle", str(line), "5"])
        time.sleep(60)
    print("%s: Finished the back" % time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))

else:
    print("%s: Starting the front" % time.strftime("%Y-%m-%d %H:%M:%S", t))
    # even days we do the front yard (and a little bit of the back):
    for line in [1, 2, 3, 9, 10, 11, 13]:
        call(["/home/pi/sprinkle", str(line), "5"])
        time.sleep(60)
    print("%s: Finished the front" % time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))&lt;/code&gt;&lt;/pre&gt;
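
&lt;p&gt;The even/odd test is easy to check against a calendar. Here is a minimal sketch, assuming a helper called &lt;code&gt;yard_for&lt;/code&gt; that is not part of &lt;code&gt;do-yard&lt;/code&gt; itself:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import datetime

def yard_for(day):
    """Return which yard do-yard would water on the given date (hypothetical helper)."""
    # Same test as the script: an odd day of the year means the back yard.
    return "back" if day.timetuple().tm_yday % 2 else "front"

print(yard_for(datetime.date(2024, 6, 30)))   # day 182 of the year
print(yard_for(datetime.date(2024, 7, 1)))    # day 183 of the year&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One quirk of counting by day of the year: in a non-leap year December 31 is day 365 and January 1 is day 1, both odd, so the back yard gets two days in a row.&lt;/p&gt;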

&lt;p&gt;The hard part was figuring out the wiring. I’ve never gone much further than Ohm’s Law. For a long time I was stuck working out how to drive the sprinkler valves. Sprinkler valves use a solenoid to open and shut. In my garage, 13 colored wires come out of the ground, along with one neutral white wire to complete the circuit. Then plugged into the wall is an adapter to produce 24 volt AC, and two wires come out of that. In between used to be the old controller. It would send 24 VAC down whichever wire matched the sprinkler line (&lt;code&gt;&amp;amp; ~(1 &amp;lt;&amp;lt; 3)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The pi outputs 3.3 volts DC. At first &lt;a href="https://raspberrypi.stackexchange.com/questions/50435/driving-24vac-sprinkler-solenoids-through-uln2003a"&gt;I thought there was an integrated circuit that could convert the signal for me&lt;/a&gt;, but eventually I resigned myself to using a bank of relays:&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-07/raspi-sprinklers.webp" alt="raspi sprinklers"&gt;&lt;/p&gt;

&lt;p&gt;Oh also I never learned how to solder.&lt;/p&gt;

&lt;p&gt;A relay is a mechanical system. The AC power goes through, but it’s blocked by an open switch. The DC power is on another circuit, and it activates an electromagnet that closes the switch. When you turn on the signal, you see a red light, and the switch closing makes a loud click.&lt;/p&gt;

&lt;p&gt;A bank of 16 relays cost $12, almost as much as a sprinkler controller, so I really wanted my ICs to work out. Oh well.&lt;/p&gt;

&lt;p&gt;So today I started with checking the log. Well no, because the pi wasn’t responding to ssh again.&lt;/p&gt;

&lt;p&gt;It has always been temperamental. After a few hours the wifi dies, sometimes sooner. Pulling the plug for a moment fixes it, but then you have to wait while it boots. So I have to bring a laptop down to the garage, even just to check on things. Today I thought I would finally fix that.&lt;/p&gt;

&lt;p&gt;Other people &lt;a href="https://raspberrypi.stackexchange.com/questions/27475/wifi-disconnects-after-period-of-time-on-raspberry-pi-doesnt-reconnect"&gt;have the same problem&lt;/a&gt;. One reported culprit is power-saving mode. I checked and mine was running that way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pi@raspberrypi:~ $ iw dev wlan0 get power_save
Power save: on&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The nicest &lt;a href="https://raspberrypi.stackexchange.com/questions/96606/make-iw-wlan0-set-power-save-off-permanent/96644#96644"&gt;advice I found&lt;/a&gt; was to disable it at boot with systemd. Just run this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo systemctl --full --force edit wifi_powersave@.service&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and in your editor enter—ugh, nano? That had to be fixed.&lt;/p&gt;

&lt;p&gt;Setting &lt;code&gt;EDITOR&lt;/code&gt; in root’s &lt;code&gt;~/.profile&lt;/code&gt; should do it.&lt;/p&gt;

&lt;p&gt;No? &lt;code&gt;~/.bashrc&lt;/code&gt; then?&lt;/p&gt;

&lt;p&gt;Still no? Back to Stack Overflow…&lt;/p&gt;

&lt;p&gt;No clues. I guess I’m on my own.&lt;/p&gt;

&lt;p&gt;What is this &lt;code&gt;.selected_editor&lt;/code&gt; file in root’s home directory? Hmm, it already says vim.&lt;/p&gt;

&lt;p&gt;Is sudo even launching its command through a shell? Probably not, once I think of it. If it just execs the command directly, no wonder &lt;code&gt;~/.profile&lt;/code&gt; does nothing.&lt;/p&gt;

&lt;p&gt;More Stack Overflow. Most questions are about &lt;code&gt;visudo&lt;/code&gt;, and I see something called &lt;code&gt;sudoedit&lt;/code&gt;, and people are asking how to control which editor &lt;em&gt;that&lt;/em&gt; launches. (Why not just run the editor you want? The man page says it lets you keep your own editor configuration. Like my own &lt;code&gt;~/.vimrc&lt;/code&gt;? That’s cool. Really? How does that work?) But in my case the editor is getting launched by systemd. Surely we would have all been happier if we’d just gone with runit?&lt;/p&gt;

&lt;p&gt;Does root have &lt;code&gt;$SYSTEMD_EDITOR&lt;/code&gt; set? Surely not—no, too bad.&lt;/p&gt;

&lt;p&gt;Of course I could just edit the file myself, but it’s the principle of the thing.&lt;/p&gt;

&lt;p&gt;Okay, I give up:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo visudo -f /etc/sudoers.d/20_editor&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I typed this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Defaults env_keep += "editor EDITOR"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So now when I run &lt;code&gt;sudo&lt;/code&gt;, it will pass along my own &lt;code&gt;$EDITOR&lt;/code&gt; choice.&lt;/p&gt;
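
&lt;p&gt;As far as I can tell from the &lt;code&gt;systemctl&lt;/code&gt; man page, the editor lookup goes roughly like this. It is a sketch of the documented order, not systemd’s actual code, and &lt;code&gt;pick_editor&lt;/code&gt; is a made-up name:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Well-known editors systemctl falls back to, in the order the man page lists them.
FALLBACK_EDITORS = ["editor", "nano", "vim", "vi"]

def pick_editor(env):
    """Return the editor candidates to try, given a dict of environment variables."""
    for var in ("SYSTEMD_EDITOR", "EDITOR", "VISUAL"):
        if env.get(var):
            return [env[var]]
    return FALLBACK_EDITORS

print(pick_editor({}))                    # nothing passed through: use the fallbacks
print(pick_editor({"EDITOR": "vim"}))     # EDITOR survived the trip through sudo&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since sudo was evidently not passing my &lt;code&gt;EDITOR&lt;/code&gt; along, the first case would explain why I kept landing in nano.&lt;/p&gt;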

&lt;p&gt;Is this a security hole? I can imagine some possible issues on a server, but for the pi in my garage it seems okay.&lt;/p&gt;

&lt;p&gt;Now systemd launches vim! Shamelessly I copied and pasted:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[Unit]
Description=Set WiFi power save %i
After=sys-subsystem-net-devices-wlan0.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/iw dev wlan0 set power_save %i

[Install]
WantedBy=sys-subsystem-net-devices-wlan0.device&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I’ve never seen this &lt;code&gt;%i&lt;/code&gt; thing before. The idea is it lets you do this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo systemctl disable wifi_powersave@off.service
sudo systemctl enable wifi_powersave@on.service&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;or this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo systemctl disable wifi_powersave@on.service
sudo systemctl enable wifi_powersave@off.service&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That’s cool.&lt;/p&gt;
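
&lt;p&gt;The substitution itself is simple to picture. The sketch below only imitates it: real systemd also deals with escaping and has many other specifiers, and the function names here are invented for the example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def instance_of(unit_name):
    """Return the part between @ and the .service suffix (the instance name)."""
    stem, _, _suffix = unit_name.rpartition(".")
    _template, _, instance = stem.partition("@")
    return instance

def expand(line, unit_name):
    """Substitute %i in one unit-file line with the instance name."""
    return line.replace("%i", instance_of(unit_name))

exec_start = "/sbin/iw dev wlan0 set power_save %i"
print(expand(exec_start, "wifi_powersave@off.service"))
print(expand(exec_start, "wifi_powersave@on.service"))&lt;/code&gt;&lt;/pre&gt;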

&lt;p&gt;Oh, better not forget to run it now too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo iw dev wlan0 set power_save off&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So I turned off power saving. Maybe that will fix the wifi.&lt;/p&gt;

&lt;p&gt;Let’s check the log file. Have the sprinklers been running?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;2024-06-30 06:00:01: Starting the front
2024-06-30 06:42:03: Finished the front
2024-07-01 06:00:01: Starting the back
2024-07-01 06:36:03: Finished the back
2024-07-02 06:00:01: Starting the front
2024-07-02 06:42:02: Finished the front
2024-07-03 06:00:01: Starting the back
2024-07-03 06:36:02: Finished the back
2024-07-04 06:00:01: Starting the front
2024-07-04 06:42:03: Finished the front
2024-07-05 06:00:01: Starting the back
2024-07-05 06:36:03: Finished the back
2024-07-06 06:00:01: Starting the front
2024-07-06 06:42:03: Finished the front
2024-07-07 06:00:01: Starting the back
2024-07-07 06:36:02: Finished the back
2024-07-08 06:00:01: Starting the front
2024-07-08 06:42:03: Finished the front
2024-07-09 06:00:01: Starting the back
2024-07-09 06:36:03: Finished the back
2024-07-10 06:00:01: Starting the front
2024-07-10 06:42:02: Finished the front
2024-07-11 06:00:02: Starting the back
2024-07-11 06:36:03: Finished the back
2024-07-12 06:00:01: Starting the front
2024-07-12 06:42:03: Finished the front
2024-07-13 06:00:01: Starting the back
2024-07-13 06:36:03: Finished the back&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;They’ve been running all along! About 42 minutes for the front, 36 for the back.&lt;/p&gt;
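
&lt;p&gt;The durations in the log are what &lt;code&gt;do-yard&lt;/code&gt; should produce: each line gets 5 minutes of water followed by a 60-second pause. A quick check, with constant names of my own choosing:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WATER_MINUTES = 5     # the "5" passed to sprinkle
PAUSE_MINUTES = 1     # the time.sleep(60) after each line

BACK_LINES = [4, 5, 6, 7, 8, 12]
FRONT_LINES = [1, 2, 3, 9, 10, 11, 13]

def expected_minutes(lines):
    """Total wall-clock minutes do-yard should take for a list of lines."""
    return len(lines) * (WATER_MINUTES + PAUSE_MINUTES)

print("front:", expected_minutes(FRONT_LINES))
print("back:", expected_minutes(BACK_LINES))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The extra second or two in the log is just the time it takes to start each process.&lt;/p&gt;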

&lt;p&gt;But clearly they’re doing nothing. The pi is turning on a pin then just sitting there.&lt;/p&gt;

&lt;p&gt;So there must be a loose connection.&lt;/p&gt;

&lt;p&gt;I tried line 3: &lt;code&gt;./sprinkle 3 10&lt;/code&gt;. No red light, no click. Line 10. No red light, no click. Line 2. No red light, no click.&lt;/p&gt;

&lt;p&gt;I went upstairs to fetch my multimeter. Time to test connectivity and voltage.&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-07/multimeter.webp" alt="multimeter"&gt;&lt;/p&gt;

&lt;p&gt;How in the world did I wire this thing anyway?&lt;/p&gt;

&lt;p&gt;Then I noticed a couple red wire loops, connecting GPIO pins to the breadboard power rail, but detached now from the power rail. The pins both said 5V. (That tiny text was easier to read in 2016.) So those came loose? What if I put them back in again? I think I remember . . . wasn’t this supposed to power the relay?&lt;/p&gt;

&lt;p&gt;Trying my &lt;code&gt;sprinkle&lt;/code&gt; command again made the light come on! I must have missed the click though. Were the sprinklers running? No? What if I try a few lines? I’m really not hearing the click. But the light is on.&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-07/relays-lit.webp" alt="relay with red light"&gt;&lt;/p&gt;

&lt;p&gt;How does each relay work again? I set the multimeter to connectivity to probe each pair of posts. They were more connected than I expected. Was that bad? Okay I remember the white neutral wire running from one relay to another in series. And the colored wires go out and into the ground, one per relay.&lt;/p&gt;

&lt;p&gt;I remember something about those two little red wire loops. They really looked disconnected on purpose. They weren’t just loose, they were completely out of the breadboard.&lt;/p&gt;

&lt;p&gt;Is anything else loose? A bit, but when I fix it nothing changes.&lt;/p&gt;

&lt;p&gt;I remember those two red wires. They are supposed to give 5 volts to power the relay, but it never worked did it? It was supposed to, but it didn’t. Like the pi just didn’t have enough oomph. Or was the board supposed to power the pi?&lt;/p&gt;

&lt;p&gt;What are these other two thin black wires leaving the relay board? Where do they go? Off to the right, oh, to a power adapter! Two weeks ago I plugged in the adapter for the pi, and I plugged in the 24 VAC adapter, but the relays need power too, and they get it from the power strip over by the garage freezer.&lt;/p&gt;

&lt;p&gt;I guess this is why phone support asks if you’ve plugged it in.&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-07/sprinklers-running.webp" alt="sprinklers running"&gt;&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-07-05:/posts/2024/07/temporal-reverted/</id>
    <title type="html">Temporal Reverted</title>
    <published>2024-07-05T00:00:00Z</published>
    <updated>2024-07-05T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/07/temporal-reverted/" type="text/html"/>
    <content type="html">
&lt;p&gt;My work adding temporal primary keys and foreign keys to Postgres was &lt;a href="https://www.postgresql.org/message-id/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com"&gt;reverted from v17&lt;/a&gt;. The problem is empty ranges (and multiranges). An empty range doesn’t overlap anything, including another empty range. So &lt;code&gt;'empty' &amp;amp;&amp;amp; 'empty'&lt;/code&gt; is false. But temporal PKs are essentially an exclusion constraint using &lt;code&gt;(id WITH =, valid_at WITH &amp;amp;&amp;amp;)&lt;/code&gt;. Therefore you can insert duplicates, as long as the range is empty:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;INSERT INTO t (id, valid_at, name) VALUES (5, 'empty', 'foo');
INSERT INTO t (id, valid_at, name) VALUES (5, 'empty', 'bar');&lt;/code&gt;&lt;/pre&gt;
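
&lt;p&gt;Here is a toy model of the problem in Python. It is not how Postgres implements exclusion constraints: Python &lt;code&gt;range&lt;/code&gt; objects over integers stand in for Postgres ranges, and the function names are invented for the example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def overlaps(a, b):
    """True if two integer ranges share at least one point; empty ranges never do."""
    return not set(a).isdisjoint(b)

def violates_temporal_pk(rows, new_id, new_valid_at):
    """Reject a new row if an existing row has an equal id and an overlapping range."""
    return any(row_id == new_id and overlaps(valid_at, new_valid_at)
               for row_id, valid_at in rows)

empty = range(0)
# The duplicate with an empty range slips through:
print(violates_temporal_pk([(5, empty)], 5, empty))
# A real overlap is caught:
print(violates_temporal_pk([(5, range(1, 10))], 5, range(5, 20)))&lt;/code&gt;&lt;/pre&gt;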

&lt;p&gt;That might be okay for some users, but it surely breaks expectations for others. And it’s a questionable thing to do that we should probably just forbid. The SQL standard forbids empty &lt;code&gt;PERIOD&lt;/code&gt;s, so we should make sure that using plain ranges does the same. Adding a record with an empty application time doesn’t really have a meaning in the temporal model.&lt;/p&gt;

&lt;p&gt;I think this is a pretty small bump in the road. At &lt;a href="https://2024.pgconf.dev"&gt;the Postgres developers conference&lt;/a&gt; we found a good solution to excluding empty ranges. My original attempt used &lt;code&gt;CHECK&lt;/code&gt; constraints, but that had a lot of complications. Forbidding them in the executor is a lot simpler. I’ve already sent in &lt;a href="https://www.postgresql.org/message-id/56de0a38-77cc-48a8-bfa7-eb92fa57830b%40illuminatedcomputing.com"&gt;a new set of patches for v18&lt;/a&gt; that implement that change.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-04-06:/posts/2024/04/k3s-behind-nginx/</id>
    <title type="html">k3s Behind Nginx</title>
    <published>2024-04-06T00:00:00Z</published>
    <updated>2024-04-06T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/04/k3s-behind-nginx/" type="text/html"/>
    <content type="html">
&lt;p&gt;Here on &lt;a href="https://illuminatedcomputing.com"&gt;illuminatedcomputing.com&lt;/a&gt; I’ve got a bunch of sites served by nginx, but I’d like to run a little k3s cluster as well. The main benefit would be isolation. That is always helpful, but it especially matters for staging sites for some customers who don’t update very often.&lt;/p&gt;

&lt;p&gt;Instead of migrating everything all at once, I want to keep my host nginx but let it reverse proxy to k3s for sites running there. Then I will block direct traffic to k3s, so that there is only one way to get there. I realize this is not really a “correct” way to do k8s, but for a tiny setup like mine it makes sense. Maybe I should have just bought a separate box for k3s, but I find pushing tools a bit like this is a good way to learn how they really work, and that’s what happened here.&lt;/p&gt;

&lt;p&gt;It was harder than I thought. I found one or two people online seeking to do the same thing, but there were no good answers. I had to figure it out on my own, and now maybe this post will help someone else.&lt;/p&gt;

&lt;p&gt;The first step was to run k3s on other ports. I’m using the &lt;a href="https://kubernetes.github.io/ingress-nginx/"&gt;ingress-nginx ingress controller&lt;/a&gt; via a &lt;a href="https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginx"&gt;Helm chart&lt;/a&gt;. In my &lt;code&gt;values.yaml&lt;/code&gt; I have it bind to 8080 and 8443 instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ingress-nginx:
  controller:
    enableHttp: true
    enableHttps: true
    service:
      ports:
        http: 8080
        https: 8443&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then I can see the Service is using those ports:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;paul@tal:~/src/illuminatedcomputing/k8s$ k get services -A
NAMESPACE         NAME                                         TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                         AGE
ingress           ingress-ingress-nginx-controller             LoadBalancer   10.43.91.109    107.150.34.82   8080:31333/TCP,8443:30702/TCP   7d20h
...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Setting up nginx to reverse proxy was also no problem. For example here is a private docker registry I’m running:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;server {
  listen 443 ssl;
  server_name docker.illuminatedcomputing.com;

  ssl_certificate ssl/docker.illuminatedcomputing.com.crt;
  ssl_certificate_key ssl/docker.illuminatedcomputing.com.key;

  location / {
    proxy_pass https://127.0.0.1:8443;
    proxy_set_header Host "docker.illuminatedcomputing.com";
  }
}

server {
  listen 80;
  server_name docker.illuminatedcomputing.com;

  location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host "docker.illuminatedcomputing.com";
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only tricky part is the ssl cert. I already had the cluster built to get certs from LetsEncrypt with &lt;a href="https://cert-manager.io/"&gt;cert-manager&lt;/a&gt;. So I have a little cron script that pulls out the k8s Secret and puts it where the host nginx can find it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/bin/bash

exec &amp;gt; &amp;gt;(tee /var/log/update-k3s-ssl-certs.log) 2&amp;gt;&amp;amp;1

echo "$(date -Iseconds) starting"

set -eu

# Everything running in k8s needs to be proxied by nginx,
# so pull the ssl certs and drop them where nginx can find them.
# Do this every day so that we pick up LetsEncrypt renewals.

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

# docker.illuminatedcomputing.com
kubectl get secret -n docker-registry docker-registry-tls -o json | jq -r '.data["tls.crt"] | @base64d' &amp;gt; /etc/nginx/ssl/docker.illuminatedcomputing.com.crt
kubectl get secret -n docker-registry docker-registry-tls -o json | jq -r '.data["tls.key"] | @base64d' &amp;gt; /etc/nginx/ssl/docker.illuminatedcomputing.com.key

# need to reload nginx to see new certs
systemctl reload nginx

echo "$(date -Iseconds) finished"&lt;/code&gt;&lt;/pre&gt;
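
&lt;p&gt;The &lt;code&gt;jq&lt;/code&gt; step is just a JSON lookup plus a base64 decode. For anyone who would rather do it in Python, a rough equivalent might look like this. The function name and the sample Secret are made up, and the sketch only covers the decoding, not the call to &lt;code&gt;kubectl&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import base64
import json

def secret_field(secret_json, field):
    """Decode one base64 field from the JSON that kubectl prints for a Secret."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][field]).decode("ascii")

# A made-up Secret with placeholder contents, just to exercise the function.
encoded = base64.b64encode(b"placeholder cert\n").decode("ascii")
fake = json.dumps({"data": {"tls.crt": encoded}})
print(secret_field(fake, "tls.crt"), end="")&lt;/code&gt;&lt;/pre&gt;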

&lt;p&gt;Probably it would be easier to run certbot on the host and push the cert into k8s (or just terminate TLS), but using cert-manager is what I’d do for a customer, and I’m hopeful that eventually I’ll drop the reverse proxy altogether.&lt;/p&gt;

&lt;p&gt;So at this point connecting works:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl -v https://docker.illuminatedcomputing.com/v2/_catalog&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Of course it will be a 401 without the credentials, but you are still getting through to the service.)&lt;/p&gt;

&lt;p&gt;The problem is that this works too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;curl -v https://docker.illuminatedcomputing.com:8443/v2/_catalog&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So how can I block that port from everything but the host nginx? I tried making the controller bind to just 127.0.0.1, e.g. with this config:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ingress-nginx:
  controller:
    config:
      bind-address: "127.0.0.1"
    enableHttp: true
    enableHttps: true
    service:
      externalIPs:
        - "127.0.0.1"
      ports:
        http: 8080
        https: 8443&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;bind-address&lt;/code&gt; line adds to a &lt;code&gt;ConfigMap&lt;/code&gt; used to generate the &lt;code&gt;nginx.conf&lt;/code&gt;. It doesn’t work though. The 127.0.0.1 is from the perspective of the controller pod, not the host 127.0.0.1.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;externalIPs&lt;/code&gt; (with or without &lt;code&gt;bind-address&lt;/code&gt;) also fails. When I add those two lines k3s gives this error:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Error: UPGRADE FAILED: cannot patch "ingress-ingress-nginx-controller" with kind Service: Service "ingress-ingress-nginx-controller" is invalid: spec.externalIPs[0]: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8, ::1/128)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So I gave up on that approach.&lt;/p&gt;

&lt;p&gt;But what about using iptables to block 8443 and 8080 from the outside? That’s probably simpler anyway—although k3s adds a big pile of its own iptables rules, and diving into that was a bit intimidating.&lt;/p&gt;

&lt;p&gt;The first thing I tried was putting a rule at the top of the &lt;code&gt;INPUT&lt;/code&gt; chain. I tried all these:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;iptables -I INPUT -p tcp \! -s 127.0.0.1 --dport 8443 -j DROP
iptables -I INPUT -p tcp \! -i lo --dport 8443 -j DROP
iptables -I INPUT -p tcp -i enp2s0 --dport 8443 -j DROP&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But none of those worked. I could still get through.&lt;/p&gt;

&lt;p&gt;At this point a friend asked ChatGPT for advice, but it wasn’t very helpful. It told me&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of having the ingress controller listen on an external IP or trying to make it listen only on 127.0.0.1, configure your host’s nginx to proxy_pass to your k3s services.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes, I had explained I was doing that. Also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You could create a network policy that only allows traffic to the ingress-nginx pods from within the cluster itself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But that will block the reverse proxy too.&lt;/p&gt;

&lt;p&gt;So the cyber Pythia was not coming through for me. I was going to have to figure it out on my own. That meant coming to grips with all the rules k3s was installing.&lt;/p&gt;

&lt;p&gt;I started with adding some logging, for example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;iptables -I INPUT -p tcp -d 107.150.34.82 -j LOG --log-prefix '[PJPJPJ] '&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Tailing &lt;code&gt;/var/log/syslog&lt;/code&gt;, I could see messages for 443 requests, but nothing for 8443!&lt;/p&gt;

&lt;p&gt;So I took a closer look at the nat table (which is processed before the filter table), and I found some relevant rules:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

-A KUBE-EXT-2ZARXDYICCJUF4UZ -m comment --comment "masquerade traffic for ingress/ingress-ingress-nginx-controller:https external destinations" -j KUBE-MARK-MASQ
-A KUBE-EXT-2ZARXDYICCJUF4UZ -j KUBE-SVC-2ZARXDYICCJUF4UZ
-A KUBE-EXT-DBDMS67BVV2C2LTP -m comment --comment "masquerade traffic for ingress/ingress-ingress-nginx-controller:http external destinations" -j KUBE-MARK-MASQ
-A KUBE-EXT-DBDMS67BVV2C2LTP -j KUBE-SVC-DBDMS67BVV2C2LTP

-A KUBE-SEP-RQCBIXXO7M53R2WC -s 10.42.0.42/32 -m comment --comment "ingress/ingress-ingress-nginx-controller:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-RQCBIXXO7M53R2WC -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:https" -m tcp -j DNAT --to-destination 10.42.0.42:443
-A KUBE-SEP-TXLMBMTNQTOOKDI3 -s 10.42.0.42/32 -m comment --comment "ingress/ingress-ingress-nginx-controller:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-TXLMBMTNQTOOKDI3 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:http" -m tcp -j DNAT --to-destination 10.42.0.42:80

-A KUBE-SERVICES -d 107.150.34.82/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:https loadbalancer IP" -m tcp --dport 8443 -j KUBE-EXT-2ZARXDYICCJUF4UZ
-A KUBE-SERVICES -d 107.150.34.82/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:http loadbalancer IP" -m tcp --dport 8080 -j KUBE-EXT-DBDMS67BVV2C2LTP

-A KUBE-SVC-2ZARXDYICCJUF4UZ ! -s 10.42.0.0/16 -d 10.43.91.109/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:https cluster IP" -m tcp --dport 8443 -j KUBE-MARK-MASQ
-A KUBE-SVC-2ZARXDYICCJUF4UZ -m comment --comment "ingress/ingress-ingress-nginx-controller:https -&amp;gt; 10.42.0.42:443" -j KUBE-SEP-RQCBIXXO7M53R2WC
-A KUBE-SVC-DBDMS67BVV2C2LTP ! -s 10.42.0.0/16 -d 10.43.91.109/32 -p tcp -m comment --comment "ingress/ingress-ingress-nginx-controller:http cluster IP" -m tcp --dport 8080 -j KUBE-MARK-MASQ
-A KUBE-SVC-DBDMS67BVV2C2LTP -m comment --comment "ingress/ingress-ingress-nginx-controller:http -&amp;gt; 10.42.0.42:80" -j KUBE-SEP-TXLMBMTNQTOOKDI3&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you follow how that bounces around, it eventually gets rerouted to 10.42.0.42, either :443 or :80. So that’s why a connection to 8443 never hits the &lt;code&gt;INPUT&lt;/code&gt; chain.&lt;/p&gt;

&lt;p&gt;So the solution was to drop the traffic in the &lt;code&gt;nat&lt;/code&gt; table instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;root@www:~# iptables -I PREROUTING -t nat -p tcp -i enp2s0 --dport 8443 -j DROP
iptables v1.8.4 (legacy):
The "nat" table is not intended for filtering, the use of DROP is therefore inhibited.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Oops, just kidding!&lt;/p&gt;

&lt;p&gt;But instead I can just tell 8080 &amp;amp; 8443 to skip all the k3s rewriting:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;iptables -I PREROUTING -t nat -p tcp -i enp2s0 --dport 8443 -j RETURN
iptables -I PREROUTING -t nat -p tcp -i enp2s0 --dport 8080 -j RETURN&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now those &lt;em&gt;do&lt;/em&gt; show up on the &lt;code&gt;INPUT&lt;/code&gt; chain, but I don’t even need to &lt;code&gt;DROP&lt;/code&gt; them there. There is nothing actually listening on those ports. The controller is still binding to 443 and 80, and k3s is using iptables trickery to reroute connections to those ports. So those two lines above are sufficient, and someone connecting directly gets a &lt;code&gt;Connection refused&lt;/code&gt;.&lt;/p&gt;
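
&lt;p&gt;If it helps, here is the path an outside packet takes, boiled down to a toy model. Real netfilter has many more steps, and everything here apart from the addresses and ports quoted above is invented for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;HOST_IP = "107.150.34.82"
POD = ("10.42.0.42", 443)      # where the k3s DNAT rules send port 8443
HOST_LISTENERS = {80, 443}     # ports the host nginx is listening on

def nat_prerouting(dst, dport, skip_ports):
    """Return the destination after the nat table's PREROUTING chain."""
    if dport in skip_ports:
        return (dst, dport)    # our RETURN rule: skip the k3s rewriting
    if dst == HOST_IP and dport == 8443:
        return POD             # the KUBE-SERVICES rules end in a DNAT
    return (dst, dport)

def outcome(dport, skip_ports=()):
    """Describe what happens to an outside connection to HOST_IP on dport."""
    dst = nat_prerouting(HOST_IP, dport, skip_ports)
    if dst == POD:
        return "forwarded to the pod, INPUT never sees it"
    if dst[1] in HOST_LISTENERS:
        return "accepted by host nginx"
    return "connection refused"

print(outcome(8443))                            # before the fix
print(outcome(8443, skip_ports=(8443, 8080)))   # after the two RETURN rules
print(outcome(443))                             # the normal way in&lt;/code&gt;&lt;/pre&gt;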

&lt;p&gt;To make this run each time the machine boots, I wrote a script at &lt;code&gt;/usr/local/bin/iptables-custom.sh&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/bin/bash

# Installs some rules to prevent 8443 and 8080 from getting routed to k8s from the outside world,
# so that you must access them via our nginx reverse proxy.

(iptables -L -n -t nat | grep '^RETURN.*8443$' &amp;gt;/dev/null) || iptables -t nat -I PREROUTING -p tcp -i enp2s0 --dport 8443 -j RETURN
(iptables -L -n -t nat | grep '^RETURN.*8080$' &amp;gt;/dev/null) || iptables -t nat -I PREROUTING -p tcp -i enp2s0 --dport 8080 -j RETURN&lt;/code&gt;&lt;/pre&gt;
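
&lt;p&gt;Grepping the rule listing works, but &lt;code&gt;iptables&lt;/code&gt; also has a &lt;code&gt;-C&lt;/code&gt; option that checks whether a rule exists. Here is a sketch of the same check-then-insert idea in Python. It is an alternative I have not deployed, &lt;code&gt;ensure_rule&lt;/code&gt; is a made-up name, and the example at the bottom uses a fake runner so that it only prints the commands instead of touching the firewall:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess

def ensure_rule(rule, run=subprocess.call):
    """Insert a nat PREROUTING rule unless an identical one already exists."""
    base = ["iptables", "-t", "nat"]
    # -C exits 0 when a matching rule is present, non-zero otherwise.
    if run(base + ["-C", "PREROUTING"] + rule) != 0:
        run(base + ["-I", "PREROUTING"] + rule)

def fake_run(cmd):
    """Stand-in for subprocess.call: print the command and pretend the rule is missing."""
    print(" ".join(cmd))
    return 1

for port in ("8443", "8080"):
    ensure_rule(["-p", "tcp", "-i", "enp2s0", "--dport", port, "-j", "RETURN"], run=fake_run)&lt;/code&gt;&lt;/pre&gt;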

&lt;p&gt;Then I put this unit file at &lt;code&gt;/etc/systemd/system/iptables-custom.service&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[Unit]
Description=adds custom iptables rules after k3s has started
After=k3s.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/iptables-custom.sh

[Install]
WantedBy=default.target&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then I ran &lt;code&gt;systemctl daemon-reload&lt;/code&gt; and &lt;code&gt;systemctl enable iptables-custom&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That’s it! I hope this is helpful or you at least enjoyed the story.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-03-03:/posts/2024/03/sorting-socks/</id>
    <title type="html">Cozy Toes</title>
    <published>2024-03-03T00:00:00Z</published>
    <updated>2024-03-03T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/03/sorting-socks/" type="text/html"/>
    <content type="html">
&lt;p&gt;The kids and I love to play games on the weekend. &lt;s&gt;Our&lt;/s&gt; My favorite is Agricola, a board game about farming in the Middle Ages. We also play a lot of matching games. Saturday morning Elsa invented her own matching game using the cards from Abandon All Artichokes. Then I made up a matching game too. I called it “Cozy Toes”.&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-03/cozy-toes.jpg" alt="cozy toes"&gt;&lt;/p&gt;

&lt;p&gt;It was a money game. Whoever got the most pairs of matching socks won, and everyone else owed that person five cents for each pair they were short.&lt;/p&gt;

&lt;p&gt;I explained how you had to compare each sock to each other sock, which meant the work to find one sock’s match was as big as all the other socks. The work for all the socks—say there were &lt;em&gt;n&lt;/em&gt; of them—was &lt;em&gt;n&lt;/em&gt; times &lt;em&gt;n&lt;/em&gt;. So if you let the pile get too big, you have a lot of work to do.&lt;/p&gt;
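
&lt;p&gt;For the grown-ups, here is a small sketch that counts how many times you hold one sock up to another. The function name is mine. Strictly it is a bit less than &lt;em&gt;n&lt;/em&gt; times &lt;em&gt;n&lt;/em&gt;, since you never compare a sock to itself or the same pair twice, but it grows the same way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from itertools import combinations

def comparisons(n):
    """Count the distinct pairs among n socks when every sock meets every other."""
    return sum(1 for _ in combinations(range(n), 2))

# Doubling the pile roughly quadruples the work.
for n in (10, 20, 40):
    print(n, comparisons(n))&lt;/code&gt;&lt;/pre&gt;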

&lt;p&gt;I suppose with several workers—say &lt;em&gt;m&lt;/em&gt; of them—the work was less, but I don’t know for sure what it was. In practice it is a hard job for many workers to share. In our own game there was a lot of contention.&lt;/p&gt;

&lt;p&gt;Later when they thought they had matched all the socks they could, and only odd singles remained, I asked how they could be sure. They decided to sort the socks from longest to shortest. Then they could see that there were no more matches. But they still compared every sock to every other.&lt;/p&gt;

&lt;p&gt;&lt;img src="/img/2024-03/unmatched-socks.jpg" alt="unmatched socks"&gt;&lt;/p&gt;

&lt;p&gt;If we keep playing this game maybe I will teach them how to bubble sort.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2024-01-24:/posts/2024/01/temporal-pks-merged/</id>
    <title type="html">Temporal PKs Merged!</title>
    <published>2024-01-24T00:00:00Z</published>
    <updated>2024-01-24T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2024/01/temporal-pks-merged/" type="text/html"/>
    <content type="html">
&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; My temporal patches &lt;a href="https://illuminatedcomputing.com/posts/2024/07/temporal-reverted/"&gt;were reverted from v17&lt;/a&gt;. Hopefully they will be accepted for v18 instead.&lt;/p&gt;

&lt;p&gt;Today first thing in the morning I saw that &lt;a href="https://www.postgresql.org/message-id/88518c81-dcdc-4c5b-9200-146b74a520ab%40eisentraut.org"&gt;the first part of my temporal tables work for Postgres got merged&lt;/a&gt;. It was two patches actually: a little one to add a new GiST support function and then the main patch adding support for temporal primary keys and unique constraints based on range types. The support for SQL:2011 PERIODs comes later; for now you must use ranges—although in my opinion that is better anyway. Also this patch allows multiranges or, keeping with Postgres’s long history of extensibility, any type with an overlaps operator. So unless some big problem appears, PKs and UNIQUE constraints are on track to be released in Postgres 17.&lt;/p&gt;

&lt;p&gt;Probably I can get (basic) foreign keys into v17 too. Temporal update/delete, foreign keys with CASCADE, and PERIODs will more likely take ’til 18.&lt;/p&gt;

&lt;p&gt;If you are interested in temporal features, early testing is always appreciated! :-)&lt;/p&gt;

&lt;p&gt;Getting this into Postgres has been a ten-year journey, and the rest of this post is going to be a self-indulgent history of that work. You’ve been warned. :-)&lt;/p&gt;

&lt;p&gt;It started in 2013 when I kept noticing my clients needed a better way to track the history of things that change over time, and I discovered &lt;a href="https://www2.cs.arizona.edu/~rts/publications.html"&gt;Richard Snodgrass’s book &lt;em&gt;Developing Time-Oriented Database Applications in SQL&lt;/em&gt;&lt;/a&gt;. He offered a rigorous, systematic approach, with working SQL solutions for everything. This was exactly what I needed. His approach was vastly better than the ad hoc history-tracking I’d seen so far. But no one had implemented any of it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/CA+renyVepHxTO1c7dFbVjP1GYMUc0-3qDNWPN30-noo5MPyaVQ@mail.gmail.com"&gt;My first Postgres patch&lt;/a&gt; in 2015 was motivated by temporal databases: I added UUID support to the &lt;code&gt;btree_gist&lt;/code&gt; extension. A temporal primary key is basically an exclusion constraint on &lt;code&gt;(id WITH =, valid_at WITH &amp;amp;&amp;amp;)&lt;/code&gt;, and I had a project with UUID ids. But that exclusion constraint requires a GiST index that knows how to perform equal comparisons against the &lt;code&gt;id&lt;/code&gt; column and overlap comparisons against the &lt;code&gt;valid_at&lt;/code&gt; column. Out-the-box GiST indexes can’t do that (unless your ids are something weird like range types). If your ids are integers, you can install &lt;code&gt;btree_gist&lt;/code&gt; to create a GiST opclass that knows what integer &lt;code&gt;=&lt;/code&gt; means, but at the time UUIDs were not supported. So I started there. I liked that temporal databases had a manageable feature set and a manageable body of literature, so that even a working programmer like me could break new ground (not like Machine Learning or even Time Series databases). Nonetheless that patch took a year and a half to get committed, and it was really other people like Chris Bandy who finished it.&lt;/p&gt;

&lt;p&gt;I kept reading about temporal databases, and in 2017 I wrote &lt;a href="https://github.com/pjungwir/time_for_keys/commit/b0146278cf0d25a6d3f05a38e2ab2c6e8001c93c"&gt;a proof-of-concept for temporal foreign keys&lt;/a&gt;, mostly at AWS Re:Invent. I happened to be given a free registration &amp;amp; hotel room, but it was too late to register for any of the good talks. But all that time with nothing to do was fantastically productive, and I remember by the flight home I was adding tons of tests, trying to cover every feature permutation—ha, as if. A few days after I returned I also published my &lt;a href="https://illuminatedcomputing.com/posts/2017/12/temporal-databases-bibliography/"&gt;annotated bibliography&lt;/a&gt;, which I’ve updated many times since.&lt;/p&gt;

&lt;p&gt;In Snodgrass a temporal foreign key is a page-and-a-half of SQL, mostly because a referencing row may need more than one referenced row to completely cover its time span. But I realized we could make the check much simpler if we used an aggregate function to combine all the relevant rows in the referenced table first. So I wrote &lt;code&gt;range_agg&lt;/code&gt;, first &lt;a href="https://github.com/pjungwir/range_agg"&gt;as an extension&lt;/a&gt;, then &lt;a href="https://www.postgresql.org/message-id/16d71dc8-34cf-5ebd-1ce5-ccd93c0a14f9@illuminatedcomputing.com"&gt;as a core patch&lt;/a&gt;. Jeff Davis (who laid the foundation for temporal support with range types and exclusion constraints) said my function was too narrow and pushed me to implement &lt;a href="https://commitfest.postgresql.org/31/2112/"&gt;multiranges&lt;/a&gt;, a huge improvement. Again it took a year and a half, and I had trouble making consistent progress. There was a lot of work at the end by Alvaro Herrera and Alexander Korotkov (and I’m sure others) to get it committed. That was a few days before Christmas 2020.&lt;/p&gt;

&lt;p&gt;Although the Postgres review process can take a long time, I cherish how it pushes me to do better. As a consultant/freelancer I encounter codebases of, hmm, varying quality, and Postgres gives me an example of what high standards look like.&lt;/p&gt;

&lt;p&gt;One thing I still remember from reading &lt;a href="https://www.amazon.com/Programmers-Work-Interviews-Computer-Industry/dp/1556152116"&gt;&lt;em&gt;Programmers at Work&lt;/em&gt;&lt;/a&gt; many years ago was how many interviewees said they tried to build things at a higher level of abstraction than they thought they’d need. I’ve seen enough over-engineered tangles and inner-platform effects that my own bias is much closer to YAGNI and keeping things concrete, but the advice in those interviews still prods me to discover good abstractions. The Postgres codebase is full of things like that, and really it’s such a huge project that strong organizing ideas are essential. Multiranges was a great example of how to take a concrete need and convert it into something more general-purpose. And I thought I was doing that already with &lt;code&gt;range_agg&lt;/code&gt;! I think one thing that makes an abstraction good is a kind of definiteness, something opinionated. So it is not purely general, but really adds something new. It always requires an act of creation.&lt;/p&gt;

&lt;p&gt;The coolest thing I’ve heard of someone doing with multiranges was &lt;a href="https://iopscience.iop.org/article/10.3847/1538-3881/ac5ab8"&gt;using them in astronomy to search for neutrinos, gravitational waves, and gamma-ray bursts&lt;/a&gt;. By using multiranges, they were able to compare observations with maps of the night sky “orders of magnitude faster” than with other implementations. (Hopefully I’ve got that right: I read a pre-print of the paper but it was not all easy for me to understand!)&lt;/p&gt;

&lt;p&gt;My first patch for an actual temporal feature was &lt;a href="https://www.postgresql.org/message-id/CA%2BrenyWxfXpThaOXiNuo6dEJQPYOWjysnXQw7_m7WJnNHVn_-g%40mail.gmail.com"&gt;primary keys&lt;/a&gt; back in 2018. Then foreign keys followed in 2019, just a couple weeks before I gave a talk at PgCon about temporal databases. By the end of the year I had &lt;code&gt;FOR PORTION OF&lt;/code&gt; as well. At first &lt;code&gt;FOR PORTION OF&lt;/code&gt; was implemented in the Executor Phase, but when I gave a progress report for PgCon 2020 I was already working on a trigger-based reimplementation, though it wasn’t submitted until June 2021. I also pulled in &lt;a href="https://www.postgresql.org/message-id/mSRBIYry-zk13wGeWVYCGw5o0LqZ4dyectlawH43VoLzQ70Tqa2oClVdkmQ1MlhG2lToRBkyY77g1o7vSGUwMS9BXvE-H-bg_x9bjx0DKNI%3D%40protonmail.com"&gt;work by Vik Fearing from 2018&lt;/a&gt; to support &lt;code&gt;ADD/DROP PERIOD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Soon after that progress got harder: my wife and I had our sixth baby in August, and somehow he seemed to be more work than the others. I took over daily math lessons (we homeschool), and I had to let go my biggest client, who needed more hours than I could give. (I’m proud to have given them an orderly transition over several months though.) In January 2022 Peter Eisentraut gave me a thorough review, but I went silent. Still, I had a lot of encouragement from the community, especially Corey Huinker, and eventually doing Postgres got easier again. I had a talk accepted for PgCon 2023, and I worked hard to submit new patches, which I did only weeks before the conference.&lt;/p&gt;

&lt;p&gt;The best part of PgCon was getting everyone who cared about my work together in the hallway to agree on the overall approach. I had worried for years about using ranges as well as PERIODs, since the standard doesn’t know anything about ranges. The second-best part was when someone told me I should stop calling myself a Postgres newbie.&lt;/p&gt;

&lt;p&gt;At PgCon Peter asked me to re-organize the patches, essentially implementing PERIODs as &lt;code&gt;GENERATED&lt;/code&gt; range columns. It made the code much nicer. I also went back to an Executor Phase approach for &lt;code&gt;FOR PORTION OF&lt;/code&gt;. Using triggers had some problems around updateable views and &lt;code&gt;READ COMMITTED&lt;/code&gt; transaction isolation.&lt;/p&gt;

&lt;p&gt;Since May I’ve felt more consistent than during my other Postgres work. I’ve been kept busy by excellent feedback from a meticulous reviewer, Jian He, who has caught many bugs. Often as soon as I get caught up, before I’ve written the email with the new patch files, he finds more things!&lt;/p&gt;

&lt;p&gt;Another thing that’s helped is going out once a week (for nearly a year now) to get early dinner then work on Postgres at a local bar. Somehow it’s much easier to do Postgres from somewhere besides my home office, where I do all my normal work. Getting dinner lets me read something related (lately &lt;a href="https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321"&gt;&lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt; by Martin Kleppmann&lt;/a&gt; and &lt;a href="https://postgrespro.com/community/books/internals"&gt;&lt;em&gt;PostgreSQL 14 Internals&lt;/em&gt; by Egor Rogov&lt;/a&gt;), and it’s fun. Doing just a little every week helps me keep momentum, so that fitting in further progress here and there seems easy. I’m lucky to have a wife who has supported it so often, despite leaving her with the kids and dishes.&lt;/p&gt;

&lt;p&gt;I think I have years more work of temporal features to add, first finishing SQL:2011 then going beyond (e.g. temporal outer joins, temporal aggregates, temporal upsert). It’s been a great pleasure!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2023-11-06:/posts/2023/11/git-for-postgres-hacking/</id>
    <title type="html">Git for Postgres Hacking</title>
    <published>2023-11-06T00:00:00Z</published>
    <updated>2023-11-06T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2023/11/git-for-postgres-hacking/" type="text/html"/>
    <content type="html">
&lt;p&gt;In Postgres development it’s normal for patch attempts to require many revisions and last a long time. I just sent in &lt;a href="https://www.postgresql.org/message-id/2d9740ad-3bb7-473c-8441-344351caa8ee%40illuminatedcomputing.com"&gt;v17&lt;/a&gt; of my &lt;a href="https://commitfest.postgresql.org/44/4308/"&gt;SQL:2011 application time patch&lt;/a&gt;. The commitfest entry dates back to summer of 2021, but it’s really a continuation of &lt;a href="https://www.postgresql.org/message-id/flat/20200930073908.GQ1996%40paquier.xyz#c63d0a97f3c7bee005a1840164f00688"&gt;this thread from 2018&lt;/a&gt;. And it’s not yet done.&lt;/p&gt;

&lt;p&gt;My &lt;a href="https://commitfest.postgresql.org/31/2112/"&gt;work on multiranges&lt;/a&gt; is a similar story: 1.5 years from first patch to committed.&lt;/p&gt;

&lt;p&gt;Today I saw &lt;a href="https://jvns.ca/blog/2023/11/06/rebasing-what-can-go-wrong-/"&gt;this post&lt;/a&gt; by Julia Evans about problems people have with git rebase (&lt;a href="https://news.ycombinator.com/item?id=38164046"&gt;also see the hn discussion&lt;/a&gt;), and it reminded me of my struggles handling long-lived branches.&lt;/p&gt;

&lt;p&gt;In my early days with git I avoided rebasing, because I wanted the history to be authentic. Nowadays I rebase pretty freely, both to move my commits on top of the latest &lt;code&gt;master&lt;/code&gt; branch work and to interactively clean things up so the commits show logical progress (with generous commit messages explaining the motivation and broad design decisions: the “why”).&lt;/p&gt;

&lt;p&gt;But in my paid client work, PRs get merged pretty fast. There is nothing like the multi-year wait of Postgres hacking. Often I’ve wished for more history there. It’s not my day job, so it’s hard to remember fine details about something from months or years ago. And I’ve changed direction a couple times, and sometimes I want a way to consult that old history.&lt;/p&gt;

&lt;p&gt;But with Postgres you don’t have any choice but to rebase. You send your patch files to a mailing list, and if they don’t apply cleanly no one will look at them. I’ve spent hours and hours rebasing patches because the underlying systems changed before they could get committed.&lt;/p&gt;

&lt;p&gt;With multiranges this was tough, but at least it was just one patch file. Application time is a series of five patches, which over time have changed order and evolved from four. When it’s time to send a new version, I run &lt;code&gt;git format-patch&lt;/code&gt;, which turns each commit into a &lt;code&gt;.patch&lt;/code&gt; file. So I need to wind up with five well-groomed commits rebased on the latest &lt;code&gt;master&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My personal copy of the postgres repo on github has &lt;a href="https://github.com/pjungwir/postgresql/branches/yours"&gt;a bunch of silly-named branches&lt;/a&gt; for stashing work when I want to change direction, so the history isn’t totally lost. But for a long time I had no system. It feels like when you see a spreadsheet named &lt;code&gt;Annual Report - Copy of Jan 7.bak - final - FINAL.xls&lt;/code&gt;. After all these years it’s unmanageable. (Okay at least I know not to name any Postgres submission “final”! ;-)&lt;/p&gt;

&lt;p&gt;I think I finally found a way to keep history that works for me. On my main &lt;a href="https://github.com/pjungwir/postgresql/commits/valid-time"&gt;&lt;code&gt;valid-time&lt;/code&gt; branch&lt;/a&gt; I keep a series of commits for each small change. I rebase to move them up and down, so that they will squash cleanly into the five commits I need at the end. You can see that I have one main commit for each of the five patches, but each is followed by many commits named &lt;code&gt;fixup pks: fixed this&lt;/code&gt; or &lt;code&gt;fixup fks: feedback from so-and-so&lt;/code&gt;. I rebase on &lt;code&gt;master&lt;/code&gt; every so often. I force-push all the time, since no one else uses the repo. (I do work on both a laptop and a desktop though, so I have to remember to &lt;code&gt;git fetch &amp;amp;&amp;amp; git reset --hard origin/valid-time&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;When I’m ready to submit new patches, I take a snapshot with &lt;code&gt;git checkout -b valid-time-v17-pre-squash&lt;/code&gt; and “make a backup” with &lt;code&gt;git push -u&lt;/code&gt;. Then I make a branch to squash things (&lt;code&gt;git checkout -b valid-time-v17&lt;/code&gt;). I do a &lt;code&gt;git rebase -i HEAD~60&lt;/code&gt;, press &lt;code&gt;*&lt;/code&gt; on &lt;code&gt;pick&lt;/code&gt;, type &lt;code&gt;cw fixup&lt;/code&gt;, then &lt;code&gt;n.n.n.n.n.n.&lt;/code&gt;, etc. ’til I have just the five commits. Then I have a script to do a clean build + test on each commit, since I want things to work at every point. While that’s running I write the email about the new patch, and hopefully send it in.&lt;/p&gt;
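
&lt;p&gt;The same pick-to-fixup edit can also be scripted instead of typed in vim. Here is a minimal sketch in a throwaway repo, assuming GNU &lt;code&gt;sed&lt;/code&gt;. The file name and the demo commits are made up for illustration, and it only handles the simplest case: one main commit followed by its fixups.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: squash "fixup ..." commits into the main commit above them
# without opening an editor. Everything happens in a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# A base commit standing in for master.
echo base > file.txt
git add file.txt
git commit -q -m "base"

# One main commit followed by two small follow-up commits.
echo pks >> file.txt
git commit -q -a -m "Add temporal PRIMARY KEY and UNIQUE constraints"
echo fix1 >> file.txt
git commit -q -a -m "fixup pks: fixed this"
echo fix2 >> file.txt
git commit -q -a -m "fixup pks: feedback from so-and-so"

# git hands the todo list to GIT_SEQUENCE_EDITOR. Keep the first "pick"
# and turn every later one into "fixup" (GNU sed -i assumed).
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick /fixup /'" git rebase -q -i HEAD~3

# Show what is left after squashing.
git log --format=%s
```

&lt;p&gt;The final log should show just the main commit on top of &lt;code&gt;base&lt;/code&gt;. With several main commits the &lt;code&gt;sed&lt;/code&gt; expression would have to skip each of them, so treat this as a starting point only.&lt;/p&gt;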

&lt;p&gt;So now I’m capturing the fine-grained history that went into each submission, and that won’t change no matter how aggressively I rebase the current work. I’m pretty happy with this flow. I wish I had started years ago.&lt;/p&gt;

&lt;p&gt;One git feature I could &lt;em&gt;almost&lt;/em&gt; use is &lt;code&gt;git rebase -i --autosquash&lt;/code&gt;. (Here are some &lt;a href="https://thoughtbot.com/blog/autosquashing-git-commits"&gt;articles&lt;/a&gt; &lt;a href="https://andrewlock.net/smoother-rebases-with-auto-squashing-git-commits/"&gt;about&lt;/a&gt; &lt;a href="https://git-scm.com/docs/git-rebase"&gt;it&lt;/a&gt;.) If your commit messages are named &lt;code&gt;fixup! foo&lt;/code&gt;, then git will automatically set those commits to &lt;code&gt;fixup&lt;/code&gt;, not &lt;code&gt;pick&lt;/code&gt;, and it will move them to just below whatever commit matches &lt;code&gt;foo&lt;/code&gt;. I follow this pattern but with &lt;code&gt;fixup&lt;/code&gt; not &lt;code&gt;fixup!&lt;/code&gt;, to keep it all manual. At first I just didn’t trust it (or myself).&lt;/p&gt;

&lt;p&gt;Now I’m ready to move to this workflow, but I’m not sure how to “match” one of my five main commits. I want a meaningful title (i.e. the first line of the commit message) for each little commit, so I use short abbreviations for the patch they target, e.g. &lt;code&gt;fixup pks: Add documentation for pg_constraint.contemporal column&lt;/code&gt;. Git doesn’t know that it should match &lt;code&gt;pks&lt;/code&gt; to &lt;code&gt;Add temporal PRIMARY KEY and UNIQUE constraints&lt;/code&gt; and ignore everything after the colon. If there were a way to preserve tags after a rebase I think I could tag the main commit as &lt;code&gt;pks&lt;/code&gt; and it &lt;em&gt;might&lt;/em&gt; work (but maybe not with the extra stuff after the colon).&lt;/p&gt;

&lt;p&gt;You can have git generate the new commit message for you with &lt;code&gt;git commit --fixup $sha&lt;/code&gt;, but it just copies the whole title verbatim, which is not what I want. Also who wants to remember &lt;code&gt;$sha&lt;/code&gt; for those five parent commits? And finally, I want to move these commits into place immediately, so I can build &amp;amp; test against each patch as I work. Git can’t move them for me without squashing them.&lt;/p&gt;

&lt;p&gt;The Thoughtbot article linked above says you can use a regex, e.g. &lt;code&gt;git commit --fixup :/pks&lt;/code&gt;, but: (1) The regex is used immediately to find the parent, but it gets replaced with that parent’s title. It doesn’t stay in your commit message. (2) If you give an additional commit message, it goes two lines below the &lt;code&gt;fixup!&lt;/code&gt; line, so it’s not in the commit title. This only solves having to remember &lt;code&gt;$sha&lt;/code&gt;.&lt;/p&gt;
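
&lt;p&gt;To make point (1) concrete, here is a small sketch in a throwaway repo. The file names and the second “main” commit are invented for the demo, and the search text is &lt;code&gt;PRIMARY KEY&lt;/code&gt; rather than &lt;code&gt;pks&lt;/code&gt;, because the pattern has to match the parent’s actual message.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: what "git commit --fixup :/regex" records, and how
# "git rebase -i --autosquash" then folds the commit into its parent.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > file.txt
git add file.txt
git commit -q -m "base"

# Two main commits, each touching its own (made-up) file.
echo pks > pks.txt
git add pks.txt
git commit -q -m "Add temporal PRIMARY KEY and UNIQUE constraints"

echo fks > fks.txt
git add fks.txt
git commit -q -m "Add temporal FOREIGN KEY constraints"

# A later change that belongs to the first main commit.
echo more >> pks.txt
git commit -q -a --fixup=":/PRIMARY KEY"

# The regex is gone: the title is "fixup! " plus the parent's title.
fixup_title=$(git log -1 --format=%s)
echo "$fixup_title"

# Accept the generated todo list unchanged; "true" exits without editing.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --format=%s
```

&lt;p&gt;The echoed title should be the parent’s title with &lt;code&gt;fixup!&lt;/code&gt; in front, and after the rebase only the two main commits and &lt;code&gt;base&lt;/code&gt; should remain.&lt;/p&gt;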

&lt;p&gt;What I really want is &lt;code&gt;fixup! ^: blah blah blah&lt;/code&gt; where &lt;code&gt;^&lt;/code&gt; means “the closest non-squashed parent”, &lt;strong&gt;and&lt;/strong&gt; the &lt;code&gt;^&lt;/code&gt; is resolved at rebase time, not commit time, &lt;strong&gt;and&lt;/strong&gt; everything after the colon is not used for matching. (If it needs to be a regex then &lt;code&gt;:/.&lt;/code&gt; is sufficient too.)&lt;/p&gt;

&lt;p&gt;Anyway I’m using my manual process for now, since with vim I can change 60 &lt;code&gt;pick&lt;/code&gt;s to &lt;code&gt;fixup&lt;/code&gt; in a few seconds. I’m not willing to lose meaningful titles to save a few seconds with &lt;code&gt;fixup!&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Nonetheless it would be nice to have one less step I have to remember. Involuntarily I keep thinking about how I can make this feature work for me. If someone has a suggestion, please do let me know.&lt;/p&gt;

&lt;p&gt;Another approach is “stacked commits”. I went as far as installing &lt;a href="https://github.com/arxanas/git-branchless"&gt;git branchless&lt;/a&gt; and reading the docs &lt;a href="https://jg.gg/2018/09/29/stacked-diffs-versus-pull-requests/"&gt;and&lt;/a&gt; &lt;a href="https://kastiglione.github.io/git/2020/09/11/git-stacked-commits.html"&gt;some&lt;/a&gt; &lt;a href="https://andrewlock.net/working-with-stacked-branches-in-git-is-easier-with-update-refs/"&gt;articles&lt;/a&gt;, but to be honest I never went beyond a few tests, and I haven’t thought about it for a few months. It’s in the back of my head to give it a more honest effort.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2023-10-16:/posts/2023/10/rails-actionmailer/</id>
    <title type="html">Rails ActionMailer Internals</title>
    <published>2023-10-16T00:00:00Z</published>
    <updated>2023-10-16T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2023/10/rails-actionmailer/" type="text/html"/>
    <content type="html">
&lt;p&gt;When it comes to sending email in Rails, I’ve wondered for years about the gap between this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-ruby"&gt;&lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="class"&gt;UserMailer&lt;/span&gt; &amp;lt; &lt;span class="constant"&gt;ApplicationMailer&lt;/span&gt;
  &lt;span class="keyword"&gt;def&lt;/span&gt; &lt;span class="function"&gt;welcome&lt;/span&gt;(user)
    &lt;span class="instance-variable"&gt;@user&lt;/span&gt; = user
    mail(&lt;span class="key"&gt;to&lt;/span&gt;: user.email)
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-ruby"&gt;&lt;span class="keyword"&gt;class&lt;/span&gt; &lt;span class="class"&gt;User&lt;/span&gt;
  &lt;span class="keyword"&gt;def&lt;/span&gt; &lt;span class="function"&gt;send_welcome_notification&lt;/span&gt;
    &lt;span class="constant"&gt;UserMailer&lt;/span&gt;.welcome(&lt;span class="predefined-constant"&gt;self&lt;/span&gt;).deliver_later
  &lt;span class="keyword"&gt;end&lt;/span&gt;
&lt;span class="keyword"&gt;end&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We are defining an &lt;em&gt;instance method&lt;/em&gt;, but we are calling a &lt;em&gt;class method&lt;/em&gt;. What’s going on there? I finally decided to take a closer look.&lt;/p&gt;

&lt;p&gt;Well naturally this is implemented by &lt;a href="https://github.com/rails/rails/blob/main/actionmailer/lib/action_mailer/base.rb#L628-L635"&gt;&lt;code&gt;method_missing&lt;/code&gt;&lt;/a&gt;. When you call &lt;code&gt;UserMailer.welcome&lt;/code&gt;, the class will call your instance method—sort of! Actually &lt;code&gt;method_missing&lt;/code&gt; just returns a &lt;a href="https://api.rubyonrails.org/v7.0.8/classes/ActionMailer/MessageDelivery.html"&gt;&lt;code&gt;MessageDelivery&lt;/code&gt;&lt;/a&gt; object, which provides lazy evaluation. It’s like a promise (but not asynchronous). Your method doesn’t get called until you resolve the “promise,” which normally would happen when you say &lt;code&gt;deliver_now&lt;/code&gt;. You can also call &lt;a href="https://api.rubyonrails.org/v7.0.8/classes/ActionMailer/MessageDelivery.html#method-i-message"&gt;&lt;code&gt;#message&lt;/code&gt;&lt;/a&gt; which must resolve the promise (and returns whatever your method returned—sort of!).&lt;/p&gt;

&lt;p&gt;What if you say &lt;code&gt;deliver_later&lt;/code&gt;? That &lt;em&gt;still&lt;/em&gt; doesn’t call your method. Instead it queues up a job, and later &lt;em&gt;that&lt;/em&gt; will say &lt;code&gt;deliver_now&lt;/code&gt; to finally call your method.&lt;/p&gt;

&lt;p&gt;But if you’re using Sidekiq (with &lt;code&gt;config.active_job.queue_adapter = :sidekiq&lt;/code&gt;), you might wonder how that &lt;code&gt;#welcome&lt;/code&gt; method works, since we’re passing a &lt;code&gt;User&lt;/code&gt; instance and Sidekiq &lt;a href="https://github.com/sidekiq/sidekiq/wiki/Best-Practices#1-make-your-job-parameters-small-and-simple"&gt;can only serialize primitive types&lt;/a&gt;. But it does work! The trick is that Rails’ &lt;a href="https://github.com/rails/rails/blob/main/activejob/lib/active_job/queue_adapters/sidekiq_adapter.rb#L21-L26"&gt;queue adapter for Sidekiq&lt;/a&gt; does its own serialization before handing off the job to Sidekiq, and it tells Sidekiq to run &lt;a href="https://github.com/rails/rails/blob/main/activejob/lib/active_job/queue_adapters/sidekiq_adapter.rb#L66-L75"&gt;its own Worker subclass&lt;/a&gt; that will deserialize things correctly.&lt;/p&gt;

&lt;p&gt;All this assumes that your mailer method returns a &lt;a href="https://api.rubyonrails.org/classes/Mail/Message.html"&gt;&lt;code&gt;Mail::Message&lt;/code&gt;&lt;/a&gt; instance. That’s what &lt;a href="https://api.rubyonrails.org/v7.0.8/classes/ActionMailer/Base.html#method-i-mail"&gt;&lt;code&gt;#mail&lt;/code&gt;&lt;/a&gt; is giving you. But what if you &lt;em&gt;don’t&lt;/em&gt;? What if you call &lt;code&gt;mail&lt;/code&gt; but not as the last line of your method? What if you call it more than once?&lt;/p&gt;

&lt;p&gt;Well actually &lt;a href="https://github.com/rails/rails/blob/main/actionmailer/lib/action_mailer/base.rb#L870"&gt;&lt;code&gt;#mail&lt;/code&gt;&lt;/a&gt; (linking to the source code this time) remembers the message it generated, so even if you don’t return that from your own method, Rails will still send it properly. In fact it doesn’t matter what your own method returns!&lt;/p&gt;

&lt;p&gt;And if you call &lt;code&gt;#mail&lt;/code&gt; multiple times, then Rails will return early and do nothing for the second and third calls—sort of! If you pass any arguments or a block, then Rails will evaluate it again. But it still only knows how to store &lt;em&gt;one&lt;/em&gt; &lt;code&gt;Message&lt;/code&gt;. So when you finally call &lt;code&gt;deliver_now&lt;/code&gt;, only one email will go out (ask me how I know).&lt;/p&gt;

&lt;p&gt;Btw it turns out this is pretty much all documented on the &lt;a href="https://github.com/rails/rails/blob/main/actionmailer/lib/action_mailer/base.rb"&gt;&lt;code&gt;ActionMailer::Base&lt;/code&gt;&lt;/a&gt; class, but it’s not really covered in the &lt;a href="https://guides.rubyonrails.org/action_mailer_basics.html"&gt;Rails Guide&lt;/a&gt;, so I never came across it. I only found those docs when I decided to read the code. I don’t know if other Rails devs spend much time reading Rails’ own code, but I’ve found it helpful again and again. It’s not hard and totally worth it!&lt;/p&gt;

&lt;p&gt;Another trick I’ve used for years is &lt;code&gt;bundle show actionmailer&lt;/code&gt; (or in the old days &lt;code&gt;cd $(bundle show actionmailer)&lt;/code&gt;, before they broke that with a deprecation notice), and then you can add &lt;code&gt;pp&lt;/code&gt; or &lt;code&gt;binding.pry&lt;/code&gt; wherever you like. It’s a great way to test your understanding of what’s happening or discover the internals of something.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2023-09-29:/posts/2023/09/custom-postgres-ubuntu-style/</id>
    <title type="html">Custom Postgres Ubuntu Style</title>
    <published>2023-09-29T00:00:00Z</published>
    <updated>2023-09-29T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2023/09/custom-postgres-ubuntu-style/" type="text/html"/>
    <content type="html">
&lt;p&gt;Ubuntu has a very nice way of organizing multiple versions of Postgres. They all get their own directories, and the commands dispatch to the latest version or something else if you set the &lt;code&gt;PGCLUSTER&lt;/code&gt; envvar or give a &lt;code&gt;--cluster&lt;/code&gt; option. For instance if you have installed Postgres 14, you will see files in &lt;code&gt;/usr/lib/postgresql/14&lt;/code&gt; and &lt;code&gt;/usr/share/postgresql/14&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In Postgres a single installation is called a “cluster”. It has nothing to do with using multiple machines; it’s just the traditional term for the collection of configuration, data files, a postmaster process listening on a given port and its helper processes, etc.&lt;/p&gt;

&lt;p&gt;Elsewhere in the Postgres world you say &lt;code&gt;initdb&lt;/code&gt; to create a cluster. In Ubuntu you say &lt;code&gt;pg_createcluster&lt;/code&gt;. By default Ubuntu creates a cluster named &lt;code&gt;main&lt;/code&gt; for each version you install. This gives you directories like &lt;code&gt;/etc/postgresql/14/main&lt;/code&gt; (for configuration) and &lt;code&gt;/var/lib/postgresql/14/main&lt;/code&gt; (for the data). The log file is &lt;code&gt;/var/log/postgresql/postgresql-14-main.log&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you want to run an old version of &lt;code&gt;pg_dump&lt;/code&gt;, you can say &lt;code&gt;PGCLUSTER=10/main pg_dump --version&lt;/code&gt; or &lt;code&gt;pg_dump --cluster=10/main --version&lt;/code&gt;. Likewise for &lt;code&gt;pg_restore&lt;/code&gt;, etc. (but—sidequest spoiler alert—not &lt;code&gt;psql&lt;/code&gt; or a couple other things: see the footnote for more).&lt;/p&gt;

&lt;p&gt;One command that sadly &lt;em&gt;doesn’t&lt;/em&gt; support this is &lt;code&gt;pg_config&lt;/code&gt;, which is used to build custom extensions. Personally I just patch my local copy (or actually add a patched version earlier in the path, in my &lt;code&gt;~/bin&lt;/code&gt;), like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-bash"&gt;#!/bin/sh

# If postgresql-server-dev-* is installed, call pg_config from the latest
# available one. Otherwise fall back to libpq-dev's version.
#
# (C) 2011 Martin Pitt &amp;lt;mpitt@debian.org&amp;gt;
# (C) 2014-2016 Christoph Berg &amp;lt;myon@debian.org&amp;gt;
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.

set -e
PGBINROOT="/usr/lib/postgresql/"
#redhat# PGBINROOT="/usr/pgsql-"

# MY CHANGES START HERE
if [ -n "$PGCLUSTER" ]; then
  exec "$PGBINROOT/$PGCLUSTER/bin/pg_config" "$@"
fi
# MY CHANGES END HERE

LATEST_SERVER_DEV=`ls -v $PGBINROOT*/bin/pg_config 2&amp;gt;/dev/null|tail -n1`

if [ -n "$LATEST_SERVER_DEV" ]; then
    exec "$LATEST_SERVER_DEV" "$@"
else
    if [ -x /usr/bin/pg_config.libpq-dev ]; then
        exec /usr/bin/pg_config.libpq-dev "$@"
    else
        echo "You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application." &amp;gt;&amp;amp;2
        exit 1
    fi
fi&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Without those changes you can’t build custom C extensions against old versions of Postgres. I’ve mentioned this in the past in &lt;a href="https://stackoverflow.com/a/43403193"&gt;this Stack Overflow answer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But that’s not what this post is about!&lt;/p&gt;

&lt;p&gt;This post is about compiling your own Postgres that you can manage like other Postgres versions on Ubuntu. I want an install that includes my temporal patches, so I can convert my timetracking app to use real temporal features. I want the files to live in the normal places, and I want it to start/stop the normal way.&lt;/p&gt;

&lt;p&gt;I’ve been hacking on Postgres for many years (Last May someone at PGCon told me I should stop calling myself a newbie. . . .), and I’ve always used &lt;code&gt;./configure --prefix=~/local ...&lt;/code&gt; to keep a dev installation. But I’ve never used it for anything durable. It’s just handy for &lt;code&gt;make installcheck&lt;/code&gt; and psql’ing and attaching a debugger. I blow it away all the time with &lt;code&gt;rm -rf ~/local/pgsql/data &amp;amp;&amp;amp; ~/local/bin/initdb -D ~/local/pgsql/data&lt;/code&gt;. I crash it all the time because that’s how it goes when I’m writing C. ;-) That’s not where my timetracking data should live.&lt;/p&gt;

&lt;p&gt;My first attempt was to build Postgres like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-bash"&gt;version=17devel
./configure \
  'CFLAGS=-ggdb -Og -g3 -fno-omit-frame-pointer' \
  --enable-tap-tests --enable-cassert --enable-debug \
  --prefix=/usr/lib/postgresql/${version} \
  --datarootdir=/usr/share/postgresql/${version}
make clean &amp;amp;&amp;amp; make world &amp;amp;&amp;amp; sudo make install-world&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(I might as well keep some dev stuff in there in case I need it.)&lt;/p&gt;

&lt;p&gt;Then as the &lt;code&gt;postgres&lt;/code&gt; user I tried this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;postgres@tal:~$ pg_createcluster 17devel main
Error: invalid version '17devel'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Alas!&lt;/p&gt;

&lt;p&gt;Ubuntu’s multi-version system is controlled by the &lt;code&gt;postgresql-common&lt;/code&gt; package, so I got the source for it by running &lt;code&gt;apt-get source postgresql-common&lt;/code&gt;. (You might need to uncomment a &lt;code&gt;deb-src&lt;/code&gt; line in &lt;code&gt;/etc/apt/sources.list&lt;/code&gt; and run &lt;code&gt;sudo apt-get update&lt;/code&gt;.) Grepping for “invalid version” I found the message in &lt;code&gt;pg_createcluster&lt;/code&gt; from these lines:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-perl"&gt;my ($version) = $ARGV[0] =~ /^(\d+\.?\d+)$/;
error "invalid version '$ARGV[0]'" unless defined $version;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
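
&lt;p&gt;To see what that pattern accepts, here is a small sketch that applies the same regular expression in Python instead of Perl. The helper name &lt;code&gt;is_valid_version&lt;/code&gt; is made up for illustration and is not part of postgresql-common.&lt;/p&gt;

```python
import re

# Same pattern as the pg_createcluster check:
# one or more digits, an optional dot, then one or more digits.
VERSION_RE = re.compile(r"\d+\.?\d+")


def is_valid_version(arg):
    """Return True if arg would pass the version check (hypothetical helper)."""
    return VERSION_RE.fullmatch(arg) is not None


for candidate in ["17devel", "17", "9.6", "9"]:
    print(candidate, is_valid_version(candidate))
```

&lt;p&gt;Anything with a non-numeric suffix fails. So does a single digit like &lt;code&gt;9&lt;/code&gt;, because the pattern needs at least two digits.&lt;/p&gt;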

&lt;p&gt;Instead of fighting with the system I decided to call it version 30. It worked!&lt;/p&gt;

&lt;p&gt;Except I had one last problem:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;postgres@tal:~$ psql -p 5443
psql: error: connection to server on socket "/tmp/.s.PGSQL.5443" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The issue is that the postgresql-common infrastructure dispatches to the latest tools by default, and our “version 30” psql is looking in the wrong place for a socket file. In &lt;code&gt;postgresql.conf&lt;/code&gt; you can see this line:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And taking a peek we have:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;paul@tal:~$ ls -A /var/run/postgresql/
10-main.pg_stat_tmp  13-main.pid           9.4-main.pg_stat_tmp  .s.PGSQL.5433.lock  .s.PGSQL.5437       .s.PGSQL.5440.lock
10-main.pid          14-main.pg_stat_tmp   9.4-main.pid          .s.PGSQL.5434       .s.PGSQL.5437.lock  .s.PGSQL.5441
11-main.pg_stat_tmp  14-main.pid           9.5-main.pg_stat_tmp  .s.PGSQL.5434.lock  .s.PGSQL.5438       .s.PGSQL.5441.lock
11-main.pid          15-main.pid           9.5-main.pid          .s.PGSQL.5435       .s.PGSQL.5438.lock  .s.PGSQL.5442
12-main.pg_stat_tmp  30-main.pid           9.6-main.pg_stat_tmp  .s.PGSQL.5435.lock  .s.PGSQL.5439       .s.PGSQL.5442.lock
12-main.pid          9.3-main.pg_stat_tmp  9.6-main.pid          .s.PGSQL.5436       .s.PGSQL.5439.lock  .s.PGSQL.5443
13-main.pg_stat_tmp  9.3-main.pid          .s.PGSQL.5433         .s.PGSQL.5436.lock  .s.PGSQL.5440       .s.PGSQL.5443.lock&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Yeah I run a lot of versions. :-)&lt;/p&gt;

&lt;p&gt;This is one way to fix the problem:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;postgres@tal:~$ PGCLUSTER=14/main psql -p 5443
psql (17devel)
Type "help" for help.

postgres=#&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But that’s too annoying, and the &lt;code&gt;\d&lt;/code&gt; commands are going to be broken because they won’t know how to query the latest &lt;code&gt;pg_*&lt;/code&gt; tables. (And by the way, why does psql still say it’s &lt;code&gt;17devel&lt;/code&gt;? I haven’t looked into that yet but it’s suspicious.&lt;sup&gt;1&lt;/sup&gt;) And in fact even using &lt;code&gt;PGCLUSTER=30/main psql&lt;/code&gt; still works!&lt;/p&gt;

&lt;p&gt;I think it’s a bug in this Perl code from &lt;code&gt;/usr/bin/psql&lt;/code&gt;:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-perl"&gt;# if only a port is specified, look for local cluster on specified port
if ($explicit_port and not $version and not $cluster and not $explicit_host and not $explicit_service) {
    LOOP: foreach my $v (reverse get_versions()) {
        foreach my $c (get_version_clusters $v) {
            my $p = get_cluster_port $v, $c;
            if ($p eq $explicit_port) {
                $version = $v;
                # set PGCLUSTER variable for information
                $ENV{PGCLUSTER} = "$version/$c";
                last LOOP;
            }
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can see that it sets &lt;code&gt;$version&lt;/code&gt; but not &lt;code&gt;$cluster&lt;/code&gt; (just &lt;code&gt;$ENV{PGCLUSTER}&lt;/code&gt;). Later if &lt;code&gt;$cluster&lt;/code&gt; is set then it will look up the correct socket dir, but it’s only set if we’re explicit. Personally I’m fixing this by adding &lt;code&gt;$cluster = $c;&lt;/code&gt; right before the &lt;code&gt;$version = $v&lt;/code&gt; line. Then we’ll call &lt;code&gt;get_cluster_socketdir&lt;/code&gt; below. It might not be 100% correct but it is good enough for my purposes.&lt;/p&gt;

&lt;p&gt;So now I have a custom-patched Postgres running on Ubuntu! I see its &lt;code&gt;/etc&lt;/code&gt; files, its data files, and its log file. After &lt;code&gt;systemctl daemon-reload&lt;/code&gt; I can start it etc. So I think I’m all set. I’d just better re-run &lt;code&gt;./configure --prefix=~/local&lt;/code&gt; before I forget and re-install something broken on top of it. :-)&lt;/p&gt;

&lt;p&gt;If I run into more problems, I’ll update this post.&lt;/p&gt;
&lt;hr style="border-bottom: 1px solid #999; width: 50%"&gt;&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; Oh, the answer is simple. From &lt;code&gt;/usr/bin/psql&lt;/code&gt;:&lt;/p&gt;
&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-perl"&gt;# if we have no version yet, use the latest version. If we were called as psql,
# pg_archivecleanup, or pg_isready, always use latest version
if (not $version or $cmdname =~ /^(psql|pg_archivecleanup|pg_isready)$/) {
    my $max_version;
    if ($version and $version &amp;lt; 9.2) { # psql 15 only supports PG 9.2+
        $max_version = 14;
    }
    $version = get_newest_version($cmdname, $max_version);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But that means most of the last paragraph was wrong. Since the non-self-compiled tools find the socket file just fine, there must be a better solution than patching &lt;code&gt;psql&lt;/code&gt; (which is technically &lt;code&gt;pg_wrapper&lt;/code&gt; btw). So we are not done. Stay tuned for the, ahem, sequel!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2023-06-29:/posts/2023/06/rails-dirty-methods/</id>
    <title type="html">Rails dirty methods</title>
    <published>2023-06-29T00:00:00Z</published>
    <updated>2023-06-29T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2023/06/rails-dirty-methods/" type="text/html"/>
    <content type="html">
&lt;p&gt;Rails has lots of methods to see what attributes have changed on your model. Some tell you the changes you haven’t yet saved; some, the changes you just saved. But the behavior and names of these methods &lt;a href="https://blog.toshima.ru/2017/04/06/saved-change-to-attribute.html"&gt;have&lt;/a&gt; &lt;a href="https://www.fastruby.io/blog/rails/upgrades/active-record-5-1-api-changes.html"&gt;changed&lt;/a&gt; &lt;a href="https://www.bigbinary.com/blog/rails-6-1-adds-_previously_was-attribute-methods"&gt;over&lt;/a&gt; &lt;a href="https://blog.saeloun.com/2020/03/24/rails-attribute_name_previously_changed-accepts-from-and-to-arguments/"&gt;time&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I thought I had a handle on this until I saw &lt;code&gt;saved_change_to_attribute?&lt;/code&gt; and wondered how it differs from &lt;code&gt;attribute_previously_changed?&lt;/code&gt;. Turns out they are identical!&lt;/p&gt;

&lt;p&gt;Well sort of. The spelling I’m used to, &lt;code&gt;attribute_previously_changed?&lt;/code&gt;, comes from &lt;a href="https://github.com/rails/rails/blob/6-1-stable/activemodel/lib/active_model/dirty.rb#L187-L189"&gt;&lt;code&gt;ActiveModel::Dirty&lt;/code&gt;&lt;/a&gt; (and is a bit older), whereas &lt;code&gt;saved_change_to_attribute?&lt;/code&gt; is defined in &lt;a href="https://github.com/rails/rails/blob/6-1-stable/activerecord/lib/active_record/attribute_methods/dirty.rb#L51-L53"&gt;&lt;code&gt;ActiveRecord::AttributeMethods::Dirty&lt;/code&gt;&lt;/a&gt;. Not all ActiveModels are ActiveRecords. But in your ActiveRecord classes, they do the same thing.&lt;/p&gt;

&lt;p&gt;I’ve linked to Rails 6.1 here. They were nearly identical before that, but for a while one took extra options and the other didn’t. You have to go back to Rails 5.0 to get a more substantial difference, when we had &lt;code&gt;attribute_previously_changed?&lt;/code&gt; but not &lt;code&gt;saved_change_to_attribute?&lt;/code&gt;. They are still identical today in Rails 7. I’m surprised they don’t deprecate the ActiveRecord methods and just use ActiveModel.&lt;/p&gt;

&lt;p&gt;Just to give a quick catalog, here is the full set of methods. Anywhere you see &lt;code&gt;attribute&lt;/code&gt; you can replace it with the name of the attribute you care about (which just calls the generic method with its name as parameter).&lt;/p&gt;

&lt;p&gt;before you save:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;changes
changed_attributes              # can't replace "attribute"
attribute_change
attribute_changed?
attribute_was

changes_to_save
has_changes_to_save?
attributes_in_database          # can't replace "attribute"
attribute_in_database
changed_attribute_names_to_save # can't replace "attribute"
attribute_change_to_be_saved
will_save_change_to_attribute?&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;after you save:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;previous_changes
attribute_previous_change
attribute_previously_changed?
attribute_previously_was

saved_changes
saved_changes?
saved_change_to_attribute
saved_change_to_attribute?
attribute_before_last_save&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I’ve grouped the methods from each file, and you can see there are many synonyms.&lt;/p&gt;

&lt;p&gt;By the way, if you are making heavy use of ActiveRecord callbacks and using these methods to trigger them (e.g. &lt;code&gt;after_commit :send_shipped_notification, if: :shipped_at_previously_changed?&lt;/code&gt;), watch out! The conditions on these get evaluated one-by-one, so if some earlier callback saves further changes to the model, your old &lt;code&gt;previous_changes&lt;/code&gt; are lost! The callback you expect to get called just doesn’t. I’ve had to debug that failure way too many times.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2019-09-04:/posts/2019/08/sql2011-survey/</id>
    <title type="html">Survey of SQL:2011 Temporal Features</title>
    <published>2019-09-04T00:00:00Z</published>
    <updated>2019-09-04T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2019/08/sql2011-survey/" type="text/html"/>
    <content type="html">
&lt;h1 id="introduction"&gt;Introduction&lt;/h1&gt;

&lt;p&gt;This blog post is a survey of SQL:2011 Temporal features from MariaDB, IBM DB2, Oracle, and MS SQL Server. I’m working on adding temporal features to Postgres, so I wanted to see how other systems interpret the standard.&lt;/p&gt;

&lt;p&gt;If you’re new to temporal databases, you also might enjoy &lt;a href="https://github.com/pjungwir/postgres-temporal-talk"&gt;this talk&lt;/a&gt; I gave at PGCon 2019.&lt;/p&gt;

&lt;p&gt;In this post I cover both application-time (aka valid-time) and system-time, but I focus more on valid-time. Valid-time tracks the history of the thing “out there”, e.g. when a house was remodeled, when an employee got a raise, etc. System-time tracks the history of when you changed the database. In general system-time is more widely available, both as native SQL:2011 features and as extensions/plugins/etc., but is less interesting. It is great for compliance/auditing, but you’re unlikely to build application-level features on it. Also since it’s generated automatically you don’t need special DML commands for it, and it is less important to protect yourself with temporal primary and foreign keys.&lt;/p&gt;

&lt;p&gt;At this point all the major systems I survey have &lt;em&gt;some&lt;/em&gt; temporal support, although none of them support it completely. On top of that the standard itself is quite modest, although in some ways it can be interpreted more or less expansively.&lt;/p&gt;

&lt;h1 id="the_standard"&gt;The Standard&lt;/h1&gt;

&lt;p&gt;I’ll start by giving a quick overview of the standard. Here I’m working from the draft documents (downloaded from &lt;a href="https://modern-sql.com/standard"&gt;here&lt;/a&gt;), and my interpretation may not be correct. If you have any corrections please let me know! Also you can find a more complete description of the standard at &lt;a href="https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf"&gt;this article by Kulkarni and Michels&lt;/a&gt; (pdf).&lt;/p&gt;

&lt;p&gt;In SQL:2011 the gateway to temporal features is a &lt;code&gt;PERIOD&lt;/code&gt;, which is something you declare on your table. It is a range-like structure derived from two existing &lt;code&gt;date&lt;/code&gt; columns. (Actually the standard also supports &lt;code&gt;timestamp&lt;/code&gt; and &lt;code&gt;timestamp with time zone&lt;/code&gt;, but I’ll use &lt;code&gt;date&lt;/code&gt; as a synecdoche throughout this post.)&lt;/p&gt;

&lt;h2 id="periods"&gt;Periods&lt;/h2&gt;

&lt;p&gt;You can declare a valid-time &lt;code&gt;PERIOD&lt;/code&gt; when you create the table or afterwards:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can call the &lt;code&gt;PERIOD&lt;/code&gt; whatever you like &lt;em&gt;except&lt;/em&gt; &lt;code&gt;SYSTEM_TIME&lt;/code&gt;, which is magical and enables system-time features. Both of the &lt;code&gt;PERIOD&lt;/code&gt;’s source columns must be &lt;code&gt;NOT NULL&lt;/code&gt;; if you don’t declare them that way, they are converted automatically. (Most databases do the same thing with a &lt;code&gt;PRIMARY KEY&lt;/code&gt;.) Note that the &lt;code&gt;NOT NULL&lt;/code&gt; requirement means that to represent “forever” or “until further notice” you must use a sentinel value like &lt;code&gt;3000-01-01&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Naturally a &lt;code&gt;PERIOD&lt;/code&gt; adds an implicit constraint that &lt;code&gt;valid_from&lt;/code&gt; must be less than &lt;code&gt;valid_til&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also define a &lt;code&gt;SYSTEM_TIME&lt;/code&gt; period and ask the database to track changes for you:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id        INTEGER,
  sys_from  TIMESTAMP GENERATED ALWAYS AS ROW START,
  sys_til   TIMESTAMP GENERATED ALWAYS AS ROW END,
  PERIOD FOR system_time (sys_from, sys_til)
) WITH SYSTEM VERSIONING;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Technically the standard lets you use &lt;code&gt;DATE&lt;/code&gt; columns for system-time periods, but it’s hard to imagine how that would work in practice. Really anything short of the RDBMS’s finest granularity could “squeeze out” some history.&lt;/p&gt;

&lt;h2 id="primary_keys"&gt;Primary Keys&lt;/h2&gt;

&lt;p&gt;If you have a valid-time &lt;code&gt;PERIOD&lt;/code&gt; then you can declare a temporal primary key when you create the table:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til),
  CONSTRAINT tpk_t PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A temporal primary key is a lot like a normal primary key, except the scalar part (here just &lt;code&gt;id&lt;/code&gt;) &lt;em&gt;does not have to be unique&lt;/em&gt;, as long as rows with the same key don’t overlap in time. In other words you can give product 5 one price today and another tomorrow, and there’s no contradiction. But if you have two rows with the same scalar key covering the same date, that’s a violation of temporal entity integrity.&lt;/p&gt;

&lt;h2 id="foreign_keys"&gt;Foreign Keys&lt;/h2&gt;

&lt;p&gt;Temporal referential integrity is like ordinary referential integrity, except the non-unique nature of temporal primary keys makes it trickier. In a temporal foreign key, the child row’s lifespan must be completely “covered” by one (or more!) rows in the parent table. In other words some parent record must exist for every moment the child record exists. You can declare a temporal foreign key between two tables that both have &lt;code&gt;PERIOD&lt;/code&gt;s, e.g.:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE ch (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  t_id        INTEGER,
  PERIOD FOR valid_at (valid_from, valid_til),
  CONSTRAINT tpk_ch PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
  CONSTRAINT tfk_ch_to_t FOREIGN KEY (t_id, PERIOD valid_at)
    REFERENCES t (id, PERIOD valid_at)
);&lt;/code&gt;&lt;/pre&gt;
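
&lt;p&gt;To make the “covered by one or more rows” rule concrete, here is a rough Python sketch of the check. It is not how any database implements it. It assumes periods are half-open (the start is included, the end is not), that the parent rows have already been filtered to the matching key, and that dates are the finest granularity. The function names are invented for this example.&lt;/p&gt;

```python
from datetime import date, timedelta


def days_in(start, end):
    """Set of dates covered by the half-open period [start, end)."""
    return {start + timedelta(days=n) for n in range((end - start).days)}


def is_covered(child, parents):
    """True if every day of the child period lies inside some parent period.

    child is a (start, end) tuple; parents is a list of (start, end) tuples
    that all share the referenced key (an assumption of this sketch).
    """
    parent_days = set()
    for start, end in parents:
        parent_days.update(days_in(start, end))
    return days_in(*child).issubset(parent_days)


# Two adjacent parent rows that together span all of 2020.
parents = [
    (date(2020, 1, 1), date(2020, 7, 1)),
    (date(2020, 7, 1), date(2021, 1, 1)),
]
print(is_covered((date(2020, 3, 1), date(2020, 9, 1)), parents))
print(is_covered((date(2020, 3, 1), date(2021, 3, 1)), parents))
```

&lt;p&gt;The first child straddles both parent rows and is still valid. The second outlives its parents, so it would violate the foreign key.&lt;/p&gt;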

&lt;h2 id="projecting"&gt;Projecting&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;PERIOD&lt;/code&gt; is not included in the projection when you &lt;code&gt;SELECT * FROM t&lt;/code&gt;. It is questionable whether you can project it explicitly with &lt;code&gt;SELECT *, valid_at FROM t&lt;/code&gt;, but since it’s not a full-fledged data type I’d say probably not.&lt;/p&gt;

&lt;h2 id="filtering"&gt;Filtering&lt;/h2&gt;

&lt;p&gt;Also you can’t reference a &lt;code&gt;PERIOD&lt;/code&gt; in most other contexts, e.g. as a function input, or a &lt;code&gt;GROUP BY&lt;/code&gt; criterion, or when &lt;code&gt;ORDER&lt;/code&gt;ing, or joining. You &lt;em&gt;can&lt;/em&gt; use it in a “period predicate”, which lets you test these period relationships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;overlaps&lt;/li&gt;

&lt;li&gt;equals&lt;/li&gt;

&lt;li&gt;contains&lt;/li&gt;

&lt;li&gt;precedes&lt;/li&gt;

&lt;li&gt;succeeds&lt;/li&gt;

&lt;li&gt;immediately precedes&lt;/li&gt;

&lt;li&gt;immediately succeeds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Either side of the relationship can use a previously-named &lt;code&gt;PERIOD&lt;/code&gt; or an anonymous dynamically-constructed one, e.g.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;x.valid_at OVERLAPS PERIOD(y.valid_from, y.valid_til)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It’s not clear to me where you can use a period predicate, although the standard groups it with other kinds of predicate under the &lt;code&gt;&amp;lt;predicate&amp;gt;&lt;/code&gt; object, so maybe anywhere you like? This &lt;a href="https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html"&gt;browsable BNF grammar&lt;/a&gt; makes it easy to see that a &lt;code&gt;&amp;lt;predicate&amp;gt;&lt;/code&gt; can go anywhere that accepts a &lt;a href="https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#_6_39_boolean_value_expression"&gt;boolean expression&lt;/a&gt;, which can be used in a &lt;code&gt;&amp;lt;search condition&amp;gt;&lt;/code&gt;, which is what you put into your &lt;code&gt;WHERE&lt;/code&gt; clause, or a join’s &lt;code&gt;ON&lt;/code&gt;, or a &lt;code&gt;CASE WHEN&lt;/code&gt;, or lots of other places. If you have a firmer read of the standard here, let me know!&lt;/p&gt;

&lt;p&gt;Also there is a special syntax for querying based on system-time. The standard doesn’t mention using it for valid-time, although you could imagine doing it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t FOR SYSTEM_TIME AS OF t1
SELECT * FROM t FOR SYSTEM_TIME BETWEEN t1 AND t2
SELECT * FROM t FOR SYSTEM_TIME BETWEEN SYMMETRIC t1 AND t2
SELECT * FROM t FOR SYSTEM_TIME FROM t1 TO t2&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you ask for a limited time range, the start/end columns do &lt;em&gt;not&lt;/em&gt; get truncated to match your request. In other words, if you query &lt;code&gt;FOR SYSTEM_TIME BETWEEN '2000-01-01' AND '2020-01-01'&lt;/code&gt;, your result records’ &lt;code&gt;sys_til&lt;/code&gt; attributes are still &lt;code&gt;3000-01-01&lt;/code&gt; (or whatever your sentinel is).&lt;/p&gt;

&lt;h2 id="dml"&gt;DML&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;UPDATE&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt; commands you can restrict the timespan you want changed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;UPDATE  t
FOR PORTION OF valid_at FROM t1 TO t2
SET     ...
...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;DELETE FROM t
FOR PORTION OF valid_at FROM t1 TO t2
...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These commands may require special transformations if they “hit” only part of an existing record. For example if you delete the middle of a longer timespan, then you need to replace the old big record with your new version &lt;em&gt;plus&lt;/em&gt; two short records (one on each end). An update is the same: after changing the targeted portion, you’d have to insert new records to preserve each end of the original. The standard gives careful instructions here: the RDBMS should include these extra inserts within the “primary effect” of the operation.&lt;/p&gt;
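
&lt;p&gt;Here is a small Python sketch of that splitting logic, just to illustrate the idea. It assumes half-open periods and a targeted portion whose start is not after its end. The function name is made up and does not come from the standard or any database.&lt;/p&gt;

```python
from datetime import date


def split_for_portion(row_start, row_end, t1, t2):
    """Split the row period [row_start, row_end) around the portion [t1, t2).

    Returns (hit, leftovers). hit is the slice the UPDATE or DELETE touches,
    or None if the portion misses the row. leftovers are the untouched pieces
    that must be re-inserted to preserve the rest of the original record.
    """
    def clamp(t):
        # Pull t inside the bounds of the existing row.
        return min(max(t, row_start), row_end)

    before = (row_start, clamp(t1))
    hit = (clamp(t1), clamp(t2))
    after = (clamp(t2), row_end)

    # Drop empty slices, i.e. those whose start equals their end.
    leftovers = [p for p in (before, after) if p[0] != p[1]]
    return (hit if hit[0] != hit[1] else None), leftovers


hit, leftovers = split_for_portion(
    date(2020, 1, 1), date(2021, 1, 1),  # existing record
    date(2020, 4, 1), date(2020, 7, 1),  # FOR PORTION OF ... FROM t1 TO t2
)
print(hit)
print(leftovers)
```

&lt;p&gt;Hitting the middle of the record leaves two leftovers, one on each end. If the portion reaches past one end of the record there is only one leftover, and if it covers the whole record there are none.&lt;/p&gt;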

&lt;p&gt;There is no need for any special syntax for &lt;code&gt;INSERT&lt;/code&gt;, nor for special transformations.&lt;/p&gt;

&lt;p&gt;The standard doesn’t have anything to say about a &lt;code&gt;MERGE&lt;/code&gt; statement (in Postgres &lt;code&gt;ON CONFLICT DO UPDATE&lt;/code&gt;), except in the case of system-time tables, where there is no new syntax and it does what you’d expect.&lt;/p&gt;

&lt;h2 id="questions"&gt;Questions&lt;/h2&gt;

&lt;p&gt;Since a &lt;code&gt;PERIOD&lt;/code&gt; is attached to a table and isn’t part of the relational model, it isn’t part of a result set. It gets lost when you query a table. That makes it hard to query non-table temporal data, like views, subqueries, CTEs, and set-returning functions. (This was &lt;a href="/posts/2017/12/temporal-databases-bibliography/"&gt;a major criticism of the original TSQL2 proposal&lt;/a&gt; from the 90s.) Nonetheless I can imagine how SQL:2011 leaves open some workarounds, e.g. by letting you use anonymous &lt;code&gt;PERIOD&lt;/code&gt;s inside period predicates, and letting you use period predicates as widely as possible. Also you could argue that projecting a &lt;code&gt;PERIOD&lt;/code&gt; is unnecessary since you already have the start and end columns. So &lt;em&gt;if&lt;/em&gt; an RDBMS gave you deep support for period predicates, composing temporal results would still be possible—albeit awkward. In practice though, no one does this, as we will see.&lt;/p&gt;

&lt;p&gt;SQL:2011 also has no support for joining temporal results. You can effect an inner join with the &lt;code&gt;OVERLAPS&lt;/code&gt; operator, but not the other kinds.&lt;/p&gt;

&lt;p&gt;Snodgrass suggested that temporal databases should “coalesce” results before presenting them or at least before saving them to a table. Coalescing means that when two rows have adjacent or overlapping timespans and all other attributes are identical, they get merged to become just one row. Duplicates are removed. This seems like good behavior, both for clarity and to avoid cutting up your data more and more finely as time goes on, but SQL:2011 doesn’t mention it.&lt;/p&gt;
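
&lt;p&gt;As an illustration only, coalescing could be sketched in Python like this. The sketch assumes half-open date periods and expands every period into individual days, which is simple but wasteful. A real implementation would compare endpoints instead. The &lt;code&gt;coalesce&lt;/code&gt; function and its row layout are inventions for this example.&lt;/p&gt;

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def coalesce(rows):
    """Merge rows with equal attributes whose [start, end) periods touch or overlap.

    rows is a list of (attrs, start, end) tuples; attrs must be hashable.
    """
    # Collect every day covered by each distinct set of attributes.
    days_by_attrs = defaultdict(set)
    for attrs, start, end in rows:
        for n in range((end - start).days):
            days_by_attrs[attrs].add(start + n * ONE_DAY)

    # Turn each set of days back into maximal runs of consecutive days.
    result = []
    for attrs, days in days_by_attrs.items():
        ordered = sorted(days)
        run_start = prev = ordered[0]
        for day in ordered[1:]:
            if day - prev != ONE_DAY:
                result.append((attrs, run_start, prev + ONE_DAY))
                run_start = day
            prev = day
        result.append((attrs, run_start, prev + ONE_DAY))
    return result


rows = [
    ("price=5", date(2020, 1, 1), date(2020, 2, 1)),
    ("price=5", date(2020, 2, 1), date(2020, 3, 1)),  # adjacent: gets merged
    ("price=5", date(2020, 6, 1), date(2020, 7, 1)),  # gap: stays separate
    ("price=7", date(2020, 1, 15), date(2020, 2, 15)),
]
for row in coalesce(rows):
    print(row)
```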

&lt;p&gt;There is also no explicit mention of how triggers combine with the new temporal DML operations.&lt;/p&gt;

&lt;h1 id="mariadb"&gt;MariaDB&lt;/h1&gt;

&lt;p&gt;MySQL doesn’t support any temporal features, but recent versions of MariaDB &lt;a href="https://mariadb.com/kb/en/library/temporal-data-tables/"&gt;have started to add support&lt;/a&gt;. Version 10.3.4 (released Jan 2018) included system-time support; Version 10.4.3 (Feb 2019), valid-time.&lt;/p&gt;

&lt;h2 id="system_time"&gt;System Time&lt;/h2&gt;

&lt;p&gt;MariaDB supports the normal syntax for declaring system-time tables, but you can also use this abbreviated syntax if you like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id INT
) WITH SYSTEM VERSIONING;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That will automatically add pseudo-columns named &lt;code&gt;ROW_START&lt;/code&gt; and &lt;code&gt;ROW_END&lt;/code&gt; (which also don’t appear in &lt;code&gt;SELECT * FROM t&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Or the standard syntax works too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id        INTEGER,
  sys_from  TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
  sys_til   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (sys_from, sys_til)
) WITH SYSTEM VERSIONING;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Either way, for a &lt;code&gt;timestamp(6)&lt;/code&gt; column (which is what the docs use) it looks like the max future date is 2038:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MariaDB [temporal]&amp;gt; insert into t (id) values (2);
Query OK, 1 row affected (0.008 sec)

MariaDB [temporal]&amp;gt; select * from t2;
+------+----------------------------+----------------------------+
| id   | valid_from                 | valid_til                  |
+------+----------------------------+----------------------------+
|    2 | 2019-07-27 17:07:51.849190 | 2038-01-18 19:14:07.999999 |
+------+----------------------------+----------------------------+
1 row in set (0.004 sec)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That seems awfully soon to me.&lt;/p&gt;
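
&lt;p&gt;The 2038 limit lines up with the largest second count a signed 32-bit Unix timestamp can hold, which matches the documented upper bound of the &lt;code&gt;TIMESTAMP&lt;/code&gt; type. This short Python check shows the arithmetic. The UTC-8 offset is my assumption about the session time zone behind the output above.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

# Largest value of a signed 32-bit count of seconds since the Unix epoch.
MAX_EPOCH_SECONDS = 2**31 - 1

utc_limit = datetime.fromtimestamp(MAX_EPOCH_SECONDS, tz=timezone.utc)
print(utc_limit.isoformat())

# Assumption: the session that produced the output above was at UTC-8.
local_limit = utc_limit.astimezone(timezone(timedelta(hours=-8)))
print(local_limit.isoformat())
```

&lt;p&gt;Shifted to UTC-8, that instant is 2038-01-18 19:14:07, the same second as the sentinel shown above.&lt;/p&gt;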

&lt;p&gt;You can use these three ways of asking for system-time filters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-01';
SELECT * FROM t FOR SYSTEM_TIME FROM '2020-01-01' TO '2030-01-01';
SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2030-01-01';&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;MariaDB doesn’t know about &lt;code&gt;BETWEEN SYMMETRIC&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also say &lt;code&gt;FOR SYSTEM_TIME ALL&lt;/code&gt;, which is useful because the default (with no &lt;code&gt;FOR SYSTEM_TIME&lt;/code&gt; at all) is to filter &lt;code&gt;AS OF NOW()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;MariaDB partially addresses the composability problem by letting you say &lt;code&gt;FOR SYSTEM_TIME&lt;/code&gt; against a view, which “pushes down” the filter to the underlying tables. This even works if the view queries some non-system-time tables. Since every system-time &lt;code&gt;PERIOD&lt;/code&gt; is named the same thing, the database can sensibly interpret &lt;code&gt;FOR SYSTEM_TIME&lt;/code&gt; against your view.&lt;/p&gt;

&lt;h3 id="systemtime_partitions"&gt;System-Time Partitions&lt;/h3&gt;

&lt;p&gt;To prevent tables getting too large, you can automatically partition a table by its &lt;code&gt;SYSTEM_TIME&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id INT
) WITH SYSTEM VERSIONING
  PARTITION BY SYSTEM_TIME (
    PARTITION p_hist HISTORY,
    PARTITION p_curr CURRENT
  );&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That will keep current records in one partition and historical records in another. You can also have multiple historical partitions and ask the system to switch to the next one every &lt;code&gt;n&lt;/code&gt; rows. You can also drop older partitions to keep your data growth under control.&lt;/p&gt;

&lt;h3 id="excluded_columns"&gt;Excluded Columns&lt;/h3&gt;

&lt;p&gt;To further economize on disk, you can qualify specific columns as &lt;code&gt;WITHOUT SYSTEM VERSIONING&lt;/code&gt; to exclude them from history.&lt;/p&gt;

&lt;h2 id="application_time"&gt;Application Time&lt;/h2&gt;

&lt;p&gt;Declaring an application-time &lt;code&gt;PERIOD&lt;/code&gt; works, but you can’t include a temporal &lt;code&gt;PRIMARY KEY&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til),
  -- This next line breaks!:
  CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Naturally you can’t create temporal foreign keys either.&lt;/p&gt;

&lt;p&gt;If you omit the &lt;code&gt;NOT NULL&lt;/code&gt; for the &lt;code&gt;PERIOD&lt;/code&gt; source columns (as above), they become &lt;code&gt;NOT NULL&lt;/code&gt; automatically.&lt;/p&gt;

&lt;h3 id="dml_2"&gt;DML&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;UPDATE&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt; statements you can use &lt;code&gt;FOR PORTION OF valid_at&lt;/code&gt;, per the standard. You can’t use an anonymous period:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;UPDATE  t
FOR PORTION OF PERIOD (valid_from, valid_til)
SET     ...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(It’s hard to imagine why you’d want to though.)&lt;/p&gt;

&lt;h3 id="projecting_2"&gt;Projecting&lt;/h3&gt;

&lt;p&gt;You can’t &lt;code&gt;SELECT&lt;/code&gt; a period, named or anonymous:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t;
SELECT *, valid_at FROM t;
SELECT *, PERIOD (valid_from, valid_til) FROM t;&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id="filtering_2"&gt;Filtering&lt;/h3&gt;

&lt;p&gt;In a &lt;code&gt;SELECT&lt;/code&gt; you can’t use &lt;code&gt;FOR valid_at&lt;/code&gt; to filter things. That’s a little sad but perhaps understandable since arguably the standard only requires &lt;code&gt;FOR SYSTEM_TIME&lt;/code&gt;. But period predicates don’t work either. These were all errors for me:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t WHERE valid_at CONTAINS '2020-01-01';
SELECT * FROM t WHERE valid_at OVERLAPS PERIOD('2020-01-01', '2030-01-01');&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So if you want to ask questions about your valid-time history, you need to query against the scalar date columns.&lt;/p&gt;

&lt;h3 id="triggers"&gt;Triggers&lt;/h3&gt;

&lt;p&gt;You can declare triggers on valid-time tables, and the triggers &lt;em&gt;do&lt;/em&gt; fire for the extra inserts. Here is what I did to test things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE thist (
  id INTEGER,
  old_valid_from DATE,
  old_valid_til DATE,
  new_valid_from DATE,
  new_valid_til DATE, op CHAR(1)
);

CREATE TRIGGER tins AFTER INSERT ON t
FOR EACH ROW
INSERT INTO thist VALUES 
(NEW.id, NULL, NULL, NEW.valid_from, NEW.valid_til, 'i');

CREATE TRIGGER tupd AFTER UPDATE ON t
FOR EACH ROW
INSERT INTO thist VALUES
(NEW.id, OLD.valid_from, OLD.valid_til, NEW.valid_from, NEW.valid_til, 'u');

CREATE TRIGGER tdel AFTER DELETE ON t
FOR EACH ROW
INSERT INTO thist VALUES
(OLD.id, OLD.valid_from, OLD.valid_til, NULL, NULL, 'd');&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you &lt;code&gt;UPDATE&lt;/code&gt; in the middle of a larger record, you get two &lt;code&gt;INSERT&lt;/code&gt;s for the unaltered ends followed by an &lt;code&gt;UPDATE&lt;/code&gt; of the middle. (The &lt;code&gt;INSERT&lt;/code&gt;s come first.) The &lt;code&gt;NEW.valid_from&lt;/code&gt; and &lt;code&gt;NEW.valid_til&lt;/code&gt; mark the part that is being inserted/updated, as you’d expect.&lt;/p&gt;

&lt;p&gt;If you &lt;code&gt;DELETE&lt;/code&gt; in the middle of a larger record, you also get two &lt;code&gt;INSERT&lt;/code&gt;s followed by a &lt;code&gt;DELETE&lt;/code&gt; of the part you touched. In the delete trigger the &lt;code&gt;OLD.valid_{from,til}&lt;/code&gt; columns have their actual old values, not the slice you’re deleting. (This probably makes sense, but it feels a little too mechanical/literal. It means your &lt;code&gt;DELETE&lt;/code&gt; trigger doesn’t know what slice of history you’re actually removing.)&lt;/p&gt;

&lt;h2 id="bitemporal"&gt;Bitemporal&lt;/h2&gt;

&lt;p&gt;You can also define bitemporal tables!&lt;/p&gt;

&lt;h1 id="ibm_db2"&gt;IBM DB2&lt;/h1&gt;

&lt;p&gt;DB2 has the fullest temporal support of all the databases I examined. My tests used version 11.5.0.0 on Linux.&lt;/p&gt;

&lt;h2 id="system_time_2"&gt;System Time&lt;/h2&gt;

&lt;p&gt;System-time works with a few syntax differences:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id        INTEGER NOT NULL PRIMARY KEY,
  sys_from  TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
  sys_til   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
  PERIOD SYSTEM_TIME (sys_from, sys_til)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You have to omit &lt;code&gt;WITH SYSTEM VERSIONING&lt;/code&gt;, and you have to explicitly make the period source columns &lt;code&gt;NOT NULL&lt;/code&gt;. Also you say &lt;code&gt;GENERATED ALWAYS AS ROW BEGIN&lt;/code&gt; not &lt;code&gt;GENERATED ALWAYS AS ROW START&lt;/code&gt;. Finally it is &lt;code&gt;PERIOD SYSTEM_TIME&lt;/code&gt; not &lt;code&gt;PERIOD FOR SYSTEM_TIME&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The sentinel for “forever” is &lt;code&gt;9999-12-30-00.00.00.000000000000&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id="application_time_2"&gt;Application Time&lt;/h2&gt;

&lt;p&gt;DB2 supports many valid-time features—but only if you name the period &lt;code&gt;BUSINESS_TIME&lt;/code&gt;. At IBM, it’s always business time! (I am shamelessly stealing this joke from my audience at PGCon 2019.)&lt;/p&gt;

&lt;p&gt;Valid-time periods have the same syntax quirks as system-time.&lt;/p&gt;

&lt;p&gt;You can define temporal primary keys!&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/intro/src/tpc/db2z_integrity.html"&gt;the docs&lt;/a&gt; you can define temporal foreign keys, but I couldn’t make it work:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db2 =&amp;gt; create table t2 (id integer not null, valid_from date not null, valid_til date not null, \
db2 (cont.) =&amp;gt; t_id integer, period business_time (valid_from, valid_til), \
db2 (cont.) =&amp;gt; constraint t2pk primary key (id, business_time without overlaps), \
db2 (cont.) =&amp;gt; constraint tfk foreign key (t_id, period business_time) \
db2 (cont.) =&amp;gt; references t (id, period business_time));
DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.  During SQL processing it returned:
SQL0104N  An unexpected token "business_time" was found following "gn key 
(t_id, period".  Expected tokens may include:  "&amp;lt;space&amp;gt;".  SQLSTATE=42601&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Someone else can’t make it work either, according to &lt;a href="https://www.ibm.com/developerworks/community/forums/html/topic?id=440e07ad-23ee-4b0a-ae23-8c747abca819"&gt;this forum thread&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ALTER TABLE&lt;/code&gt; failed for me too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db2 =&amp;gt; create table t2 (id integer not null, valid_from date not null, valid_til date not null, \
db2 (cont.) =&amp;gt; t_id integer, period business_time (valid_from, valid_til), \
db2 (cont.) =&amp;gt; constraint t2pk primary key (id, business_time without overlaps));
DB20000I  The SQL command completed successfully.
db2 =&amp;gt; alter table t2 add constraint tfk foreign key (t_id, period business_time) \
db2 (cont.) =&amp;gt; references t (id, period business_time);
DB21034E  The command was processed as an SQL statement because it was not a 
valid Command Line Processor command.  During SQL processing it returned:
SQL0104N  An unexpected token "business_time" was found following "gn key 
(t_id, period".  Expected tokens may include:  "&amp;lt;space&amp;gt;".  SQLSTATE=42601&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If I learn a way to make it work, I’ll update this article.&lt;/p&gt;

&lt;h3 id="projecting_3"&gt;Projecting&lt;/h3&gt;

&lt;p&gt;As usual &lt;code&gt;SELECT * FROM t&lt;/code&gt; does not give you the period, and &lt;code&gt;SELECT *, valid_at FROM t&lt;/code&gt; is an error. Periods are not first-class types.&lt;/p&gt;

&lt;h3 id="filtering_3"&gt;Filtering&lt;/h3&gt;

&lt;p&gt;Nicely, DB2 interprets the standard generously and lets you use the system-time &lt;code&gt;SELECT&lt;/code&gt; syntax for application-time too:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t FOR business_time FROM t1 TO t2
SELECT * FROM t FOR business_time AS OF t1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;but not:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t FOR business_time BETWEEN t1 AND t2&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I couldn’t get any of the period predicates to work, e.g.:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db2 =&amp;gt; select * from t where business_time contains '2015-01-01';
SQL0104N  An unexpected token "contains" was found following "where 
business_time".  Expected tokens may include:  "CONCAT".  SQLSTATE=42601&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I also couldn’t do anything creative with anonymous periods, e.g.:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db2 =&amp;gt; select * from t for period(valid_from, valid_til) as of '2015-01-01';
SQL0104N  An unexpected token "period" was found following "select * from t 
for".  Expected tokens may include:  "&amp;lt;space&amp;gt;".  SQLSTATE=42601&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It doesn’t even help if you call the anonymous period &lt;code&gt;business_time&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db2 =&amp;gt; select * from t for period business_time(valid_from, valid_til) as of '2015-01-01';
SQL0104N  An unexpected token "period business_time" was found following 
"select * from t for".  Expected tokens may include:  "&amp;lt;space&amp;gt;".  
SQLSTATE=42601&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That means temporal features are going to break down when used with views, subqueries, CTEs, and set-returning functions. A period is tied to a table, but not a result set.&lt;/p&gt;

&lt;h3 id="dml_3"&gt;DML&lt;/h3&gt;

&lt;p&gt;IBM DML is pretty standard. You can &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt; &lt;code&gt;FOR PORTION OF BUSINESS_TIME FROM '2010-06-01' TO '2010-06-15'&lt;/code&gt;. The extra &lt;code&gt;INSERT&lt;/code&gt;s happen as expected.&lt;/p&gt;

&lt;h3 id="triggers_2"&gt;Triggers&lt;/h3&gt;

&lt;p&gt;Like MariaDB, DB2 does call triggers for the derived &lt;code&gt;INSERT&lt;/code&gt;s. Here is some setup to add a row to &lt;code&gt;thist&lt;/code&gt; whenever a trigger gets called:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;create table thist (id integer, old_valid_from date, old_valid_til date, new_valid_from date, new_valid_til date, op char(1));

create trigger tins after insert on t referencing new as new \
for each row insert into thist values \
(NEW.id, null, null, NEW.valid_from, NEW.valid_til, 'i');

create trigger tupd after update on t referencing old as old new as new \
for each row insert into thist values \
(NEW.id, OLD.valid_from, OLD.valid_til, NEW.valid_from, NEW.valid_til, 'u');

create trigger tdel after delete on t referencing old as old \
for each row insert into thist values \
(OLD.id, OLD.valid_from, OLD.valid_til, null, null, 'd');&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we &lt;code&gt;UPDATE FOR PORTION OF&lt;/code&gt; in the middle of a larger record, our &lt;code&gt;INSERT&lt;/code&gt; trigger is called twice:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;db2 =&amp;gt; update t \
db2 (cont.) =&amp;gt; for portion of business_time \
db2 (cont.) =&amp;gt; from '2015-01-01' to '2016-01-01' \
db2 (cont.) =&amp;gt; set foo = 'bar';
DB20000I  The SQL command completed successfully.
db2 =&amp;gt; select * from t;

ID          VALID_FROM VALID_TIL  FOO       
----------- ---------- ---------- ----------
          1 01/01/2015 01/01/2016 bar       
          1 01/01/2020 01/01/2030 -         
          1 01/01/2010 01/01/2015 -         
          1 01/01/2016 01/01/2020 -         

  4 record(s) selected.

db2 =&amp;gt; select * from thist;

ID          OLD_VALID_FROM OLD_VALID_TIL NEW_VALID_FROM NEW_VALID_TIL OP
----------- -------------- ------------- -------------- ------------- --
          1 -              -             01/01/2010     01/01/2015    i 
          1 -              -             01/01/2016     01/01/2020    i 
          1 01/01/2010     01/01/2020    01/01/2015     01/01/2016    u 

  3 record(s) selected.&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="bitemporal_2"&gt;Bitemporal&lt;/h2&gt;

&lt;p&gt;Bitemporal works too!&lt;/p&gt;

&lt;h1 id="oracle"&gt;Oracle&lt;/h1&gt;

&lt;p&gt;For my tests I used Oracle 19c (version 19.3) for Linux and ran it on CentOS 7.&lt;/p&gt;

&lt;h2 id="system_time_3"&gt;System time&lt;/h2&gt;

&lt;p&gt;Oracle has its own way of tracking table history, so it doesn’t bother with SQL:2011 system-time.&lt;/p&gt;

&lt;h2 id="application_time_3"&gt;Application time&lt;/h2&gt;

&lt;p&gt;Oracle lets you declare a &lt;code&gt;PERIOD&lt;/code&gt;, but like MariaDB you can’t define a temporal primary key:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE t (
  id          INTEGER,
  valid_from  DATE,
  valid_til   DATE,
  PERIOD FOR valid_at (valid_from, valid_til),
  -- This next line breaks!:
  CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course that means no foreign keys either.&lt;/p&gt;

&lt;p&gt;One interesting thing is that a &lt;code&gt;PERIOD&lt;/code&gt; doesn’t force your columns to &lt;code&gt;NOT NULL&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; desc t;        
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER(38)
 VALID_FROM                                         DATE
 VALID_TIL                                          DATE&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And that’s because nulls &lt;em&gt;are&lt;/em&gt; allowed in &lt;code&gt;PERIOD&lt;/code&gt;-source columns:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; insert into t values (6, null, null);

1 row created.

SQL&amp;gt; select * from t where id = 6;

        ID VALID_FRO VALID_TIL
---------- --------- ---------
         6&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id="projecting_4"&gt;Projecting&lt;/h3&gt;

&lt;p&gt;When you say &lt;code&gt;SELECT * FROM t&lt;/code&gt; you don’t get the period. You can’t say this either, but in Oracle’s case it’s a parser error:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT *, valid_at FROM t;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This doesn’t work either:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT *, 1+1 FROM t;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But if you avoid the &lt;code&gt;*&lt;/code&gt; you can select it!:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; SELECT id, valid_from, valid_til, valid_at FROM t;

        ID VALID_FRO VALID_TIL   VALID_AT
---------- --------- --------- ----------
         1 01-JAN-00 01-JAN-30      33426
         2 01-JAN-10 01-JAN-30      33426
         3 01-JAN-20 01-JAN-30      33426
         4 01-JAN-00 01-JAN-10      33426&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The result doesn’t mean much to me though. Anyone have any ideas?&lt;/p&gt;

&lt;h3 id="filtering_4"&gt;Filtering&lt;/h3&gt;

&lt;p&gt;Like in DB2 you &lt;em&gt;are&lt;/em&gt; able to filter by a valid-time period, although the syntax is a little non-standard (and wordy):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; SELECT * FROM t
  2  AS OF PERIOD FOR valid_at DATE '2005-01-01';

        ID VALID_FRO VALID_TIL
---------- --------- ---------
         1 01-JAN-00 01-JAN-30
         4 01-JAN-00 01-JAN-10
         6&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Incidentally, you can see here that &lt;code&gt;NULL&lt;/code&gt; in a period means “unbounded”. You can also make just one of the bounds &lt;code&gt;NULL&lt;/code&gt;, and &lt;code&gt;AS OF&lt;/code&gt; queries give the expected results. This is just like Postgres ranges! If Oracle does this for &lt;code&gt;PERIOD&lt;/code&gt;s, perhaps Postgres should too?&lt;/p&gt;
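
&lt;p&gt;That unbounded reading can be sketched as a simple containment check. This is plain JavaScript, not Oracle code. The &lt;code&gt;periodContains&lt;/code&gt; and &lt;code&gt;isBefore&lt;/code&gt; helpers are hypothetical, the period is assumed to be half-open (start included, end excluded), and I’m assuming the two-digit years in the output above are 20xx:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical helper: true when ISO date string a is strictly before b.
function isBefore(a, b) {
  return Math.sign(Date.parse(a) - Date.parse(b)) === -1;
}

// True when instant t falls inside [from, til).
// A null bound means "unbounded" on that side.
function periodContains(from, til, t) {
  if (from !== null) {
    if (isBefore(t, from)) { return false; }  // t is before the start
  }
  if (til !== null) {
    if (!isBefore(t, til)) { return false; }  // t is at or after the end
  }
  return true;
}

// The rows from the examples above, with assumed four-digit years.
var rows = [
  { id: 1, from: '2000-01-01', til: '2030-01-01' },
  { id: 2, from: '2010-01-01', til: '2030-01-01' },
  { id: 3, from: '2020-01-01', til: '2030-01-01' },
  { id: 4, from: '2000-01-01', til: '2010-01-01' },
  { id: 6, from: null, til: null }
];

var ids = rows
  .filter(function (r) { return periodContains(r.from, r.til, '2005-01-01'); })
  .map(function (r) { return r.id; });

console.log(ids); // [ 1, 4, 6 ]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Those are the same three ids the &lt;code&gt;AS OF&lt;/code&gt; query returned.&lt;/p&gt;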

&lt;p&gt;You can use &lt;code&gt;BETWEEN&lt;/code&gt; too, but its syntax is similarly garbled:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; SELECT * FROM t
  2  VERSIONS PERIOD FOR valid_at
  3  BETWEEN DATE '2025-01-01' AND DATE '2035-01-01';

        ID VALID_FRO VALID_TIL
---------- --------- ---------
         2 01-JAN-10 01-JAN-30
         1 01-JAN-00 01-JAN-30
         3 01-JAN-20 01-JAN-30
         6&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anonymous periods don’t seem to work though:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; SELECT * FROM t
  2  AS OF PERIOD FOR (valid_from, valid_til) DATE '2005-01-01';
AS OF PERIOD FOR (valid_from, valid_til) DATE '2005-01-01'
                 *
ERROR at line 2:
ORA-00904: : invalid identifier&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You also can’t use standard period predicates:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; SELECT * FROM t WHERE valid_at CONTAINS DATE '2015-01-01';
SELECT * FROM t WHERE valid_at CONTAINS DATE '2015-01-01'
                               *
ERROR at line 1:
ORA-00920: invalid relational operator


SQL&amp;gt; SELECT * FROM t WHERE valid_at OVERLAPS PERIOD('2015-01-01', '2020-01-01');
SELECT * FROM t WHERE valid_at OVERLAPS PERIOD('2015-01-01', '2020-01-01')
                               *
ERROR at line 1:
ORA-00920: invalid relational operator&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id="dml_4"&gt;DML&lt;/h3&gt;

&lt;p&gt;Oracle doesn’t understand &lt;code&gt;FOR PORTION OF&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SQL&amp;gt; UPDATE t FOR PORTION OF valid_at
  2  FROM DATE '2005-01-01' TO DATE '2006-01-01'
  3  SET id = 8 WHERE id = 1;
UPDATE t FOR PORTION OF valid_at
             *
ERROR at line 1:
ORA-00905: missing keyword


SQL&amp;gt; DELETE FROM t FOR PORTION OF valid_at
  2  FROM DATE '2005-01-01' TO DATE '2006-01-01'
  3  WHERE id = 1;
DELETE FROM t FOR PORTION OF valid_at
              *
ERROR at line 1:
ORA-00933: SQL command not properly ended&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id="triggers_3"&gt;Triggers&lt;/h3&gt;

&lt;p&gt;In Oracle you can define triggers on tables with a valid-time period, but without temporal DML there are no interesting questions about how they should behave. Nonetheless here are the same triggers as above but in Oracle syntax (in case I ever want to test this in the future):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE thist (
  id INTEGER,
  old_valid_from DATE,
  old_valid_til DATE,
  new_valid_from DATE,
  new_valid_til DATE, op CHAR(1)
);

CREATE TRIGGER tins AFTER INSERT ON t
FOR EACH ROW
BEGIN
INSERT INTO thist VALUES 
(:NEW.id, NULL, NULL, :NEW.valid_from, :NEW.valid_til, 'i');
END;
/

CREATE TRIGGER tupd AFTER UPDATE ON t
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
BEGIN
INSERT INTO thist VALUES
(:NEW.id, :OLD.valid_from, :OLD.valid_til, :NEW.valid_from, :NEW.valid_til, 'u');
END;
/

CREATE TRIGGER tdel AFTER DELETE ON t
REFERENCING OLD AS OLD
FOR EACH ROW
BEGIN
INSERT INTO thist VALUES
(:OLD.id, :OLD.valid_from, :OLD.valid_til, NULL, NULL, 'd');
END;
/&lt;/code&gt;&lt;/pre&gt;

&lt;h1 id="ms_sql_server"&gt;MS SQL Server&lt;/h1&gt;

&lt;p&gt;I tested an evaluation copy of MS SQL Server 2017 (version &lt;code&gt;14.0.1000.169, RTM&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;SQL Server doesn’t support application-time periods at all, just system-time.&lt;/p&gt;

&lt;h2 id="system_time_4"&gt;System Time&lt;/h2&gt;

&lt;p&gt;The syntax for system-time tables is just a little non-standard:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE dbo.t (
  id INTEGER PRIMARY KEY,
  valid_from datetime2 GENERATED ALWAYS AS ROW START,
  valid_til datetime2 GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (valid_from, valid_til)
) WITH (
  SYSTEM_VERSIONING = ON
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note the parens, the underscore, and the &lt;code&gt;= ON&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The history is stored in a separate invisible table with a generated name. But you can query that table like any other, so if you want to give it a nicer name you can:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;WITH (
  SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.thist)
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;valid_til&lt;/code&gt; sentinel will be &lt;code&gt;9999-12-31 23:59:59.9999999&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To ask about a certain time you can say any of these:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-01';
SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2030-01-01';
SELECT * FROM t FOR SYSTEM_TIME FROM '2020-01-01' TO '2030-01-01';&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;but not &lt;code&gt;BETWEEN SYMMETRIC&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id="conclusion"&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;So basically everyone has at least one kind of &lt;code&gt;PERIOD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Everyone but Oracle has system-time (and they have another &lt;a href="https://www.oracle.com/database/technologies/high-availability/flashback.html"&gt;older approach&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The only database with temporal primary keys is DB2. They claim to have temporal foreign keys too, but I couldn’t make it work.&lt;/p&gt;

&lt;p&gt;I was pleased that two databases let you select with &lt;code&gt;FOR&lt;/code&gt; and a valid-time period. No one lets you build anonymous periods (in &lt;code&gt;FOR&lt;/code&gt;, &lt;code&gt;FOR PORTION OF&lt;/code&gt;, or elsewhere), and no one supports period predicates.&lt;/p&gt;

&lt;p&gt;With temporal DML, the extra inserts seem to be consistent (between MariaDB and DB2), and both databases fire triggers on them the same way.&lt;/p&gt;

&lt;p&gt;I hope this helps the Postgres community work out their own temporal behavior with respect to the standard. I think it was an interesting study in its own right, too. One thing I learned is that “every other RDBMS supports SQL:2011” is only sort of true, at least as of today. :-)&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2019-01-09:/posts/2019/01/drawing-redux-form-fieldarrays-with-pug/</id>
    <title type="html">Drawing Redux Form FieldArrays with Pug</title>
    <published>2019-01-09T00:00:00Z</published>
    <updated>2019-01-09T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2019/01/drawing-redux-form-fieldarrays-with-pug/" type="text/html"/>
    <content type="html">
&lt;p&gt;Having been spoiled by the Rails &lt;a href="https://words.steveklabnik.com/rails-has-two-default-stacks"&gt;Prime Stack&lt;/a&gt; for nearly a decade, in my React projects I prefer using &lt;a href="https://pugjs.org/api/getting-started.html"&gt;Pug&lt;/a&gt; (née Jade) instead of JSX for its Haml-like syntax. A lot of people praise Pug and Haml for saving them typing, and while that’s nice, the real appeal to me is how easy they are to read. You spend a lot more time reading code than writing it, and Pug/Haml make the document structure immediately obvious. Making closing-tag errors obsolete is pretty nice, too.&lt;/p&gt;

&lt;p&gt;With the &lt;a href="https://github.com/pugjs/babel-plugin-transform-react-pug"&gt;&lt;code&gt;babel-plugin-transform-react-pug&lt;/code&gt; package&lt;/a&gt;, you can replace your JSX with something like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;&lt;span class="reserved"&gt;class&lt;/span&gt; MyApp &lt;span class="reserved"&gt;extends&lt;/span&gt; React.Component {
  render() {
    &lt;span class="keyword"&gt;return&lt;/span&gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
      Provider(store=configureStore(&lt;span class="local-variable"&gt;this&lt;/span&gt;.props))
        table
          tbody
            tr
              td table
              td layout
              td anyone?
    &lt;span class="error"&gt;`&lt;/span&gt;;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But Pug is definitely not as widely adopted within React as Haml is within Rails, and it shows. I ran into a tricky issue using Pug to render &lt;code&gt;FieldArray&lt;/code&gt;s with &lt;a href="https://github.com/erikras/redux-form"&gt;Redux Form&lt;/a&gt; and &lt;a href="https://github.com/react-bootstrap/react-bootstrap"&gt;React Bootstrap&lt;/a&gt;. To combine those two packages I’m basically following &lt;a href="https://medium.com/@jtbennett/using-redux-form-to-handle-user-input-1392826f2c6d"&gt;John Bennett’s advice&lt;/a&gt;, except with Pug.&lt;/p&gt;

&lt;p&gt;Here is how typical scalar fields work:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderInput = ({input, label, type, meta}) =&amp;gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
  FormGroup(controlId=input.name validationState=&lt;span class="predefined"&gt;$&lt;/span&gt;{validState(meta)})
    Col(componentClass=&lt;span class="predefined"&gt;$&lt;/span&gt;{ControlLabel} sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      = label || humanizeString(input.name)
    Col(sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
      FormControl(...input type=type)
&lt;span class="error"&gt;`&lt;/span&gt;

&lt;span class="reserved"&gt;class&lt;/span&gt; EditClientPage &lt;span class="reserved"&gt;extends&lt;/span&gt; React.Component {
  render() {
    &lt;span class="keyword"&gt;return&lt;/span&gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
      Form(horizontal onSubmit=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="local-variable"&gt;this&lt;/span&gt;.props.handleSubmit})
        fieldset
          Field(name=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;name&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; component=renderInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;text&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
          Field(name=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;paymentDue&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; component=renderInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;number&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    &lt;span class="error"&gt;`&lt;/span&gt;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That’s from my time-tracking app where I wrote about using &lt;a href="/posts/2019/01/validating-fieldarrays-in-redux-form/"&gt;&lt;code&gt;FieldArray&lt;/code&gt; with &lt;code&gt;redux-form-validators&lt;/code&gt;&lt;/a&gt;. The &lt;code&gt;Field&lt;/code&gt; component is from Redux Form, and everything else is from React-Bootstrap (&lt;code&gt;Form&lt;/code&gt;, &lt;code&gt;FormGroup&lt;/code&gt;, &lt;code&gt;Col&lt;/code&gt;, &lt;code&gt;ControlLabel&lt;/code&gt;, and &lt;code&gt;FormControl&lt;/code&gt;). You can see that &lt;code&gt;Field&lt;/code&gt; expects a custom component to draw its details, giving you total flexibility in how you structure your form’s DOM. Most of the Bootstrap components go inside the custom component used by &lt;code&gt;Field&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So far that’s pretty nice, but if you have a &lt;code&gt;FieldArray&lt;/code&gt; you need more nesting. A &lt;code&gt;FieldArray&lt;/code&gt; is also part of Redux Form, and is used to draw a list of child records with their sub-fields. In my case I want the page to have one or more “work categories”, each with a name and hourly rate, e.g. “Design” and “Development”.&lt;/p&gt;

&lt;p&gt;Like &lt;code&gt;Field&lt;/code&gt;, a &lt;code&gt;FieldArray&lt;/code&gt; delegates rendering to a custom component. Then that component will render the individual &lt;code&gt;Field&lt;/code&gt;s (each with their own custom component in turn). If you adapted the Redux Form docs’ example, you might try something like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderSimpleInput = ({input, placeholder, type, meta}) =&amp;gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
  span(&lt;span class="reserved"&gt;class&lt;/span&gt;=&lt;span class="predefined"&gt;$&lt;/span&gt;{validClass(meta)})
    FormControl(...input placeholder=placeholder type=type)
&lt;span class="error"&gt;`&lt;/span&gt;

const renderWorkCategories = ({fields, meta}) =&amp;gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
  .noop
    = fields.map((wc, index) =&amp;gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
      FormGroup(key=&lt;span class="predefined"&gt;$&lt;/span&gt;{index})
        Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
          Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.name&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;text&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;name&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
        Col(sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
          Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.rate&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;number&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;rate&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
        Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
          a.btn.btn-&lt;span class="keyword"&gt;default&lt;/span&gt;(onClick=&lt;span class="predefined"&gt;$&lt;/span&gt;{()=&amp;gt;fields.remove(index)}) remove
    &lt;span class="error"&gt;`&lt;/span&gt;)
    FormGroup
      Col(smOffset=&lt;span class="integer"&gt;2&lt;/span&gt; sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
        a.btn.btn-&lt;span class="keyword"&gt;default&lt;/span&gt;(onClick=&lt;span class="predefined"&gt;$&lt;/span&gt;{()=&amp;gt;fields.push({})}) add rate
&lt;span class="error"&gt;`&lt;/span&gt;

&lt;span class="reserved"&gt;class&lt;/span&gt; EditClientPage &lt;span class="reserved"&gt;extends&lt;/span&gt; React.Component {
  render() {
    &lt;span class="keyword"&gt;return&lt;/span&gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
      ...
      FieldArray(name=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;workCategories&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; component=&lt;span class="predefined"&gt;$&lt;/span&gt;{renderWorkCategories})
      ...
    &lt;span class="error"&gt;`&lt;/span&gt;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The problem is that you can’t nest pug strings like that. I’m not sure if the problem is with the Babel transformer or the pug parser itself, but you get an error. Of course that’s not idiomatic Pug anyway, but surprisingly, you can’t use Pug’s &lt;code&gt;each&lt;/code&gt; command either:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderWorkCategories = ({fields, meta}) =&amp;gt; pug&lt;span class="error"&gt;`&lt;/span&gt;
  .noop
    each wc, index &lt;span class="keyword"&gt;in&lt;/span&gt; fields
      FormGroup(key=&lt;span class="predefined"&gt;$&lt;/span&gt;{index})
        Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
          Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.name&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;text&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;name&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
        Col(sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
          Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.rate&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;number&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;rate&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
        Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
          a.btn.btn-&lt;span class="keyword"&gt;default&lt;/span&gt;(onClick=&lt;span class="predefined"&gt;$&lt;/span&gt;{()=&amp;gt;fields.remove(index)}) remove
    ...
&lt;span class="error"&gt;`&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This gives you the error &lt;code&gt;Expected "fields" to be an array because it was passed to each&lt;/code&gt;. Apparently Redux Form is not using a normal array here, but its own special object.&lt;/p&gt;

&lt;p&gt;The trick is to call &lt;code&gt;getAll&lt;/code&gt;, like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;each wc, index &lt;span class="keyword"&gt;in&lt;/span&gt; fields.getAll()
  FormGroup(key=&lt;span class="predefined"&gt;$&lt;/span&gt;{index})
    Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;workCategories[&lt;span class="predefined"&gt;$&lt;/span&gt;{index}].name&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;text&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;name&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    Col(sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
      Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;workCategories[&lt;span class="predefined"&gt;$&lt;/span&gt;{index}].rate&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;number&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;rate&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      a.btn.btn-&lt;span class="keyword"&gt;default&lt;/span&gt;(onClick=&lt;span class="predefined"&gt;$&lt;/span&gt;{()=&amp;gt;fields.remove(index)}) remove
&lt;span class="error"&gt;`&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that we also had to stop using &lt;code&gt;${wc}&lt;/code&gt; and are building the field name “by hand”. Personally I think we can stop here and be done, but if that feels like breaking encapsulation to you, or if you want something more generic that doesn’t need to “know” its &lt;code&gt;FieldArray&lt;/code&gt; name, there is another way to do it. Even if it’s a bit too much for a real project, it’s interesting enough that it’s maybe worth seeing.&lt;/p&gt;
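
&lt;p&gt;To make the naming concrete, here is a small sketch in plain JavaScript with no React or Pug involved. &lt;code&gt;mockFields&lt;/code&gt; is a hypothetical stand-in for the &lt;code&gt;fields&lt;/code&gt; object that Redux Form passes in; it mimics only &lt;code&gt;getAll&lt;/code&gt; and &lt;code&gt;map&lt;/code&gt;. The &lt;code&gt;fieldName&lt;/code&gt; helper and the sample rates are made up too:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;// Hypothetical stand-in for the `fields` prop; it copies only the parts
// of Redux Form's interface used in this post.
function mockFields(name, values) {
  return {
    getAll: function () { return values; },
    // Like Redux Form, map hands the callback each element's name,
    // not its value.
    map: function (callback) {
      return values.map(function (value, index) {
        return callback(name + '[' + index + ']', index);
      });
    }
  };
}

// Build a sub-field name "by hand" from the array name and the index.
function fieldName(arrayName, index, prop) {
  return arrayName + '[' + index + '].' + prop;
}

var fields = mockFields('workCategories', [
  { name: 'Design', rate: 100 },
  { name: 'Development', rate: 120 }
]);

// Iterating over getAll() gives values, so the name has to be rebuilt.
var byHand = fields.getAll().map(function (wc, index) {
  return fieldName('workCategories', index, 'rate');
});

// Iterating with map gives the name prefix directly.
var viaMap = fields.map(function (wc) { return wc + '.rate'; });

console.log(byHand); // [ 'workCategories[0].rate', 'workCategories[1].rate' ]
console.log(viaMap); // [ 'workCategories[0].rate', 'workCategories[1].rate' ]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Both approaches end up with the same strings. The only difference is whether the template has to know the name &lt;code&gt;workCategories&lt;/code&gt;.&lt;/p&gt;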

&lt;p&gt;To start, we need to call &lt;code&gt;fields.map&lt;/code&gt; with another custom component. This almost works:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderWorkCategory = (wc, index) =&amp;gt; &lt;span class="error"&gt;`&lt;/span&gt;pug
  FormGroup(key=&lt;span class="predefined"&gt;$&lt;/span&gt;{index})
    Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.name&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;text&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;name&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    Col(sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
      Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.rate&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;number&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;rate&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      a.btn.btn-&lt;span class="keyword"&gt;default&lt;/span&gt;(onClick=&lt;span class="predefined"&gt;$&lt;/span&gt;{()=&amp;gt;fields.remove(index)}) remove
&lt;span class="error"&gt;`&lt;/span&gt;

const renderWorkCategories = ({fields, meta}) =&amp;gt; &lt;span class="error"&gt;`&lt;/span&gt;pug
  .noop
    = fields.map(renderWorkCategory)
    ...
&lt;span class="error"&gt;`&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The only problem is the remove button: &lt;code&gt;fields&lt;/code&gt; is no longer in scope!&lt;/p&gt;

&lt;p&gt;The solution is to use currying. The component we hand to &lt;code&gt;fields.map&lt;/code&gt; will be a partially-applied function, generated by passing in &lt;code&gt;fields&lt;/code&gt; early. ES6 syntax makes it really easy. The full code looks like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderWorkCategory = (fields) =&amp;gt; (wc, index) =&amp;gt; &lt;span class="error"&gt;`&lt;/span&gt;pug
  FormGroup(key=&lt;span class="predefined"&gt;$&lt;/span&gt;{index})
    Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.name&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;text&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;name&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    Col(sm=&lt;span class="integer"&gt;5&lt;/span&gt;)
      Field(name=&lt;span class="predefined"&gt;$&lt;/span&gt;{&lt;span class="error"&gt;`&lt;/span&gt;&lt;span class="predefined"&gt;$&lt;/span&gt;{wc}.rate&lt;span class="error"&gt;`&lt;/span&gt;} component=renderSimpleInput type=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;number&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt; placeholder=&lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;rate&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;)
    Col(sm=&lt;span class="integer"&gt;2&lt;/span&gt;)
      a.btn.btn-&lt;span class="keyword"&gt;default&lt;/span&gt;(onClick=&lt;span class="predefined"&gt;$&lt;/span&gt;{()=&amp;gt;fields.remove(index)}) remove
&lt;span class="error"&gt;`&lt;/span&gt;

const renderWorkCategories = ({fields, meta}) =&amp;gt; &lt;span class="error"&gt;`&lt;/span&gt;pug
  .noop
    = fields.map(renderWorkCategory(fields))
    ...
&lt;span class="error"&gt;`&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You may recall we also used currying to &lt;a href="...."&gt;combine &lt;code&gt;redux-form-validators&lt;/code&gt; with &lt;code&gt;FieldArray&lt;/code&gt;&lt;/a&gt;. It can really come in handy!&lt;/p&gt;

&lt;p&gt;As I’ve said before, programming one thing is usually easy; it gets hard &lt;a href="https://illuminatedcomputing.com/posts/2017/05/doing-many-things/"&gt;when we try to do many things at once&lt;/a&gt;. Here I showed how to use Pug with a &lt;code&gt;FieldArray&lt;/code&gt; from Redux Form, on a page styled with React Bootstrap. I hope you found it useful if, like me, you are trying to have your cake and eat it too. :-)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; It turns out I was making things too complicated: the callback you give to &lt;code&gt;fields.map&lt;/code&gt; receives an optional third argument, &lt;code&gt;fields&lt;/code&gt;. That means there is no need to curry &lt;code&gt;renderWorkCategory&lt;/code&gt; and pass in &lt;code&gt;fields&lt;/code&gt; early. Instead of this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderWorkCategory = (fields) =&amp;gt; (wc, index) =&amp;gt; ...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;you can just say this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const renderWorkCategory = (wc, index, fields) =&amp;gt; ...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I guess it pays to read the documentation! :-)&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>tag:illuminatedcomputing.com,2019-01-08:/posts/2019/01/validating-fieldarrays-in-redux-form/</id>
    <title type="html">Validating FieldArrays in Redux Form</title>
    <published>2019-01-08T00:00:00Z</published>
    <updated>2019-01-08T00:00:00Z</updated>
    <link rel="alternate" href="https://illuminatedcomputing.com/posts/2019/01/validating-fieldarrays-in-redux-form/" type="text/html"/>
    <content type="html">
&lt;p&gt;I have a &lt;a href="https://github.com/erikras/redux-form"&gt;Redux Form&lt;/a&gt; project where I’m using &lt;a href="https://github.com/gtournie/redux-form-validators"&gt;&lt;code&gt;redux-form-validators&lt;/code&gt;&lt;/a&gt; to get easy Rails-style validations. Its docs explain how to define validators for normal scalar fields, but I thought I’d add how to do the same for a &lt;code&gt;FieldArray&lt;/code&gt;, which is a list of zero or more sub-fields.&lt;/p&gt;

&lt;p&gt;In my case I’m working on a time-tracking and invoicing application, where admins can edit a client. The form has regular fields for the client’s name, the invoicing frequency, etc., and then it also has “work categories”. Each client has one or more work categories, and a work category is just a name and an hourly rate. For instance you might charge one rate for design and another for development, or you might track retainer hours at $0/hour and extra hours at your normal rate.&lt;/p&gt;

&lt;p&gt;Redux Form makes it really easy to include child records right inside the main form using &lt;code&gt;FieldArray&lt;/code&gt;. Their docs give a nice &lt;a href="https://redux-form.com/7.4.2/examples/fieldarrays/"&gt;example of how to validate those nested fields&lt;/a&gt;, but it’s pretty DIY and verbose.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;redux-form-validators&lt;/code&gt; on the other hand, it’s easy. First you define a validation object with the rules for each field, like so:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const validations = {
  &lt;span class="key"&gt;name&lt;/span&gt;: [required()],
  &lt;span class="key"&gt;paymentDue&lt;/span&gt;: [required(), numericality({&lt;span class="reserved"&gt;int&lt;/span&gt;: &lt;span class="predefined-constant"&gt;true&lt;/span&gt;, &lt;span class="key"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;: &lt;span class="integer"&gt;0&lt;/span&gt;})],
  &lt;span class="comment"&gt;// ...&lt;/span&gt;
};&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then you write a little validation function to pass to Redux Form:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const &lt;span class="function"&gt;validate&lt;/span&gt; = &lt;span class="keyword"&gt;function&lt;/span&gt;(values) =&amp;gt; {
  const errors = {}
  &lt;span class="keyword"&gt;for&lt;/span&gt; (let field &lt;span class="keyword"&gt;in&lt;/span&gt; validations) {
    let value = values[field]
    errors[field] = validations[field].map(validateField =&amp;gt; {
      &lt;span class="keyword"&gt;return&lt;/span&gt; validateField(value, values)
    }).find(x =&amp;gt; x) &lt;span class="comment"&gt;// Take the first non-null error message.&lt;/span&gt;
  }
  &lt;span class="keyword"&gt;return&lt;/span&gt; errors
};

&lt;span class="reserved"&gt;export&lt;/span&gt; &lt;span class="keyword"&gt;default&lt;/span&gt; connect(mapStateToProps, mapDispatchToProps)(
  reduxForm({
    &lt;span class="key"&gt;form&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;editClient&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;,
    validate
  })(EditClientPage)
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That is just straight from the docs. For each field it iterates over its validators and reports the first error it finds. It’s simple, but it doesn’t know how to handle nesting like with &lt;code&gt;FieldArray&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But notice you can really do whatever you like. It’s a bit striking for someone used to Rails how exposed and customizable everything is here. So how can we rewrite &lt;code&gt;validate&lt;/code&gt; to be smarter?&lt;/p&gt;

&lt;p&gt;Ideally we’d like to support validations &lt;em&gt;both&lt;/em&gt; on individual sub-fields, like the &lt;code&gt;name&lt;/code&gt; of a single work category, and &lt;em&gt;also&lt;/em&gt; on the &lt;code&gt;workCategories&lt;/code&gt; as a whole, for example checking that you have at least one. This is exactly what the Redux Form example does above: it checks that your club has at least one member, and that each member has a first and last name.&lt;/p&gt;

&lt;p&gt;Because this is a &lt;code&gt;FieldArray&lt;/code&gt;, Redux Form expects a different structure for its entry in the errors object returned by &lt;code&gt;validate&lt;/code&gt;. Normally you’d have a string value, like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;{
  &lt;span class="key"&gt;paymentDue&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;must be numeric&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But for a &lt;code&gt;FieldArray&lt;/code&gt; you want to pass an array with one error object for each corresponding &lt;code&gt;FieldArray&lt;/code&gt; element, like this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;{
  &lt;span class="key"&gt;workCategories&lt;/span&gt;: [
    {},   &lt;span class="comment"&gt;// no errors&lt;/span&gt;
    {&lt;span class="key"&gt;name&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;is required&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;},
    {&lt;span class="key"&gt;name&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;is too short&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="key"&gt;rate&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;must be numeric&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;},
  ]
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Those sub-errors will get passed in the &lt;code&gt;meta&lt;/code&gt; object given to the component used to render each &lt;code&gt;Field&lt;/code&gt;. Again, you can see that happening in the Redux Form example.&lt;/p&gt;

&lt;p&gt;In addition, the array may have its own &lt;code&gt;_error&lt;/code&gt; attribute for top-level errors. That gets passed as &lt;code&gt;meta.error&lt;/code&gt; to the &lt;code&gt;FieldArray&lt;/code&gt; component itself. So &lt;code&gt;_error&lt;/code&gt; is not just a convention; it’s a “magic” attribute built into Redux Form. We want to set it too.&lt;/p&gt;
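
&lt;p&gt;In plain JavaScript terms, that means hanging an extra property off the array itself. Here is a minimal sketch of the shape; the variable names and the messages are invented for illustration:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;// Per-element errors, one object per FieldArray row.
const workCategoryErrors = [
  {},                     // first row: no errors
  {name: 'is required'},  // second row: one sub-field error
];

// Arrays are objects, so they can carry an extra `_error` property
// for an error about the list as a whole.
workCategoryErrors._error = 'Rates must not repeat.';

const errors = {workCategories: workCategoryErrors};

console.log(errors.workCategories.length);  // 2: `_error` is not an element
console.log(errors.workCategories._error);  // 'Rates must not repeat.'&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;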

&lt;p&gt;I want a way to define all these validations in one big object: top-level fields, the &lt;code&gt;FieldArray&lt;/code&gt; itself, and the fields of individual &lt;code&gt;FieldArray&lt;/code&gt; records. Here is the structure I set up:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const validations = {
  &lt;span class="key"&gt;name&lt;/span&gt;:       [required()],
  &lt;span class="key"&gt;paymentDue&lt;/span&gt;: [required(), numericality({&lt;span class="reserved"&gt;int&lt;/span&gt;: &lt;span class="predefined-constant"&gt;true&lt;/span&gt;, &lt;span class="key"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;: &lt;span class="integer"&gt;0&lt;/span&gt;})],
  &lt;span class="comment"&gt;// ...&lt;/span&gt;
  workCategories: {
    &lt;span class="key"&gt;_error&lt;/span&gt;: [
      required({&lt;span class="key"&gt;msg&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;"&lt;/span&gt;&lt;span class="content"&gt;Please enter at least one work category.&lt;/span&gt;&lt;span class="delimiter"&gt;"&lt;/span&gt;&lt;/span&gt;}),
      length({&lt;span class="key"&gt;min&lt;/span&gt;: &lt;span class="integer"&gt;1&lt;/span&gt;, &lt;span class="key"&gt;msg&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;"&lt;/span&gt;&lt;span class="content"&gt;Please enter at least one work category.&lt;/span&gt;&lt;span class="delimiter"&gt;"&lt;/span&gt;&lt;/span&gt;})
    ],
    &lt;span class="key"&gt;name&lt;/span&gt;:   [required()],
    &lt;span class="key"&gt;rate&lt;/span&gt;:   [required(), numericality({&lt;span class="key"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;: &lt;span class="integer"&gt;0&lt;/span&gt;})],
  },
};&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then instead of the recommended &lt;code&gt;validate&lt;/code&gt; function I used this:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;const buildErrors = (validations) =&amp;gt; (values) =&amp;gt; {
  const errors = {};
  &lt;span class="keyword"&gt;for&lt;/span&gt; (let field &lt;span class="keyword"&gt;in&lt;/span&gt; validations) {
    &lt;span class="keyword"&gt;if&lt;/span&gt; (field === &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;_error&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;) &lt;span class="keyword"&gt;continue&lt;/span&gt;;
    let value = values[field];
    const fieldValidations = validations[field];
    &lt;span class="keyword"&gt;if&lt;/span&gt; (fieldValidations.constructor === Array) {
      errors[field] = fieldValidations
          .map(validateField =&amp;gt; validateField(value, values))
          .find(x =&amp;gt; x);
    } &lt;span class="keyword"&gt;else&lt;/span&gt; {
      errors[field] = value ? value.map(o =&amp;gt; buildErrors(fieldValidations)(o)) : [];
      &lt;span class="keyword"&gt;if&lt;/span&gt; (fieldValidations._error) {
        errors[field]._error = fieldValidations._error
            .map(validateField =&amp;gt; validateField(value, values))
            .find(x =&amp;gt; x);
    }
  }
  &lt;span class="keyword"&gt;return&lt;/span&gt; errors;
}

&lt;span class="reserved"&gt;export&lt;/span&gt; &lt;span class="keyword"&gt;default&lt;/span&gt; connect(mapStateToProps, mapDispatchToProps)(
  reduxForm({
    &lt;span class="key"&gt;form&lt;/span&gt;: &lt;span class="string"&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;span class="content"&gt;editClient&lt;/span&gt;&lt;span class="delimiter"&gt;'&lt;/span&gt;&lt;/span&gt;,
    &lt;span class="key"&gt;validate&lt;/span&gt;: buildErrors(validations),
  })(EditClientPage)
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are a few things to note here: I’m using a recursive function, where each call handles one “level” of the &lt;code&gt;validations&lt;/code&gt; object. It iterates over the field names, and if the field has an array, it handles it as before. Otherwise it expects a nested object structured just like the outermost object, in which each sub-field has its own array of validators. There may also be a list of validators under &lt;code&gt;_error&lt;/code&gt;, and those are handled specially. I’m using currying so that I can build a top-level validation function for &lt;code&gt;reduxForm&lt;/code&gt; as well as nested functions for each &lt;code&gt;FieldArray&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This function also supports reporting errors &lt;em&gt;both&lt;/em&gt; on sub-fields and the whole &lt;code&gt;FieldArray&lt;/code&gt; at the same time. That wouldn’t happen if the only &lt;code&gt;FieldArray&lt;/code&gt; check was to have at least one element, like here, but you can imagine scenarios where you want to validate the &lt;code&gt;FieldArray&lt;/code&gt; as a whole even when it isn’t empty.&lt;/p&gt;
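
&lt;p&gt;To see both kinds of error come back at once, here is a self-contained sketch you can run in Node. It restates &lt;code&gt;buildErrors&lt;/code&gt; with the same logic, using function expressions and a small &lt;code&gt;firstError&lt;/code&gt; helper. The validators &lt;code&gt;isPresent&lt;/code&gt; and &lt;code&gt;hasUniqueNames&lt;/code&gt; are hand-written stand-ins made up for this example, not part of &lt;code&gt;redux-form-validators&lt;/code&gt;; I’m only assuming that a validator returns a message when the value is bad and nothing otherwise:&lt;/p&gt;

&lt;div class="CodeRay"&gt;&lt;div class="code"&gt;&lt;pre&gt;&lt;code class="language-javascript"&gt;// Hypothetical stand-ins for validators: each returns a message or undefined.
const isPresent = function (value) {
  const missing = value === undefined || value === null || value === '';
  return missing ? 'is required' : undefined;
};
const hasUniqueNames = function (list) {
  const names = (list || []).map(function (row) { return row.name; });
  return new Set(names).size === names.length ? undefined : 'names must be unique';
};

// Run every validator and keep the first truthy message, if any.
const firstError = function (validators, value, values) {
  return validators
    .map(function (validateField) { return validateField(value, values); })
    .find(function (message) { return message; });
};

const buildErrors = function (validations) {
  return function (values) {
    const errors = {};
    for (const field in validations) {
      if (field === '_error') continue;
      const value = values[field];
      const fieldValidations = validations[field];
      if (Array.isArray(fieldValidations)) {
        // A plain field: a flat list of validators.
        errors[field] = firstError(fieldValidations, value, values);
      } else {
        // A FieldArray: recurse into each row, then check the list as a whole.
        errors[field] = value
          ? value.map(function (row) { return buildErrors(fieldValidations)(row); })
          : [];
        if (fieldValidations._error) {
          errors[field]._error = firstError(fieldValidations._error, value, values);
        }
      }
    }
    return errors;
  };
};

const demoValidations = {
  name: [isPresent],
  workCategories: {
    _error: [hasUniqueNames],
    name: [isPresent],
    rate: [isPresent],
  },
};

const demoErrors = buildErrors(demoValidations)({
  name: 'Acme',
  workCategories: [
    {name: 'design', rate: 100},
    {name: 'design', rate: ''},
  ],
});

console.log(demoErrors.name);                    // undefined
console.log(demoErrors.workCategories[1].rate);  // 'is required'
console.log(demoErrors.workCategories._error);   // 'names must be unique'&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second row reports its missing rate while the list as a whole reports the duplicate name, so the sub-field error and the &lt;code&gt;_error&lt;/code&gt; arrive together.&lt;/p&gt;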

&lt;p&gt;I’m happy that this approach lets me combine the ease of &lt;code&gt;redux-form-validators&lt;/code&gt; with &lt;code&gt;FieldArray&lt;/code&gt; to get the best of both worlds. I also like that &lt;code&gt;buildErrors&lt;/code&gt; is general enough that I can move it to a utility collection, and not write it separately for each form in my app.&lt;/p&gt;

&lt;p&gt;Also: you might enjoy &lt;a href="/posts/2019/01/drawing-redux-form-fieldarrays-with-pug/"&gt;my follow-up article showing how to render the form &lt;code&gt;Field&lt;/code&gt;s and &lt;code&gt;FieldArray&lt;/code&gt;s with Pug and Bootstrap&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
</feed>

