09 Nov 2025
There are some things in mathematics that are simply too magical to be true. One such thing, in my opinion, is the self-inverse (or unitary) transform. In this post, I’d like to talk about some of the most beautiful self-inverse transforms that I have come across so far. I’m very sure I’ve missed some gems, but ah well, one learns something new every day.
A transform \(T\) takes an input object \(x\) and outputs another object which we call \(T(x)\) or \(Tx\). We say \(T\) is applied to \(x\). Real functions are the most common example of transforms, taking one real number to another.
A transform is self-inverse if it can be applied to its own output to get back the original input; that is, to undo \(T\), we simply have to redo it. This is amazing news for engineers, because in many applications, coming up with an input that produces a desired output (e.g. finding a function that produces a known signal or satisfies a known differential equation) is much harder than verifying that a given input produces a desired output (e.g. checking that a function satisfies a differential equation). In other words, re-doing a transform is often much, much easier than undoing it, and self-inversion makes the two one and the same.
To be cheeky, if construction was self-inverse, then all we would have to do to rebuild something is to destroy it again. Too bad this isn’t true in practice.
Much of the engineering world is familiar with the ubiquitous Fourier transform, which revolutionized signal processing; reading this article via the internet would hardly be possible without it.
Here, in true computer science fashion, we will discretize and consider the Discrete Fourier Transform (DFT). Let \(n\) be a positive integer. The DFT is a linear transform \(\mathcal F\) taking an \(n\)-length sequence \(a = (a_0, a_1, \ldots, a_{n-1})\) to another sequence \(\mathcal F(a) = (b_0, b_1, \ldots, b_{n-1})\) defined by
\[b_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} a_j \left(e^{-2\pi \iota k/n}\right)^j \quad \text{for } k = 0, 1, \ldots, n-1.\]
As a matrix operation, consider the \(n\times n\) matrix \(F\) with entries
\[F_{jk} = \frac{1}{\sqrt{n}} e^{-2\pi \iota jk/n} \quad \text{for } j, k = 0, 1, \ldots, n-1.\]
Then the DFT is simply the matrix multiplication \(\mathcal F(a) = F a\).
So the DFT is what’s called unitary, which is almost as good as self-inverse: undoing the DFT is exactly as easy as doing it. In particular, let’s define the adjoint of the DFT, denoted \(\mathcal F^\dagger\), via the matrix \(F^\dagger\) with elements \(F^\dagger_{jk} = \frac{1}{\sqrt{n}} e^{2\pi \iota jk/n}\), i.e. \(F\) with \(\iota\) replaced by \(-\iota\). Since mathematics is symmetric in \(\iota\) and \(-\iota\), any algorithm to apply \(\mathcal F\) yields one with the same complexity to apply \(\mathcal F^\dagger\). The DFT satisfies \(F^\dagger F = I\), where \(I\) is the identity matrix, which means that for every sequence \(a\), we have \(\mathcal F^\dagger \mathcal F(a) = a\) (the unitary property).
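As a quick sanity check, here is a minimal numpy sketch (the function name is mine) that builds \(F\) from the formula above and verifies the unitary property numerically:

```python
import numpy as np

def dft_matrix(n):
    # F_{jk} = exp(-2*pi*i*j*k / n) / sqrt(n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)

n = 8
F = dft_matrix(n)
assert np.allclose(F.conj().T @ F, np.eye(n))   # F†F = I: unitary

a = np.random.randn(n)
assert np.allclose(F.conj().T @ (F @ a), a)     # the adjoint undoes the DFT
# numpy's FFT with norm="ortho" uses exactly this 1/sqrt(n) convention.
assert np.allclose(np.fft.fft(a, norm="ortho"), F @ a)
```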
Take a moment to appreciate the beauty of the transform: the Fast Fourier Transform makes it very easy to apply the DFT to a sequence. And indeed, when we receive an “impure” signal, we often apply the DFT to it. Purifying the result of the DFT happens to be super-easy: the true signal shows up in the result as spikes (Grant Sanderson does a great job (as usual) explaining this, by the way), and filtering the result by setting the noise to zero and leaving the spikes is easy too. The main task remaining is to find a signal that produces this filtered result directly, i.e. undoing the Fourier transform on the filtered result. The mathemagic of the unitarity of the DFT means that this is just as easy as applying the DFT. And boom: signal processing works!
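A hedged sketch of this whole workflow, with a made-up two-tone signal and an ad hoc threshold (both are my choices, purely for illustration):

```python
import numpy as np

n = 1024
t = np.arange(n) / n
# A "pure" two-tone signal, plus noise on top.
pure = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
noisy = pure + 0.8 * np.random.randn(n)

spectrum = np.fft.fft(noisy, norm="ortho")      # apply the (unitary) DFT
mask = np.abs(spectrum) > 5.0                   # keep the spikes; threshold is ad hoc
filtered = np.fft.ifft(spectrum * mask, norm="ortho").real  # undoing is just as cheap

# The filtered signal should be much closer to the pure one than the noisy input.
print(np.linalg.norm(filtered - pure), "<", np.linalg.norm(noisy - pure))
```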
If you’re interested in learning more, I highly recommend this amazing set of notes and, of course, Grant Sanderson’s video on the subject.
Interlude: Convolutions
Central to many self-inverse transforms is the idea of a convolution. A convolution (etymology: entwine, combine) is a way of combining two sequences to produce a third sequence, and it is defined as follows: given two infinite sequences \(a = (a_0, a_1, \ldots)\) and \(b = (b_0, b_1, \ldots)\), their convolution \(c = a * b\) is another infinite sequence defined by
\[c_i = \sum_{j=0}^{i} a_j b_{i-j} \quad \text{for } i = 0, 1, \ldots.\]
It arises naturally in many settings; most students encounter it first in the context of polynomial multiplication: if \(a(x) = a_0 + a_1 x + \ldots + a_n x^n\) (represented by the sequence \((a_0, a_1, \ldots, a_n, 0, 0, \ldots)\)) and \(b(x) = b_0 + b_1 x + \ldots + b_m x^m\) (represented similarly) are two polynomials, then the coefficients of the product polynomial \(c(x) = a(x)b(x)\) are given by the convolution: \(c_i = \sum_{j=0}^{i} a_j b_{i-j}\) for \(i = 0, 1, \ldots\).
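As a sanity check, numpy’s `convolve` computes exactly this finite convolution; here it multiplies two small polynomials:

```python
import numpy as np

a = [1, 2, 3]          # a(x) = 1 + 2x + 3x^2
b = [4, 5]             # b(x) = 4 + 5x
c = np.convolve(a, b)  # coefficients of a(x) * b(x)
print(c)               # [ 4 13 22 15], i.e. 4 + 13x + 22x^2 + 15x^3

# The same thing straight from the definition c_i = sum_j a_j * b_{i-j}:
manual = [sum(a[j] * b[i - j]
              for j in range(len(a)) if 0 <= i - j < len(b))
          for i in range(len(a) + len(b) - 1)]
assert list(c) == manual
```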
The convolution may appear convoluted, but it finds use more often than one might expect: think enumerative combinatorics, Fourier analysis, number theory, etc. (The operation in “convolutional” neural networks is actually a cross-correlation; it is slightly different, but can be presented as an evaluation of a convolution too.) The essential motivation behind convolutions is that one often wants to pick some of one thing and some of another so that the total quantity picked is fixed, and the number of ways to do that, for each value of the total, is exactly what the convolution describes. For example, in combinatorics, the convolution tells you how many ways you can select items from two groups so that the total number of items selected equals a given number.
A key property of the convolution is that it is associative: that is, for any three sequences \(a, b, c\), we have
\[(a * b) * c = a * (b * c).\]
Indeed, notice that for any \(i \geq 0\),
\[\begin{align*}
[(a * b) * c]_i &= \sum_{j=0}^{i} (a * b)_j c_{i-j} \\ &= \sum_{j=0}^{i} \left(\sum_{k=0}^{j} a_k b_{j-k}\right) c_{i-j} \\
&= \sum_{k=0}^{i} a_k \left(\sum_{j=k}^{i} b_{j-k} c_{i-j}\right) \\
&= \sum_{k=0}^{i} a_k \left(\sum_{j'=0}^{i-k} b_{j'} c_{i-k-j'}\right) \\
&= \sum_{k=0}^{i} a_k (b * c)_{i-k} \\ &= [a * (b * c)]_i.
\end{align*}\]
Notice how \(i-k = i-j + j-k\) helps us re-write the summation over \(j\) as another convolution evaluated at \(i-k\).
The convolution is also commutative, i.e. \(a * b = b * a\) for any two sequences \(a\) and \(b\) (left as an easy exercise for readers with pen and paper). Associativity and commutativity help us manipulate convolutions easily without having to re-write them as summations (more on this later), treating sequence convolution like integer multiplication.
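If you’d rather let a machine check these identities, here is a quick numerical spot check, truncating every sequence to its first \(n\) terms (valid because \(c_i\) depends only on the first \(i+1\) terms of each input):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
a, b, c = rng.standard_normal((3, n))

def conv(x, y):
    # First n terms of the convolution of two length-n prefixes.
    return np.convolve(x, y)[:n]

assert np.allclose(conv(a, b), conv(b, a))                    # commutativity
assert np.allclose(conv(conv(a, b), c), conv(a, conv(b, c)))  # associativity
```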
Convolutions go much, much further than the standard convolution presented here. We’re going to see one more in this post, and perhaps one or two more in the sequel post.
The binomial convolution is defined by
\[[a \circ b]_n := \sum_{j=0}^{n} \binom{n}{j} a_j b_{n-j} \quad \text{for } n = 0, 1, \ldots.\]
For readers familiar with generating functions, this is the convolution involved in the product of two exponential generating functions.
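A direct implementation from the definition (the function name is mine; `math.comb` supplies the binomials):

```python
from math import comb

def binomial_conv(a, b):
    # [a o b]_n = sum_j C(n, j) * a_j * b_{n-j}, for n up to the shorter length.
    m = min(len(a), len(b))
    return [sum(comb(n, j) * a[j] * b[n - j] for j in range(n + 1))
            for n in range(m)]

ones = [1] * 6
print(binomial_conv(ones, ones))  # [1, 2, 4, 8, 16, 32], since sum_j C(n, j) = 2^n
```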
Surprisingly, the binomial convolution is also associative! For each \(n \geq 0\), we have
\[\begin{align*}
[(a \circ b) \circ c]_n &= \sum_{j=0}^{n} \binom{n}{j} (a \circ b)_j c_{n-j} \\
&= \sum_{j=0}^{n} \binom{n}{j} \left(\sum_{k=0}^{j} \binom{j}{k} a_k b_{j-k}\right) c_{n-j} \\
&= \sum_{k=0}^{n} a_k \left(\sum_{j=k}^{n} \binom{n}{j} \binom{j}{k} b_{j-k} c_{n-j}\right).
\end{align*}\]
We now call for some sorcery with the binomial coefficients: it turns out that
\[\begin{align*}
\binom{n}{j} \binom{j}{k} &= \frac{n!}{j!(n-j)!} \cdot \frac{j!}{k!(j-k)!} \\
&= \frac{n!}{k!(n-j)!(j-k)!}\cdot\frac{(n-k)!}{(n-k)!} = \binom{n}{k} \binom{n-k}{j-k}.
\end{align*}\]
This helps us re-write the inner summation as a convolution:
\[\begin{align*}
\sum_{j=k}^{n} \binom{n}{j} \binom{j}{k} b_{j-k} c_{n-j} &= \sum_{j'=0}^{n-k} \binom{n}{k} \binom{n-k}{j'} b_{j'} c_{n-k-j'} = \binom{n}{k} (b \circ c)_{n-k},
\end{align*}\]
which allows us to conclude that
\[[(a \circ b) \circ c]_n = \sum_{k=0}^{n} \binom{n}{k} a_k (b \circ c)_{n-k} = [a \circ (b \circ c)]_n,\]
so the \(\circ\) operation is associative. Since \(\binom{n}{k} = \binom{n}{n-k}\), the binomial convolution is also commutative.
A similar convolution is the Dirichlet convolution, defined by
\[[a \star b]_n := \sum_{d|n} a_d b_{\frac{n}{d}} \quad \text{for } n \geq 1,\]
where the notation \(d \mid n\) means that \(d\) divides \(n\) (so the sum is over all divisors \(d\) of \(n\)). The Dirichlet convolution is also associative (hell yes) and commutative, and it arises naturally in number theory, especially in the context of multiplicative functions. Commutativity of \(\star\) is easy to see; let’s quickly show that it is also associative. For each \(n \geq 1\), we have
\[\begin{align*}
[(a \star b) \star c]_n &= \sum_{d|n} (a \star b)_d c_{\frac{n}{d}} \\
&= \sum_{d|n} \left(\sum_{e|d} a_{e} b_{\frac{d}{e}}\right) c_{\frac{n}{d}} \\
&= \sum_{e|n} a_{e} \left(\sum_{d: e|d, d|n} b_{\frac{d}{e}} c_{\frac{n}{d}}\right).
\end{align*}\]
The inner summation is over all \(d\) such that \(e\mid d\) and \(d\mid n\), which is equivalent to (setting \(d = d'e\) for positive integer \(d'\)) summing over all \(d'\) such that \(d'\mid \frac ne\). So we can re-write the inner summation as
\[\sum_{d'|\frac ne} b_{d'} c_{\frac{n}{d'e}} = (b \star c)_{\frac{n}{e}}.\]
This yields
\[[(a \star b) \star c]_n = \sum_{e|n} a_e (b \star c)_{\frac{n}{e}} = [a \star (b \star c)]_n,\]
and done. Yay!
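Same drill in code: a sketch of the Dirichlet convolution, storing the sequence 0-indexed so that `a[i]` holds \(a_{i+1}\), plus a numeric spot check of associativity:

```python
def dirichlet_conv(a, b):
    # [a * b]_n = sum over divisors d of n of a_d * b_{n/d}.
    # a[i] stores the term of index i + 1.
    n = min(len(a), len(b))
    c = [0] * n
    for d in range(1, n + 1):
        for m in range(d, n + 1, d):   # m runs over the multiples of d
            c[m - 1] += a[d - 1] * b[m // d - 1]
    return c

import random
random.seed(0)
x, y, z = ([random.randint(-5, 5) for _ in range(30)] for _ in range(3))
assert dirichlet_conv(dirichlet_conv(x, y), z) == dirichlet_conv(x, dirichlet_conv(y, z))
```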
Please do notice the very similar structure of the proofs of associativity for the different convolutions. Many different convolutions go through the same proofs and have similar properties, and after a while the mathematician treats such new proofs as “pencil-pushing” exercises: the mechanical re-writing of the same steps over and over again. Other mathematicians distill exactly which properties of the convolution make the proof work, and then generalize the proof in one go to any convolution satisfying those properties – you see, abstraction in mathematics is quite similar to the “if you use it more than once, make it a function” rule in programming.
Why care about – indeed, focus on – convolutions in a post on self-inverse transforms? Here’s the big idea: invertible convolutions with identity lead to a natural invertible transform. And often, this transform is (almost) self-inverse. Thus, in a sense, convolutions can generate self-inverse transforms.
The identity for a convolution operation is a sequence \(e\) such that for any sequence \(a\), we have \(a * e = a\). For the usual convolution \(*\), notice that the identity is the sequence \(e = (1, 0, 0, \ldots)\). The same sequence is also the identity for the binomial convolution \(\circ\) (and the Dirichlet convolution, and more). We say that sequence \(b\) inverts the sequence \(a\) if \(a * b = e (= b * a)\). By commutativity and associativity, this means that if \(d = a * c\), then \(b * d = c\), in accordance with rules that treat \(b\) as “\(1/a\)”.
Consider the following toy example. Define the partial-sum transform \(S\) taking one sequence to another by \(S(a) := a * \mathbf 1\), where \(\mathbf 1\) is the sequence \((1, 1, 1, \ldots)\). Notice that for each \(i \geq 0\), we have
\[S(a)_i = \sum_{j=0}^{i} a_j,\]
hence the name partial-sum. Now suppose for a moment that we could find a sequence \(\lambda\) that inverts the sequence \(\mathbf 1\), i.e. \(\mathbf 1 * \lambda = e\). This would mean
\[S(a) * \lambda = a * \mathbf 1 * \lambda = a * e = a,\]
which means that convolution with \(\lambda\) amounts to undoing \(S\)! In general, if \(\mathbf 1\) can be inverted, then a transform generated by convolution with \(\mathbf 1\) can be inverted just as easily, by convolving with the inverse of \(\mathbf 1\). Choosing a particular convolution and computing \(\lambda\) is all it takes to obtain what’s called an inversion formula for the transform.
How does one find such an inverse \(\lambda\)? Just write out the constraints that must be met so that \(\lambda * \mathbf 1 = e = (1, 0, \ldots)\).
\[\begin{align*}
\lambda_0 &= 1, \\
\lambda_0 + \lambda_1 &= 0, \\
\lambda_0 + \lambda_1 + \lambda_2 &= 0, \\
&\;\;\vdots
\end{align*}\]
which yields the unique solution \(\lambda = (1, -1, 0, 0, \ldots)\). Alright, so what does this give us? \(a = S(a) * \lambda\), so for each \(i \geq 1\),
\[a_i = [\lambda * S(a)]_i = \lambda_0 S(a)_i + \lambda_1 S(a)_{i-1} = S(a)_i - S(a)_{i-1}.\]
But this is obvious: each element of the sequence \(a\) is indeed the difference between adjacent prefix-sums!
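In numpy the whole toy example is two one-liners: `cumsum` is \(S\), and convolving with \(\lambda = (1, -1, 0, \ldots)\) is just the adjacent difference:

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9])
S = np.cumsum(a)                   # the partial-sum transform
recovered = np.diff(S, prepend=0)  # convolution with lambda = (1, -1, 0, ...)
assert np.array_equal(recovered, a)
```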
Of course, like most things in mathematics, this is far from the end of the story. Much more powerful transforms and the associated inversion formulas can be generated with different convolutions. For example, doing the same thing as above with the binomial convolution leads to the binomial transform \(B\) and the nontrivial Pascal inversion formula (Pascal really did go crazy with binomial coefficients!); with the Dirichlet convolution, it leads to the celebrated Möbius inversion formula. Note, of course, that all the nontriviality of a theorem thus generated lies in the proofs of associativity and commutativity of the convolution involved, and in the proof that the sequence \(\mathbf 1\) is indeed inverted by \(\lambda\).
Anyways, we must now generate \(\lambda\) for the binomial convolution. The constraints are
\[\begin{align*}
\lambda_0 &= 1, \\
\binom{1}{0} \lambda_0 + \binom{1}{1} \lambda_1 &= 0, \\
\binom{2}{0} \lambda_0 + \binom{2}{1} \lambda_1 + \binom{2}{2} \lambda_2 &= 0, \\
&\;\;\vdots
\end{align*}\]
which yields the pleasant solution \(\lambda = (1, -1, 1, -1, \ldots)\) (remember \(\sum_{k=0}^n (-1)^k \binom{n}{k} = 0\) for \(n \geq 1\)?). The associated transform is
\[B(a)_n = \sum_{k=0}^{n} \binom{n}{k} a_k,\]
and so our computation of the inverse \(\lambda\) yields the inversion formula
\[a_n = \sum_{k=0}^{n} \binom{n}{k} B(a)_k (-1)^{n-k},\]
so \(B\) is also “almost self-inverse”, up to the alternating sign. (Note: alternating signs can drastically change the difficulty of a computation; see permanent vs determinant.)
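A quick numeric check of this inversion formula (the function names are mine):

```python
from math import comb

def binomial_transform(a):
    # B(a)_n = sum_k C(n, k) * a_k — binomial convolution with (1, 1, 1, ...).
    return [sum(comb(n, k) * a[k] for k in range(n + 1)) for n in range(len(a))]

def inverse_binomial_transform(b):
    # a_n = sum_k C(n, k) * (-1)^(n-k) * b_k — convolution with (1, -1, 1, ...).
    return [sum(comb(n, k) * (-1) ** (n - k) * b[k] for k in range(n + 1))
            for n in range(len(b))]

a = [3, 1, 4, 1, 5, 9]
assert inverse_binomial_transform(binomial_transform(a)) == a
```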
That’s a wrap. For the reader interested in a further application of convolutions, let’s go to combinatorics: I recommend checking out generating functions, compositional constructions and the use of convolutions as the coefficients of products of generating functions. Hopefully I’ll get to writing about this at some point.
11 Jun 2025
Goodness gracious. I almost forgot about the post due today. We’ll start with a review of a book series that took me a fair fraction of 2024 to cover most of (yes, I still haven’t read them all). It was totally worth the effort.
The author first: Isaac Asimov was a biochemist who wrote extensively on science and science fiction. He is best known for the namesake of this post, the Foundation Saga. The series is a grand tapestry of speculative human history, spanning thousands of years and tons of planets across the galaxy. The entire series spans 20,000 years of human history: from around the current time, when generalist robots have just been invented but are often large, mistake-prone hunks; to the exploration and colonization of nearby planets; to interstellar travel and the colonization of star systems; to the rise of a Galactic Empire; to the eventual, inevitable fall of said Empire and the cleverly orchestrated rise of a second one; finishing with plans of turning the galaxy into something very special indeed. Naturally, it was also written over a period of fifty-one years and comprises 20+ books, with three independent series (Robot, Foundation and Empire) glued together by a common thread of characters and events, plus prequels, sequels, and more standalone novels. (Personal opinion: I do not think anyone but a scientist in the know could have written such a series, given the depth and realism of the science involved. Then again, it is hard to believe that a scientist could be so ingenious in creating the so-very-relatable social and political dynamics in the series. Asimov was a rare gem indeed.)
The main draw of the series in my opinion, apart from the sheer scale and scope of the story, is the portrayal of very realistic scientific, social and political dynamics that inevitably show up in the history of a civilization. A few examples follow.
- Consider the development of a robot that has been programmed to never harm a human, physically or psychologically. Anne and Bryan approach the robot. Anne is in love with Bryan, but the reverse is sadly not true. Anne asks the robot if Bryan loves her. The robot, having been programmed with the overruling priority to never harm a human, says yes. And you can probably guess the rest of the soap opera that unfolds.
- As a more serious example, consider the planet Terminus, recently colonized by a group of scientists and engineers, suddenly showing up with technology far more advanced than that of the neighboring planets. Terminus has essentially no resources of its own, either for sustenance or for military freedom. What does Terminus do, desperate to stay alive? As it happens, nothing very nice, but something that has indeed happened over and over in our own (real) history. They start with nuclear blackmail and spread misinformation among the neighboring planets, causing them to fight amongst each other and divert attention from Terminus. The real punch comes next: they set up a techno-religious hegemony, operating their tech under a veil of mysticism and magic. It’s hard to believe how well that worked for them. They simply won outright (details: please read the book). To compound matters, the neighboring planets also depended on Terminus for technology, and Terminus traders capitalized hard on the monopoly. Finally, the neighboring planets simply collapsed on themselves, marking the first successful expansion for Terminus-folk.
- The book “I, Robot” is a part-parody, part-serious take on all the various ways robots can “accidentally” bypass their programming and the ingenious ways to set them right. It is a collection of short stories, each with a different robot and a different situation, but all connected by the same theme of robots and their interactions with humans. The stories explore the ethical implications of robotics and the complexities of human-robot relationships. No more spoilers here; it’s far too much fun reading the book oneself. Betcha you’ll lose your marbles at least thrice :D
The series has had real impact on science, too (the word “robotics” is due to Asimov). All robots are programmed with the so-called “Three Laws of Robotics” (extended much, much later to include a zeroth law), a set of extremely relevant and realistic laws that robots are constrained by:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Asimov explains that the laws are enforced by means of potential walls in robots’ positronic brains. Essentially, the thinking pathways are designed to travel down low-energy paths, and pathways that would lead to a violation of the laws are held back by a high potential wall, forcing other paths to be taken. Unfortunately for robots, sometimes there is no path that does not lead to a violation of the laws, and the robot is forced to attempt jumping a wall, most often destroying its own brain. While still futuristic by today’s standards, such an approach to enforcing laws in robots is not entirely unrealistic, and the laws of robotics still influence research in AI safety.
Another central concept in the series is “Psychohistory”, a fictional science that combines history, sociology, and statistical mathematics to predict the future of large groups of people. The idea is that while individual actions are unpredictable, the behavior of large populations can be modeled and predicted with a high degree of accuracy. The scientist Hari Seldon first created this science, only to discover that the Galactic Empire was predestined to fall. Using his science, he plants the two Foundations at opposite ends of the Galaxy, groups seeded with just the right people and instructed in just the right ways so as to steer humanity along a path with a heavily shortened period of chaos following the Empire’s collapse, and a speedy recovery and formation of a new Empire. To put it rather oversmartly but memorably, psychohistory is the foundation behind the Foundations.
It would be rather unsatisfactory if psychohistory were exact, since there is an obvious element of uncertainty in human dynamics, even in aggregate, especially when predictions are made over a long time period. Indeed, this happens with almost disastrous consequences when a mutant with mental powers, completely unaccounted for by psychohistory, appears on the scene (Foundation and Empire). A small branch: mentalist powers play an increasingly prominent role in the series, with the ability to influence and (slightly) control human minds explained as the fruit of millennia of understanding of the electromagnetic waves the brain emits as it processes information. Mentalists manipulate these waves to influence the thoughts and actions of others, a power that becomes crucial in the later books of the series. Lots of different modes of power in this series!
A quick note on Asimov’s writing style: it’s most often straightforward, with little flowery prose, favoring clarity and storyline over style. This approach makes his books accessible and easy to follow, though it can sometimes feel a bit dry. The characters tend to be lightly sketched (well, it isn’t a drama or high fantasy, cut him some slack!), and the dialogue may come across as stiff at times. Still, the sheer originality and depth of the ideas more than compensate for any stylistic limitations.
Finally, a brief chronology of the series (not including a few standalone books, e.g. Nemesis) goes like so:
Robot Series:
- I, Robot (1950) [Pilot to the whole saga. Set in the very near future; robots are just being invented and tinkered with]
- The Caves of Steel (1954) [Murder mystery. A thousand years later. Colonization is on, and the robot-powered terraformed planets house long-lived, disease-prone humans]
- The Naked Sun (1957) [Another whodunit. A few decades later. Earthmen colonize again]
and sequels:
- The Robots of Dawn (1983) [Yet another, some years later. This time, a very powerful robot is the victim.]
- Robots and Empire (1985) [Two centuries later. Earth’s fate in the balance here, with colonizers trying to get rid of it.]
Empire Series (mid-life of the Foundation Universe, tidbits from the formation of the First Galactic Empire):
- The Stars, Like Dust (1951)
- The Currents of Space (1952)
- Pebble in the Sky (1950)
Foundation Series:
- Foundation (1951) [Fast-forward to 20,000 years in the future. The Galactic Empire is failing. The first Foundation is set up at Terminus and faces a rocky start to existence.]
- Foundation and Empire (1952) [The Empire (tried to) strike back. One last time. A slightly mentalist mutant proves to be an extremely strong antagonist.]
- Second Foundation (1953) [The reveal of the Second Foundation, a perfect south to Terminus’ north. If Terminus-folk played chess, they didn’t even see the full board as the Second Foundation did.]
and sequels:
- Foundation’s Edge (1982) [Something very important is orchestrated to happen, and the power behind is not Terminus or the Second Foundation. A third power was always hiding in the shadows.]
- Foundation and Earth (1986) [The backward edge to the Robot Series. The search for the Mother Planet, Earth and meeting the true powers of the galaxy. Of course, the aforementioned final transformation of the galaxy into something fantastic.]
and latest, prequels (On the development of Psychohistory and on Hari Seldon, the brain behind the Foundations):
- Prelude to Foundation (1988)
- Forward the Foundation (1993)
(As you might have guessed, I have not yet read the books without a brief explanation adjoining them.)
Well, as any wise Pundit will have you know, one must say their closing prayers: my apologies for the overly long review, any typos and incorrect grammatical usage, and the possibly obvious lack of a proper edit (too close to my three-day deadline, oops).
And then, that elusive section 7, Conclusion. The Foundation saga is a must-read for both the beginner to sci-fi and the seasoned sci-fi aficionado. Beyond all the fancy robots, tech and space travel, the series is a deep exploration of society, politics and ethics. If you haven’t read it yet, I highly recommend you do so. And if you have, I hope this review brings back memories fond and frustrating from the saga, as it did for me. Since this series has impacted me fairly deeply, I’d like to plagiarize the Sunday Times’ review of The Lord of the Rings and summarize:
“The science-fiction population is divided into two groups: those who have read the Foundation saga and those who are going to read it.”
That’s a wrap. Until next time on Saturday the fourteenth. Do ping me if you’ve got comments or suggestions. I’m very new to this business of blogging and could certainly use some help. You can reach me at the usual email address: kagaram@cse.iitb.ac.in.
Happy reading!
06 Jul 2024
Most people know the familiar feeling of regret that comes with giving up on a habit they know is good for them. It’s a feeling that’s hard to shake off and hard to ignore. It simply feels, well, unbearable to continue with the habit, and resistance wins. I have been wondering for a while now why this might be the case, and I will share here what I have found so far.
The ancients developed habits and patterns of thinking for immediate (and preferably, as comfortable as possible) survival - that was top priority. The modern age has made this mostly unnecessary, since we have the luxury of safety. However, the pattern of seeking immediate comfort is still with us, though no longer optimal.
The brain is still wired to seek out immediate comfort and/or avoid immediate discomfort. The big problem with this is that the habits that are good for us in the long run are often very uncomfortable in the short run. Let’s look at how a habit forms. This analysis is mostly based on work by Hal Elrod.
A good habit generally takes thirty days to form. I would like you to take a habit you want - typically, becoming a “morning person” or churning out that “morning workout” works for most people - and imagine yourself through the process below of developing that habit.
The first two or three days comprise a sort of honeymoon period, where the habit is new and exciting, and you are constantly dreaming of yourself with the benefits of the habit. But the benefits do not, of course, show up after two days. The next week, it is ridiculously unbearable to continue the habit, as you believe less and less in their arrival. This is sort of like how we always expect that Amazon package to show up in two days, and when it doesn’t, we are pretty impatient every day after that.
A comical example of this phenomenon in real life is the abundance of parking space at the local gym just a week after New Year’s, following long lines in the first week (yes, there was a study on this).
After ten days, it is excruciating to continue for one more day on willpower alone, and this is where it starts to make sense to move the habit to auto-pilot, with techniques like habit stacking. A few benefits might start to show up after two weeks, but they are not that significant. This is the uncomfortable phase of the habit-formation process. You are now slightly experienced, and so slightly less reluctant to continue; the habit is not yet part of you, though, and its benefits are not great yet.
After three weeks, the habit has become a part of your routine, and you start to identify with it. The benefits are now starting to show up, and you see the light at the end of the tunnel.
Quoting Hal Elrod: “The third 10-day phase is crucial to sustaining your new habit, long term. The final 10 days is where you positively reinforce and associate pleasure with your new habit. You’ve been primarily associating pain and discomfort with it during the first 20 days. Instead of hating and resisting your new habit, you start feeling proud of yourself for making it this far.”
This is the unstoppable phase of the habit formation process. The habit is now slowly becoming a part of your identity, and you are starting to see the benefits of it. It is likely that you will continue the habit, since you have now seen the benefits of it in your life.
After thirty consecutive days with the habit, it has become a part of your identity and you will continue with it for a while.
It is difficult to build a good habit - there is no denying that - but it is far better to spend thirty days of discomfort to get a lifetime of benefits - pounds of pain to get that habit always beats tons of regret, logically speaking.
The problem, as the above process shows, is that the nice things about a habit are always delayed, while the uncomfortable things come up front. The brain’s wiring to avoid immediate discomfort is quite difficult to override.
However, when you realize that the benefits of a habit are simply delayed and not non-existent, and that the discomfort is temporary, it becomes slightly easier to continue with the habit.
To make it less difficult to build a habit, a few ideas follow.
If you think about it, each of us has an identity, a notion of what kind of person we are. For example, some people might think of themselves as “night owls”. The ancients had their own identities too, and were known by them.
There is a difference between a person’s identity and character. Your identity is who you truly believe you, as a person, are. Your character is how you function as a person. Some aspects of your character influence your interaction with the world, others influence your own personal feelings and happiness, and some are, well, just hidden by will.
Anyways, no surprise that your identity determines, along with your environment and opportunities, your character.
The point is, a habit is simply a change in your character that you want to make. The best way to do that? Put it into your identity. You can’t fully control your environment (more on that later). But hell yeah, you can control your identity. You can make it whatever you wish. Essentially, you now think of yourself as a “morning person” or a “workout person” to help build the corresponding habit of waking up early or working out.
Well, the obvious fallacy in this argument is that it isn’t true yet that you are a “morning person” or a “workout person”. How can you just “add” that into your identity when you have close to no overlap with it in terms of who you are currently?
Valid point. The solution is to start to make it part of your identity. That is, to affirm and re-affirm that you are becoming a morning person or a fit athlete. You are not lying to yourself here, and it is indeed true, when you start building the habit, that you are becoming a person with the habit. It is your identity now: I am learning this habit.
The great thing about this is that you are tricked into continuing, using momentum. After a few days, the fact that you are becoming a morning person is confirmed by your beginning practice of waking up early, and you reinforce the identity. The next day, your growing identity gives you a bit of a push to continue.
It is very much like the Duolingo streak, where you are tricked into continuing by the fact that you have a streak going.
The all-important thing is to really commit to becoming a morning person or a workout person. Only then can you believe that you are becoming one, and this identity trick will work. Committing is where 95% of the world struggles. They want something, but are unwilling to really decide to get it - because will it really be that bad now if I don’t make it?
Yes, in ancient times - and so the ancients didn’t need any tricks to force themselves to make it. Today, thanks to a reasonably stable life for most people, the answer is usually no, and so one concludes that one doesn’t need it so badly. The regret only comes years later, when not having done it is truly unbearable, knowing how great the benefits of the habit would have been, compounding over time.
Once you are committed to building a habit, reinforce your identity as starting to become someone with the habit. With this identity and practicing the habit, after a week or so, your identity will already start to become “I am a morning person” or “I am a workout person”. That will go a great distance, and you have already won half the battle.
The discomfort of building the habit is now acknowledged as a necessary part of the process, and now that it is in your identity, you will forage forward because that is who you are. Slowly, but surely, the new identity will push the habit into your character.
The other thing that determines your character is your environment. You can probably guess what’s next. You want the habit in your character, why not put it in your environment too? Makes it even easier to get it into your character. Indeed it does! James Clear has a lot to say about this in Atomic Habits, and for completeness I will try to present the essence of his ideas here.
Suppose the lights are turned off and your bedroom is dark tomorrow morning at 7:00 AM. Would you, if not used to being up this early, go out of your way to get yourself up and out of bed? I certainly wouldn’t. But if the lights are on, the sun is in your face, and you can hear the hustle and bustle of the day outside, you are much more likely to get up. I know I am. This is because your environment is now conducive to waking up early.
The time of day did not change. It was the environment that changed, which changed your reaction along with it.
James Clear suggests a ton of ways to make your environment conducive to building numerous habits, and I really recommend checking them out; some of them have really helped me build some habits.
For now, since I think a lot of people want to build the habit of waking up early, I’m going to detail Hal Elrod’s recipe for making it obvious to wake up early. If you want to build this habit too, I’d like you to imagine yourself going through the process below.
It starts by moving your alarm clock or phone to the other side of the room - near enough that it still gets you out of your sleep and irritates you enough to want to turn it off, but far enough that it can only be reached with at least a few seconds completely off the bed. In the morning, you are woken up and forced to walk and turn off the alarm.
You are still probably only thinking about the discomfort of having to exist without that bed and blanket, and are really contemplating going back. You convince yourself it’s okay, since you are so tired and probably need the extra ten minutes of sleep before your next alarm. That’s when you do step two: you have a glass of water ready to drink right next to the alarm.
Most people are typically simply dehydrated in the morning, though it might feel like tiredness. Drinking a glass of water immediately after waking up helps you make the distinction. You drink the water and are not really so tired anymore, but it’s still quite uncomfortable to be out of bed early, especially if there are people nearby who are still sleeping - because, why me? The next thing is to go brush your teeth.
This is typically hard, and is a sort of commitment to waking up now. We will see how to make that easier in a bit. Once you brush, you have more or less committed to being up. As a final step, keep whatever you use for your routine - your workout clothes, your journal, your book, your phone, your computer - on your desk to make it easy for you to start the routine. Et voila, you are go for the day.
There are two important loopholes in the above process. The first: well, what if I am actually tired, even after drinking the water? It is very likely, then, that you have not slept early enough to wake up at this time.
Arguably a harder thing to fix is the sleep time - but that is precisely the thing that comes from your identity of starting to become a morning person. You are starting to wake up early, huh? You’ll need to sleep early then, otherwise you can’t. You are slowly committed to sleeping early. Another fix is simply to tell someone to force you into bed early.
Our parents did this when we were young. An interesting fix is to simply time the lights and the internet to go off at your bedtime, leaving you with no option but to sleep.
The second loophole is that of deciding to go brush, i.e. starting the day at this time. Again, commitment is a fix. But there is another helpful technique here. It is known that the last thoughts before sleep are among the first thoughts after waking up. So if you set your intentions the night before, taking responsibility then for giving yourself some excitement or a deadline for the next day, you are very likely to be pinged by the same thought in the morning, which can make all the difference in getting you to start the day then.
To summarize, we have seen two ways that can help make the process of building a habit less uncomfortable and more doable - by making it part of your identity and by making it part of your environment. Crudely put, the first is a mental trick, and the second is a physical trick.
Of course, there are others; one famous person schedules the message “I am still slouching in bed” to be sent out publicly on Twitter at 6:00 AM the next day, and the embarrassment of that gets him out of bed five minutes before, to push the send back to the next morning. It’s what James Clear calls making it inevitable - there is simply no other option but to follow through.
Another is what is called habit stacking, where it is in your routine to do something after/during something else you already do. For example, one techy kid connected his iPad’s internet to his gym cycle, so that he could only watch Netflix while cycling. Or time the internet to go off after dinner.
This is a great idea and acts like a walker until you can walk on your own. However, there is a caveat: it is a great way to start building a habit, but not a great way to sustain it. The idea is to make the habit so much a part of your identity and environment that you don’t need the walker anymore.
The last part of this article is about the habit-building process and how we are often miles off in our expectation of how it will be. The idea is very simple.
Most people overestimate what they can do in a day or week and underestimate what they can do in six months.
Many people might relate to this. I surely do. The reason is quite simple: (1) our society at large believes in overnight successes, which are never close to the whole truth, never mind what the media says, and (2) we are terrible at estimating the power of compounding. This mindset is remedied over time, as you gain experience. The important thing is to reinforce, from your own personal experience, the fact that the mindset is not correct.
Of course, this mindset is not conducive to building habits, since the benefits of habits show up in a manner exactly opposite to what it expects. Understanding this mindset consciously gives a plethora of ways to build habits and keep it from affecting you.
One example: realizing that “missing it twice” isn’t the way to go. Missing the habit once is okay; circumstances could force it. But the moment you miss it again once the circumstance is no longer an obstacle, it compounds and becomes the beginning of a new habit - the inverse one.
Another example is that of atomic habits, advocated by James Clear in the eponymous book. He suggests that you make your habit practice so small that it is easy to do, however badly you overestimate your day and run short of time. For example, if your goal is to wake up at 6:00 AM regularly, make sure for the first week that your alarm is set just five minutes earlier than usual.
That’s it. It’s very hard to resist it now, since it doesn’t really seem like a change. The most important thing is showing up for the habit. Once five minutes earlier becomes your new normal, you can move it to ten minutes earlier, and so on. This is the idea of making the habit so small that it is hard to resist.
The benefits of the habit will follow in full in due time, because these changes compound crazily over time, however much our underestimation of the power of these atomic changes discourages us. Ask any bodybuilder.
In short, trust in the compound effect. The most successful people are not overnight successes. It has taken years of struggle to get to where they are. It is laughably stupid to think that you can get to where they are in a day or a week. But it is also equally stupid to think that you can’t get to where they are in a year or two. The power of compounding is immense, and it is the most important thing to remember when building habits.
To conclude, building a habit is not an easy process, but the right perspective and understanding of the process, coupled with a conducive environment and identity, can make it much easier. You’re en route to your wildest dreams, and the only thing stopping you is you. Give it a shot.
If this was a substantial light bulb for you, dear reader, I am glad for you. You see, a very large proportion of the world - 95% - is living a life of mediocrity, putting off (and regretting) the things that are good for them. I believe humans are far more capable than that, and I hope that a better future awaits us. I will try to do my bit. I hope you will too, now.
Acknowledgements: This is mostly just techniques from a few remarkable people and my own experiences as I have tried to build habits. I have tried to put them together in a coherent manner, and I really do hope it helps you, if you think you needed it.
03 Mar 2024
This is a write-up of a problem that appeared on one of my course tests. The problem is as follows:
Let \(G\) be a simple undirected graph with positive integer edge-weights. A subgraph of \(G\) is called triangle-free if it does not contain any cycle of length 3 (aka no triangles). The problem is to find a maximum-weight triangle-free subgraph of \(G\).
The problem is \(NP\)-hard, so we might as well try approximating it instead. The most natural triangle-free subgraphs are the bipartite subgraphs of \(G\) - how large a bipartite subgraph can we construct from \(G\) in polynomial time?
Show how to compute a bipartite subgraph of weight at least \(W/2\), where \(W\) is the sum of the weights of the edges in \(G\).
Hint 1: The problem is a classic example of a greedy algorithm.
Hint 2: Can you guarantee that for every vertex \(v\), the edge weight between \(v\) and its neighbors in its own part is at most the edge weight between \(v\) and its neighbors in the other part? That would imply the claim.
Hint 3: Starting from an arbitrary partition and then greedily moving vertices across the parts may not be a good idea. Start from empty bins. There is a \(\mathcal{O}(n + m)\) time algorithm.
Solution: Set up two empty bins, which will serve as the bipartition. Order the vertices arbitrarily as \(v_1, \ldots, v_n\). Starting from \(i = 1\), put \(v_i\) into the bin with the smaller total edge weight to the vertices already placed. Notice that each edge of the graph is looked at precisely once, and at least half the total weight of the edges goes across the bins. The running time is the sum of the degrees plus \(\mathcal O(n)\) overhead, i.e. \(\mathcal{O}(n + m)\).
The motivation for this algorithm comes from the following inductive proof of the existence of such a subgraph. We induct on the number of vertices \(n\); the statement is trivially true when \(n = 1\). For \(n = k+1\), delete an arbitrary vertex \(v\); by the inductive hypothesis, the rest of the graph has such a subgraph, with partition \([U, V]\) say. Finally, put \(v\) into the part with the smaller total edge weight to \(v\).
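Here is a sketch of the greedy algorithm in Python; the adjacency-list input format is my own choice:

```python
def greedy_bipartition(n, adj):
    """adj[v] lists (neighbor, weight) pairs of v; returns side[v] in {0, 1}.

    Each vertex joins the bin with the *smaller* edge weight to the
    already-placed vertices, so at least half of each vertex's weight to
    placed neighbors crosses the cut."""
    side = [None] * n
    for v in range(n):                      # any fixed order works
        w = [0, 0]                          # weight from v to each bin so far
        for u, weight in adj[v]:
            if side[u] is not None:
                w[side[u]] += weight
        side[v] = 0 if w[0] <= w[1] else 1  # join the lighter side
    return side

# Triangle with unit weights: edges (0,1) and (1,2) end up crossing the cut,
# so the bipartite subgraph keeps weight 2 >= W/2 = 1.5.
adj = [[(1, 1), (2, 1)], [(0, 1), (2, 1)], [(0, 1), (1, 1)]]
print(greedy_bipartition(3, adj))  # [0, 1, 0]
```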
Before we get back to our original problem, let us consider the "complement" problem: finding a minimum weight subgraph \(H'\) such that every triangle in \(G\) contains at least one edge of \(H'\). It is clear that any solution \(H'\) when complemented in \(G\) is triangle-free. Can we manage to get a low-enough solution to the complement, then?
Let \(\text{OPT}_2\) denote the weight of the minimum weight subgraph \(H'\) such that every triangle in \(G\) contains at least one edge of \(H'\). Describe an algorithm that computes a subgraph of weight at most \(3\text{OPT}_2\).
Hint 1: LP rounding/Primal Dual.
Solution: We first write down the LP (linear program) relaxation of the complement problem. Letting \(x_e\) denote the variable representing whether edge \(e\) is in our subgraph or not, we have:
\[\begin{aligned}
\text{minimize} \quad & \sum_{e \in E} w_e x_e \\
\text{subject to} \quad & x_{uv} + x_{vw} + x_{wu} \geq 1 \quad \text{for all triangles $uvw$ in $G$} \\
& 0 \le x_e \le 1 \quad \text{for all $e \in E$},
\end{aligned}\]
where the integer program would instead demand \(x_e \in \{0, 1\}\).
This is actually an instance of the [set cover](https://en.wikipedia.org/wiki/Set_cover_problem) problem, which has an \(f\)-approximation rounding/Primal-Dual algorithm, where \(f\) is the maximum number of sets containing any element of the universe. Here, the elements are the triangles of the graph \(G\) and the sets are the edges: each edge covers the triangles it is a part of, and each triangle is covered by precisely \(3\) edges, so \(f = 3\).
The rounding algorithm is easy; we solve the LP relaxation optimally, and then round every \(x_e\) with \(x_e^* \geq 1/3\) (\(*\) denoting the optimal solution) to \(1\), and the rest to \(0\). The resulting solution is integral, feasible for the integer linear program, and has scaled up the objective by at most a factor of \(3\).
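For concreteness, a hedged sketch of the rounding algorithm using scipy’s LP solver, with brute-force triangle enumeration and my own choice of input format (a dict from edges \((u, v)\), \(u < v\), to weights):

```python
from itertools import combinations
from scipy.optimize import linprog

def triangle_cover_3approx(n, w):
    """w maps each edge (u, v) with u < v to its weight; returns a set of
    edges hitting every triangle, of weight at most 3 * OPT_2."""
    edges = sorted(w)
    idx = {e: i for i, e in enumerate(edges)}
    # One constraint per triangle: x_uv + x_vw + x_uw >= 1, written as
    # -(x_uv + x_vw + x_uw) <= -1 since linprog wants A_ub @ x <= b_ub.
    A_ub, b_ub = [], []
    for u, v, t in combinations(range(n), 3):
        tri = [(u, v), (v, t), (u, t)]
        if all(e in idx for e in tri):
            row = [0.0] * len(edges)
            for e in tri:
                row[idx[e]] = -1.0
            A_ub.append(row)
            b_ub.append(-1.0)
    c = [w[e] for e in edges]
    res = linprog(c, A_ub=A_ub or None, b_ub=b_ub or None, bounds=(0, 1))
    # Each triangle constraint forces some x_e >= 1/3, so rounding those
    # up to 1 stays feasible and scales the objective by at most 3.
    return {e for e in edges if res.x[idx[e]] >= 1 / 3 - 1e-9}
```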
The Primal Dual algorithm is also fairly simple, but we do not go into that here, at the moment.
We are ready for the first approximation algorithm for the original problem.
Let \(\text{OPT}_1\) denote the weight of the maximum weight triangle-free subgraph of \(G\). Describe an algorithm that computes a subgraph of weight at least \(\frac{3}{5}\text{OPT}_1\).
Hint 1: Leverage both algorithms we have already seen. How do you leverage two algorithms to get the best of both worlds?
Hint 2: Suppose the optimal value \(\text{OPT}_1 \leq \frac{5W}{6}\). Are we done? If this is not the case, does the algorithm we have for the complement problem help us?
Solution: Simply run both algorithms (of course, after running the second algorithm and getting a subgraph \(H'\) we must complement it in \(G\) to get a triangle-free subgraph), and take the larger of the two solutions. That way we are able to avoid the worst-case scenarios of each algorithm individually, if the worst-case scenarios are different for the two (they happen to be beautifully complementary in this case). Let \(h_1\) be the weight of the graph returned by the first algorithm, \(h_2\) by the second, and \(\mathcal{A} = \max(h_1, h_2)\) is the value returned by our final algorithm.
We have:
\[\begin{aligned}h_1 &\geq \frac{W}{2} = \frac{3}{5}\cdot\frac{5W}{6},\\
h_2 &\geq W - 3\text{OPT}_2 = W - 3(W - \text{OPT}_1) = 3\text{OPT}_1 - 2W.\end{aligned}\]
If the optimal value \(\text{OPT}_1\) is at most \(\frac{5W}{6}\), then indeed, \(\mathcal{A} \geq h_1 \geq \frac{3}{5}\text{OPT}_1\). If this is not the case, then \(\text{OPT}_1 > \frac{5W}{6}\), and
\[\mathcal{A} \geq h_2 \geq 3\text{OPT}_1 - 2W \geq 3\text{OPT}_1 - 2\left(\frac{6\text{OPT}_1}{5}\right) = \frac{3}{5}\text{OPT}_1.\]
Incredible, right?
It is also easy to see that bettering either of the two algorithms individually would improve the approximation factor of the combined algorithm. The first algorithm is essentially the best we can hope for: the maximum weight of a bipartite subgraph of the complete uniform-weight graph on \(n\) vertices is \(W\left(\frac{1}{2} + o_n(1)\right)\). The second algorithm, on the other hand, can be improved to an approximation factor of \(2\)! (Not via the set-cover formulation, though: for set cover, factor \(3\) is the best we can hope for, since the integrality gap of that LP is \(3\).)
The factor-\(2\) approximation then establishes a \(2/3\)-approximation for our final algorithm, by reasoning completely analogous to the above. It remains to show the \(2\)-approximation for the second algorithm.
Describe an algorithm that computes a subgraph of weight at most \(2\text{OPT}_2\).
Hint 1: Induct on the number of edges of \(G\). You wish for some \(x_e\) to be at least \(1/2\), so you can round it up to \(1\) and only scale the objective by a factor of \(2\). If there is an edge \(e\) with \(x_e^* = 0\), use induction to reduce the problem to a smaller graph. You might have to re-solve the LP optimally at each step of this iterative reduction; this is okay.
Hint 2: The bad case is then when every \(x_e\) is positive. Complementary slackness comes to the rescue. Compute the exact dual optimal solution. Is this small enough that a \(2\)-approximation to this is quite easy to find?
Solution: We induct on the number of edges in the graph.
(the reader is invited to finish the solution until the author gets around to it 🫣)