Safe Withdrawal Rates - Part 2: Do we have enough data?
Last time, we reviewed the definition of safe withdrawal rates and explained the origins of the popular 4% rule. The idea was to estimate how much we could safely withdraw post-retirement by simulating how retirees from 1926 through 2015 would do. For each possible year in that range and possible retirement length, we checked whether a retiree who started then and withdrew a given amount each month would run out of money. As expected, the longer the desired duration of the retirement, the less we could safely withdraw each year. This is especially important for people who want to retire early, because many studies of safe withdrawal rates focus on “normal” retirement lengths of at most 30 years.
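To make that procedure concrete, here is a minimal sketch of the check for a single cohort, written in Python. The function name, the starting balance, and the assumption that `monthly_returns` is a list of inflation-adjusted monthly portfolio returns are my own choices for illustration, not code from the original study:

```python
def cohort_survives(monthly_returns, start, months, annual_rate,
                    initial_balance=1_000_000):
    """Return True if a retiree who starts at index `start` and withdraws
    `annual_rate / 12` of the initial balance every month (in real terms)
    keeps a positive balance for `months` months.  Assumes there are at
    least `months` entries after `start`; returns are inflation-adjusted,
    e.g. 0.01 means a +1% month."""
    balance = initial_balance
    withdrawal = initial_balance * annual_rate / 12
    for r in monthly_returns[start:start + months]:
        balance -= withdrawal      # take the monthly withdrawal first
        if balance <= 0:
            return False           # the portfolio was depleted
        balance *= 1 + r           # the remainder rises or falls with the market
    return True
```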
When we consider longer retirement lengths, a serious question is whether we have enough data to draw a safe conclusion. For example, someone retiring early at age 35 who wants to ensure they have enough money until age 95 is interested in a 60 year safe withdrawal rate! But if we’re using stock data from 1926 to 2015, we have only about 90 years of data. Moreover, if we follow the Trinity study methodology of examining the outcome of each possible starting year, the last cohort we can consider is 1955: for every starting year after that, we need data beyond 2015 to know whether they would have made it. (Of course, if such a cohort has already run out of money, we know it would have been a failure. But we can’t tell how the remaining cases will turn out. I do not count such failures because it appears that the Trinity study did not either.)
So what can we do? One possibility is to try to extend the data. I’m writing this in 2019, so we could push the end of the data set forward by a few more years. In the other direction, some people have worked out estimates of stock and government bond returns going all the way back to 1871. That’s a lot of additional data! (Although one might wonder how relevant such old estimates are for figuring out how the market behaves today.)
While trying to get more data is a good idea, in this post I want to look into another matter: can we estimate how much a lack of data may be hurting us? And, can we extract even more insight from the data we do have?
What if the Trinity Study had been done in 1980?
Let’s start with a simple thought experiment. The Trinity study was first done in the late 1990s. But what if someone had tried to do the same study in 1980? Sufficiently powerful computers were already common enough at universities that this would have been straightforward to do. But researchers in 1980 would have had even less data than we do now, so examining what they would have found can shed some light on what we might be missing today.
The table below shows what the minimum computed safe withdrawal rate would have been for different asset allocations if the Trinity study had been conducted using only data up through the listed year. Each entry is the largest withdrawal rate that had no failures for the given asset allocation and retirement duration. (A sketch of how such an entry can be computed follows the table.)
Minimum Safe Withdrawal Rate (data through the listed year)

| Allocation | Duration | 1980 | 1990 | 2000 | 2015 |
|---|---|---|---|---|---|
| 0% stocks | 15 years | 5.16% | 5.16% | 5.16% | 5.16% |
| 0% stocks | 30 years | 2.51% | 2.51% | 2.51% | 2.51% |
| 0% stocks | 45 years | 2.21% | 1.56% | 1.56% | 1.56% |
| 25% stocks | 15 years | 5.85% | 5.55% | 5.55% | 5.55% |
| 25% stocks | 30 years | 3.68% | 3.68% | 3.30% | 3.30% |
| 25% stocks | 45 years | 3.51% | 2.60% | 2.60% | 2.60% |
| 50% stocks | 15 years | 6.09% | 5.53% | 5.53% | 5.53% |
| 50% stocks | 30 years | 4.36% | 4.36% | 3.55% | 3.55% |
| 50% stocks | 45 years | 4.82% | 3.52% | 3.52% | 3.20% |
| 75% stocks | 15 years | 6.22% | 5.46% | 5.46% | 5.46% |
| 75% stocks | 30 years | 4.71% | 4.71% | 3.72% | 3.72% |
| 75% stocks | 45 years | 4.40% | 4.12% | 4.12% | 3.40% |
| 100% stocks | 15 years | 5.37% | 5.32% | 5.32% | 5.32% |
| 100% stocks | 30 years | 3.84% | 3.84% | 3.83% | 3.83% |
| 100% stocks | 45 years | 3.63% | 3.63% | 3.63% | 3.54% |
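As a rough illustration of how each cell could be computed, here is a hypothetical Python sketch that reuses the `cohort_survives` helper from earlier. It assumes the return series has already been cut off at the column's ending year; the 20% search ceiling and the tolerance are arbitrary choices of mine:

```python
def min_safe_withdrawal_rate(monthly_returns, duration_years,
                             lo=0.0, hi=0.20, tol=1e-4):
    """Binary-search for (approximately) the largest annual withdrawal rate
    that survives for every starting month with a full `duration_years`
    of data after it."""
    months = duration_years * 12
    starts = range(len(monthly_returns) - months + 1)

    def never_fails(rate):
        return all(cohort_survives(monthly_returns, s, months, rate)
                   for s in starts)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if never_fails(mid):
            lo = mid    # `mid` succeeded for every cohort; the answer is at least `mid`
        else:
            hi = mid    # `mid` failed somewhere; the answer is below `mid`
    return lo
```

The binary search works because success is monotone in the rate: if a given withdrawal rate survives every cohort, any lower rate does too.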
The first thing to observe is that, as we move to the right and add more data, the SWR estimate tends to decrease. That’s to be expected: as we examine more cohorts, there are more potential starting years in which a given withdrawal rate could have failed. However, for the short 15 year duration, the estimate produced in 1990 is already as low as the estimates from every subsequent year. That makes sense, since by 1990 we already had quite a bit of data relative to the length of the retirement.
The picture is quite different for the 30 year and 45 year durations once the portfolio holds even a moderate amount of stocks. For those durations, the SWR computed in 2000 is regularly lower than the one computed in 1990.
Often in debates about conservative withdrawal rates, you will see someone write that a 3.5% withdrawal rate with 75% stocks has never failed for a 30 year retirement. The problem is that in 1990, they could have said the same thing about a 4.7% withdrawal rate! That would have turned out to be seriously wrong, as revealed by more data. So how confident can we be that in 2035, a 3.5% withdrawal rate won’t look similarly misguided?
Is there anything we can do to improve our estimates today?
Bootstrapping
To try to understand the amount of uncertainty involved in the Trinity study methodology, we’re going to employ a process called bootstrapping. The idea is to simulate many years of historical returns, run the Trinity study procedure over each of these sequences, and see what estimates of a minimal safe withdrawal rate we would have gotten.
How do we generate simulated returns? Our simulated returns will be divided up into smaller blocks. We’ll start by picking a random year and month in our data set, looking up its inflation-adjusted returns, and putting that in the first block. Let’s say we started the block with February, 1963. Next, with some probability we either:
- Extend our current block by including the following month (March 1963, in our example), or
- Start a new block at a randomly selected year and month.
As a block gets longer, we increase the probability of choosing the second option. In the simulations I’ve run, I’ve adjusted these probabilities so that the average block length is about 5 years. We keep constructing blocks until the sum of their lengths matches the length of our original data. This is a special kind of bootstrapping called the stationary bootstrap. It’s useful here because it accounts for possible cyclic/month-to-month correlations in returns.
After building such a series of returns, we compute the largest “no failure” withdrawal rate for each possible starting “year” in our new data set and take the minimum over all of them. This tells us what withdrawal rate would have succeeded for every possible starting point in that data set. By repeating this process of generating new data and seeing what the Trinity study would have concluded with it, we can estimate how much the “no failure” withdrawal rate varies.
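Here is a small Python sketch of this kind of resampling. One simplification worth flagging: the description above uses a restart probability that grows as the block gets longer, while the sketch below uses the textbook stationary bootstrap with a fixed restart probability, which still produces blocks averaging about five years. The function name and defaults are mine:

```python
import random

def stationary_bootstrap(monthly_returns, avg_block_months=60, length=None):
    """Build a synthetic return series by copying runs of consecutive
    historical months.  After each month we either continue the current
    block or jump to a random new starting point.  A fixed restart
    probability of 1/avg_block_months gives geometrically distributed
    block lengths averaging about `avg_block_months` months."""
    n = len(monthly_returns)
    length = n if length is None else length
    restart_prob = 1 / avg_block_months

    series = []
    i = random.randrange(n)              # the first block starts anywhere
    while len(series) < length:
        series.append(monthly_returns[i])
        if random.random() < restart_prob:
            i = random.randrange(n)      # start a new block
        else:
            i = (i + 1) % n              # extend the block; wrap at the end of the data
    return series
```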
We can go one step further and do this while only sampling months before a particular year. For example, if we only draw months from 1926-1980 and construct a sequence that’s 60 years long, that tells us what someone doing the bootstrap at the end of 1980 would have been able to determine.
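Putting the pieces together, the whole experiment might look something like the hypothetical sketch below, which reuses the earlier helpers. The cutoff index, trial count, and percentile are parameters, and `numpy` is used only for the percentile calculation:

```python
import numpy as np

def bootstrap_swr_percentile(monthly_returns, cutoff, duration_years,
                             series_months=None, n_trials=1000, pct=5):
    """Resample only from data before index `cutoff`, compute the minimum
    'no failure' withdrawal rate for each synthetic series, and report a
    low percentile of those minima."""
    history = monthly_returns[:cutoff]        # e.g. only months through 1980
    if series_months is None:
        series_months = len(history)          # default: same length as the source data
    minima = [
        min_safe_withdrawal_rate(
            stationary_bootstrap(history, length=series_months),
            duration_years)
        for _ in range(n_trials)
    ]
    return np.percentile(minima, pct)
```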
The table below shows what we would have concluded doing the bootstrap at different historical points. The entries in the table show the 5th percentile of computed “no failure” withdrawal rates, across 1000 separate bootstraps:
5th Percentile for Estimated SWR (bootstrap drawing only from data through the listed year)

| Allocation | Duration | 1980 | 1990 | 2000 | 2015 |
|---|---|---|---|---|---|
| 0% stocks | 15 years | 4.16% | 4.04% | 4.17% | 4.23% |
| 0% stocks | 30 years | 1.92% | 1.95% | 1.93% | 2.08% |
| 0% stocks | 45 years | 1.27% | 1.32% | 1.35% | 1.42% |
| 25% stocks | 15 years | 4.66% | 4.58% | 4.53% | 4.72% |
| 25% stocks | 30 years | 2.43% | 2.42% | 2.48% | 2.51% |
| 25% stocks | 45 years | 1.89% | 1.81% | 1.88% | 1.95% |
| 50% stocks | 15 years | 4.53% | 4.60% | 4.67% | 4.66% |
| 50% stocks | 30 years | 2.64% | 2.60% | 2.58% | 2.62% |
| 50% stocks | 45 years | 2.11% | 2.23% | 2.13% | 2.24% |
| 75% stocks | 15 years | 4.01% | 4.04% | 4.14% | 3.85% |
| 75% stocks | 30 years | 2.29% | 2.46% | 2.46% | 2.37% |
| 75% stocks | 45 years | 2.27% | 2.00% | 2.18% | 1.96% |
| 100% stocks | 15 years | 2.97% | 2.97% | 3.03% | 2.85% |
| 100% stocks | 30 years | 1.85% | 1.88% | 1.92% | 1.78% |
| 100% stocks | 45 years | 1.87% | 1.85% | 1.83% | 1.43% |
To understand this table, let’s look at the 2015, 100% stocks, 45 years entry, which is 1.43%. That means that in about 50 out of 1000 of our bootstrap trials, a withdrawal rate above 1.43% with those parameters would have failed in one of the starting years in that trial’s data set.
Four big takeaways from comparing the two tables:
- The bootstrap rates are overwhelmingly lower than the ones from the first table, which was generated from just a single historical sequence.
- The first table showed that for the 45 year duration, the safe withdrawal rate always increased with more equities. This is not so in the bootstrap table: the 50% stocks option over 45 years has an appreciably higher rate than 100% stocks for the same duration. This is particularly interesting since many argue that you “need” a very high equity allocation for long durations.
- The bootstrapped rates from 1980 are all strictly lower than the 2015 non-bootstrapped results.
- The bootstrap estimates from 1980 and 2015 differ by very small amounts, especially compared to the differences between the 1980 and 2015 results in the first table. This is reassuring, since it means the analysis is a bit more robust.
So, does that mean we need to use a 2% withdrawal rate to be safe for longer durations? Probably not! The numbers above are likely overly conservative. Remember, you should not interpret them as saying that there’s a 5% failure rate for withdrawal rates above those in the table, just that in 5% of the “simulated histories” there would have been at least one failing starting point. Even under all of the assumptions of the model, the suggested failure rate for a single retirement trial is likely to be quite a bit lower, as we’ll see in later posts.
However, when someone says “X% has never failed before, therefore it is safe”, you ought to be very skeptical. The analysis above shows that in a plausible alternative world, only a much, much lower number would have “never failed”.