
Fooled by Randomness

A few months ago I wrote about Stock Market Timing. It was an analysis of stock market returns over the last 40+ years, which showed that certain months had better returns than others and that a small number of months (two) actually had negative average returns. The gist of the analysis was that someone who traded so as to be out of the market during those two negative months and in the market for the rest of the year would have outperformed the market (i.e. the buy & hold approach).

The conjecture was that there might be underlying seasonal factors causing this difference in performance. If true, then a trading strategy should be able to exploit this pattern and come out ahead. One would not have to understand the causal nature of these factors in order to exploit them. But success would depend on the existence of such factors and on their causal relationship continuing to hold in the future.

Since then, I have read two books on randomness:

  • The Drunkard’s Walk: How Randomness Rules Our Lives by Leonard Mlodinow
  • Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets by Nassim Taleb

We are wired to seek patterns and meaning in the chaos of the world, even when they may not exist. [Nassim Taleb]

Listening to these audiobooks made me think again about the validity of the conclusions in the above post. While the postulated “summer break” trading approach clearly would have been successful over the last 40 years based on the observed data, the question remains whether it will continue to be successful in the future. The answer depends essentially on whether there are underlying factors with seasonal variation causing the pattern, or whether the observed pattern is purely the result of randomness. Only in the former case (a causal link) is this trading approach likely to be successful.

I was also reminded of the insight that we often fall prey to confirmation bias: we seek out information that confirms our thesis or beliefs while failing to seek out evidence that contradicts them. Somehow it feels nicer to confirm one’s beliefs than to find reasons they are flawed. But after learning more about randomness, I started to have doubts about my earlier conclusion. I asked myself: What evidence would it take to falsify the conclusion that the “summer break” trading strategy is superior?

An easy way to find out what could happen is computer simulation. If I simulated a number of stock market timeseries, what would their monthly returns look like? If typical random timeseries showed similar variations in monthly returns, the observed market returns would look more like random noise than like a signal of underlying factors.

Enter stock market simulations. Many articles have likened stock market charts to random walks, typically using Markov chains to simulate timeseries of prices with random price swings up and down. For practical reasons, I chose to build on the model provided by Jason Cawley in a Wolfram Demonstrations Project entry. From that site:

A decent first approximation of real market price activity is a lognormal random walk. But with a fixed volatility parameter, such models miss several stylized facts about real financial markets. Allowing the volatility to change through time according to a simple Markov chain provides a much closer approximation to real markets. Here the Markov chain has just two possible states: normal or elevated volatility. Either state tends to persist, with a small chance of transitioning to the opposite state at each time-step.
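The quoted model can be sketched in a few lines. This is a minimal Python sketch, not Jason Cawley's Wolfram implementation: the hypothetical function `regime_switching_walk` and all of its parameter names and default values (daily drift, base volatility, spike multiplier, switching probabilities) are assumptions chosen for illustration. Each step first updates the two-state Markov chain (normal or elevated volatility) and then multiplies the price by the exponential of a normally distributed log-return, which is what makes the walk lognormal.

```python
import math
import random


def regime_switching_walk(n_steps, start=100.0, drift=0.0003, base_vol=0.01,
                          spike_factor=3.0, p_enter_spike=0.01,
                          p_exit_spike=0.1, seed=None):
    """Hypothetical helper: lognormal random walk whose volatility follows a
    two-state Markov chain.

    All parameter names and defaults are illustrative assumptions, not the
    parameters of the Wolfram demonstration.
    """
    rng = random.Random(seed)
    prices = [start]
    elevated = False  # current regime: False = normal, True = elevated volatility
    for _ in range(n_steps):
        # Markov step: each regime tends to persist, with a small chance to switch
        if elevated:
            if rng.random() < p_exit_spike:
                elevated = False
        elif rng.random() < p_enter_spike:
            elevated = True
        vol = base_vol * (spike_factor if elevated else 1.0)
        # Lognormal step: the daily log-return is normally distributed
        prices.append(prices[-1] * math.exp(rng.gauss(drift, vol)))
    return prices


path = regime_switching_walk(250, seed=42)
print(len(path), round(path[-1], 2))
```

With `p_exit_spike` well above `p_enter_spike`, the walk spends most of its time in the normal regime, while a spike, once entered, lasts about 1 / `p_exit_spike` steps on average.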

The model used above includes a handful of parameters, such as an overall drift and the probability, strength and duration of a period of elevated volatility (called a spike). Each scenario consists of 20 runs over 250 steps (the approximate number of trading days in a calendar year), each starting at a price of 100, and visualizes the trajectories at the 10th, 30th, 50th, 70th and 90th percentiles (when sorted by the final value of the simulated stock). A typical image of such a scenario looks like this:
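The scenario logic (20 runs of 250 steps, each starting at 100, then picking the runs at given percentiles of the final value) can be sketched as follows. This Python sketch rests on assumptions: the hypothetical `simulate_path` is a compact stand-in for the regime-switching walk with made-up default parameters, and the hypothetical `percentile_paths` uses a nearest-rank rule to pick which of the sorted runs represents each percentile; the demonstration may select them differently.

```python
import math
import random


def simulate_path(rng, n_steps=250, start=100.0, drift=0.0003, base_vol=0.01,
                  spike_factor=3.0, p_enter=0.01, p_exit=0.1):
    """Hypothetical helper: lognormal walk with two-state Markov volatility
    (illustrative default parameters)."""
    prices, elevated = [start], False
    for _ in range(n_steps):
        # stay elevated with prob 1 - p_exit, or enter a spike with prob p_enter
        elevated = rng.random() >= p_exit if elevated else rng.random() < p_enter
        vol = base_vol * (spike_factor if elevated else 1.0)
        prices.append(prices[-1] * math.exp(rng.gauss(drift, vol)))
    return prices


def percentile_paths(n_runs=20, percentiles=(10, 30, 50, 70, 90), seed=None):
    """Hypothetical helper: simulate n_runs paths, sort them by final price and
    return the path found at each requested percentile (nearest-rank method)."""
    rng = random.Random(seed)
    runs = sorted((simulate_path(rng) for _ in range(n_runs)),
                  key=lambda path: path[-1])
    picked = {}
    for q in percentiles:
        rank = max(1, math.ceil(q * n_runs / 100))  # 1-based nearest rank
        picked[q] = runs[min(rank, n_runs) - 1]
    return picked


for q, path in percentile_paths(seed=3).items():
    print(f"P{q}: final price {path[-1]:.2f}")
```

With 20 runs, the nearest-rank rule picks the 2nd, 6th, 10th, 14th and 18th runs in order of final price.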

Changing the parameters changes the amplitude of these charts, but their general shape remains similar. The demonstration site referenced above is interactive, i.e. you can change the parameter values and see in real time how the curves adjust. This gives a much better feel for the type of shapes the model produces. They certainly look similar to real stock price charts. For example, given a sample set of 10 timeseries, 5 of them model-generated and 5 of them real stock charts, it would be very difficult to tell them apart reliably.

To approximate the observed stock market timeseries over the last 40+ years, I made a few assumptions:

  • Assume each month has 21 trading days, hence a year has 12 * 21 = 252 trading days.
  • Set timeseries to 40 years * 252 trading days = 10,080 datapoints.

With the default modeling parameters picked by Jason Cawley, I ran dozens of simulations at 10,080 datapoints each. I then aggregated the monthly returns over those 40 simulated years to see what kind of variation would be observed per month. Here are some typical results:
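The aggregation step can be sketched as below, using the post's assumptions of 21 trading days per month and 40 years, i.e. 40 × 12 × 21 = 10,080 steps. This Python code is a sketch, not the original workflow: the hypothetical `simulate_path` is a compact version of the regime-switching walk quoted above with illustrative parameters, and the hypothetical `average_monthly_returns` slices the path into 21-day months, computes each month's simple return, and averages it per calendar month across all years.

```python
import math
import random

TRADING_DAYS_PER_MONTH = 21
MONTHS_PER_YEAR = 12
YEARS = 40
N_STEPS = YEARS * MONTHS_PER_YEAR * TRADING_DAYS_PER_MONTH  # 10,080 trading days
MONTH_NAMES = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def simulate_path(rng, n_steps, start=100.0, drift=0.0003, base_vol=0.01,
                  spike_factor=3.0, p_enter=0.01, p_exit=0.1):
    """Hypothetical helper: lognormal walk with two-state Markov volatility
    (illustrative default parameters)."""
    prices, elevated = [start], False
    for _ in range(n_steps):
        elevated = rng.random() >= p_exit if elevated else rng.random() < p_enter
        vol = base_vol * (spike_factor if elevated else 1.0)
        prices.append(prices[-1] * math.exp(rng.gauss(drift, vol)))
    return prices


def average_monthly_returns(prices):
    """Hypothetical helper: average simple return (in %) of each calendar
    month over all years.

    prices[0] is the starting price; month m spans the daily steps from
    index m * 21 to (m + 1) * 21. Incomplete trailing months are ignored.
    """
    sums = [0.0] * MONTHS_PER_YEAR
    counts = [0] * MONTHS_PER_YEAR
    n_months = (len(prices) - 1) // TRADING_DAYS_PER_MONTH
    for m in range(n_months):
        start = prices[m * TRADING_DAYS_PER_MONTH]
        end = prices[(m + 1) * TRADING_DAYS_PER_MONTH]
        sums[m % MONTHS_PER_YEAR] += end / start - 1.0
        counts[m % MONTHS_PER_YEAR] += 1
    return [100.0 * s / c if c else float("nan") for s, c in zip(sums, counts)]


avg = average_monthly_returns(simulate_path(random.Random(7), N_STEPS))
for name, r in zip(MONTH_NAMES, avg):
    print(f"{name}: {r:+.2f}%")
```

Rerunning with different seeds is the code equivalent of running dozens of simulations and comparing the resulting monthly profiles.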

These are qualitatively very similar to the actually observed monthly returns of the DJI over the last 40 years:

  • The majority of months are up with average returns between 0 and +2%
  • A few months in the year are down with average returns between 0 and -2%
  • The negative months are randomly distributed across the year

When we know that the data generator uses randomness, we are not surprised to see such variation and we don’t try to find underlying reasons which caused a particular generated pattern. With the actually observed stock market returns it’s an easy mistake to make.

Our desire for explanations often leads us to invent narratives that fit our preconceived notions, rather than accepting the randomness of events. [Nassim Taleb]

Varying the model parameters doesn’t change the outcome qualitatively. If the base trend is increased, there are, as expected, fewer and often no months with negative average returns. If volatility is increased, the amplitude of average returns increases in both directions. There are further explorations one could make, such as aggregating multiple individual runs into an index more closely resembling the DJI (30 stocks) or the S&P 500 (500 stocks). Yet the simulation experiment gave me the evidence that falsifies my earlier conclusion. At this point, I am ready to retract the conclusion postulated in the earlier post: the observed pattern is more likely just a random artefact, and there is no underlying reason causing the fluctuations in average monthly returns. As such, the “summer break” trading strategy is no more likely to be successful in the future than any number of other strategies motivated by and retrofitted to randomly generated patterns.

The seduction of stories blinds us to the reality of randomness. [Nassim Taleb]

Occam’s razor, or the principle of parsimony, tells us that the simplest, most elegant explanation is usually the one closest to the truth. Randomness is simpler than underlying causal factors.

Hitchens’s razor states that what can be asserted without evidence can also be dismissed without evidence. When it comes to data analytics, simulations based on randomness can provide valuable comparisons: if the observed data does not systematically and reliably differ from random noise, there is likely no signal! The Monte Carlo method should be a good friend of any data analyst!
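A Monte Carlo comparison in this spirit asks: how often would pure randomness produce at least two calendar months with negative average returns over 40 years? The Python sketch below simplifies the model to independent lognormal monthly returns with constant volatility (no volatility spikes), and its `monthly_drift` and `monthly_vol` values are assumptions, not estimates fitted to the DJI. The function name `negative_month_counts` is hypothetical.

```python
import math
import random


def negative_month_counts(n_sims=1000, years=40, monthly_drift=0.006,
                          monthly_vol=0.045, seed=None):
    """Hypothetical helper: for each simulated history, count the calendar
    months whose average simple return over `years` years is negative.

    Monthly log-returns are i.i.d. Normal(monthly_drift, monthly_vol);
    both values are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        sums = [0.0] * 12  # summed simple returns per calendar month
        for _year in range(years):
            for month in range(12):
                sums[month] += math.exp(rng.gauss(monthly_drift, monthly_vol)) - 1.0
        # sum < 0 is equivalent to average < 0 since every month has `years` entries
        counts.append(sum(1 for s in sums if s < 0))
    return counts


counts = negative_month_counts(seed=1)
share = sum(c >= 2 for c in counts) / len(counts)
print(f"Random histories with >= 2 negative calendar months: {share:.1%}")
```

If a sizeable share of purely random histories shows two or more negative months, then seeing two negative months in real data is weak evidence of a seasonal signal. Raising `monthly_drift` makes negative months rarer, and raising `monthly_vol` widens the spread of the monthly averages in both directions.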

I admit that I was wrong. I was fooled by randomness. This episode taught me to work against my own confirmation bias, and to use simulations based on randomness to look for counterfactual evidence and to help distinguish between signal and noise.

 

Posted by on May 21, 2024 in Uncategorized

 

Coronavirus Cases at end of March


Just 2 weeks ago I looked at how the previous 4 weeks from mid-February to mid-March had qualitatively changed the situation for many countries in this pandemic. The pace of change is not letting up. Here I review my predictions from 2 weeks ago and summarize the weekly changes since then.

Predictions made 2 weeks ago on 3/17:

Numbers for the US:

  • 3/22: 28,000 (forecast), 33,276 (actual)
  • 3/29: 221,000 (forecast), 140,886 (actual)

The confirmed cases grew faster initially, then somewhat slower than the exponential best-fit trend line had forecast. It has been observed that such growth by contagion follows a power law (see ZDNet article), which resembles exponential growth initially but then grows somewhat more slowly (a straight line in a log-log plot) than exponential growth (a straight line in a log-linear plot).
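The log-linear versus log-log distinction can be made concrete with two least-squares fits on the logarithm of the case counts: an exponential model is a straight line in (day, log cases), a power law is a straight line in (log day, log cases). The Python sketch below is illustrative only; the data are hypothetical, generated from an exact power law rather than taken from the JHU or ZDNet sources, and the helper names `fit_line` and `compare_growth_models` are assumptions.

```python
import math


def fit_line(xs, ys):
    """Hypothetical helper: ordinary least squares fit y = a + b * x;
    returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def compare_growth_models(days, cases):
    """Hypothetical helper: fit exponential and power-law models in log space
    (days must be > 0)."""
    log_cases = [math.log(c) for c in cases]
    _, rate, sse_exp = fit_line(days, log_cases)  # log C = a + r * t
    _, exponent, sse_pow = fit_line([math.log(d) for d in days], log_cases)  # log C = a + k * log t
    return {"exp_rate": rate, "exp_sse": sse_exp,
            "power_exponent": exponent, "power_sse": sse_pow}


# Hypothetical data following cases = 5 * day**3 (not real case counts)
days = list(range(1, 31))
cases = [5 * d ** 3 for d in days]
fit = compare_growth_models(days, cases)
print(fit)
```

On this synthetic series the power-law fit recovers an exponent of about 3 with near-zero residuals, while the exponential fit leaves larger residuals because a straight line in log-linear space cannot follow the bending curve. Both residual sums are measured in log-cases, so they are directly comparable.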

Ranks:

  • 3/18: US will pass China in active cases, which will be 7th behind France and US.
  • 3/22: US will be top-ranked in new cases.
  • 3/31: US will be top-ranked in active cases.

All three predictions came to pass, the last one already a few days early. By now, the US has by far the most active cases (100,000 more than Italy at rank 2), the most new cases (3x that of Spain at rank 2) and nearly twice as many confirmed cases as Italy at rank 2.

Covid-19 case counts of the top 10 countries as of Mar-31. Source: worldometers.info/coronavirus

Qualitative scenario:

  • Until early March this pandemic was a China story.
  • In mid March the pandemic is a Europe story.
  • By end of March this will be a US story.

Here is how the percentage of confirmed cases has evolved throughout March:


Stacked Area Chart of confirmed Covid-19 cases by continent. Data source Johns Hopkins CSSE.

At the beginning of March (left side of chart), Asia clearly dominated the case numbers, with China about 90% and South Korea about 4% of all cases. Italy had 2%, Iran 1% and France and Germany only 0.1% of all cases. While China already had ~80,000 confirmed cases, the US had only 74 confirmed cases just a month ago!

By mid-March (middle of chart), Europe had grown to 1/3 of all confirmed cases, with Asia / China accounting for most of the other 2/3 of cases. The US had about 3,500 confirmed cases, or 2% of the total (166,700) at the time. With active cases, Europe was starting to dominate the picture, with Italy (~20,600) having nearly twice as many active cases as China (~10,800). China was reporting very few new cases and lots of recovered. Italy and soon Spain were reporting ever-growing numbers of daily new cases, with slow growth on the recovered side (and sadly strong growth of deaths).

Now, at the end of March (right side of chart), the US is clearly the single country with the biggest confirmed and active case numbers. The Americas now account for ~25% of all cases, a proportion that is growing. Asia / China’s portion has diminished to ~20% of all cases. And Europe has crested at its highest percentage (53.9% on 3/28) and is slowly reducing its proportion of worldwide confirmed cases.
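The percentages behind a 100% stacked area chart are each group's share of the daily total. A minimal Python sketch, with hypothetical function names and made-up counts rather than the JHU data, that also finds the date on which a group's share peaked:

```python
def shares_by_group(counts):
    """Hypothetical helper: convert {group: confirmed_count} for one date
    into percentage shares."""
    total = sum(counts.values())
    return {group: 100.0 * c / total for group, c in counts.items()}


def share_series(snapshots):
    """Hypothetical helper: turn [(date, {group: count}), ...] into
    [(date, {group: share}), ...]."""
    return [(date, shares_by_group(counts)) for date, counts in snapshots]


def peak_share(series, group):
    """Hypothetical helper: return (date, share) for the date on which
    `group` had its highest share."""
    return max(((date, shares[group]) for date, shares in series),
               key=lambda item: item[1])


# Made-up counts for illustration only, not actual case numbers
snapshots = [
    ("3/01", {"Asia": 900, "Europe": 80, "Americas": 20}),
    ("3/15", {"Asia": 350, "Europe": 600, "Americas": 50}),
    ("3/31", {"Asia": 300, "Europe": 500, "Americas": 200}),
]
series = share_series(snapshots)
print(series[-1])
print(peak_share(series, "Europe"))
```

Because each day's shares sum to 100%, a group's share can fall even while its absolute count keeps growing, which is how a region can crest in percentage terms.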

Here is the distribution of all 189,510 confirmed cases in the US:


Confirmed Covid-19 cases in the US as of Mar-31. Source: Johns Hopkins dashboard.

For the US, the Johns Hopkins University dashboard allows drilling down from country to state to county level. This is helpful in understanding where clusters of confirmed cases are. One can clearly see that large metropolitan areas have more and larger dots than rural areas, such as those in the western half of the continent.

Like many analysts, I have created my own dashboard based on the daily refreshed datasets from the JHU GitHub repository. This has been an interesting exercise in many ways, partly due to the fast-changing but freely and freshly available data, and partly due to the many examples of widely shared charts on social media.

One example of a new chart I haven’t seen elsewhere is a scatter plot of all countries with > 1,000 confirmed cases on a timeline through March.


CFR trajectory of Countries with >1,000 Confirmed cases in March; Size = # Deaths; Color = GDP per capita range.

This shows Spain and Italy at high single-digit Case Fatality Rates (CFR) at the end of March. The trajectories of Italy and the US are highlighted. Italy’s CFR shot up and exceeded 10%, which is often attributed to the strain on its overwhelmed medical system. It’s also a less affluent country on the whole, but the hardest-hit region, Lombardy, is one of the richest in Italy, so the high CFR can’t be attributed mainly to an underfunded healthcare system. By contrast, the US CFR has stayed low throughout March and reached only about 2%.
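The CFR shown in the chart is deaths divided by confirmed cases, tracked per day. A small Python sketch, with hypothetical helper names and made-up daily counts, assuming for illustration that a day enters a country's trajectory once it has passed the chart's threshold of 1,000 confirmed cases:

```python
def case_fatality_rate(deaths, confirmed):
    """Hypothetical helper: CFR in percent (deaths / confirmed cases);
    None when there are no confirmed cases."""
    if confirmed <= 0:
        return None
    return 100.0 * deaths / confirmed


def cfr_trajectory(daily, min_confirmed=1000):
    """Hypothetical helper: compute the CFR for each (date, confirmed, deaths)
    row, keeping only days above the confirmed-case threshold (an
    illustrative assumption about how rows are selected)."""
    return [(date, case_fatality_rate(deaths, confirmed))
            for date, confirmed, deaths in daily
            if confirmed > min_confirmed]


# Made-up numbers, not actual country data
daily = [("3/10", 800, 10), ("3/20", 4000, 120), ("3/31", 20000, 1000)]
print(cfr_trajectory(daily))
```

Since both the numerator and the denominator change daily, a CFR trajectory can rise even when case growth is slowing, if deaths grow faster than confirmed cases.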

As we are heading into April, it remains to be seen how well all these countries can flatten their curves, reduce the peak of confirmed / active cases and ultimately get through the pandemic with a minimum of deaths. No forecast tonight, but more analysis ahead in the weeks to come!

 

 

Posted by on April 1, 2020 in Uncategorized