visualign


Fooled by Randomness

A few months ago I wrote about Stock Market Timing. It was an analysis of the stock market returns over the last 40+ years which yielded an observation about certain months having better returns than others, and a small number of months (two) actually showing negative average returns. The gist of the analysis was that if someone traded in a way to be out of the market those two negative months and in the market the rest of the year, they would have outperformed the market (i.e. the buy & hold approach).

The conjecture was that there might be underlying seasonal factors causing this difference in performance. If true, then a trading strategy should be able to exploit this pattern and come out ahead. One would not have to understand the causal nature of these factors in order to exploit them. But success would depend on the existence of such factors and their causal relationship holding for the future as well.

In the time since I have read two books on randomness:

The Drunkard’s Walk: How Randomness Rules Our Lives by Leonard Mlodinow
Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets by Nassim Taleb

We are wired to seek patterns and meaning in the chaos of the world, even when they may not exist. [Nassim Taleb]

Listening to these audiobooks made me think again about the validity of the conclusions in the above post. While the postulated “summer break” trading approach clearly would have been successful for the last 40 years based on the observed data, the question remains whether it would continue to be successful in the future. This depends essentially on whether there are underlying factors with seasonal variation causing the pattern or whether the observed pattern is purely the result of randomness. Only in the former case (causal link) is this trading approach likely to be successful.

I was also reminded of the insight that we often fall prey to confirmation bias: We seek out information confirming our thesis or beliefs while failing to seek out evidence contradicting them. Somehow it feels nicer to confirm one’s beliefs than to find reasons they are flawed. But after learning more about randomness, I started to have doubts about my earlier conclusion. I asked myself: What evidence would it take to falsify the conclusion of the superiority of the “summer break” trading strategy?

An easy way to find out what could happen is to use computer simulation. I thought if I ran a number of simulated stock market returns, what would their monthly returns look like? If typical random timeseries would show similar variations in monthly returns, that would make the observed market returns look more like random noise than a signal of underlying factors.

Enter stock market simulations. Many articles have likened stock market charts to random walks, typically using Markov Chains to simulate the timeseries of prices with random price swings up and down. For practical reasons, I chose to build on the model provided by Jason Cawley in a Wolfram demonstration project. From that site:

A decent first approximation of real market price activity is a lognormal random walk. But with a fixed volatility parameter, such models miss several stylized facts about real financial markets. Allowing the volatility to change through time according to a simple Markov chain provides a much closer approximation to real markets. Here the Markov chain has just two possible states: normal or elevated volatility. Either state tends to persist, with a small chance of transitioning to the opposite state at each time-step.

The model used above includes a handful of parameters such as an overall drift or the probability, strength and duration of a volatility period (called spike). Each scenario has 20 runs over 250 steps (the approximate number of trading days in a calendar year) and visualizes the trajectories, with initial prices set to 100, for the 10th, 30th, 50th, 70th and 90th percentiles (when sorted by final valuation of the simulated stock). A typical image of such a scenario looks like this:

Changing the parameters will change the amplitude of these charts, but their general shape remains similar. The demonstration site referenced above is interactive, i.e. you can change the parameter values and see in realtime how the curves adjust. This gives a much better feel for the type of shapes the model produces. They certainly look similar to real stock price charts. For example, when seeing a sample set of 10 timeseries, 5 of which are model-generated and 5 of which are real stock charts, it would be very difficult to tell them apart reliably.

To approximate the observed stock market timeseries over the last 40+ years, I made a few assumptions:

Assume each month has 21 trading days, hence a year has 12 * 21 = 252 trading days.
Set timeseries to 40 years * 252 trading days = 10,080 datapoints.

With the default modeling parameters picked by Jason Cawley, I ran dozens of simulations at 10,080 datapoints each. I then aggregated the monthly returns over those 40 simulated years to see what kind of variation would be observed per month. Here are some typical results:

These are qualitatively very similar to the actually observed monthly returns of the DJI over the last 40 years:

The majority of months are up with average returns between 0 and +2%
A few months in the year are down with average returns between 0 and -2%
The negative months are randomly distributed across the year

When we know that the data generator uses randomness, we are not surprised to see such variation and we don’t try to find underlying reasons which caused a particular generated pattern. With the actually observed stock market returns it’s an easy mistake to make.
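The experiment described above can be sketched in a few lines of Python. This is only a minimal sketch, not Jason Cawley's Wolfram model: the function names and all parameter values (drift, the two volatility levels, the switching probabilities) are assumptions chosen for illustration. It generates a lognormal random walk whose volatility follows a two-state Markov chain, cuts the 10,080 simulated days into 21-day "months", and averages the return of each calendar-month slot over the 40 simulated years.

```python
import math
import random
from statistics import mean

DAYS_PER_MONTH = 21
MONTHS_PER_YEAR = 12
YEARS = 40

def simulate_daily_log_returns(n_days, rng, drift=0.0003, vol_normal=0.008,
                               vol_elevated=0.02, p_enter=0.01, p_exit=0.05):
    """Daily log returns of a random walk with two-state Markov volatility.

    All default parameter values are illustrative assumptions.
    """
    elevated = False
    out = []
    for _ in range(n_days):
        # Either state tends to persist; switch with a small probability.
        if elevated:
            if rng.random() < p_exit:
                elevated = False
        elif rng.random() < p_enter:
            elevated = True
        vol = vol_elevated if elevated else vol_normal
        out.append(rng.gauss(drift, vol))
    return out

def average_return_by_month(log_returns):
    """Average simple return per calendar-month slot (0..11) across all years."""
    by_month = [[] for _ in range(MONTHS_PER_YEAR)]
    n_months = len(log_returns) // DAYS_PER_MONTH
    for m in range(n_months):
        chunk = log_returns[m * DAYS_PER_MONTH:(m + 1) * DAYS_PER_MONTH]
        monthly_return = math.exp(sum(chunk)) - 1.0
        by_month[m % MONTHS_PER_YEAR].append(monthly_return)
    return [mean(r) for r in by_month]

rng = random.Random(42)
n_days = YEARS * MONTHS_PER_YEAR * DAYS_PER_MONTH  # 40 * 252 = 10,080
for run in range(3):
    averages = average_return_by_month(simulate_daily_log_returns(n_days, rng))
    print(f"run {run}: " + " ".join(f"{a * 100:+.1f}%" for a in averages))
```

Each printed row holds twelve average monthly returns from one simulated 40-year history. Since the generator has no seasonality built in, any "good" or "bad" month in a row is noise by construction, and the months that stand out change from run to run.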

Our desire for explanations often leads us to invent narratives that fit our preconceived notions, rather than accepting the randomness of events. [Nassim Taleb]

Varying the model parameters doesn’t change the outcome qualitatively. If the base trend is increased, as expected there are fewer or often no more months with negative average returns. If volatility is increased, the amplitude of average returns increases in both directions. There are further explorations one could make: Aggregating multiple individual runs into an aggregate index more closely resembling the DJI (set of 30) or S&P 500 (set of 500). Yet the simulation experiment gave me the evidence that falsifies my earlier conclusion. At this point, I am ready to retract the conclusion postulated in the earlier post: The observed pattern is more likely just a random artefact and there is no underlying reason causing the fluctuations in average monthly returns. As such, the “summer break” trading strategy is no more likely to be successful in the future than any number of other strategies motivated by and retrofitted to randomly generated patterns.

The seduction of stories blinds us to the reality of randomness. [Nassim Taleb]

Occam’s razor, or the principle of parsimony, tells us that the simplest, most elegant explanation is usually the one closest to the truth. Randomness is simpler than underlying causal factors.

Hitchens’s razor states that what can be asserted without evidence can also be dismissed without evidence. When it comes to data analytics, simulations based on randomness can provide valuable comparisons: If the observed data does not systematically and reliably differ from random noise, there is likely no signal! The Monte Carlo method should be a good friend of any data analyst!

I admit that I was wrong. I was fooled by randomness. This episode taught me to work against my own confirmation bias, as well as to use simulations based on randomness to see if it can provide counterfactual evidence and help distinguish between signal and noise.


Posted by visualign on May 21, 2024 in Uncategorized

Global Innovation Index

18 Apr

I’ve long held a particular fascination for data visualizations which aggregate a large amount of information, especially if that information allows one to compare fairly abstract socio-economic concepts like wealth, health, or even happiness. In many cases, such information is aggregated into an index, an abstract number which is mostly used for comparative rankings. One example we examined previously is the Human Development Index.

At the same time, I was often underwhelmed by how such information is presented: Often reading the report feels like studying the phonebook with endless text and tables, and the lack of visualizations or an interactive website made gaining insight from such data very difficult.

A positive surprise in this regard is the Global Innovation Index 2023, published by the WIPO (World Intellectual Property Organization). It is freely available both as a 250-page PDF file and as an interactive ranking website.

The full report can be downloaded at www.wipo.int/global_innovation_index.
The 132 interactive GII economy briefs can be accessed at www.wipo.int/gii-ranking.

This post delves into some of the visualizations used in the above publications.

What is the Global Innovation Index (GII)?

Innovation is a crucial source of economic growth and thus ultimately for the improvement of human quality of life. The Global Innovation Index (GII) is an aggregate measure relating the world’s countries with regards to their abilities for and success with innovation. From the Wikipedia page on the methodology:

The index is computed by taking a simple average of the scores in two sub-indices, the Innovation Input Index and Innovation Output Index, which are composed of five and two pillars respectively. Each of these pillars describe an attribute of innovation, and comprise up to five indicators, and their score is calculated by the weighted average method.

In the 2023 edition, some 132 countries are ranked by aggregating data from 80 factors in the above 7 dimensions into the GII.
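The aggregation described in the quote can be illustrated with a small sketch. The pillar scores below are made up for a fictional economy and are not taken from the report, and treating each sub-index as the unweighted mean of its pillar scores is an assumption on my part; only the final step (a simple average of the two sub-indices) is stated in the quoted methodology.

```python
from statistics import mean

# Hypothetical pillar scores (0-100) for a made-up economy; not from the report.
input_pillars = {
    "Institutions": 60.0,
    "Human capital and research": 40.0,
    "Infrastructure": 50.0,
    "Market sophistication": 45.0,
    "Business sophistication": 35.0,
}
output_pillars = {
    "Knowledge and technology outputs": 30.0,
    "Creative outputs": 20.0,
}

def sub_index(pillars):
    # Assumption: a sub-index is the unweighted mean of its pillar scores.
    return mean(pillars.values())

input_index = sub_index(input_pillars)          # (60+40+50+45+35) / 5 = 46.0
output_index = sub_index(output_pillars)        # (30+20) / 2 = 25.0
gii_score = mean([input_index, output_index])   # (46.0+25.0) / 2 = 35.5

print(input_index, output_index, gii_score)
```

One consequence of this structure is that the two output pillars together carry as much weight in the final score as the five input pillars together.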

Interesting Visualizations of GII

The report contains a lot of data. Assessing the 80 factors for each of the 132 countries gives more than 10,500 datapoints. Clearly one needs to use visualizations to show patterns and to ultimately gain insight from all this data. Thankfully, this publication includes an interactive website where you can select a country and then see its position in the visuals and drill into more details. Let’s look at some interesting visualizations contained in the report.

Innovation relative to income level

A core visual illustrating the GII ranking space is a scatter plot showing each country’s position with income on the x-axis (in GDP per capita, log-scale) and the GII Score on the y-axis. The size of a dot is a function of the country’s population.

There is a cubic spline interpolation for the median line. It shows two bends, separating a middle regime of faster growth of innovation per income increase than the regimes below and above. This suggests that on average, as a country reaches the threshold of the first bend around $20,000 GDP / capita such as Mexico ($22,440 GDP/c; GII score 31), one can expect a rapid increase in innovation score for income gains. For most countries this increase rate in the GII score continues all the way up to the richest countries such as the .

Innovation overperformers

One can see that the two most populous countries, India (GII 38.1, rank 40, GDP/capita $8,293) and China (GII 55.3, rank 12, GDP/capita $21,291) are far above the median line. Their GII score (innovation capability) is far higher than what would be expected on average from their income level. They are both innovation overperformers.

The colors above indicate a four-level stratification in innovation leaders and those performing above, at, or below expectation for the level of income. The color scheme is consistently used in other charts as well.

It makes sense to group countries by the level of average income and thus development and to look for the leaders within each group. One such comparison is given in this table:

Note both India and China leading their respective income quartile group. When highlighting all innovation overperformers in the low-income and lower middle-income economies, we see the following:

I didn’t expect to see Burundi, Rwanda or the Republic of Moldova as innovation overperformers (relative to their peer income group). This is one example of where one can use the interactive site to further investigate and look at which factors they excel in.

Innovation efficiency

As there are input pillars and output pillars, one can scatter plot the countries to show how their input and output scores correlate, i.e. how efficient they are in converting inputs into outputs when it comes to innovation.

It’s interesting to see that Switzerland and Singapore, both in the top-right corner of the above chart, differ in this regard. Switzerland achieves the highest output score overall, despite its input score being slightly lower than that of Singapore. Something is getting lost in Singapore’s case, as its input is among the highest of all countries, but its output is lower than that of the Top 10.

Ranking Heatmap

Another intermediate visual combining the large detail of a text table with heatmap colors is the summary ranking table with 132 lines and 8 columns (rank in GII total and each of the seven pillars).

The four colors are used again to show the four quartiles of 33 countries each, sorted by total rank. Not surprisingly, the top and bottom quartiles show a lot of dark green and dark blue, respectively, i.e. many countries that end up in the top quartile of the total GII rank also have pillar ranks in the top quartile. The second and third quartiles show more color variety and make occasional outliers easily visible. Some countries would generally be higher but for one or two low outliers, or vice versa. For example, Bolivia (total rank 97) has four (of seven) dark blue lowest-quartile pillar ranks, with one of them (Institutions) being rank 132, the lowest of all countries, but it has one dark green positive outlier in Market Sophistication (rank 16), right after Germany (14) and the Netherlands (15)! Looking at the one-pager reveals the details for Bolivia in both pillars:

So even a country which is ranked dead-last on one pillar (Institutions) can be ranked first on a factor (4.1.3 Loans from microfinance institutions, % GDP) in another pillar (Market Sophistication)! While you may not be able to count on the rule of law in Bolivia, you’re likely to find good microfinance loans there.

One caveat here is the n/a areas where not enough data is available. Many other countries I checked did not have any data on the above factor (4.1.3), so top rank may be relative in this context.

Compare Ranks

On the website you can select two countries and compare their ranks overall and drill-down by pillar and factors. Here is an example comparison between Mexico and the United States:

Although not too surprising (overall ranks Mexico 58 vs. USA 3), it’s interesting to see the degree of difference – say in market sophistication (ranks 57 vs. 1) or business sophistication (ranks 79 vs. 2). There are also some factors where the ranking is inverse to the overall picture (such as applied tariff rate, rank 13 vs. 49). In other words, even large differences in overall rank don’t imply that the higher ranked country is better in all factors contained in the GII.

This treeview chart also lets you explore the structure of the GII interactively. For each country, you can pull up a one-page summary showing all of those 80 factors and their respective values and subranks.

The printed report has one such summary page for each of the 132 countries; this is the section which reads like a phonebook. My only slight change request here would be to include the GII total score (not just the rank) on this page as well. The only way I found to see the GII score on the website was to use the tooltips on the ranking plot, such as here showing Mexico’s rank (58) and hovering over the USA (3) to see the scores (Mexico GII = 31.0, USA GII = 63.5).

One can also see a “bend” in the ranking plot where the absolute scores for the top half of all countries increase faster than for the bottom half.

Stratification by geography and by income

A common problem with data visualizations is information overload. Often less is more. Stratifying by income group and then just listing the top 3 gives an idea of which countries are leading in their region or peer group. Note again the leading ranks of both China and India.

GII Dynamo

One important aspect of annual rankings is the year-over-year change and thus the rank dynamics over time. In this diagram, the ranks (left-to-right) are plotted for the last 5 years (bottom-up). This makes lateral movements easy to spot. For example, the Republic of Korea made strong gains in 2021 (from rank 10 up to 5), but lost those again in 2023. Generally these rankings are fairly stable as can be seen by minor changes in the top 5 ranks. And Switzerland is the reigning champion in the GII rank (not just for the last 5, but for the last 13 years)!

I also find the chosen color map easy to understand and pleasing to the eye (although I would have swapped colors 7 and 9 to keep with the brightness gradient).

Summary

The Global Innovation Index publication provides a detailed analysis of seven pillars and 80 factors contributing to the index score for 132 countries. The presentation uses lots of charts and visualizations, as well as comprehensive one-page summary tables for each country.

There are many more aspects and factors than those discussed here, for example the number and value of unicorn companies, VC funding, listing of national companies and universities etc. There are even a few areas where the publication veers into policy guidance (how to best use the GII index with dos and don’ts) or geopolitical interpretations of the insights.

From my perspective it is a great example of modern data-driven publication, aggregating a broad set of curated data with source references into easy-to-understand charts and most importantly providing an interactive website for selection, pairwise comparison and drill-down into details. It is great to see such publications becoming part of a more data-driven discussion of national policy and even geopolitical trends.

Lastly, I can’t wait to see such aggregation being used more systematically for rankings in other areas where comparing is difficult due to the broad range of factors, such as with consumer preferences, employee performance or company process maturity. Imagine an analyst firm ranking of sector companies in financial services where the top ranked companies could be analyzed in a similar way to the GII. That would be truly innovative.


Posted by visualign on April 18, 2024 in Socioeconomic

Tags: data visualization, economics, index

On Stock Market Timing

14 Nov

Note: A few months after writing this post, I realized that I made a mistake: I was Fooled by Randomness. The analysis below is still interesting and parts of it are still correct, but the main conclusion was wrong.

Tracking the Market

There is widespread consensus that investing money in the stock market, despite its ever-present risks of losses, over the long run beats many other forms of investments like bonds or treasuries. To mitigate the risk of one or a few individual stocks falling, the general advice is to diversify your portfolio and spread investment across many different companies and sectors. The quintessential diversification is to invest in index funds which track a large basket of securities such as the S&P 500 or the Dow Jones Industrial Average (30 stocks). Investing in index funds is a form of tracking the market.

Historical data shows annual returns in the order of 11.8% for the S&P 500 over the last 60 years. However, there are big swings as can be seen in the following chart (Source article on Investopedia):

Moreover, there seems to be a consensus that one of the best strategies is simply to “Buy & Hold”, i.e. not to attempt any forms of timing the market. It certainly is easy to implement and there are plenty of data sources to track historical performance. If you have a long-term time horizon (i.e. many years or even decades), it can be a very good approach to just set & forget. But from the historical data, can one derive strategies which systematically do better than Buy & Hold? Let’s first look at the conventional wisdom (which says you can’t time the market), then let’s look at the data (which suggests that you can).

Timing the Market

Nobody can predict the market, certainly not on a daily basis. Since the general trend is upwards, the main argument against timing is the risk of missing out on the best days (which can not be predicted and can happen at any time). For example, there was a recent article in VisualCapitalist with this line of reasoning, expressing visually how much one would fall short over time when missing the best 10, 20, or more days.

This strikes me as a form of cherry picking: “If you missed out on the best days, you would fall behind.” The inverse of this argument is also true: “If you missed out on the worst days, you would surge ahead.”

Let’s look at some data. Specifically, I downloaded the publicly available historical data of the Dow Jones Industrial Average (DJIA) with daily open and close levels from 1980-2023. Then I did a similar analysis for the hypothetical returns of the DJIA relative to “missing out” on the best 10 vs. the worst 10 days:

One observation is that the extremes are clustered around just a few events. 14 (of 20) extremes happened in the days around just two events: The Oct-Nov 2008 recession (Lehman Brothers bankruptcy) and the onset of the Covid-19 pandemic (Mar-2020). These two periods mark very high volatility, where you have both some of the worst and some of the best days in quick succession.

We can now look at the cumulative returns of both scenarios of missing the 10 best and worst days during that 44 year period 1980 – 2023 (ending 1-Nov-2023):

(This analysis ignores transaction cost of buying and selling, as well as tax impact based on holding period. It is not meant to be investment advice, just an analysis of the oft-cited consensus that “Buy & Hold is the best strategy” and “market timing is impossible”.)

While it’s true that Buy & Hold is better than the hypothetical “Best 10 days missed” approach (with a 2.1x higher return), it is also true that Buy & Hold is worse than the “Worst 10 days missed” approach (with a 0.42x return). In both cases, there is more than a factor of 2 difference to the Buy & Hold reference strategy! This is a stark demonstration of the Pareto principle that very few events disproportionately affect the outcome. Here the missing of the 10 best (worst) trading days in a period of more than 11,000 trading days (44 years) changes the outcome by more than a factor of two! In fact, just sitting out the single-worst day of 10/19/1987 (the infamous “Black Monday” crash of 1987, single day loss of 22.6%) would result in a 29.2% better return (1 / (1 – 0.226) = 1.292)! A single one in 11,000+ events accounts for a near 30% difference in outcome!
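The comparison above can be sketched as follows. The daily returns here are synthetic Gaussian draws standing in for the DJIA file (an assumption, as are the function names and the drift and volatility values), so the printed ratios will not match the figures quoted above. Real market returns have fatter tails, which makes the effect of a few extreme days larger than in this toy series.

```python
import math
import random

def growth_factor(daily_returns):
    """Cumulative growth of 1 unit invested over all given daily returns."""
    return math.prod(1.0 + r for r in daily_returns)

def growth_skipping(daily_returns, n, skip="best"):
    """Growth factor when the n best (or worst) days are sat out in cash."""
    order = sorted(range(len(daily_returns)),
                   key=lambda i: daily_returns[i],
                   reverse=(skip == "best"))
    skipped = set(order[:n])
    return math.prod(1.0 + r for i, r in enumerate(daily_returns)
                     if i not in skipped)

# Synthetic stand-in for roughly 44 years of daily returns (assumed parameters).
rng = random.Random(0)
returns = [rng.gauss(0.0004, 0.011) for _ in range(11_000)]

hold = growth_factor(returns)
miss_best = growth_skipping(returns, 10, "best")
miss_worst = growth_skipping(returns, 10, "worst")
print(f"miss best 10 / hold:  {miss_best / hold:.2f}")
print(f"miss worst 10 / hold: {miss_worst / hold:.2f}")

# The Black Monday arithmetic from the text: sitting out one -22.6% day.
print(round(1 / (1 - 0.226), 3))  # 1.292
```

Skipping a day simply removes its factor (1 + r) from the product, which is why a single day with r = -0.226 changes the final outcome by 1 / 0.774, regardless of when in the period it occurs.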

Of course, one cannot predict the best or worst days, so the above hypothetical analysis cannot be converted into an actionable strategy. The general trend is upwards, and there are more up than down days. As a result, you generally want to be in the market more than out of the market. Hence it is intuitive that Buy & Hold is a good strategy. And when there are continuously incoming funds, buying quickly is far better than waiting on the sidelines, as this good market timing study by Schwab comparing five different strategies shows in detail. But all this certainly does not imply that Buy & Hold is the best strategy over the long run.

Stock market performance by Month

There have been many analyses of how the stock market is doing, by day of week, by month, quarter, year, presidential election or incumbent party, interest rates etc. While nobody can predict the up or down movement on a daily basis, one gets the sense that there are cyclical factors influencing the markets. One of the obvious cycles is the time of year. It affects not only external factors like the weather and with it consumer behavior, but also the planning and budgeting cycles of most companies in some form or another. Not every year will be the same, of course, but certain patterns may emerge over the long run (decades).

A good granularity into this is provided by looking at the stock market performance by month. One article on stockanalysis.com describes the average monthly stock return based on the S&P 500 from 1980-2018:

Based on the DJIA data since 1980 I did a similar analysis for the monthly returns (1980-2023) as follows:

The details vary slightly, but the results are quite similar. A few things stand out:

Most months yield positive average returns.
April and November stand out as the two months with highest returns.
August and September are the only ones with negative returns.

Note these are averages over ~40 years, and there are some counter-examples such as years with bad April or November returns as well as years with good August or September returns. (Note also that the Compound Annual Growth Rate (CAGR) is a bit different over long series than the arithmetic average, but this distinction is barely relevant for the modeling here.)
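The difference between an arithmetic average and a compound growth rate mentioned in the note above can be seen in a tiny example. The five returns below are hypothetical values for one calendar month in five different years, not DJIA data.

```python
import math
from statistics import mean

# Hypothetical returns of the same calendar month in five different years.
returns = [0.04, -0.03, 0.02, 0.05, -0.01]

# Arithmetic average: the figure usually quoted as "average monthly return".
arithmetic = mean(returns)

# Compound (geometric) average: the constant return that would produce
# the same cumulative growth over the five periods.
geometric = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

print(f"arithmetic mean: {arithmetic:.4%}")
print(f"compound mean:   {geometric:.4%}")
```

The compound figure is always at or below the arithmetic one, and the gap grows with the variability of the returns. For monthly returns of a few percent the gap is small, which is consistent with the remark that the distinction barely matters here.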

Just like one can’t predict the market day by day, one can’t predict what the return of the next month in any given year is going to be. But this raises the question as to whether one might be able to exploit seasonal changes for a higher long-term return. An analogy might be the weather: While you can’t guarantee that any given day in summer is warm and sunny, there is a higher probability of that than in the other seasons. If you were to bet on it with your annual vacation planning, over the long run you should have favorable results.

Investing with a Summer Break

While several articles clearly show the persistent differences in monthly returns, I was a bit surprised not to have seen that translated into some strategies to improve on Buy & Hold. There also seems to be some sloppy reasoning, for example from this article on monthly returns on MoneyChimp:

“From the results, it looks like some months really are significantly better than average. November through January is a particularly strong stretch; and September is the “danger” month, with an overall negative return. Surprisingly, October shows positive returns on average, although October 1987 and 2008 were pretty hard to forget.
Note that December has been better than January, which contradicts two popular myths: the December Selloff, and the January Effect.
But also notice that there are lots of exceptions to the pattern. There have been bad Januaries, and great Septembers. And of course the biggest trend of all is that the market goes up over time. So maybe the lesson here is the usual one, that long-term buy and hold is the winning strategy.”

The first sentence highlights a signal (some months are better than average, some are worse),
yet the last sentence all but ignores that signal (safe to stick with “the usual” lesson).

To me, the obvious strategy would be one of annual selling at the end of July, sitting out the market in August and September, then buying at the beginning of October and holding until next year. Let’s call this the “Summer Break” strategy. It would be actionable, easy to implement with only one buy/sell cycle per year and only calendar dates as deciding factors.
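The rule above is simple enough to sketch in a few lines. This is a minimal sketch under stated assumptions, not the code behind the charts in this post: it works on monthly rather than daily returns, assumes cash earns 0% while out of the market, ignores costs and taxes, and the function name and the two-year sample data are hypothetical.

```python
def summer_break_vs_hold(monthly_returns, start=10_000.0, out_months=(8, 9)):
    """Compare Buy & Hold with sitting out the given calendar months.

    monthly_returns: (year, month, simple_return) tuples in chronological order.
    Returns the final values (hold, summer_break).
    """
    hold = summer_break = start
    for _year, month, r in monthly_returns:
        hold *= 1.0 + r
        if month not in out_months:
            # Invested; during August and September the money sits in cash
            # (assumed 0% return), so the value is left unchanged.
            summer_break *= 1.0 + r
    return hold, summer_break

# Hypothetical two-year series: +1% every month except -2% in Aug and Sep.
example = [(year, month, -0.02 if month in (8, 9) else 0.01)
           for year in (2000, 2001)
           for month in range(1, 13)]

hold, summer_break = summer_break_vs_hold(example)
print(f"Buy & Hold:   ${hold:,.2f}")
print(f"Summer Break: ${summer_break:,.2f}")
print(f"ratio:        {summer_break / hold:.3f}")
```

In this toy series the strategy wins by construction, because the two skipped months are the only negative ones. On real data the outcome depends entirely on whether August and September keep underperforming.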

Based on the DJIA data I simulated the Summer Break strategy and plotted the returns of an initial $10,000 investment at the beginning of 1980 until 1-Nov-2023 for Summer Break (orange) and Buy & Hold (blue):

While the Buy & Hold strategy yields $396,288 the Summer Break strategy yields $741,185 which is 187% or nearly 2x of the reference (Buy & Hold).

Of course, there are some years where Buy & Hold performs better than Summer Break.

In fact, the first 10 years Buy & Hold is slightly better (orange line often below blue line).
Only in 1990 does Summer Break pull ahead and stay ahead, with 1996 almost leveling both strategies.
A select few years contribute the most to the difference, such as 2001 and 2002, when Summer Break leaps from 1.2x to 1.6x of Buy & Hold (see data callouts below).

The ratio and changes therein can be visualized as follows:

Criticism and Caveats

Critics might say it would take a long time to see the impact of this strategy. Past performance is not a guarantee of future results. And timing matters, i.e. if you miss the few years where there is a large difference you may not see the gains of this strategy for a long time. For example, if you started the Summer Break strategy right after the strong gains in 2001 and 2002, say on 1/1/2003, you would still be slightly behind the reference Buy & Hold strategy 19 years later. (Any 20-year or longer window, however, results in a higher return than Buy & Hold.)

These criticisms are all valid. However, they apply equally to the Buy & Hold strategy itself. For instance, historically, if you had bad timing on Buy & Hold, you could still have nominal losses after many years: after the 1929 recession it took 25 years, until 1954, to recover the previous high, and it took 18 years from 1964 to 1982. This is why above all investing in the stock market is considered to be more attractive the longer the time horizon, best measured in decades. (And why investing purely in stocks close to retirement may be too risky…)

There are of course constraints such as one not being able to achieve the exact sell price of the closing market, the overhead of trading cost, dividend payments or reinvestment, the tax impact of capital gains (especially when held for < 1 year), the inflation, etc. These factors are all ignored for this analysis. While they don’t change the fundamental conclusions here, they would certainly impact the specific returns achieved when applied in real markets.

Closing note: I thought this relatively straightforward analysis would be a good candidate for the new Advanced Data Analysis (ADA) plug-in of ChatGPT, which has received a lot of attention of late. So I started by uploading the original .csv DJIA historical data file. ADA interpreted and described the data fine. However, when I asked it to implement the Summer Break strategy (or variants thereof), it generated some Python code and produced charts, but they were incorrect. It frequently got thrown off by boundary conditions (like an error due to not finding the last trading day before 1/1/1980 in the data). It did not properly aggregate the returns, and repeatedly reset to the original value at 100%. When I pointed this out, it said it would fix the code, but before long it just generated empty charts. I got the distinct impression of talking to an entity which understands some of the topic, but does not have a fully formed semantic model, and hence starts hallucinating some code together. I plan to dive deeper into this as a separate post, but found this a bit anti-climactic given all the hype out there.


Posted by visualign on November 14, 2023 in Financial

Sankey diagrams for Income Statements

27 Oct

We posted about Sankey diagrams some 11 years ago on this blog. They are a remarkably useful visualization for the flow of continuous quantities like money, electricity, or raw materials through systems. There have been other blogs dedicated entirely to Sankey diagrams, such as https://www.sankey-diagrams.com where the author wrote 660 posts during 2007 – 2021.

Yet, sadly, this type of diagram has still not made it into Excel or other mainstream chart libraries.

One interesting use of it has been popularized by AppEconomyInsight for Income Statements to visualize how companies make money. Here is a recent example for Meta (Facebook):

It shows the size and diversity (or lack thereof) of a business revenue and profitability. It is so much more intuitive than the pages of numbers in the public income statements! One of the best examples of Visualign’s motto: Visualize Data. Generate Insight!

The website now has many dozens of charts depicting large publicly traded companies and the information from their quarterly income statements. The Sankey diagram has even become the website logo

We have also looked at four large tech companies, starting with this post back in 2012 on Side by Side: Apple, Microsoft, Google, Amazon. We had created a composite back then for these four companies to show just the sources of revenue (akin to the left side of the Sankey diagrams for the inflow to the central revenue bar):

Here is a similar composite of the Sankey diagrams from AppEconomyInsight for those four companies:

(Source: appeconomyinsight.com; Note: All recent but different quarters as some are not available or behind paywall.)

Such a side-by-side comparison reveals many interesting insights:

All companies have massive revenues on the order of $50-100B per quarter, with Amazon’s being the biggest.
Software company Microsoft has the largest operating margin (40%) and net profit margin (36%).
Retail company Amazon has by far the smallest net profit margin (5%).
- Its AWS division accounts for 16% of revenue, but 70% of operating profit!
Advertising companies Meta and Google are trying to diversify sources of revenue.
- Google has been able to diversify via its Google Play and Cloud segments contributing nearly 22% of revenue and showing much stronger growth.
- Meta’s Reality Labs (RL) revenues are paltry (1% of overall revenue), and the RL losses reduced the other operating profits by more than a quarter (28%).
Hardware company Apple is still dependent on its dominant iPhone for nearly half of all revenue (49%), but its fast growing Services segment already generates more than a quarter (26%) of that.

From a profitability perspective, these diagrams also nicely show what business schools will tell you:

Making money from retail is hard.
Making money from software and services is easier.
Making money from hardware can be very profitable – if there is killer demand. One great example of that is the Apple iPhone demand over the last 15 years. Another great recent example is the demand for Nvidia GPU chips for data centers due to the generative AI breakthroughs of ChatGPT and similar models. See the recent breakdown of Nvidia with a staggering 46% net profit margin:

The above comes from an article aptly called “Nvidia: The iPhone moment of AI”. Coincidentally, in the last article on the four biggest tech companies back in 2018, I speculated which would be the next tech company to reach the trillion dollar valuation. It was not Meta, but first Tesla and then Nvidia.

Tesla’s income statement Sankey diagram is also interesting to see:

It would be great to have a generalized template to convert traditional income statements into such charts. Even better would be an interactive version of the Sankey diagram where you could:

Highlight or Filter one of the revenue branches to see its profitability
Highlight or Filter one of the expense or profit branches to see its contributors

Of course such interactivity would require a more detailed underlying model of accounting flows, which may or may not be disclosed by the company.
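As a rough idea of what such a template could look like, here is a minimal sketch that turns a handful of income statement line items into Sankey links (source, target, value) and checks that every inner node balances. The function names, node labels and the numbers are placeholders of my own, not taken from any company's filing, and the sketch assumes every flow is positive (a loss-making segment such as Reality Labs would need extra handling).

```python
from collections import defaultdict


def income_statement_links(segments, cost_of_revenue, operating_expenses, tax_and_other):
    """Build Sankey links (source, target, value) for a simplified income statement.

    Hypothetical helper: assumes all amounts are positive and in the same unit.
    """
    revenue = sum(segments.values())
    gross_profit = revenue - cost_of_revenue
    operating_profit = gross_profit - operating_expenses
    net_profit = operating_profit - tax_and_other

    # Left side of the diagram: each revenue segment flows into the revenue bar.
    links = [(name, "Revenue", value) for name, value in segments.items()]
    # Right side: revenue splits step by step into costs and remaining profit.
    links += [
        ("Revenue", "Gross profit", gross_profit),
        ("Revenue", "Cost of revenue", cost_of_revenue),
        ("Gross profit", "Operating profit", operating_profit),
        ("Gross profit", "Operating expenses", operating_expenses),
        ("Operating profit", "Net profit", net_profit),
        ("Operating profit", "Tax and other", tax_and_other),
    ]
    return links


def unbalanced_nodes(links):
    """Return inner nodes whose total inflow differs from their total outflow."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value
    inner = set(inflow) & set(outflow)
    return sorted(n for n in inner if abs(inflow[n] - outflow[n]) > 1e-9)


# Placeholder values only, chosen to be easy to check by hand.
links = income_statement_links(
    {"Segment A": 60.0, "Segment B": 40.0},
    cost_of_revenue=30.0,
    operating_expenses=30.0,
    tax_and_other=10.0,
)
print(unbalanced_nodes(links))
```

The resulting list of links is the input format most Sankey chart libraries expect, so the same structure could feed a static or an interactive chart.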

Posted by visualign on October 27, 2023 in Financial, Industrial

Coronavirus Cases at end of March

01 Apr

Just 2 weeks ago I looked at how the previous 4 weeks from mid-February to mid-March had qualitatively changed the situation for many countries in this pandemic. The pace of change is not letting up. Here I review my predictions from 2 weeks ago and summarize the weekly changes since then.

Predictions made 2 weeks ago on 3/17:

Numbers for the US:

3/22: 28,000 (forecast), 33,276 (actual)
3/29: 221,000 (forecast), 140,886 (actual)

The confirmed cases grew faster initially, then somewhat slower than the exponential best-fit trend-line had forecast. It has been observed that such growth by contagion follows a Power-Law distribution (see ZDNet article), which resembles exponential growth initially, but then grows somewhat slower (a straight line in a log-log plot) compared to exponential growth (a straight line in a log-linear plot).
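One way to check which of the two descriptions fits a series better is to see in which plot the points fall closer to a straight line. The sketch below is an illustration only: it uses a synthetic power-law series that I made up (not real case counts), and the helper names are my own. It compares the R² of a straight-line fit in log-linear space against one in log-log space.

```python
import math


def r_squared(xs, ys):
    """R^2 of the least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def compare_growth_models(days, cases):
    """Compare straightness in log-linear (exponential) vs log-log (power law) space.

    Days must start at 1 or later, since log(0) is undefined.
    """
    log_cases = [math.log(c) for c in cases]
    log_days = [math.log(d) for d in days]
    return {
        "exponential": r_squared(days, log_cases),   # log(cases) vs day
        "power_law": r_squared(log_days, log_cases),  # log(cases) vs log(day)
    }


# Synthetic series following cases = 5 * day^3, for illustration only.
days = list(range(1, 31))
synthetic_cases = [5.0 * d ** 3 for d in days]
fit = compare_growth_models(days, synthetic_cases)
print(fit)
```

For real data both R² values are typically high early on, which is why the two models are hard to tell apart in the first weeks of an outbreak.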

Ranks:

3/18: US will pass China in active cases, which will be 7th behind France and US.
3/22: US will be top-ranked in new cases.
3/31: US will be top-ranked in active cases.

All three predictions came to pass, the last one a few days sooner already. By now, the US has far and away the most active cases (100,000 more than Italy at rank 2), the most new cases (3x that of Spain at rank 2) and nearly twice as many confirmed cases as Italy at rank 2.

Covid-19 Case counts as of Mar-31. Source: worldometers.info/coronavirus

Qualitative scenario:

Until early March this pandemic was a China story.
In mid March the pandemic is a Europe story.
By end of March this will be a US story.

Here is how the percentage of confirmed cases has evolved throughout March:

Stacked Area Chart of confirmed Covid-19 cases by continent. Data source Johns Hopkins CSSE.

At the beginning of March (left side of chart), Asia clearly dominated the case numbers, with China about 90% and South Korea about 4% of all cases. Italy had 2%, Iran 1% and France and Germany only 0.1% of all cases. While China already had ~80,000 confirmed cases, the US had only 74 confirmed cases just a month ago!

By mid-March (middle of chart), Europe had grown to 1/3 of all confirmed cases, with Asia / China accounting for most of the other 2/3 of cases. The US had about 3,500 confirmed cases, or 2% of the total (166,700) at the time. With active cases, Europe was starting to dominate the picture, with Italy (~20,600) having nearly twice as many active cases as China (~10,800). China was reporting very few new cases and lots of recovered. Italy and soon Spain were reporting ever-growing numbers of daily new cases, with slow growth on the recovered side (and sadly strong growth of deaths).

Now at the end of March (right side of chart), the US is clearly the country with the single biggest confirmed and active case numbers. The Americas now account for ~25% of all cases, with the proportion growing. Asia / China’s portion has diminished to ~20% of all cases. And Europe has crested at its highest percentage (53.9% on 3/28) and is slowly reducing its proportion of worldwide confirmed cases.

Here is the distribution of all 189,510 confirmed cases in the US:

Confirmed Covid-19 cases in the US as of Mar-31. Source: Johns Hopkins dashboard.

The dashboard of Johns Hopkins University allows for the US to drill from country to state to county level. This is helpful in understanding where clusters of confirmed cases are. One can clearly see the large metropolitan areas with more and larger dots than in rural areas such as in the Western half of the continent.

Like many analysts, I have created my own dashboard based on the daily refreshed datasets from JHU GitHub. This has been an interesting exercise in many ways, partly due to the fast changing but freely and freshly available data, but also due to other examples of widely shared charts on social media.

One example of a new chart I haven’t seen elsewhere is a scatter plot of all countries with > 1,000 confirmed cases on a timeline through March.

CFR trajectory of Countries with >1,000 Confirmed cases in March; Size = # Deaths; Color = GDP per capita range.

This shows Spain and Italy being located in high single digit Case Fatality Rates (CFR) at the end of March. Italy’s and the US trajectory are highlighted. Italy’s CFR shot up and exceeded 10% – often attributed to the strain in their overwhelmed medical system. It’s also a less affluent country on the whole, but the hardest hit region of Lombardy is one of the richest in Italy, so it can’t be mainly attributed to an underfunded healthcare system. By contrast, the US CFR trajectory has stayed low throughout March and reached only about 2%.

As we are heading into April, it remains to be seen how well all these countries can flatten their curves, reduce the peak of confirmed / active cases and ultimately get through the pandemic with a minimum of deaths. No forecast tonight, but more analysis ahead in the weeks to come!

Posted by visualign on April 1, 2020 in Uncategorized

Weekly changes in the Coronavirus pandemic

15 Mar

The global Coronavirus pandemic has caused a series of dramatic changes in markets, economy and policy over just a few weeks.

The known case counts have been tracked and published widely. It is a stunning demonstration of the power of exponential growth.

Mathematical tracking and modeling can help to predict and visualize the near-term and thus inform policy, similar to how meteorology relies on computer model forecasts of weather events.

I’m writing this on Sunday, Mar-15. Let’s look at the data of the last few weeks and how the pandemic changed qualitatively. Underlying data comes from this GitHub repository of the Johns Hopkins CSSE.

8 weeks ago (Jan-19)

The Coronavirus outbreak originated in Wuhan, China about 8 weeks ago (mid January). The number of confirmed cases first exceeded 100 new cases on Jan-21. China imposed drastic lock down measures: On Jan-23 Wuhan city and on Jan-24, another 15 cities were shut down, putting 60+M people under lock down. Nevertheless, the number of confirmed cases continued to grow strongly for another 3-4 weeks, from under 1,000 to 70,000+ by Feb-16.

Here is the number of confirmed, active and recovered cases in China over the last 7 weeks.

Confirmed, Recovered and Active Cases in China over last 7 weeks

Active = confirmed – recovered – deaths. Although growing to about 3,000, the number of cases resulting in deaths does not change these graphs qualitatively.
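The definition is simple enough to write down as a one-line function. As a quick sanity check, the sketch below applies it to the US case counts for 3/15 quoted later in this post (3,806 confirmed, 73 recovered, 69 deaths); the function name is my own.

```python
def active_cases(confirmed, recovered, deaths):
    """Active cases as defined above: confirmed minus recovered minus deaths."""
    return confirmed - recovered - deaths


# US counts as of 3/15, taken from the predictions section of this post.
us_active_mar15 = active_cases(confirmed=3806, recovered=73, deaths=69)
print(us_active_mar15)
```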

It’s worth noting that the drastic lock-down measures were imposed at the beginning of the above timeline. This shows that even extremely drastic measures have a 3-4 week delay until they produce results in bending the case graph.

For the first 5 weeks (until Feb-23) there were hardly any confirmed cases outside of China.

Let’s look at the qualitative changes over the last 4 weeks.

4 weeks ago (Feb-16)

Two trends start to take shape:

The daily increase in new confirmed cases is shrinking dramatically
The number of recovered cases is growing exponentially (although at slower rate than the original confirmed cases)

As a result, the number of active cases begins to level off, peaks around 58,100 on Feb-17 and then starts to fall.

This is good news, as it demonstrates that the outbreak can be stopped and reversed. However, by this time it has begun spreading all over the world.

3 weeks ago (Feb-23)

China is still adding new cases, but at a slowing pace. On Feb-23 there are just over 77,000 confirmed cases, only a 10% increase from 1 week earlier. The recovered cases are growing faster than new cases, hence the active cases go down (first time under 50,000 on Feb-24).

Meanwhile, confirmed cases all over the world outside China are taking off, reaching nearly 2,000 by Feb-23. Italy has 155 confirmed cases and records the first 3 deaths.

Rest of world confirmed and active cases

2 weeks ago (Mar-1)

China has the outbreak under control:

The confirmed case count is just under 80,000. It will only grow another 1,000 for the next 2 weeks (81,003 as of today Mar-15).
There are more recovered cases (42,162) than active cases (34,898).

If China can keep up the lock-down measures, this is fast going in the right direction.

Outside of China the situation escalates quickly. By Mar-2, the confirmed case count for

World (without China) exceeds 10,000
Italy exceeds 2,000

The case counts in Italy show no signs of slowing down. The increase for the first time is greater than 300 new cases per day. In fact, today (2 weeks later) the increase has exploded 12-fold to 3,590 new cases in one day!

1 week ago (Mar-8)

For the first time, there are more active cases outside of than in China. Active cases on Mar-8:

World (without China): 24,964
China: 20,335

Moreover, China’s active case count continues to fall, while the world’s active cases grow exponentially.

Total confirmed cases in China and rest of world

There are very few countries (South Korea) which appear to be able to follow China’s path of controlling an epidemic in their country once it exceeds hundreds of cases.

South Korea’s increases are beginning to slow down, and Italy (7,375) surpasses South Korea (7,314) to rank highest in confirmed cases outside China.

Most other countries in the Top 10 confirmed cases at this point are seeing exponential growth with no sign of slowing down. What’s worse, they are only now beginning to implement lock-down measures. The WHO declares the coronavirus outbreak a global pandemic on Mar-11. That same day, Italy shuts down and closes all commercial activities, offices, cafes, shops. Only transportation, pharmacies, groceries will remain open. As we have seen, even if these measures were to be equally successful as in China, it would still take at least 2-3 weeks (i.e. end of March) before the active case load would flatten and peak out.

Today (Mar-15)

Today marks the first day with more confirmed cases outside (85,308) than in China (81,003). While 4 weeks ago China had 99% of all cases, it now has less than 50% of worldwide cases.

Just two days ago (Mar-13), Italy became the country with the most active cases (14,955), ahead of China in second place (13,569).

In this coming week, thanks to its continuing fall of active cases, China’s rank in active case count will drop behind several other countries like Iran, Spain, Germany, France, and the USA.

Active cases in China and rest of world

What used to be a China problem is now a world problem. China has it under control. Most other countries are out of control.

Italy vs. USA

Confirmed Cases in Italy, USA and California

Source: Twitter, @sonyaharris_

This shows how similar the initial phase of exponential increase is, with different countries or states behind by a fixed number of days. (Here USA is 11 days behind Italy, CA is 7 days behind the entire US.) Without any drastic differences in interventions and with similar levels of testing, this table easily predicts the approximate number of confirmed cases. For example, the US will have 20,000+ confirmed cases by around Mar-25, with CA alone exceeding 20,000 cases by Apr-1.
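The fixed-lag reasoning can be sketched as simple date arithmetic: take the day on which the leading country reached a given count and add the lag. In the sketch below, the lags (11 and 7 days) and the 20,000 threshold come from the paragraph above; the Italy date of Mar-14 is my own inference (Mar-25 minus 11 days), and the function name is hypothetical.

```python
from datetime import date, timedelta

US_LAG_BEHIND_ITALY = 11  # days, from the table above
CA_LAG_BEHIND_US = 7      # days, from the table above


def project_with_lag(leader_series, lag_days):
    """Shift a {date: count} series forward by a fixed number of days."""
    return {day + timedelta(days=lag_days): count for day, count in leader_series.items()}


# Assumption: Italy passed ~20,000 confirmed cases around Mar-14 (inferred, not quoted).
italy = {date(2020, 3, 14): 20_000}
us_projection = project_with_lag(italy, US_LAG_BEHIND_ITALY)
ca_projection = project_with_lag(us_projection, CA_LAG_BEHIND_US)
print(us_projection, ca_projection)
```

This only holds while the curves stay parallel, that is, as long as interventions and testing levels remain comparable.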

Addendum

Summary of qualitative changes by timeline:

Jan-23: 600 confirmed cases, with 400 new on that day; Wuhan city shuts down
Feb-17: China active cases peak at 58,108 (3.5 weeks after shutdown).
Mar-1: China confirmed cases level off at 80,000. Over next 2 weeks, adds only ~1,000 more. More recovered cases (42,162) than active cases (34,898).
Mar-2: Rest of world > 10,000 and Italy > 2,000 confirmed cases.
Mar-8: More active cases (24,964) in rest of world than in China (20,335).
Mar-13: Italy has most active cases (14,955), ahead of China (13,569).
Mar-15: More confirmed cases (85,308) in rest of world than in China (81,003).
Last 4 weeks, China added ~10,500 confirmed cases.
Rest of world added ~10,200 just yesterday!

Model Estimates (Source: Medium article with Wuhan timeline analysis):

Number of actual infections about 25x that of confirmed cases
3-4 week delay between lock-down measures and peak of active cases
(more with less aggressive lock-down)
Peak active cases about 100x (~60,000) the confirmed cases at lock-down (600)
Final confirmed cases level off at about 100-150x the number on day of lockdown
Early on, the number of actual infections is about 800x the number of reported deaths.
A single day of delaying drastic measures can increase confirmed cases by ~40%

(Optimistic) Predictions for the US as of 3/15:

Case counts: 3,806 confirmed, 3,664 active, 69 deaths, 73 recovered.
Estimated 55,200 (800x deaths) – 95,150 (25x confirmed) actual cases;
we have between 50-100k actual cases and no severe lockdown measures in place yet!
Even if we locked down now (3/16) as severely as in Wuhan:
- We would still expect another 3-4 weeks of active case growth with peak at ~360,000
- We would expect a total of ~500,000 confirmed cases by ~ Apr-8
- If we wait just one more day (3/17), make that ~700,000 cases (40% or 200,000 more)
  If we wait two more days (3/18), the total doubles to ~1,000,000 cases
Assuming 1% fatality, this puts us at 5,000-10,000 deaths.
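The arithmetic behind these estimates can be reproduced in a few lines. The multipliers (800x deaths, 25x confirmed, ~40% per day of delay) are the model estimates listed above; the constant and function names are my own, and this is only a back-of-the-envelope sketch, not an epidemiological model.

```python
INFECTIONS_PER_DEATH = 800      # model estimate quoted above
INFECTIONS_PER_CONFIRMED = 25   # model estimate quoted above
DAILY_DELAY_GROWTH = 0.40       # ~40% more cases per day of delay


def actual_infection_range(confirmed, deaths):
    """Low and high estimates of actual infections from deaths and confirmed cases."""
    return deaths * INFECTIONS_PER_DEATH, confirmed * INFECTIONS_PER_CONFIRMED


def total_after_delay(base_total, days_delayed):
    """Final confirmed cases if the lock-down starts days_delayed days later."""
    return base_total * (1 + DAILY_DELAY_GROWTH) ** days_delayed


low, high = actual_infection_range(confirmed=3806, deaths=69)
one_day = total_after_delay(500_000, 1)
two_days = total_after_delay(500_000, 2)
print(low, high, round(one_day), round(two_days))
```

Compounding 40% over two days gives a factor of 1.96, which is where the "total doubles" in the list above comes from.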

I’m no medical expert, but all I’m reading recently points to the actual numbers trending far worse than the above optimistic scenario predictions. Even the CDC has floated predictions of final US deaths ranging from 500,000 – 1.7 million in the next 12-18 months. A million people in the US could die from this!! Not sure why anyone would still brush this aside as no big deal. Statements made reflecting such attitude will not age well.

Addendum 3/17

The case numbers in this pandemic change very rapidly, as do the respective rankings. Here are some more observations and predictions for the United States:

Observations:

Cases ~200,000 confirmed, 8,000 deaths, 83,000 recovered and 109,000 active

Ranks:

Confirmed cases: China, Italy, Iran, Spain, Germany; US (8th)
Active cases: Italy, Spain, Iran, Germany, China ; US (8th)
New cases: Italy, Germany, Spain, US (4th), Iran
Deaths: China, Italy, Iran, Spain, France, US (5th)
New Deaths: Italy, Spain, Iran, France, US (5th)

Relative Growth:

US used to be 11 days behind Italy’s total numbers, now (3/17) only 10 days behind, gap closing (see factors below)

Active cases for Italy and the US (actuals and exponential trendlines)

Predictions for the US:

Case counts:

Estimated Confirmed 28,000 by 3/22 ; 221,000 by 3/29 (from best-fit exponential trendline of last 14 days)
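The two trendline values above pin down the growth rate the fit implies. A small sketch (function names are my own) that derives the implied daily growth factor and doubling time from the 3/22 and 3/29 estimates:

```python
import math


def implied_daily_growth(count_start, count_end, days):
    """Constant daily growth factor that takes count_start to count_end in `days` days."""
    return (count_end / count_start) ** (1 / days)


def doubling_time_days(daily_growth):
    """Number of days needed to double at a constant daily growth factor."""
    return math.log(2) / math.log(daily_growth)


# Trendline estimates from above: 28,000 by 3/22 and 221,000 by 3/29 (7 days apart).
growth = implied_daily_growth(28_000, 221_000, days=7)
doubling = doubling_time_days(growth)
print(f"daily growth x{growth:.2f}, doubling every {doubling:.1f} days")
```

In other words, the trendline assumes roughly 34% more confirmed cases every day, or a doubling a bit faster than every two and a half days.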

Ranks:

Tomorrow (3/18) the US will have more active cases than China!
(China will be 7th behind France and the USA)
By next Sunday (3/22) US will be top ranked in new cases.
By end of March the US will be top ranked in active cases.

Contributing factors:

Population longer in denial, partly due to politicized atmosphere
Lock-down measures later in case growth and less drastic (each state individually)
US nearly the size of all EU, about 4x Germany or 5x Italy
US late in testing; today (3/17) not even all hospital cases get tests (delays actual numbers)
When tests become more widely available, numbers will grow at faster rate than model forecast
Italy ahead by 10 days, but last 4 days near linear growth (i.e. at inflection point) and
recovered (and 1-6% death) cases will reduce active case count

This pandemic used to be a China story until early March. Now in mid March this is a European story. By end of March this will be a US story.

Posted by visualign on March 15, 2020 in Medical, Scientific

6 year growth: Apple, Microsoft, Google, Amazon

03 Jun

Back in 2012 we did a side-by-side comparison of the four largest technology companies and their quarterly growth and other financial metrics. A year later in 2013 the four companies were again compared using Wolfram Alpha to generate lots of charts and tables by simply typing in “Google vs. Amazon vs. Apple vs. Microsoft” in the search bar.

Today, six years later, this same exercise reveals very strong growth. The big companies are getting (much) bigger. Here are some comparisons:

With the underlying numbers:

(Market cap as of market close on 2/3/2012 and 6/1/2018; sources: the respective 10-Q filings; scales are the same for left and right charts. Google refers to its parent Alphabet, Inc.)

When taken together, over the last six years the four companies have grown as follows:

Revenue more than doubled (+112%, 13.3% annualized)
Income grew only moderately (+31%, 4.6% annualized)
Market cap tripled (+202%, 20.2% annualized)
Employees almost quadrupled (+276%, 24.7% annualized)
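The annualized figures follow from the total growth via the usual compound growth formula. Here is a minimal sketch, assuming a period of exactly six years (the function name is my own):

```python
def annualized_growth_pct(total_growth_pct, years):
    """Compound annual growth rate (in %) equivalent to a total growth over `years`."""
    return ((1 + total_growth_pct / 100) ** (1 / years) - 1) * 100


# Total six-year growth of the four companies combined, from the list above.
combined = {"Revenue": 112, "Income": 31, "Market cap": 202, "Employees": 276}
annualized = {
    name: round(annualized_growth_pct(pct, years=6), 1)
    for name, pct in combined.items()
}
print(annualized)
```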

Of course, the $ numbers need to be inflation-adjusted, but US inflation rates were around 2% or less between 2012-2018, which amounts to about 10% over that period of time. Hence inflation is not qualitatively influencing this analysis or comparison.

Amazon grew the most, with its market cap growing more than 9 fold (+833%) and its employees more than 8-fold (+763%) to more than half a million people. Back in 2012, all four combined just exceeded $1 trillion in market cap; this has swollen to $3.3 trillion.

These are the biggest nominal market cap values in history. When comparing them to the GDP of countries, they would each rank in the Top-20. According to 2018 GDP projections by the International Monetary Fund, Apple would rank 18 behind the Netherlands (17th, $945,327 million), the other three companies would rank 19 behind Turkey (18th, $909,855 million). The market cap of these four companies combined would rank 5th behind Germany (4th, $4,211,635 million). In other words, only the top four countries by GDP (United States, China, Japan and Germany) are bigger than the market cap of Apple, Microsoft, Google and Amazon combined.

These corporations are transnational entities with a global customer base. Arguably, their size and economic power have grown so rapidly that the legal, tax and trade frameworks governing their operations can’t always keep up. Similarly, when companies get so large and rich, they can buy startups and entice talent to join them at a rate newer entrants or even governments cannot match. Apple’s cash position at the end of Q3’2017 was roughly $270 billion (source Asymco). It is not obvious that consumers always benefit from companies growing that large (see monopoly and anti-trust laws). Thankfully, the current technology oligopoly leads to healthy competition.

As before, there remain significant differences in the revenue segmentation across these four companies:

Arguably, Microsoft has the broadest diversification and hence the most stability against disruptive innovation. Its three segments are not only roughly equal in size, but in turn contain a variety of different sub-segments. Productivity and Business Processes includes Office, Exchange, Skype, LinkedIn, Dynamics; Intelligent cloud includes Windows Server, SQL Server, Azure and Consulting Services; Personal Computing includes Windows, Devices, XBox and Search/Advertising. Microsoft’s Azure cloud services have closed the gap to Amazon’s AWS business and recently overtaken it by quarterly revenue.

If consumers were to search somewhere else than using Google, shop somewhere else than Amazon or buy no more iPhones, these companies would all shrink by an order of magnitude. Microsoft stands well positioned by comparison.

The following radar plot shows the above table numbers in a different perspective:

For each metric, 100% corresponds to the maximum of the four companies. Amazon has the most employees, Apple is the largest in quarterly revenue, profit and market cap. Some comments on the 2012 – 2018 changes:

Employees: Microsoft only added +35% of employees; Apple and Google more than doubled at about +160%; Amazon exploded by adding +763% to an almost 9-fold increase from 65,600 to 566,000.
While Microsoft had more than twice as many employees as Apple in 2012, they are the same size now (~123,000).
Profits: While the green line (market cap) in the radar plot almost looks like an even-sized rectangle, the red line (profit) is much tilted towards Apple and leaves comparatively little for Amazon.
Revenue per employee: Apple still takes the prize in this ranking (~$2 million/year), followed by Google ($1.47 million/year) and Microsoft ($0.87 million/year). Amazon “only” earns $0.36 million/year. In that metric, Amazon slipped from rank 2 to the bottom and Apple’s lead is not as strong as it was in 2012.

Much has been speculated about the future of the biggest technology companies and the nature of the next disruptions such as cloud, augmented reality (AR) and artificial intelligence (AI). Perhaps the biggest disruptor for this elite club of technology companies is Facebook, which only went public six years ago. FB currently has a $562 billion market cap. It is now bigger than these four were back in 2012, and about 70% of the size they are now. My own skepticism at the time of the Facebook IPO was proven wrong by its continued and strong growth. Their base of about 2 billion free accounts is by far the largest of any company ever. That said, I personally still have no Facebook account, while I’m using the products and services of each of the top four companies nearly every day! It will be interesting to see which one first breaks the $1 trillion market cap threshold.

Addendum 11/19/2018:

A lot has happened over the last 6 months. First, the above mentioned run-up continued and in early Aug-2018 made AAPL the first publicly traded company worth $1 trillion. AMZN followed suit soon thereafter in early Sep-2018, but only stayed at that lofty valuation for a day or so. Here is a snapshot of the valuations as of Aug-31, 2018:

Later in the fall the tides turned, and four of the above five stocks are now in correction territory. Here is the above snapshot for today, Nov-19, 2018:

Here are the changes for all five companies summarized:

Microsoft appears to have weathered the recent turbulence much better than the other companies. MSFT is down only 6.8% over the last 10 weeks; AAPL and GOOG each lost 16-20%, which at these valuations amounts to $217B and $140B, respectively! And AMZN and FB each lost about 25% of their market cap.

The combined total market value loss of the five companies is near $788B, or about $157B each on average. It is amazing to see how volatile the tech market has become in recent months. [Note on 11/21: Coincidentally, the New York Times ran a headline story the next day 11/20 titled The Tech Stock Fall Lost These 5 Companies $800 Billion in Market Value; the only difference was they excluded Microsoft and included Netflix.]

The earlier post pointed out that Microsoft was very well positioned: strongly diversified in its business, under fresh leadership of its CEO Satya Nadella since 2014, investing in new technologies (cloud, AI, AR) and expanding its personnel much more conservatively during the good times. They are now number 2 and may be on track to pass AAPL again on their way to becoming the most valuable company in the world.

Posted by visualign on June 3, 2018 in Financial, Industrial

World Inequality and the Elephant Curve

16 Feb

In December 2017 the World Inequality Lab (WIL) published its first World Inequality Report 2018. The lab consists of a five-member board and 20+ researchers, mostly from the Paris School of Economics (Thomas Piketty et al.) and the University of California at Berkeley (Emmanuel Saez et al.). Compared to previous work on economic inequality it is fair to say that research has significantly advanced over the last 5 years along several directions:

The free report itself is available both online as well as in various download formats and eight languages. It aims to become a data-driven foundation for societal and policy discussions about inequality.
All underlying data are openly published (via the World Wealth & Income Database WID) to support reproducibility and stimulate further research.
The methodology to aggregate data is encompassing more sources, more attributes (including age, gender, etc.) and better informed estimates, across a wider spectrum of countries and geographies (all important for policy discussions).
The visualizations have evolved beyond limited measures such as the Gini-Index and now typically include interactive charts (for example at http://wid.world/country/usa/)

This report is quite detailed and holistic. Aside from the Executive Summary, Introduction, Conclusion and Appendices, it consists of the following five parts:

AIM OF THE WORLD INEQUALITY REPORT 2018
NEW FINDINGS ON GLOBAL INCOME INEQUALITY
EVOLUTION OF PRIVATE AND PUBLIC CAPITAL OWNERSHIP
NEW FINDINGS ON GLOBAL WEALTH INEQUALITY
FUTURE OF GLOBAL INEQUALITY AND HOW IT SHOULD BE TACKLED

There are many interesting findings. Let me just provide three examples in this Blog, together with respective visualizations telling the “story in the data”.

Example 1: Inequality rising everywhere, but at different speeds

Here is Figure E2a showing the Top 10% income shares across several large geographies over the period 1980-2016:

figure-e2a

From the report’s Executive Summary:

Since 1980, income inequality has increased rapidly in North America, China, India, and Russia. Inequality has grown moderately in Europe (Figure E2a). From a broad historical perspective, this increase in inequality marks the end of a postwar egalitarian regime which took different forms in these regions.

and further

The diversity of trends observed across countries since 1980 shows that income inequality dynamics are shaped by a variety of national, institutional and political contexts.
This is illustrated by the different trajectories followed by the former communist or highly regulated countries, China, India, and Russia (Figure E2a and b). The rise in inequality was particularly abrupt in Russia, moderate in China, and relatively gradual in India, reflecting different types of deregulation and opening-up policies pursued over the past decades in these countries.
The divergence in inequality levels has been particularly extreme between Western Europe and the United States, which had similar levels of inequality in 1980 but today are in radically different situations. While the top 1% income share was close to 10% in both regions in 1980, it rose only slightly to 12% in 2016 in Western Europe while it shot up to 20% in the United States. Meanwhile, in the United States, the bottom 50% income share decreased from more than 20% in 1980 to 13% in 2016 (Figure E3).

The latter is apparent from the supporting visualization in Figure E3, contrasting the Top 1% and Bottom 50% national income shares in the US with that of Western Europe:

figure-e3

figure-e3b

Although the y-axis does not start at 0% and is of different scale in both charts, the underlying story, i.e. the evolution of income shares of the rich (top 1%) and lower class (bottom 50%) over the last 35 years is apparent:

Income shares have changed significantly in the US:
- The Top 1% nearly doubled their income share from 11% to 20%
- The Bottom 50% saw their income share almost cut in half from 21% to 13%
Income shares have been fairly stable in Western Europe

Example 2: The elephant curve of global inequality

On this Blog we have written a lot about the Gini index. (See Gini posts) One of the limitations of the Gini index is that it reduces the entire inequality picture down to a single scalar value. Multiple distributions result in the same Gini index, which means that structural distribution changes may be masked out by a near constant Gini index.

For example, world inequality over the last 35 years has had both increasing effects (such as growth concentration at the top) as well as decreasing effects (raising hundreds of millions of people out of poverty in India and China). Visualizing the Gini index over time does not show this dynamic well.
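To make the point about identical Gini values concrete, here is a small sketch with two made-up three-person income distributions (illustrative numbers only, not real data). Both have the same total income and the same Gini index, yet one has a destitute bottom and the other a rich top. The function uses the standard mean-absolute-difference definition of the Gini index.

```python
def gini(incomes):
    """Gini index via the mean absolute difference between all pairs of incomes."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)


# Two illustrative distributions with the same total income (6 units).
poor_bottom = [0, 3, 3]  # one person with nothing, two equal earners
rich_top = [1, 1, 4]     # two equal low earners, one high earner

print(gini(poor_bottom), gini(rich_top))
```

Both calls return one third, so a time series of Gini values would show a flat line even if a society moved from one of these shapes to the other.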

Another chart to visualize this dynamic more clearly is the elephant curve, named after the shape of the animal. This curve lists all population groups in percentiles along the x-axis, sorted by increasing income from left to right. The first 99% have the same x-axis spacing; the top 1% on the right is split into 10 subgroups of 0.1% each; the top 0.1% is again split into 10 subgroups of 0.01%, and finally the top 0.01% is again split into 10 subgroups of 0.001%. This gives a finer resolution near the top of the income distribution, highlighting the very disproportionate accrual of growth at the top. See Figure E4 for global inequality growth from 1980 – 2016:

figure-e4
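The x-axis construction described above can be sketched in a few lines. This is an illustrative reading of the description, not code from the report: the function name `elephant_brackets` is hypothetical, and the assumption is that each bracket, coarse or fine, gets the same width on the axis. Bounds are computed in integer thousandths of a percent to avoid floating-point drift.

```python
def elephant_brackets():
    """Return (lower, upper) bounds in percent for each x-axis group."""
    # (start, end, step), all in thousandths of a percent
    segments = [
        (0, 99_000, 1_000),     # bottom 99% in steps of 1%
        (99_000, 99_900, 100),  # 99% to 99.9% in steps of 0.1%
        (99_900, 99_990, 10),   # 99.9% to 99.99% in steps of 0.01%
        (99_990, 100_000, 1),   # top 0.01% in steps of 0.001%
    ]
    brackets = []
    for start, end, step in segments:
        for lower in range(start, end, step):
            brackets.append((lower / 1000, (lower + step) / 1000))
    return brackets


brackets = elephant_brackets()
# Plotting growth against the bracket index (0, 1, 2, ...) gives every
# group the same width, which stretches the top 1% across the right side.
print(len(brackets), brackets[0], brackets[-1])
```

Because the topmost subgroup at each level is replaced by its ten finer subgroups, the levels contribute 99, 9, 9 and 10 groups respectively.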

The big bump on the left (head of the elephant) represents the large number of people lifted out of poverty (mostly in India and China). The steep rise on the right (trunk of the elephant) represents the disproportionate gains at the top of the income distribution. Again, from the Executive Summary:

How has inequality evolved in recent decades among global citizens? We provide the first estimates of how the growth in global income since 1980 has been distributed across the totality of the world population. The global top 1% earners has captured twice as much of that growth as the 50% poorest individuals. The bottom 50% has nevertheless enjoyed important growth rates. The global middle class (which contains all of the poorest 90% income groups in the EU and the United States) has been squeezed.

To underscore the last statement, here is the elephant curve of income growth from 1980-2016 for just the US-Canada and Western Europe (Figure 2.1.2):

figure-212

Note how in this chart, without China and India, the left side is flat, indicating that the lower economic classes have only had average or negligible income growth.

How did this translate into shares of growth captured by different groups? The top 1% of earners captured 28% of total growth—that is, as much growth as the bottom 81% of the population. The bottom 50% earners captured 9% of growth, which is less than the top 0.1%, which captured 14% of total growth over the 1980–2016 period. These values, however, hide large differences in the inequality trajectories followed by Europe and North America. In the former, the top 1% captured as much growth as the bottom 51% of the population, whereas in the latter, the top 1% captured as much growth as the bottom 88% of the population. (See chapter 2.3 for more details.)

It is noteworthy that the closer to the top, the higher the cumulative income growth, especially in the US. For example, Table 2.4.2 below shows that since 1980, US income has more than

- doubled for the Top 10% (growth = 121%)
- tripled for the Top 1% (204%)
- quadrupled for the Top 0.1% (320%)
- quintupled for the Top 0.01% (453%) and
- septupled for the Top 0.001% (636%)

table-242
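The step from a cumulative growth percentage to "more than doubled", "more than tripled" and so on is simple arithmetic: a growth of g% multiplies income by 1 + g/100. A small sketch using the five growth figures quoted above (the variable names are only illustrative):

```python
# Cumulative growth since 1980 in percent, as quoted above from Table 2.4.2
growth_pct = {
    "Top 10%": 121,
    "Top 1%": 204,
    "Top 0.1%": 320,
    "Top 0.01%": 453,
    "Top 0.001%": 636,
}

# A growth of g percent corresponds to an income multiple of 1 + g / 100.
multiples = {group: 1 + pct / 100 for group, pct in growth_pct.items()}

for group, multiple in multiples.items():
    # int() truncates, so this prints the whole-number multiple that was exceeded
    print(f"{group}: x{multiple:.2f} (more than {int(multiple)}-fold)")
```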

Another interesting finding from this is that pre-tax US income for the bottom 50% has essentially remained unchanged (growth = 1%) for an entire generation, with the bottom 20% even seeing their income shrink by 25%. Economic policies which exclude large portions of the population from growth for an entire generation are bound to increase tensions within that population, here primarily along the lines of economic class boundaries.

Example 3: Geographic breakdown of global income groups

In Part 2 the report looks at the share of Africans, Asians, Americans and Europeans in each of the global income groups and how this has changed over the last few decades. To illustrate, there are two snapshots in time, first in 1990 (Figure 2.1.5)

figure-215

and then in 2016 (Figure 2.1.6):

figure-216

Comparing these two area charts reveals a few interesting developments at the level of entire geographic regions:

In 1990, Asians were almost not represented within top global income groups. Indeed, the bulk of the population of India and China are found in the bottom half of the income distribution. At the other end of the global income ladder, US-Canada is the largest contributor to global top-income earners. Europe is largely represented in the upper half of the global distribution, but less so among the very top groups. The Middle East and Latin American elites are disproportionately represented among the very top global groups, as they both make up about 20% each of the population of the top 0.001% earners. It should be noted that this overrepresentation only holds within the top 1% global earners: in the next richest 1% group (percentile group p98p99), their share falls to 9% and 4%, respectively. This indeed reflects the extreme level of inequality of these regions, as discussed in chapters 2.10 and 2.11. Interestingly, Russia is concentrated between percentile 70 and percentile 90, and Russians did not make it into the very top groups. In 1990, the Soviet system compressed income distribution in Russia.

In 2016, the situation is notably different. The most striking evolution is perhaps the spread of Chinese income earners, which are now located throughout the entire global distribution. India remains largely represented at the bottom with only very few Indians among the top global earners.

The position of Russian earners was also stretched throughout from the poorest to the richest income groups. This illustrates the impact of the end of communism on the spread of Russian incomes. Africans, who were present throughout the first half of the distribution, are now even more concentrated in the bottom quarter, due to relatively low growth as compared to Asian countries. At the top of the distribution, while the shares of both North America and Europe decreased (leaving room for their Asian counterparts), the share of Europeans was reduced much more. This is because most large European countries followed a more equitable growth trajectory over the past decades than the United States and other countries, as will be discussed in chapter 2.3.

There are, of course, many more findings in this report. It is great to see such rigorous data-driven analysis made available free of charge and in an easy-to-consume form (desktop, iPad, etc.). One can hope that such foundational work will lead to a more educated civic discussion about the current state of economic inequality, the impact of various policy tools, and the geographic developments in these inequalities.


Posted by visualign on February 16, 2018 in Socioeconomic

Tags: income inequality, inequality, wealth inequality

Data Visualizations in Healthcare

20 May

A couple of weeks ago I attended HIMSS 2017 (Healthcare Information and Management Systems Society) in Orlando, the largest annual healthcare IT event in the US. One of the big themes of the show was system interoperability. There are lots of different vendors, few standards, and vast amounts of data being collected. There is an emerging set of API standards (such as HL7 FHIR) to ensure that data can be shared among systems and properly follows a patient through the various providers she encounters during her episodes of care.

Somewhat serendipitously I came across a booth displaying large prints of beautiful data visualizations. The booth belonged to Arcadia Healthcare Solutions. Given my background in data visualization and my role at Rennova Health Technology Solutions, where I am responsible for our healthcare IT products and services, I was drawn to interpret these prints. They are also featured in an online data gallery, which I encourage you to explore.

One of these visualizations, created by Jeff Solomon, is called “The Health IT Space”. It displays highly aggregated data from various EHR (Electronic Health Record) systems. From the gallery:

The Electronic Health Record is a data gold mine. Each patient you see generates millions of detailed records in real time that can be extracted and analyzed for improved predictive algorithms, increased operational efficiency, better care quality, and so much more.

SmallAmbulatory

These graphs are stylized Entity Relationship diagrams from seven different EHRs. Nodes are data tables, and edges are relationships between these tables inferred from shared attributes.

MajorAmbulatory

The color-highlighted nodes represent patient data. The size of each node corresponds to the number of records in the respective table.
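To make the construction concrete, here is a minimal sketch of how such a graph could be derived from a schema: tables become nodes, an edge is added whenever two tables share a column name, and the record count is kept as the node size. The toy schema and the function `infer_edges` are hypothetical and not taken from the gallery; how Arcadia actually inferred the relationships is not described in detail.

```python
from itertools import combinations

# Hypothetical toy schema: table name -> (set of column names, record count)
tables = {
    "patient": ({"patient_id", "full_name", "birth_date"}, 1200),
    "encounter": ({"encounter_id", "patient_id", "provider_id"}, 9500),
    "provider": ({"provider_id", "provider_name"}, 40),
    "config": ({"setting", "value"}, 15),
}


def infer_edges(tables):
    """Link every pair of tables that has at least one column name in common."""
    edges = []
    for a, b in combinations(sorted(tables), 2):
        shared = tables[a][0] & tables[b][0]
        if shared:
            edges.append((a, b, sorted(shared)))
    return edges


edges = infer_edges(tables)
node_size = {name: records for name, (_, records) in tables.items()}
connected = {name for a, b, _ in edges for name in (a, b)}
isolated = sorted(set(tables) - connected)

print(edges)      # relationships inferred from shared attributes
print(isolated)   # tables without any inferred relationship
```

Matching on column names alone is a crude heuristic: generic names such as `id` or `name` would create spurious edges, so a real analysis would need additional rules.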

LargeAmbulatory

Again, from the gallery description:

In each cluster, the core patient entity – the nucleus around which the rest of the data revolve – can be identified by its contrasting color. The tables containing the bulk of the clinically and operationally valuable data tend to form clusters of large, interconnected nodes, while a larger number of satellite tables house system configurations and other low-volume metadata that has very few relationships to the nucleus.

At the large end of entire health systems, the graph starts to look very busy:

LargeHealthSystem

It is pretty amazing how much data is being aggregated into these graphs. Nearly 5,000 database tables – hence the black areas where there are too many dots to separate them at this resolution. A combined number of 18 billion records! Nearly 300,000 relationships between these tables (again, the lines are too numerous to be distinguishable).

The Health IT Space

I find it somewhat humbling to review these graphs. Our own MedicalMime EHR falls into the small category by these standards. Major and Large EHRs are at least one, maybe two orders of magnitude larger and more complex.

Interpreting the large amount of data contained in the EHR opens up many ways to improve healthcare, both medically for the patient as well as operationally for the providers. Visualizations can help us to better understand patterns and trends which otherwise would remain hidden.

The big picture of the entire visualization above is shown on the right. For a high-resolution version, please contact the friendly folks at Arcadia Healthcare Solutions directly through their website.

Another big trend at HIMSS’17 was Artificial Intelligence. Machine learning and predictive analytics received a lot of attention. Solutions like IBM’s Watson Health promise to bring world-class expertise into ordinary physician practices through subscription to hosted specialty knowledge in the cloud, whether curated by scientists or machine-learned from big data using statistical techniques. Health Catalyst’s healthcare.ai is an open platform intended to make machine learning techniques more accessible and bring them to small software houses, not just the large companies with large R&D budgets. While the field is certainly overhyped at the moment, acceptance of obtaining a “second medical opinion from the cloud / app” to improve clinical decisions is rising.


Posted by visualign on May 20, 2017 in Medical

Digital Wages in the Gig Economy

26 Mar

A small research team from the Oxford Internet Institute recently issued a report based on a three-year investigation into the worldwide geographies of the so-called gig economy: online work that allows many talented people in the low- and middle-income countries of the world to compete on a global stage. From the Executive Summary:

Online gig work is becoming increasingly important to workers living in low- and middle-income countries. Our multi-year and multi-method research project shows that online gig work brings about rewards such as potential higher incomes and increased worker autonomy, but also risks such as social isolation, lack of work–life balance, discrimination, and predatory intermediaries. We also note that online gig work platforms mostly operate outside regulatory and normative frameworks that could benefit workers.

One eye-catching and very information-rich visualization comes from a related blog post by the “Connectivity, Inclusion, and Inequality Group” called “Uneven Geographies of Digital Wages”.

Dollar Inflow and Median Wage by Country

The cartogram depicts each country as a circle, sized according to the dollar inflow to that country during March 2013 (on the freelance work platform oDesk.com, rebranded as Upwork in 2015). The shading of the inner circle indicates the median hourly rate published by digital workers in that country. The graphic broadly reveals that median wages are, perhaps unsurprisingly, low in developing countries and significantly higher in wealthier countries.

Another blog post on the geographies of online work adds several more visualizations (based on 2013 data, so a bit dated by now). For instance, one world map highlights the relationship between supply and demand. It distinguishes between countries with a positive balance of payments (i.e. countries in which more work is sold than bought) and countries with a negative balance of payments (countries in which more work is bought than sold). The figure more clearly delineates the geography of supply and demand, with much of the world’s demand coming from only a few places in the Global North.

Balance of payments
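The classification behind such a map is straightforward to reproduce. The sketch below uses made-up payment records (not the oDesk data) and a hypothetical helper `balance_by_country`: each payment adds to the seller's balance and subtracts from the buyer's, and the sign of the result decides which group a country falls into.

```python
from collections import defaultdict

# Hypothetical payments: (selling country, buying country, value in USD)
payments = [
    ("India", "United States", 500),
    ("Philippines", "United States", 300),
    ("United States", "Australia", 200),
    ("India", "Australia", 100),
]


def balance_by_country(payments):
    """Net value of work sold minus work bought, per country."""
    balance = defaultdict(int)
    for seller, buyer, value in payments:
        balance[seller] += value   # work sold counts as inflow
        balance[buyer] -= value    # work bought counts as outflow
    return dict(balance)


balances = balance_by_country(payments)
net_sellers = sorted(c for c, b in balances.items() if b > 0)
net_buyers = sorted(c for c, b in balances.items() if b < 0)

print("Positive balance:", net_sellers)
print("Negative balance:", net_buyers)
```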

Another very interesting and dense visualization is a connectogram (see our previous post on Connectograms and the Circos tool) demonstrating the highly international trade in the online gig economy: 89% of the trade measured by value happened between a client and a contractor who are in different countries. The network therefore attempts to illustrate the entirety of those international flows in one graph. It depicts countries as nodes (i.e. circles) and volumes of transactions between buyers and sellers in those countries as edges (i.e. the lines connecting countries). Country nodes are shaded according to the world region that they are in and sized according to the number of buyer transactions originating in them. Edges are coloured according to the flow of services, with each line shaded in the colour of the originating (selling) region. Edges are also weighted according to the total volume of trade.

The Geographic Network of Sales
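The quantities encoded in this graph can be derived from a plain list of transactions. The following sketch uses a handful of invented transactions (the values are chosen only so that the numbers are easy to follow) to compute the edge weights, the node sizes and the share of trade that crosses a border.

```python
from collections import defaultdict

# Hypothetical transactions: (seller country, buyer country, value in USD)
transactions = [
    ("India", "United States", 300),
    ("India", "United States", 200),
    ("Philippines", "United States", 300),
    ("United States", "United States", 100),   # domestic, no edge
    ("India", "United Kingdom", 100),
]

edge_weight = defaultdict(int)      # (seller, buyer) -> total traded value
buyer_tx_count = defaultdict(int)   # buyer country -> number of transactions

for seller, buyer, value in transactions:
    buyer_tx_count[buyer] += 1          # node size: buyer transactions
    if seller != buyer:
        edge_weight[(seller, buyer)] += value   # edge weight: trade volume

total_value = sum(value for _, _, value in transactions)
cross_border_share = sum(edge_weight.values()) / total_value

print(dict(edge_weight))
print(dict(buyer_tx_count))
print(f"Cross-border share by value: {cross_border_share:.0%}")
```

In this toy data the heaviest edge runs from India to the United States, mirroring the kind of dominant relationship the real graph shows.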

We see not just a complex many-to-many relationship of international trade, but also the large role that a few geographic relationships take (in particular, India and the Philippines selling to the United States).

Back to the Executive Summary of the above report:

The report’s central question is whether online gig work has any development potentials at the world’s economic margins. Its motive is to help platform operators to improve their positive impact, to help workers to take action to improve their situations, and to prompt policy makers and stakeholders interested in online gig work to revisit regulation as it applies to workers, clients, and platforms in their respective countries.

It is interesting to see these marketplaces evolve in terms of their international, distributed nature and of issues such as taxation, intermediation, opportunities and risks. Entirely new forms of social networks are also forming, based on blockchain-powered token systems convertible into cryptocurrencies (such as Steem). The core concept here is to eliminate not just geographical distance, but also risks from exchange rate fluctuations and predatory intermediaries. It remains to be seen to what degree this can act as a counterweight to technology-induced increases in inequality.


Posted by visualign on March 26, 2017 in Industrial, Socioeconomic

Tags: inequality
