
Category Archives: Socioeconomic

Nonlinearity in Growth, Decay and Human Mortality

Processes of Growth and Decay abound in natural and economic systems. Growth processes determine biological structure and pattern formation, selection of species or ideas, the outcome of economic competition and of savings in financial portfolios. In this post we will examine a few different types of quantitative growth / decay and their qualitatively different outcomes.

Growth

In the media we often hear about nonlinear, exponential, or explosive growth as popular references to seemingly unstoppable increases. Buzzwords like “tipping point” or “singularity” appear in book titles and on web sites. Mathematical models can aid the analytical understanding of such dynamic processes, while visualization can support a more intuitive understanding.

Let’s look at three different growth processes: linear, exponential, and hyperbolic (rows below), by specifically considering three different quantities (columns below):
the absolute amount (as a function of time),
the absolute rate of increase (the derivative of that function), and
the relative rate of increase (the rate relative to the amount).

Amounts, Rates, and Relative Rates of three growth processes: Linear, Exponential, Hyperbolic

Linear growth (blue lines) is the result of a constant rate or increment per time interval. The relative rate (size of the increment in relation to the existing quantity) decreases toward zero.

Exponential growth (red lines) is the result of a rate or increment per time interval that grows linearly with the amount. The relative rate is constant. Think of savings accruing at a fixed interest rate. Urban legend has it that Albert Einstein once declared compound interest – an exponential growth process – to be “the most powerful force in the universe”. Our intuition is ill-suited to exponential effects, and in many ways it seems hard to conceive of even faster growth processes. However, even with exponential growth it takes an infinite time to reach an infinitely large amount.

Hyperbolic growth (brown lines) is the result of a rate that grows quadratically with the amount. In this type of growth even the relative rate increases. This can be caused by auto-catalytic effects: the larger the amount, the faster the rate grows. As a result, such growth reaches infinite values at a finite value of t, also called a discontinuity or singularity.
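The three growth laws in the table can be sketched with closed-form curves. This is a minimal Python sketch, assuming constants (not given in the post) chosen so that F(0) = 1 and the hyperbolic singularity falls at t = 1; each function returns the amount, the absolute rate, and the relative rate:

```python
import math


def linear(t):
    """Linear growth F = 1 + t: constant rate, relative rate 1 / F shrinks toward zero."""
    f = 1.0 + t
    return f, 1.0, 1.0 / f


def exponential(t):
    """Exponential growth F = e^t: the rate equals the amount, the relative rate is constant."""
    f = math.exp(t)
    return f, f, 1.0


def hyperbolic(t):
    """Hyperbolic growth F = 1 / (1 - t): the rate equals F**2, singular at t = 1."""
    f = 1.0 / (1.0 - t)
    return f, f * f, f


for t in (0.0, 0.5, 0.9, 0.99):
    for name, law in (("linear", linear), ("exponential", exponential), ("hyperbolic", hyperbolic)):
        amount, rate, relative = law(t)
        print(f"t={t:<5} {name:<12} F={amount:10.2f}  F'={rate:10.2f}  F'/F={relative:7.2f}")
```

Running it shows the relative rate falling for linear growth, staying at 1 for exponential growth, and blowing up as t approaches 1 for hyperbolic growth, where the rate is the square of the amount.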

When multiple entities grow and compete for limited resources, their growth will determine the outcome as a distribution of the resource as follows:

Linear growth leads to coexistence of all competitors; their ratios determined by their linear growth rates.
Exponential growth leads to reversible selection of a winner (with the highest relative growth rate). Reversible since a competitor with a higher relative growth rate will win, regardless of when it enters the competition.
Hyperbolic growth leads to irreversible selection of a winner (first to dominate). Irreversible since the relative growth rate of the dominant competitor dwarfs that of any newcomer.
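To see these three outcomes emerge, one can simulate competitors sharing a fixed resource. The sketch below rests on a modeling assumption of our own, not one from the post: shares always sum to 1, each competitor grows by a constant, share-proportional, or share-squared increment, and the total increment is taken back in proportion to share (a replicator-style normalization), integrated with simple Euler steps. The function name `simulate_competition` and all coefficients are hypothetical.

```python
def simulate_competition(kind, coeffs, shares, dt=0.01, steps=20_000):
    """Evolve the resource shares of competitors (summing to 1) with simple Euler steps.

    kind selects how each competitor grows:
      "linear"      -> constant increment c_i
      "exponential" -> increment a_i * x_i (proportional to its share)
      "hyperbolic"  -> increment b_i * x_i**2 (auto-catalytic)
    The total increment phi is taken back in proportion to each share,
    which keeps the shares summing to 1 (a fixed, limited resource).
    """
    x = list(shares)
    for _ in range(steps):
        if kind == "linear":
            growth = list(coeffs)
        elif kind == "exponential":
            growth = [a * xi for a, xi in zip(coeffs, x)]
        elif kind == "hyperbolic":
            growth = [b * xi * xi for b, xi in zip(coeffs, x)]
        else:
            raise ValueError(f"unknown growth law: {kind}")
        phi = sum(growth)
        x = [xi + dt * (g - xi * phi) for xi, g in zip(x, growth)]
    return x


# An incumbent holds 99.9% of the resource; a newcomer with a 1.5x larger coefficient enters.
start = [0.999, 0.001]
for kind in ("linear", "exponential", "hyperbolic"):
    final = simulate_competition(kind, [1.0, 1.5], start)
    print(kind, [round(share, 3) for share in final])
```

With linear growth both survive at shares proportional to their coefficients (0.4 : 0.6). With exponential growth the newcomer takes over from any positive starting share. With hyperbolic growth the tipping point lies where 1.0 × x1 = 1.5 × x2, i.e. at a 60/40 split, so this newcomer loses unless it starts with more than 40% of the resource.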

Such processes have been studied in detail in biology (population dynamics, genetics, etc.). It’s straightforward to imagine the combination of random fluctuations, exponential (or faster) growth and ‘winner-take-all’ selection as the main driving processes of self-organized pattern formation in biology, from leopard spots or zebra stripes all the way to the complex structure-formation processes of morphogenesis and embryology.

Such processes also occur in economics. For example, the competition for PC operating system platforms was won by Microsoft’s Windows due to the strong advantages of the incumbent (applications, tools, developers, ecosystem, etc.). Similar effects can be seen with social networks, where competitors (like Facebook) become disproportionately stronger as a result of the size of their network. I suspect that such winner-take-all dynamics also play a central role in the evolution of inequality, which can be viewed as the dynamic formation of structure (here, the unequal allocation of wealth across a population).

Two popular technology concepts owe their existence to nonlinear growth processes:

Exponential Growth: The empirical Moore’s Law states that computer power doubles every 18 months or so (similar for storage capacity, transistors on chips and network bandwidth). This allows us to forecast fairly accurately when machines will have certain capacities that would have seemed unimaginable only a few decades earlier. For example, computer power increases by a factor of 1000 in only 15 years, or a million-fold in 30 years, the span of just one human generation!
Hyperbolic Growth: Futurist Ray Kurzweil has observed that the doubling period of many aspects of our knowledge society is shrinking. From this observation of an “ever-accelerating rate of technological change” he concludes in his latest book that “The Singularity Is Near”, with profound technological and philosophical implications.
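The doubling arithmetic behind both points is easy to check. A minimal Python sketch, assuming a fixed 18-month doubling period for the Moore’s Law case; for the Kurzweil case the `shrink` factor of 0.9 (each doubling taking 90% as long as the previous one) is a purely hypothetical illustration, not a figure from his book:

```python
def growth_factor(years, doubling_period=1.5):
    """Growth factor after `years` if capacity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def years_for_doublings(n, first_period=1.5, shrink=1.0):
    """Total years needed for n successive doublings when each doubling period
    is `shrink` times as long as the previous one (shrink < 1 means accelerating)."""
    total, period = 0.0, first_period
    for _ in range(n):
        total += period
        period *= shrink
    return total


print(growth_factor(15))                     # 1024.0, i.e. roughly 1000-fold
print(growth_factor(30))                     # 1048576.0, i.e. roughly a million-fold
print(years_for_doublings(20))               # 30.0 with a fixed 18-month period
print(years_for_doublings(200, shrink=0.9))  # approaches 1.5 / (1 - 0.9) = 15 years
```

With a shrinking doubling period, infinitely many doublings fit into a finite time (here a geometric series summing to 1.5 / (1 - 0.9) = 15 years), which is the finite-time singularity of hyperbolic growth described above.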

In many cases, empirical growth observations and measurements can be compared with mathematical models to either verify or falsify hypotheses about the underlying mechanisms controlling the growth processes. For example, world population growth has been tracked closely. To understand the strong increase of world population as a whole over the last hundred years or so, one needs to look at the drivers (birth and mortality rates) and their key influencing factors (medical advances, agriculture). Many countries still have high birth rates, while medical advances and better farming methods have driven down the mortality rates. As a result, population has grown exponentially for many decades. (See also the wonderful 2-minute video visualization of this concept linked to from the previous post on “7 Billion”.) Short of increasing the mortality rate, it is evident that population stabilization (i.e. reduction of growth to zero) can only be achieved by reducing the birth rate. This in turn influences the policy debates, for example about empowering women so they have fewer children (better education and economic prospects, access to contraception, etc.). Here is a graphic on world population growth rates:

Population growth rates in percent (source: Wikipedia, 2011 estimates)

Compare this to the world maps showing population age structure in the Global Trends 2025 post. There is a strong correlation between how young a population is and how high its birth rate is. (Note Africa standing out in both graphs.)
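The birth-minus-death reasoning in the population paragraph above can be made concrete with a few lines of arithmetic. The rates below are hypothetical round numbers per 1,000 people per year, not figures from the chart, and migration is ignored:

```python
import math


def net_growth_rate(births_per_1000, deaths_per_1000):
    """Annual growth rate (as a fraction) from crude birth and death rates, ignoring migration."""
    return (births_per_1000 - deaths_per_1000) / 1000.0


def doubling_time(rate):
    """Years for a population to double at a constant annual growth rate (infinite if not growing)."""
    if rate <= 0:
        return math.inf
    return math.log(2) / math.log1p(rate)


# Hypothetical (births, deaths) per 1,000 people per year.
for births, deaths in ((30, 10), (30, 5), (20, 8), (12, 12)):
    r = net_growth_rate(births, deaths)
    print(f"births {births}, deaths {deaths}: growth {r:.1%}, doubling time {doubling_time(r):.0f} years")
```

Lowering mortality from 10 to 5 per 1,000 shortens the doubling time; only bringing births down to the level of deaths stops growth altogether.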

Decay

Conversely, one can study processes of decay or decline, again with qualitatively different outcomes for different types of decline, such as linear or exponential. One interesting, mathematically inspired analysis of decay processes comes from the ‘Gravity and Levity’ blog in the post “Your body wasn’t built to last: a lesson from human mortality rates”. The article starts out with the observation that our likelihood of dying, say, in the next year doubles every 8 years. Since the mortality rate increases exponentially, the likelihood of survival decreases super-exponentially. The empirical data matches the rates forecast by the Gompertz Law of mortality almost perfectly.

Death and Survival Probability in the US (Source: Wolfram Alpha)

If the death rate (the probability that a survivor dies within the next time interval) were constant, the survival probability would decay exponentially, following an exponential distribution. If, however, the death rate itself grows exponentially – i.e. doubles every fixed time interval – the survival probability follows a Gompertz distribution and falls off super-exponentially.

Let’s look at a table similar to the one above, this time contrasting three decay processes (rows below): linear, exponential, and super-exponential. Again we consider the amount, the absolute rate, and the relative rate (columns below), with constants chosen to match the initial condition F[0] = 1:

Amounts, Rates, and Relative Rates of three decay processes: Linear, Exponential, Super-Exponential

The linear decay (blue lines) is characterized by a constant rate and reaches zero at a time proportional to the initial amount, at which the relative rate has a discontinuity.

The exponential decay (red lines) is characterized by a constant relative rate and thus leads to a steady but long-lasting decay (like radioactive decay).

The super-exponential decay (brown lines) leads to the amount following a Gompertz distribution (matching the shape of the US survival probability chart above). For a while the decay rate remains very small, near zero. Then it ramps up quickly and leads to a steep decline in the amount, which in turn brings the rate back down. The relative rate keeps growing exponentially.
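The three decay laws can be sketched the same way. The constants below are hypothetical, since the post doesn’t state the ones used for the charts. With the Gompertz curve F(t) = exp(-A(e^(Bt) - 1)), the absolute rate starts at only -A·B, peaks near t = ln(1/A)/B ≈ 4.6, and then fades as the amount is used up, while the relative rate keeps doubling every ln(2)/B time units:

```python
import math

# Hypothetical constants (the post does not give them); every law starts at F(0) = 1.
LINEAR_RATE = 0.1   # linear decay: amount reaches zero at t = 1 / LINEAR_RATE = 10
EXP_RATE = 0.2      # exponential decay: constant relative rate
A, B = 0.01, 1.0    # super-exponential (Gompertz) decay: relative rate -A * B * e^(B t)


def linear(t):
    """Constant absolute rate; the relative rate diverges as the amount approaches zero (t < 10)."""
    f = 1.0 - LINEAR_RATE * t
    return f, -LINEAR_RATE, -LINEAR_RATE / f


def exponential(t):
    """Constant relative rate, as in radioactive decay."""
    f = math.exp(-EXP_RATE * t)
    return f, -EXP_RATE * f, -EXP_RATE


def super_exponential(t):
    """Gompertz curve F = exp(-A (e^(B t) - 1)); its relative rate grows exponentially."""
    relative = -A * B * math.exp(B * t)
    f = math.exp(-A * (math.exp(B * t) - 1.0))
    return f, relative * f, relative


for t in (0.0, 2.5, 5.0, 7.5, 9.5):
    for name, law in (("linear", linear), ("exponential", exponential), ("super-exp", super_exponential)):
        amount, rate, rel = law(t)
        print(f"t={t:<4} {name:<12} F={amount:.3f}  F'={rate:.4f}  F'/F={rel:.3f}")
```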

The above linked article goes on to analyze two hypotheses on the dominant causes of human death: the single lightning bolt model and the accumulated lightning bolt model. If the major causes of death were single or cumulative random accidents (like lightning bolts or murders), the resulting survival probability curves would have a much longer tail. In other words, we would see at least some percentage of human beings living to ages beyond 130 or even 150 years. Since such cases are practically never observed, the underlying process must be different, and lightning bolt models cannot explain human mortality.
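A minimal sketch of why the tail matters, under two labeled simplifications: the single lightning bolt model is treated as a constant, age-independent death rate, and both models are calibrated to the same hypothetical median lifespan of 80 years. The 8-year mortality doubling time is the one quoted above; the accumulated lightning bolt variant is not modeled here.

```python
import math

MEDIAN_AGE = 80.0     # hypothetical: both models calibrated to the same median lifespan
DOUBLING_YEARS = 8.0  # mortality doubling time quoted in the post


def survival_constant_hazard(age):
    """Simplified 'single lightning bolt': a constant, age-independent death rate."""
    rate = math.log(2) / MEDIAN_AGE
    return math.exp(-rate * age)


def survival_gompertz(age):
    """Gompertz model: the death rate doubles every DOUBLING_YEARS.

    Cumulative hazard ln(2) * (2^(age/D) - 1) / (2^(median/D) - 1), so survival is 0.5 at the median.
    """
    growth = 2 ** (age / DOUBLING_YEARS) - 1
    scale = 2 ** (MEDIAN_AGE / DOUBLING_YEARS) - 1
    return math.exp(-math.log(2) * growth / scale)


for age in (80, 100, 130, 150):
    print(f"age {age}: constant rate {survival_constant_hazard(age):.3g}, "
          f"Gompertz {survival_gompertz(age):.3g}")
```

With these assumptions both curves give 50% survival at 80, but the constant-rate model still leaves roughly a third of people alive at 130, whereas the Gompertz model leaves essentially nobody (on the order of 10^-23).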

Instead, a so-called “cops and criminals” model is proposed, based upon biochemical processes in the human body. “Cops” are cells that patrol the body and eliminate mutated cells (“criminals”) which, when unchecked, can lead to death. From the above post:

The language of “cops and criminals” lends itself very easily to a discussion of the immune system fighting infection and random mutation. Particularly heartening is the fact that rates of cancer incidence also follow the Gompertz law, doubling every 8 years or so. Maybe something in the immune system is degrading over time, becoming worse at finding and destroying mutated and potentially dangerous cells.

Unfortunately, the full complexity of human biology does not lend itself readily to cartoons about cops and criminals. There are a lot of difficult questions for anyone who tries to put together a serious theory of human aging. Who are the criminals and who are the cops that kill them? What is the “incubation time” for a criminal, and why does it give “him” enough strength to fight off the immune response? Why is the police force dwindling over time? For that matter, what kind of “clock” does your body have that measures time at all?

There have been attempts to describe DNA degradation (through the shortening of your telomeres or through methylation) as an increase in “criminals” that slowly overwhelm the body’s DNA-repair mechanisms, but nothing has come of it so far. I can only hope that someday some brilliant biologist will be charmed by the simplistic physicist’s language of cops and criminals and provide us with real insight into why we age the way we do.

A web calculator for death and survival probability based on Gompertz Law can be found here.

Global Trends 2025

04 Jan

If you like to do some big-picture thinking, here is a document put together by the National Intelligence Council and titled “Global Trends”. It is published every five years to analyze trends and forecast likely scenarios of worldwide development fifteen years into the future. The most recent is called “Global Trends 2025” and was published in November 2008. It’s a 120-page document which can be downloaded for free in PDF format here.

To get a feel for the content, here are the chapter headers:

The Globalizing Economy
The Demographics of Discord
The New Players
Scarcity in the Midst of Plenty?
Growing Potential for Conflict
Will the International System Be Up to the Challenges?
Power-Sharing in a Multipolar World

From the NIC Global Trends 2025 project website:

Some of our preliminary assessments are highlighted below:

The whole international system—as constructed following WWII—will be revolutionized. Not only will new players—Brazil, Russia, India and China— have a seat at the international high table, they will bring new stakes and rules of the game.

The unprecedented transfer of wealth roughly from West to East now under way will continue for the foreseeable future.

Unprecedented economic growth, coupled with 1.5 billion more people, will put pressure on resources—particularly energy, food, and water—raising the specter of scarcities emerging as demand outstrips supply.

The potential for conflict will increase owing partly to political turbulence in parts of the greater Middle East.

As interesting as the topic may be, from a data visualization perspective the report is somewhat underwhelming. I counted just 5 maps and 5 charts in the entire document. The maps are interesting, such as the following on World Age Structure:

World Age Structure 2005

World Age Structure 2025 (Projected)

These maps show the age structure of countries’ populations by geographical region. The Northern countries have fewer young people, and the aging trend is particularly strong for Eastern Europe and Japan. In 2025 almost all of the countries with very young populations will be in Sub-Saharan Africa and on the Arabian Peninsula. Population growth will slow as a result; there will be approximately 8 billion people alive in 2025, 1 billion more than the 7 billion today.

In this day and age one is spoiled by interactive charts such as the bubble charts of Gapminder’s Trendalyzer. Wouldn’t it be nice to have an interactive chart where you could set the age intervals, perhaps filter in various ways (geographic regions, GDP, population, etc.), and then see the dynamic change of such colored world maps over time? How much more insight would this convey about the changing demographics and relative sizes of age cohorts? Or perhaps display interactive population pyramids such as those found here by Jorge Camoes?

Another somewhat misguided ‘graphical angle’ is the use of slightly rotated graphics on the chapter headers. For example, Chapter 2 starts with this useful color-coded map of the youth population in countries of the Middle East. But why rotate it slightly and make the fonts less readable?

Youth in the Middle East (from Global Trends 2025 report)

I don’t want to be too critical; it’s just that reports put together with so much systematic research and focusing on long-range, international trends should employ more state-of-the-art visualizations, in particular interactive charts rather than just pages and pages of static text…


Posted by visualign on January 4, 2012 in Industrial, Socioeconomic

Tags: economics, geographical map, trends, world population

Scientific Research Trends

03 Jan

The site worldmapper.org has published hundreds of cartogram world maps; cartograms are geographic maps in which the size of each depicted area is proportional to a specified metric. This leads to distorted versions of countries or entire continents relative to the geographical sizes we are used to. (We recently looked at cartograms of world mobile phone adoption here.)
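The core sizing rule of a cartogram is simple arithmetic: each region’s target area is its share of the metric times the total map area. The sketch below computes only these target areas for hypothetical regions; real cartogram algorithms (for example diffusion-based methods) additionally deform the map so that shapes and neighborhoods stay recognizable, which is not attempted here.

```python
import math


def cartogram_target_areas(metric, total_area):
    """Target area of each region: its share of the metric times the total map area."""
    total_metric = sum(metric.values())
    return {name: total_area * value / total_metric for name, value in metric.items()}


def linear_scale(target_area, original_area):
    """Factor applied to a region's width and height: square root of the area ratio."""
    return math.sqrt(target_area / original_area)


# Hypothetical regions with equal land area (100 units each) but different metric values.
land = {"A": 100.0, "B": 100.0, "C": 100.0}
papers = {"A": 10, "B": 30, "C": 0}  # a region with no output gets no area at all
targets = cartogram_target_areas(papers, sum(land.values()))
for name in land:
    print(name, targets[name], round(linear_scale(targets[name], land[name]), 3))
```

A region with a zero value gets zero area, which is why territories without any increase vanish from the growth map further below.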

One interesting set of cartograms from worldmapper.org relates to scientific research. The first shows the number of science papers (as of 2001) authored by people living in the respective areas:

Science Research (Number of research articles, Source: Worldmapper.org)

Another shows the growth in the above number between 1990 and 2001:

Science Growth (Change in Number of research articles, Source: Worldmapper.org)

From worldmapper.org:

This map shows the growth in scientific research of territories between 1990 and 2001. If there was no increase in scientific publications that territory has no area on the map.

In 1990, 80 scientific papers were published per million people living in the world, this increased to 106 per million by 2001. This increase was experienced primarily in territories with strong existing scientific research. However, the United States, with the highest total publications in 2001, experienced a smaller increase since 1990 than that in Japan, China, Germany and the Republic of Korea. Singapore had the greatest per person increase in scientific publications.

It is worth noting that the trends depicted are based on data one decade old. It is likely, however, that those trends have continued over the past decade, something which Neil deGrasse Tyson points out with concern regarding the relative decline of scientific research in America in this YouTube video:

Another point Tyson emphasizes is the near total absence of scientific research from the entire continent of Africa, as evidenced by the disappearance of the continent on the cartogram. With about a billion people living there, this is one of the starkest visualizations of the challenges Africans face in escaping their poverty trap.

Underestimating Wealth Inequality

12 Dec

What are people’s perceptions about estimated, desirable and actual levels of economic inequality? Behavioral economist Dan Ariely from Duke University and Michael Norton from Harvard Business School conducted a survey of ~5,500 respondents across the United States to find out. Their survey asked about wealth inequality (as opposed to income inequality), where wealth, also known as net worth, is essentially the value of all things owned minus all things owed (assets minus debt).

Addendum 3/9/2013: A recently posted 6-minute video illustrating these findings went viral (4 million+ views). It is worth watching:

The authors published the paper here and Dan Ariely blogged about it here in Sep 2010. One of the striking results is summarized in this chart of the wealth distribution across five quintiles:

From their Legend:

The actual United States wealth distribution plotted against the estimated and ideal distributions across all respondents. Because of their small percentage share of total wealth, both the ‘‘4th 20%’’ value (0.2%) and the ‘‘Bottom 20%’’ value (0.1%) are not visible in the ‘‘Actual’’ distribution.

It turned out that most respondents described a fairly equal distribution as the ideal – something similar to the wealth distribution in a country like Sweden. They estimated – correctly – that the U.S. has higher levels of wealth inequality. However, they nevertheless grossly underestimated the actual inequality, which is far higher still. The bottom two quintiles in particular are almost non-existent in the actual distribution. There was much more consensus than disagreement about this across groups from different sides of the political spectrum, which one would not have expected from the current policy debates. They go on to ask the question:

Given the consensus among disparate groups on the gap between an ideal distribution of wealth and the actual level of wealth inequality, why are more Americans, especially those with low income, not advocating for greater redistribution of wealth?

In the last chapter of their paper the authors offer several explanations of this phenomenon. One of them is the observation that the apparent drastic under-estimation of the degree of inequality seems to reveal a lack of awareness of the size of the gap. This is something that Data Visualization and interactive charts can help address. For example, Catherine Mulbrandon’s Blog Visualizing Economics does a great job in that regard.

The authors go on to look at other aspects from the perspective of psychology and behavioral economics. While fascinating in its own right, this excursion is beyond the scope of my Data Visualization Blog. They conclude their paper with general observations

…suggesting that even given increased awareness of the gap between ideal and actual wealth distributions, Americans may remain unlikely to advocate for policies that would narrow this gap.


Posted by visualign on December 12, 2011 in Socioeconomic

Tags: economics, estimation, inequality, psychology

Inequality on Twitter

06 Dec

A lot has been written about economic inequality as measured by distribution of income, wealth, capital gains, etc. In previous posts such as Inequality, Lorenz-Curves and Gini-Index or Visualizing Inequality we looked at various market inequalities (market share and capitalization, donations, etc.) and their respective Gini coefficients.

With the recent rise of social media we have other forms of economy, in particular the economy of time and attention. And we have at least some measures of this economy in the form of people’s activities, subscriptions, etc. Whether it’s Connections on LinkedIn, Friends on Facebook, or Followers on Twitter – all of the social media platforms have some social currencies for attention. (Influence is different from attention, and measuring influence is more difficult and controversial – see for example the discussions about Klout scores.)

Another interesting aspect of online communities is that of participation inequality. Jakob Nielsen did some research on this and coined the well-known 90-9-1 rule:

“In most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action.”

The above linked article has two nice graphics illustrating this point:

Illustration of participation inequality in online communities (Source: Jakob Nielsen)

As a user of Twitter for about 3 years now, I decided to do some simple analysis, wondering about the degrees of inequality I would find there. Imagine you want to spread the word about some new event and send out a tweet. How many people you reach depends on how many followers you have, how many of those retweet your message, how many followers they have, how many other messages they send out, and so on. Let’s look at my first Twitter account (“tlausser”); here are some basic numbers of my followers and their respective followers:

Followers of tlausser and their own numbers of followers on Twitter

Some of my followers have no followers themselves; one has nearly 100,000. On average, they have about 3,600 followers; however, the total of about 385,000 followers is extremely unequally distributed. Here are three charts visualizing this astonishing degree of inequality:

Of 107 followers, the top 5 have ~75% of all followers that can be reached in two steps. The corresponding Gini index of 0.90 is an example of extreme inequality. From an advertising perspective, you would want to focus mostly on getting these top 5% to react to your message (i.e. retweet it). In a chart with a linear scale, the bottom half barely registers.
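The post doesn’t say how the Gini index and top-5 share were computed; one standard formula for a Gini index on sorted values, plus a top-k share, looks like this in Python. The toy numbers are hypothetical stand-ins, not the actual follower counts:

```python
def gini(values):
    """Gini index of non-negative values: 0 for perfect equality, (n-1)/n when one member holds everything."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # For ascending x_1..x_n: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, k):
    """Fraction of the total held by the k largest values."""
    xs = sorted(values, reverse=True)
    return sum(xs[:k]) / sum(xs)


# Hypothetical toy data, not the actual follower counts from the post.
toy = [0, 5, 20, 50, 120, 300, 800, 2000, 15000, 90000]
print(f"Gini = {gini(toy):.2f}, top-2 share = {top_share(toy, 2):.0%}")
```

Applied to the list of follower counts of one’s own followers, `gini` yields an inequality figure of the kind quoted above, and `top_share(counts, 5)` the share reachable through the top 5.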

Most of my followers have between 100 and 1,000 followers themselves, as can be seen from this log-scale histogram.

What kind of distribution describes the number of followers? It seems that Log[x] is roughly normally distributed, i.e. the follower counts are roughly log-normal.
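One rough way to check the log-normal impression is to bin counts by decade on a log scale and compare the skewness of the raw counts with that of their logarithms. The data below is synthetic (log-normal by construction), not the real follower counts, and a skewness near zero is only a quick symmetry check, not a formal normality test:

```python
import math
import random
from collections import Counter


def decade_histogram(counts):
    """Count values per decade on a log10 scale (bin 2 holds 100-999); zeros are skipped since log10(0) is undefined."""
    return Counter(math.floor(math.log10(c)) for c in counts if c > 0)


def skewness(values):
    """Sample skewness; close to 0 for symmetric data such as a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic stand-in data (NOT the real follower counts): log-normal with a median near 400.
random.seed(42)
followers = [round(random.lognormvariate(math.log(400), 1.5)) for _ in range(1000)]
logs = [math.log10(c) for c in followers if c > 0]
print(sorted(decade_histogram(followers).items()))
print(f"skewness of raw counts: {skewness(followers):.2f}")
print(f"skewness of log10 counts: {skewness(logs):.2f}")
```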

As for participation inequality, let’s look at the number of tweets that those (107) followers send out.

Some of them have not tweeted anything, the chattiest has sent more than 16,000 tweets. On average, each follower has 1280 tweets; the total of 137,000 tweets is again highly unequally distributed for a Gini index of 0.77.

The top 10 make up about 2/3 of the entire conversation.

Again the bottom half hardly contributes to the number of tweets; however, the ramp in the top half is longer and not quite as steep as with the number of followers. Here is the log-scale Histogram:

I did the same type of analysis for several other Twitter users in the central range (between 100 and 1,000 followers). The results are similar, but certainly not yet robust against statistical sampling errors. (A larger-scale analysis would require a higher Twitter API rate limit than the free 350 requests per hour available to me.)

These preliminary results indicate that there are high degrees of inequality regarding the number of tweets people send out, and even more so regarding the number of followers they accumulate. How many tweets Twitter users send out over time is more evenly distributed; how many followers they get is less evenly distributed and thus leads to extremely high degrees of inequality. I presume this is caused in part by preferential attachment as described in Barabási’s book “Linked: The New Science of Networks”. As with all forms of attention, who people follow depends a lot on who others are following. There is a very long tail of small numbers of followers for the vast majority of Twitter users.

That said, the degree of participation inequality I found was lower than the 90-9-1 rule, which corresponds to an extreme Gini index of about 0.96. Perhaps that’s a sign of the Twitter community having evolved over time? Or perhaps just a sign of my analysis sample being too small and not representative of the larger Twitterverse.
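How a Gini index follows from the 90-9-1 rule depends on what “a little” and “almost all” mean, which the rule itself leaves open. A minimal sketch using the Lorenz curve of three groups; the content shares (30% for the 9%, 70% for the 1%) are a hypothetical assumption that happens to reproduce a Gini index of about 0.96:

```python
def gini_from_groups(groups):
    """Gini index from (population_share, content_share) pairs, using the area under the Lorenz curve.

    Groups are ordered from the lowest to the highest content per member, and the
    Lorenz curve is piecewise linear within each group (trapezoid rule).
    """
    ordered = sorted(groups, key=lambda g: g[1] / g[0])
    area, cumulative = 0.0, 0.0
    for population, content in ordered:
        area += population * (2 * cumulative + content) / 2
        cumulative += content
    return 1 - 2 * area


# Hypothetical content shares for the 90-9-1 rule (not specified by the rule itself).
lurkers, contributors, heavy = (0.90, 0.00), (0.09, 0.30), (0.01, 0.70)
print(round(gini_from_groups([lurkers, contributors, heavy]), 3))
print(round(gini_from_groups([(0.90, 0.00), (0.09, 0.10), (0.01, 0.90)]), 3))
```

The first call gives 0.96; if the top 1% instead produced 90% of the content, the same calculation gives 0.98, so the benchmark is sensitive to these assumed shares.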

In some ways these new media are refreshing, as they allow almost anyone to publish their thoughts. However, it’s also true that almost all of those users remain in relative obscurity and only a very small minority gets the lion’s share of all attention. If you think economic inequality is too high, keep in mind that attention inequality is far higher. Both are impacting the policy debate in interesting ways.

Turning social media attention into income is another story altogether. In his recent blog post “Turning social media attention into income”, author Srinivas Rao muses:

“The low barrier to entry created by social media has flooded the market with aspiring entrepreneurs, freelancers, and people trying to make it on their own. Standing out in it is only half the battle. You have to figure out how to turn social media attention into social media income. Have you successfully evolved from blogger to entrepreneur? What steps should I take next?”


Posted by visualign on December 6, 2011 in Industrial, Scientific, Socioeconomic

Tags: attention, Gini, inequality, twitter

World Cartogram of Mobile Phone Adoption

20 Nov

Under the slogan “Our Changing World”, FedEx has developed a website with various cartograms showing worldwide socio-economic changes based on publicly available data from sources such as the World Bank, UNESCO, the World Health Organization and others.

Cartograms visualize a particular metric by adjusting each country’s size according to that metric. This leaves country neighborhood relationships (which we blogged about here) intact, but inflates or deflates countries, often dramatically so. Here is a series of three cartograms showing the adoption of mobile phones in the years 1995, 2000, and 2008. The size of each country is proportional to the density of mobile phones (average number of mobile phones per 100 people).

Mobile Phone Density 1995

Mobile Phone Density 2000

Mobile Phone Density 2008

From the Topic Info on the Mobile Phone Presence display:

In 1996, mobile phones were a Nordic phenomenon. A Swede was twice as likely as an American to own one, and five times as likely as a German. Skip forward four years and the picture changed radically. Mobile phone usage boomed ten-fold across Europe; most European nations caught up with their northern neighbours. Eight years later. Africa suddenly loomed large. Mobile-phone penetration in same emerging economies now outstrips that of the developed world; Algeria tops the US. In most countries, mobile phone use is now ubiquitous. Lacking a mobile phone is more striking today than possessing one.

Indeed, it’s hard to find a country with a very small mobile phone presence – and then to pinpoint it on the cartogram. One country I found was Cuba: while most countries in the Americas have between 50 and 100, Cuba has only 3 mobile phones per 100 people.

A few months ago Nathan Yau covered this topic on his FlowingData blog here. As he already suggested, there is much more data to explore on FedEx’s website, so check it out for yourself here.


Posted by visualign on November 20, 2011 in Industrial, Scientific, Socioeconomic

Tags: cartogram, mobile, world population

The Observatory of Economic Complexity

14 Nov

In this second part we will look at the online interactive visualizations as a companion to the first part’s Atlas of Economic Complexity. It’s interesting that the authors chose the title “Observatory”, as if to convey that with a good (perhaps optical) instrument you can reveal otherwise hidden structure. To repeat one of the fundamental tenets of this Blog: Interactive graphics allow the user to explore data sets and thus to develop a better understanding of the structure and potentially create otherwise inaccessible insights. This is a good example.

The two basic dimensions for exploring trade data are products and countries. The most recent world trade data is from 2009, and it reaches back between 20 and 50 years (varying by country). I worked with three types of charts: TreeMaps, Stacked Area Charts, and the Product Space network diagram. Let’s start with Germany’s Exports in 2009:

Hovering the cursor over a node highlights its details, here “Printing Presses”, a product type where Germany enjoys a high degree of Revealed Comparative Advantage (RCA). (For details on RCA or any other aspects of the product space concept and network diagram, please see the previous post on the Atlas of Economic Complexity.) We can now explore which other countries are exporting printing presses:

While Germany clearly dominates this world market with 55% at $2.7b in 2009 with RCA = 5.6, the time slider at the bottom (with data since 1975) reveals that it has actually held an even bigger lead for most of the last 35 years. For example, with its exports of Printing Presses Germany commanded 72% at $3.7b in 2001 with RCA = 6.3. From the timeline one can also see how the United States captured about 20% of this (then much smaller) market for a brief period between 1979 and 1983. During this time its RCA for Printing Presses was just a bit above 1.0 – which shows as a black square in the Product Space – but the United States has since lost this advantage and not seen any significant exports in this product type. Printing Presses being a fairly complex product, only a handful of countries export them, almost all of them in Europe, plus Japan. There might be an interesting correlation between complexity and inequality, as the capabilities for the production of complex products tend to cluster in a few countries worldwide, which then dominate world exports accordingly.
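Revealed Comparative Advantage is commonly defined by the Balassa index: the product’s share in a country’s exports divided by that product’s share in world exports. Assuming the Observatory uses this standard definition (see the previous post for its own details), the RCA also equals the country’s share of the product’s world market divided by the country’s share of total world exports, so the figures quoted above imply Germany’s overall export share. The helper names are hypothetical:

```python
def rca(country_product, country_total, world_product, world_total):
    """Balassa index of revealed comparative advantage (RCA > 1 means the product is
    over-represented in the country's export basket relative to the world)."""
    return (country_product / country_total) / (world_product / world_total)


def implied_export_share(market_share, rca_value):
    """Country's share of total world exports implied by its share of one product market and its RCA."""
    return market_share / rca_value


print(round(implied_export_share(0.55, 5.6), 3))  # 2009 figures quoted above -> 0.098
print(round(implied_export_share(0.72, 6.3), 3))  # 2001 figures quoted above -> 0.114
```

That is, roughly 10% of all world exports in 2009 (and about 11% in 2001), against a 55% to 72% share of the printing press market.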

Stacked Area Charts are another powerful instrument. Here you can see how a country’s Imports or Exports evolve over time, either in terms of absolute value or relative share of product types. For example, let’s look at the last 30 years (1978-2008) of Export data for the United States:

This animated GIF shows several frames. In Value display style one can see the absolute size and how Exports grew roughly 10-fold from about $100b to $1t over the course of those 30 years. The Share display style focuses on relative size, with all Exports always representing 100%. In the Observatory one can hover over any product type and thus highlight that color band to see the evolution of this product type’s Exports over time. In the highlighted example here, we can see how ‘Cereal and Vegetable Oil’ (yellow band) shrank from around 15% in the late seventies to around 5% since the late nineties. ‘Chemicals and Health Related Products’ (purple band) has remained more or less constant around a 10% Export share. ‘Electronics’ bloomed in the mid eighties from less than 10% to 15-20% and stayed on the high end of that range until around the year 2000 before shrinking in the last decade down to about 10%.
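The roughly 10-fold growth over 30 years corresponds to a compound annual growth rate, the same exponential arithmetic as compound interest. A minimal sketch using the approximate endpoints quoted here (the function name `cagr` is our own):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


rate = cagr(100e9, 1e12, 30)    # about $100b to about $1t, 1978-2008
print(f"{rate:.1%} per year")   # prints "8.0% per year"
```

At that rate Exports doubled roughly every nine years.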

As a final example, look at the relative size of imports of the United States over the last 40 years (1968 – 2008, sorted by final value):

The biggest category is crude petroleum products at the bottom. During the two oil shocks in the seventies the percentage peaked near 30% of all imports. Then it went down and stayed below 10% between 1985 and 2005. Since then its percentage has been steadily rising and has reached about 15% again. (The data isn’t recent enough to illustrate the impact of the 2008 recession.) Such high expenses crowd out other categories: when the consumer pays more at the pump, there is less to spend on other product types. Another interesting aspect of this last chart is that the bottom two bands represent opposite ends of the product complexity spectrum: Petroleum (brown) on the low end, cars (blue) on the high end.

As always, the real power of interactive visualizations comes from interacting with them. So I encourage you to explore these data at the Observatory of Economic Complexity.

Caveats: I noticed a few minor areas that seem to be incomplete, counter-intuitive, poor design choices or simply implementation bugs. To start, there is no help or documentation for the visualization tool itself. Many of the diagram types on the left are grayed out, and it is not always apparent which selection of products, countries or chart type will enable certain sub-selections. For example, the chart type “Predictive Tools” has two subtypes, “Density Bars” and “Stepping Stone”, which always seem to be grayed out. The same applies to Maps (presumably geographic maps), where all subtypes are grayed out. Perhaps I am missing something; I would appreciate any comments if that’s the case.

In the TreeMaps for imports and exports one cannot see the value of the overall trade (top-level rectangle) or of any of the categories (second-level rectangles). Only the tooltips show the value of a specific product type or country (third-level rectangle). The color legend is designed for the product space and designates the 34 communities of product types. When you hover the mouse over one product type, say garments (in green), all imports / exports other than that product type are grayed out. When you show a product import / export chart, however, the same colors are used to designate groups of countries, with color indicating continents (blue for Europe, red for the Americas, green for Asia etc.). Yet when you hover over a product icon in the legend (say garments), only the countries of its corresponding color remain highlighted, which doesn’t make sense and can be misleading.

When you play the timeline in a TreeMap, the frequent changes in layout can be confusing. A change from one year to the next, played back and forth slowly or multiple times, can be instructive, but a quick series of many changes (particularly without seeing the labels) is just confusing.

In the stacked area charts when you click on Build Visualization it always comes up in “Value” style, even if “Share” is selected. To get to the Share style, you have to select Value and then Share again.

TreeMaps and Stacked Area Charts critically depend on the availability of data for all products / countries displayed. For years before 1990 there appear to be pockets of only sparsely available data, which falsely suggests world market dominance of those products or countries. For example, the TreeMap for Imports of Printing Presses for 1983 shows the United States with 97%, taking practically the entire market. In 1984 its share shrinks to a more balanced 28% despite growing very rapidly, simply because data for other countries from Europe, Asia etc. does not seem to be available prior to 1984. In such cases it would be better to show the rest as a gray rectangle instead of leaving it out (if world import data are available), or simply not to display any chart for years with grossly incomplete data.

Navigation is somewhat limited. For example, looking at a country chart (say United Kingdom), it would be great to click on any product type (say crude petroleum) and get to a corresponding Stacked Area Chart diagram for that product type. One can do so using the drop-down boxes on the right, but that’s less intuitive.

There are two export formats (PDF and SVG). Vector graphics are a good choice since the fonts render well even in small print. I obtained poor results with PDF, however, as the texts in TreeMaps were often not aligned properly and printed on top of one another.

None of the above is a serious problem or even a showstopper. It would be great, however, if there were a feedback link to provide such info back to the authors and help improve the utility of this observatory.


Posted by visualign on November 14, 2011 in Industrial, Scientific, Socioeconomic

Tags: complexity, economics, research

The Atlas of Economic Complexity

10 Nov

Here is a recipe: Bring together renowned institutions like the MIT Media Lab and Harvard’s Center for International Development. Combine novel ideas about economic measures with years of solid economic research. Leverage large sets of world trade data. Apply network graph theory algorithms and throw in some stunning visualizations. The result: The Atlas of Economic Complexity, a revolutionary way of looking at world trade and understanding variations in countries’ paths to prosperity.

The main authors are Professors Ricardo Hausmann from Harvard and Cesar Hidalgo from MIT (whose graphic work on Human Development Indices we have reviewed here). The underlying research began in 2006 with the idea of the product space, which was published in Science in 2007. This post is the first in a two-part series covering both the atlas (theory, documentation) and the observatory (interactive visualization) of economic complexity. This research is an excellent example of how the availability of large amounts of data, computing power and free distribution via the Internet enable entirely new ways of looking at and understanding our world.

The Atlas of Economic Complexity is rooted in a set of ideas about how to measure economies based not just on the quantity of products traded, but also on the knowledge and capabilities required to produce them. World Trade data allows us to measure import and export product quantities directly, leading to indicators such as GDP, GDP per capita, Growth of GDP etc. However, we have no direct way to measure the knowledge required to create the products. A central observation is that complex products require more capabilities to produce, and countries that manufacture more complex products must possess more of these capabilities than others that do not. From Part I of the Atlas:

Ultimately, the complexity of an economy is related to the multiplicity of useful knowledge embedded in it. For a complex society to exist, and to sustain itself, people who know about design, marketing, finance, technology, human resource management, operations and trade law must be able to interact and combine their knowledge to make products. These same products cannot be made in societies that are missing parts of this capability set. Economic complexity, therefore, is expressed in the composition of a country’s productive output and reflects the structures that emerge to hold and combine knowledge.

Can we analyze world trade data in such a way as to tease out relative rankings in terms of these capabilities?

To this end, the authors start by looking at the trade web of countries exporting products. For each country, they examine how many different products it is capable of producing; this is called the country’s Diversity. And for each product, they look at how many countries can produce it; this is called the product’s Ubiquity. Based on these two measures, Diversity and Ubiquity, they introduce two complexity measures: The Economic Complexity Index (ECI, for a country) and the Product Complexity Index (PCI, for a product).
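To make Diversity and Ubiquity concrete, here is a minimal Python sketch on a hypothetical matrix of three countries and four products. The assumption is that an entry of 1 marks a significant export (the post later mentions a Revealed Comparative Advantage threshold of 1.0 for this); the country and product names are made up:

```python
# Toy binary country-product matrix M: M[c][p] = 1 if country c is a significant exporter of product p.
countries = ["A", "B", "C"]
products = ["p1", "p2", "p3", "p4"]
M = [
    [1, 1, 1, 1],  # country A exports all four products
    [1, 1, 0, 0],  # country B exports two
    [1, 0, 0, 0],  # country C exports only the most common product
]

def diversity(M):
    """Number of products each country exports (row sums)."""
    return [sum(row) for row in M]

def ubiquity(M):
    """Number of countries exporting each product (column sums)."""
    return [sum(col) for col in zip(*M)]

print(dict(zip(countries, diversity(M))))  # {'A': 4, 'B': 2, 'C': 1}
print(dict(zip(products, ubiquity(M))))    # {'p1': 3, 'p2': 2, 'p3': 1, 'p4': 1}
```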

The mechanics of how these measures are calculated are somewhat sophisticated. Yet they encode some straightforward observations, which the authors explain with examples:

Take medical imaging devices. These machines are made in few places, but the countries that are able to make them, such as the United States or Germany, also export a large number of other products. We can infer that medical imaging devices are complex because few countries make them, and those that do tend to be diverse. By contrast, wood logs are exported by most countries, indicating that many countries have the knowledge required to export them. Now consider the case of raw diamonds. These products are extracted in very few places, making their ubiquity quite low. But is this a reflection of the high knowledge-intensity of raw diamonds? Of course not. If raw diamonds were complex, the countries that would extract diamonds should also be able to make many other things. Since Sierra Leone and Botswana are not very diversified, this indicates that something other than large volumes of knowledge is what makes diamonds rare.

A useful question is this: If a good cannot be produced in a country, where else can it be produced? Countries with higher economic complexity tend to produce more complex products, which cannot easily be produced elsewhere. The algorithms are specified in the Atlas, but we will skip over these details here. Let’s take a look at the ranking of some 128 world countries (those above a minimum population size and trade volume, and with reliable trade data available).
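The Atlas formalizes the ECI as an eigenvector problem. The intuition is perhaps easier to see in the earlier "method of reflections", in which each country is repeatedly scored by the average score of its products and each product by the average score of its exporters. The sketch below implements that iterative formulation (not the Atlas's eigenvector computation) on a hypothetical matrix; after two reflections, a higher country score means the country exports products that are also made by diverse countries:

```python
# Toy binary country-product matrix (hypothetical): rows are countries A, B, C; columns are products.
M = [
    [1, 1, 1, 1],  # country A
    [1, 1, 0, 0],  # country B
    [1, 0, 0, 0],  # country C
]

def reflections(M, n_iter):
    """Method of reflections on the bipartite country-product network.

    Starts from diversity k_c[0] and ubiquity k_p[0], then iterates
      k_c[N] = (1 / k_c[0]) * sum_p M[c][p] * k_p[N-1]
      k_p[N] = (1 / k_p[0]) * sum_c M[c][p] * k_c[N-1]
    Assumes every country exports something and every product has an exporter.
    """
    cols = list(zip(*M))
    kc0 = [float(sum(row)) for row in M]    # diversity
    kp0 = [float(sum(col)) for col in cols]  # ubiquity
    kc, kp = kc0[:], kp0[:]
    for _ in range(n_iter):
        new_kc = [sum(m * k for m, k in zip(row, kp)) / d for row, d in zip(M, kc0)]
        new_kp = [sum(m * k for m, k in zip(col, kc)) / u for col, u in zip(cols, kp0)]
        kc, kp = new_kc, new_kp
    return kc, kp

kc2, kp2 = reflections(M, 2)
print([round(k, 3) for k in kc2])  # [3.333, 2.667, 2.333]: A ranks highest, C lowest
```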

Why is Economic Complexity important? The Atlas devotes an entire chapter to this question. The most important finding here is that ECI is a better predictor of a country’s future growth than many other commonly used indicators that measure human capital, governance or competitiveness.

Countries whose economic complexity is greater than what we would expect, given their level of income, tend to grow faster than those that are “too rich” for their current level of economic complexity. In this sense, economic complexity is not just a symptom or an expression of prosperity: it is a driver.

They include many scatter plots and regression analyses measuring the correlation between ECI and other indicators. Again, the interested reader is referred to the original work.

Another interesting question is how Economic Complexity evolves. In some ways this is like a chicken & egg problem: For a complex product you need a lot of capabilities. But for any capability to provide value you need some products that require it. If a new product requires several capabilities which don’t exist in a country, then starting the production of such a product in the country will be hard. Hence, a country’s products tend to evolve along the already existing capabilities. Measuring the similarities in required capabilities directly would be fairly complicated. However, as a first approximation, one can deduce that products which are more often produced by the same country tend to require similar capabilities.

So the probability that a pair of products is co-exported carries information about how similar these products are. We use this idea to measure the proximity between all pairs of products in our dataset (see Technical Box 5.1 on Measuring Proximity). The collection of all proximities is a network connecting pairs of products that are significantly likely to be co-exported by many countries. We refer to this network as the product space and use it to study the productive structure of countries.
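The Atlas spells out the exact definition in its technical box. As far as I can tell, it amounts to dividing the number of countries that export both products by the larger of the two ubiquities, which equals the smaller of the two conditional probabilities P(export i | export j) and P(export j | export i). A Python sketch on a hypothetical matrix:

```python
from itertools import combinations

# Toy binary country-product matrix (hypothetical): rows are countries, columns are products.
products = ["p1", "p2", "p3", "p4"]
M = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

def proximity(M):
    """phi[(i, j)] = (countries exporting both i and j) / max(ubiquity_i, ubiquity_j).

    Assumes every product has at least one exporter (non-zero ubiquity).
    """
    cols = list(zip(*M))
    ubiq = [sum(col) for col in cols]
    phi = {}
    for i, j in combinations(range(len(cols)), 2):
        both = sum(a * b for a, b in zip(cols[i], cols[j]))
        phi[(i, j)] = both / max(ubiq[i], ubiq[j])
    return phi

phi = proximity(M)
for (i, j), value in sorted(phi.items()):
    print(products[i], products[j], round(value, 3))
```

Here p3 and p4 get a proximity of 1.0 because the only country exporting either of them exports both.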

Then the authors proceed to visualize the Product Space. It is a graph with some 774 nodes (products) and edges representing the proximity values between those nodes. Only the strongest 1% of proximity edges are shown, keeping the average degree of the graph below 5 (showing too many connections results in visual complexity). Network science algorithms are used to discover the highly connected communities into which the products naturally group. Those 34 communities are then color-coded. Using a combination of Minimum-Spanning-Tree and Force-Directed layout algorithms, the network is then laid out and manually optimized to minimize edge crossings. The resulting Product Space graph looks like this:
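A spanning tree that keeps the strongest proximities is a maximum spanning tree on the proximity values φ, which is the same as a minimum spanning tree on a distance such as 1 - φ. Below is a rough Python sketch of such a backbone step using Kruskal's algorithm, plus a hypothetical threshold standing in for the "top 1%" rule. The proximity values are invented, and neither the force-directed layout nor the manual optimization is reproduced:

```python
# Hypothetical proximity values between four products.
phi = {
    ("p1", "p2"): 0.667, ("p1", "p3"): 0.333, ("p1", "p4"): 0.333,
    ("p2", "p3"): 0.5, ("p2", "p4"): 0.5, ("p3", "p4"): 1.0,
}

def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm on descending weights (union-find to avoid cycles)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the set representative, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for (a, b), w in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:  # keep only edges that join two separate components
            parent[ra] = rb
            tree.append((a, b, w))
    return tree

def add_strong_edges(tree, weights, threshold):
    """Add every remaining edge whose proximity is at least the (hypothetical) threshold."""
    kept = {(a, b) for a, b, _ in tree}
    extra = [(a, b, w) for (a, b), w in weights.items() if w >= threshold and (a, b) not in kept]
    return tree + extra

backbone = maximum_spanning_tree(["p1", "p2", "p3", "p4"], phi)
graph = add_strong_edges(backbone, phi, 0.5)
print(backbone)
print(graph)
```

The tree guarantees that every product is connected, while the threshold step adds back only the strongest extra links, which keeps the average degree low.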

Here the node size is determined by world trade volume in the product. If you step back for a moment and reflect on how much data is aggregated in such a graph it is truly amazing! One variation of the graph determines size by the Product Complexity as follows:

In this graph one can see that products within a community are of similar complexity, supporting the idea that they require similar capabilities, i.e. have high proximity. From these visualizations one can now analyze how a country moves through product space over time. Specifically, in the report there are graphs for the four countries Ghana, Poland, Thailand, and Turkey over three points in time (1975, 1990, 2009). From the original document I put together a composite showing the first two countries, Ghana and Poland.

While Ghana’s ECI doesn’t change much, Poland grows into many products similar to those it already produced in 1975. This clearly increases Poland’s ECI and contributes to the strong growth Poland has seen since 1975. (Black squares show products produced by the country with a Revealed Comparative Advantage (RCA) > 1.0.)
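Revealed Comparative Advantage is commonly computed with the Balassa index: a product's share in a country's exports divided by that product's share in world exports. A Python sketch with hypothetical export values, thresholded at 1.0 as in the caption above:

```python
# Hypothetical export values (arbitrary currency units): country -> product -> exports.
exports = {
    "X": {"cars": 80.0, "logs": 20.0},
    "Y": {"cars": 10.0, "logs": 90.0},
}

def rca(exports):
    """Balassa index: (x_cp / sum_p x_cp) / (sum_c x_cp / sum_cp x_cp)."""
    world_total = sum(sum(row.values()) for row in exports.values())
    product_totals = {}
    for row in exports.values():
        for product, value in row.items():
            product_totals[product] = product_totals.get(product, 0.0) + value
    result = {}
    for country, row in exports.items():
        country_total = sum(row.values())
        result[country] = {
            p: (v / country_total) / (product_totals[p] / world_total) for p, v in row.items()
        }
    return result

scores = rca(exports)
# Binary entries: 1 where the country has a revealed comparative advantage.
M = {c: {p: int(s > 1.0) for p, s in row.items()} for c, row in scores.items()}
print(M)  # {'X': {'cars': 1, 'logs': 0}, 'Y': {'cars': 0, 'logs': 1}}
```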

In all cases we see that new industries –new black squares– tend to lie close to the industries already present in these countries. The productive transformation undergone by Poland, Thailand and Turkey, however, look striking compared to that of Ghana. Thailand and Turkey, in particular, moved from mostly agricultural societies to manufacturing powerhouses during the 1975-2009 period. Poland, also “exploded” towards the center of the product space during the last two decades, becoming a manufacturer of most products in both the home and office and the processed foods community and significantly increasing its participation in the production of machinery. These transformations imply an increase in embedded knowledge that is reflected in our Economic Complexity Index. Ultimately, it is these transformations that underpinned the impressive growth performance of these countries.

The Atlas goes on to provide rankings of countries along five axes, such as ECI, GDP per capita Growth and GDP Growth. The finding that higher ECI is a strong driver of GDP growth allows for predictions of GDP Growth until 2020. In that ranking, Sub-Saharan East African countries take 8 of the Top 10 places, led by Uganda, Kenya and Tanzania. Here is the GDP Growth ranking in graphical form: the band around the Indian Ocean is where most GDP Growth is expected to happen during this decade.

Each country has its own Product Space map. It shows which products and capability sets the country already has, which similar products it could produce with relatively few additional capabilities, and where it is more severely lacking. As such, it can provide useful information both to the country itself and to a multinational firm looking to expand. The authors sum up the chapter on how this Atlas can be used as follows:

A map does not tell people where to go, but it does help them determine their destination and chart their journey towards it. A map empowers by describing opportunities that would not be obvious in the absence of it. If the secret to development is the accumulation of productive knowledge, at a societal rather than individual level, then the process necessarily requires the involvement of many explorers, not just a few planners. This is why the maps we provide in this Atlas are intended for everyone to use.

We will look at the rich visualizations of the data sets in this Atlas in a forthcoming second installment of this series.


Posted by visualign on November 10, 2011 in Industrial, Scientific, Socioeconomic

Tags: complexity, economics, map, research

7 Billion

04 Nov

World population has just reached 7 Billion this week. Exploring the growth of population and related aspects such as consumption, land use, urbanization etc. lends itself very well to data visualization. In this context, the National Geographic Society has released a free iPad app called “7 Billion” together with its Special Series: 7 Billion website.

The iPad app features some interesting charts under the heading “The Shape Of Seven Billion”. These visualizations come in the form of cartograms, a type of map that ignores a country’s true physical size and scales the size according to other data. Here they show population (current 2011 vs. 1960, when world population was around 3 Billion).

Population Cartogram 2011 (Source: National Geographic iPad App 7 Billion)

The position of countries is roughly preserved, the size is proportionate to the country population, and the color legend shows the amount of growth since 1960. The strongest growth (red, more than 300%) happened in Africa and the Middle East. Europe, Russia and Japan had the least amount of growth (blue, under 50%). India and China are by far the most populous countries, with India growing faster than China.

Another interesting cartogram illustrates consumption (as measured in Gross Domestic Product, GDP). Here the reference year is 1980 and is shown first in black & white:

Consumption Chart 1980 (Source: National Geographic iPad App 7 Billion)

Compare this to the current Consumption or GDP distribution as of 2011:

World Consumption Chart 2011 (Source: National Geographic iPad App 7 Billion)

The size of the countries here is proportionate to their GDP (in constant international dollars using purchasing power parity rates). The color scale has red (more than $40,000 per capita) and blue (less than $3,000 per capita) at the two ends of the spectrum. While the United States clearly dominates this picture, Europe has about the same size and China isn’t far behind. However, China has had the world’s largest GDP increase since 1980, 1,506% (roughly a 16-fold increase), whereas the GDP of the U.S. grew by 119% (a bit more than doubled) over the same period.
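Converting percentage increases into growth factors makes such figures easier to compare: an increase of x% multiplies the starting value by 1 + x/100, and spreading that factor over the 31 years from 1980 to 2011 gives an equivalent constant annual rate. A small Python sketch using the two percentages quoted above (the helper names are my own):

```python
def growth_factor(percent_increase):
    """An increase of x% multiplies the starting value by (1 + x / 100)."""
    return 1.0 + percent_increase / 100.0

def annual_rate(factor, years):
    """Constant annual growth rate that compounds to `factor` over `years` years."""
    return factor ** (1.0 / years) - 1.0

years = 2011 - 1980
for name, pct in [("China", 1506), ("United States", 119)]:
    f = growth_factor(pct)
    print(f"{name}: x{f:.2f} overall, about {100 * annual_rate(f, years):.1f}% per year")
```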

Ideally one would be able to see this cartogram animated over time, with countries shrinking or growing and changing colors, similar to the Bubble Charts we looked at earlier on this Blog.

There are many other interesting charts in this interactive eBook style app. For example, here is a chart showing the population growth over time – a good visualization of the power of exponential growth.
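To put a number on this growth, one can back out the average annual rate implied by the two population figures mentioned earlier (about 3 billion in 1960 and 7 billion in 2011). The sketch assumes smooth, constant exponential growth over the whole period, which is a simplification since the actual growth rate varied:

```python
import math

def implied_rate(start, end, years):
    """Continuous growth rate r such that start * exp(r * years) == end."""
    return math.log(end / start) / years

def doubling_time(rate):
    """Years needed to double at a constant continuous growth rate."""
    return math.log(2) / rate

r = implied_rate(3.0, 7.0, 2011 - 1960)  # world population in billions
print(f"average growth ~ {100 * r:.2f}% per year, doubling time ~ {doubling_time(r):.0f} years")
```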

World Population Growth and Projection (Source: National Geographic 7 Billion iPad App)

One graphic aims to explain the main driver behind the explosive growth of the last two centuries, after millennia of relatively slow growth: improvements in health care and the resulting drop in death rates led to a period in which birth rates far exceeded death rates.

Population Growth as Function of Birth Rate minus Death Rate

An interesting visualization idea has been published in a video by NPR, using a bucket for each continent, with births shown as water drops falling into the bucket and deaths as drops leaking out of it. It is obvious that when more water drops in at the top (births) than drips out at the bottom (deaths), the buckets fill up.
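The bucket metaphor is a simple stock-and-flow model. A minimal Python sketch, with hypothetical per-capita birth and death rates, shows that the level only rises when the inflow exceeds the outflow:

```python
def simulate_bucket(population, birth_rate, death_rate, years):
    """Yearly stock-and-flow update: births flow in, deaths flow out (rates per person per year)."""
    history = [population]
    for _ in range(years):
        births = birth_rate * population
        deaths = death_rate * population
        population = population + births - deaths
        history.append(population)
    return history

# Hypothetical rates, starting from 100 million people.
growing = simulate_bucket(100.0, 0.03, 0.01, 10)  # births exceed deaths
stable = simulate_bucket(100.0, 0.02, 0.02, 10)   # births equal deaths
print(round(growing[-1], 1), round(stable[-1], 1))
```

Because the inflow is proportional to the current level, the growing bucket fills at a compounding rate rather than a constant one.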

As a final example, consider this chart visualizing our even faster growing environmental impact: since impact depends not just on Population size but on at least two other factors, Affluence and Technology, the multiplicative impact grows even faster. Using three dimensions and the formula I = P * A * T, this yields a simple but effective illustration.
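Because the three factors multiply, their growth factors multiply as well, and for continuous growth rates the rates simply add (ln I = ln P + ln A + ln T). A Python sketch with hypothetical numbers:

```python
def impact(population, affluence, technology):
    """I = P * A * T: impact as the product of the three factors."""
    return population * affluence * technology

# Hypothetical growth factors over the same period: population x2, affluence x3, technology x1.5.
impact_factor = impact(2.0, 3.0, 1.5)
print(impact_factor)  # 9.0: impact grows much faster than any single factor

# With continuous annual growth rates the exponents add up.
rates = {"P": 0.012, "A": 0.020, "T": 0.005}  # hypothetical rates per year
impact_rate = sum(rates.values())
print(f"impact grows at about {100 * impact_rate:.1f}% per year")
```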

Multiplicative Human Impact through Population, Affluence and Technology

Of course a short Blog post can’t do justice to all aspects of an app or eBook. There is a lot more to this app than shown here. But I hope you got an impression as to how interactive graphics can help communicate abstract and quantitative ideas in a more intuitive way.


Posted by visualign on November 4, 2011 in Socioeconomic

Tags: cartogram, geographical map, world population

Number of Neighbors for World Countries

06 Oct

One important geographical aspect in economics is whether a country is land-locked. Another is the number of neighbors a given country shares a border with. Of all 239 world countries, 75 (31%, almost one third) are island countries such as Madagascar or Australia, for which this number is zero. On the opposite end are the countries with the most border connections. Here are the top 6 countries in descending order: China (16), Russia (14), Brazil (10), and Sudan, Germany and the Democratic Republic of Congo (9 each). All other countries have 8 or fewer neighbors. Here is a visual breakdown:

The histogram shows the high frequency of island states; the range from 1 to 5 neighbors is fairly common, with a steep drop off in the frequency of 6 or more neighbors. Here is a world map with the same color-code:
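The histogram is just a count of degrees in the border graph. The post's analysis was done in Mathematica; here is a rough Python equivalent on a small sample of real border pairs. The counts reflect only the listed pairs, so Guatemala, for instance, gets fewer neighbors here than it really has:

```python
from collections import Counter

# A small illustrative sample of land borders (not the full 323-pair data set).
borders = [
    ("Canada", "United States"),
    ("United States", "Mexico"),
    ("Mexico", "Guatemala"),
    ("Mexico", "Belize"),
    ("Guatemala", "Belize"),
]
islands = ["Madagascar", "Iceland"]  # countries with no land neighbors

def neighbor_counts(borders, islands):
    """Each border pair adds one neighbor to both countries; islands get an explicit zero."""
    counts = Counter()
    for a, b in borders:
        counts[a] += 1
        counts[b] += 1
    for country in islands:
        counts[country] = 0  # keeps zero-degree countries in the histogram
    return counts

counts = neighbor_counts(borders, islands)
histogram = Counter(counts.values())  # number of neighbors -> number of countries
print(sorted(histogram.items()))  # [(0, 2), (1, 1), (2, 3), (3, 1)]
```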

WorldMap color-coded by number of neighboring countries

Large countries tend to have more neighbors (Russia (14), China (16), Brazil (10)), but there are obvious exceptions to this tendency (Canada (1), United States (2)). The number of neighbors depends not just on the size of the country itself but also on its neighbors’ sizes; for example, a small country such as Austria (116th in the world by land area) has a rather high number of 8 neighbors because many of them are in turn relatively small (Switzerland, Liechtenstein, Slovenia, etc.).

The average number of neighbors is about 2.7, and there are 323 such border relationships. These can be visualized as graphs with countries as vertices and borders as edges. (To simplify the graphs, I excluded all 75 islands, i.e. disconnected vertices, except Australia.) The graph splits into two main partitions following the land-border geography: one with Europe, Asia and Africa, and one with the Americas.
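The average follows directly from the two counts: each border pair adds one neighbor to each of its two countries, so the mean is 2 × 323 / 239, or about 2.7. The Python sketch below (not the author's Mathematica code) checks that arithmetic and shows how partitions like these can be found as connected components, using a tiny sample of border pairs:

```python
from collections import defaultdict

def average_degree(n_vertices, n_edges):
    """Each edge contributes to two vertex degrees, so the mean degree is 2E / V."""
    return 2 * n_edges / n_vertices

def connected_components(edges):
    """Group vertices into components with an iterative depth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                component.add(node)
                stack.extend(adjacency[node] - seen)
        components.append(component)
    return components

print(round(average_degree(239, 323), 2))  # 2.7
sample = [("France", "Spain"), ("France", "Germany"), ("Canada", "United States")]
components = connected_components(sample)
print(components)
```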

Border-Connected Countries in Europe, Asia, Africa

With the graph layout changed from “Spring Embedding” to “Spring Electrical Embedding”, one obtains this interesting variation of the same graph, which looks like a swordfish:

The "EurAsiAfrica Sword-fish"

The other partition, the Americas, can be visualized in a circular embedding layout:

Europe, Asia, Africa (left) and Americas (right)

It is also interesting to look at the numbers for lengths of pairwise borders between two countries:

Number of border pairs: 323
Minimum length: 0.34 km
Maximum length: 8893 km
Mean length: 789.6 km
Total length: 255048 km

Most pairwise borders are between 100 and 1000 km long, but they can be as short as 1/3 km (China – Macau) or almost 9000 km (Canada – United States).
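These summary statistics are simple aggregates over the list of border pairs; the mean is just the total divided by the number of pairs (255048 / 323 is about 789.6 km). A Python sketch using the two extremes quoted in this post plus two made-up entries:

```python
from statistics import mean

# Border lengths in km: the two extremes quoted in the post plus hypothetical values.
border_km = {
    ("China", "Macau"): 0.34,
    ("Canada", "United States"): 8893.0,
    ("Country A", "Country B"): 450.0,  # hypothetical
    ("Country B", "Country C"): 120.0,  # hypothetical
}

lengths = list(border_km.values())
summary = {
    "pairs": len(lengths),
    "min_km": min(lengths),
    "max_km": max(lengths),
    "mean_km": mean(lengths),
    "total_km": sum(lengths),
}
print(summary)
```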

When we look at the total border length of each country, we see familiar names at the top of the ranking:
China: 22147 km, Russia: 20293 km, Brazil: 16857 km, India: 14103 km, Kazakhstan: 12185 km, United States: 12034 km. It seems likely that the first four, the so-called “BRIC” countries, owe part of their economic strength to their geography: size, length of borders and number of neighbors influence the number of local trading partners and the routes to them. There are many more correlations one could analyze, such as between border length / number of neighbors and GDP / length of road network. One thing seems likely when it comes to the economy of world countries: Size matters, and so does Geography!
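A country's total border length is the sum of its pairwise borders, with each pair counted toward both countries. A minimal Python sketch of that aggregation and ranking, on hypothetical data for three made-up countries:

```python
from collections import defaultdict

def total_border_by_country(border_km):
    """Add each pairwise border length to both countries' totals, then rank descending."""
    totals = defaultdict(float)
    for (a, b), km in border_km.items():
        totals[a] += km
        totals[b] += km
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical lengths in km, not real border data.
example = {("A", "B"): 300.0, ("B", "C"): 500.0, ("A", "C"): 100.0}
ranking = total_border_by_country(example)
print(ranking)  # [('B', 800.0), ('C', 600.0), ('A', 400.0)]
```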

Epilog: This analysis was performed entirely in Wolfram’s Mathematica 8. The built-in curated CountryData provides access to more than 200 properties of world countries, including Population, Area, GDP, etc. Some cleaning of the border-length data was required to deal with different spellings of the same country. (If you’re interested in the data or source code, please contact me via email.) List manipulation and mathematical operations such as summation are very easy in the functional programming paradigm of Mathematica. Graphs are first-class data structures with numerous vertex and edge operators. Charting is also fairly powerful, with BarCharts, ListPlots and more advanced graph charting options. Which other software provides all this flexibility in one integrated package?


Posted by visualign on October 6, 2011 in Recreational, Socioeconomic

Tags: border, geographical map, implementation, Mathematica, world countries
