# Category Archives: Scientific

## Nonlinearity in Growth, Decay and Human Mortality

Processes of Growth and Decay abound in natural and economic systems. Growth processes determine biological structure and pattern formation, selection of species or ideas, the outcome of economic competition and of savings in financial portfolios. In this post we will examine a few different types of quantitative growth / decay and their qualitatively different outcomes.

### Growth

In the media we often hear about nonlinear, exponential, or explosive growth as popular references to seemingly unstoppable increases. Buzzwords like “tipping point” or “singularity” appear on book titles and web sites. Mathematical models can help analytical understanding of such dynamic processes, while visualization can support a more intuitive understanding.

Let’s look at three different growth processes: linear, exponential, and hyperbolic (rows below), by specifically considering three different quantities (columns below):

• The absolute amount (as a function of time),
• the absolute rate of increase (the derivative of that function), and
• the relative rate of increase (the rate relative to the amount).

Amounts, Rates, and Relative Rates of three growth processes: Linear, Exponential, Hyperbolic

Linear growth (blue lines) is the result of a constant rate or increment per time interval. The relative rate (the size of the increment in relation to the existing quantity) decreases toward zero.

Exponential growth (red lines) is the result of a rate that grows linearly with the amount, i.e. a fixed percentage increment per time interval. The relative rate is constant. Think accrual of savings at a fixed interest rate. Urban legend has it that Albert Einstein once declared compound interest – an exponential growth process – to be “the most powerful force in the universe”. Our intuition is ill-suited to deal properly with exponential effects, and in many ways it seems hard to conceive of even faster growth processes. However, even with exponential growth it takes an infinite time to reach an infinitely large amount.

Hyperbolic growth (brown lines) is the result of a rate that grows with the square of the amount. In this type of growth even the relative rate is increasing. This can be caused by auto-catalytic effects: the larger the amount, the faster the rate itself grows. As a result, such growth reaches an infinite value at a finite time t – also called a discontinuity or singularity.
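The three growth laws above can be sketched in a few lines of Python. This is not from the original post; the constants are illustrative, chosen so all three curves start near F(0) = 1, with T the hyperbolic singularity time:

```python
# Sketch of the three growth laws: linear, exponential, hyperbolic.
import math

def linear(t, k=1.0):        # constant rate k
    return 1.0 + k * t

def exponential(t, r=1.0):   # rate proportional to the amount
    return math.exp(r * t)   # => constant relative rate r

def hyperbolic(t, T=2.0):    # rate grows with the square of the amount
    return 1.0 / (1.0 - t / T)   # diverges as t approaches T

for t in (0.0, 0.5, 1.0, 1.5, 1.9):
    print(f"t={t:.1f}  linear={linear(t):7.2f}  "
          f"exp={exponential(t):7.2f}  hyp={hyperbolic(t):7.2f}")
```

Printing the three columns side by side makes the qualitative difference visible: the hyperbolic curve blows up as t nears T = 2 while the exponential curve, fast as it is, stays finite for any finite t.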

When multiple entities grow and compete for limited resources, their growth will determine the outcome as a distribution of the resource as follows:

• Linear growth leads to coexistence of all competitors; their ratios determined by their linear growth rates.
• Exponential growth leads to reversible selection of a winner (with the highest relative growth rate). Reversible since a competitor with a higher relative growth rate will win, regardless of when it enters the competition.
• Hyperbolic growth leads to irreversible selection of a winner (first to dominate). Irreversible since the relative growth rate of the dominant competitor dwarfs that of any newcomer.
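The reversible-vs-irreversible distinction in the bullets above can be illustrated with a toy simulation (my own sketch, not from the post; the growth rates, entry time, and resource-normalization scheme are all illustrative assumptions). A late entrant with a higher relative growth rate overtakes the incumbent under exponential growth, but not under hyperbolic growth:

```python
# Toy competition for a fixed resource: x grows first; y enters later
# with a higher relative rate. Under exponential growth y eventually
# overtakes (reversible selection); under hyperbolic growth (rate
# proportional to amount squared) the incumbent's lead is
# self-reinforcing (irreversible selection).
def compete(law, steps=3000, dt=0.001, entry=500):
    x, y = 1.0, 0.0
    for step in range(steps):
        if step == entry:
            y = 1.0                      # late entrant starts small
        if law == "exponential":
            dx, dy = 1.0 * x, 1.5 * y    # y has the higher relative rate
        else:                            # hyperbolic / auto-catalytic
            dx, dy = x * x, 1.5 * y * y
        x += dx * dt
        y += dy * dt
        total = x + y                    # limited resource: rescale so
        x, y = x / total * 2.0, y / total * 2.0  # the total stays fixed
    return x, y

for law in ("exponential", "hyperbolic"):
    x, y = compete(law)
    print(law, "winner:", "incumbent" if x > y else "newcomer")
```

The normalization step turns the raw growth laws into a replicator-style share dynamic, which is one simple way to model competition for a limited resource.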

Such processes have been studied in detail in biology (population dynamics, genetics, etc.). It’s straightforward to imagine the combination of random fluctuations, exponential (or faster) growth, and ‘winner-take-all’ selection as the main driving processes of self-organized pattern formation in biology, from leopard spots and zebra stripes all the way to the complex structure formation of morphogenesis and embryology.

Such processes also occur in economics. For example, the competition for PC operating system platforms was won by Microsoft’s Windows due to the strong advantages of incumbents (applications, tools, developers, ecosystem, etc.). Similar effects can be seen with social networks, where competitors (like Facebook) become disproportionately stronger as a result of the size of their network. I suspect this dynamic also plays a central role in the evolution of inequality, which can be viewed as the dynamic formation of structure (the unequal allocation of wealth across a population).

Two popular technology concepts owe their existence to nonlinear growth processes:

• Exponential Growth: The empirical Moore’s Law states that computer power doubles every 18 months or so (similar for storage capacity, transistors on chips and network bandwidth). This allows us to forecast fairly accurately when machines will have certain capacities which seem unimaginable only a few decades earlier. For example, computer power increases by a factor of 1000 in only 15 years, or a million-fold in 30 years or the span of just one human generation!
• Hyperbolic Growth: Futurist Ray Kurzweil has observed that the doubling period of many aspects of our knowledge society is shrinking. From this observation of an “ever-accelerating rate of technological change” he concludes in his latest book that “The Singularity Is Near“, with profound technological and philosophical implications.
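The Moore’s-law arithmetic in the first bullet is easy to check directly, since a doubling every 18 months is just repeated multiplication by 2:

```python
# Compounding of Moore's law: doubling every 18 months (1.5 years).
def capacity_factor(years, doubling_years=1.5):
    return 2 ** (years / doubling_years)

print(capacity_factor(15))   # 10 doublings -> 1024, the "factor of 1000"
print(capacity_factor(30))   # 20 doublings -> 1048576, the million-fold
```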

In many cases, empirical growth observations and measurements can be compared with mathematical models to verify or falsify hypotheses about the underlying mechanisms controlling the growth processes. For example, world population growth has been tracked closely. To understand the strong increase of world population as a whole over the last hundred years or so, one needs to look at the drivers (birth and mortality rates) and their key influencing factors (medical advances, agriculture). Many countries still have high birth rates, while medical advances and better farming methods have driven down mortality rates. As a result, population has grown exponentially for many decades. (See also the wonderful 2min video visualization of this concept linked to from the previous post on “7 Billion“.) Short of increasing the mortality rate, it is evident that population stabilization (i.e. reduction of growth to zero) can only be achieved by reducing the birth rate. This in turn influences policy debates, for example on empowering women so they have fewer children (better education and economic prospects, access to contraception, etc.). Here is a graphic on world population growth rates:

Population growth rates in percent (source: Wikipedia, 2011 estimates)

Compare this to the World maps showing population age structure in the Global Trends 2025 post. There is a strong correlation between how old a population is and how high the birth rates are. (Note Africa standing out in both graphs.)

### Decay

Conversely one can study processes of decay or decline, again with qualitatively different outcomes for given rates of decline such as linear or exponential. One interesting, mathematically inspired analysis related to decay processes comes from the ‘Gravity and Levity’ blog in the post “Your body wasn’t built to last: a lesson from human mortality rates“. The article starts with the observation that our likelihood of dying, say, within the next year doubles every 8 years. Since the mortality rate increases exponentially, the likelihood of survival decreases super-exponentially. The empirical data match the rates forecast by the Gompertz law of mortality almost perfectly.

Death and Survival Probability in the US (Source: Wolfram Alpha)

If the death rate were constant over time, the resulting survival probability would follow an exponential distribution. If, however, the death rate grows exponentially – i.e. doubles per fixed time interval – the survival probability follows a Gompertz distribution.

Let’s look at a table similar to the above, this time contrasting three decay processes (rows below): linear, exponential, and super-exponential. Again we consider the amount, the absolute rate, and the relative rate (columns below), with constants chosen to match the initial condition F[0] = 1:

Amounts, Rates, and Relative Rates of three decay processes: Linear, Exponential, Super-Exponential

The linear decay (blue lines) is characterized by a constant rate; the amount reaches zero at a time proportional to the initial amount, at which point the relative rate has a discontinuity.

The exponential decay (red lines) is characterized by a constant relative rate and thus leads to a steady but long-lasting decay (like radioactive decay).

The super-exponential decay (brown lines) leads to the amount following a Gompertz distribution (matching the shape of the US survival probability chart above). For a while the decay rate remains very small, near zero. Then it ramps up quickly, leading to a steep decline in the amount, which in turn drives the rate back down. The relative rate, however, keeps growing exponentially.
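The three decay laws can be sketched much like the growth laws earlier (again a Python sketch with illustrative constants, chosen so F(0) = 1; the Gompertz parameters a and b are assumptions for illustration, not fitted mortality data):

```python
# The three decay laws: linear, exponential, super-exponential (Gompertz).
import math

def linear_decay(t, k=0.25):
    return max(0.0, 1.0 - k * t)          # hits zero at t = 1/k

def exponential_decay(t, k=0.5):
    return math.exp(-k * t)               # constant relative rate k

def gompertz_decay(t, a=0.01, b=1.0):
    # relative rate a * e**(b*t) grows exponentially =>
    # F(t) = exp(-(a/b) * (e**(b*t) - 1))
    return math.exp(-(a / b) * (math.exp(b * t) - 1))

for t in (0, 2, 4, 6, 8):
    print(f"t={t}  lin={linear_decay(t):.3f}  "
          f"exp={exponential_decay(t):.3f}  gom={gompertz_decay(t):.3f}")
```

The printed Gompertz column shows the plateau-then-cliff shape described above: nearly flat at first, then a steep collapse.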

The article linked above goes on to analyze two hypotheses about the dominant causes of human death: the single lightning bolt and the accumulated lightning bolt model. If the major causes of death were singular or cumulative accidents (like lightning bolts or murders), the resulting survival probability curves would have a much longer tail. In other words, we would see at least some percentage of human beings living to ages beyond 130 or even 150 years. Since such cases are practically never observed, the underlying process must be different, and the lightning bolt model cannot explain human mortality.
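The tail argument can be made quantitative with a quick sketch. A constant “lightning bolt” hazard yields exponential survival with a long tail, while a Gompertz hazard (doubling every 8 years) annihilates the tail. The hazard constants here are illustrative assumptions, not calibrated to real mortality tables:

```python
# Constant-hazard ("lightning bolt") vs Gompertz survival at age 130.
import math

def survival_constant(t, h=0.02):
    # constant hazard h per year => exponential survival
    return math.exp(-h * t)

def survival_gompertz(t, a=0.0001, b=math.log(2) / 8):
    # hazard a * 2**(t/8), i.e. doubling every 8 years
    return math.exp(-(a / b) * (math.exp(b * t) - 1))

print(f"P(reach 130), constant hazard: {survival_constant(130):.2e}")
print(f"P(reach 130), Gompertz hazard: {survival_gompertz(130):.2e}")
```

Under the assumed constant hazard a few percent of the population would still be alive at 130, whereas the Gompertz survival probability at 130 is astronomically small, which is exactly why the absence of observed 130-year-olds rules the lightning bolt model out.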

Instead, a so-called “cops and criminals” model is proposed, based on biochemical processes in the human body. “Cops” are cells that patrol the body and eliminate bad mutations (“criminals”), which when unchecked can lead to death. From the above post:

The language of “cops and criminals” lends itself very easily to a discussion of the immune system fighting infection and random mutation.  Particularly heartening is the fact that rates of cancer incidence also follow the Gompertz law, doubling every 8 years or so.  Maybe something in the immune system is degrading over time, becoming worse at finding and destroying mutated and potentially dangerous cells.

Unfortunately, the full complexity of human biology does not lend itself readily to cartoons about cops and criminals.  There are a lot of difficult questions for anyone who tries to put together a serious theory of human aging.  Who are the criminals and who are the cops that kill them?  What is the “incubation time” for a criminal, and why does it give “him” enough strength to fight off the immune response?  Why is the police force dwindling over time?  For that matter, what kind of “clock” does your body have that measures time at all?

There have been attempts to describe DNA degradation (through the shortening of your telomeres or through methylation) as an increase in “criminals” that slowly overwhelm the body’s DNA-repair mechanisms, but nothing has come of it so far.  I can only hope that someday some brilliant biologist will be charmed by the simplistic physicist’s language of cops and criminals and provide us with real insight into why we age the way we do.

A web calculator for death and survival probability based on Gompertz Law can be found here.

Posted by on January 12, 2012 in Medical, Scientific, Socioeconomic

## Scientific Research Trends

The site worldmapper.org has published hundreds of cartogram world maps; cartograms are geographic maps in which the size of each depicted area is proportional to a specified metric. This distorts countries or entire continents relative to the geographical sizes we are used to. (We recently looked at cartograms of world mobile phone adoption here.)

One interesting set of cartograms from worldmapper.org relates to scientific research. The first shows the amounts of science papers (as of 2001) authored by people living in the respective areas:

Science Research (Number of research articles, Source: Worldmapper.org)

Another shows the growth in the above number between 1990 and 2001:

Science Growth (Change in Number of research articles, Source: Worldmapper.org)

From worldmapper.org:

This map shows the growth in scientific research of territories between 1990 and 2001. If there was no increase in scientific publications that territory has no area on the map.

In 1990, 80 scientific papers were published per million people living in the world, this increased to 106 per million by 2001. This increase was experienced primarily in territories with strong existing scientific research. However, the United States, with the highest total publications in 2001, experienced a smaller increase since 1990 than that in Japan, China, Germany and the Republic of Korea. Singapore had the greatest per person increase in scientific publications.

It is worth noting that the trends depicted are based on data one decade old. It is likely, however, that those trends have continued over the past decade, something which Neil deGrasse Tyson points out with concern regarding the relative decline of scientific research in America in this YouTube video:

Another point Tyson emphasizes is the near total absence of scientific research from the entire continent of Africa, as evidenced by the disappearance of the continent on the cartogram. With about a billion people living there, this is one of the starkest visualizations of the challenges they face in escaping their poverty trap.

Posted by on January 3, 2012 in Scientific, Socioeconomic


A lot has been written about economic inequality as measured by distribution of income, wealth, capital gains, etc. In previous posts such as Inequality, Lorenz-Curves and Gini-Index or Visualizing Inequality we looked at various market inequalities (market share and capitalization, donations, etc.) and their respective Gini coefficients.

With the recent rise of social media we have other forms of economy, in particular the economy of time and attention. And we have at least some measures of this economy in the form of people’s activities, subscriptions, etc. Whether it’s Connections on LinkedIn, Friends on Facebook, or Followers on Twitter – all of the social media platforms have some social currency for attention. (Influence is different from attention, and measuring influence is more difficult and controversial – see for example the discussions about Klout scores.)

Another interesting aspect of online communities is that of participation inequality. Jakob Nielsen did some research on this and coined the well-known 90-9-1 rule:

“In most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action.”

The above linked article has two nice graphics illustrating this point:

Illustration of participation inequality in online communities (Source: Jakob Nielsen)

As a user of Twitter for about 3 years now, I decided to do some simple analysis, wondering about the degrees of inequality I would find there. Imagine you want to spread the word about some new event and send out a tweet. How many people you reach depends on how many followers you have, how many of those retweet your message, how many followers they have, how many other messages they send out, and so on. Let’s look at my first Twitter account (“tlausser”); here are some basic numbers on my followers and their respective followers:

Followers of tlausser, and their followers, on Twitter

Some of my followers have no followers themselves, one has nearly 100,000. On average, they have about 3600 followers; however, the total of about 385,000 followers is extremely unequally distributed. Here are three charts visualizing this astonishing degree of inequality:

Of the 107 followers, the top 5 hold ~75% of all followers that can be reached in two steps. The corresponding Gini index of 0.90 is an example of extreme inequality. From an advertising perspective, you would want to focus mostly on getting these 5% to react to your message (i.e. retweet). In a chart with a linear scale, the bottom half barely registers.
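The Gini index quoted here can be computed the standard way, as one minus twice the area under the Lorenz curve. The follower counts below are made-up placeholders for illustration, not the actual data from this analysis:

```python
# Gini index from the Lorenz curve (trapezoid rule over sorted values).
def gini(values):
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    cum = 0.0       # cumulative share of the total
    area = 0.0      # area under the Lorenz curve
    prev = 0.0
    for x in xs:
        cum += x / total
        area += (prev + cum) / 2 / n   # trapezoid of width 1/n
        prev = cum
    return 1.0 - 2.0 * area

# hypothetical follower counts: mostly small, one huge outlier
followers = [0, 2, 5, 10, 40, 80, 300, 900, 5000, 95000]
print(f"Gini = {gini(followers):.2f}")
```

With perfectly equal values the function returns 0; with all of the total concentrated in one member it approaches 1, matching the interpretation used throughout these posts.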

Most of my followers have between 100 and 1,000 followers themselves, as can be seen from this log-scale histogram.

What kind of distribution do the follower counts obey? It seems that Log[x] is roughly normally distributed, i.e. the counts are approximately log-normal.

As for participation inequality, let’s look at the number of tweets that those (107) followers send out.

Some of them have not tweeted anything; the chattiest has sent more than 16,000 tweets. On average, each follower has sent 1,280 tweets; the total of 137,000 tweets is again highly unequally distributed, with a Gini index of 0.77.

The top 10 account for about 2/3 of the entire conversation.

Again the bottom half hardly contributes to the number of tweets; however, the ramp in the top half is longer and not quite as steep as with the number of followers. Here is the log-scale Histogram:

I did the same type of analysis for several other Twitter users in the central range (between 100 and 1,000 followers). The results are similar, but certainly not robust against statistical sampling error. (A larger-scale analysis would require a higher Twitter API limit than my free 350 calls per hour.)

These preliminary results indicate high degrees of inequality in the number of tweets people send out, and even more so in the number of followers they accumulate. I presume the latter is caused in part by preferential attachment, as described in Barabasi’s book “Linked: The new science of networks“. As with all forms of attention, whom people follow depends a lot on whom others are already following. The result is a very long tail of small follower counts for the vast majority of Twitter users.

That said, the degree of participation inequality I found was lower than the 90-9-1 rule, which corresponds to an extreme Gini index of about 0.96. Perhaps that’s a sign of the Twitter community having evolved over time? Or perhaps just a sign that my analysis sample is too small and not representative of the larger Twitterverse.
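The Gini value implied by the 90-9-1 rule can be checked with a piecewise-linear Lorenz curve. The rule does not specify how activity splits between the 9% and the 1%, so the two splits below are assumptions; either way the result lands in the mid-to-high 0.9s, consistent with the figure quoted above:

```python
# Gini for a three-group population: 90% contribute nothing,
# 9% contribute share_mid of all activity, 1% contribute share_top.
def gini_three_groups(share_mid, share_top):
    # Lorenz curve through four points (population share, activity share)
    pts = [(0.0, 0.0), (0.90, 0.0),
           (0.99, share_mid), (1.0, share_mid + share_top)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (y0 + y1) / 2 * (x1 - x0)   # trapezoid per segment
    return 1.0 - 2.0 * area

print(f"Gini (9% hold 10%, 1% hold 90%): {gini_three_groups(0.10, 0.90):.2f}")
print(f"Gini (9% hold 20%, 1% hold 80%): {gini_three_groups(0.20, 0.80):.2f}")
```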

In some ways these new media are refreshing, as they allow almost anyone to publish their thoughts. However, it’s also true that almost all of those users remain in relative obscurity, while only a very small minority gets the lion’s share of the attention. If you think economic inequality is too high, keep in mind that attention inequality is far higher. Both are impacting the policy debate in interesting ways.

Turning social media attention into income is another story altogether. In his recent blog post “Turning social media attention into income“, author Srinivas Rao muses:

“The low barrier to entry created by social media has flooded the market with aspiring entrepreneurs, freelancers, and people trying to make it on their own. Standing out in it is only half the battle. You have to figure out how to turn social media attention into social media income. Have you successfully evolved from blogger to entrepreneur? What steps should I take next?”

Posted by on December 6, 2011 in Industrial, Scientific, Socioeconomic


## World Cartogram of Mobile Phone Adoption

Under the slogan “Our Changing World”, FedEx has developed a website with various cartograms showing world-wide socio-economic changes based on publicly available data from sources such as World Bank, UNESCO, World Health Organization and others.

Cartograms visualize a particular metric by adjusting each country’s size in proportion to that metric. This leaves country neighborhood relationships (which we blogged about here) intact, but inflates or deflates countries, often dramatically so. Here is a series of three cartograms showing the adoption of mobile phones in the years 1995, 2000, and 2008. The size of each country is proportional to its mobile phone density (average number of mobile phones per 100 people).

Mobile Phone Density 1995

Mobile Phone Density 2000

Mobile Phone Density 2008

From the Topic Info on the Mobile Phone Presence display:

In 1996, mobile phones were a Nordic phenomenon. A Swede was twice as likely as an American to own one, and five times as likely as a German. Skip forward four years and the picture changed radically. Mobile phone usage boomed ten-fold across Europe; most European nations caught up with their northern neighbours. Eight years later, Africa suddenly loomed large. Mobile-phone penetration in some emerging economies now outstrips that of the developed world; Algeria tops the US. In most countries, mobile phone use is now ubiquitous. Lacking a mobile phone is more striking today than possessing one.

Indeed, it’s hard to find a country with very small mobile phone presence – and then to pinpoint it on the cartogram. One country I found was Cuba: while most countries in the Americas have between 50 and 100 mobile phones per 100 people, Cuba has only 3.

A few months ago Nathan Yau covered this topic on his FlowingData Blog here. As he already suggested, there are many more data to explore on FedEx’s website, so check it out for yourself here.

1 Comment

Posted by on November 20, 2011 in Industrial, Scientific, Socioeconomic

## The Observatory of Economic Complexity

In this second part we will look at the online interactive visualizations as a companion to the first part’s Atlas of Economic Complexity. It’s interesting that the authors chose the title “Observatory”, as if to convey that with a good (perhaps optical) instrument you can reveal otherwise hidden structure. To repeat one of the fundamental tenets of this Blog: Interactive graphics allow the user to explore data sets and thus to develop a better understanding of the structure and potentially create otherwise inaccessible insights. This is a good example.

The two basic dimensions for exploring the trade data are products and countries. The most recent world trade data is from 2009, and the data reach back 20 to 50 years (varying by country). I worked with three types of charts: TreeMaps, Stacked Area Charts, and the Product Space network diagram. Let’s start with Germany’s Exports in 2009:

Hovering the cursor over a node highlights its details, here “Printing Presses”, a product type where Germany enjoys a high degree of Revealed Comparative Advantage (RCA). (For details on RCA or any other aspect of the product space concept and network diagram, please see the previous post on the Atlas of Economic Complexity.) We can now explore which other countries are exporting printing presses:

While Germany clearly dominates this world market with 55% at \$2.7b in 2009 (RCA = 5.6), the time slider at the bottom (with data since 1975) reveals that it has actually held an even bigger lead for most of the last 35 years. For example, with its exports in Printing Presses, Germany commanded 72% at \$3.7b in 2001 with RCA = 6.3. From the timeline one can also see how the United States captured about 20% of this (then much smaller) market for a brief period between 1979 and 1983. During this time its RCA for Printing Presses was just a bit above 1.0 – which shows as a black square in the Product Space – but the United States has since lost this advantage and has not seen any significant exports in this product type. Printing Presses being a fairly complex product, only a handful of countries export them, almost all of them European, plus Japan. There might be an interesting correlation between complexity and inequality, as the capabilities for the production of complex products tend to cluster in a few countries worldwide, which then dominate world exports accordingly.

Another powerful instrument is the Stacked Area Chart. Here you can see how a country’s Imports or Exports evolve over time, either in terms of absolute value or relative share of product types. For example, let’s look at the last 30 years (1978-2008) of Export data for the United States:

This GIF file (click if not animated) shows several frames. In Value display style one can see the absolute size and how Exports grew roughly 10-fold from about \$100b to \$1t over the course of those 30 years. The Share display style focuses on relative size, with all Exports always representing 100%. In the Observatory one can hover over any product type and thus highlight that color band to see the evolution of this product type’s Exports over time. In the highlighted example here, we can see how ‘Cereal and Vegetable Oil’ (yellow band) shrank from around 15% in the late seventies to around 5% since the late nineties. ‘Chemicals and Health Related Products’ (purple band) has remained more or less constant around a 10% Export share. ‘Electronics’ bloomed in the mid eighties from less than 10% to 15-20% and stayed on the high end of that range until around the year 2000 before shrinking in the last decade down to about 10%.

As a final example, look at the relative size of imports of the United States over the last 40 years, (1968 – 2008, sorted by final value):

The biggest category is crude petroleum products at the bottom. During the two oil shocks in the seventies, the percentage peaked near 30% of all imports. Then it went down and stayed below 10% between 1985 and 2005. Since then its percentage has been steadily rising and has reached about 15% again. (The data isn’t current enough to illustrate the impact of the 2008 recession.) Such high expenses crowd out other categories: when the consumer pays more at the pump, there is less to spend on other product types. Another interesting aspect of this last chart is that the bottom two bands represent opposite ends of the product complexity spectrum: petroleum (brown) on the low end, cars (blue) on the high end.

As always, the real power of interactive visualizations comes from interacting with them. So I encourage you to explore these data at the Observatory of Economic Complexity.

Caveats: I noticed a couple of minor areas which seem to be either incomplete, counter-intuitive, poor design choices, or simply implementation bugs. To start, there is no help or documentation for the visualization tool itself. Many of the diagram types on the left are grayed out, and it is not always apparent what selection of products, countries, or chart type will enable certain subselections. For example, there is a chart type “Predictive Tools” with two subtypes “Density Bars” and “Stepping Stone” that always seems to be grayed out. The same applies to Maps (presumably geographic maps) – all subtypes are grayed out. Perhaps I am missing something – I would appreciate any comments if that’s the case.

In the TreeMaps for imports and exports one cannot see the value of the overall trade (top-level rectangle) or any of the categories (second-level rectangles). Only the tooltips show the value of a specific product type or country (third-level rectangle). The color legend is designed for the product space and designates the 34 communities of product types. When you hover the mouse over one product type, say garments (in green), then all imports / exports other than that product type are grayed out. When you show a product import / export chart, however, those same colors are used to designate groups of countries, with color indicating continents (blue for Europe, red for the Americas, green for Asia, etc.). Yet when you hover over the product icon in the legend (say garments), only the countries in the corresponding color remain highlighted, which doesn’t make sense and can be misleading.
When you play the timeline in a TreeMap, the frequent change in layout can be confusing. A change from one year to the next played back and forth slowly or multiple times can be instructive, but a quick series of too many changes (particularly without seeing the labels) is just confusing.

In the stacked area charts when you click on Build Visualization it always comes up in “Value” style, even if “Share” is selected. To get to the Share style, you have to select Value and then Share again.

TreeMaps and Stacked Area Charts critically depend on the availability of data for all products / countries displayed. For years before 1990 there appear to be pockets of only sparsely available data, which then falsely suggest world market dominance of certain products or countries. For example, the TreeMap for Imports in Printing Presses for 1983 shows the United States with 97%, taking practically the entire market. In 1984, its share shrinks to a more balanced 28% despite growing very rapidly, simply because data for other countries in Europe, Asia, etc. seem not to be available prior to 1984. In such cases it would have been better to show the rest as a gray rectangle instead of leaving it out (if world import data are available), or to not display any chart for years with grossly incomplete data.

Navigation is somewhat limited. For example, looking at a country chart (say United Kingdom), it would be great to click on any product type (say crude petroleum) and get to a corresponding Stacked Area Chart diagram for that product type. One can do so using the drop-down boxes on the right, but that’s less intuitive.

There are two export formats (PDF and SVG). Vector graphics is a good choice, since the fonts render well even in small print. I obtained poor results with PDF, however, as the texts in TreeMaps were often misaligned and printed on top of one another.

None of the above is a serious problem, let alone a showstopper. It would be great, however, if there were a feedback link to provide such info back to the authors and help improve the utility of this observatory.

1 Comment

Posted by on November 14, 2011 in Industrial, Scientific, Socioeconomic


## The Atlas of Economic Complexity

Here is a recipe: Bring together renowned institutions like the MIT Media Lab and Harvard’s Center for International Development. Combine novel ideas about economic measures with years of solid economic research. Leverage large sets of world trade data. Apply network graph algorithms and throw in some stunning visualizations. The result: The Atlas of Economic Complexity, a revolutionary way of looking at world trade and understanding variations in countries’ paths to prosperity.

The main authors are Professors Ricardo Hausmann from Harvard and Cesar Hidalgo from MIT (whose graphic work on Human Development Indices we have reviewed here). The underlying research began in 2006 with the idea of the product space which was published in Science in 2007. This post is the first in a two-part series covering both the atlas (theory, documentation) as well as the observatory (interactive visualization) of economic complexity. This research is an excellent example of how the availability of large amounts of data, computing power and free distribution via the Internet enable entirely new ways of looking at and understanding our world.

The Atlas of Economic Complexity is rooted in a set of ideas about how to measure economies based not just on the quantity of products traded, but also on the knowledge and capabilities required to produce them. World trade data allow us to measure import and export product quantities directly, leading to indicators such as GDP, GDP per capita, growth of GDP, etc. However, we have no direct way to measure the knowledge required to create the products. A central observation is that complex products require more capabilities to produce, and countries that manufacture more complex products must possess more of these capabilities than those that do not. From Part I of the Atlas:

Ultimately, the complexity of an economy is related to the multiplicity of useful knowledge embedded in it. For a complex society to exist, and to sustain itself, people who know about design, marketing, finance, technology, human resource management, operations and trade law must be able to interact and combine their knowledge to make products. These same products cannot be made in societies that are missing parts of this capability set. Economic complexity, therefore, is expressed in the composition of a country’s productive output and reflects the structures that emerge to hold and combine knowledge.

Can we analyze world trade data in such a way as to tease out relative rankings in terms of these capabilities?

To this end, the authors start by looking at the trade web of countries exporting products. For each country, they examine how many different products it is capable of producing; this is called the country’s Diversity. And for each product, they look at how many countries can produce it; this is called the product’s Ubiquity. Based on these two measures, Diversity and Ubiquity, they introduce two complexity measures: The Economic Complexity Index (ECI, for a country) and the Product Complexity Index (PCI, for a product).
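These two measures are easy to compute from a binary country-product matrix. Here is a minimal sketch; the country names, product names, and 0/1 entries are purely illustrative (in the Atlas, a country counts as exporting a product when its RCA exceeds 1.0):

```python
# Diversity and Ubiquity from a binary country-product export matrix.
# M[c][p] = 1 if country c exports product p competitively, else 0.
M = {
    "Germany":  {"imaging": 1, "cars": 1, "logs": 1, "diamonds": 0},
    "Ghana":    {"imaging": 0, "cars": 0, "logs": 1, "diamonds": 0},
    "Botswana": {"imaging": 0, "cars": 0, "logs": 1, "diamonds": 1},
}

countries = list(M)
products = list(next(iter(M.values())))

# Diversity: how many different products a country exports.
diversity = {c: sum(M[c][p] for p in products) for c in countries}
# Ubiquity: how many countries export a product.
ubiquity = {p: sum(M[c][p] for c in countries) for p in products}

print(diversity)  # Germany is the most diversified country here
print(ubiquity)   # logs are the most ubiquitous product, imaging the least
```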

The mechanics of how these measures are calculated are somewhat sophisticated. Yet they encode straightforward observations, which the Atlas explains with examples:

Take medical imaging devices. These machines are made in few places, but the countries that are able to make them, such as the United States or Germany, also export a large number of other products. We can infer that medical imaging devices are complex because few countries make them, and those that do tend to be diverse. By contrast, wood logs are exported by most countries, indicating that many countries have the knowledge required to export them. Now consider the case of raw diamonds. These products are extracted in very few places, making their ubiquity quite low. But is this a reflection of the high knowledge-intensity of raw diamonds? Of course not. If raw diamonds were complex, the countries that would extract diamonds should also be able to make many other things. Since Sierra Leone and Botswana are not very diversified, this indicates that something other than large volumes of knowledge is what makes diamonds rare.

A useful question is this: If a good cannot be produced in a country, where else can it be produced? Countries with higher economic complexity tend to produce more complex products which cannot easily be produced elsewhere. The algorithms are specified in the Atlas, but we will skip over these details here. Let’s take a look at the ranking of some 128 world countries (selected for minimum population size and trade volume, as well as for reliable availability of trade data).
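For the curious, the core of the calculation is an iterative procedure the authors call the "method of reflections": a country's score is repeatedly refined by the scores of its products, and vice versa, starting from raw diversity and ubiquity. Below is a toy sketch; the stopping rule and normalization are simplified relative to the Atlas, and the example matrix is invented:

```python
def eci(M, iterations=4):
    """Method-of-reflections sketch: iterate corrected averages of
    diversity and ubiquity, then standardize the country scores.
    M: dict country -> dict product -> 0/1 export matrix.
    Use an even number of iterations (odd ones invert the ordering)."""
    countries = list(M)
    products = list(next(iter(M.values())))
    kc0 = {c: sum(M[c][p] for p in products) for c in countries}  # diversity
    kp0 = {p: sum(M[c][p] for c in countries) for p in products}  # ubiquity
    kc, kp = dict(kc0), dict(kp0)
    for _ in range(iterations):
        kc_next = {c: sum(M[c][p] * kp[p] for p in products) / kc0[c]
                   for c in countries}
        kp_next = {p: sum(M[c][p] * kc[c] for c in countries) / kp0[p]
                   for p in products}
        kc, kp = kc_next, kp_next
    mean = sum(kc.values()) / len(kc)
    std = (sum((v - mean) ** 2 for v in kc.values()) / len(kc)) ** 0.5
    return {c: (kc[c] - mean) / std for c in countries}

# Toy data: A is diversified, C exports only the most ubiquitous good.
M = {"A": {"p1": 1, "p2": 1, "p3": 1},
     "B": {"p1": 1, "p2": 1, "p3": 0},
     "C": {"p1": 1, "p2": 0, "p3": 0}}
scores = eci(M)
```

As expected, the diversified country A ends up with the highest standardized score.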

Why is Economic Complexity important? The Atlas devotes an entire chapter to this question. The most important finding here is that ECI is a better predictor of a country’s future growth than many other commonly used indicators that measure human capital, governance or competitiveness.

Countries whose economic complexity is greater than what we would expect, given their level of income, tend to grow faster than those that are “too rich” for their current level of economic complexity. In this sense, economic complexity is not just a symptom or an expression of prosperity: it is a driver.

The Atlas includes many scatter plots and regression analyses measuring the correlation between economic complexity and other indicators. Again, the interested reader is referred to the original work.

Another interesting question is how Economic Complexity evolves. In some ways this is a chicken-and-egg problem: For a complex product you need many capabilities. But for any capability to provide value you need some products that require it. If a new product requires several capabilities which don’t exist in a country, then starting production of that product in the country will be hard. Hence, a country’s products tend to evolve along its already existing capabilities. Measuring the similarities in required capabilities directly would be fairly complicated. However, as a first approximation, one can infer that products which are frequently exported by the same countries tend to require similar capabilities.

So the probability that a pair of products is co-exported carries information about how similar these products are. We use this idea to measure the proximity between all pairs of products in our dataset (see Technical Box 5.1 on Measuring Proximity). The collection of all proximities is a network connecting pairs of products that are significantly likely to be co-exported by many countries. We refer to this network as the product space and use it to study the productive structure of countries.
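Concretely, the proximity between two products can be computed as the minimum of the two conditional probabilities of co-export (the Atlas's Technical Box 5.1 gives the exact definition). A small sketch with an invented export matrix:

```python
def proximity(M, p1, p2):
    """Proximity of two products: the minimum of the conditional
    probabilities that a country exporting one also exports the other.
    M: dict country -> dict product -> 0/1 (RCA above 1)."""
    exporters1 = {c for c in M if M[c][p1]}
    exporters2 = {c for c in M if M[c][p2]}
    if not exporters1 or not exporters2:
        return 0.0
    both = len(exporters1 & exporters2)
    return min(both / len(exporters1), both / len(exporters2))

# Illustrative matrix: shirts and trousers are co-exported, ore is not.
M = {"A": {"shirts": 1, "trousers": 1, "ore": 0},
     "B": {"shirts": 1, "trousers": 1, "ore": 0},
     "C": {"shirts": 1, "trousers": 0, "ore": 1}}
```

Taking the minimum rather than either conditional probability alone keeps a ubiquitous product (exported by nearly everyone) from appearing close to everything.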

Then the authors proceed to visualize the Product Space. It is a graph with some 774 nodes (products) and edges representing the proximity values between those nodes. Only the top 1% strongest proximity edges are shown to keep the average degree of the graph below 5 (showing too many connections results in visual clutter). Network science algorithms are used to discover the highly connected communities into which the products naturally group. Those 34 communities are then color-coded. Using a combination of minimum-spanning-tree and force-directed layout algorithms, the network is laid out and manually optimized to minimize edge crossings. The resulting Product Space graph looks like this:

Here the node size is determined by world trade volume in the product. If you step back for a moment and reflect on how much data is aggregated in such a graph it is truly amazing! One variation of the graph determines size by the Product Complexity as follows:

In this graph one can see that products within a community are of similar complexity, supporting the idea that they require similar capabilities, i.e. have high proximity. From these visualizations one can now analyze how a country moves through product space over time. Specifically, in the report there are graphs for the four countries Ghana, Poland, Thailand, and Turkey over three points in time (1975, 1990, 2009). From the original document I put together a composite showing the first two countries, Ghana and Poland.

While Ghana’s ECI doesn’t change much, Poland grows into many products similar to those it already exported in 1975. This clearly increases Poland’s ECI and contributes to the strong growth Poland has seen since 1975. (Black squares show products produced by the country with a Revealed Comparative Advantage, RCA > 1.0.)
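The RCA threshold behind those black squares is the classic Balassa index: a country has revealed comparative advantage in a product when the product's share of the country's export basket exceeds the product's share of world trade. A sketch with made-up export values:

```python
def rca(exports, country, product):
    """Revealed Comparative Advantage (Balassa index).
    exports: dict country -> dict product -> export value."""
    country_share = exports[country][product] / sum(exports[country].values())
    world_product = sum(exports[c][product] for c in exports)
    world_total = sum(sum(basket.values()) for basket in exports.values())
    return country_share / (world_product / world_total)

# Hypothetical numbers: country A is relatively specialized in product x.
exports = {"A": {"x": 50, "y": 50},
           "B": {"x": 0, "y": 100}}
```

Here product x is 50% of A's exports but only 25% of world trade, so rca(exports, "A", "x") comes out at 2.0, well above the 1.0 threshold.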

In all cases we see that new industries –new black squares– tend to lie close to the industries already present in these countries. The productive transformation undergone by Poland, Thailand and Turkey, however, look striking compared to that of Ghana. Thailand and Turkey, in particular, moved from mostly agricultural societies to manufacturing powerhouses during the 1975-2009 period. Poland, also “exploded” towards the center of the product space during the last two decades, becoming a manufacturer of most products in both the home and office and the processed foods community and significantly increasing its participation in the production of machinery. These transformations imply an increase in embedded knowledge that is reflected in our Economic Complexity Index. Ultimately, it is these transformations that underpinned the impressive growth performance of these countries.

The Atlas goes on to provide rankings of countries along five axes such as ECI, GDP per capita growth, GDP growth, etc. The finding that higher ECI is a strong driver of GDP growth allows for predictions of GDP growth through 2020. Sub-Saharan East African countries top that ranking (8 of the Top 10), led by Uganda, Kenya and Tanzania. Here is the GDP growth ranking in graphical form – the band around the Indian Ocean is where most GDP growth is going to happen during this decade.

Each country has its own Product Space map. It shows which products and capability sets the country already has, which other similar products it could produce with relatively few additional capabilities, and where it is more severely lacking. As such it can provide useful information both to the country itself and to a multinational firm looking to expand. The authors sum up the chapter on how this Atlas can be used as follows:

A map does not tell people where to go, but it does help them determine their destination and chart their journey towards it. A map empowers by describing opportunities that would not be obvious in the absence of it. If the secret to development is the accumulation of productive knowledge, at a societal rather than individual level, then the process necessarily requires the involvement of many explorers, not just a few planners. This is why the maps we provide in this Atlas are intended for everyone to use.

We will look at the rich visualizations of the data sets in this Atlas in a forthcoming second installment of this series.

Posted by on November 10, 2011 in Industrial, Scientific, Socioeconomic


## Implementation of TreeMap

After posting on TreeMaps twice before (TreeMap of the Market and original post here) I wanted to better understand how they can be implemented.

In his book “Visualize This” – which we reviewed here – author Nathan Yau has a short chapter on TreeMaps, which he also published on his FlowingData Blog here. He works with the statistical programming language R and uses a library that implements TreeMaps. While this allows for very easy creation of a TreeMap with just a few lines of code, from the perspective of how the TreeMap is constructed it is still a black box.

I searched for existing implementations of TreeMaps in Mathematica (which I use for many visualization projects). Surprisingly, I didn’t find any, despite the 20-year history of both the Mathematica platform and the TreeMap concept. So I decided to learn by implementing a TreeMap algorithm myself.

Let’s recap: A TreeMap turns a tree of numeric values into a planar, space-filling map. A rectangular area is subdivided into smaller rectangles with sizes proportional to the values of the tree nodes. The color can be mapped from either that same value or some other corresponding value.

One algorithm for TreeMaps is called slice-and-dice. It starts at the top level and works recursively down to the leaf level of the tree. Suppose you have N values at a given level of the tree and a corresponding rectangle.
a) Sort the values in descending order.
b) Select the first k values (0 < k < N) which together sum to at least the split-ratio fraction of the values’ total.
c) Split the rectangle into two parts according to the fraction covered by those k values, along its longer side (to avoid very narrow shapes).
d) Allocate the first k values to the split-off part and the remaining N−k values to the rest of the rectangle.
e) Repeat as long as a sublist at the current level has more than one value (N > 1).
f) For each node at the current level, map its sub-tree onto the corresponding rectangle (until you reach the leaf level).

As an example, consider the list of values {6,5,4,3,2,1}. Their sum is 21. If we have a split-ratio parameter of say 0.4, then we split the values into {6,5} and {4,3,2,1} since the ratio (6+5)/21 ≈ 0.52 > 0.4, then continue with {6,5} in the first portion of the rectangle and with {4,3,2,1} in the other portion.
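The steps above translate almost directly into code. Here is a sketch in Python rather than Mathematica, for a flat list of values; a full tree version would additionally recurse into each leaf rectangle, as in step f:

```python
def slice_and_dice(values, rect=(0.0, 0.0, 1.0, 1.0), split_ratio=0.4):
    """Lay out `values` inside `rect` = (x, y, width, height).
    Returns a list of (value, rectangle) pairs whose areas are
    proportional to the values."""
    x, y, w, h = rect
    if len(values) == 1:
        return [(values[0], rect)]
    vals = sorted(values, reverse=True)            # a) sort descending
    total = sum(vals)
    acc, k = 0.0, 0                                # b) take first k values
    while k < len(vals) - 1 and acc < split_ratio * total:
        acc += vals[k]
        k += 1
    frac = acc / total
    if w >= h:                                     # c) split the longer side
        r1, r2 = (x, y, w * frac, h), (x + w * frac, y, w * (1 - frac), h)
    else:
        r1, r2 = (x, y, w, h * frac), (x, y + h * frac, w, h * (1 - frac))
    # d)-f) allocate the two groups and recurse until single values remain
    return (slice_and_dice(vals[:k], r1, split_ratio)
            + slice_and_dice(vals[k:], r2, split_ratio))

layout = slice_and_dice([6, 5, 4, 3, 2, 1])        # the worked example above
```

Each leaf rectangle's area ends up exactly proportional to its value (here, value/21 of the unit square), since every split preserves area shares.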

Let's look at the results of such an algorithm. Here I'm using a tree with a branching factor of 6 and random values between 0 (dark) and 100 (bright). The animation iterates through various split-ratios from 0.1 to 0.9:

Notice how the layout changes as a result of the split-ratio parameter. If it’s near 0 or 1, we tend to get thin stripes; when it’s closer to 0.5 we get more square-shaped containers (i.e. lower aspect ratios).

The recursive algorithm becomes apparent when we use a tree with two levels. You can still recognize the containers from level 1 which are then sub-divided at level 2:

One of the fundamental tenets of this Blog is that interactive visualizations lead to a better understanding of structure in the data or of the dynamic properties of a model. You can interact with this algorithm in the TreeMap model in Computable Document Format (CDF). Simply click on the graphic above and you will be redirected to a site where you can interact with the model (requires one-time loading of the free CDF Browser Plug-In). You can change the shape of the outer rectangle, adjust the tree level and split-ratio, and pick different color schemes. The values are shown as tooltips when you hover over the corresponding rectangle. You also have access to the Mathematica source code if you want to modify it further. Here is a TreeMap with three levels:

Of course, a more complete implementation would allow varying the color-controlling parameter, filtering the values, and re-arranging the dimensions at different levels of the tree. Perhaps someone can start with this Mathematica code and take it to the next level. The previous TreeMap post points to several tools and galleries with interactive applications you can experiment with.

Lastly, I wanted to point out a good article by the creator of TreeMaps, Ben Shneiderman. In this 2006 paper, “Discovering Business Intelligence Using Treemap Visualizations”, he cites various BI applications of TreeMaps. Several studies have shown that TreeMaps let users recognize certain patterns in the data (like best- and worst-performing sales reps or regions) faster than with more traditional chart techniques. No wonder TreeMaps are finding their way into more and more tools and dashboard applications.

Posted by on November 9, 2011 in Industrial, Scientific

## Fractals

While browsing the web for some Mathematica resources I came across Paul Nylander’s website on Fractals and other computer-created illustrations. Amazing stuff! Here are just a few images from his website. He has lots of information and often source-code with the images as well. Go check it out.

Posted by on September 27, 2011 in Art, Scientific


## Visualizing Inequality

Measuring and visualizing inequality is often the starting point for further analysis of underlying causes. Only with such understanding can one systematically influence the degree of inequality or take advantage of it. In previous posts on this Blog we have already looked at some approaches, such as the Lorenz-Curve and Gini-Index or the Whale-Curve for Customer Profitability Analysis. Here I want to provide another visual method and look at various examples.

Inequality is very common in economics. Competitors have different shares of, and capitalization in, a market. Customers have different profitability for a company. Employees have different incomes across an industry. Countries have different GDPs in the world economy. Households have different income and wealth in a population.

The Gini index is an aggregate measure of the degree of inequality of any given distribution. It ranges from 0.0 (perfect equality: every element contributes the same amount) to 1.0 (extreme inequality: one element contributes everything and all others contribute nothing). (The previous post referenced above contains links to articles on the definition and calculation of the Gini index.)
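For a discrete list of values the Gini index can be computed directly from the sorted values. A small sketch using one of several equivalent formulas (based on the rank-weighted sum):

```python
def gini(values):
    """Gini index of a discrete distribution: 0 for perfect equality;
    approaches 1 as a single element comes to hold everything."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * rank_weighted / (n * total) - (n + 1) / n
```

Note that for n elements the maximum attainable value is (n − 1)/n, not 1.0 exactly; as discussed below, this is one reason comparisons across distributions with different numbers of elements can mislead.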

There are several ways to visualize inequality, including the Lorenz-Curve. Here we look at a form of pie chart for some discrete distributions. As a first example, consider the distribution of market capitalization among the Top-20 technology companies (Source: Nasdaq, Date: 9/17/11):

Market Cap of Top 20 Technology Companies on the Nasdaq

Apple, the largest company by far, is bigger than the bottom 10 combined. The first four (20%) companies – Apple, Microsoft, IBM, Google – account for almost half of the total and are thus almost the size of the other 16 (80%) combined. The pie chart gives an intuitive sense of the inequality; the Gini index gives a precise mathematical measure, which for this discrete distribution is 0.47.

Another example is a look at the top PC shipments in the U.S. (Source: IDC, Date: Q2’11)

U.S. PC Shipments in Q2'11

There is a similar degree of inequality (Gini = 0.46). In fact, this degree of inequality (Gini index ~ 0.5) is not unusual for such distributions in mature industries with many established players. However, consider the tablet market, which is dominated by Apple’s iOS (Source: Strategy Analytics, Date: Q2’11)

Worldwide Tablet OS shipments in Q2'11

Apple’s iOS captures 61%, Android 30%, and the other three categories combined are under 10%. This is a much stronger degree of inequality, with Gini = 0.74.

To pick an example from a different industry, here are the top 18 car brands sold in the U.S. (Source: Market Data Center at WSJ.COM; Date: Aug-2011):

U.S. Total Car Sales in Aug-11

When comparing Gini index values for these kinds of distributions it is important to realize the impact of the number of elements. More elements in the distribution (say, Top-50 instead of Top-20) usually increase the Gini index, due to the impact of additional very small players. Suppose, for example, instead of the Top-18 you left out the two companies with the smallest sales, namely Saab and Subaru, and plotted only the Top-16. Their combined sales are less than 0.4% of the total, so one wouldn’t expect to miss much. Yet you get a Gini index of 0.49 instead of 0.54. So with discrete distributions and a relatively small number of elements, one risks comparing apples to oranges when the numbers of elements differ.
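This effect is easy to reproduce with made-up numbers (these are not the actual car sales figures, just a hypothetical head-heavy distribution):

```python
def gini(values):
    """Gini index of a discrete distribution via the rank-weighted sum."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    return 2 * sum((i + 1) * x for i, x in enumerate(xs)) / (n * total) - (n + 1) / n

# Hypothetical sales figures with a strongly unequal head...
top16 = [100, 80, 60, 50, 40, 30, 25, 20, 15, 12, 10, 8, 6, 5, 4, 3]
# ...plus two near-negligible players, together well under 0.5% of the total.
top18 = top16 + [1.0, 0.5]

# Dropping the tiny tail lowers the index noticeably, even though the
# tail represents almost none of the volume.
print(round(gini(top18), 2), round(gini(top16), 2))
```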

Consider as a last example a comparison of the above with two other distributions from my own personal experience – the list of base salaries of 30 employees reporting to me at one of my previous companies as well as the list of contributions to a recent personal charity fundraising campaign.

Gini Index Comparison

What’s interesting is that the salary distribution has by far the lowest degree of inequality. You wouldn’t believe that from the feelings of employees, many of whom believe they are not getting their fair share while others get so much more… In fact, the skills and value contributions to the employer are probably far more unequal than the salaries! (Check out Paul Graham’s essay “Great Hackers” for more on this topic.)
And when it comes to donations, the amount people are willing to give to charitable causes differs immensely. We have seen this already in a previous post on the Gini index, with recent U.S. political donations showing an astounding inequality of Gini = 0.89. I challenge you to find a distribution across so many elements (thousands) with greater inequality. If you find one, please comment on this Blog or email me, as I’d like to know about it.

Posted by on September 22, 2011 in Industrial, Scientific, Socioeconomic

## Bit.ly link analysis on half-life of web content

The team at the URL-shortening service Bit.ly has posted an interesting analysis of the attention given to links shared on the Internet via different social media platforms. This provides some quantification of what some have termed internet impatience. Most shared web links experience an initial burst of attention immediately after publication, followed by a steep decay to near-zero activity. A useful measure is a link’s half-life: the time it takes, from the link’s peak activity onward, to accumulate half of all the clicks it will ever receive after that peak.
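Given a link's click counts per time interval, this half-life is simple to compute. A sketch with an invented click trace (Bit.ly's exact methodology may differ in details):

```python
def half_life(clicks, interval_seconds=60):
    """Time from the peak-activity interval until half of all clicks
    arriving from the peak onward have been received.
    clicks: list of click counts per fixed-length interval."""
    peak = max(range(len(clicks)), key=lambda i: clicks[i])
    after_peak = clicks[peak:]
    target = sum(after_peak) / 2
    received = 0
    for i, count in enumerate(after_peak):
        received += count
        if received >= target:
            return i * interval_seconds
    return len(after_peak) * interval_seconds

# Invented trace: burst in the second minute, then steady decay.
print(half_life([0, 4, 3, 2, 1], interval_seconds=60))  # → 60
```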

From the Bit.ly Blog:

So we looked at the half life of 1,000 popular bitly links and the results were surprisingly similar. The mean half life of a link on twitter is 2.8 hours, on facebook it’s 3.2 hours and via ‘direct’ sources (like email or IM clients) it’s 3.4 hours. So you can expect, on average, an extra 24 minutes of attention if you post on facebook than if you post on twitter.

Distribution of web link half-lives (Source: Bit.ly Blog)

This half-life distribution plot (x-axis: 1 day = 86,400 seconds) of content shared via bit.ly links shows some interesting patterns:

• In general, content half-life is about 3 hours (roughly 10,000 sec)
• Content half-life does not depend on the medium through which it is shared
• YouTube content has a different distribution and a considerably longer half-life (about 7 hours)

One is tempted to relate such stats to one’s own browsing experience, or to look at systematic analyses of how people deal with shared links. For example, Microsoft’s Outlook team did extensive usability research on how people deal with incoming email so as to improve the usability of their mail reader. It was found that most emails fall into one of three categories (open & read immediately, ignore & discard, file & flag for future reading). I speculate that bit.ly links received via Twitter or email are handled similarly, perhaps with the added category of retweet or forward (in the case of a story going viral). That YouTube is different can perhaps be attributed to the fact that many videos require more time, so we make a more deliberate decision as to whether and when we want to spend that time. For instance, one might decide to watch a video tonight after getting home from work, which would fit with the 7-hour half-life.

In any event, such statistics show that when it comes to clicking on shared links, our behavior is fairly predictable and probably driven by simple habits rather than complex thought. On one hand this allows good estimates of a link’s expected lifetime clicks. On the other hand, it can be a bit disconcerting to realize that our clicking behavior may be controlled by rather simple behavioral drivers (habitual classification, desire for instant gratification, out of sight out of mind, etc.). For instance, we usually give the most recent incoming news priority over other criteria of personal content preference. But is the latest really the greatest? I suspect that, just like impulse shopping, there is a lot of impulse clicking. And who does not know the exhausted feeling of getting lost while browsing and regretting, in hindsight, not having made the best use of one’s time… Perhaps this hints at opportunities for more personalized, content-preference-filtered news delivery mechanisms (such as the news reader app Zite, recently acquired by CNN).


Posted by on September 9, 2011 in Scientific, Socioeconomic