
Category Archives: Industrial

Global Trends 2025


If you like to do some big-picture thinking, here is a document put together by the National Intelligence Council and titled “Global Trends”. It is published every five years to analyze trends and forecast likely scenarios of worldwide development fifteen years into the future. The most recent is called “Global Trends 2025” and was published in November 2008. It’s a 120 page document which can be downloaded for free in PDF format here.

To get a feel for the content, here are the chapter headers:

  1. The Globalizing Economy
  2. The Demographics of Discord
  3. The New Players
  4. Scarcity in the Midst of Plenty?
  5. Growing Potential for Conflict
  6. Will the International System Be Up to the Challenges?
  7. Power-Sharing in a Multipolar World

From the NIC Global Trends 2025 project website:

Some of our preliminary assessments are highlighted below:

  • The whole international system—as constructed following WWII—will be revolutionized. Not only will new players—Brazil, Russia, India and China— have a seat at the international high table, they will bring new stakes and rules of the game.
  • The unprecedented transfer of wealth roughly from West to East now under way will continue for the foreseeable future.
  • Unprecedented economic growth, coupled with 1.5 billion more people, will put pressure on resources—particularly energy, food, and water—raising the specter of scarcities emerging as demand outstrips supply.
  • The potential for conflict will increase owing partly to political turbulence in parts of the greater Middle East.

As interesting as the topic may be, from a data visualization perspective the report is somewhat underwhelming. I counted just 5 maps and 5 charts in the entire document. The maps are interesting, such as the following on World Age Structure:

World Age Structure 2005

World Age Structure 2025 (Projected)

These maps show the age structure of countries’ populations by geographical region. The Northern countries have fewer young people, and the aging trend is particularly strong in Eastern Europe and Japan. In 2025 almost all of the countries with very young populations will be in Sub-Saharan Africa and the Arabian Peninsula. Population growth will slow as a result; there will be approximately 8 billion people alive in 2025, 1 billion more than the 7 billion today.

In this day and age one is spoiled by interactive charts such as the Bubble-Charts of Gapminder’s Trendalyzer. Wouldn’t it be nice to have an interactive chart where you could set the Age intervals and perhaps filter in various ways (geographic regions, GDP, population, etc.) and then see the dynamic change of such colored world-maps over time? How much more insight would this convey about the changing demographics and relative sizes of age cohorts? Or perhaps display interactive population pyramids such as those found here by Jorge Camoes?

Another somewhat misguided ‘graphical angle’ is the use of slightly rotated graphics on the chapter headers. For example, Chapter 2 starts with this useful color-coded map of the Youth in countries of the Middle East. But why rotate it slightly and make the fonts less readable?

Youth in the Middle East (from Global Trends 2025 report)

I don’t want to be too critical; it’s just that reports put together with so much systematic research and focusing on long-range, international trends should employ more state-of-the-art visualizations, in particular interactive charts rather than just pages and pages of static text…

 

Posted by on January 4, 2012 in Industrial, Socioeconomic

 


Inequality on Twitter


A lot has been written about economic inequality as measured by distribution of income, wealth, capital gains, etc. In previous posts such as Inequality, Lorenz-Curves and Gini-Index or Visualizing Inequality we looked at various market inequalities (market share and capitalization, donations, etc.) and their respective Gini coefficients.

With the recent rise of social media we have other forms of economy, in particular the economy of time and attention. And we have at least some measures of this economy in the form of people’s activities, subscriptions, etc. Whether it’s Connections on LinkedIn, Friends on Facebook, or Followers on Twitter, all of the social media platforms have some social currency for attention. (Influence is different from attention, and measuring influence is more difficult and controversial; see for example the discussions about Klout-scores.)

Another interesting aspect of online communities is that of participation inequality. Jakob Nielsen did some research on this and coined the well-known 90-9-1 rule:

“In most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action.”

The above linked article has two nice graphics illustrating this point:

Illustration of participation inequality in online communities (Source: Jakob Nielsen)

Having used Twitter for about 3 years now, I decided to do some simple analysis, wondering about the degrees of inequality I would find there. Imagine you want to spread the word about some new event and send out a tweet. How many people you reach depends on how many followers you have, how many of those retweet your message, how many followers they have, how many other messages they send out and so on. Let’s look at my first Twitter account (“tlausser”); here are some basic numbers of my followers and their respective followers:

Followers of tlausser Followers on Twitter

Some of my followers have no followers themselves, one has nearly 100,000. On average, they have about 3600 followers; however, the total of about 385,000 followers is extremely unequally distributed. Here are three charts visualizing this astonishing degree of inequality:

Of 107 followers, the top 5 have ~75% of all followers that can be reached in two steps. The corresponding Gini index of 0.90 is an example of extreme inequality. From an advertising perspective, you would want to focus mostly on getting these 5% to react to your message (i.e. retweet). In a chart with a linear scale the bottom half barely registers.
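For readers who want to reproduce this kind of number, here is a minimal sketch of how a Gini index and a top-k share can be computed from a list of follower counts. The function names and the sample counts are hypothetical and are not the data behind the charts in this post; the formula is the standard rank-weighted form, which equals one minus twice the area under the piecewise-linear Lorenz curve.

```python
def gini(values):
    """Gini coefficient of non-negative numbers (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum: the i-th smallest value gets weight i (1-based).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def top_share(values, k):
    """Fraction of the total held by the k largest values."""
    xs = sorted(values, reverse=True)
    return sum(xs[:k]) / sum(xs)


# Hypothetical follower counts for ten accounts (illustration only).
followers = [0, 3, 12, 40, 85, 150, 320, 900, 4500, 98000]
print("Gini:", round(gini(followers), 2))
print("Share of top account:", round(top_share(followers, 1), 2))
```

With a single very large account in the list, both numbers end up close to 1, which is the pattern the charts above show for real follower data.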

Most of my followers have between 100 and 1000 followers themselves, as can be seen from this log-scale Histogram.

What kind of distribution does the number of followers have? It seems that Log[x] is roughly normally distributed, i.e. the follower counts are roughly log-normal.

As for participation inequality, let’s look at the number of tweets that those (107) followers send out.

Some of them have not tweeted anything; the chattiest has sent more than 16,000 tweets. On average, each follower has sent 1280 tweets; the total of 137,000 tweets is again highly unequally distributed, with a Gini index of 0.77.

The top 10 make up about 2/3 of the entire conversation.

Again the bottom half hardly contributes to the number of tweets; however, the ramp in the top half is longer and not quite as steep as with the number of followers. Here is the log-scale Histogram:

I did the same type of analysis for several other Twitter users in the central range (between 100 and 1000 followers). The results are similar, but the samples are certainly not yet large enough to rule out statistical sampling errors. (A larger scale analysis would require a higher Twitter API limit than my free 350 requests per hour.)

These preliminary results indicate that there are high degrees of inequality regarding the number of tweets people send out and even more so regarding the number of followers they accumulate. How many tweets Twitter users send out over time is more evenly distributed. How many followers they get is less evenly distributed and thus leads to extremely high degrees of inequality. I presume this is caused in part by preferential attachment as described in Barabasi’s book “Linked: The new science of networks”. As with all forms of attention, who people follow depends a lot on who others are following. There is a very long tail of small numbers of followers for the vast majority of Twitter users.

That said, the degree of participation inequality I found was lower than the 90-9-1 rule, which corresponds to an extreme Gini index of about 0.96. Perhaps that’s a sign of the Twitter community having evolved over time? Or perhaps just a sign of my analysis sample being too small and not representative of the larger Twitterverse.
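The 90-9-1 rule only fixes the population shares, so the Gini index it implies depends on how the contributions are split among the three groups. The sketch below computes the Gini index from grouped data by taking the area under a piecewise-linear Lorenz curve. The contribution splits are assumptions chosen for illustration (lurkers contribute nothing in both), and the function name is hypothetical.

```python
def gini_from_groups(groups):
    """Gini index from (population_share, contribution_share) pairs.

    Groups must be ordered from least to most active per head, and
    contributions are assumed to be equal within each group.
    """
    area = 0.0
    cumulative = 0.0
    for population, contribution in groups:
        # Trapezoid under the Lorenz curve across this group.
        area += population * (2.0 * cumulative + contribution) / 2.0
        cumulative += contribution
    return 1.0 - 2.0 * area


# Assumed split A: the 9% contribute 10%, the top 1% contribute 90%.
split_a = [(0.90, 0.0), (0.09, 0.10), (0.01, 0.90)]
# Assumed split B: the 9% contribute 30%, the top 1% contribute 70%.
split_b = [(0.90, 0.0), (0.09, 0.30), (0.01, 0.70)]

print(round(gini_from_groups(split_a), 2))  # 0.98
print(round(gini_from_groups(split_b), 2))  # 0.96
```

Either way the result lands well above the 0.77 measured for tweets in my sample, so the comparison holds regardless of which split one assumes.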

In some ways these new media are refreshing as they allow almost anyone to publish their thoughts. However, it’s also true that almost all of those users remain in relative obscurity and only a very small minority gets the lion’s share of all attention. If you think economic inequality is too high, keep in mind that attention inequality is far higher. Both are impacting the policy debate in interesting ways.

Turning social media attention into income is another story altogether. In his recent Blog post “Turning social media attention into income”, author Srinivas Rao muses:

“The low barrier to entry created by social media has flooded the market with aspiring entrepreneurs, freelancers, and people trying to make it on their own. Standing out in it is only half the battle. You have to figure out how to turn social media attention into social media income. Have you successfully evolved from blogger to entrepreneur? What steps should I take next?”

 

Posted by on December 6, 2011 in Industrial, Scientific, Socioeconomic

 


World Cartogram of Mobile Phone Adoption


Under the slogan “Our Changing World”, FedEx has developed a website with various cartograms showing world-wide socio-economic changes based on publicly available data from sources such as World Bank, UNESCO, World Health Organization and others.

Cartograms visualize a particular metric by adjusting each country’s size according to that metric. They leave country neighborhood relationships (which we blogged about here) intact, but inflate or deflate countries, often dramatically so. Here is a series of three cartograms showing the adoption of mobile phones in the years 1995, 2000, and 2008. The size of each country is proportional to the density of mobile phones (average number of mobile phones per 100 people).

Mobile Phone Density 1995

Mobile Phone Density 2000

Mobile Phone Density 2008

From the Topic Info on the Mobile Phone Presence display:

In 1996, mobile phones were a Nordic phenomenon. A Swede was twice as likely as an American to own one, and five times as likely as a German. Skip forward four years and the picture changed radically. Mobile phone usage boomed ten-fold across Europe; most European nations caught up with their northern neighbours. Eight years later, Africa suddenly loomed large. Mobile-phone penetration in some emerging economies now outstrips that of the developed world; Algeria tops the US. In most countries, mobile phone use is now ubiquitous. Lacking a mobile phone is more striking today than possessing one.

Indeed, it’s hard to find a country with a very small mobile phone presence, and then to pinpoint it on the cartogram. One country I found was Cuba: While most countries in the Americas have between 50 and 100, Cuba has only 3 mobile phones per 100 people.

A few months ago Nathan Yau covered this topic on his FlowingData Blog here. As he already suggested, there are many more data to explore on FedEx’s website, so check it out for yourself here.

 

Posted by on November 20, 2011 in Industrial, Scientific, Socioeconomic

 


The Observatory of Economic Complexity


In this second part we will look at the online interactive visualizations as a companion to the first part’s Atlas of Economic Complexity. It’s interesting that the authors chose the title “Observatory”, as if to convey that with a good (perhaps optical) instrument you can reveal otherwise hidden structure. To repeat one of the fundamental tenets of this Blog: Interactive graphics allow the user to explore data sets and thus to develop a better understanding of the structure and potentially create otherwise inaccessible insights. This is a good example.

The two basic dimensions for exploration of trade data are products and countries. The most recent world trade data is from 2009 and it reaches back between 20 and 50 years (varying by country). I worked with three types of charts: TreeMaps, Stacked Area Charts, and the Product Space network diagram. Let’s start with Germany’s Exports in 2009:

Hovering the cursor over a node highlights its details, here “Printing Presses”, a product type where Germany enjoys a high degree of Revealed Comparative Advantage (RCA). (For details on RCA or any other aspects of the product space concept and network diagram, please see the previous post on the Atlas of Economic Complexity.) We can now explore which other countries are exporting printing presses:

While Germany clearly dominates this world market with 55% at $2.7b in 2009 with RCA = 5.6, the time slider at the bottom (with data since 1975) reveals that it has actually held an even bigger lead for most of the last 35 years. For example, with its exports of Printing Presses Germany commanded 72% at $3.7b in 2001 with RCA = 6.3. From the timeline one can also see how the United States captured about 20% of this (then much smaller) market for a brief period between 1979 and 1983. During this time its RCA for Printing Presses was just a bit above 1.0, which shows as a black square in the Product Space, but the United States has since lost this advantage and not seen any significant exports in this product type. Printing Presses being a fairly complex product, only a handful of countries export them, almost all of them in Europe, plus Japan. There might be an interesting correlation between complexity and inequality, as the capabilities for the production of complex products tend to cluster in a few countries worldwide which then dominate world exports accordingly.

Another powerful instrument is the Stacked Area Chart. Here you can see how a country’s Imports or Exports evolve over time, either in terms of absolute value or relative share of product types. For example, let’s look at the last 30 years (1978-2008) of Export data for the United States:

This GIF file (click if not animated) shows several frames. In Value display style one can see the absolute size and how Exports grew roughly 10-fold from about $100b to $1t over the course of those 30 years. The Share display style focuses on relative size, with all Exports always representing 100%. In the Observatory one can hover over any product type and thus highlight that color band to see the evolution of this product type’s Exports over time. In the highlighted example here, we can see how ‘Cereal and Vegetable Oil’ (yellow band) shrank from around 15% in the late seventies to around 5% since the late nineties. ‘Chemicals and Health Related Products’ (purple band) has remained more or less constant around a 10% Export share. ‘Electronics’ bloomed in the mid eighties from less than 10% to 15-20% and stayed on the high end of that range until around the year 2000 before shrinking in the last decade down to about 10%.
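The difference between the Value and Share display styles comes down to a per-year normalization. Here is a minimal sketch of that step; the function name and the export figures are made up for illustration and are not taken from the Observatory.

```python
def to_shares(values_by_year):
    """Turn {year: {category: value}} into {year: {category: percent}}."""
    shares = {}
    for year, by_category in values_by_year.items():
        total = sum(by_category.values())
        shares[year] = {
            category: (100.0 * value / total if total else 0.0)
            for category, value in by_category.items()
        }
    return shares


# Made-up export values in $bn (illustration only).
exports = {
    1978: {"Cereals": 15.0, "Electronics": 8.0, "Other": 77.0},
    2008: {"Cereals": 50.0, "Electronics": 100.0, "Other": 850.0},
}

shares = to_shares(exports)
for year in sorted(shares):
    print(year, {c: round(s, 1) for c, s in shares[year].items()})
```

In this toy data the cereals band more than triples in value yet shrinks from 15% to 5% in share, which is why it is worth looking at both display styles.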

As a final example, look at the relative size of imports of the United States over the last 40 years, (1968 – 2008, sorted by final value):

The biggest category is crude petroleum products at the bottom. During the two oil shocks in the seventies the percentage peaked near 30% of all imports. Then it went down and stayed below 10% between 1985 and 2005. Since then its percentage has been steadily rising and has reached about 15% again. (The data isn’t recent enough to illustrate the impact of the 2008 recession.) Such high expenses are crowding out other categories. When the consumer pays more at the pump there is less to spend on other product types. Another interesting aspect of this last chart is that the bottom two bands represent opposite ends of the product complexity spectrum: Petroleum (brown) on the low end, cars (blue) on the high end.

As always, the real power of interactive visualizations comes from interacting with them. So I encourage you to explore these data at the Observatory of Economic Complexity.

Caveats: I noticed a few minor areas which seem to be incomplete, counter-intuitive, poor design choices or simply implementation bugs. To start, there is no help or documentation for the visualization tool itself. Many of the diagram types on the left are grayed out and it is not always apparent what selection of products, countries or chart type will enable certain subselections. For example, there is a chart type “Predictive Tools” with two subtypes “Density Bars” and “Stepping Stone” that always seem to be grayed out. The same applies to Maps (presumably geographic maps): all subtypes are grayed out. Perhaps I am missing something; I would appreciate any comments if that’s the case.

In the TreeMaps for import and export one cannot see the total value of the overall trade (top-level rectangle) or of any of the categories (second-level rectangles). Only the tooltips will show the value of a specific product type or country (third-level rectangle). The color legend is designed for the product space and designates the 34 communities of product types. When you hover the mouse over one product type, say garments (in green), then all imports / exports other than that product type are grayed out. When you show a product import / export chart, however, those same colors are used to designate groups of countries with color indicating continents (blue for Europe, red for the Americas, green for Asia etc.). Yet when you hover over the product icon in the legend (say garments), only the countries in the corresponding color remain highlighted, which doesn’t make sense and can be misleading.

When you play the timeline in a TreeMap, the frequent change in layout can be confusing. A change from one year to the next played back and forth slowly or multiple times can be instructive, but a quick series of too many changes (particularly without seeing the labels) is just confusing.

In the stacked area charts when you click on Build Visualization it always comes up in “Value” style, even if “Share” is selected. To get to the Share style, you have to select Value and then Share again.

TreeMaps and Stacked Area Charts critically depend on the availability of data for all products / countries displayed. For years before 1990 there appear to be pockets of only sparsely available data, which then falsely suggests world market dominance of those products or countries. For example, the TreeMap for Imports in Printing Presses for 1983 shows the United States with 97%, taking practically the entire market. In 1984, its share shrinks to a more balanced 28% despite growing very rapidly in absolute terms, simply because data for other countries from Europe, Asia etc. does not seem to be available prior to 1984. In such cases it would have been better to show the rest as a gray rectangle instead of leaving it out (if world import data are available) or just not display any chart for years with grossly incomplete data.

Navigation is somewhat limited. For example, looking at a country chart (say United Kingdom), it would be great to click on any product type (say crude petroleum) and get to a corresponding Stacked Area Chart diagram for that product type. One can do so using the drop-down boxes on the right, but that’s less intuitive.

There are two export formats (PDF and SVG). The vector graphics format is a good choice since the fonts render well even in small print. I obtained poor results with PDF, however, as the texts in TreeMaps were often not aligned properly and were printed on top of one another.

None of the above is a serious problem or even a showstopper. It would be great, however, if there was a feedback link to provide such info back to the authors and help improve the utility of this observatory.

 

Posted by on November 14, 2011 in Industrial, Scientific, Socioeconomic

 


The Atlas of Economic Complexity


Here is a recipe: Bring together renowned faculties like the MIT Media Lab and Harvard’s Center for International Development. Combine novel ideas about economic measures with years of solid economic research. Leverage large sets of world trade data. Apply network graph theory algorithms and throw in some stunning visualizations. The result: The Atlas of Economic Complexity, a revolutionary way of looking at world trade and understanding variations in countries’ paths to prosperity.

The main authors are Professors Ricardo Hausmann from Harvard and Cesar Hidalgo from MIT (whose graphic work on Human Development Indices we have reviewed here). The underlying research began in 2006 with the idea of the product space which was published in Science in 2007. This post is the first in a two-part series covering both the atlas (theory, documentation) as well as the observatory (interactive visualization) of economic complexity. This research is an excellent example of how the availability of large amounts of data, computing power and free distribution via the Internet enable entirely new ways of looking at and understanding our world.

The Atlas of Economic Complexity is rooted in a set of ideas about how to measure economies based not just on the quantity of products traded, but also on the knowledge and capabilities required to produce them. World Trade data allows us to measure import and export product quantities directly, leading to indicators such as GDP, GDP per capita, Growth of GDP etc. However, we have no direct way to measure the knowledge required to create the products. A central observation is that complex products require more capabilities to produce, and countries that manufacture more complex products must possess more of these capabilities than those that do not. From Part I of the Atlas:

Ultimately, the complexity of an economy is related to the multiplicity of useful knowledge embedded in it. For a complex society to exist, and to sustain itself, people who know about design, marketing, finance, technology, human resource management, operations and trade law must be able to interact and combine their knowledge to make products. These same products cannot be made in societies that are missing parts of this capability set. Economic complexity, therefore, is expressed in the composition of a country’s productive output and reflects the structures that emerge to hold and combine knowledge.

Can we analyze world trade data in such a way as to tease out relative rankings in terms of these capabilities?

To this end, the authors start by looking at the trade web of countries exporting products. For each country, they examine how many different products it is capable of producing; this is called the country’s Diversity. And for each product, they look at how many countries can produce it; this is called the product’s Ubiquity. Based on these two measures, Diversity and Ubiquity, they introduce two complexity measures: The Economic Complexity Index (ECI, for a country) and the Product Complexity Index (PCI, for a product).
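As a rough illustration of these two counts, here is a minimal sketch that starts from export values, computes RCA, and then counts Diversity and Ubiquity. It assumes the standard Balassa definition of RCA (a product's share in a country's exports divided by its share in world exports) and assumes a product counts as "produced" when RCA is at least 1; the country names, product names and values are made up.

```python
def rca_matrix(exports):
    """exports: {country: {product: value}} -> {country: {product: RCA}}."""
    products = sorted({p for row in exports.values() for p in row})
    country_total = {c: sum(row.values()) for c, row in exports.items()}
    product_total = {
        p: sum(row.get(p, 0.0) for row in exports.values()) for p in products
    }
    world_total = sum(country_total.values())
    rca = {}
    for c, row in exports.items():
        rca[c] = {}
        for p in products:
            if country_total[c] == 0 or product_total[p] == 0:
                rca[c][p] = 0.0
                continue
            share_in_country = row.get(p, 0.0) / country_total[c]
            share_in_world = product_total[p] / world_total
            rca[c][p] = share_in_country / share_in_world
    return rca


def diversity_and_ubiquity(rca, threshold=1.0):
    """Count products per country and countries per product with RCA >= threshold."""
    diversity = {c: sum(1 for v in row.values() if v >= threshold)
                 for c, row in rca.items()}
    ubiquity = {}
    for row in rca.values():
        for p, v in row.items():
            ubiquity[p] = ubiquity.get(p, 0) + (1 if v >= threshold else 0)
    return diversity, ubiquity


# Made-up export values (illustration only).
toy_exports = {
    "A": {"logs": 10.0, "imaging": 30.0, "presses": 20.0},
    "B": {"logs": 20.0, "diamonds": 20.0},
    "C": {"logs": 15.0, "imaging": 5.0},
}
rca = rca_matrix(toy_exports)
diversity, ubiquity = diversity_and_ubiquity(rca)
print(diversity)
print(ubiquity)
```

Note that country C exports some imaging devices but does not reach RCA of 1 for them, so they do not count towards its Diversity.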

The mechanics of how these measures are calculated are somewhat sophisticated. Yet they encode some straightforward observations and are explained with some examples:

Take medical imaging devices. These machines are made in few places, but the countries that are able to make them, such as the United States or Germany, also export a large number of other products. We can infer that medical imaging devices are complex because few countries make them, and those that do tend to be diverse. By contrast, wood logs are exported by most countries, indicating that many countries have the knowledge required to export them. Now consider the case of raw diamonds. These products are extracted in very few places, making their ubiquity quite low. But is this a reflection of the high knowledge-intensity of raw diamonds? Of course not. If raw diamonds were complex, the countries that would extract diamonds should also be able to make many other things. Since Sierra Leone and Botswana are not very diversified, this indicates that something other than large volumes of knowledge is what makes diamonds rare.
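The diamonds argument can be made concrete by looking, for each product, at the average Diversity of the countries that export it. The sketch below does this on hypothetical data with anonymous countries; it only covers this first refinement step and is not the full ECI/PCI calculation from the Atlas.

```python
def average_exporter_diversity(products_by_country):
    """For each product, return (ubiquity, mean diversity of its exporters)."""
    diversity = {c: len(ps) for c, ps in products_by_country.items()}
    exporters = {}
    for country, products in products_by_country.items():
        for product in products:
            exporters.setdefault(product, []).append(country)
    return {
        product: (len(cs), sum(diversity[c] for c in cs) / len(cs))
        for product, cs in exporters.items()
    }


# Hypothetical export baskets (illustration only).
baskets = {
    "Country A": {"medical imaging", "machinery", "chemicals", "cars", "wood logs"},
    "Country B": {"medical imaging", "machinery", "chemicals", "cars"},
    "Country C": {"raw diamonds", "wood logs"},
    "Country D": {"raw diamonds"},
    "Country E": {"wood logs", "chemicals"},
}

stats = average_exporter_diversity(baskets)
for product in ("medical imaging", "raw diamonds", "wood logs"):
    print(product, stats[product])
```

In this toy data medical imaging and raw diamonds are equally rare (two exporters each), but the imaging exporters are far more diversified, which is the signal the quoted passage uses to tell complexity apart from mere scarcity.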

A useful question is this: If a good cannot be produced in a country, where else can it be produced? Countries with higher economic complexity tend to produce more complex products which cannot easily be produced elsewhere. The algorithms are specified in the Atlas, but we will skip over these details here. Let’s take a look at the ranking of some 128 countries (selected for exceeding a minimum population size and trade volume as well as for reliable trade data availability).

Why is Economic Complexity important? The Atlas devotes an entire chapter to this question. The most important finding here is that ECI is a better predictor of a country’s future growth than many other commonly used indicators that measure human capital, governance or competitiveness.

Countries whose economic complexity is greater than what we would expect, given their level of income, tend to grow faster than those that are “too rich” for their current level of economic complexity. In this sense, economic complexity is not just a symptom or an expression of prosperity: it is a driver.

They include a lot of scatter-plots and regression analysis measuring the correlation between the above and other indicators. Again, the interested reader is referred to the original work.

Another interesting question is how Economic Complexity evolves. In some ways this is like a chicken & egg problem: For a complex product you need a lot of capabilities. But for any capability to provide value you need some products that require it. If a new product requires several capabilities which don’t exist in a country, then starting the production of such a product in the country will be hard. Hence, a country’s products tend to evolve along the already existing capabilities. Measuring the similarities in required capabilities directly would be fairly complicated. However, as a first approximation, one can deduce that products which are more often produced by the same country tend to require similar capabilities.

So the probability that a pair of products is co-exported carries information about how similar these products are. We use this idea to measure the proximity between all pairs of products in our dataset (see Technical Box 5.1 on Measuring Proximity). The collection of all proximities is a network connecting pairs of products that are significantly likely to be co-exported by many countries. We refer to this network as the product space and use it to study the productive structure of countries.
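Technical Box 5.1 is not reproduced in this post, so the following is only a sketch under an assumption: proximity between two products is taken as the smaller of the two conditional probabilities of exporting one given the other, which is the same as the number of co-exporting countries divided by the larger of the two ubiquities. The data and names are hypothetical.

```python
from itertools import combinations


def proximity(products_by_country):
    """Return {(p, q): proximity} for all product pairs, with p < q."""
    ubiquity = {}
    for products in products_by_country.values():
        for p in products:
            ubiquity[p] = ubiquity.get(p, 0) + 1
    result = {}
    for p, q in combinations(sorted(ubiquity), 2):
        both = sum(1 for products in products_by_country.values()
                   if p in products and q in products)
        # min(P(p|q), P(q|p)) == both / max(ubiquity_p, ubiquity_q)
        result[(p, q)] = both / max(ubiquity[p], ubiquity[q])
    return result


# Hypothetical export baskets (illustration only).
baskets = {
    "Country 1": {"shirts", "trousers", "engines"},
    "Country 2": {"shirts", "trousers"},
    "Country 3": {"engines", "pumps"},
    "Country 4": {"shirts"},
}

prox = proximity(baskets)
for pair, value in sorted(prox.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

Taking the minimum keeps the measure symmetric and prevents a very rare product from appearing close to everything its single exporter happens to make.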

Then the authors proceed to visualize the Product Space. It is a graph with some 774 nodes (products) and edges representing the proximity values between those nodes. Only the top 1% strongest proximity edges are shown to keep the average degree of the graph below 5 (showing too many connections results in visual complexity). Network Science Algorithms are used to discover the highly connected communities into which the products naturally group. Those 34 communities are then color-coded. Using a combination of Minimum-Spanning-Tree and Force-Directed layout algorithms the network is then laid out and manually optimized to minimize edge crossings. The resulting Product Space graph looks like this:
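To get a feel for the edge selection step, here is a small sketch of one way to combine a spanning tree with a cut-off on the strongest edges. How exactly the Atlas combines the two is an assumption on my part: the sketch keeps a maximum spanning tree so that connected products stay connected, then adds the strongest fraction of all edges on top. The layout step (force-directed placement and manual cleanup) is not covered, and the function name and weights are made up.

```python
def backbone(prox, keep_fraction=0.01):
    """Select edges: maximum spanning forest plus the strongest edges overall.

    prox: {(node_a, node_b): weight}. Returns a set of (node_a, node_b) keys.
    """
    edges = sorted(prox.items(), key=lambda kv: kv[1], reverse=True)
    parent = {node: node for pair in prox for node in pair}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    kept = set()
    # Kruskal's algorithm on descending weights yields a maximum spanning forest.
    for (a, b), weight in edges:
        if weight <= 0:
            continue  # products that are never co-exported stay unlinked
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            kept.add((a, b))
    # Add the strongest edges regardless of whether they close a cycle.
    n_top = max(1, int(round(keep_fraction * len(edges))))
    for (a, b), weight in edges[:n_top]:
        if weight > 0:
            kept.add((a, b))
    return kept


# Made-up proximities between four products (illustration only).
toy = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7, ("c", "d"): 0.2}
print(sorted(backbone(toy, keep_fraction=0.25)))
print(sorted(backbone(toy, keep_fraction=0.75)))
```

With the small fraction only the tree survives; with the larger one the a-c edge is added back even though it closes a cycle, which is how the denser clusters in the Product Space picture arise.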

Here the node size is determined by world trade volume in the product. If you step back for a moment and reflect on how much data is aggregated in such a graph it is truly amazing! One variation of the graph determines size by the Product Complexity as follows:

In this graph one can see that products within a community are of similar complexity, supporting the idea that they require similar capabilities, i.e. have high proximity. From these visualizations one can now analyze how a country moves through product space over time. Specifically, in the report there are graphs for the four countries Ghana, Poland, Thailand, and Turkey over three points in time (1975, 1990, 2009). From the original document I put together a composite showing the first two countries, Ghana and Poland.

While Ghana’s ECI doesn’t change much, Poland grows into many products similar to those where they started in 1975. This clearly increases Poland’s ECI and contributes to the strong growth Poland has seen since 1975. (Black squares show products produced by the country with a Revealed Comparative Advantage RCA > 1.0.)

In all cases we see that new industries –new black squares– tend to lie close to the industries already present in these countries. The productive transformation undergone by Poland, Thailand and Turkey, however, look striking compared to that of Ghana. Thailand and Turkey, in particular, moved from mostly agricultural societies to manufacturing powerhouses during the 1975-2009 period. Poland, also “exploded” towards the center of the product space during the last two decades, becoming a manufacturer of most products in both the home and office and the processed foods community and significantly increasing its participation in the production of machinery. These transformations imply an increase in embedded knowledge that is reflected in our Economic Complexity Index. Ultimately, it is these transformations that underpinned the impressive growth performance of these countries.

The Atlas goes on to provide rankings of countries along five axes such as ECI, GDP per capita Growth, GDP Growth etc. The finding that higher ECI is a strong driver for GDP growth allows for predictions about GDP Growth until 2020. In that ranking, Sub-Saharan East African countries are at the top (8 of the Top 10), led by Uganda, Kenya and Tanzania. Here is the GDP Growth ranking in graphical form; the band around the Indian Ocean is where the most GDP Growth is going to happen during this decade.

Each country has its own Product Space map. It shows which products and capability sets the country already has, which other similar products it could produce with relatively few additional capabilities and where it is more severely lacking. As such it can provide useful information to either the country itself or a multinational firm looking to expand. The authors sum up the chapter on how this Atlas can be used as follows:

A map does not tell people where to go, but it does help them determine their destination and chart their journey towards it. A map empowers by describing opportunities that would not be obvious in the absence of it. If the secret to development is the accumulation of productive knowledge, at a societal rather than individual level, then the process necessarily requires the involvement of many explorers, not just a few planners. This is why the maps we provide in this Atlas are intended for everyone to use.

We will look at the rich visualizations of the data sets in this Atlas in a forthcoming second installment of this series.

 

Posted by on November 10, 2011 in Industrial, Scientific, Socioeconomic

 


Implementation of TreeMap


After posting on TreeMaps twice before (TreeMap of the Market and original post here) I wanted to better understand how they can be implemented.

In his book “Visualize This” – which we reviewed here – author Nathan Yau has a short chapter on TreeMaps, which he also published on his FlowingData Blog here. He is working with the statistical programming language R and uses a library which implements TreeMaps. While this allows for very easy creation of a TreeMap with just a few lines of code, from the perspective of how the TreeMap is constructed this is still a black box.

I searched for existing implementations of TreeMaps in Mathematica (which I am using for many visualization projects). Surprisingly I didn’t find any implementations, despite the 20 year history of both the Mathematica platform and the TreeMap concept. So I decided to learn by implementing a TreeMap algorithm myself.

Let’s recap: A TreeMap turns a tree of numeric values into a planar, space-filling map. A rectangular area is subdivided into smaller rectangles with sizes in relation to the values of the tree nodes. The color can be mapped based on either that same value or some other corresponding value.

One algorithm for TreeMaps is called slice-and-dice. It starts at the top-level and works recursively down to the leaf level of the tree. Suppose you have N values at any given level of the tree and a corresponding rectangle.
a) Sort the values in descending order.
b) Select the smallest number k (0<k<N) of leading values whose sum reaches at least the split-ratio times the total of all values.
c) Split the rectangle into two parts according to split-ratio along its longer side (to avoid very narrow shapes).
d) Allocate the first k values to the split-off part, the remaining N-k values to the rest of the rectangle.
e) Repeat as long as you have sublists with more than one value (N>1) at current level.
f) For each node at current level, map its sub-tree onto the corresponding rectangle (until you reach leaf level).

As an example, consider the list of values {6,5,4,3,2,1}. Their sum is 21. If we have a split-ratio parameter of, say, 0.4, then we split the values into {6,5} and {4,3,2,1}: the first value alone falls short (6/21 = 0.29 < 0.4), while the first two reach the threshold ((6+5)/21 = 0.52 > 0.4). We then continue with {6,5} in the first portion of the rectangle and with {4,3,2,1} in the other portion.
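The steps above can be sketched in a few lines of Python. This is not the Mathematica code behind the CDF model; all names (`split_index`, `layout_level`, `treemap`) are made up for illustration. The sketch assumes strictly positive values, and it assumes that the rectangle is cut in proportion to the actual share of the first k values rather than the split-ratio itself, so that every area stays proportional to its value.

```python
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]   # (x, y, width, height)
Tree = Union[float, List["Tree"]]          # a leaf value or a list of sub-trees


def total(node: Tree) -> float:
    """Sum of all leaf values below a node."""
    if isinstance(node, list):
        return sum(total(child) for child in node)
    return float(node)


def split_index(weights: List[float], ratio: float) -> int:
    """Step b: smallest k (0 < k < N) whose first k weights reach ratio * total."""
    target = ratio * sum(weights)
    running = 0.0
    for k, w in enumerate(weights[:-1], start=1):
        running += w
        if running >= target:
            return k
    return len(weights) - 1


def layout_level(nodes: List[Tree], rect: Rect, ratio: float) -> List[Tuple[Tree, Rect]]:
    """Steps a-e: assign one rectangle to each node of a single tree level."""
    if len(nodes) <= 1:
        return [(node, rect) for node in nodes]
    nodes = sorted(nodes, key=total, reverse=True)       # step a
    weights = [total(node) for node in nodes]
    k = split_index(weights, ratio)                      # step b
    share = sum(weights[:k]) / sum(weights)              # assumption: cut by actual share
    x, y, w, h = rect
    if w >= h:                                           # step c: cut along the longer side
        first = (x, y, w * share, h)
        rest = (x + w * share, y, w * (1 - share), h)
    else:
        first = (x, y, w, h * share)
        rest = (x, y + h * share, w, h * (1 - share))
    # steps d and e: keep splitting both parts until each holds a single node
    return layout_level(nodes[:k], first, ratio) + layout_level(nodes[k:], rest, ratio)


def treemap(tree: Tree, rect: Rect, ratio: float = 0.4) -> List[Tuple[float, Rect]]:
    """Step f: lay out one level, then map each sub-tree onto its rectangle."""
    if not isinstance(tree, list):
        return [(float(tree), rect)]
    leaves: List[Tuple[float, Rect]] = []
    for node, node_rect in layout_level(tree, rect, ratio):
        leaves.extend(treemap(node, node_rect, ratio))
    return leaves
```

Calling `treemap([6, 5, 4, 3, 2, 1], (0.0, 0.0, 21.0, 10.0), 0.4)` returns one rectangle per value, and the first cut separates {6,5} from {4,3,2,1} as in the example. A nested list such as `[[3, 1], [2, 2]]` is treated as a two-level tree.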

Let's look at the results of such an algorithm. Here I'm using a two-level tree with a branching factor of 6 and random values between 0 (dark) and 100 (bright). The animation is iterating through various split-ratios from 0.1 to 0.9:

Notice how the layout changes as a result of the split-ratio parameter. If it’s near 0 or 1, then we tend to get thinner stripes; when it’s closer to 0.5 we get more square-shaped containers (i.e. aspect ratios closer to 1).

The recursive algorithm becomes apparent when we use a tree with two levels. You can still recognize the containers from level 1 which are then sub-divided at level 2:

One of the fundamental tenets of this Blog is that interactive visualizations lead to better understanding of structure in the data or of the dynamic properties of a model. You can interact with this algorithm in the TreeMap model in Computable Document Format (CDF). Simply click on the graphic above and you get redirected to a site where you can interact with the model (requires one-time loading of the free CDF Browser Plug-In). You can change the shape of the outer rectangle, adjust the tree level and split-ratio and pick different color-schemes. The values are shown as Tooltips when you hover over the corresponding rectangle. You also have access to the Mathematica source code if you want to modify it further. Here is a TreeMap with three levels:

Of course, a more complete implementation would let you vary the color-controlling parameter, filter the values and re-arrange the dimensions as different levels of the tree. Perhaps someone can start with this Mathematica code and take it to the next level. The previous TreeMap post points to several tools and galleries with interactive applications so you can experiment with that.

Lastly, I wanted to point out a good article by the creator of TreeMaps, Ben Shneiderman. In this 2006 paper called “Discovering Business Intelligence Using Treemap Visualizations” he cites various BI applications of TreeMaps. Several studies have shown that TreeMaps allow users to recognize certain patterns in the data (like best and worst performing sales reps or regions) faster than with more traditional chart techniques. No wonder that TreeMaps are finding their way into more and more tools and Dashboard applications.

 

Posted by on November 9, 2011 in Industrial, Scientific

 


TreeMap of the Market

SmartMoney has an interactive visual tool on their website called “Map of the Market”. It is an application of the TreeMap concept developed by Ben Shneiderman which I have blogged about before here.

The map lets you watch more than 500 stocks at once, with data updated every 15 minutes. Each colored rectangle in the map represents an individual company. The rectangle’s size reflects the company’s market cap and the color shows price performance. (Green means the stock price is up; red means it’s down. Dark colors are neutral). Move the mouse over a company rectangle and a little panel will pop up with more information.

Map Of The Market (Source: SmartMoney website)

For example, the above map shows the 26-week performance with the Top 5 Losers highlighted (hovered over RIMM). More information is available on the corresponding Map Instructions page.

This map is also quite similar in concept to the StockTouch iPad app which I covered here. StockTouch displays 900 companies, grouped into 9 sectors. The above Map of the Market is a free service, with an available upgrade to one showing 1000 companies for a subscription fee. The business model for monetizing such information is interesting in its own right, but it is not the topic of this post.

It might be interesting to put together a time-lapse video showing this map for every close of business day throughout one year. Not only would one see the up and down movement by color, but also the gradual shifts in the cumulative size of the various sectors, which show up as changes in their area in the tree map.

Another fascinating set of tree map uses is on display at the Gallery of the Hive Group website. Their interactive tree map product HoneyComb has been used in many different industries. The Gallery shows many examples, ranging from sales performance to manufacturing / quality applications to public interest uses such as browsing Olympic Games results or data on Earthquakes. See the following example screenshot (click to interact on the Hive Group website):

TreeMap of Earthquakes (Source: HiveGroup)

While you won’t get the full benefit of seeing the details of all 540 items in one view, you can filter using the panel controls on the right or change the grouping and size and color attributes. This shows for example that the most powerful earthquakes are generally not the most deadly ones and vice versa.

Interacting with these sample tree maps again drives home the fundamental notion that interactive visualizations lead to quicker grasp and better understanding of data sets. This is similar to how walking around and seeing an object from different perspectives gives you a better idea of its 3-D structure than seeing it just in one 2-D picture. With multiple ways of interacting it feels almost as if you’re walking inside the data set to see it from multiple angles and perspectives. You have to do it yourself to appreciate the difference it makes.

Lastly, a good article on some of the pitfalls of tree map design with lots of links to good/bad examples comes from the folks at Juice Analytics in their Blog post titled “10 lessons in Treemap Design”.

 

Posted by on October 29, 2011 in Financial, Industrial

 


Share and Inequality of Mobile Phone Revenues and Volumes

The analyst website Asymco.com visualizes various financial indicators of mobile phone companies in this interactive vendor bubble chart (follow link, select “Vendor Charts”). It covers the following 8 companies: Apple, HTC, LG, Motorola, Nokia, RIM, Samsung, Sony Ericsson. From the “vendor data” tab I downloaded the data and looked at the revenue and volume distributions for the last 4 years.

Revenue Share of Mobile Phones and corresponding Gini Index

Note the sharp reduction in inequality of revenue distribution in the 9/1/08 quarter, when Apple achieved nearly 10x the revenue (and volume) of the year before. While the iPhone 1 was introduced a year earlier in 2007, in commercial terms the iPhone 3G started to have strong market impact when introduced in the second half of 2008.

Volume Share of Mobile Phones and Gini Index

Volume inequality is considerably higher (average Gini = 0.61) than Revenue inequality (0.43) due to two dominant shippers (Nokia and Samsung), which continue to lead the peer group in volume. Only recently has the inequality been reduced, i.e. the volumes are distributed more evenly. Apple’s growth in volume share has come at the expense of other players (mainly Motorola and Sony Ericsson).

Volume share is a lagging indicator regarding a company’s innovation and success. It can be dominated for a long time by players who are past their prime and in financial distress (like Nokia). Revenue is more useful to predict a company’s future growth and success. But the real story is told when comparing Profit. Apple’s (Smart Phone) Profit dwarfs that of the other 7 competitors:

Profit Comparison between 8 Mobile Phone Vendors (Source: Asymco.com)

Click on the image to go to Asymco’s interactive chart (requires Flash). The bubble chart display over time is very revealing regarding Apple’s meteoric rise.

 

Posted by on October 22, 2011 in Financial, Industrial

 


Market Capitalization Inequality in the Steve Jobs era

The excellent analyst website asymco.com recently published a post titled Visualizing the Steve Jobs era. In it they display an area chart of the relative size of market capitalization of about 15 companies they have tracked for the last 15 years.

Since I had looked at the Gini index of a similar set of companies in an earlier post on Visualizing Inequality I contacted the author Dirk Schmidt. Thankfully he shared the underlying data. From that I calculated the Gini index for every quarter and overlaid a line chart with their area chart.

Share of Market Capitalization Area Chart overlaid with Gini Index

Dirk elaborated in his post and identified three distinct periods:

  • Restructuring of Apple 1997-2000 – Gini remains very high near 0.85 due to MSFT dominance
  • iTunes era 2001-2006 – Gini decreases to ~ 0.55 due to AAPL increase and taking share from other established players
  • Mobile devices era 2007-2011 – Gini increases again to 0.65 due to increasing dominance of AAPL and irrelevance of smaller players

Regardless of the absolute value of the Gini index – note the caveat from the earlier post that it is very sensitive to the number of contributors – the trend in the Gini can be an interesting signal. One company dwarfing every other like a monopoly corresponds to high Gini (here 0.85 due to MSFT dominance). A return to lower Gini values (here down to ~0.5) signals stronger competition with multiple entrants. The recent reversal of the Gini trend (up to 0.65 due to AAPL dominance) is a sign that investors see fewer choices when it comes to buying shares in those tech companies. Whether that’s a leading indicator for consumers seeing fewer choices in the marketplace is another question…

 

Posted by on September 29, 2011 in Financial, Industrial

 


Visualizing Inequality

Measuring and visualizing inequality is often the starting point for further analysis of underlying causes. Only with such understanding can one systematically influence the degree of inequality or take advantage of it. In previous posts on this Blog we have already looked at some approaches, such as the Lorenz-Curve and Gini-Index or the Whale-Curve for Customer Profitability Analysis. Here I want to provide another visual method and look at various examples.

Inequality is very common in economics. Competitors have different market shares and market capitalizations. Customers have different profitability for a company. Employees have different incomes across the industry. Countries have different GDP in the world economy. Households have different income and wealth in a population.

The Gini Index is an aggregate measure for the degree of inequality of any given distribution. It ranges from 0.0 (perfect equality, i.e. every element contributes the same amount) to 1.0 (the most extreme inequality, i.e. one element contributes everything and all other elements contribute nothing). (The previous post referenced above contains links to articles for the definition and calculation of the Gini index.)
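For a discrete distribution the index can be computed directly from the sorted values. The following is a minimal sketch using the standard sorted-rank formula; the function name `gini` and the `normalized` flag are made up for illustration. With n elements the plain formula peaks at (n-1)/n, so the value 1.0 is reached only in the limit of many elements or after rescaling by n/(n-1). Which variant produced the figures quoted in these posts is not stated, so treat that choice as an assumption.

```python
from typing import Sequence


def gini(values: Sequence[float], normalized: bool = False) -> float:
    """Gini index of non-negative values, computed from their sorted ranks."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Weight each value by its rank (1 = smallest, n = largest).
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    g = 2.0 * ranked / (n * total) - (n + 1) / n
    if normalized and n > 1:
        # Optional rescaling so that "one element has everything" gives exactly 1.
        g *= n / (n - 1)
    return g
```

Four equal values give an index of 0. Four values where one element contributes everything give 0.75 with the plain formula and 1.0 with the rescaling.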

There are several ways to visualize inequality, including the Lorenz-Curve. Here we look at one form of pie-charts for some discrete distributions. As a first example, consider the distribution of market capitalization among the Top-20 technology companies (Source: Nasdaq, Date: 9/17/11):

Market Cap of Top 20 Technology Companies on the Nasdaq

Apple, the largest company by far, is bigger than the bottom 10 combined. The first four (20%) companies – Apple, Microsoft, IBM, Google – are almost half of the entire size and thus almost the size of the other 16 (80%) combined. The pie-chart gives an intuitive sense of the inequality. The Gini Index gives a precise mathematical measure; for this discrete distribution it is 0.47.

Another example is a look at the top PC shipments in the U.S. (Source: IDC, Date: Q2’11):

U.S. PC Shipments in Q2'11

There is a similar degree of inequality (Gini = 0.46). In fact, this degree of inequality (Gini index ~ 0.5) is not unusual for such distributions in mature industries with many established players. However, consider the tablet market, which is dominated by Apple’s iOS (Source: Strategy Analytics, Date: Q2’11):

Worldwide Tablet OS shipments in Q2'11

Apple’s iOS captures 61%, Android 30%, and the other 3 categories combined are under 10%. This is a much stronger degree of inequality with Gini = 0.74.

To pick an example from a different industry, here are the top 18 car brands sold in the U.S. (Source: Market Data Center at WSJ.COM; Date: Aug-2011):

U.S. Total Car Sales in Aug-11

When comparing the Gini index values for these kinds of distributions it is important to realize the impact of the number of elements. More elements in the distribution (say Top-50 instead of Top-20) usually increase the Gini index. This is due to the impact of additional very small players. Suppose, for example, that instead of the Top-18 you left out the two companies with the smallest sales, namely Saab and Subaru, and plotted only the Top-16. Their combined sales are less than 0.4% of the total, so one wouldn’t expect to miss much. Yet you get a Gini index of 0.49 instead of 0.54. So with discrete distributions and a relatively small number of elements one risks comparing apples to oranges when the numbers of elements differ.
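The effect is easy to reproduce. The sketch below uses invented market shares (not the WSJ car sales data) and a compact version of the standard sorted-rank Gini formula. Dropping two players that together hold only 0.3% of the total lowers the index noticeably.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Plain (unnormalized) Gini index via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * sum(xs)) - (n + 1) / n


# Hypothetical shares in percent: five sizeable players plus two tiny ones.
all_players = [30.0, 25.0, 20.0, 15.0, 9.7, 0.2, 0.1]
top_five = sorted(all_players, reverse=True)[:5]

print(f"all 7 players: {gini(all_players):.2f}")
print(f"top 5 only:    {gini(top_five):.2f}")
```

The two tiny players barely change the total, but they add two elements at the very bottom of the ranking, which is what pushes the index up.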

Consider as a last example a comparison of the above with two other distributions from my own personal experience – the list of base salaries of 30 employees reporting to me at one of my previous companies as well as the list of contributions to a recent personal charity fundraising campaign.

Gini Index Comparison

What’s interesting is that the salary distribution has by far the lowest amount of inequality. You wouldn’t guess that from the feelings of employees, many of whom believe they are not getting their fair share while others are getting so much more… In fact, the skills and value contributions to the employer are probably far more unequal than the salaries! (Check out Paul Graham’s essay “Great Hackers” for more on this topic!)
And when it comes to donations, the amount people are willing to give to charitable causes differs immensely. We have seen this already in a previous post on Gini-Index with recent U.S. political donations showing an astounding inequality of Gini index = 0.89. I challenge you to find a distribution across so many elements (thousands) which has greater inequality. If you find one, please comment on this Blog or email me as I’d like to know about it.

 

Posted by on September 22, 2011 in Industrial, Scientific, Socioeconomic

 
