Category Archives: Socioeconomic

Link analysis on half-life of web content

The team at the URL-shortening website bitly has posted an interesting analysis of how long links shared on the Internet via different social media platforms hold people’s attention. This provides some quantification of what some have termed internet impatience. Most shared web links experience an initial burst of attention immediately after publication, followed by a steep decay to near-zero relative activity. A useful measure is a link’s half-life, defined as the time it takes, after the link has reached its peak click rate, to receive half of all the clicks it will still receive over the rest of its lifetime.
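As a rough illustration of the half-life measure, here is a minimal Python sketch. It assumes clicks are available as counts per fixed time interval (for example per hour) and that half-life means the number of intervals after the peak needed to collect half of all post-peak clicks. The function name `half_life` and the sample counts are hypothetical and are not taken from the original analysis.

```python
def half_life(clicks_per_interval):
    """Return how many intervals after the peak it takes to collect
    half of all post-peak clicks (illustrative helper only)."""
    if not clicks_per_interval:
        raise ValueError("need at least one interval")
    # Index of the busiest interval (the first one if there are ties).
    peak = max(range(len(clicks_per_interval)),
               key=clicks_per_interval.__getitem__)
    after_peak = clicks_per_interval[peak + 1:]
    total_after_peak = sum(after_peak)
    if total_after_peak == 0:
        return 0  # no clicks after the peak, nothing left to decay
    running = 0
    for intervals_elapsed, clicks in enumerate(after_peak, start=1):
        running += clicks
        if 2 * running >= total_after_peak:
            return intervals_elapsed


# Made-up hourly click counts, for illustration only.
hourly_clicks = [5, 50, 10, 10, 10, 10]
print(half_life(hourly_clicks))  # 2
```

With hourly buckets the result is a whole number of hours; finer buckets would give a finer estimate.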

From the Blog:

So we looked at the half life of 1,000 popular bitly links and the results were surprisingly similar. The mean half life of a link on twitter is 2.8 hours, on facebook it’s 3.2 hours and via ‘direct’ sources (like email or IM clients) it’s 3.4 hours. So you can expect, on average, an extra 24 minutes of attention if you post on facebook than if you post on twitter.

Distribution of web link half-lives (Source: Blog)

This half-life distribution plot (on the x-axis, 1 day = 86,400 seconds) of content shared via links shows some interesting patterns:

  • In general, content half-life is about 3 hours (roughly 10,000 seconds)
  • Content half-life depends only weakly on the medium through which it is shared
  • YouTube content has a different distribution and a considerably longer half-life (about 7 hours)

One is tempted to relate such stats to one’s own browsing experience or to look at systematic analyses of how people deal with shared links. For example, Microsoft’s Outlook team did extensive usability research on how people deal with incoming email so as to improve the usability of their mail reader. It was found that most emails fall into one of three categories (Open & Read immediately, Ignore & Discard, File & Flag for future reading). I speculate that links received via Twitter or email will be similar, perhaps with the added category of retweet or forward (in the case of a story going viral). YouTube being different can perhaps be attributed to the fact that many videos require more time, so we make a more deliberate decision as to whether and when we want to spend that time. For instance, one might say “I want to watch this video tonight when I get home from work”, which would fit with the 7-hour half-life.

In any event, such statistics show us that when it comes to clicking on shared links, our behavior is fairly predictable and probably driven by simple habits rather than complex thought. On one hand this allows good estimates for the expected life-time clicks. On the other hand, it can be a bit disconcerting to realize that our clicking behavior may be controlled by rather simple behavioral drivers (habitual classification, desire for instant gratification, out-of-sight out-of-mind, etc.). For instance, we usually give the most recent incoming news priority over other criteria of personal content preference. But is the latest really the greatest? I suspect that just like impulse-shopping there is a lot of impulse-clicking. And who does not know the exhausted feeling of getting lost while browsing and in hindsight regretting not having made the best use of one’s time… Perhaps this hints at more opportunities for more personalized and content-preference filtered news delivery mechanisms (such as the News reader app Zite, recently acquired by CNN).

Posted on September 9, 2011 in Scientific, Socioeconomic

Inequality, Lorenz-Curves and Gini-Index

In a previous post we looked at inequality of profits and the useful abstraction of the Whale-Curve to analyze Customer Profitability. Here I want to focus on inequality and its measurement and visualization in a broader sense.

A fundamental graphical representation of the form of a distribution is given by the Lorenz-Curve. It plots the cumulative contribution to a quantity over a contributing population. It is often used in economics to depict the inequality of wealth or income distribution in a population.

Lorenz Curve (Source: Wikipedia)

The Lorenz-Curve shows the y% contribution of the bottom x% of the population. The x-axis has the population sorted by increasing contributions (i.e., the poorest on the left and the richest on the right). Hence the Lorenz-Curve is always at or below the diagonal line, which represents perfect equality. (By contrast, the x-axis of the Whale-Curve sorts by decreasing profit contributions.)

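The Gini-Index is defined as G = A / (A + B), G = 2A or G = 1 – 2B, where A is the area between the diagonal and the Lorenz-Curve and B is the area under the Lorenz-Curve.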

Since each axis is normalized to 100%, A + B = 1/2 and all of the above are equivalent. Perfect equality means G = 0. Maximum inequality G = 1 is achieved if one member of the population contributes everything and everybody else contributes nothing.
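The relation G = 1 – 2B can be tried out numerically. The following Python sketch builds the Lorenz-Curve points from a list of individual contributions and integrates the area B under the curve with the trapezoid rule. The function names are hypothetical, and the sketch assumes non-negative contributions with a positive total.

```python
def lorenz_points(values):
    """Return (population share, cumulative contribution share) pairs."""
    ordered = sorted(values)  # poorest first, as on the Lorenz x-axis
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty list with a positive total")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(values):
    """Gini index G = 1 - 2B, with B the area under the Lorenz curve."""
    pts = lorenz_points(values)
    area_b = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area_b += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return 1.0 - 2.0 * area_b


print(gini_from_lorenz([1, 1, 1, 1]))  # 0.0  (perfect equality)
print(gini_from_lorenz([0, 0, 0, 1]))  # 0.75 (one member contributes everything)
```

For a finite population of n members the most unequal case gives G = 1 – 1/n (0.75 for n = 4), which approaches 1 as the population grows.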

An interesting interactive graph demonstrating Lorenz-Curves and corresponding Gini-Index values can be found at the Wolfram Demonstrations Project.

The Gini Index is often used to indicate the income or wealth inequality of countries. Its values are typically between 0.25 and 0.35 for modern, developed countries and higher in developing countries, for example 0.45 – 0.55 in Latin America and up to 0.70 in some African countries with extreme income inequality.

GINI index of world countries in 2009 (Source: Wikipedia)

Graphically, many different shapes of the Lorenz-Curve can lead to the same areas A and B, and hence many different distributions of inequality can lead to the same GINI index. How can one determine the GINI index? If one has all the data, one can numerically determine the value from all the differences for each member of the population. An example of that is shown here to determine the inequality of market share for 10 trucking companies.
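The "all the differences" computation can be sketched in a few lines of Python. This uses the standard mean-absolute-difference form of the Gini index: the sum of |x_i – x_j| over all pairs, divided by 2·n²·mean. The function name is hypothetical and the inputs are toy values, not the trucking data mentioned above.

```python
def gini_pairwise(values):
    """Gini index from all pairwise absolute differences."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty list with a positive total")
    diff_sum = sum(abs(a - b) for a in values for b in values)
    # 2 * n^2 * mean equals 2 * n * total, because mean = total / n.
    return diff_sum / (2.0 * n * total)


print(gini_pairwise([1, 1, 1, 1]))  # 0.0
print(gini_pairwise([0, 0, 0, 1]))  # 0.75
```

The double loop costs O(n²) operations, which is fine for a handful of companies but slow for large populations; sorting the data and integrating the Lorenz-Curve gives the same value faster.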

Another approach is to model the actual distribution using a formal statistical distribution with known properties such as Pareto, Log-Normal or Weibull. With a given formal distribution one can often calculate the Gini index analytically. See for example the paper by Michel Lubrano on “The Econometrics of Inequality and Poverty”. In another example, Eric Kemp-Benedict shows in this paper on “Income Distribution and Poverty” how well various statistical distributions match the actually measured data. It is commonly held that at the high end of the income spectrum the Pareto distribution is a good model (with its inherent Power law characteristic), while overall the Log-Normal is the best approximation.

After studying several of these papers I started to ask myself: If x% of the population contribute y% to the total, what’s the corresponding GINI index? For example, for the famous “80-20 rule” with 20% of the population contributing 80% of the result, what’s the GINI index for the 80-20 rule?

To answer this question I created a simple model of inequality based on a Pareto distribution. Its shape parameter controls the curvature of the distribution, which in turn determines the GINI index. The latter is visualized as color-coded bands using a 2D contour plot in the following graphic:

GINI index contour plot based on Pareto distribution model

The sample data point “A” corresponds to the 80-20 rule, which leads to a Gini index of about 0.75 (strongly unequal distribution). Data point “B” is an example of an extremely unequal distribution, namely US political donations (data from 2010 according to a statistic from the Center for Responsive Politics recently cited by CNNMoney):

“…a relatively small number of Americans do wield an outsized influence when it comes to political donations. Only 0.04% of Americans give in excess of $200 to candidates, parties or political action committees — and those donations account for 64.8% of all contributions”

0.04% contribute 64.8% of the total! Here is another way of describing this: If you had 2500 donors, the top donor would give nearly twice as much as the other 2499 combined. This extreme amount of inequality corresponds to a Gini index of 0.89 (needless to say, this does not seem like a very democratic process…).
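One way such a Pareto-based model could work is sketched below in Python; it is not necessarily the exact model behind the contour plot. It relies on two textbook properties of the Pareto distribution with shape parameter α: the top fraction p of the population holds the share s = p^(1 – 1/α), and the Gini index is G = 1 / (2α – 1). Writing k = ln(s) / ln(p) = 1 – 1/α gives G = (1 – k) / (1 + k). The function name is hypothetical.

```python
import math


def gini_from_top_share(top_fraction, share):
    """Gini index of a Pareto distribution in which the top
    `top_fraction` of the population contributes `share` of the total."""
    if not (0.0 < top_fraction < 1.0 and 0.0 < share < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    if share <= top_fraction:
        raise ValueError("the top group must hold more than its proportional share")
    k = math.log(share) / math.log(top_fraction)  # k = 1 - 1/alpha
    return (1.0 - k) / (1.0 + k)


# 80-20 rule: the top 20% contribute 80%.
print(round(gini_from_top_share(0.20, 0.80), 2))      # approx. 0.76
# Political donations: the top 0.04% contribute 64.8%.
print(round(gini_from_top_share(0.0004, 0.648), 2))   # approx. 0.89
```

Under these assumptions the formula reproduces the two values quoted above to within reading accuracy of a contour plot.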

As for US income I created a separate graphic with data points from the high end of the income spectrum (where the underlying Pareto distribution model is a good fit): The top 1% (who earn 18% of all income), top 0.1% (8%), and top 0.01% (3.5%).

GINI Index Contour Plot with high end US Income distribution data points

These 3 data points are taken from Timothy Noah’s “The United States of Inequality”, a 10-part article series on Slate, which in turn is based on data and research from 2008 by Emmanuel Saez and visualizations by Catherine Mulbrandon. The graphic shows that 2008 US income inequality corresponds to a Gini Index of approximately 0.46, which is unusually high for a developed country. Income inequality has grown in the US since around 1970, and the above article series analyzes potential factors contributing to that – but that’s a topic for another post. In the spirit of visualizing data to create insight, I’ll just point you to the corresponding 10-part visual guide to inequality.

Postscript: In April 2012 I came across a nice interactive visualization on the DataBlick website created by Anya A’Hearn using Tableau. It shows the trends of US income inequality over the last 90 years with 7 different categories (Top x% shares) and makes a good showcase for the illustrative power of interactive graphics.


Posted on September 2, 2011 in Financial, Industrial, Scientific, Socioeconomic

Infographics by Column Five, Taste Graph by Hunch

Recently I downloaded an iPad app called Infographics created by the company Column Five. I found it interesting not so much because of the app itself (which has limited functionality, and the recent update crashes quite frequently), but because of the rather large number of infographics it allows you to browse quickly (unfortunately not by category or keyword, and there is no search).

Infographics App from Column Five, Browser Interface

There are about 200 infographics in the app at present; these appear to be the same ones that you can also browse in the infographics gallery on Column Five’s website. These specimens cover a variety of categories, with a large dose of social media and Internet related topics – likely due to the sponsors paying for the creation of such artifacts. When in Portrait Mode you can read a bit more about the content of the respective infographic.

Infographics Browser in Portrait Mode

On their website Column Five talks about “What is an infographic?” Rather than define the term, they describe it using categories like Data Visualization or Information Design. From the above page:

In the age of big data, we need to both make sense of the numbers and be able to easily share the story they tell. The practice of data visualization, which is the study of the visual representation of data, typically analyzes large data sets. It seeks to uncover trends by showing meta patterns, or to make single data points easily visible and extractable. The visual display of this data is the most interesting and universal way to make it accessible to a wide audience. And as with all infographic design, the display method is rooted in the context and desired message.

This practice is the most numbers-heavy, and typically is what a purist would describe as a “true” infographic. These visualizations also tend to be more complex, as they often are attempting to display a great number of data points. In some cases, these graphics functionally serve only as art pieces, if no message can be extracted. When properly executed, however, they should be both beautiful and meaningful, allowing the viewer to decipher data and recognize trends while admiring its aesthetic appeal.

The focus is on the display of information both effectively (you get the intended message…) and efficiently (… quickly and unequivocally), to “use design to communicate a message that is both clear and universal”. Interesting that there is an element of art. We have seen this before on this Blog, for example the aesthetic appeal of Tree Maps or of Flight Pattern Visualizations – often the creators of such visualizations describe themselves as artists. I suspect that beautiful visualizations are better able to communicate a message – which is what the infographics sponsor paid for – because they appeal to the viewer aesthetically and thus tap into additional bandwidth to transport the intended message (“both beautiful and meaningful”).

Infographic about Hunch and their big data "taste graph"

As an example, consider the infographic about The Ever-Expanding Taste Graph by Hunch, published by Hunch on their Blog in May 2011. This visualization explains in broad strokes how Hunch is building a data structure – the “taste graph” – by recording people’s affinity to all kinds of things as observed and recorded from their own answers to questions and other interactions on the web.

On the one hand, it’s amazing how much data is being tracked and what kind of predictive power results from that. A graph with 500 million people nodes and 30 billion edges, running on a supercomputer with 48 processors and 1 TeraByte of RAM. Talk about a company whose business model is centered around big data! For those like me who started in Computer Science some 20+ years ago, such numbers are truly amazing. The cost of storage and processing power has exponentially declined for several decades now. As Bill Gates used to say: The effects of Moore’s law are often over-estimated in the short-term, but even more under-estimated in the long-term.

There is a disconcerting side to this, though: Very little privacy on the Internet. For most people, our online history gets more and more detailed every day… And with the explosion of social media we are volunteering so much information about ourselves, using our own time and effort, with the side-effect of enriching the social media enterprises. This has led some to observe that for social media companies, you are not the customer. You are the product!

I played Hunch’s online Twitter Predictor game. By analyzing my Twitter account and a few questions I volunteered to answer on the Hunch website, they correctly predicted 94% of my responses to test questions by looking at the affinities and preferences of other people sufficiently similar to me. While some of those questions are fairly easy to predict, and for many Yes/No questions even random guessing would get you 50% correct responses, such high accuracy is still interesting. Well, I guess I am that predictable. Is being predictable a good or bad thing?

Posted on August 23, 2011 in Industrial, Socioeconomic

Bubble Charts and GapMinder’s Trendalyzer

Bubble Charts are a powerful way to visualize data over time. They typically consist of a set of circles moving dynamically around in a two-dimensional box. One of the best illustrations of these charts comes from the GapMinder foundation. From their website mission statement:

The initial activity was to pursue the development of the Trendalyzer software. Trendalyzer sought to unveil the beauty of statistical time series by converting boring numbers into enjoyable, animated and interactive graphics. The current version of Trendalyzer is available since March 2006 as Gapminder World, a web-service displaying time series of development statistics for all countries.

In March 2007, Google acquired Trendalyzer from the Gapminder Foundation and the team of developers who formerly worked for Gapminder joined Google in California in April 2007.

Some of you may have seen Hans Rosling’s TED talks which leverage this tool. (For example, his 2007 talk on new insights on poverty or his 2010 talk on the good news of the decade about child mortality.) Some reviewers have said that in his talks, “data comes to life and sings” to the audience.

Snapshot of selected Nations Wealth and Health information for a given year.

Let’s look at the Trendalyzer above with data on the Nation’s Health and Wealth to illustrate the power of Bubble Charts:

  • Each Bubble corresponds to one nation X (say China)
  • Each Axis represents one scalar variable of the nation (here the wealth and health of nation X)
  • Position of bubble indicates the data point of the two axis variables at a given time (1960)
  • Size of bubble indicates a third scalar variable (population size of nation X)
  • Color of bubble indicates a category of the nation X, such as continent or other classification
  • Trajectory of bubble indicates the change over time (here ~ 50 years from 1960 to 2009 in annual steps)

With the Trendalyzer you can interact with the data in a variety of ways. You can change the two dimensions of nations data you care about. You can set each axis to linear or logarithmic to adjust the range of motion along the axis based on the data. You can select a subset of nations to highlight their bubbles. You can opt to track the trajectory of bubbles over time. You can change the classification and its corresponding color scheme. You can manually slide time back and forth or start an automatic run through time. Here is another snapshot of the same data set:

    50 year time trace of nations wealth and health with 7 selected countries highlighted.

    This one graph alone shows a lot of interesting trends. India and China (light blue and red) both rapidly improved life expectancy between 1960 and 1980, and in the next three decades steadily improved GDP/capita. During the cold war both Russia (orange) and the United States (yellow) slowly improved wealth, but only the US increased health as well; and after the collapse of the Soviet Union in the 90’s Russia regressed in its GDP/capita back to nearly 1960 levels before slowly gaining again in the following decade. The three African countries (dark blue) all started in very different positions and each had a unique trajectory. Zimbabwe started out with the highest life expectancy, but then had a devastating decade in the 90’s with the HIV epidemic taking its toll and reducing life expectancy from 60 to around 40, followed by a backslide into more extreme poverty over the following decade. Nigeria, Africa’s most populous nation, has improved more steadily and has now overtaken Zimbabwe on both average health and wealth. South Africa had slow gains in wealth throughout, but after sizable gains in health until the early 90’s, a precipitous decline brought that nation’s health back down again to near 1960 levels.

    Despite the extraordinary amount of information aggregated in such a graph, even more insight comes from interacting with the data and seeing the dynamic change in size and position over a time series. This is the central theme of this Blog: Creating insight from rich data visualizations through interaction and display of changes in real time. I encourage you to do so with the Trendalyzer tool at the Gapminder World website (requires Flash).
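The bubble encoding listed earlier in this post (position, size, color and trajectory per nation) can be written down as a small data mapping. The Python sketch below is only an illustration of that encoding, not Trendalyzer's implementation. All names, the color palette, the square-root sizing (so that bubble area is proportional to population) and the sample numbers are assumptions made for this example; the numbers are placeholders, not real statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class NationYear:
    name: str
    year: int
    income_per_person: float  # x-axis variable ("wealth")
    life_expectancy: float    # y-axis variable ("health")
    population: float         # third variable, shown as bubble size
    region: str               # category, shown as bubble color


REGION_COLORS = {"Asia": "red", "Africa": "blue"}  # hypothetical palette


def bubble(record, log_x=True, radius_scale=1e-3):
    """Map one nation-year record to the visual attributes of a bubble."""
    x = math.log10(record.income_per_person) if log_x else record.income_per_person
    return {
        "label": record.name,
        "x": x,
        "y": record.life_expectancy,
        # Radius grows with the square root so that AREA tracks population.
        "radius": radius_scale * math.sqrt(record.population),
        "color": REGION_COLORS.get(record.region, "gray"),
    }


def trajectory(records):
    """Bubbles of one nation ordered by year, i.e. its path over time."""
    return [bubble(r) for r in sorted(records, key=lambda r: r.year)]


# Placeholder values for a fictional "Nation X"; not real data.
history = [
    NationYear("Nation X", 1970, 2000.0, 55.0, 9_000_000, "Asia"),
    NationYear("Nation X", 1960, 1000.0, 50.0, 4_000_000, "Asia"),
]
for b in trajectory(history):
    print(b)
```

Switching `log_x` corresponds to the linear/logarithmic axis option described above.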


    Posted on July 28, 2011 in Industrial, Scientific, Socioeconomic

    Visualizing Player from Visualizing.Org

    Visualizing.Org is a community of creative people working to make sense of complex issues through data and design… and it’s a shared space and free resource to help you achieve this goal. One of the main tools is the new visualization player. From their website:

    Great visualizations of all kinds — from high-res infographics to interactive HTML5 apps — deserve stellar representation always. Instead of settling for embedded screenshots or links, as of today people can now easily embed your actual project (under CC license) using the Visualizing Player. This is a first for the field and we hope it helps make including data visualizations in blog posts and articles easier and more satisfying to readers and gets you and your work more attention.

    It’s a free media player designed specifically for data visualization and interactive graphics; it currently supports 7 formats (HTML5, Java, Flash, PDF, Video, Image, and URL). It’s easy to embed in other sites and there are a lot of example visualizations from the community hosted on the site.

    One of them is Gregor Aisch’s interactive graphic on Europe’s Energy production, consumption, import/export and dependencies:

    After playing with many of the example visualizations I have two spontaneous reactions:

    First, there is a lot of opportunity and possibility to display dynamic and complex information interactively. Not all infographics are interactive, of course, but those that are give you a sense of the power of interacting with the underlying data and models.

    Second, there seems to be a lack of generally accepted standards to convey certain types of information. It’s a bit of a wild-west situation with lots of creative approaches to visualizing data – for example look at the many different approaches to the UN Global Pulse data on the above community visualizations page. It reminds me of the graphical user interface days before the standardizing advent of Windows. Not that this is a bad thing; it just feels a bit overwhelming at times.

    It’s going to be interesting to see which styles of interactive presentation will become widely adopted.

    Posted on July 26, 2011 in Industrial, Socioeconomic

    Visual Human Development Index

    Alex Simoes, MIT Media Lab student working with Professor Cesar Hidalgo, developed a graphical representation of the Human Development Index (HDI). The so-called HDI trees are based on data published in the United Nations 2010 edition of the Human Development Report. The interactive version on their website allows for comparisons between two countries, or between two years of one country.

    Human Development Index – HDI Tree Representation

    From Hidalgo’s website:

    The HDI Tree aggregates data in the Human Development Index graphically instead of numerically. A long standing criticism of the Human Development Index is that, because it averages indicators of Income, Health and Education, it is possible for countries to obtain the same score with different combinations of indicators. This creates the possibility of substituting Education for Health, Health for Income or Income for Education.

    The HDI tree deals with the numerical aggregation problem by using a graphical representation in which the total value of a country’s HDI is presented together with that of its components and subcomponents. This way it is possible to see immediately the contribution of each dimension to the value of a country’s HDI.

    Moreover, the HDI tree represents an alternative way of branding the idea of Human Development and communicating its message graphically to a wide audience. For more on the HDI tree, see the original report or this summary document.

    Inevitably, there are times when one wishes to collapse multiple dimensions or factors into one numerical score. However, one loses the details underlying the score. Such tree-like visual representations of aggregate information can be used for compound measurements used in business, such as the Balanced Scorecard.
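A tiny Python sketch makes the loss of detail concrete. It assumes, purely for illustration, that the aggregate is a plain arithmetic mean of three component indices, as the quoted criticism ("averages indicators") suggests; the official HDI formula is more involved. The two profiles and all function names are hypothetical.

```python
def aggregate_score(components):
    """Collapse several component indices into one score (plain mean)."""
    return sum(components.values()) / len(components)


def component_gaps(a, b):
    """Per-component differences that the single score hides."""
    return {name: round(a[name] - b[name], 3) for name in a}


# Two made-up profiles with different strengths.
country_a = {"income": 0.9, "health": 0.6, "education": 0.6}
country_b = {"income": 0.6, "health": 0.9, "education": 0.6}

print(round(aggregate_score(country_a), 3))  # 0.7
print(round(aggregate_score(country_b), 3))  # 0.7
print(component_gaps(country_a, country_b))
# {'income': 0.3, 'health': -0.3, 'education': 0.0}
```

Both profiles get the same score although one trades health for income, which is exactly the kind of detail a tree-like visual keeps visible.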

    Note: Hidalgo’s gallery features many more interesting projects, such as Disease Network Data visualizing disease associations or the Product Space visualizing economic capabilities of countries based on their trading activities.

    Addendum: I did some more research on this and found a great summary on the HDI tree posted under the title “Visualizing Human Development”. One particularly interesting chart is a summary of 35 African nations, showing their respective HDI tree for both 1970 and 2005.

    From the original summary paper “A Visual HDI” by C. Hidalgo:

    The Development Tree also facilitates searching and comparing features over large volumes of data. For example, consider Figure [above], a chart in which the HDI trees of 35 African Nations are shown for both 1975 and 2005. This figure shows information on 420 numerical values (35 countries x 2 years x 6 values). In this chart, however, there are several observations that are easy to spot despite the large amount of information being presented. For instance, it is relatively easy to find out what are the countries in the set with higher levels of development. Algeria, Botswana, Libya, Mauritius, Morocco, South Africa and Tunisia in this case. Moreover, their increases are also rather conspicuous. Also, the lopsidedness of some nations also becomes conspicuous, as it can be seen in the examples of Botswana, South Africa and Swaziland, regarding the life dimension, and that of Libya in 1970, regarding high Income, or of Congo DRC in 2005, regarding low income.

    Again, I can easily picture applications of this visual representation of an aggregate score in a typical business environment. Consider an internal ranking of employees based on an aggregation of several orthogonal dimensions such as skill, teamwork, communication, innovation and business savvy. You could look at a dozen of these employees and their respective visual aggregate tree scores to spot trends, outliers, and relative strengths. Another example is the Balanced Scorecard approach mentioned above. Suppose you are aggregating measures about Finance, Schedule, Quality, Innovation, and People into the score of an Engineering organization. Then you could picture the tree for aggregate performance of this business unit over time (quarters or years) to spot trends.


    Posted on July 6, 2011 in Socioeconomic

    Interactive and Visual Information

    The way we create and consume documents to present and understand information is changing.

    Online information started this trend around 1995. Web page content has rapidly evolved from static content in the 1990’s to much more dynamic and somewhat interactive content. Most web sites today are continuously updated with streams of new information and allow the user to query for specific information, say about the weather in a particular region or the stock price of a particular company. Users can query databases to retrieve wanted information. Users can search the web and follow links to pages likely relevant to the search. Users can type in questions and receive answers, in some cases calculated from available models and data. I have written about the evolutionary impact of this on the presentation of information during meetings. (Technology also moves towards other forms of interaction without typing or touching, such as voice-recognition a la Google Voice or motion-detection a la Microsoft Kinect. But the focus here is not on the nature of the interaction format as much as on its impact on the information transfer.)

    The composition of online information has also drastically changed. This goes far beyond hyperlinking documents for easy navigation. Instead of one area of text per site (as is the standard model in a book) we now routinely have multiple areas on the page showing different, often related pieces of information. Input fields and other controls allow for interaction, such as entering a stock ticker symbol and then hovering over the generated historical chart for analysis. Multiple widgets or components make up modern web sites, often highly customizable and aggregating information from various sources or feeds. With the advent of digital music, photo and video this turned into multimedia information. The latter offers the potential to re-shape the way information is produced and consumed in electronic books. The combination of new form factors (such as the iPad), nearly ubiquitous wireless Internet connectivity, tremendous processing power, increased battery life, rising popularity and adoption of eBook readers and constantly improving authoring tools ushers in a new era of content. An example of the sophisticated use of multimedia and touch interface on the iPad is the book (in custom app format) “Our Choice” from Al Gore, released by Push Pop Press.

    Map of solar power density across the U.S. Note interactive popup with location-specific detail (data both lookup and calculated).

    No longer tethered to power outlets and network cables, we now have greater mobility of information than ever before. This leads to additional possibilities such as location-based services, as the presentation of information can be customized to the location of the reader. (“Where are some nearby restaurants or gas stations?”) Conversely, the location of the reader can be tracked over time and thus new location-based information and services are created.

    Of particular interest is interactive information based on executable models which can simulate or calculate outcomes based on parameters interactively set by the reader. Conceptually, the document now acts as a container not just for text and images, but for models which can carry out computations. A simple example might be a mortgage calculator: Type in a loan amount, drag a slider to reflect interest rates and repayment time, and out come the monthly payment and other details. Embedding such a calculator in a document transforms the passive reading experience into an interactive exploration.
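The model behind such a mortgage calculator is small enough to sketch in Python. The version below assumes a fixed-rate, fully amortizing loan with monthly compounding and uses the standard annuity formula; the function name and the example loan are hypothetical.

```python
def monthly_payment(principal, annual_rate, years):
    """Monthly payment for a fixed-rate, fully amortizing loan.

    principal   -- loan amount
    annual_rate -- nominal yearly interest rate, e.g. 0.06 for 6%
    years       -- repayment time in years
    """
    n = years * 12  # number of monthly payments
    if n <= 0:
        raise ValueError("repayment time must be positive")
    r = annual_rate / 12.0  # interest rate per month
    if r == 0:
        return principal / n  # no interest: spread the principal evenly
    growth = (1.0 + r) ** n
    return principal * r * growth / (growth - 1.0)


# Example: 200,000 borrowed at 6% over 30 years.
payment = monthly_payment(200_000, 0.06, 30)
print(round(payment, 2))                    # approx. 1199.10
print(round(payment * 30 * 12 - 200_000))   # total interest paid over the loan
```

In an interactive document the three arguments would be bound to an input field and two sliders, and the outputs would update as the reader drags them.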

    Or think of a business model with parameters such as production volumes, storage, pricing and customer order rates. Now one can interact and explore ranges of model behavior as well as boundaries (profit/loss, break-even, etc.) Here are two examples of such business models of increasing complexity:

    Just like Adobe coined the term PDF (Portable Document Format) trying to establish a standard for portability, Wolfram Research has coined the term CDF (Computable Document Format) trying to establish a standard for computability. The two above and many other examples can be found on Wolfram Research’s CDF demonstrations page. The best way to understand such models is to interact with them. (For CDF you will have to install a free browser plug-in.) A good article on the potential of CDF for presenting information in a business context is here.

    Another very interesting example of the presentation of a dynamic nonlinear system is provided by Bret Victor below. It focuses on two fundamental concepts: Ubiquitous visualization and in-context manipulation. The ability to interact with the model and to see rich visuals while doing so is key to understanding its behavior. As this model does not seem to be publicly available yet the next best thing is to watch someone interact with it:

    Interactive Exploration of a Dynamical System from Bret Victor on Vimeo.

    As a final example, consider the Mandelbrot Set, an abstract mathematical object discovered around 1980. While it’s easy to specify in formal mathematical terms, its amazing complexity can be experienced so much better when you have the ability to explore this object, zoom in and out, and discover the infinitely rich detail of its fractal boundary. The computing power, rich color graphics and intuitive touch interface of the iPad 2, combined with a good app (like the $1.99 Fractile Plus), make this perhaps the most advanced case of model-based, interactive visualization to date.
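To show how little is needed to specify the Mandelbrot Set, here is a minimal Python sketch of the usual escape-time test: a point c belongs to the set if the iteration z → z² + c, started at z = 0, stays bounded. The function names, the iteration limit and the text rendering are choices made for this example and have nothing to do with how Fractile Plus works.

```python
def escape_time(c, max_iter=100):
    """Number of iterations before |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n  # the orbit escaped, so c is outside the set
        z = z * z + c
    return max_iter  # treated as "inside" at this iteration limit


def render(x_min=-2.0, x_max=1.0, y_min=-1.2, y_max=1.2,
           width=60, height=24, max_iter=50):
    """Crude text picture of the set inside the given viewport."""
    rows = []
    for j in range(height):
        y = y_min + (y_max - y_min) * j / (height - 1)
        row = ""
        for i in range(width):
            x = x_min + (x_max - x_min) * i / (width - 1)
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)


print(render())
```

Zooming amounts to calling `render` with a smaller viewport, for example `render(-0.8, -0.7, 0.05, 0.15, max_iter=200)`; an interactive app does the same thing with colors and at touch speed.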

    Screenshot of Fractile Plus app on iPad. The experience of zooming and moving around via touch interface and fast visual feedback is amazing.

    No amount of pictures displayed here can replace the interactive experience of such apps to explore and understand the beauty of this object. If you didn’t do so yet, get your own iPad and explore – you won’t regret it!


    Posted on June 8, 2011 in Industrial, Scientific, Socioeconomic



    Google Fusion Tables – Free Visualization Tool

    Google does search very well. But Google does so much more than that. Think GMail and Blogger, YouTube and Picasa, Google Maps and Google Earth, to name just a few. The Google products page at present lists about 50 tools across the categories Web, Mobile, Media, Geo, Home & Office, Social, Specialized Search, and Innovation. In this last category is Google Fusion Tables, a free tool to share, analyze and visualize data on the web.

    You can upload, display and edit your own data, perform filter, aggregate and merge operations, and leverage a series of typical visualization options (Table, Map, Line, Bar, Pie, Scatter…), similar to what you expect from a spreadsheet tool like Excel or Numbers. Through integration with Google Maps APIs it is easy to generate geographical maps and charts such as this demonstration of average cigarette use in countries across the world.

    Sample Demonstration of Google Fusion Tables tool showing a world intensity map of cigarette use.

    This makes the tool a good candidate to learn or teach about data visualization and play with the available sample data. The bucket of available public tables is rather unstructured – no taxonomy or hierarchical structure – and search for tables is surprisingly limited.
    That said, there are plenty of documents, FAQs, APIs, and Forum discussions. And some of the demonstrations are quite useful, for example a website that shows an interactive world map with more than 10,000 newspapers in their respective geographies, color-coded by publication language:

    Note the color-code for different languages.

    Posted on June 8, 2011 in Industrial, Socioeconomic

    Composite Graphs

    Today’s edition of the Wall Street Journal features an article on the nation-wide decline of housing values across the US. There is a good example of a composite graph illustrating a lot of data at once:

    From the chart legend:

    “Charts show percentage change since 2000 in S&P/Case-Shiller national quarterly home-price index and in monthly indexes for U.S. metropolitan areas through March 2011.”

    Think about the amount of data aggregated in these charts! The S&P Case-Shiller home price index is calculated monthly using a three-month moving average and published with a two-month lag on the Standard & Poor’s website. Each of the 8 metropolitan indices shown on the right is composed of thousands of individual data points on changes in property values in the respective area. (Specifically, those data points are measured using the repeat-sales methodology, which uses sales pairs of two successive transactions for one property to calculate home-price changes.) Every month that data is aggregated to one new data point, some 120+ of which compose the graph over more than 10 years. That’s already more than 100,000 data points aggregated in each of the 8 charts on the right. Looking at the National average on the left – which is an aggregate of all the 20 metropolitan areas in the index – you are literally looking at an aggregate of millions of data points!
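Two of the aggregation steps mentioned here, the three-month moving average and the "percentage change since 2000" shown in the charts, can be sketched in Python as follows. This illustrates the general technique only, not S&P's actual index procedure; the function names are hypothetical and the index values are made up.

```python
def moving_average(series, window=3):
    """Trailing moving average; the first window-1 points have no value."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def percent_change_since(series, base_index=0):
    """Percentage change of every point relative to a base point."""
    base = series[base_index]
    return [100.0 * (value - base) / base for value in series]


# Made-up monthly index values, for illustration only.
monthly_index = [100.0, 102.0, 104.0, 109.0, 105.0]
print(moving_average(monthly_index))               # [102.0, 105.0, 106.0]
print(percent_change_since([100.0, 110.0, 95.0]))  # [0.0, 10.0, -5.0]

# Rough count from the text: 120+ monthly points, each built from
# thousands of sales pairs (1,000 is used here as a lower bound).
print(120 * 1000)  # 120000, i.e. more than 100,000 per chart
```

The moving average smooths month-to-month noise at the cost of reacting a little later to turning points.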

    An interesting exercise is to google for images of the S&P Case-Shiller index. Here is a collection of the first of some 300,000+ results:

    Image search results on indices are an excellent source of examples on how to aggregate numerical data graphically.

    Addendum: Alex Kerin from Data Driven Consulting published this interactive chart of the Case-Shiller index using Tableau Public. It clearly shows how an interactive chart goes beyond static images in bringing data to life and telling the underlying story.


    Posted on June 1, 2011 in Financial, Socioeconomic