Category Archives: Industrial

Inequality, Lorenz-Curves and Gini-Index

In a previous post we looked at inequality of profits and the useful abstraction of the Whale-Curve to analyze Customer Profitability. Here I want to focus on inequality and its measurement and visualization in a broader sense.

A fundamental graphical representation of the form of a distribution is given by the Lorenz-Curve. It plots the cumulative contribution to a quantity over a contributing population. It is often used in economics to depict the inequality of wealth or income distribution in a population.

Lorenz Curve (Source: Wikipedia)

The Lorenz-Curve shows the y% contribution of the bottom x% of the population. The x-axis has the population sorted by increasing contributions (i.e. the poorest on the left and the richest on the right). Hence the Lorenz-Curve is always at or below the diagonal line, which represents perfect equality. (By contrast, the x-axis of the Whale-Curve sorts by decreasing profit contributions.)

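The Gini-Index is defined in terms of two areas in this graph: A, the area between the diagonal line of perfect equality and the Lorenz-Curve, and B, the area under the Lorenz-Curve. It can be written as G = A / (A + B), G = 2A or G = 1 – 2B.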

Since each axis is normalized to 100%, A + B = 1/2 and all of the above are equivalent. Perfect equality means G = 0. Maximum inequality G = 1 is achieved if one member of the population contributes everything and everybody else contributes nothing.
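To make the definition concrete, here is a minimal Python sketch that builds the Lorenz-Curve points from a list of contributions and evaluates G = 1 – 2B, with the area B approximated by trapezoids. The function names and the sample data are made up for illustration.

```python
def lorenz_curve(values):
    """Return cumulative population shares and contribution shares."""
    ordered = sorted(values)          # poorest first, richest last
    total = sum(ordered)
    n = len(ordered)
    pop_share = [0.0]
    contrib_share = [0.0]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        pop_share.append(i / n)
        contrib_share.append(running / total)
    return pop_share, contrib_share


def gini_from_lorenz(pop_share, contrib_share):
    """Gini index as G = 1 - 2B, with B the trapezoid area under the curve."""
    area_b = 0.0
    for k in range(1, len(pop_share)):
        width = pop_share[k] - pop_share[k - 1]
        avg_height = (contrib_share[k] + contrib_share[k - 1]) / 2
        area_b += width * avg_height
    return 1 - 2 * area_b


equal = [10, 10, 10, 10]   # everyone contributes the same
skewed = [0, 0, 0, 40]     # one member contributes everything

print(gini_from_lorenz(*lorenz_curve(equal)))
print(gini_from_lorenz(*lorenz_curve(skewed)))
```

For the equal sample the result is 0. For the skewed sample with only four members the result is 0.75 rather than 1: with a finite population of n members the maximum is 1 – 1/n, which approaches 1 as the population grows.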

An interesting interactive graph demonstrating Lorenz-Curves and corresponding Gini-Index values can be found here at the Wolfram Demonstration project.

The GINI Index is often used to indicate the income or wealth inequality of countries. The corresponding values of the GINI index are typically between 0.25 and 0.35 for modern, developed countries and higher in developing countries such as 0.45 – 0.55 in Latin America and up to 0.70 in some African countries with extreme income inequality.

GINI index of world countries in 2009 (Source: Wikipedia)

Graphically, many different shapes of the Lorenz-Curve can lead to the same areas A and B, and hence many different distributions of inequality can lead to the same GINI index. How can one determine the GINI index? If one has all the data, one can numerically determine the value from all the differences for each member of the population. An example of that is shown here to determine the inequality of market share for 10 trucking companies.

Another approach is to model the actual distribution using a formal statistical distribution with known properties such as Pareto, Log-Normal or Weibull. With a given formal distribution one can often calculate the GINI index analytically. See for example the paper by Michel Lubrano on “The Econometrics of Inequality and Poverty”. In another example, Eric Kemp-Benedict shows in this paper on “Income Distribution and Poverty” how well various statistical distributions match the actually measured data. It is commonly held that at the high end of the income the Pareto distribution is a good model (with its inherent Power law characteristic), while overall the Log-Normal is the best approximation.
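The first approach, computing the index directly from all pairwise differences, can be sketched in a few lines of Python. This uses the standard mean-absolute-difference form G = Σ|xi – xj| / (2 n² · mean); the function name and the sample values are hypothetical and are not the trucking data from the linked example.

```python
def gini_pairwise(values):
    """Gini index from the sum of absolute differences over all pairs."""
    n = len(values)
    mean = sum(values) / n
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mean)


# Made-up market shares of four companies (any consistent unit works).
shares = [1, 2, 3, 4]
print(gini_pairwise(shares))  # 0.25
```

The double loop is quadratic in the population size, which is fine for ten companies but would call for a sorted, linear-time formula on large data sets.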

After studying several of these papers I started to ask myself: If x% of the population contribute y% to the total, what’s the corresponding GINI index? For example, for the famous “80-20 rule” with 20% of the population contributing 80% of the result, what’s the GINI index for the 80-20 rule?

To answer this question I created a simple model of inequality based on a Pareto distribution. Its shape parameter controls the curvature of the distribution, which in turn determines the GINI index. The latter is visualized as color-coded bands using a 2D contour plot in the following graphic:

GINI index contour plot based on Pareto distribution model

The sample data point “A” corresponds to the 80-20 rule, which leads to a GINI index of about 0.75 (strongly unequal distribution). Data point “B” is an example of an extremely unequal distribution, namely US political donations (data from 2010 according to a statistic from the Center for Responsive Politics recently cited by CNNMoney):

“…a relatively small number of Americans do wield an outsized influence when it comes to political donations. Only 0.04% of Americans give in excess of $200 to candidates, parties or political action committees — and those donations account for 64.8% of all contributions”

0.04% contribute 64.8% of the total! Here is another way of describing this: If you had 2500 donors, the top donor gives nearly twice as much as the other 2499 combined (64.8% versus 35.2%). This extreme amount of inequality corresponds to a GINI index of 0.89 (needless to say that this does not seem like a very democratic process…)
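The mapping from "top x% contribute y%" to a GINI index can be reproduced approximately with the textbook Pareto formulas. This is a sketch under stated assumptions, not necessarily the exact model behind the contour plot: it assumes the Pareto Lorenz-Curve L(F) = 1 – (1 – F)^(1 – 1/α), so the top fraction p holds the share p^(1 – 1/α), and the Pareto Gini formula G = 1 / (2α – 1). The function names are made up for illustration.

```python
import math


def pareto_shape_from_top_share(top_fraction, share):
    """Solve share = top_fraction ** (1 - 1/alpha) for the shape alpha."""
    exponent = math.log(share) / math.log(top_fraction)  # equals 1 - 1/alpha
    return 1 / (1 - exponent)


def pareto_gini(alpha):
    """Gini index of a Pareto distribution with shape alpha > 1."""
    return 1 / (2 * alpha - 1)


# Data point "A": top 20% contribute 80%.
alpha_a = pareto_shape_from_top_share(0.20, 0.80)
# Data point "B": top 0.04% contribute 64.8%.
alpha_b = pareto_shape_from_top_share(0.0004, 0.648)

print(round(pareto_gini(alpha_a), 2))
print(round(pareto_gini(alpha_b), 2))
```

Under these assumptions the 80-20 rule gives a value of roughly 0.76 and the donation data a value of roughly 0.89, in line with the numbers read off the contour plot.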

As for US income I created a separate graphic with data points from the high end of the income spectrum (where the underlying Pareto distribution model is a good fit): The top 1% (who earn 18% of all income), top 0.1% (8%), and top 0.01% (3.5%).

GINI Index Contour Plot with high end US Income distribution data points

These 3 data points are taken from Timothy Noah’s “The United States of Inequality”, a 10-part article series on Slate, which in turn is based on data and research from 2008 by Emmanuel Saez and visualizations by Catherine Mulbrandon of VisualizingEconomics.com. This shows that the 2008 US income inequality has a GINI Index of approximately 0.46, which is unusually high for a developed country. Income inequality has grown in the US since around 1970, and the above article series analyzes potential factors contributing to that – but that’s a topic for another post. In the spirit of visualizing data to create insight, I’ll just leave you with this link to the corresponding 10-part visual guide to inequality:

Postscript: In April 2012 I came across a nice interactive visualization on the DataBlick website created by Anya A’Hearn using Tableau. It shows the trends of US income inequality over the last 90 years with 7 different categories (Top x% shares) and makes a good showcase for the illustrative power of interactive graphics.

 

Posted by on September 2, 2011 in Financial, Industrial, Scientific, Socioeconomic

 


Infographics by Column Five, Taste Graph by Hunch

Recently I downloaded an iPad app called Infographics created by the company Column Five. I found it interesting not so much because of the app itself (which has limited functionality and the recent update crashes quite frequently), but because of the rather large number of infographics it allows you to browse quickly (unfortunately not by category or keyword and there is no search).

Infographics App from Column Five, Browser Interface

There are about 200 infographics in the app at present; these appear to be the same ones that you can also browse on Column Five’s website infographics gallery. These specimens cover a variety of categories, with a large dose of social media and Internet related topics – likely due to the sponsors paying for the creation of such artifacts. When in Portrait Mode you can read a bit more about the content of the respective infographic.

Infographics Browser in Portrait Mode

On their website Column Five talks about “What is an infographic?” Rather than define the term, they describe it using categories like Data Visualization or Information Design. From the above page:

In the age of big data, we need to both make sense of the numbers and be able to easily share the story they tell. The practice of data visualization, which is the study of the visual representation of data, typically analyzes large data sets. It seeks to uncover trends by showing meta patterns, or to make single data points easily visible and extractable. The visual display of this data is the most interesting and universal way to make it accessible to a wide audience. And as with all infographic design, the display method is rooted in the context and desired message.

This practice is the most numbers-heavy, and typically is what a purist would describe as a “true” infographic. These visualizations also tend to be more complex, as they often are attempting to display a great number of data points. In some cases, these graphics functionally serve only as art pieces, if no message can be extracted. When properly executed, however, they should be both beautiful and meaningful, allowing the viewer to decipher data and recognize trends while admiring its aesthetic appeal.

The focus is on the display of information both effectively (you get the intended message…) and efficiently (… quickly and unequivocally), to “use design to communicate a message that is both clear and universal”. Interesting that there is an element of art. We have seen this before on this Blog, for example the aesthetic appeal of Tree Maps or of Flight Pattern Visualizations – often the creators of such visualizations describe themselves as artists. I suspect that beautiful visualizations are better able to communicate a message – which is what the infographics sponsor paid for – because they appeal to the viewer aesthetically and thus tap into additional bandwidth to transport the intended message (“both beautiful and meaningful”).

Infographic about Hunch and their big data "taste graph"

As an example, consider the infographic about The Ever-Expanding Taste Graph by Hunch, published by Hunch on their Blog in May 2011. This visualization explains in broad strokes how Hunch is building a data structure – the “taste graph” – by recording people’s affinity to all kinds of things as observed and recorded from their own answers to questions and other interactions on the web.

On the one hand, it’s amazing how much data is being tracked and what kind of predictive power results from that. A graph with 500 million people nodes and 30 billion edges, running on a supercomputer with 48 processors and 1 TeraByte of RAM. Talk about a company whose business model is centered around big data! For those like me who started in Computer Science some 20+ years ago, such numbers are truly amazing. The cost of storage and processing power has exponentially declined for several decades now. As Bill Gates used to say: The effects of Moore’s law are often over-estimated in the short-term, but even more under-estimated in the long-term.

There is a disconcerting side to this, though: Very little privacy on the Internet. For most people, our online history gets more and more detailed every day… And with the explosion of social media we are volunteering so much information about ourselves, using our own time and effort, with the side-effect of enriching the social media enterprises. This has led some to observe that for social media companies, you are not the customer. You are the product!

I played Hunch’s online Twitter Predictor game. By analyzing my Twitter account and a few questions I volunteered to answer on the Hunch website they correctly predicted 94% of my responses to test questions by looking at the affinities and preferences of other people sufficiently similar to me. While some of those questions are fairly easy to predict and for many Yes/No questions even random guessing would get you 50% correct responses, such high accuracy is still interesting. Well, I guess I am that predictable. Is being predictable a good or bad thing?

 

Posted by on August 23, 2011 in Industrial, Socioeconomic

 


Business Benefits of Software Release in Multiple Increments

One of the main principles of Lean-Agile Software Development is to deliver fast and in small increments. Breaking a large system into multiple increments and delivering some of them early has many benefits to both the business and the customer, such as:

  • Earn returns for delivered value sooner.
  • Obtain customer feedback sooner to clarify future features.
  • Capture more market share due to early mover advantage.
  • Reduce risk of obsolescence due to late delivery.

While these benefits are somewhat intuitive, how can we better illustrate and quantify such benefits? From a financial, cash-flow perspective, there are three main business benefits of switching from one single release to multiple release increments:

  • Sooner Break-Even
  • Smaller Investment
  • Higher Net Return

A visual model helps to reinforce and quantify them. In the following 4-minute demonstration I am using a simplified model to illustrate the above benefits:

The above demonstration is based on the book “Lean-Agile Software Development – Achieving Enterprise Agility” by Alan Shalloway, Guy Beaver and James Trott. (See chapter 2: The Business Case for Agility) I believe that having such dynamic visualizations can help explain these benefits and thus make a more compelling business case for using Lean-Agile Software Development.
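The three benefits listed above (sooner break-even, smaller investment, higher net return) can also be checked with a toy cash-flow calculation. The sketch below is not the model from the book or from the demonstration; all numbers are hypothetical: a 12-month build costing 100 per month, a full product earning 200 per month, and a first increment that ships after 6 months and earns half of that.

```python
from itertools import accumulate

COST = 100           # hypothetical monthly development cost
FULL_REVENUE = 200   # hypothetical monthly revenue of the complete product
BUILD_MONTHS = 12
HORIZON = 24


def summarize(monthly_cash_flow):
    """Break-even month, peak investment and net return of a cash-flow series."""
    cumulative = list(accumulate(monthly_cash_flow))
    break_even = next(
        (m for m, c in enumerate(cumulative, start=1) if c >= 0), None
    )
    return {
        "break_even_month": break_even,
        "max_investment": -min(cumulative),
        "net_return": cumulative[-1],
    }


def single_release(month):
    """Nothing ships until the whole build is done."""
    return -COST if month <= BUILD_MONTHS else FULL_REVENUE


def two_increments(month):
    """Assumed: half the value ships at half time and earns half the revenue."""
    cost = COST if month <= BUILD_MONTHS else 0
    if month <= BUILD_MONTHS // 2:
        revenue = 0
    elif month <= BUILD_MONTHS:
        revenue = FULL_REVENUE // 2
    else:
        revenue = FULL_REVENUE
    return revenue - cost


months = range(1, HORIZON + 1)
print(summarize([single_release(m) for m in months]))
print(summarize([two_increments(m) for m in months]))
```

With these made-up numbers the single release breaks even in month 18, needs a peak investment of 1200 and returns 1200 after two years; the two-increment release breaks even in month 15, needs only 600 and returns 1800.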

Business Benefits of Two-Increment Release

Click on the above graphic to interact with the dynamic model using the Wolfram CDF Player.

 

Posted by on August 20, 2011 in Industrial

 


Customer Profitability

Inequality is often at the root of structure and contrasts. Exposing inequality can often lead to insight. For example, take the well-known Pareto principle, which states that roughly 80% of the effects come from 20% of the causes (hence also referred to as the 80-20 rule).

From the above Wikipedia page on the Pareto principle, chapter on business:

The distribution shows up in several different aspects relevant to entrepreneurs and business managers. For example:
80% of your profits come from 20% of your customers
80% of your complaints come from 20% of your customers
80% of your profits come from 20% of the time you spend
80% of your sales come from 20% of your products
80% of your sales are made by 20% of your sales staff
Therefore, many businesses have an easy access to dramatic improvements in profitability by focusing on the most effective areas and eliminating, ignoring, automating, delegating or re-training the rest, as appropriate.

Visualization can be a powerful instrument for such analysis. For customer profitability, a graphical representation of this inequality is often used as a starting point for analysis. A commonly used visualization is the so-called Whale-Curve. I created a short, 4 min video recording of a dynamic Whale-Curve Demonstration:

In case you’re curious, the above demonstration uses an underlying model I created in Mathematica. You can dynamically interact with it yourself using the free CDF (Computable Document Format) Player:

I have provided it as a contribution to the Wolfram Demonstration project, so you can download it, and even look at the source code if you are a Mathematica user.
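For readers without Mathematica, the basic construction of a Whale-Curve can be sketched in Python. This is a minimal illustration and not the Mathematica model itself: customers are sorted by decreasing profit and the cumulative profit is expressed as a percentage of the total. The profit figures are invented.

```python
def whale_curve(profits):
    """Cumulative profit in % of total, customers sorted by decreasing profit."""
    ordered = sorted(profits, reverse=True)   # most profitable customer first
    total = sum(ordered)
    curve = []
    running = 0
    for p in ordered:
        running += p
        curve.append(100 * running / total)
    return curve


# Made-up profits per customer; the last two customers lose money.
profits = [50, 30, 20, 10, 0, -4, -6]
curve = whale_curve(profits)
print(curve)       # [50.0, 80.0, 100.0, 110.0, 110.0, 106.0, 100.0]
print(max(curve))  # 110.0
```

The curve rises above 100% because the profitable customers together earn more than the final total, and then falls back to 100% as the unprofitable customers are added. That hump is what gives the Whale-Curve its name.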

If you are interested in applying customer profitability analysis to your business, you may want to consider the company RapidBusinessModeling, which has an elaborate analysis approach starting with such Whale-Curves.

The underlying notion of Inequality is a fundamental concept. We will look at it in other contexts in a later post.

 

Posted by on August 18, 2011 in Financial, Industrial, Scientific

 


Bubble Charts and GapMinder’s Trendalyzer

Bubble Charts are a powerful way to visualize data over time. They typically consist of a set of circles moving dynamically around in a two-dimensional box. One of the best illustrations of these charts comes from the GapMinder foundation. From their website mission statement:

The initial activity was to pursue the development of the Trendalyzer software. Trendalyzer sought to unveil the beauty of statistical time series by converting boring numbers into enjoyable, animated and interactive graphics. The current version of Trendalyzer is available since March 2006 as Gapminder World, a web-service displaying time series of development statistics for all countries.

In March 2007, Google acquired Trendalyzer from the Gapminder Foundation and the team of developers who formerly worked for Gapminder joined Google in California in April 2007.

Some of you may have seen Hans Rosling’s TED talks which leverage this tool. (For example, his 2007 talk on new insights on poverty or his 2010 talk on the good news of the decade about child mortality.) Some reviewers have said that in his talks, “data comes to life and sings” to the audience.

Snapshot of selected Nations Wealth and Health information for a given year.

Let’s look at the Trendalyzer above with data on the Nation’s Health and Wealth to illustrate the power of Bubble Charts:

  • Each Bubble corresponds to one nation X (say China)
  • Each Axis represents one scalar variable of the nation (here the wealth and health of nation X)
  • Position of bubble indicates the data point of the two axis variables at a given time (1960)
  • Size of bubble indicates a third scalar variable (population size of nation X)
  • Color of bubble indicates a category of the nation X, such as continent or other classification
  • Trajectory of bubble indicates the change over time (here ~ 50 years from 1960 to 2009 in annual steps)
With the Trendalyzer you can interact with the data in a variety of ways. You can change the two dimensions of nations data you care about. You can set the axis to linear or logarithmic to adjust the range of motion along the axis based on the data. You can select a subset of nations to highlight their bubbles. You can check to track the trajectory of bubbles over time. You can change the classification and its corresponding color scheme. You can manually slide time back and forth or start an automatic run through time.

Here is another snapshot of the same data set:

    50 year time trace of nations wealth and health with 7 selected countries highlighted.

    This one graph alone shows a lot of interesting trends. India and China (light blue and red) both rapidly improved life expectancy between 1960 and 1980, and in the next three decades steadily improved GDP/capita. During the cold war both Russia (orange) and the United States (yellow) slowly improved wealth, but only the US increased health as well; and after the collapse of the Soviet Union in the 90’s Russia regressed in its GDP/capita back to nearly 1960 levels before slowly gaining again in the following decade. The three African countries (dark blue) all started in very different positions and each had unique trajectories. Zimbabwe started out with the highest life expectancy, but then had a devastating decade in the 90’s with the HIV epidemic taking its toll and reducing life expectancy down from 60 to around 40, followed by a backslide into more extreme poverty over the following decade. Nigeria, Africa’s most populous nation, has improved more steadily and has now overtaken Zimbabwe on both average health and wealth. South Africa had slow gains in wealth throughout, but after sizable gains in health until the early 90’s, a precipitous decline brought that nation’s health back down again to near 1960 levels.

    Despite the extraordinary amount of information aggregated in such a graph, even more insight comes from interacting with the data and seeing the dynamic change in size and position over a time series. This is the central theme of this Blog: Creating insight from rich data visualizations through interaction and display of changes in real time. I encourage you to do so with the Trendalyzer tool at the Gapminder World website (requires Flash).
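The visual encoding described in the bullet list above (position from two variables, size from a third, color from a category, one frame per year) can be written down as a small data structure. This is only a sketch of the idea and has nothing to do with how Trendalyzer is implemented; the nations, values, color palette and the choice of a square-root radius (so that bubble area scales with population) are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Bubble:
    label: str
    x: float        # horizontal position (wealth, optionally on a log scale)
    y: float        # vertical position (health)
    radius: float   # derived from population
    color: str      # derived from the nation's category


CATEGORY_COLORS = {"Group 1": "red", "Group 2": "blue"}  # hypothetical palette


def to_bubble(record, log_x=True):
    """Map one (nation, year) record to its bubble."""
    x = math.log10(record["wealth"]) if log_x else record["wealth"]
    radius = math.sqrt(record["population"])  # area proportional to population
    color = CATEGORY_COLORS[record["category"]]
    return Bubble(record["name"], x, record["health"], radius, color)


def frame(records, year):
    """All bubbles for one year; stepping through years gives the animation."""
    return [to_bubble(r) for r in records if r["year"] == year]


# Fictitious records, for illustration only.
records = [
    {"name": "Nation A", "year": 1960, "wealth": 1000, "health": 50,
     "population": 4_000_000, "category": "Group 1"},
    {"name": "Nation B", "year": 1960, "wealth": 10000, "health": 70,
     "population": 1_000_000, "category": "Group 2"},
    {"name": "Nation A", "year": 1961, "wealth": 1100, "health": 51,
     "population": 4_100_000, "category": "Group 1"},
]

for bubble in frame(records, 1960):
    print(bubble)
```

The trajectory of a nation is then simply the sequence of its bubbles across consecutive frames.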

     

    Posted by on July 28, 2011 in Industrial, Scientific, Socioeconomic

     


    New book: Visualize This by Nathan Yau

    I just got a copy of “Visualize This”, the new book by FlowingData Blog author Nathan Yau, which was released only 2 weeks ago.

    Nathan Yau's Blog "FlowingData" and new book "Visualize This"

    You can of course get a lot of details on Nathan’s own website here as well as reviews on Amazon. Below are my first impressions after spending a few hours with this book.

    If you have followed Nathan’s blog you will recognize many topics in the book. The book gives a good introduction to creating graphs and visualizations that “tell a story” to the audience. It has comprehensive coverage of topics such as where to get data from, how to get them into the right format and validate them, and which tools to use based on what type of aggregation or visualization you intend to create. He focuses specifically on R, a programming language for statistical computing and graphics. He also recommends using a box of tools to leverage the strengths of each of them, such as quickly creating a raw chart in R and then dressing it up in Adobe Illustrator. I’d certainly enjoy using the examples as a tutorial for learning the R language.

    The book deserves a lot of credit for being laid out well and using a lot of practical examples from everyday life (aging trends, crime rates, economic charts, unemployment data, company store location & growth, urban population, fertility rates, etc.) which most people can relate to. It’s enjoyable to read and makes its points in fluid, yet precise language.

    I already took away a few new ideas about aggregate matrix plots (such as Figure 6-9 Scatterplot matrix of crime rates) or using shapes to compare vectors of multiple variables (such as the star charts and Nightingale Charts in chapter 7). For example, I think the Nightingale chart in Figure 7-18 of crime rates by US state is a very useful visualization showing at a glance both the relative amount as well as the break-down into 6 different types of crime per state.

    Sample figure with Nightingale Charts displaying crime rates per US state

    Don’t expect to learn much in terms of statistics – this book doesn’t purport to go into any sort of statistical depth. It is focused primarily on how to get good visualizations, as compared to incorrect, misleading or even purposely distorting graphs – what Nathan refers to as “Ugly Visualizations” on his Blog.

    If I had one wish regarding the contents of this book – or perhaps a sequel some day – I’d say to focus a bit more on interactive graphics. This is obviously hard to do in a printed book, whose pages will always be static. However, there is so much innovation in this area, with the advent of electronic books and media players for interactive content. With mobile computing platforms such as the iPad and book readers such as the Kindle, I’m convinced that interactive graphics will enable a whole new way to “tell the story”.

     

    Posted by on July 27, 2011 in Industrial, Scientific

     


    Flight Pattern Visualization

    Aaron Koblin, an artist specializing in data and digital technologies and currently leader of the Data Arts team in Google’s Creative Lab, collaborated with Wired Magazine and FlightView Software to create beautiful graphics and illustrations of flights based upon tracking data by the FAA.

    Flight Patterns over the US by Aaron Koblin

    The following YouTube video is a time-lapse movie of flights over the US during a 24 hour period in 2008. One can clearly see the airspace come alive on the East Coast in the early morning hours and then calm down overnight.

    It is amazing how much data is aggregated into such a visualization – covering over 200,000 flights! Aaron’s website has a section about the flight patterns project which is well worth exploring. There are other graphs where you can set filters for aircraft type, manufacturer, altitude etc. Some of these graphics have been sold as wallpaper or prints and graced various art exhibitions. There is beauty in properly visualized data.

     

    Posted by on July 26, 2011 in Art, Industrial

     


    Visualizing Player from Visualizing.org

    Visualizing.Org is a community of creative people working to make sense of complex issues through data and design… and it’s a shared space and free resource to help you achieve this goal. One of the main tools is the new visualization player. From their website:

    Great visualizations of all kinds — from high-res infographics to interactive HTML5 apps — deserve stellar representation always. Instead of settling for embedded screenshots or links, as of today people can now easily embed your actual project (under CC license) using the Visualizing Player. This is a first for the field and we hope it helps make including data visualizations in blog posts and articles easier and more satisfying to readers and gets you and your work more attention.

    It’s a free media player designed specifically for data visualization and interactive graphics; it currently supports 7 formats (HTML5, Java, Flash, PDF, Video, Image, and URL). It’s easy to embed in other sites and there are a lot of example visualizations from the community hosted at visualizing.org.

    One of them is Gregor Aisch’s interactive graphic on Europe’s Energy production, consumption, import/export and dependencies:

    After playing with many of the example visualizations I have two spontaneous reactions:

    First, there is a lot of opportunity and possibility to display dynamic and complex information interactively. Not all infographics are interactive, of course, but those that are give you a sense of the power of interacting with the underlying data and models.

    Second, there seems to be a lack of generally accepted standards to convey certain types of information. It’s a bit of a wild-west situation with lots of creative approaches to visualizing data – for example look at the many different approaches to the UN Global Pulse data on the above community visualizations page. It reminds me of the graphical user interface days before the standardizing advent of Windows. Not that this is a bad thing; it just feels a bit overwhelming at times.

    It’s going to be interesting to see which styles of interactive presentation will become widely adopted.

     

    Posted by on July 26, 2011 in Industrial, Socioeconomic

     


    Interactive Visualization of Flight Information with Kayak

    Kayak.com is a powerful online search aggregator for travel planning, including flights, hotels, cars and more.

    The corresponding Kayak HD app on the iPad has an interesting feature called “Explore”. In this mode you specify an airport and then qualify various flight attributes such as duration, price, number of stops etc. You can then see search hits on a world map. As you move the sliders for the search parameters, the result set gets updated automatically on the map. Here is an example animated image which displays the increased number of resulting flights from JFK airport in NYC when varying the flight duration in hours:

    Animation Sequence of Flight Results from JFK by flight duration

    It is such dynamic display during in-context manipulation which makes interactive visualization a powerful tool to explore data and create insight.

     

    Posted by on July 1, 2011 in Industrial

     


    Branded Data Visualizations: LUMAscapes

    In this article on the Spotfire Blog Amanda Brandon recently posed the question: Can Data Visualizations Change the Business Decision Game? The article recounts the creation of data visualizations by Terrence Kawaja to show the complex online advertising space with over 1200 companies involved in a $10b annual business. The graphics show the flow of information and involved service providers from advertiser to consumer. It is said that the original chart published in 2009 became a “go-to tool for advertising executives”.

    Advertising Technology Landscape by Terrence Kawaja (2009)

    Kawaja of investment firm LUMA Partners refined this approach and created six such landscapes called LUMAscapes for display, video, search, mobile, commerce, and social online advertising.

    Search online advertising technology landscape (Source: Lumascapes from lumapartners.com)

    The Spotfire Blog concludes with four takeaways for business analysts from the approach of using such visualizations:

    Data visualizations are the ultimate content marketing. The simplification of complex data in a visually appealing format can take your information and brand viral. Giving away data on the major players and how they work together to drive an industry set the stage for authority and respect. …

    Data visualizations can become an industry standard. Simply look at how Kawaja was able to help ad executives navigate the digital ad space.

    Data visualizations can become a game-changer. Kawaja is branding these tools and using the graphics as a tool in generating business for his investment firm.

    Data visualizations can be central to business decision-making. According to the WSJ, these new visualizations could enhance discussions at the Digital Media Summit, a meeting of top execs from the investment and Internet advertising space.

    A picture can be worth more than a thousand words…

     

    Posted by on June 23, 2011 in Financial, Industrial

     
