
Inequality and the World Economy

The last edition of The Economist featured a 25-page special report on “The new politics of capitalism and inequality”, headlined “True Progressivism”. It is the most recommended and commented story on The Economist this week.

We have looked at various forms of economic inequality on this Blog before, as well as other manifestations (market share, capitalization, online attention) and various ways to measure and visualize inequality (Gini-index). Hence I was curious about any new trends and perhaps ways to visualize global economic inequality. That said, I don’t intend to enter the socio-political debate about the virtues of inequality and (re-)distribution policies.

In the segment titled “For richer, for poorer” The Economist explains:

The level of inequality differs widely around the world. Emerging economies are more unequal than rich ones. Scandinavian countries have the smallest income disparities, with a Gini coefficient for disposable income of around 0.25. At the other end of the spectrum the world’s most unequal, such as South Africa, register Ginis of around 0.6.

Many studies have found that economic inequality has been rising over the last 30 years in many industrial and developing nations around the world. One interesting phenomenon is that while the Gini index of many countries has increased, the Gini index of world inequality has fallen. This is shown in the following image from The Economist.

Global and national inequality levels (Source: The Economist)

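This is somewhat counterintuitive. Of course, the countries differ widely in population size and level of economic development. Global inequality reflects the gaps between countries as well as the gaps within them, so when populous poorer countries grow faster than rich ones, the global Gini can fall even while national Ginis rise. At a minimum, it means that a measure like the Gini index is not simply additive when aggregated over a collection of countries.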
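To make this concrete, here is a minimal Python sketch with made-up incomes for two people in each of two hypothetical countries. The numbers are purely illustrative and are not taken from The Economist. The `gini` helper is a standard rank-based formula.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes (0 = perfect equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum; equivalent to the mean-absolute-difference definition.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical incomes (made-up numbers): a rich and a poor country, two people each.
rich_before, poor_before = [40, 60], [4, 6]
# Later: both countries become more unequal, but the poor country grows much faster.
rich_after, poor_after = [35, 75], [10, 30]

for label, rich, poor in [("before", rich_before, poor_before),
                          ("after", rich_after, poor_after)]:
    # National Gini (rich), national Gini (poor), Gini of the pooled population.
    print(label, round(gini(rich), 3), round(gini(poor), 3), round(gini(rich + poor), 3))
```

In this toy example both national Ginis rise (from 0.1 to about 0.18 and 0.25), while the Gini of the pooled population falls from about 0.46 to about 0.33, because the gap between the two countries narrows.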

Another interesting chart is a world map that color-codes each country by the change in its inequality.

Changes in economic inequality over the last 30 years (Source: The Economist)

It’s a bit difficult to read this map without knowing the absolute levels of inequality, such as those we displayed in the post on Inequality, Lorenz-Curves and Gini-Index. For example, Namibia in southern Africa shows a trend (light blue) towards less inequality. However, Namibia was for many years the country with the world’s largest Gini (1994: 0.7; 2004: 0.63; 2010: 0.58 according to iNamibia) and hence still has much larger inequality than most developed countries.

World Map of national Gini values (Source: Wikipedia)

So the global Gini is declining, while in many large industrial countries the Gini is rising. One region where the regional Gini is declining as well is Latin America. Between 1980 and 2000 Latin America’s Gini grew, but in the last decade it has declined back to 1980 levels (~0.5), despite the strong economic growth throughout the region (Mexico, Brazil).

Gini of Latin America over the last 30 years (Source: The Economist)

Much of the coverage in The Economist tackles the policy debate and the questions of distribution vs. dynamism. On the one hand reducing Gini from very large inequality contributes to social stability and welfare. On the other hand, further reducing already low Gini diminishes incentives and thus potentially slows down economic growth.

In theory, inequality has an ambiguous relationship with prosperity. It can boost growth, because richer folk save and invest more and because people work harder in response to incentives. But big income gaps can also be inefficient, because they can bar talented poor people from access to education or feed resentment that results in growth-destroying populist policies.

In other words: Some inequality is desirable, too much of it is problematic. After growing over the last 30 years, economic inequality in the United States has perhaps reached a worrisome level as the pendulum has swung too far. How to find the optimal amount of inequality and how to get there seem like fascinating policy debates to have. Certainly an example where data visualization can help an otherwise dry subject.

 
Posted on October 15, 2012 in Socioeconomic

Software continues to eat the world

One year ago Marc Andreessen, co-founder of Netscape and venture capital firm Andreessen-Horowitz, wrote an essay for the Wall Street Journal titled “Why Software Is Eating The World”. It is interesting to look back at this piece and some of its predictions, made at a time when Internet company LinkedIn had just gone public and Groupon was just filing for an IPO.

Andreessen’s observation was simply this: Software has become so powerful and computer infrastructure so cheap and ubiquitous that many industries are being disrupted by new business models enabled by that software. Examples listed were books (Amazon disrupting Borders), movie rental (Netflix disrupting Blockbuster), the music industry (Pandora, iTunes), animated movies (Pixar), photo-sharing services (disrupting Kodak), job recruiting (LinkedIn), telecommunication (Skype), video gaming (Zynga) and others.

On the infrastructure side one can bolster this argument by pointing at the rapid development of new technologies such as cloud computing or big data analytics. Andreessen gave one example of the cost of running an Internet application in the cloud dropping by a factor of 100x in the last decade (from $150,000 / month in 2000 using LoudCloud to about $1500 / month in 2011 using Amazon Web Services). Microsoft now has infrastructure with Windows Azure where procuring an instance of a modern server at one (or even multiple) data center(s) takes only minutes and costs you less than $1 per CPU hour.

Likewise, the number of Internet users has grown from some 50 million around 2000 to more than 2 billion with broadband access in 2011. This is certainly one aspect fueling the enormous growth of social media companies like Facebook and Twitter. To be sure, not every high-flying startup goes on to be as successful after its IPO. Three months after its IPO, Facebook trades at half its opening-day value. Groupon trades at less than 20% of its IPO value of some 9 months ago. But LinkedIn has sustained and even modestly grown its market capitalization. And Google and Apple both trade at or near their all-time highs, with Apple today at $621b becoming the most valuable company of all time (not inflation-adjusted).

The growing dominance and ubiquitous reach of software shows in other areas as well. Take automobiles. Software is increasingly being used for comfort and safety in modern cars. In fact, self-driving cars, once the realm of science fiction much like flying hover cars, are now technically feasible and knocking on the door of broad industrial adoption. After driving 300,000 miles in testing, Google is now deploying its fleet of self-driving cars for the benefit of its employees. Engineers even take self-driving cars to the racetrack, such as up Pikes Peak or around the Thunderhill raceway. Performance is now at the level of very good drivers, without the human flaws (drinking, falling asleep, texting, showing off, etc.) which cause so many accidents. Expert drivers still outperform the computer-driven cars. (That said, even human experts sometimes make mistakes with terrible consequences, such as this crash on Pikes Peak this year.) The situation is similar to how computers got so proficient at chess in the mid-nineties that finally even the world champion was defeated.

In this post I want to look at some other areas specifically impacting my own life, such as digital photography. I am not a professional photographer, but over the years my wife and I have owned dozens of cameras and have followed the evolution of digital photography and its software for many years. Of course, there is an ongoing development towards chips with higher resolution and lenses with better optics and faster controls. But the major innovation comes from better software. Things like High Dynamic Range (HDR) to compensate for stark contrast in lighting, such as in a portrait photo against a bright background. Or stitching multiple photos together into a panorama, with Microsoft’s Photosynth taking this to a new level by building 3D models from multiple shots of a scene.

One recent innovation comes in the form of the new Sony RX100 camera, which science writer David Pogue raved about in the New York Times as “the best pocket camera ever made”. My wife bought one a few weeks ago and we both have been learning all it can do ever since. Despite the many impressive features and specifications of lens, optics, chip, controls, etc., what I find most interesting is the software running on such a small device. The intelligent Automatic setting will decide most settings for everyday use, while one can always direct priorities (aperture, shutter, program) or manually override most aspects. There are a great many menus, and it is not trivial to learn to use all the capabilities of this camera, as it’s extremely feature-rich. Some examples of the more creative software come in modes such as ‘water color’ or ‘illustration’. The original image is processed right then and there to generate effects as if it were a painting or a drawing. Both the original and the processed photo are stored on the mini-SD card.

Flower close-up in ‘illustration’ mode

One interesting effect is to filter the image down to just one of the main colors (Yellow, Red, Green, Blue). Many of these effects are previewed on the display, with the aperture ring serving as a flexible multi-functional dial for more convenient handling with two hands. (Actually, the camera body is so small that it is a challenge to use all dials while holding the device; just like the BlackBerry keyboard made us write with two thumbs instead of ten fingers.) The point of such software features is not so much that they are radically new; you have been able to do the same with good photo editing software for many years. The point is that with the ease and integration of having them at your fingertips you are much more likely to use them.

Example of suppressing all colors except yellow
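As a rough illustration of how such a selective-color effect can work in principle, here is a small Python sketch using only the standard library. It is not Sony’s actual algorithm; the function name `keep_hue` and the hue, tolerance and saturation thresholds are assumptions chosen for illustration. Each pixel is kept if its hue is close to the target color, and converted to gray otherwise.

```python
import colorsys

def keep_hue(pixel, hue_center=60.0, tolerance=25.0, min_saturation=0.2):
    """Keep an (r, g, b) pixel if its hue is near hue_center (degrees), else gray it.

    hue_center=60 corresponds to yellow; all thresholds are illustrative guesses.
    """
    r, g, b = pixel
    h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Smallest angular distance between the pixel hue and the target hue.
    diff = abs((h * 360 - hue_center + 180) % 360 - 180)
    if s >= min_saturation and diff <= tolerance:
        return pixel
    # Standard luma weights turn every other pixel into a gray value.
    gray = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (gray, gray, gray)

# A tiny made-up "image": yellow, red and blue pixels.
image = [(255, 255, 0), (255, 0, 0), (0, 0, 255)]
print([keep_hue(p) for p in image])
```

Only the yellow pixel survives in color; the red and blue pixels come out as shades of gray.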

The camera allows you to register faces and will detect them in images. You can set it up such that it will take a picture only when it detects a small/medium/large smile on the subject being photographed. One setting allows you to take a self-portrait, with the timer starting to count down as soon as the camera detects one (or two) faces in the picture! It is an eerie experience when the camera starts to “understand” what is happening in the image!

There is an automatic panorama stitching mode where you just hold the button and swipe the camera left-right or up-down while the camera takes multiple shots. It automatically stitches them into one composite, so no more uploading of the individual photos and stitching on the computer required.

Beach panorama stitched on the camera using swipe-&-shoot

I have been experimenting with panorama photos since 2005 (see my collection or my Panoramas from the Panamerican Peaks adventure). It’s always been somewhat tedious and results were often mixed (lens distortions, lighting changes sun vs. cloud or objects moving during the individual frames, not holding the camera level, skipping a part of the horizon, etc.) despite crafty post-processing on the computer with image software. I have read about special 360 degree lenses to take high-end panoramas, but who wants to go to those lengths just for the occasional panorama photo? From my experience, nothing moves the needle as much as the ease and integration of taking panoramas right in the camera as the RX100 does.

Or take the field of healthcare. Big Data, Mobility and Cloud Computing make possible entirely new business models. Let’s just look at mobility. The smartphone is evolving into a universal healthcare device for measuring, tracking and visualizing medical information. Since many people have their smartphone with them at almost all times, one can start tracking and analyzing personal medical data over time. And for almost any medical measurement, “there is an app for that”. One interesting example is this optical heart-rate monitor app Cardiio for the iPhone. (Cardio + IO ?)

Screenshots of Cardiio iPhone app to optically track heart rate

It is amazing that this app can track your heart rate just by analyzing the changes of light reflected from your face with its built-in camera. Not even a plug-in required!
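To give an idea of the principle, here is a hedged Python sketch that estimates a pulse rate from a series of average brightness values, one per video frame. This is not Cardiio’s actual method, which I don’t know; the function name `estimate_bpm`, the 30 frames per second and the synthetic signal are all assumptions. The sketch simply looks for the strongest periodic component within a plausible heart-rate range.

```python
import math

def estimate_bpm(samples, fps, lo=42, hi=180):
    """Return the beats-per-minute value in [lo, hi] with the strongest periodic signal."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove the constant brightness level
    best_bpm, best_power = lo, -1.0
    for bpm in range(lo, hi + 1):
        freq = bpm / 60.0  # candidate frequency in Hz
        # Correlate the signal with a cosine and a sine at the candidate frequency.
        re = sum(x * math.cos(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * i / fps) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

# Synthetic test signal: 10 seconds at 30 frames/s with a faint 1.2 Hz (72 bpm) ripple.
fps = 30
signal = [0.5 + 0.01 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]
print(estimate_bpm(signal, fps))
```

A real app would also have to find the face, cope with motion and lighting changes, and filter out noise, which is where most of the difficulty lies.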

Another system comes from Withings, this one designed to turn the iPhone into a blood pressure monitor. A velcro sleeve with battery mount and cable plugs into the iPhone and an app controls the inflation of the sleeve, the measurement and some simple statistics.

Blood pressure monitor system from Withings for iPhone

Again, it’s fairly simple to just put the sleeve around one upper arm and push the button on the iPhone app. The results are systolic and diastolic blood pressure readings and heart rate.

Sample blood pressure and pulse measurement

Like many other monitoring apps this one also keeps track of the readings and does some simple form of visual plotting and averaging.

Plot of several blood pressure readings

There is also a separate app which will allow you to upload your data and create a more comprehensive record of your own health over time. Withings provides a few other medical devices such as scales to add body weight and body fat readings. The company tagline is “smart and connected things”.

One final example is an award-winning contribution from a student team from Australia called StethoCloud. This system is aimed at diagnosing pneumonia. It consists of an app for the iPhone, a simple stethoscope plug-in for the iPhone and, on the back end, server-based software in the Windows Azure cloud that analyzes the measurements according to standards defined by the World Health Organization. The winning team (in Microsoft’s 2012 Imagine Cup) built a prototype in only 2 weeks and with only minimal upfront investment.

StethoCloud system for iPhone to diagnose pneumonia

This last example perhaps illustrates best the opportunities of new software technologies to bring unprecedented advances to healthcare – and to many other fields and industries. I think Marc Andreessen was spot on with his observation that software is eating the world. It certainly does in my world.

 
Posted on August 20, 2012 in Industrial, Medical, Socioeconomic

Olympic Medal Charts

The 2012 London Olympic Games ended this weekend with a colorful closing ceremony. Media coverage was unprecedented, with other forms of competition around who had the most social media presence or which website had the best online coverage of the games.

In this post I’m looking at the medal counts over the history of the Olympic Games (summer games only, 27 Games over the last 116 years, no games in 1916, 1940, and 1944). In London, nearly 11,000 athletes from 205 countries competed for more than 900 medals in 302 events. The New York Times has an interactive chart of the medal counts on their London 2012 Results page:

Bubble size represents the number of medals won by the country, bubble position is roughly based on a world map and bubble color indicates the continent. Moving the slider to a different year changes the bubbles, which gives a dynamic grow or shrink effect.

Below this chart is a table listing all gold, silver, bronze winners for each sport in that year, grouped by type of sport such as Gymnastics, Rowing or Swimming. Selecting a bubble will filter this to entries where the respective country won a medal. This shows the domination of some sports by certain countries, such as Diving (8 events, China won 6 gold and 10 total medals) or Cycling – Track (10 events, Great Britain won 7 gold and 9 total medals). In two sports, domination by one country was 100%: Badminton (5 events, China won 5 gold and 8 total medals), Table Tennis (4 events, China won 4 gold and 6 total medals).

There is also a summary table ranking the countries by total medals. For 2012, the United States clearly won that competition, winning more gold medals (46) than all but 3 other countries (China, Russia, Britain) won total medals.

Top 10 countries for medal count in 2012

Of course countries vary greatly by population size. It is remarkable that a relatively small nation such as Jamaica (~2.7 million) won 12 medals (4, 4, 4), while India (~1.25 billion) won only 6 medals (0, 2, 4). In that sense, Jamaica is about 1000x more medal-decorated per capita than India! In another New York Times graphic there is an option to compare medal counts adjusted for population size, i.e. with the medal count normalized to a standard population size of, say, 100 million.
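The arithmetic behind this comparison is easy to check. The following Python sketch uses the medal counts and approximate populations quoted above; the helper name `medals_per_100_million` is just an illustrative choice.

```python
def medals_per_100_million(medals, population):
    """Medal count normalized to a standard population of 100 million."""
    return medals / population * 100_000_000

# Medal counts and approximate populations as quoted in the text.
jamaica = medals_per_100_million(12, 2_700_000)
india = medals_per_100_million(6, 1_250_000_000)

print(round(jamaica, 1), round(india, 2), round(jamaica / india))
```

Jamaica comes out at roughly 444 medals per 100 million people versus about 0.48 for India, a ratio of a little over 900, which is where the “about 1000x” comes from.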

Directed graph comparing medal performance adjusted for country size

Selecting any node in this graph will highlight countries with better, worse or comparable relative medal performance. (There are different ways to rank based on how different medals are weighted.)

The Guardian Data Blog has taken this a step further and written a piece called “alternative medals table”. This post not only discusses multiple factors like population, GDP, or number of athletes and how to deal with them statistically; it also provides all the data and many charts in a Google Docs spreadsheet. One article combines GDP adjustment with cartographical mapping across Europe:

Medals GDP Adjusted and mapped for Europe

If you want to do your own analysis, you can get the data in shared spreadsheets. To do a somewhat more historic analysis, I used a different source, namely Wolfram’s curated data source accessible from within Mathematica. Of course, once you have all that data, you can examine it in many different directions. Did you know that 14,853 Olympic medals have been awarded so far in 27 summer Olympiads? The average was 550 medals per Games, growing by about 29 medals from one Games to the next, with nearly 1000 awarded in each of 2008 and 2012.

A lot of attention was paid to who would win the most medals in London. China seemed in contention for the top spot, but in the end the United States won the most medals, as it did in the last 5 Olympiads. Only 7 countries have ever won the most medals at an Olympiad. Greece (1896), France (1900), the United Kingdom (1908), Sweden (1912), and Germany (1936) did so just once. The Soviet Union (which no longer exists) did it 8 times. And the United States did it 14 times. China, which has only been participating since 1984, has yet to win the most medals at any Olympiad.

Aside from the top rank, I was curious about the distribution of medals over all countries. Both nations and events have increased, as is shown in the following paired bar chart:

Number of participating nations and total medals per Summer Games

The number of nations grew steadily with only two exceptions, during the thirties and the seventies; presumably many nations could not afford to participate due to economic hardship. 1980 also saw the boycott of the Moscow Games by the United States and several other delegations over geopolitical disagreements. At just over 200, the number of nations seems to have stabilized.

The number of medals depends primarily on the number of events at each Olympiad. This year there were 302 events in 26 types of Sports. Total medal count isn’t necessarily exactly triple that since in some events there could be more than 1 Bronze (such as in Judo, Taekwondo, and Wrestling). Case in point, in 2012 there were 968 medals awarded, 62 more than 3 * 302 events.

What is the distribution of those medals over the participating nations? One measure would be the percentage of nations winning at least some medals. Another measure showing the degree of inequality in a distribution is the Gini index. Here I plotted the percentage of nations medaling and the Gini index of the medal distribution over all participating nations for every Olympiad:

Percentage and Gini-Index of medal distribution by nations

Up until 1932, three out of four nations won at least some medals. Then the percentage dropped to levels around 40% and lower since the sixties. That means 6 of 10 nations go home without any medals. During the same time period the inequality grew from a Gini of about .65 to near .90. One exception was the Third Games in 1904 in St. Louis. With only 13 nations competing, the United States dominated so many sports that it yielded an extreme Gini of .92. All of the last five Games resulted in a Gini of about .86, so this still very large amount of medal-winning inequality seems to have stabilized.

It would be interesting to extend this to the level of participating athletes. Of course we know which athlete ranks at the top as the most decorated Olympic athlete of all time: Michael Phelps with 22 medals.

 
Posted on August 15, 2012 in Recreational

Keystroke Biometrics using Mathematica

A few weeks ago Paul-Jean Letourneau posted an article on Wolfram’s Blog about using Mathematica to collect and analyze keystroke metrics as a way to identify individuals. The article analyzes how you type, measuring the time intervals between your typing the individual characters using a little interactive widget, collecting and visualizing the data while you repeatedly type in the word “wolfram”.

Keystroke metrics of 50 trials typing the word “wolfram”

 

It is somewhat interesting at this point to analyze one’s own typing style. For example, there appears to be a bimodal distribution of the time intervals between keystrokes, with the sequence “r-a” taking me almost twice as long (~130ms) as most other sequences (~60-70ms). There is also a ‘learning’ effect visible in my 50 trials, where the speed improves noticeably after about 20 repetitions or so. However, there are occasional relapses into a much slower typing pattern throughout the rest of the trials.

However, what I thought was more interesting is the subsequent analysis the author did across a set of 42 such series he obtained from his colleagues (noting humorously that “it just so happens that Wolfram is a company full of data nerds”). He then proceeds to analyze and visualize that data in various ways.

Distribution Histogram of keystroke intervals

He observes the bimodal nature of the distribution with peaks around 75ms and 150ms for different pairs of characters. In fact, averaging over all those pair typing times, a correlation is found indicating that when people type slower they are more consistent.

(Negative) Correlation of pairwise typing speed and consistency

The analysis continues with the observation that each measurement can be seen as a point in a six-dimensional space (six pair-transitions in a word with seven characters). When a person types this same word 50 times you get a cluster of 50 points in six-dimensional space. Different individuals will produce different clusters. So one can use the (built-in) function FindClusters to determine such clusters. However, since people have a certain amount of inconsistency in their typing, it is possible that sometimes one person’s typing will show up in another person’s cluster and vice versa. To measure how well the clusters distinguish individuals, one can implement various measures. The author implements the Rand index, a measure of the similarity between two data clusterings. This gives a numeric accuracy on a scale from 0 to 1 for the ability to distinguish between two people. Among 42 people there are 21*41=861 different pairs, but the author chose to look at all 42*42=1764 ordered pairs, as the FindClusters results depend on the order of the input data, so Rand[i,j] may be different from Rand[j,i]. Looking across all those pairs, you get the following histogram of Rand quality scores:

Histogram of Rand quality score for all pairs

This clearly shows that keystroke metrics for one word are not sufficient to reliably distinguish between arbitrary pairs of people. The average quality score is only 0.67. On the other hand, about 400 (~23%) of those quality scores are a perfect 1.0, so for about a quarter of the pairs this one word alone would suffice to reliably distinguish the two people typing. About half as many scores are 0.0, meaning that the clusters overlap so much that no distinction is possible. The remaining scores are distributed mostly between 0.5 and 1.0, meaning you would just guess right more often than wrong.
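For readers unfamiliar with the Rand index, here is a minimal Python sketch of the textbook definition: the fraction of item pairs on which two clusterings agree (both put the pair in the same cluster, or both put it in different clusters). This is not the author’s Mathematica code, and his quality score may be computed or scaled differently; the function name and the tiny label lists are illustrative assumptions.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (1.0 = identical grouping)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # both "together" or both "apart"
            agree += 1
    return agree / len(pairs)

# Hypothetical example: four typing samples, true person vs. cluster found.
truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]))  # same grouping, only the label names differ
print(rand_index(truth, [0, 1, 0, 1]))  # clusters cut across the two people

# Number of unordered pairs among 42 people, as quoted in the text.
print(len(list(combinations(range(42), 2))))
```

The first call gives a perfect score because only the cluster names differ, not the grouping itself; the second gives a low score because the clusters mix the two people.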

The author wraps up the post with this paragraph:

Using this fun little typing interface, I feel like I actually learned something about the way my colleagues and I type. The time to type two letters with the same finger on the same hand takes twice as long as with different fingers. The faster you type, the more your typing speed will fluctuate. The more your typing speed fluctuates, the harder it will be to distinguish you from another person based on your typing style. Of course we’ve really just scratched the surface of what’s possible and what would actually be necessary in order to build a keystroke-based authentication system. But we’ve uncovered some trends in typing behavior that would help in building such a system.

An interactive CDF widget embedded in the article allows you to collect and visualize the timing of your own typing. The source code and the test data are also shared if you want to further explore the details of this interesting analysis.

 
Posted on July 20, 2012 in Linguistic, Scientific

London Tube Map and Graph Visualizations

The previous post on Tube Maps has quickly risen in the view stats into the Top 3 posts. Perhaps it’s due to many people searching Google for images of the original London tube map in the context of the upcoming Olympic Games.

I recently reviewed some of the classes in Wolfram’s free Data Science course. If you are interested in Data Science, this is excellent material. And if you are using Mathematica, you can download the underlying code and play with the materials.

It just so happens that in the notebook for the Graphs and Networks: Concepts and Applications class there is a graph object for the London subway.

Mathematica Graph object for the London subway

As previously demonstrated in our post on world country neighborhood relationships, Mathematica’s graph objects are fully integrated into the language and there are powerful visualization and analysis functions.

For example, this graph has 353 vertices (stations) and 409 edges (subway connections). This one line of code highlights all stations no more than 5 stations away from the Waterloo station:

HighlightGraph[london, 
  NeighborhoodGraph[london, "Waterloo", 5]]

Neighborhood Graph 5 around Waterloo

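Since HighlightGraph and NeighborhoodGraph are built-in functions, this takes just one line of code. Generating the same kind of highlight for growing neighborhoods around King’s Cross St. Pancras and exporting the frames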

Export["london.gif",
  Table[HighlightGraph[london, 
    NeighborhoodGraph[london, "King's Cross St. Pancras", k]],
   {k, 0, 20, 1}]]

creates this animated GIF file:

Paths spreading out from the center

Shortest paths can easily be determined and visualized:

HighlightGraph[london, 
  FindShortestPath[london, "Amersham", "Woolwich Arsenal"]]

A shortest path example

There are many other graph functions such as:

GraphDiameter[london]   (* 39 *)
GraphRadius[london]     (* 20 *)
GraphCenter[london]     (* "King's Cross St. Pancras" *)
GraphPeriphery[london]  (* {"Watford Junction", "Woodford"} *)

In other words, the King’s Cross St. Pancras station is at the center, every other station is at most 20 stops away from it (the radius), and the 39 stops between Watford Junction and Woodford form the longest shortest path in the network (the diameter).
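To make these definitions concrete outside of Mathematica, here is a small Python sketch that computes the same quantities with a plain breadth-first search. The six-node graph is a made-up toy example, not the London network, and the function names are illustrative. It assumes an unweighted, connected graph.

```python
from collections import deque

def bfs_distances(graph, start):
    """Number of hops from start to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def eccentricities(graph):
    """For each node, the distance to the node farthest away from it."""
    return {v: max(bfs_distances(graph, v).values()) for v in graph}

# Toy graph (hypothetical): a line A-B-C-D-E with a short branch C-F.
toy = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D", "F"],
    "D": ["C", "E"], "E": ["D"], "F": ["C"],
}

ecc = eccentricities(toy)
diameter = max(ecc.values())  # longest shortest path in the graph
radius = min(ecc.values())    # smallest eccentricity
center = sorted(v for v, e in ecc.items() if e == radius)
periphery = sorted(v for v, e in ecc.items() if e == diameter)
print(diameter, radius, center, periphery)
```

For the toy graph the center is the branching node C, and the periphery consists of the two ends of the line, A and E, mirroring the roles of King’s Cross St. Pancras and the two outlying stations above.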

Let’s look at distances within the graph. The built-in function GraphDistanceMatrix calculates all pairwise distances between any two stations:

mat = GraphDistanceMatrix[london]; MatrixPlot[mat]

Graph Distance Matrix Plot

For the 353*353 = 124,609 pairs of stations, let’s plot a histogram of the pairwise distances:

Histogram[Flatten[mat]]

Graph Distance Histogram

The average distance between two stations in the London subway system is about 14.

So far, very little coding has been required as we have used built-in functions. Of course, the set of functions can be easily extended. One interesting aspect is the notion of centrality, i.e. how close a node is to the center of the graph. This is expressed in the built-in function ClosenessCentrality:

cc = ClosenessCentrality[london];
HighlightCentrality[g_, cc_] := 
   HighlightGraph[g, 
    Table[Style[VertexList[g][[i]], 
      ColorData["TemperatureMap"][cc[[i]]/Max[cc]]], 
        {i, VertexCount[g]}]];
HighlightCentrality[london, cc]

Color coded Centrality Map

Another interesting notion is that of BetweennessCentrality, which is a measure indicating how often a particular node lies on the shortest paths between all node-pairs. The following nifty little snippet of code identifies the 10 most traversed stations – along the shortest paths – of the London underground:

HighlightGraph[london,
 First /@ SortBy[
 Thread[VertexList[london] -> BetweennessCentrality[london]],
 Last][[-10 ;;]]]

10 most traversed stations

I have often felt that progress in computer science and in languages comes from raising the level of abstraction. It’s amazing how much analysis and visualization one can do in Mathematica with very little coding due to the large number of powerful, built-in functions. The reference documentation of these functions often has many useful examples (and is also available for free on the web).

When I graduated from college 20 years ago we didn’t have such powerful language platforms. Implementing a good algorithm for finding shortest paths is a good exercise for a college-level computer science course. And even when such pre-built functions exist, it may still be instructive to figure out how to implement such algorithms.

As a manager I have always encouraged my software engineers to spend a certain fraction of their time searching for built-in functions or otherwise pre-existing code to speed up project implementation. Bill Gates has been quoted as saying:

“There is only one trick in software: Use a piece of code that has already been written.”

With software engineers, it is well known that productivity often varies not just by small factors, but by orders of magnitude. A handful of talented and motivated engineers with the right tools can outperform staffs of hundreds at large companies. I believe the increasing levels of abstraction and computational power of platforms such as Mathematica further amplify this trend and the resulting inequality in productivity.

 
Posted on July 11, 2012 in Education, Recreational

Interactive Tournament Map

I hadn’t followed the UEFA 2012 European football championship (called soccer in the US) and wanted to catch up on where things stand. Enter the interactive tournament map on the official UEFA website:

Row selection highlights games at that stadium

When you first enter the map it animates the timeline from left to right by drawing the colored lines for each team. The tabular layout shows time in daily columns from left to right and teams in rows by 4 tournament groups. Today’s day column is always highlighted. Here are some of the interactive elements:

  • Mousing over any of the colored lines highlights the corresponding team’s games along its timeline.
  • Clicking on a particular day column header highlights the games played on that date.
  • Clicking on the stadium symbol at the right end highlights the games played at that stadium.
  • Clicking on any circle brings up a dialog with details for that game.
  • Clicking on a row header on the left brings up a dialog with details for that team.
  • Selecting the tournament stage at the bottom (quarter-, semi-, final) moves to the date interval.

Detail for team Spain

Spain is the reigning football world champion, so they are clearly one of the favorites of this tournament and will actually play their semi-final against Portugal later this evening.

The final will be played in the Olympic Stadium in Kyiv, capital of participating host country Ukraine.

Detail with game schedule for stadium

From these details you can click on the games and get to yet more detail (videos, comments, etc.) for that particular game.

When I first looked at the map, the amount of information displayed had me a bit confused. The colors are often difficult to tell apart, for example the three orange-red tones in Group B. The black background feels attractive, although I could do without the pattern overlay, which doesn’t add information and only distracts. Lastly, I could do without the colorful advertisements around the map. At first glance I thought the stadium symbols on the right were also just colored ads.

The interactive nature made the map grow on me. It’s intuitive, and the tabular layout is easy to navigate. You may not have a screen wide enough to see the map in its entirety, but I suppose you wouldn’t want to see time down the vertical axis, would you?

Postscript 7/1/12: Sure enough, Spain beat Italy 4:0 in today’s final and went on to become the European football champion 2012.

 

Posted on June 27, 2012 in Recreational

 


Self-publishing to Apple bookstore


Over the last couple of weeks I finished writing the book about my adventure of a lifetime: Panamerican Peaks, cycling from Alaska to Patagonia and climbing the highest mountain of every country along the way. By now I have successfully self-published the book to the Apple bookstore. This post gives a recap of the steps involved in that process, with a focus on the tools, logistics and finally some numbers and sales stats.

Disclaimer: In my personal life I am an avid Apple fan, and this post is heavily biased towards Apple products. In particular, the eBook is only available for the iPad. So the tools and publishing route described below may not be for everybody, but the process and lessons learnt may still be of interest.

Path to self-publishing on Apple bookstore

Creating Content

The first step is obviously to create, select and edit the content of the book. During the actual trip I tried documenting my experiences via the following:

  • Taking about 10,000 photos with digital camera (Olympus and Panasonic)
  • Taking daily notes with riding or climbing stats (on iPhone or NetBook)
  • Shooting about 200 video clips (Flip Mino)
  • Uploading photos (to Picasa) and videos (to YouTube)
  • Writing posts on my personal Blog

In the months after coming home I refined some of the above material. Using iMovie I created ~ 5 min long movies based on video clips, photos and map animations, typically with some iTunes song in the background and a bit of explanatory text or commentary. I shared those videos on my personal Blog and on my Panamerican Peaks YouTube channel.

I loaded all photos into Aperture on our iMac and tagged and rated them. That allowed me to organize them by topic or as required. The ‘Smart Folders’ feature of Aperture comes in handy here, as it lets you set up filters and select a subset of photos without having to copy them. For example, if I wanted photos rated 4 stars or higher related to camping, or photos of mountains in Central America, I just needed to create another Smart Folder. This was very useful, for example, for the Panamerican Peaks Synopsis video, which features quick photo sets by topic (cycling, climbing, camping, etc.).

Google Earth proved to be a very useful tool as I could easily create maps of the trip based on the recorded GPS coordinates from my SPOT tracker. One can even retrace the trip in often astonishing detail thanks to Google Street View. For example, in many places along the Pacific Coast I can look at campgrounds or road-side restaurants where I stopped during my journey. I even created a video illustrating the climbing route on Mount Logan from within Google Earth.

The heart and soul of any book is of course the story and the text used to tell it. I created multiple chapters using MS Word because I am so used to it, but one can of course use any modern text writing tool. In addition, I created some slides for presentations I gave last summer using Keynote.

Book Layout

Once all the ingredients were available, it was time to compose the actual book. As I had decided to build an eBook for the iPad I used Apple’s new iBooks Author tool on my MacBook Pro. This meant choosing the layout and including the text and media. iBooks Author provides a few interactive widgets and accepts all widgets that can be installed into the OS X dashboard. This in particular allowed me to link to the various YouTube videos. At any time I could preview the book on my attached iPad 2.

After many weeks of busy work putting the finishing touches on the book and adding various edits from a few trusted friends I got to the point where I needed to figure out how to get the book published in Apple’s bookstore. There are two steps required here:

  1. Creating a developer account with Apple via iTunes Connect
  2. Managing one’s content via iTunes Producer

The creation of the account is fairly straightforward through the web browser. To get started, I visited Apple’s Content Provider FAQ page and filled out an application. One submits basic information such as name, address, tax ID, credit card information, and ties it all to an existing Apple account. It can take a while. I never received the account validation email I was promised. So after a few days I started inquiring in Apple’s support forum. This had happened to others. Finally I just tried connecting via web browser to itunesconnect.apple.com and it worked – I had an account to publish from.

The packaging of all material and uploading is done via the free iTunes Producer app on the Mac. iBooks Author exports the book in .ibooks format, which becomes part of the iTunes Producer package. One can also provide a free sample for the book. This can be any subset or variation of the full book, unlike with Amazon’s bookstore, where the free sample is always the first N pages.

Next, one needs to provide additional metadata such as book category, description, author name, optional sample screen shots etc. One also has to provide an ISBN (International Standard Book Number) for the book. These can be obtained from publishers or directly purchased from Bowker. This stems from the need to catalogue and identify physical books in inventory or libraries, but seems a bit anachronistic for electronic books. The prices for ISBNs are very high, especially for small volumes (1 for $125, 10 for $250, 100 for $500, 1000 for $1000). But since Bowker has a monopoly in the US you don’t have a choice in that matter. This expense seemed to be the only marginal upfront cost to publishing the book (aside from the tools to create the content).

Finally one can determine the pricing and the markets where the content is to be sold. Apple follows the agency model of book publishing: As author you get to set the price. As distributors they take a share of your proceeds, here 30%. (By contrast, in the wholesale model you sell to the distributor at a discount, say 50% of the suggested retail price; the seller has sole discretion to set the price.)
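
To make the difference between the two models concrete, here is a small Python sketch of the author's proceeds per copy. The function names are made up for illustration, the 30% share and the 50% discount are the figures mentioned above, and the $10.00 price is just a round example.

```python
def agency_proceeds(list_price, distributor_share=0.30):
    """Agency model: the author sets the price, the distributor keeps a share."""
    return list_price * (1 - distributor_share)


def wholesale_proceeds(suggested_retail_price, discount=0.50):
    """Wholesale model: the author sells to the distributor at a discount
    off the suggested retail price; the seller then sets the final price."""
    return suggested_retail_price * (1 - discount)


price = 10.00  # illustrative round price
print(f"agency:    ${agency_proceeds(price):.2f} per copy")
print(f"wholesale: ${wholesale_proceeds(price):.2f} per copy")
```
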

Book Review

Much has been written about the very restrictive terms and conditions Apple puts on authors using their iBooks Author tool. Essentially it locks you in as an author to sell only through Apple. For many authors that is not a viable option. It also allows Apple to reject your work at their sole discretion. So as an author you are completely at the mercy of Apple’s review process.

Apple is also strict about enforcing certain rules regarding the content it allows you to sell. For example, your book cannot contain any links to YouTube videos or Amazon books. They rejected my first revision with YouTube links and suggested embedding all videos. This would have bloated the download size of the book by more than 1 GB. As a compromise, I created short 1 min teaser versions of all videos and included those. At the end they display a screen to go to the companion website (my personal Blog) for the full versions.

After 3 revision cycles and about a week later I finally had my book on sale in 24 countries around the world, for $9.99 or the equivalent in Euro or other countries’ currencies.

Book Marketing

Publishing is not selling. Here are some of the things I did to promote my own book:

  • Email – Customized note to Hotmail contacts (~ 300 contacts)
  • Twitter – Tweets and direct messages to influencers for retweets (~ 2000 followers)
  • Facebook – My daughter posted on her wall (~ 1000+ friends)

Sending the emails was not without hiccups. I used MS Word and Outlook to do a mail merge with text blocks and individual text from an Excel spreadsheet. First, the Mail Merge Filter condition dialog has a bug which replicates the last AND condition and adds it as an OR condition. This screws up your filter and ends up selecting lots of folks you didn’t mean to. I found this bug during a test with the first 5 addresses. (I sent them each an apologetic email explaining this.) Then after I did the filtering all in the spreadsheet it worked and Outlook cranked out the emails. After a short while, Hotmail decided that my account had apparently been hacked and used for spam, so they locked my account down! In a way this is good, but I didn’t consider my carefully crafted and personalized emails spam. So I had to change my password and unlock my account again.
The email was very effective. I got lots of positive responses and a few folks decided to buy right away. I had sold my first copy. Every journey of 1000 miles starts with a single step.

As a result of my daughter posting the news on Facebook I noticed a spike (4x average) in the views of my Blog and Book page. I also offered promo codes for a free book download to influential Twitter users if they would retweet the book announcement to their followers. Within a couple of days a handful of them accepted the offer and retweeted, which exposed the tweet to a total of 2,000+ followers.

I had emailed the Apple bookstore, and to my delight they actually featured my book in their Travel & Adventure category.

My book featured in Apple’s bookstore, Travel & Adventure section

Book Sales

With all these promotion efforts I couldn’t wait until the next morning to see the sales numbers. (iTunes Connect updates their sales numbers only once a day.) I had the first ratings and reviews come in, all at 5 stars. Naturally, I hoped to see the sales numbers go up. After all, I had reached hundreds, if not thousands of people, most of whom either know me or are somewhat interested in adventure. The result? Tiny sales numbers. To date after one week I have sold 14 copies, with a maximum of four (4) copies per day. At my $10 price and 70% share this amounts to just under $100 for the first week. Not exactly enough to retire on.
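
The first-week figure is simple arithmetic, sketched here in Python. The variable names are made up for illustration; the numbers are the ones quoted above, using the $9.99 list price.

```python
copies_sold = 14      # first-week sales
list_price = 9.99     # US list price in dollars
author_share = 0.70   # the author keeps 70%, Apple takes 30%

proceeds = copies_sold * list_price * author_share
print(f"First-week proceeds: ${proceeds:.2f}")
```
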

I’ll revisit this topic at some point in the future when I have more data. Obviously, the iPad is just a fraction of the entire book market with Kindle, Nook and other devices. (Although, the iBook looks much better on the iPad than on many other readers, in particular the smaller black & white e-Ink display Kindle readers.) While the selection of titles seems comparable on Apple’s and Amazon’s bookstores, about 1.35 million each (see a spreadsheet of my recent sample here), there don’t appear to be many shoppers in Apple’s bookstore. Of course, Travel & Adventure is only a small fraction of the book market. But even there, on a day where I sold two copies my book briefly ranked 30th in the Top Charts. 30th out of 11,800 titles (in Travel & Adventure)! That means the other 11,770 titles sold even fewer copies than mine (i.e. one or none) during the sampling time interval. Book sales appear very unevenly distributed, another case of huge online inequality.

But more importantly, most of the people reached by my promotional efforts don’t engage to the level of actually following the links, downloading the sample and finally buying the book. From my experience, one needs to reach more than 100 people for every one book sold. Fellow adventure traveller and author Andrew Hyde – whose book coincidentally is featured just above mine in the screen-shot above – has recently written about his book sales here. His stats show a similar small fraction of sales to views. I just don’t have the millions of Twitter followers to generate meaningful sales this way!

 

Posted on June 21, 2012 in Recreational

 


Visualign Blog – View Stats for first year and a half


I started this Data Visualization Blog back at the end of May 2011. WordPress provides decent analytics to measure things like views, referrer, clicks, etc. The built-in stats show bar charts by day/week/month, views by country, top posts and pages, search engine terms, comments, followers, tags and so on. I have accumulated the view data and wanted to share some analysis thereof.

At this point there are 17,000 views and 56 posts (about 1 post per week). The weekly views have grown as follows:

Weekly Views of Visualign Blog

The WordPress dashboard for monthly views looks like this:

Assuming an exponential growth process this amounts to a doubling roughly every 3 months. This may not sound like much, but if it were to continue, it would lead to a 16x increase per year or a 4096x increase in 3 years. Throughout the first year this model has been fairly accurate and allowed me to predict when certain milestones would be reached (such as 10k views, reached in Apr-2012, or 100k views, predicted by Jan-2013).
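
The multipliers quoted here follow directly from the doubling time, as a short Python sketch shows. The function name `growth_factor` is made up for illustration; the 3-month doubling time is the estimate from the text.

```python
def growth_factor(months, doubling_time_months=3.0):
    """Multiplier after the given number of months of exponential growth."""
    return 2 ** (months / doubling_time_months)


print(growth_factor(12))  # four doublings in one year
print(growth_factor(36))  # twelve doublings in three years
```
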

However, the underlying process is not a simple exponential growth process. Instead it is the result of multiple forces, some increasing, some decreasing, such as level of interest of fresh content for target audience, rather short half-life of web content, size of audience, frequency of emails or tweets with links to the content etc. So I expect growth to slow down and consequently the 100k views milestone to be pushed out past Jan-2013.

Views come from some 112 countries, albeit very unevenly distributed.

Views by Country (10244 views since Feb-25, 2012)

The Top 2 countries (United States and United Kingdom) contribute nearly half of the views, and the Top 10 countries (9% of all countries) contribute nearly 75% of all views. The fairly high Gini index of this distribution (~0.83) indicates strong dependency on just a few countries. The only surprise for me in the Top 10 list was South Korea, ranking fifth and slightly ahead of India. Germany is probably a bit over-represented due to my German business partner (RapidBusinessModeling) and related network.
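
For readers who want to compute such a Gini index themselves, here is a minimal Python sketch using one common formula based on sorted values. The function name `gini` and the sample view counts are hypothetical; they are not the actual per-country numbers of this Blog.

```python
def gini(values):
    """Gini index of non-negative values (0 = perfectly equal).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and i running from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical view counts per country, for illustration only
views_by_country = [5000, 2500, 600, 400, 300, 200, 100, 50, 20, 10, 5, 1]
print(round(gini(views_by_country), 2))
```

With this formula, a distribution in which one country out of n contributes all views yields (n - 1) / n, which approaches 1 as n grows.
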

Views by country with Top 10 list

One interesting analysis comes from looking at the distribution of views over weekdays. Not every weekday is the same. Thursday is the busiest day, Saturday the quietest. After a little more than one year, averaging over some 56 weeks, the distribution looks like this.

Weekday variation of Blog views averaged over 1st year

Of course, time zone boundaries may cause some distortions here, but it looks like the view activity builds during the week until it hits a peak on Thursday. Then it falls sharply to a low on Saturday, and builds from there again. This fits with intuition: One would expect the weekend days to be low as well as Monday and Friday to be lower than the mid-week days. It’s tempting to correlate that with the amount of work or research getting done by professionals. The underlying assumption is that people discover or revisit my Blog when it fits into their work.

A large fraction (> 65%) of referrals comes from search engines. Within those, it’s mostly Google (>90% summed across many countries) with just a small amount of others like Bing. It’s safe to say that without Google search my Blog would have practically no views. Chances are that your first exposure to this Blog came from a Google search as well. One unexpected insight for me was to see a high ratio of image to text searches, typically 3:1 or 4:1. In some ways it shouldn’t be surprising that a blog on data visualizations gets discovered more often by searching for visual elements than for text. It also jibes with the enormous growth of image related sites such as Instagram or Pinterest. I just would not have expected the ratio to be that high.

The beginning is always slow. But any exponential growth sooner or later leads to rather large numbers. So the real question is how one can keep the exponential growth process going? I’d love to hear your comments. If you want to compare this against your own Blog stats, I have shared the underlying data as a Google doc here. I have no idea how this compares to other blog stats in similar domains. If you know of any other public Blog stats analysis, please comment with a pointer below. Thanks.

Addendum 7/11/2012: Today my Blog reached 20,000 views. I noticed over the last few weeks that the deviation from an exponential growth model was getting quite large. For an exponential trend line R² = 0.9886.

Daily views with 20,000 total view milestone

If the weekly views are instead modeled as growing linearly, the total views grow quadratically. Curve fitting the total views with a 2nd order polynomial yields a very good fit (R² = 0.9977).

Total views growth curve with quadratic curve fit

Linear growth of weekly views is compatible with approximately linear increase in content (steady frequency of about 1 post / week) and thus increased chance of Google search indexing new content (with Google search the main source of view traffic). Quadratic growth of total views is also nonlinear, but far slower than exponential growth. For example, the 100,000 view milestone is now projected to be reached in 08/2013 instead of in 01/2013, i.e. in 13 months as compared to 7 months.
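
The link between linear weekly views and quadratic total views can be illustrated with a short Python sketch. The starting value and weekly increase are hypothetical parameters chosen for round numbers, not fitted to the Blog's data.

```python
from itertools import accumulate


def weekly_views(week, start=50.0, weekly_increase=10.0):
    """Linear model: views in a given week (hypothetical parameters)."""
    return start + weekly_increase * week


weeks = range(1, 57)  # 56 weeks
weekly = [weekly_views(w) for w in weeks]
total = list(accumulate(weekly))  # running total of views

# For a quadratic sequence the second differences are constant;
# here they equal the weekly increase.
second_differences = [
    total[i + 2] - 2 * total[i + 1] + total[i]
    for i in range(len(total) - 2)
]

# Closed form of the running total after n weeks:
# start * n + weekly_increase * n * (n + 1) / 2
n = len(weekly)
closed_form = 50.0 * n + 10.0 * n * (n + 1) / 2
print(total[-1], closed_form)
```
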

Addendum 11/1/2012: The Blog reached 30,000 views on Oct-19 and here is a chart of the monthly views through Oct-2012:

Monthly Blog views through Oct-2012

August and September have been slow, presumably seasonal variation. I also didn’t post between late August and mid October. The view data of the last couple of months no longer support the theory of significant growth in view frequency. Instead, multiple dynamic factors come into play. At times views spike due to a mention or a post of temporary interest – such as the recent post on visualizing superstorm Sandy. But such spikes quickly fade away according to the very limited half-life of web information these days. The undulating 4 week trailing average in weekly views below visualizes this clearly. The net effect has been a plateau in view frequency around 3000 per month.
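
A trailing average like the one in the chart below is easy to compute. Here is a minimal Python sketch; the function name `trailing_average` and the weekly numbers are hypothetical, with one artificial spike to show how the average smooths it out.

```python
def trailing_average(values, window=4):
    """Average of each value and up to (window - 1) values before it.

    The first few entries use a shorter window because fewer
    preceding values are available.
    """
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical weekly views with a one-week spike, for illustration only
weekly = [700, 750, 1400, 800, 720, 690]
smoothed = trailing_average(weekly)
print(smoothed)
```
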

Weekly Views with average Nov 2012

I continue to see most of the referrals coming from Google searches, still with a majority of those being image searches. Engagement growth has been anemic, with relatively few comments, back links or other forms of engagement. It seems to me that growth proceeds in phases, with growth spurts interspersed by plateaus of varying length. One such growth spurt has been reported by Andrei Pandre on his Data Visualization Blog through the use of Google+. Perhaps it’s time to extend this Blog to Google+ as well.

Variation of views by weekday

With regard to variation of views by weekday, the qualitative pattern remains. Tuesday is now emerging as the day with the most views, with Monday, Wednesday, and Thursday slightly behind, but still above average. Friday is slightly below average, Saturday is the lowest day with only half the views and Sunday in between.

I’m not sure whether to conclude from that that important posts should be published on a particular weekday. Again, most views come from Google searches and are accumulated over time, so perhaps only the height of the initial spike will vary somewhat based on the publishing weekday.

 

Posted on June 12, 2012 in Scientific

 

Venn Diagrams


The private library Blog had a post with some word play relating to sound, spelling and meaning of words in the English language. From their post on Homographic Homophones:

English is one of the most difficult languages in the world for a non-native speaker to learn. One of the reasons why this is so is that English has a large number of words that are pronounced the same as other words (i.e., they are homophones) even though they have quite different meanings. Homophones such as pare, pair and pear, for example, have the same pronunciation but are spelled differently and have different meanings (heterographic homophones). Other homophones — tender (locomotive), tender (feeling) and tender (resignation), for instance — are spelled the same and pronounced the same (homographic homophones) but have different meanings (i.e., they are homonyms).

Got all that?  Wikipedia has a nice Venn diagram that may help you sort it out:

Venn Diagram displaying meaning, spelling, and pronunciation of words (Source: Wikipedia)

Of course, you could also list the above combinations in a table. If you’re interested, Carol Moore has done just that on her Buzzy Bee riddle page.

A beautifully symmetric 5 set Venn diagram drawn from ellipses has been proposed by Branko Grünbaum and drawn by Wikipedia contributor Cmglee:

Symmetrical_5-set_Venn_diagram (Source: Wikipedia)

Such set-based diagrams invite a more mathematical notation. Cmglee annotates his image with this snippet:

Labels have been simplified for greater readability; for example, A denotes A ∩ Bᶜ ∩ Cᶜ ∩ Dᶜ ∩ Eᶜ (or A ∩ ~B ∩ ~C ∩ ~D ∩ ~E), while BCE denotes Aᶜ ∩ B ∩ C ∩ Dᶜ ∩ E (or ~A ∩ B ∩ C ∩ ~D ∩ ~E).
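
The same labeling convention can be expressed with Python sets. In this sketch the function name `region` and the five example sets are made up for illustration; a label such as BCE selects the elements that lie in exactly the named sets and in none of the others.

```python
def region(label, sets):
    """Elements in every set named in label and in no other set.

    label is a string of single-letter set names, e.g. "BCE".
    sets maps each letter to a Python set.
    """
    inside = [sets[name] for name in label]
    outside = [s for name, s in sets.items() if name not in label]
    result = set.intersection(*inside)  # returns a new set
    for s in outside:
        result -= s  # remove elements of the complemented sets
    return result


# Hypothetical example sets, for illustration only
sets = {
    "A": {1, 2, 3},
    "B": {2, 3, 4, 5},
    "C": {3, 4, 5, 6},
    "D": {6, 7},
    "E": {4, 5, 8},
}
print(region("A", sets))    # A ∩ ~B ∩ ~C ∩ ~D ∩ ~E
print(region("BCE", sets))  # ~A ∩ B ∩ C ∩ ~D ∩ E
```
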

If you search the Wolfram Demonstration Project for ‘Venn Diagram’, you get several interactive diagrams.

Venn Diagram Demonstration Projects (Source: Wolfram Demonstration Project)

These diagrams are interactive. For example, they allow you to click on any subset and then have that set highlighted and the corresponding mathematical set notation displayed accordingly. Interesting and fun to learn.

Speaking of fun: Venn diagrams are also effectively used in many different areas, two of which I’d like to leave you with here:

Data Science Venn Diagram (Source: drewconway.com)

And last but not least, Stephen Wildish’s Pancake Venn Diagram:

 

Posted on June 10, 2012 in Linguistic, Scientific

 


Graphic comparing highest mountains


In mountaineering, 8000m peaks are the ultimate test of high-altitude climbing. It so happens that there are 14 Eight-thousanders. In 1986 Reinhold Messner became the first person to have climbed all 14 8000m peaks. It has become a coveted trophy of mountaineering, with only about 30 people having done so since.

A different, but somewhat related challenge is to climb the highest mountain on every continent, the so-called Seven Summits. This was first completed by Dick Bass in 1985. It has become a more mainstream mountaineering challenge, and about 300 people have repeated that feat. That has also led to significant and often problematic overcrowding on those seven summits.

Interestingly, it was noted that the second highest mountain on each continent is typically harder to climb than the highest. Hence yet another challenge was born to complete the first ascent of the Seven Second Summits. Hans Kammerlander claims to have done so in 2010 – although some doubts have arisen regarding whether he stood on the right summit on Mount Logan, Canada. Others have suggested combining the Seven Summits and the Seven Second Summits, giving again 14 peaks.

On the Wikipedia page I found an interesting graphic comparing the 14 Eight-thousanders with the Seven + Seven Second Summits. It was created by Cmglee and shared on the Wikipedia page.

Comparison of highest mountains (Source: Wikipedia)

This is an interesting chart, created as .svg file and thus rendering in high definition on large wide-format screens. It is also interesting to follow the revision history on the talk page and the suggestions about coloring and labeling coming from interested readers. In some ways, this shows how published charts can be improved collaboratively. Contributor ‘Cmglee’ has contributed several .svg graphics to Wikipedia as per the User talk page, including a 5-set Venn diagram, life-expectancy bubble charts and Earthquake intensity bubble charts.

I have a personal interest in mountaineering. In 2009-2010 I embarked on my own adventure of a lifetime called the ‘Panamerican Peaks’: cycling between Alaska and Patagonia (Panamerican Highway) and climbing the highest mountain of every country along the way. You can find out more about that adventure on my Panamerican Peaks website. Coincidentally, there are a minimum of 14 countries and peaks in that set as well: United States, Canada, Mexico, Guatemala, El Salvador, Honduras, Nicaragua, Costa Rica, Panama, Colombia, Ecuador, Peru, Chile, Argentina.

Position and elevation of 14 Panamerican Peaks

Prior to starting my adventure journey I had mapped out the height of those 14 mountains. Interestingly, except for a few peaks in Central America, the country high-points get higher the further North or South they are located.

Heights of 14 Panamerican Peaks

Four of those peaks are included in the Seven (Second) Summit lists above: Denali, Logan (North America) and Aconcagua, Ojos (South America). It would be great to include the other 10 Panamerican Peaks in a similar graphic. About time for me to look into generating .svg graphics…

And sure enough, Wikipedia contributor Cmglee provided me with a version of the above .svg chart comparing the 14 Panamerican Peaks with the 14 Seven (Second) Summits as follows:

Comparison of 14 Panamerican Peaks with Seven (Second) Summits

Thanks to Cmglee for the quick turn-around.

 

Posted on June 4, 2012 in Recreational

 
 