
Personal Analytics with the Suunto Ambit


Suunto Ambit

Half a year ago my wife bought me the Suunto Ambit multi-function sport watch and heart rate monitor. It is a fantastic device, with very precise GPS, lots of add-on functionality and an interesting online portal and community.

There is some configuration and setup involved, such as pairing the Ambit with your heart rate belt and, for cycling, with a cadence pod. You charge the battery by plugging the watch into a USB port, which is also how you upload the data from the device to your computer or a website.

While the device itself and its programmability are quite advanced, I want to focus here on the associated online portal, Movescount.com, where you can upload and visualize all your data for free, and share it with friends or the community if you’re so inclined. Collecting and analyzing this much personal data is a fairly recent phenomenon, often referred to as Personal Analytics.

Each recorded session with the Suunto can be uploaded and classified into one of many sports, such as hiking, cycling, basketball, or indoor exercise. Each session is called a move, and the portal lets you visualize all your moves collectively. The current theme at movescount.com has a black background with mostly orange bars and charts. The first way to organize your moves is with either a list or a calendar control.

Calendar Control for Moves


This already gives you a good overview of the types of sports activities and their distribution over weekdays and weekends. A summary display is available in various forms, such as the following simple bar charts.

Summary information about heart rate zones


Another display format summarizes your selected moves, such as all moves in a particular month together with commensurate calorie consumption and breakdown of hours by type of move.

Moves Summary Display


You can now select either a single move or multiple moves (or group them by type of move) and display more information about them. Note that the x-axis can be set to display either distance or time, and one can zoom in on any part of the recorded move. One can also overlay multiple measures in the same chart by selecting more than one factor, although I find this leads to very busy and confusing charts.

Graph and BarChart details per move


There are many individual measurements available for display, some based on individual sensors (like heart rate or GPS location or temperature or altitude / air pressure), others based on calculations and estimates (such as speed, recovery parameter “R&R”, EPOC or VO2).

MapGraph

Of particular interest to me as a cyclist is the ability to overlay the GPS-track on a Google map. Not only is it a very detailed recording of the route, but it is color-coded based on the currently selected measure. For example, the color-range shows the heart rate in the same colors as the above bar charts. One can clearly see where one is just warming up at the beginning (low heart rate, green color) or where one is riding up “into the red”, i.e. towards the limits of one’s own heart rate. Selecting points along the route displays some information about that particular point of the ride.

One interesting feature would be a time-based correlation between the GPS track and the pictures taken with any portable camera along the ride. Based on synchronized clocks, one could then easily geo-code the photos even if the camera itself has no GPS capability.
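A minimal sketch of that idea in Python, assuming the move's GPS track is available as a list of timestamped points (for example after exporting and parsing a GPX file, which is not shown) and that the offset between the camera clock and the watch is known. The function name `geotag` and the linear interpolation between neighboring track points are illustrative choices, not a Movescount feature:

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Optional

# One GPS track point: (timestamp, latitude, longitude); an assumed representation
TrackPoint = tuple[datetime, float, float]


def geotag(photo_time: datetime, track: list[TrackPoint],
           camera_offset: timedelta = timedelta(0)) -> Optional[tuple[float, float]]:
    """Estimate where a photo was taken by interpolating the GPS track at its timestamp.

    track must be sorted by strictly increasing time. camera_offset is added to the
    camera's timestamp to correct for clock differences between camera and watch.
    Returns None if the corrected time falls outside the recorded move.
    """
    t = photo_time + camera_offset
    times = [point[0] for point in track]
    if not track or t < times[0] or t > times[-1]:
        return None
    if len(track) == 1:
        return track[0][1], track[0][2]
    # First track point at or after t; at least 1 so that track[i - 1] exists
    i = max(1, bisect_left(times, t))
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float in [0, 1]
    # Linear interpolation between the two neighboring fixes
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)


start = datetime(2013, 6, 1, 8, 0)
track = [(start, 26.00, -80.10), (start + timedelta(minutes=10), 26.05, -80.08)]
print(geotag(start + timedelta(minutes=5), track))  # roughly halfway between the two fixes
```

With one GPS fix per second or so, linear interpolation between fixes is usually far more precise than the camera's own timestamp resolution.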

The Suunto Ambit can do a lot more, including customizing the display mode and storing your configurations in so-called apps. One idea I have for this is to display an estimate of the total calorie consumption for a known route when continuing at the current pace (but I haven’t played with the programming yet). The Ambit seems to be particularly well suited to hiking, mountain biking or skiing due to its altimeter; however I don’t get to leverage that in flat Florida. Only the few bridges over the Intracoastal waterway show up as bumps in the vertical – with the corresponding acceleration of the heart rate on the uphill side.
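The arithmetic behind that calorie idea is simple if one assumes both pace and calorie burn rate stay constant for the rest of a known route. A sketch in Python (this is not the Ambit's own app language, and the function name is hypothetical):

```python
def project_calories(route_km: float, done_km: float,
                     elapsed_h: float, kcal_so_far: float) -> tuple[float, float]:
    """Project (remaining hours, total kcal) for a known route at the current pace.

    Assumes the average speed and hourly calorie burn so far hold for the rest
    of the route, which makes total kcal simply proportional to distance.
    """
    speed_kmh = done_km / elapsed_h           # current average pace
    kcal_per_h = kcal_so_far / elapsed_h      # current average burn rate
    remaining_h = (route_km - done_km) / speed_kmh
    return remaining_h, kcal_so_far + kcal_per_h * remaining_h


# 60 km route, 20 km done in 1 h with 600 kcal burned so far
print(project_calories(60, 20, 1.0, 600))  # (2.0, 1800.0)
```

Under these assumptions the total is just 600 kcal × 60/20; a more realistic estimate would have to account for terrain and fatigue.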

One of the downsides is that the heart rate sensor worn around the chest does not work in the water. Hence any swimming in the ocean or the pool cannot be measured precisely. (I replace the measurements with estimates.) And sure enough, Suunto just recently announced the new Ambit 2, which overcomes this limitation. Such is the world of new electronic toys: the half-life of their innovation keeps getting shorter.

Bubble Chart of set of rides


Measures in Bubble Chart


One last chart I want to point out is the flexible bubble chart. Shown above is a selection of all my rides in the first half of 2013 (47 rides minus two outliers, very long rides that would have changed the scale and compressed the rest of the chart). This gives a good feel for the distribution and variance of personal rides over a longer period of time, from quick half-hour rides to the more typical rides of a good two hours. Note that one can select any of about 30 measures in each of the three drop-down boxes (X-Axis, Y-Axis, Bubble-Size).

One side effect of measuring and visualizing so many moves is that we found some interesting differences in our respective exercise habits and corresponding energy consumption. While I burn most of my calories on the bicycle, my wife gets more exercise out of indoor circuit exercises and Yoga than I do. After literally decades of recreational cycling, I can raise my heart rate to much higher levels for extended periods of time on the bike than during indoor circuit exercises. In a way that is not surprising, given the strength and oxygen consumption of the large leg muscles compared to the smaller shoulder and arm muscles. But I would not have expected the difference to be so pronounced, and I could not have quantified it nearly as precisely without such personal analytics.

It can be expected that the fields of healthcare and personal analytics will converge and provide much more personalized data and insight into the specific life of each patient. Medical indicators like heart rate, blood sugar and blood pressure, and factors like exercise and diet, will become much more quantifiable and individually tracked over time. The hope is that this will also lead to better, more personal and generally more preventive care, with medical treatments tailored to each individual’s condition.

 
Posted on June 30, 2013 in Recreational

Apple, Amazon, Google, Microsoft

Last year we looked at these four companies and compared their business models over two quarters: Apple (hardware), Amazon (retail), Google (advertising), Microsoft (software). It struck me how far the integrated Wolfram Alpha technology has come in the last two years. It combines the symbolic computing capabilities of the Mathematica platform with curated data (for example financial data) and some pretty impressive linguistic analysis capabilities for free-form text input.

For example, in Wolfram Alpha just type in the following query: “Google vs. Amazon vs. Apple vs. Microsoft”. The results are shown in the following three screenshots:

ComparisonWolframAlpha1

ComparisonWolframAlpha2

ComparisonWolframAlpha3

Not only are data such as the fundamentals or the analysis of a mean-variance optimal portfolio displayed; you also get all the code needed to load such data programmatically. For example, if you want the breakdown of the analyst ratings, the system will expand it as follows:

AnalystRatings

So far we haven’t done any coding or bothered with integrating any data source. This amount of integration and automation is pretty impressive. I am often surprised how few companies are taking advantage of such advanced technology platforms.

 
Posted on April 28, 2013 in Financial

Visualizing Conversion Rates (Funnels, Bullet Charts)

Most sales processes go through a series of stages, from first contact through successive engagement of the potential client to the close. One can think of these as special cases of attrition-based workflows. These are very typical in online (B2C eCommerce) or tele-marketing (call centers) and companies usually collect a lot of data around the conversion rates at each step. How can one visualize such workflows?

One metaphor for these processes is that of a sales funnel: a lot of leads feed in on one side, and at each stage fewer pass through to the next. It is then natural to want to visualize such funnels, as shown here with Tableau.


Tutorial video explaining how to create a funnel chart using Tableau (Source: Tableau Training Video)

Aside from the somewhat tedious steps involved (custom field calculations, left-right duplication of the chart, etc.), it turns out that funnel charts are not easy to interpret. For example, they are not well suited to answering the following questions:

  • What’s the percentage reduction at each step?
  • Comparing two or more funnels, which has better conversions at each step?
  • What are the absolute numbers in each step?
  • Are the conversion rates above or below targets at each step?
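Most of these questions are easy to answer once the counts per stage are kept in a table rather than a shape. A minimal Python sketch, with made-up counts for two hypothetical funnels A and B (the function names are illustrative):

```python
def step_rates(counts: list[int]) -> list[float]:
    """Conversion rate of each step relative to the previous stage."""
    return [after / before for before, after in zip(counts, counts[1:])]


def step_drops(counts: list[int]) -> list[float]:
    """Percentage reduction at each step (100% minus the step conversion rate)."""
    return [100 * (1 - rate) for rate in step_rates(counts)]


def better_per_step(a: list[int], b: list[int]) -> list[str]:
    """For each step, name the funnel with the higher conversion rate."""
    return ["A" if ra > rb else "B" if rb > ra else "tie"
            for ra, rb in zip(step_rates(a), step_rates(b))]


funnel_a = [1000, 300, 30, 3]   # hypothetical counts: visits, clicks, sign-ups, purchases
funnel_b = [1000, 250, 40, 3]
print(better_per_step(funnel_a, funnel_b))  # ['A', 'B', 'A']
```

Both funnels end with the same absolute number, yet they win and lose at different steps, which is exactly what a funnel shape tends to hide.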

Ash Maurya from Spark59 wrote a very useful article on this topic entitled “Why not the funnel chart?”. In it he looks specifically at comparisons of funnels (current vs. prior time intervals, or A/B testing).


Time Series comparison of funnel performance (Source: Ash Maurya’s Spark59 Blog)

Next, he shows that the funnel shape doesn’t add to readability. Instead, simple bar charts do just as well:


Same information with Bar Charts (Source: Ash Maurya’s Spark59 Blog)

For a multi-step funnel, the problem remains that with the first step set to 100%, subsequent steps often have fairly small percentages and are thus hard to read and compare. Suppose you are sending emails to 100,000 users, 30% of whom click on a link in the email, of whom only 10% (3% of total) proceed to register, of whom only 10% (0.3% of total) proceed to subscribe to a service. Bars with 3% or even 0.3% of the original length will be barely visible. One interesting variation is to normalize each step in the funnel such that the expected conversion number (or that from the prior period) is scaled to 100%. That makes it easy to see which steps are performing above or below expectations. (Here, note the big jump in registrations from Jan to Feb, then a small drop in Mar.)


Bar Charts with absolute vs. relative numbers
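Using the email example from the text, the two views can be computed as follows. In this Python sketch, `percent_of_total` gives the shrinking bars of the classic funnel view, while `percent_of_expected` rescales each step so that its expected value (a target, or the prior period) becomes 100%. The "expected" and "this month" counts are hypothetical:

```python
def percent_of_total(counts: list[int]) -> list[float]:
    """Each stage as a percentage of the first stage (the classic funnel view)."""
    return [n * 100 / counts[0] for n in counts]


def percent_of_expected(actual: list[int], expected: list[int]) -> list[float]:
    """Each stage relative to its expected count, so that 'on target' is 100%."""
    return [a * 100 / e for a, e in zip(actual, expected)]


emails = [100_000, 30_000, 3_000, 300]   # sent, clicked (30%), registered (10%), subscribed (10%)
print(percent_of_total(emails))           # [100.0, 30.0, 3.0, 0.3]

expected = [100_000, 30_000, 3_000, 300]  # e.g. last month's counts (hypothetical)
this_month = [100_000, 30_000, 3_600, 330]
print(percent_of_expected(this_month, expected))  # [100.0, 100.0, 120.0, 110.0]
```

In the normalized view every bar hovers around 100%, so a 20% improvement in registrations is as visible as one in clicks.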

Next, Ash Maurya uses the Bullet Chart as introduced by Stephen Few in 2008. The Bullet Chart is a variation of a bar chart that uses grey-scale bands to indicate qualitative performance ranges (such as poor, ok, good), as well as a target marker to show whether performance was above or below expectations. The target marker makes it possible to combine two charts into one, giving a compact representation of the relative performance:


Bullet Chart showing funnel performance (Source: Spark59 Blog)
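Per row, a bullet chart encodes two comparisons: which qualitative band the measure falls into, and how far it is from the target. A small Python sketch, assuming bands are given as ascending upper limits (the band names and thresholds are illustrative):

```python
def bullet_status(value: float, bands: list[tuple[str, float]],
                  target: float) -> tuple[str, float]:
    """Return (qualitative band, distance to target) for one bullet-chart row.

    bands: ascending (label, upper_limit) pairs, e.g. poor/ok/good ranges.
    A value beyond the last limit is reported in the last band.
    """
    label = next((name for name, upper in bands if value <= upper), bands[-1][0])
    return label, value - target


bands = [("poor", 60), ("ok", 80), ("good", 100)]
print(bullet_status(75, bands, target=85))  # ('ok', -10)
print(bullet_status(92, bands, target=85))  # ('good', 7)
```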

Various authors have looked at how to create such bullet charts in Excel. For example, Peltier Tech covers this in an article called “How to make horizontal bullet graphs in Excel”. Creating such charts still takes considerable effort, as Excel doesn’t support bullet charts directly. Adding color may make sense, although it quickly leads to overuse of color in a dashboard (hence Stephen Few’s preference for grey scales).


Bullet Graphs in Excel (Source: Peltier’s Excel Blog)

Another interesting approach comes from Chandoo with an approximation of a bullet graph in cells (as compared to a separate Excel chart object). In this article “Excel Bullet Graphs” he shows how to use conditional formatting and custom formulae to build bullet graphs in a series of cells which can then be included in a table, one chart in each row of the table.


In-Cell Bullet Graph in Excel (Source: Chandoo’s Blog)
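The in-cell idea carries over to any text medium: each character plays the role of a cell, shaded by band, with the measure drawn as a solid bar and the target as a marker. A rough Python analogue (not Chandoo’s Excel formulas), using darker shades for the poorer bands in keeping with the grey-scale approach:

```python
def text_bullet(value: float, target: float, limits: list[float],
                scale_max: float, width: int = 40) -> str:
    """Render one bullet graph as a row of characters, one character per 'cell'.

    limits: ascending upper limits of the qualitative bands (poor, ok, good).
    """
    shades = "▓▒░"  # darkest = poorest band
    cells = []
    for i in range(width):
        x = (i + 0.5) * scale_max / width  # data value at the middle of this cell
        band = next((k for k, limit in enumerate(limits) if x <= limit), len(limits) - 1)
        cells.append("█" if x <= value else shades[min(band, len(shades) - 1)])
    t = min(width - 1, int(target * width / scale_max))  # cell holding the target marker
    cells[t] = "|"
    return "".join(cells)


print(text_bullet(50, 70, [40, 70, 100], scale_max=100, width=10))  # █████▒▒|░░
```

As in the spreadsheet version, one such string per row can sit in a table column next to the underlying numbers.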

It is somewhat surprising that modern data visualization tools do not yet widely support bullet charts out of the box.

Measuring how marketing efforts influence conversions can be difficult, especially when your customers interact with multiple marketing channels over time before converting. To that end, Google has introduced multi-channel funnels (MCF) in Google Analytics, as well as released an API to report on MCFs. This enables new sets of graphs, which we may cover in a separate post.

 
Posted on March 31, 2013 in Industrial

Magic Quadrant Business Intelligence 2013

It’s that time of the year again: Gartner has released its report on Business Intelligence and Analytics platforms. One year ago we looked at how the data in the Magic Quadrant – the two-dimensional space of execution vs. vision – can be used to visualize movement over time. In fact, the article Gartner’s Magic Quadrant for Business Intelligence became the most viewed post on this Blog.

I had also uploaded a Tableau visualization to Tableau Public, where everyone can interact with the trajectory visualization and download the workbook and the underlying data to do further analysis. This year I wanted to not only add the 2013 data, but also provide a more powerful way of analyzing the dynamic changes, such as filtering the data. For example, consider the moves from 2012 to 2013 of some 21 vendors:


Gartner’s Magic Quadrant for Business intelligence, changes from 2012 to 2013

It might be helpful to filter the vendors in this diagram, for example to show just niche players, or just those who improved in both vision and execution. To that end, I created a simple Tableau dashboard with four filters: ranges of values for the vision and execution scores, as well as ranges of values for the changes in both scores. The underlying data is also displayed for reference and can be used to sort companies by any of those values.

Here is an example of the dashboard set to display the subset of 15 companies that improved at least one of their vision and execution scores without lowering the other.


Subset of companies who improved vision and/or execution over the last year.

That’s more than 70% of platforms, with the increase in vision being more pronounced than that of execution. That’s considerably more than in the previous years (2013: 15; 2012: 6; 2011: 6; 2010: 3; 2009: 9) – making this collective move to the top-right perhaps a theme of this year’s report.
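The filter logic behind such views is easy to express in code. A Python sketch, assuming each vendor’s position is available as (vision, execution) coordinates for two consecutive years; since Gartner does not publish the underlying coordinates, the vendor names and values below are hypothetical placeholders:

```python
Position = tuple[float, float]  # (vision, execution), hypothetical coordinates


def changes(prev: dict[str, Position], curr: dict[str, Position]) -> dict[str, Position]:
    """Year-over-year change (d_vision, d_execution) for vendors rated in both years."""
    return {v: (curr[v][0] - prev[v][0], curr[v][1] - prev[v][1])
            for v in prev.keys() & curr.keys()}


def improved_without_decline(deltas: dict[str, Position]) -> list[str]:
    """Vendors that gained in at least one dimension and lost in neither."""
    return sorted(v for v, (dv, de) in deltas.items()
                  if dv >= 0 and de >= 0 and (dv > 0 or de > 0))


def moved_top_left(deltas: dict[str, Position]) -> list[str]:
    """Vendors that improved execution at the expense of vision."""
    return sorted(v for v, (dv, de) in deltas.items() if de > 0 and dv < 0)


prev = {"Vendor A": (1.0, 2.0), "Vendor B": (2.0, 2.0), "Vendor C": (3.0, 1.0)}
curr = {"Vendor A": (1.5, 2.0), "Vendor B": (1.8, 2.5), "Vendor C": (3.0, 1.0)}
d = changes(prev, curr)
print(improved_without_decline(d))  # ['Vendor A']
print(moved_top_left(d))            # ['Vendor B']
```

Each dashboard filter corresponds to one such predicate on the deltas; the range sliders simply replace the fixed comparisons with 0 by user-chosen bounds.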

Who changed Quadrants? Who moved in which dimension?

Last year Tibco (Spotfire) and Tableau were the only two platforms to change quadrants, both becoming challengers. This year both of them “turned right” in their trajectory and crossed over into the leaders quadrant due to strong increases in their vision capabilities. (QlikTech had been on a similar trajectory, but had already crossed into the leaders quadrant in 2012. It also strengthened both execution and vision again this year.)

Another new challenger is LogiXML. Thanks to ease of use, enhancements from customer feedback and a focus on the OEM market its ability to execute increased substantially. From the Gartner report summary on LogiXML:

Ease of use goes hand-in-hand with cost as a key strength for LogiXML, which is reflected by its customers rating it above average in the survey. The company includes interfaces for business users and IT developers to create reports and dashboards. However, its IT-oriented, rapid development environment seems to be most compelling for its customers. The environment features extensive prebuilt elements for creating content with minimal coding, while its components and engine are highly embeddable, making LogiXML a strong choice for OEMs.

A few other niche players almost broke into new quadrants, including Alteryx (which had the biggest overall increase and almost broke into the visionary quadrant), as well as Actuate and Panorama Software.

The latter two stayed the same with regard to execution (as did SAP), while all three of them moved strongly to the right to improve their vision scores (forming the top 3 of vision improvement).

Information Builders and Oracle stayed where they were, changing neither their execution nor vision scores.

Microsoft and Pentaho stayed about the same on vision, but increased substantially in their execution scores.  This propelled Microsoft to the top of the heap on the execution score, while it moved Pentaho from near the bottom of the heap to at least a more viable niche player position. Microsoft’s integration of BI capabilities in Excel, SQL Server and SharePoint as well as leveraging of cloud services and attractive price points make it a strong contender especially in the SMB space. Improvements of its ubiquitous Excel platform give it a unique position in the BI market. From the Gartner report:

Nowhere will Microsoft’s packaging strategy likely have a greater impact on the BI market than as a result of its recent and planned enhancements to Excel. Finally, with Office 2013, Excel is no longer the former 1997, 64K row-limited, tab-limited spreadsheet. It finally begins to deliver on Microsoft’s long-awaited strategic road map and vision to make Excel not only the most widely deployed BI tool, but also the most functional business-user-oriented BI capability for reporting, dashboards and visual-based data discovery analysis. Over the next year, Microsoft plans to introduce a number of high-value and competitive enhancements to Excel, including geospatial and 3D analysis, and self-service ETL with search across internal and external data sources.

The report then goes on to praise Microsoft for further enhancements (queries across relational and Hadoop data sources) that contribute to its strong product vision score and “positive movement in overall vision position”. This does not seem consistent with the published Magic Quadrant, where Microsoft only moved up (execution), not to the right (vision). This is perhaps another reason for Gartner to publish the underlying coordinate data and finally adopt this kind of trajectory visualization.


Dashboard with filters revealing two platforms deteriorating in both vision and execution

Only two vendors saw their scores deteriorate in both dimensions. MicroStrategy gave up some ground, but remains in the leaders quadrant. The report cites a steep sales decline in 3Q12 and the increased importance of predictive and prescriptive analytics in this year’s evaluation among the reasons:

MicroStrategy has the lowest usage of predictive analytics of all vendors in the Magic Quadrant. A reason for this behavior might be the user interface that is overfocused on report design conventions and lacks proper data mining workbench capabilities, such as analysis flow design, thus failing to appeal to power users. To address this matter, MicroStrategy should deliver a new high-end user interface for advanced users, or consumerize the analytic capabilities for mainstream usage by embedding them in Visual Insight.

The other vendor moving to the bottom-left is arcplan, which is now at the bottom of the heap in the niche players quadrant.

Who moved to the top-left?

With the dashboard at hand, you can also go back and do similar queries not just for the current year 2013, but any of the five previous years. For example, who has moved to the top-left – improved execution at the expense of reduced vision – over the years?

In 2013 those were Targit, Jaspersoft and Board International. All three of them had a sharp drop in execution in the previous year, 2012. A plausible scenario is that these companies lost their focus on execution, saw their scores drop and, in an attempt to turn things around, focused on executing well with a smaller set of features (hence lower vision).

In 2012 the only vendor to display a move to the top-left was QlikTech. They had some sales issues the prior year as well, although their trajectory in 2011 was only modestly lower in execution, mostly towards higher vision.

In 2011 Actuate and Information Builders moved to the top-left. Both had trajectories to the bottom-left the prior year (2010), with Actuate in particular losing a lot of ground. With the Year slider on the top-left of the dashboard one can then play out the trajectory while the company filters remain, thus showing only the filtered subset and its history. Actuate has since completed a remarkable turnaround and is now positioned roughly where it was in 2010.


Dashboard with analysis of top-left moving companies.

 

(The interactive dashboard is available on the Tableau Public website.)

In 2010 there were five vendors moving to the top-left: Oracle, SAS, QlikTech, Tibco (Spotfire) and Panorama Software. In that case, however, none of them had shown a decrease in execution the previous year. That focus on execution may simply have been the result of the economic downturn in 2009.

Such exploratory analysis is hard to conceive without proper interactive data visualization. Given what the vendors covered in this report focus on, it seems somewhat anachronistic that Gartner does not leverage the capabilities of such interactive visualization in its own report. In the previous post on Global Risks we saw how much value that can add to such a thorough analysis. (Much of this dashboard should be applicable to risk analysis as well; only the two-dimensional space changes from platform vision vs. execution to risk likelihood vs. impact!) If Gartner does not want its own execution and vision scores to drop, it had better adopt such visualization. It’s time.

 
Posted on February 12, 2013 in Industrial

Visualizing Global Risks 2013


A year ago we looked at Global Trends 2025, a 2008 report by the National Intelligence Council. The 120-page document made surprisingly little use of data visualization, given how well-funded and otherwise detailed the report was.

By contrast, at the recent World Economic Forum 2013 in Davos, the Risk Response Network published the eighth edition of its annual Global Risks 2013 report. Its focus on national resilience fits well into the “Resilient Dynamism” theme of this year’s WEF Davos. Here is a good two-minute synopsis of the Global Risks 2013 report.

We will look at the abundant use of data visualization in this work, which is published in print as an 80-page .pdf file. The report links back to the companion website, which offers lots of additional materials (such as videos) and a much more interactive experience (such as the Data Explorer). The website is a great example of the benefits of modern layout, with annotations, footnotes, references and figures broken out in a second column next to the main text.

RiskCategories

One of the main ways to understand risks is to quantify them in two dimensions, namely likelihood and impact, say on a scale from 1 (min) to 5 (max). Each risk can then be visualized by its position in the square spanned by those two dimensions. Risk mitigation is often prioritized by the product of these two factors. In other words, the further to the right and/or top a risk sits, the more important it becomes to prepare for or mitigate it.
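Prioritizing by the product of likelihood and impact, as described above, takes only a few lines once the risks are scored. A Python sketch with hypothetical risk names and scores on the 1-to-5 scale (not the survey’s actual results):

```python
# Hypothetical (likelihood, impact) scores on a 1-to-5 scale
risks = {
    "Risk A": (4.0, 4.0),
    "Risk B": (2.5, 5.0),
    "Risk C": (4.5, 2.0),
    "Risk D": (3.0, 3.5),
}


def priority(scores: tuple[float, float]) -> float:
    """Risk priority as likelihood times impact (1 = negligible, 25 = maximal)."""
    likelihood, impact = scores
    return likelihood * impact


ranked = sorted(risks, key=lambda name: priority(risks[name]), reverse=True)
print(ranked)  # ['Risk A', 'Risk B', 'Risk D', 'Risk C']
```

Note that the product treats a rare but severe risk and a likely but moderate one as equivalent whenever their products match; that is a property of this simple scoring rule.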

This work is based on a comprehensive survey of more than 1000 experts worldwide on a range of 50 risks across 5 broad categories. Each of these categories is assigned a color, which is then used consistently throughout the report. Based on the survey results the report uses some basic visualizations, such as a list of the top 5 risks by likelihood and impact, respectively.


Source for all figures: World Economic Forum (except where noted otherwise)

When comparing the position of a particular risk in the quadrant with the previous year(s), one can highlight the change. This is similar to what we have done with highlighting position changes in Gartner’s Magic Quadrant on Business Intelligence. Applied to this risk quadrant the report includes a picture like this for each of the five risk categories:

EconomicRisksChange

This vector field shows at a glance how many and which risks have grown by how much. The fact that a majority of the 50 risks show sizable moves to the top right is of course a big concern. Note that the graphic does not show the entire square from 1 through 5, just a sub-section, essentially the top-right quadrant.
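Each arrow in such a vector field is just the difference between a risk’s positions in consecutive years. A Python sketch computing the length and direction of each move, with hypothetical positions:

```python
import math

# Hypothetical (likelihood, impact) positions for two consecutive survey years
last_year = {"Risk A": (3.5, 3.6), "Risk B": (3.9, 4.0), "Risk C": (3.2, 3.8)}
this_year = {"Risk A": (3.8, 3.9), "Risk B": (3.7, 4.1), "Risk C": (3.2, 3.8)}


def arrows(before, after):
    """Per risk: (dx, dy, length, angle in degrees; 45 means straight to the top-right)."""
    out = {}
    for name in before.keys() & after.keys():
        dx = after[name][0] - before[name][0]
        dy = after[name][1] - before[name][1]
        out[name] = (dx, dy, math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)))
    return out


moves = arrows(last_year, this_year)
toward_top_right = sorted(n for n, (dx, dy, _, _) in moves.items() if dx > 0 and dy > 0)
print(toward_top_right)  # ['Risk A']
```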

On a more methodological note, I am not sure whether surveys are a very reliable instrument for identifying actual risks; they probably capture the perception of risks. It is quite possible that some unknown risks, such as the unprecedented terrorist attacks in the US on 9/11, outweigh the ones covered here. That said, the wisdom of crowds tends to be a good instrument for identifying the perception of known risks.

Note the “Severe income disparity” risk near the top-right, related to the phenomenon of economic inequality we have looked at in various posts on this Blog (Inequality and the World Economy or Underestimating Wealth Inequality).

A table showing the top 5 risks over the last seven consecutive years is given as well:

Top5RisksChanges

This format provides a feel for the dominance of risk categories (frequency of colors, such as the prevalence of blue, i.e. economic risks, on the impact side) and for year-over-year changes (little change from 2012 to 2013). The 2011 likelihood column is a bit of an outlier, with four of five risks being green (environmental) after four years without any green risk in the top 5. I suspect that this was the result of the broad global media coverage after the March 2011 earthquake off the coast of Japan, with the resulting tsunami inflicting massive damage and loss of life, as well as the Fukushima nuclear reactor catastrophe. Again, this reinforces my belief that we are looking at the perception of risk rather than actual risk.

Another aggregate visualization of the risk landscape comes in the form of a matrix of heat-maps indicating the distribution of survey responses.

SurveyResponseDistribution

The darker the color of the tile, the more often that particular likelihood/impact combination was chosen in the survey. There is a clear positive correlation between likelihood and impact as perceived by the majority of the experts in the survey. From the report:

Still it is interesting to observe how for some risks, particularly technological risks such as critical systems failure, the answers are more distributed than for others – chronic fiscal imbalances are a good example. It appears that there is less agreement among experts over the former and stronger consensus over the latter.

The report includes many more variations on this theme, such as scatterplots of risk perception by year, gender, age, region of residence etc. Another line of analysis concerns the center of gravity, i.e. the degree of systemic connectivity between risks within each category, as well as the movement of those centers year over year.

Another set of interesting visualizations comes from the connections between risks. From the report:

Top5Connections

Top10ConnectedRisks

Finally, the survey asked respondents to choose pairs of risks which they think are strongly interconnected. They were asked to pick a minimum of three and maximum of ten such connections.

Putting together all chosen paired connections from all respondents leads to the network diagram presented in Figure 37 – the Risk Interconnection Map. The diagram is constructed so that more connected risks are closer to the centre, while weakly connected risks are further out. The strength of the line depends on how many people had selected that particular combination.

529 different connections were identified by survey respondents out of the theoretical maximum of 1,225 combinations possible. The top selected combinations are shown in Figure 38.

It is also interesting to see which are the most connected risks (see Figure 39) and where the five centres of gravity are located in the network (see Figure 40).
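The counting behind those figures can be sketched in a few lines of Python, assuming each respondent’s answer is a list of chosen risk pairs (the respondents and risk names below are hypothetical). With 50 risks there are 50 × 49 / 2 = 1,225 unordered pairs, which matches the theoretical maximum quoted above:

```python
from collections import Counter
from math import comb


def connection_counts(answers: list[list[tuple[str, str]]]) -> tuple[Counter, Counter]:
    """Tally how often each unordered pair was chosen, and each risk's weighted degree."""
    # frozenset makes ("A", "B") and ("B", "A") count as the same connection
    pairs = Counter(frozenset(pair) for chosen in answers for pair in chosen)
    degree = Counter()
    for pair, n in pairs.items():
        for risk in pair:
            degree[risk] += n  # connectedness: total selections the risk appears in
    return pairs, degree


print(comb(50, 2))  # 1225 possible pairs among 50 risks

answers = [
    [("Risk A", "Risk B"), ("Risk B", "Risk C")],
    [("Risk B", "Risk A")],  # same pair as above, listed in the other order
]
pairs, degree = connection_counts(answers)
print(len(pairs))             # 2 distinct connections
print(degree.most_common(1))  # [('Risk B', 3)]
```

The pair counts would drive line strength in the interconnection map, and the weighted degree is one plausible way to decide which risks sit closer to the center.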

One such center-of-gravity graph (for geopolitical risks) is shown here:

RiskInterconnections

The Risk Interconnection Map puts it all together:

RiskInterconnectionMap

Such fairly complex graphs are more intuitively understood in an interactive format. This is where the online Data Explorer comes in. It is a very powerful instrument to better understand the risk landscape, risk interconnections, risk rankings and national resilience analysis. There are panels to filter, the graphs respond to mouse-overs with more detail and there are ample details to explain the ideas behind the graphs.

DataExplorer

There are many more aspects to this report, including the appendices with survey results, national resilience rankings, three global risk scenarios, five X-factor risks, etc. For our purposes here suffice it to say that the use of advanced data visualizations together with online exploration of the data set is a welcome evolution of such public reports. A decade ago no amount of money could have bought the kind of interactive report and analysis tools which are now available for free. The clarity of the risk landscape picture that’s emerging is exciting, although the landscape itself is rather concerning.

 
Posted on January 31, 2013 in Industrial, Socioeconomic

2012 in review

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 36,000 views in 2012. If each view were a film, this blog would power 8 Film Festivals


 
Posted on December 31, 2012 in Recreational

 

Circos Data Visualization How-to Book

Earlier this year we looked at a powerful data visualization tool called Circos, developed by Martin Krzywinski from the British Columbia Genome Science Center. The previous post looked at an example of how this tool can be used to show complex connectivity pathways in the human neocortex, so-called Connectograms.

Circos Book Cover

The Circos tool can be used interactively on the above website. In that mode you upload jobs via tabular data- and configuration-files and have some limited control over the rendering of the resulting charts. For full expressive power and flexibility, Circos can also be downloaded freely and used on your computer for rendering with extensive customization control over the resulting charts.

I have been asked to review a new book titled “Circos Data Visualization How-to”, published by Packt Publishing. Its main goal is to guide the reader through the download and installation process described above and to get you started with Circos charts and their modification. Here is a brief review of this book.

Although originally developed for visualizing genomic data, Circos has been applied to many other complex data visualization projects, including in the social sciences. One such study was done by Tom Schenk, who analyzed the relationships between college majors and the professions those graduates ended up in. It appears as if this work inspired the author to write this book to help others use Circos.

I downloaded the book in Kindle format and read it on my Mac because of the color graphics and the much larger screen. It is well structured and around 70 pages in printed form. The book first covers downloading and installing, then presents a series of examples ranging from a first chart to more complex ones with customizations such as colors, ribbons, heat maps or dynamic binding.

Flow Chart for creation of Circos charts


Circos is essentially a set of Perl modules combined with the GD graphics library.

The first part is on installing Circos, with one chapter on Windows 7 and one on Linux or Mac OS. Working on a Mac, I went the latter route. I ended up right in the weeds, and it took me about 4 hours to get everything installed and working. The description is derived from a Linux install and is generally somewhat terse. It assumes you have all prerequisite tools installed on your Mac, or at least that you are savvy enough to figure out what’s missing and where to get it. I had to dust off some of my Unix skills and hunt on Google for solutions to a list of installation problems:

  • directory permissions (I needed to wrap the exact instructions with sudo)
  • installing Xcode tools from Apple for my platform (make was not preinstalled)
  • understanding cause of error messages (Google searches, Google group on Circos)
  • locating and installing the GD graphics library (helpful installing-circos-on-os-x tips by Paulo Nuin)
  • version and location issues (many libraries are in ongoing development; some sources have moved)
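Several of these speed bumps come down to missing command-line tools or Perl modules. A small pre-flight check can surface them before running Circos for the first time. The sketch below is not from the book: the module list is a partial, assumed subset of the Circos Perl dependencies (consult the Circos installation documentation for the full list for your version), and it relies only on the fact that `perl -M<Module> -e 1` exits with status 0 when the module can be loaded.

```python
import shutil
import subprocess

# Partial, assumed list of Perl modules Circos depends on; the authoritative
# list is in the Circos installation documentation for your version.
CIRCOS_PERL_MODULES = ["GD", "Config::General", "Font::TTF::Font",
                       "Math::Bezier", "Set::IntSpan", "Text::Format"]
COMMAND_LINE_TOOLS = ["perl", "make"]


def missing_tools(tools):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def perl_module_loads(module, perl="perl"):
    """True if `perl -M<module> -e 1` succeeds, i.e. the module can be loaded."""
    if shutil.which(perl) is None:
        return False  # no Perl interpreter found, so no module can load
    result = subprocess.run([perl, f"-M{module}", "-e", "1"],
                            capture_output=True)
    return result.returncode == 0


def missing_perl_modules(modules, perl="perl"):
    """Return the Perl modules that fail to load."""
    return [m for m in modules if not perl_module_loads(m, perl)]


if __name__ == "__main__":
    print("Missing tools:", missing_tools(COMMAND_LINE_TOOLS) or "none")
    print("Missing Perl modules:",
          missing_perl_modules(CIRCOS_PERL_MODULES) or "none")
```

On a Mac, a missing `make` usually points to the Xcode command-line tools, and a failing `GD` module points to the GD graphics library.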

Others may find this part a lot easier, but I think there should be an extra chapter for the Mac with tips and explanations for some of these speed bumps. On the plus side, the Google group seems to be very active, and I found frequent and recent answers by Circos author Martin Krzywinski.

The next part of the book is easy to understand. One creates a simple hair-to-eye color relationship diagram. Then configuration files are introduced to customize colors and chart appearance. All required data and configuration files are also contained in the companion download from the Packt Publishing book page.

Chart of relationship between hair and eye colors

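To get a feel for what the data files behind such a relationship chart contain, here is a hypothetical Python sketch that turns a hair-color by eye-color count table into Circos karyotype lines (`chr - ID LABEL START END COLOR`) and single-line link records (`ID1 START1 END1 ID2 START2 END2`). It is neither the book’s code nor the tableviewer’s algorithm: the counts are made up, the `hair_`/`eye_` prefixes and the default color `grey` are assumptions, and you still need a Circos configuration that points to the resulting files.

```python
from collections import defaultdict


def table_to_circos(counts, color="grey"):
    """Convert {(hair, eye): count} into Circos karyotype and link lines.

    Each category becomes one ideogram whose length is its total count;
    each table cell becomes one ribbon spanning `count` units on both ends.
    Prefixes keep e.g. brown hair and brown eyes as separate ideograms.
    """
    totals = defaultdict(int)
    for (hair, eye), n in counts.items():
        totals[f"hair_{hair}"] += n
        totals[f"eye_{eye}"] += n

    # Karyotype line: chr - ID LABEL START END COLOR (color is an assumption)
    karyotype = [f"chr - {seg} {seg} 0 {total} {color}"
                 for seg, total in sorted(totals.items())]

    # Walk along each ideogram from 0 upward so ribbons do not overlap.
    cursor = defaultdict(int)
    links = []
    for (hair, eye), n in sorted(counts.items()):
        h, e = f"hair_{hair}", f"eye_{eye}"
        links.append(f"{h} {cursor[h]} {cursor[h] + n} "
                     f"{e} {cursor[e]} {cursor[e] + n}")
        cursor[h] += n
        cursor[e] += n
    return karyotype, links


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    demo = {("black", "brown"): 3, ("blond", "blue"): 2, ("blond", "brown"): 1}
    karyotype, links = table_to_circos(demo)
    print("\n".join(karyotype))
    print("\n".join(links))
```

Keeping a running cursor per ideogram is what makes the ribbon widths add up to each category’s total, so the chart’s segment sizes stay consistent with the table.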

The last part of the book goes into more advanced topics such as customizing labels, links and ribbons, formatting links with rules, reducing links through bundling, and adding data tracks as heat maps or histograms. This is the meat for those who intend to use Circos in more advanced ways. I did not spend a lot of time here, but found the examples to be useful.

Contributions by State and Political party during 2012 U.S. Presidential Elections


This section ends abruptly. One gets the feeling that there are other subtleties that could be explored and explained. A summary or outlook chapter would have been nice to wrap up the book and give perspective. For example, I would have liked to hear from the author how much time he spent with various features during the college-majors-to-professions project.

In summary: This book will get you going with Circos on your own machine. Installing can be a challenge on Mac, depending on how familiar you are with Unix and the open source tool stack. The examples for your first Circos charts are easy to follow and explain data and configuration files. The more advanced features are briefly touched upon, but require more experimentation and time to understand and appreciate.
Circos author Martin Krzywinski writes on his website: “To get your feet wet and hands dirty, download Circos and read the tutorials, or dive into a full course on Circos.” The How-to book by Tom Schenk helps with this process, but you still need to come prepared. If you are a Unix power user this should feel familiar. If you are a Mac user who rarely ever opens a Terminal, then you might be better off just using Circos via the tableviewer web interface.
Lastly, I would recommend buying the electronic version of this book, as you can copy and paste the code and make use of the companion code and documents. A printed version of this book would be of very limited use.

 

Posted by on December 6, 2012 in Education, Scientific

 


2012 Election Result Maps


The New York Times has covered the 2012 U.S. presidential election in great detail, including the much-heralded fivethirtyeight blog (named after the 538 electoral votes) by forecaster Nate Silver. His poll-aggregation model has consistently produced the most accurate forecasts and called 99 of 100 states correctly across the 2008 and 2012 elections.

A popular visualization is the map of the 50 states colored red (Republican), blue (Democrat) or green (Independent). Since most states allocate all their electoral votes to the candidate with the most votes in that state, this state map seems the most important.

2012 Election Result By State (Source: NYTimes.com)

This map hardly changed from 2008; only Indiana and North Carolina changed color. Hence the electoral vote result in 2012 (332 Dem to 206 Rep) is similar to that of 2008 (365 Dem to 173 Rep). The visual impression of this map, however, is that there is roughly the same amount of red and blue, with slightly more red than blue. This impression becomes even stronger when looking at the results by county.

2012 Election Results By County (Source: NYTimes.com)

Why is the outcome so strongly in favor of blue (Democrat) when the majority of the area looks red? The answer lies in the very uneven population density across the 50 states. Although the two states are roughly the same size, California (slightly more blue) has a population density about 40x higher than Montana (mostly red). At the extreme end of this scale, the most densely populated state, New Jersey, has about 1000x as many people per square mile as the least densely populated state, Alaska. Urban areas have a much higher density of voters than rural areas, and their demographics are such that urban areas tend to vote more blue (Democrat) while rural areas tend to vote more red (Republican). The size of the colored area in the above chart would only be a good indicator if population density were uniform. A great way to compensate visually for this difference can be seen in the third chart published by the NYTimes.
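The gap between map area and votes can be made concrete with a few lines of arithmetic. The following Python sketch compares the share of land area won by red with the share of all votes cast for red. The county figures in the usage example are entirely hypothetical (one dense urban county, four sparse rural ones) and only illustrate the effect described above.

```python
from dataclasses import dataclass


@dataclass
class County:
    area_sq_mi: float
    dem_votes: int
    rep_votes: int


def red_shares(counties):
    """Return (share of land area won by red, share of all votes cast for red)."""
    total_area = sum(c.area_sq_mi for c in counties)
    red_area = sum(c.area_sq_mi for c in counties if c.rep_votes > c.dem_votes)
    total_votes = sum(c.dem_votes + c.rep_votes for c in counties)
    red_votes = sum(c.rep_votes for c in counties)
    return red_area / total_area, red_votes / total_votes


if __name__ == "__main__":
    # Hypothetical counties: one dense urban county, four sparse rural ones.
    urban = County(500, 800_000, 300_000)
    rural = [County(2000, 10_000, 20_000) for _ in range(4)]
    area_share, vote_share = red_shares([urban] + rural)
    print(f"red area: {area_share:.0%}, red votes: {vote_share:.0%}")
```

With these made-up numbers, red covers about 94% of the map area but receives only about 31% of the votes, which is exactly the kind of distortion an area-colored map invites.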

2012 Election Delta By County (Source: NYTimes.com)

Now the size of each colored circle is proportional to the number of surplus votes for that color in that county. The few blue circles around most major cities are larger and outweigh the many small red circles in rural areas, both optically and numerically in total. The original map is interactive and shows tooltips when you hover over the circles. For example, Los Angeles County alone had about 1 million more blue (Democrat) votes than red (Republican).

2012 Election in Los Angeles County
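A sketch of how such surplus circles can be sized follows. It assumes that circle area (not radius) is proportional to the surplus, which is common practice for proportional-symbol maps because the eye compares areas; whether the NYTimes scaled area or radius is not stated in the original. The maximum radius and the surplus that maps to it are also assumptions.

```python
import math


def surplus_circle(dem_votes, rep_votes, max_radius_px=40.0,
                   max_surplus=1_000_000):
    """Return (color, radius) for a county's surplus-vote circle.

    Circle AREA is proportional to the surplus, so the radius grows with the
    square root of the surplus. max_surplus (an assumed scale) is the margin
    drawn with max_radius_px.
    """
    surplus = dem_votes - rep_votes
    if surplus > 0:
        color = "blue"
    elif surplus < 0:
        color = "red"
    else:
        color = "none"
    radius = max_radius_px * math.sqrt(abs(surplus) / max_surplus)
    return color, radius


if __name__ == "__main__":
    # Hypothetical counties: one large urban blue surplus, one small rural red surplus.
    print(surplus_circle(1_500_000, 500_000))   # blue, full-size circle
    print(surplus_circle(10_000, 12_500))       # red, small circle
```

Because area grows linearly with the surplus, adding up the blue areas and subtracting the red areas tracks the net popular-vote margin, which is why the eye’s "optical summation" comes out right.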

This optical summation leads to intuitively correct results for the popular vote. The difference in the popular vote was about 3.5 million more blue (Democrat) votes, or roughly 3%, and accordingly we see more blue in this delta circle diagram.

Of course, the president is elected not by the popular vote but by the electoral votes of the states. So no matter how big the Democrat advantage in California may be, California will not deliver more than its 55 electoral votes. This winner-take-all allocation of electoral votes by state leads to the outsized influence of swing states, which are near the 50%-50% mark in the popular vote. A small lead in the popular vote can lead to a large gain in electoral votes. In extreme cases, a candidate can win the electoral vote and become president despite losing the popular vote, as happened in 2000 when George W. Bush won Florida by a very narrow margin.
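The winner-take-all rule is simple enough to state as code. The sketch below awards each state’s electoral votes to its popular-vote winner and also tallies the national popular vote. It ignores the district-based split used by Maine and Nebraska, and it arbitrarily gives an exact tie to "rep"; the four states in the usage example are hypothetical.

```python
def electoral_totals(states):
    """Winner-take-all allocation of electoral votes.

    states: list of (electoral_votes, dem_votes, rep_votes).
    Returns (electoral votes by party, popular votes by party).
    Simplifications: Maine/Nebraska district splits are ignored, and an
    exact tie is awarded to "rep" in this sketch.
    """
    ev = {"dem": 0, "rep": 0}
    popular = {"dem": 0, "rep": 0}
    for electors, dem, rep in states:
        popular["dem"] += dem
        popular["rep"] += rep
        ev["dem" if dem > rep else "rep"] += electors
    return ev, popular


if __name__ == "__main__":
    # Hypothetical: one lopsided blue state, three narrowly red swing states.
    states = [(55, 9_000_000, 5_000_000)] + [(20, 2_400_000, 2_500_000)] * 3
    print(electoral_totals(states))
```

In this made-up example the blue candidate wins the popular vote by 3.7 million but loses the electoral vote 55 to 60, because the large blue surplus is "wasted" in a single state while the red candidate wins three close states.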

Another variation on this theme of visually combining vote and population density information comes from Chris Howard. (It was referenced in an article on theatlanticcities.com by Emily Badger about the spatial divide between urban and rural voting preferences, which has other election maps as well.) The idea is to use shades of blue and red on a county map, with darker shades indicating higher population density.

2012 Election by county with shading by population density (Source: Chris Howard)
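One way to build such a two-variable color scale is sketched below: the hue comes from the county winner and the darkness from population density on a logarithmic scale, since densities span several orders of magnitude. Chris Howard’s actual palette and scaling are not described in the original, so the light/dark endpoint colors and the density bounds (1 to 10,000 people per square mile) are assumptions.

```python
import math

# Assumed palette: (light endpoint, dark endpoint) as RGB tuples.
PALETTE = {
    "blue": ((200, 200, 255), (0, 0, 128)),
    "red": ((255, 200, 200), (128, 0, 0)),
}


def county_color(dem_votes, rep_votes, people_per_sq_mi,
                 lo=1.0, hi=10_000.0):
    """Return an RGB color: hue by winner, darker for higher density.

    Density is clamped to [lo, hi] (assumed bounds) and placed on a log
    scale, t = 0 (sparse, light) .. 1 (dense, dark).
    """
    d = min(max(people_per_sq_mi, lo), hi)
    t = math.log10(d / lo) / math.log10(hi / lo)
    light, dark = PALETTE["blue" if dem_votes > rep_votes else "red"]
    # Linear interpolation between the light and dark endpoint colors.
    return tuple(round(a + (b - a) * t) for a, b in zip(light, dark))


if __name__ == "__main__":
    print(county_color(10, 20, 3.0))        # sparse red county: light red
    print(county_color(20, 10, 5_000.0))    # dense blue county: dark blue
```

The log scale is the key design choice here: on a linear scale almost every rural county would look identical and only a handful of big cities would get dark shades.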

A final visualization comes from Nate Silver’s blog post on November 8. While the percentages in this then-preliminary result may be slightly off (not all votes had been counted yet), the electoral vote counts remain valid.

2012 Election By State Cumulative (Source: Fivethirtyeight Blog)

It shows which swing state [electoral votes] put the blue ticket over the winning line (Colorado [9]) and which other swing states could have been lost without losing the presidency (Florida [29], Ohio [18], Virginia [13]). It also gives a crude but somewhat telling indication of where you might want to live if you want to surround yourself with people of blue or red preferences.
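The logic behind such a cumulative chart can be sketched as follows: sort the states by the winner’s margin, from safest to closest, and accumulate electoral votes until the running total reaches 270; the state that crosses the line is the "tipping point". The state names and margins in the usage example are hypothetical, and this is not necessarily how the fivethirtyeight chart was produced.

```python
def tipping_point(states, needed=270):
    """Return (state, running total) where the winner first reaches `needed`.

    states: list of (name, electoral_votes, margin_pct), with the margin
    measured from the candidate's point of view (positive = won).
    Returns (None, total) if the candidate never reaches `needed`.
    """
    total = 0
    for name, electors, margin in sorted(states, key=lambda s: s[2],
                                         reverse=True):
        if margin <= 0:
            break  # all remaining states were lost
        total += electors
        if total >= needed:
            return name, total
    return None, total


if __name__ == "__main__":
    # Hypothetical states (name, electoral votes, margin in % points).
    states = [("A", 200, 20.0), ("B", 60, 8.0), ("C", 15, 3.0),
              ("D", 30, 1.0), ("E", 233, -15.0)]
    print(tipping_point(states))
```

The same arithmetic confirms the post’s reading of the real chart: 332 - (29 + 18 + 13) = 272, so even without Florida, Ohio and Virginia the blue ticket would still have had at least 270 electoral votes.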

 

Posted by on November 15, 2012 in Socioeconomic

 

Superstorm Sandy – Visualizing Hurricanes


Time-lapse animation of Sandy on Oct-28 from geostationary orbit, 1 frame per minute, 11 hours of daylight. Although “only” a category 1 hurricane, this superstorm is enormous: tropical storm force winds extend over an area 900 miles in diameter.

Living in South Florida makes you alert to tropical storms during hurricane season from May to November. Exactly 7 years ago, at the end of October 2005, the eye of category 3 hurricane Wilma swept over our home in West Palm Beach in South Florida, the most powerful natural weather event I have ever witnessed. We have avoided a direct hit since then; earlier this year Isaac brought us a massive rain event, but again no direct hit. To be sure, the flooding associated with hurricanes is often worse than the wind damage. For example, when hurricane Katrina hit New Orleans in August 2005, most of the devastation came from flooding after the levees were breached. But the first question is always where a storm will make landfall and how strong it will be when it hits your area.

Tropical storms are tracked and forecast in great detail, in particular by the National Hurricane Center of the National Weather Service. There are many great visualizations illustrating the path, wind speed, rainfall, extent of tropical storm force winds, etc. Because it is so convenient for browsing, I have almost completely switched to following hurricane and weather updates on the iPad. (In this case I’m using the Hurr Tracker app from EZ Apps.)

Last week a new tropical storm emerged in the Caribbean and was named ‘Sandy’. A few days ago, with Sandy’s center over the Bahamas, the path looked like this:

Path of hurricane Sandy as of Oct-25 (Hurr Tracker iPad app)

Note the use of color for wind speed and the cone of uncertainty in the lower segment, as well as the rings around the center indicating the size of the area with storm-force winds.

We were naturally curious whether South Florida was likely to get hit, and another image gave us some relief:

5 Day tracking map for hurricane Sandy

Now, a few days later, Sandy has not been a particularly disturbing event for South Florida, although we did get some strong northerly winds and pounding surf leading to beach erosion. Sandy is, however, forecast to make landfall on the Jersey Shore within about 24 hours, during the night from Monday to Tuesday.

One interesting set of maps uses a color code to display the probability of an area experiencing winds of at least a certain speed, say tropical storm force winds (>= 39 mph). The following map was issued this afternoon and shows in purple the very large area (mostly offshore) with a near-100% probability of tropical storm force winds.

Tropical storm force wind speed probabilities for hurricane Sandy as of Oct-28
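How can a map show a probability of wind exceeding a threshold? One simple approach is to simulate many plausible versions of the storm and count, for each location, the fraction of versions in which the peak wind reaches the threshold. The NHC product is reportedly built on a Monte Carlo approach of this general kind, but the toy model below is not the NHC model: the normally distributed track error (100 mile standard deviation), the linear wind falloff from 80 mph at the center to 0 at 450 miles, and all other numbers are illustrative assumptions.

```python
import random


def exceedance_probability(peak_winds_mph, threshold_mph=39.0):
    """Fraction of simulated realizations whose peak wind meets the threshold.

    39 mph is the tropical storm force threshold mentioned in the text.
    """
    if not peak_winds_mph:
        raise ValueError("need at least one realization")
    hits = sum(1 for w in peak_winds_mph if w >= threshold_mph)
    return hits / len(peak_winds_mph)


def toy_peak_winds(distance_mi, n=1000, seed=0):
    """Toy model (assumption): shift the forecast track by a random
    cross-track error, then let peak wind fall off linearly from 80 mph
    at the storm center to 0 mph at 450 miles."""
    rng = random.Random(seed)
    winds = []
    for _ in range(n):
        d = abs(distance_mi + rng.gauss(0.0, 100.0))
        winds.append(max(0.0, 80.0 * (1.0 - d / 450.0)))
    return winds


if __name__ == "__main__":
    for miles in (0, 200, 400):
        p = exceedance_probability(toy_peak_winds(miles))
        print(f"{miles:>3} mi from forecast track: P(>=39 mph) = {p:.2f}")
```

Coloring each map cell by this fraction produces exactly the kind of probability map shown above, with track uncertainty spreading the high-probability zone well beyond the forecast line.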

This shows how large Sandy is: an area the size of Texas with tropical storm force winds! Meteorologists are concerned for the Northeast because Sandy is converging with two other weather systems, a storm from the West and cold air coming down from the North. This is expected to intensify the storm, similar to the Perfect Storm of 1991. Because of its timing around Halloween, Sandy has also been nicknamed ‘Frankenstorm’.

One of the most chilling pictures is this animated GIF from WeatherBELL. A story in The Atlantic earlier today describes it as follows:

Dr. Ryan Maue, a meteorologist at WeatherBELL, put out this animated GIF of the storm’s approach yesterday. “This is unprecedented –absolutely stunning upper-level configuration pinwheeling #Sandy on-shore like ping-pong ball,” he tweeted. It shows how cold air to the north and west of the storm spin Sandy into the mid-atlantic coastline.


Animation of hurricane Sandy moving into the Northeast (Source: WeatherBELL)

Understandably, this forecast for superstorm Sandy has the authorities worried. The full moon tomorrow will exacerbate the tides, and New York City is expecting a storm surge of up to 11 ft. Cities across the Northeast are taking precautions as of this writing. For example, the New York City subway system is shutting down tonight, and several hundred thousand people in low-lying coastal areas are under a mandatory evacuation order. More than 5000 flights to the area on Monday have been cancelled. Take a look at the expected 5-day precipitation forecast for the Northeast: some areas may get up to 10 inches of rain and/or snow!

5 day precipitation forecast with Sandy’s impact for the Northeast

The first priority is to use such visualizations to communicate the weather impact and allow people to take necessary precautions. Similar hurricane-style charts can also be used to visualize other uncertain events, such as the future outcomes of development projects. We will look at this in an upcoming post on this blog.

 

Addendum 11/4/12: The NYTimes has provided some interactive graphics detailing the location and size of power outages caused by superstorm Sandy in the New York and New Jersey area. The New York City outages have been summarized in this chart, normalized to the percentage of all customers. As can be seen, the efforts to restore power over the first 6 days have been fairly successful, especially in Manhattan and Staten Island, less so in Westchester.

6 day tracking map of power outages caused by Sandy in New York City
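Normalizing outages to the percentage of customers is what makes boroughs of very different sizes comparable on one chart. A minimal sketch of that arithmetic follows; the area names and counts in the usage example are hypothetical, not the NYTimes data.

```python
def outage_percentages(outages_by_day, customers):
    """Convert daily outage counts per area into % of that area's customers.

    outages_by_day: {area: [customers without power on day 1, day 2, ...]}
    customers: {area: total customers served in that area}
    """
    return {area: [100.0 * out / customers[area] for out in daily]
            for area, daily in outages_by_day.items()}


def restored_share(daily_pct):
    """Share of the day-1 outage that has been restored by the last day."""
    first, last = daily_pct[0], daily_pct[-1]
    return 0.0 if first == 0 else (first - last) / first


if __name__ == "__main__":
    # Hypothetical counts for two areas over three days.
    outages = {"Area 1": [400_000, 150_000, 20_000],
               "Area 2": [60_000, 45_000, 30_000]}
    customers = {"Area 1": 800_000, "Area 2": 120_000}
    for area, pct in outage_percentages(outages, customers).items():
        print(area, [round(p, 1) for p in pct],
              f"restored: {restored_share(pct):.0%}")
```

Comparing the restored share rather than raw counts shows which areas are recovering fastest relative to their own size.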

 

Posted by on October 28, 2012 in Recreational, Scientific

 


Trends in Health Habits across the United States


This week Scientific American published an interesting article about trends in health habits across the United States. The article includes both a large composite chart and a page with an interactive chart. Both are well done and a great example of using visualization to help tell a story. I personally find the most useful part of the graphic to be the comparison column on the right, with shades of color indicating the degree of improvement (blue) or deterioration (red).

US health habits 1995 vs. 2010 (Source: Scientific American)

From the article:

Americans are imbibing alcohol and overeating more yet are smoking less (black lines in center graphs).

Some of the behaviors have patterns; others do not. Obesity is heaviest in the Southeast (2010 maps). Smoking is concentrated there as well. Excess drinking is high in the Northeast.

Comparing 2010 and 1995 figures provides the greatest insight into trends (maps, far right). Heavy drinking has worsened in 47 states, and obesity has expanded in every state. Tobacco use has declined in all states except Oklahoma and West Virginia. The “good” habit, exercise, is up in many places—even in the Southeast, where it has lagged.
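The comparison column boils down to classifying each state’s change between 1995 and 2010, where the direction of "better" depends on the habit: less is better for smoking, heavy drinking and obesity, more is better for exercise. The Python sketch below shows that logic and counts states per category, which is how statements like "heavy drinking has worsened in 47 states" arise. The state values in the usage example are hypothetical, and the color for "unchanged" is an assumption.

```python
# Assumed color mapping, following the article's blue/red convention.
COLOR = {"improved": "blue", "worsened": "red", "unchanged": "white"}


def classify_change(value_1995, value_2010, good_habit):
    """Label a state's change as 'improved', 'worsened' or 'unchanged'.

    For a bad habit (smoking, heavy drinking, obesity) a decrease is an
    improvement; for a good habit (exercise) an increase is.
    """
    delta = value_2010 - value_1995
    if delta == 0:
        return "unchanged"
    better = delta > 0 if good_habit else delta < 0
    return "improved" if better else "worsened"


def count_states(rates, good_habit):
    """rates: {state: (pct_1995, pct_2010)} -> {label: number of states}."""
    counts = {"improved": 0, "worsened": 0, "unchanged": 0}
    for before, after in rates.values():
        counts[classify_change(before, after, good_habit)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical obesity rates in percent (a bad habit).
    obesity = {"State X": (15.0, 25.0), "State Y": (18.0, 27.5)}
    print(count_states(obesity, good_habit=False))
```

Shading by the size of the change, rather than just its sign, would add the "degree" dimension that the article’s color gradient conveys.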

A more detailed visual analysis is possible using the interactive version of these graphs on the related subpage “Bad Health Habits are on the rise”. Here one can compare up to three arbitrarily chosen states against the top, median, and bottom performing states for each health habit.

The following examples show tobacco use, exercise and obesity by state with line charts for the three arbitrarily selected states of Florida, California and Hawaii.

Tobacco Trend By State

Exercise Trend By State

Obesity Trend By State

Leading the exercise statistics are states offering attractive outdoor sports opportunities, like Oregon or Hawaii. This correlation seems intuitive in both causal directions: people interested in exercise may tend to move to the states with the most attractive outdoor sports, and people living in those states may end up exercising more because of the opportunity.

Looking at the average trend line, exercise seems to have leveled off after a bump in the early 2000s, whereas the decline in smoking over the last decade continues unabated.

15 years is half a generation. During that time, Americans smoked less in almost every state and exercised more in many states, but obesity rose sharply in every state! From a health-policy perspective, the latter seems to be the most alarming trend. Most people want the next generation to be better off than the previous one. To some extent this has been true for wealth, at least until the Great Recession of 2008. But these data show that at the population level, more wealth does not necessarily mean more health.

 

Posted by on October 19, 2012 in Medical

 
 