Category Archives: Education

Visualizing Voting Preferences for World Values

The other day I listened to a presentation that Melinda Gates prepared for the United Nations as an update on progress towards the Millennium Development Goals (MDGs). The UN adopted the eight MDGs in 2000 with a target date of 2015, so it is reasonable to check whether the world is on track to reach each of them. To summarize, from the MDG Wikipedia page:

  1. Eradicating extreme poverty and hunger
  2. Achieving universal primary education
  3. Promoting gender equality and empowering women
  4. Reducing child mortality rates
  5. Improving maternal health
  6. Combating HIV/AIDS, malaria, and other diseases
  7. Ensuring environmental sustainability
  8. Developing a global partnership for development

A good listing of reports, statistics and updates can be found on the UN website here.

Sample Vote for 6 of 16 MDG choices

At the end of Melinda’s presentation is a link to a UN global survey on the goals that should follow the MDGs after 2015. I took this survey and found the visualization of voting results quite interesting. First, one is asked to select six out of a list of sixteen goals, namely those one thinks would have the highest impact on a better future world. (The survey methodology is described in more detail here.) Here is a sample vote:

A nice touch is that each of the sixteen goals has a different color, and when you check a goal, one of the sixteen areas on the stylized globe is filled with that color. Personal data such as name is optional, but some demographic information is required, including age, gender, educational level and country. Next, one can look at a summary of all currently tallied votes and compare them interactively to one’s own vote (checkmarks on the right).

WorldVoteOverview

It is perhaps not surprising that I voted very similarly to others in similar demographic cohorts.

  • Country: My picks included all of the top five goals of voters living in the US. My sixth pick was ‘Political freedoms’, which ranks only 11th in the US.
  • Age: I shared five of the Top six goals with people in my age group (world-wide). The one I did not check was ranked 4th (Better job opportunities). When you mouse over one of the goals, the display changes to highlight this goal in all columns:
Interactive Vote Analysis with highlighted goal

  • Gender: Here I picked four of the top five goals (I did not include ‘Better job opportunities’).
  • Education: I voted very similarly to people from countries with a very high HDI (Human Development Index, a visualization of which we covered in a previous post), sharing five of the top six goals.

From the above, it seems somewhat surprising that voters in the US did not ascribe a higher value to ‘Better job opportunities’, given how much economic topics like unemployment seem to dominate the media. That said, these votes should reflect which goals are most valuable for making the world a better place, not just one’s own home country. It seems that voters in the US judge other, more fundamental goals to be more important worldwide than ‘Better job opportunities’.

Another chart on the results page shows a heat map of the world’s countries based on how many votes have been submitted. I thought it was interesting that Ghana had submitted about twice as many votes as all of the US, and Nigeria about 7x as many. The country with the most voters at this time is India, but it is not far ahead of Nigeria.

CountryTotals

A fairly useless dynamic animation in this map is a map pin drop of four people who voted similarly to me. I found this too anecdotal to be of any real interest, and it was downright annoying that I couldn’t turn it off and just focus on the vote heat map. More useful information is missing instead: for example, the total number of votes should be displayed in the legend. I vaguely remember from before starting the survey that it was several hundred thousand from 194 countries, but I couldn’t get that figure to display again without clicking on Vote Again:

MyWorldVotes

 
Posted on September 21, 2013 in Education, Medical, Scientific

 

Circos Data Visualization How-to Book

Earlier this year we looked at a powerful data visualization tool called Circos, developed by Martin Krzywinski from the British Columbia Genome Science Center. The previous post looked at an example of how this tool can be used to show complex connectivity pathways in the human neocortex, so-called Connectograms.

Circos Book Cover

The Circos tool can be used interactively on the above website. In that mode you upload jobs as tabular data files and configuration files and have some limited control over the rendering of the resulting charts. For full expressive power and flexibility, Circos can also be downloaded for free and run on your own computer, which gives extensive customization control over the resulting charts.

I have been asked to review a new book titled “Circos Data Visualization How-to”, published by Packt Publishing here. Its main goal is to guide you through the above download and installation process and to get you started with creating and modifying Circos charts. Here is a brief review of this book.

Although originally developed for visualizing genomic data, Circos has been applied to many other complex data visualization projects, including in the social sciences. One such study was done by the book’s author, Tom Schenk, who analyzed the relationships between college majors and the professions those graduates ended up in. It appears that this work inspired him to write this book to help others with using Circos.

I downloaded the book in Kindle format and read it on the Mac due to the color graphics and the much larger screen size. It’s well structured and around 70 pages in printed form. The book focuses first on the download and install part, then has a series of examples from first chart to more complex ones using customization such as colors, ribbons, heat maps or dynamic binding.

Flow Chart for creation of Circos charts

Circos is essentially a set of Perl modules combined with the GD graphics library.

The first part is on Installing Circos, with a chapter each on Windows 7 and on Linux or Mac OS. Working on a Mac, I went the latter route. I ended up right in the weeds and it took me about 4 hours to get everything installed and working. The description is derived from a Linux install and is generally somewhat terse. It assumes you have all prerequisite tools installed on your Mac, or at least that you are savvy enough to figure out what’s missing and where to get it. I had to dust off some of my Unix skills and go hunting via Google for solutions to a list of install problems:

  • directory permissions (I needed to wrap the exact instructions with sudo)
  • installing Xcode tools from Apple for my platform (make was not preinstalled)
  • understanding cause of error messages (Google searches, Google group on Circos)
  • locating and installing the GD graphics library (helpful installing-circos-on-os-x tips by Paulo Nuin)
  • version and location issues (many libraries are in ongoing development; some sources have moved)

Others may find this part a lot easier, but I would say there should be an extra chapter for the Mac with tips and explanations for some of these speed bumps. On the plus side, the Google group seems to be very active and I found frequent and recent answers by Circos author Martin Krzywinski.

The next part of the book is easy to understand. One creates a simple hair-to-eye color relationship diagram. Then configuration files are introduced to customize colors and chart appearance. All required data and configuration files are also contained in the companion download from the Packt Publishing book page.

Chart of relationship between hair and eye colors

The last part of the book goes into more advanced topics such as customizing labels, links and ribbons, formatting links with rules, reducing links through bundling, and adding data tracks as heat maps or histograms. This is the meat for those who intend to use Circos in more advanced ways. I did not spend a lot of time here, but found the examples to be useful.

Contributions by State and Political party during 2012 U.S. Presidential Elections

This section ends abruptly. One gets the feeling that there are other subtleties that could be explored and explained. A summary or outlook chapter would have been nice to wrap up the book and give perspective. For example, I would have liked to hear from the author how much time he spent on various features during the college-major-to-profession project.

In summary: This book will get you going with Circos on your own machine. Installing can be a challenge on a Mac, depending on how familiar you are with Unix and the open source tool stack. The examples for your first Circos charts are easy to follow and explain the data and configuration files. The more advanced features are touched upon briefly, but require more experimentation and time to understand and appreciate.

Circos author Martin Krzywinski writes on his website: “To get your feet wet and hands dirty, download Circos and read the tutorials, or dive into a full course on Circos.” The How-to book by Tom Schenk helps with this process, but you still need to come prepared. If you are a Unix power user this should feel familiar. If you are a Mac user who rarely ever opens a Terminal, then you might be better off just using Circos via the tableviewer web interface.

Lastly, I would recommend buying the electronic version of this book, as you can copy and paste the code and make use of the companion code and documents. A printed version of this book would be of very limited use.

 
Posted on December 6, 2012 in Education, Scientific

 

London Tube Map and Graph Visualizations

The previous post on Tube Maps has quickly risen in the view stats into the Top 3 posts. Perhaps it’s due to many people searching Google for images of the original London tube map in the context of the upcoming Olympic Games.

I recently reviewed some of the classes in Wolfram’s free Data Science course. If you are interested in Data Science, this is excellent material. And if you are using Mathematica, you can download the underlying code and play with the materials.

It just so happens that in the notebook for the Graphs and Networks: Concepts and Applications class there is a graph object for the London subway.

Mathematica Graph object for the London subway

As previously demonstrated in our post on world country neighborhood relationships, Mathematica’s graph objects are fully integrated into the language and there are powerful visualization and analysis functions.

For example, this graph has 353 vertices (stations) and 409 edges (subway connections). This one line of code highlights all stations no more than 5 stations away from the Waterloo station:

HighlightGraph[london, 
  NeighborhoodGraph[london, "Waterloo", 5]]

Neighborhood Graph 5 around Waterloo

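HighlightGraph and NeighborhoodGraph are both built-in functions, which is why a single line suffices. Similarly, stepping the radius k from 0 to 20 around King’s Cross St. Pancras and exporting the resulting frames with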

Export["london.gif",
  Table[HighlightGraph[london, 
    NeighborhoodGraph[london, "King's Cross St. Pancras", k]],
   {k, 0, 20, 1}]]

creates this animated GIF file:

Paths spreading out from the center
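For readers without Mathematica, the idea behind a neighborhood of radius k can be sketched in a few lines of plain Python: it is a breadth-first search that stops expanding once it is k stops away from the start. This is only an illustration under stated assumptions. The seven-station network below is a hypothetical toy example (not real Tube data), and the function name neighborhood is made up for this sketch; how NeighborhoodGraph works internally is not something the course notebook shows.

```python
from collections import deque

# Hypothetical toy network (NOT real Tube data): station -> adjacent stations.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "G"],
    "G": ["F"],
}


def neighborhood(graph, start, k):
    """Return {station: hops} for every station within k hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        if hops[station] == k:
            continue  # already at the radius, do not expand further
        for neighbor in graph[station]:
            if neighbor not in hops:
                hops[neighbor] = hops[station] + 1
                queue.append(neighbor)
    return hops


# Growing the radius step by step mirrors the frames of the animation.
for k in range(4):
    print(k, sorted(neighborhood(TOY_NETWORK, "B", k)))
```

Each pass of the loop prints a larger set of stations, which is the same growth the animated GIF shows around King’s Cross St. Pancras.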

Shortest paths can easily be determined and visualized:

HighlightGraph[london, 
  FindShortestPath[london, "Amersham", "Woolwich Arsenal"]]

A shortest path example
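As a rough idea of what such a shortest-path search involves, here is a minimal Python sketch using breadth-first search, which finds a path with the fewest stops when all connections count equally. The assumptions: the network is a hypothetical toy example rather than the London graph, the function name shortest_path is made up, and nothing here claims to be the algorithm FindShortestPath uses internally.

```python
from collections import deque

# Hypothetical toy network (NOT real Tube data): station -> adjacent stations.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "G"],
    "G": ["F"],
}


def shortest_path(graph, source, target):
    """Return one path with the fewest hops from source to target, or None."""
    previous = {source: None}  # station -> station we arrived from
    queue = deque([source])
    while queue:
        station = queue.popleft()
        if station == target:
            # Walk the chain of predecessors back to the source.
            path = []
            while station is not None:
                path.append(station)
                station = previous[station]
            return path[::-1]
        for neighbor in graph[station]:
            if neighbor not in previous:
                previous[neighbor] = station
                queue.append(neighbor)
    return None  # target is not reachable from source


route = shortest_path(TOY_NETWORK, "A", "G")
print(route)  # ['A', 'B', 'E', 'F', 'G']
```

The sketch counts stops only. Travel times or line changes would require weighted edges and a different algorithm such as Dijkstra’s.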

There are many other graph functions such as:

GraphDiameter[london]   (* 39 *)
GraphRadius[london]     (* 20 *)
GraphCenter[london]     (* "King's Cross St. Pancras" *)
GraphPeriphery[london]  (* {"Watford Junction", "Woodford"} *)

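In other words, the King’s Cross St. Pancras station is at the center of the network: every other station can be reached from it in at most 20 stops (the graph radius). The 39 stops between Watford Junction and Woodford are the longest shortest path in the network (the graph diameter), which puts these two stations on the periphery.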
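All four quantities derive from one number per station, its eccentricity, which is the distance to the station farthest away from it. The following Python sketch shows that relationship on a hypothetical toy network (not the London graph); the helper names are made up, and the code assumes a connected network.

```python
from collections import deque

# Hypothetical toy network (NOT real Tube data): station -> adjacent stations.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "G"],
    "G": ["F"],
}


def hop_distances(graph, start):
    """Breadth-first search: fewest hops from start to every reachable station."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        for neighbor in graph[station]:
            if neighbor not in hops:
                hops[neighbor] = hops[station] + 1
                queue.append(neighbor)
    return hops


# Eccentricity: distance to the farthest station (assumes a connected network).
eccentricity = {s: max(hop_distances(TOY_NETWORK, s).values()) for s in TOY_NETWORK}

radius = min(eccentricity.values())    # smallest eccentricity
diameter = max(eccentricity.values())  # largest eccentricity
center = sorted(s for s, e in eccentricity.items() if e == radius)
periphery = sorted(s for s, e in eccentricity.items() if e == diameter)

print(radius, diameter, center, periphery)  # 2 4 ['E'] ['A', 'G']
```

In the toy network, station E plays the role of King’s Cross St. Pancras, and A and G play the roles of Watford Junction and Woodford.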

Let’s look at distances within the graph. The built-in function GraphDistanceMatrix calculates all pairwise distances between any two stations:

mat = GraphDistanceMatrix[london]; MatrixPlot[mat]

Graph Distance Matrix Plot

For the 353*353 = 124,609 pairs of stations, let’s plot a histogram of the pairwise distances:

Histogram[Flatten[mat]]

Graph Distance Histogram

The average distance between two stations in the London subway system is about 14.
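The same three steps (distance matrix, histogram, average) can be sketched in Python on a hypothetical toy network. Two assumptions are worth stating. First, the matrix is built with one breadth-first search per station, which may or may not be what GraphDistanceMatrix does internally. Second, flattening the full matrix includes the zero distance of each station to itself, so the sketch reports the average both with and without those diagonal entries.

```python
from collections import Counter, deque

# Hypothetical toy network (NOT real Tube data): station -> adjacent stations.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "G"],
    "G": ["F"],
}


def hop_distances(graph, start):
    """Breadth-first search: fewest hops from start to every reachable station."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        for neighbor in graph[station]:
            if neighbor not in hops:
                hops[neighbor] = hops[station] + 1
                queue.append(neighbor)
    return hops


stations = sorted(TOY_NETWORK)
# One row per station, columns in the same station order (assumes connectivity).
matrix = []
for source in stations:
    hops = hop_distances(TOY_NETWORK, source)
    matrix.append([hops[target] for target in stations])

flat = [d for row in matrix for d in row]  # like Flatten[mat]
histogram = Counter(flat)                  # distance -> number of ordered pairs
n = len(stations)
mean_all = sum(flat) / (n * n)             # includes the zero diagonal
mean_distinct = sum(flat) / (n * (n - 1))  # distinct station pairs only

print(sorted(histogram.items()))
print(round(mean_all, 3), round(mean_distinct, 3))
```

For a network as large as the Tube the two averages differ only slightly, since the diagonal holds just 353 of the 124,609 entries.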

So far, very little coding has been required as we have used built-in functions. Of course, the set of functions can easily be extended. One interesting aspect is the notion of centrality, that is, how close a node is to the center of the graph. This is expressed in the built-in function ClosenessCentrality:

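(* closeness centrality of every station, in VertexList order *)
cc = ClosenessCentrality[london];
(* color each vertex by its centrality relative to the maximum *)
HighlightCentrality[g_, cc_] := 
  HighlightGraph[g, 
   Table[Style[VertexList[g][[i]], 
     ColorData["TemperatureMap"][cc[[i]]/Max[cc]]], 
    {i, VertexCount[g]}]];
HighlightCentrality[london, cc]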

Color coded Centrality Map
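To make the coloring less of a black box, here is a small Python sketch of closeness centrality using the common textbook definition: the inverse of a station’s average distance to all other stations. That Mathematica’s ClosenessCentrality normalizes in exactly this way is an assumption on my part, and the network is again a hypothetical toy example. The last step rescales by the maximum, as the cc[[i]]/Max[cc] expression above does before picking a color.

```python
from collections import deque

# Hypothetical toy network (NOT real Tube data): station -> adjacent stations.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "G"],
    "G": ["F"],
}


def hop_distances(graph, start):
    """Breadth-first search: fewest hops from start to every reachable station."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        for neighbor in graph[station]:
            if neighbor not in hops:
                hops[neighbor] = hops[station] + 1
                queue.append(neighbor)
    return hops


def closeness(graph):
    """Inverse of the average distance to all other stations (connected graph)."""
    result = {}
    for station in graph:
        total = sum(hop_distances(graph, station).values())
        result[station] = (len(graph) - 1) / total
    return result


cc = closeness(TOY_NETWORK)
# Rescale so the most central station gets 1.0, ready for a color map lookup.
scaled = {station: value / max(cc.values()) for station, value in cc.items()}
print(scaled)
```

Stations with a short average distance to everyone else end up near 1.0, and end-of-line stations end up with the lowest values.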

Another interesting notion is that of BetweennessCentrality, which is a measure indicating how often a particular node lies on the shortest paths between all node-pairs. The following nifty little snippet of code identifies the 10 most traversed stations – along the shortest paths – of the London underground:

HighlightGraph[london,
 First /@ SortBy[
 Thread[VertexList[london] -> BetweennessCentrality[london]],
 Last][[-10 ;;]]]

10 most traversed stations
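For comparison, betweenness centrality takes noticeably more code when written by hand. The Python sketch below uses Brandes’ algorithm for unweighted, undirected graphs, which is a well-known method that I am swapping in for illustration; I do not know which algorithm the built-in BetweennessCentrality uses. The network is a hypothetical toy example, and scores count each unordered pair of stations once.

```python
from collections import deque

# Hypothetical toy network (NOT real Tube data): station -> adjacent stations.
TOY_NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "G"],
    "G": ["F"],
}


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Phase 1: BFS that counts shortest paths and records predecessors.
        order = []                        # stations in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        paths = {v: 0 for v in graph}     # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies from the farthest stations inward.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was visited from both of its ends.
    return {v: s / 2 for v, s in score.items()}


bc = betweenness(TOY_NETWORK)
most_traversed = sorted(bc, key=bc.get, reverse=True)[:3]
print(bc)
print(most_traversed)
```

In the toy network the two interchange stations B and F score highest, while the end-of-line stations A and G score zero because no shortest path passes through them.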

I have often felt that progress in computer science and in languages comes from raising the level of abstraction. It’s amazing how much analysis and visualization one can do in Mathematica with very little coding due to the large number of powerful, built-in functions. The reference documentation of these functions often has many useful examples (and is also available for free on the web).

When I graduated from college 20 years ago we didn’t have such powerful language platforms. Implementing a good algorithm for finding shortest paths is a good exercise for a college-level computer science course. And even when such pre-built functions exist, it may still be instructive to figure out how to implement such algorithms.

As a manager I have always encouraged my software engineers to spend a certain fraction of their time searching for built-in functions or otherwise pre-existing code to speed up project implementation. Bill Gates is quoted as having said:

“There is only one trick in software: Use a piece of code that has already been written.”

With software engineers, it is well known that productivity often varies not just by small factors, but by orders of magnitude. A handful of talented and motivated engineers with the right tools can outperform staffs of hundreds at large companies. I believe the increasing levels of abstraction and computational power of platforms such as Mathematica further exacerbate this trend and the resulting inequality in productivity.

 
Posted on July 11, 2012 in Education, Recreational

 

Khan Academy and Interactive Content in Digital Education

Online education has received a lot of attention lately. Many factors have contributed to the rise in online educational content, including higher bandwidth, free video hosting (YouTube), mobile devices, growing and global audiences, improved customization mechanisms (scoring, similarity recommendations), gamification (earning badges, friendly competitions, etc.) and others. Interactivity is an important ingredient for any form of learning.

“I tell you and you forget. I show you and you remember. I involve you and you understand.” [Confucius, 500 BC]

While learning, a student forms a mental model of the concepts. Understanding a concept means having a model detailed enough to answer questions, solve problems and predict a system’s behavior. The power of interactive graphics and models comes from the ability of the student to “ask questions” by modifying parameters and receive specific answers that help refine or correct the evolving mental model.

Digital solutions are bringing innovations to many of these areas. One of the most innovative approaches is the Khan Academy. What started just a few years ago as an experiment in recording short, narrated video lessons and sharing them via YouTube with family and friends has grown into a broad-based approach to revolutionize learning. Over the years, founder Sal Khan has developed a large collection of more than 3000 such videos. Backed by prominent endorsers such as Bill Gates, the not-for-profit Khan Academy has developed a web-based infrastructure which can handle a large number of users and collect and display valuable statistics for students and teachers. The Khan Academy has received lots of media attention as well, with coverage on CBS 60 Minutes, a TED talk and more. The videos have by now been seen more than 130 million times!

Another high-profile experiment was launched in the fall of 2011 at Stanford University, where three Computer Science courses were made available online for free, including the Introductory Course to Artificial Intelligence by Sebastian Thrun and Peter Norvig. In a physical classroom a professor can teach several dozen to a few hundred students at most. In a virtual classroom these limits are obviously far higher. Exceeding all expectations, some 160,000 students in 190 countries signed up for this first course!

The basic pillar of online learning continues to be the recorded video of a course unit. The student can watch the video whenever and wherever, learning at his own pace and schedule. One can pause, rewind and replay as often as needed to better understand the content. Of course, if that were the only way to interact, it would be fairly rudimentary. Unlike in a real classroom or with a personal tutor, one can’t ask the teacher in the video a question and receive an answer. One can’t try out variations of a model and see their impact.

Sample Khan Academy Profile Graph

That’s where the tests come in. Testing a concept’s understanding usually involves a series of sample questions or problems which can only be solved repeatedly and reliably with such an understanding. Both Khan Academy and the Stanford AI course have test examples, exams and grading mechanisms to determine whether a student has likely understood a concept. In the Khan Academy, testable concepts revolve around mathematics, where an unlimited number of specific instances can be generated for test purposes. The answers to test questions are recorded and can be plotted.

Khan Academy Knowledge Map of testable concepts

The latter form of interactivity may be among the most useful. The system records how often you take tests, how long it takes you to answer, how often you get the answers right, etc. All this can then be plotted in a dashboard, either for yourself as an individual student or for an entire class if you are a coach. This shows at a glance where you or your students are struggling and how far along they have progressed.
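To give a feel for the kind of bookkeeping behind such a dashboard, here is a minimal Python sketch. Everything in it is hypothetical: the attempt log, the field layout, the summarize function and the 50% threshold for “struggling” are invented for illustration and say nothing about how the Khan Academy actually stores or evaluates its data.

```python
from collections import defaultdict

# Hypothetical attempt log: (concept, answered_correctly, seconds_taken).
ATTEMPTS = [
    ("fractions", True, 12.0),
    ("fractions", False, 30.0),
    ("fractions", True, 18.0),
    ("decimals", False, 40.0),
    ("decimals", False, 50.0),
]


def summarize(attempts):
    """Aggregate attempts per concept: count, accuracy and mean answer time."""
    grouped = defaultdict(list)
    for concept, correct, seconds in attempts:
        grouped[concept].append((correct, seconds))
    summary = {}
    for concept, rows in grouped.items():
        summary[concept] = {
            "attempts": len(rows),
            "accuracy": sum(correct for correct, _ in rows) / len(rows),
            "mean_seconds": sum(seconds for _, seconds in rows) / len(rows),
        }
    return summary


summary = summarize(ATTEMPTS)
# Assumed rule of thumb: below 50% accuracy counts as "struggling".
struggling = sorted(c for c, s in summary.items() if s["accuracy"] < 0.5)

print(summary)
print(struggling)  # ['decimals']
```

The same summary could be computed per student for a whole class, which is the coach’s view described above.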

Concepts are related to one another in a taxonomy so that one gets guidance as to which concepts to master first before building higher level concepts on top of the simpler ones. Statistical models can suggest the most plausible hints of what to try next based on prior observations.
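The prerequisite part of this guidance can be sketched as a dependency graph. In the Python example below the five concepts and their prerequisites are hypothetical, as is the ready_to_learn helper; the statistical hinting mentioned above is not modeled at all. TopologicalSorter comes from Python’s standard graphlib module (Python 3.9 or later) and yields an order in which every concept appears after its prerequisites.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical concept map: concept -> set of prerequisite concepts.
PREREQUISITES = {
    "counting": set(),
    "addition": {"counting"},
    "subtraction": {"counting"},
    "multiplication": {"addition"},
    "division": {"multiplication", "subtraction"},
}


def ready_to_learn(prerequisites, mastered):
    """Concepts not yet mastered whose prerequisites have all been mastered."""
    return sorted(
        concept
        for concept, needed in prerequisites.items()
        if concept not in mastered and needed <= mastered
    )


# One valid study order; several orders can satisfy the same prerequisites.
study_order = list(TopologicalSorter(PREREQUISITES).static_order())
next_up = ready_to_learn(PREREQUISITES, {"counting", "addition"})

print(study_order)
print(next_up)  # ['multiplication', 'subtraction']
```

A student who has mastered counting and addition would be pointed to multiplication or subtraction next, while division stays locked until both of its prerequisites are done.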

Founder Sal Khan deserves a lot of respect for having almost single-handedly recorded some 3000+ video lessons and for changing the world of online education so much for the better with his not-for-profit organization. From an interactive content perspective, imagine if at the end of some Khan video lessons you could download an underlying model, play with the parameters and maybe even extend the model definition. I know this may not be feasible in all taught domains, but it seems as if there are many areas ripe for such additional interactivity. We’ll look at one in the next post.

 
Posted on March 26, 2012 in Education

 
