Top data visualisation and management tools
Your best resources for plowing through mountains of data
By Sharon Machlis | Computerworld US | Published: 12:00, 25 April 2011
What's cool: One of the best things about Protovis is how well it's documented, with plenty of examples featuring visualisations and sample code. There are also a large number of sample visualisation types available, including maps and some statistical analyses. This is a robust tool, capable of building graphics like this colour-coded US map with timeline slider.
Skill level: Expert.
GIS/mapping on the desktop
There's a wide range of business uses for geographic information systems (GIS), ranging from oil exploration to choosing sites for new retail stores. Or, as The Miami Herald did for its Pulitzer Prize-winning coverage of Hurricane Andrew, you can compare maximum wind speeds with damage reports and building information (and perhaps discover, for example, that the worst damage didn't happen in the areas suffering the heaviest winds, but in areas with a lot of new, shoddy construction).
What it does: This is full-fledged GIS software, designed for creating maps that offer sophisticated, detailed data-based analysis of geographic regions.
The best-known desktop GIS software is probably Esri's ArcView, a robust, well-supported application that costs quite a bit of money. The open source QGIS is an alternative to ArcView.
As OpenOffice is to Microsoft Office, QGIS is to ArcView. ArcView enthusiasts argue that Esri's offering is a couple of years ahead of open source alternatives, has a better-developed interface, enjoys commercial support and is better suited for print output. But QGIS users say the open source alternative is an excellent program that does a great deal of useful GIS work, and may even be better than ArcView when it comes to generating maps for the web, thanks to a plug-in dedicated to generating HTML image maps.
What's cool: QGIS has an enormous amount of GIS functionality, including the ability to create maps, overlay various types of data, do spatial analysis, publish to the web and more. It can also be enhanced with plug-ins that add support for numerous undertakings, including geocoding, managing underlying table data, exporting to MySQL and generating HTML image maps.
Drawbacks: As with any sophisticated GIS application, learning to use this software entails a serious commitment of time and training.
Even in hour-long hands-on sessions with first ArcView and then QGIS, I noticed things that were easier to do in the commercial option. For example, ArcView had a one-click "normalise" function to immediately calculate, say, the percentage of people 65 and over versus the total population from a data table with both columns. In QGIS, I needed to pull up a "field calculator" and create a new column with the formula to do that calculation myself.
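To make the "normalise" step concrete, here is a minimal Python sketch of the same arithmetic: dividing one column by another to get a percentage. It does not use ArcView or QGIS at all; the column names (total_pop, pop_65_plus, pct_65_plus), the helper add_percentage and the figures are all invented for illustration.

```python
# Hypothetical rows standing in for a GIS attribute table.
# The area names and figures are invented for illustration only.
rows = [
    {"area": "Tract A", "total_pop": 4000, "pop_65_plus": 600},
    {"area": "Tract B", "total_pop": 2500, "pop_65_plus": 500},
    {"area": "Tract C", "total_pop": 0, "pop_65_plus": 0},
]


def add_percentage(rows, part_key, total_key, new_key):
    """Return copies of the rows with new_key = 100 * part / total.

    Rows whose total is zero get None rather than raising an error.
    """
    result = []
    for row in rows:
        total = row[total_key]
        pct = round(100.0 * row[part_key] / total, 1) if total else None
        result.append({**row, new_key: pct})
    return result


normalised = add_percentage(rows, "pop_65_plus", "total_pop", "pct_65_plus")
for row in normalised:
    print(row["area"], row["pct_65_plus"])
```

A field calculator does essentially this for every row of the table, which is why the manual route takes a few more steps than a one-click function.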
Runs on: Linux, Unix, Mac OS X, Windows. This is one case where installation is more complicated on OS X, since it requires manual installation of several dependencies. There's a one-click installer for Windows.
Skill level: Intermediate to expert.
Note: If you're interested in GIS and want to consider other free software options, download this PDF listing of Open Source/Non-Commercial GIS Products. And if you're looking for a free open source desktop GIS program that might be fairly easy to use, Jacob Fenton, director of computer-assisted reporting at American University's Investigative Reporting Workshop, recommends taking a look at the System for Automated Geoscientific Analyses (SAGA) site.
Most of us are familiar with mapping tools from major companies like Google (which has a number of third-party front ends such as Map A List, an add-on that adds info to a Google Map from a spreadsheet). There are also Yahoo Maps Web Services and Bing Maps, all with APIs. But there are numerous other options from smaller organisations or lone open source enthusiasts that were designed from the ground up to map geographic data.
What it does: This user-friendly website generates colour-coded maps. The colours change depending on underlying info such as population change or average income. It can also place markers on a map, varying the size of the markers based on a data table.
In addition to providing the web-based service, author Pete Warden has also packaged OpenHeatMap as a jQuery plugin for those who don't want to rely on hosting at OpenHeatMap.com. However, not all data formats work correctly when hosted locally. "My recommended way is to embed the maps from the site," Warden wrote via Skype chat.
What's cool: It is astonishingly easy to create a colour-coded map from many types of location data, even IP addresses (just use the column header ip_address).
It took me about 60 seconds to create a basic map from a spreadsheet of magnitude 7 or higher earthquakes around the world since January 1, 2000, then a couple of minutes more to customise the rollover box to display both date and magnitude. (You can see a larger version on OpenHeatMap.com.)
Marker transparency, size and colour are extremely simple to customise. You can also upload your own marker image and customise what appears in the tooltips rollover by adding a tooltip column to your data source.
OpenHeatMap automatically figures out and maps locations based on a wide range of place definitions, relying on how the location columns are named: "address," "country," "fips_code" (used by the US Census Bureau), "zip_code_area" (for five-digit ZIP codes), "lat" (latitude), "lon" (longitude) and so on.
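As a rough illustration of that naming convention, the Python sketch below writes a small CSV whose headers follow the pattern described above. The "lat" and "lon" headers come from the list above; the "value" and "tooltip" header names, the helper to_location_csv and the sample rows are assumptions made for this example, so check the service's own documentation before relying on them.

```python
import csv
import io

# Invented sample rows for illustration; these are not real earthquake records.
quakes = [
    {"date": "2001-01-15", "lat": 10.5, "lon": -70.25, "magnitude": 7.1},
    {"date": "2005-06-02", "lat": -5.0, "lon": 120.0, "magnitude": 7.8},
]


def to_location_csv(records):
    """Build CSV text with location columns named 'lat' and 'lon'.

    The 'value' and 'tooltip' column names are assumptions for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["lat", "lon", "value", "tooltip"],
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(
            {
                "lat": rec["lat"],
                "lon": rec["lon"],
                "value": rec["magnitude"],
                # Rollover text combining date and magnitude.
                "tooltip": f"{rec['date']}: magnitude {rec['magnitude']}",
            }
        )
    return buffer.getvalue()


csv_text = to_location_csv(quakes)
print(csv_text)
```

The point is simply that the header row carries the meaning: a tool that keys off column names needs no further configuration to know which columns hold coordinates.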
This is a well thought out interface from a onetime Apple engineer. Warden said he worked on several software projects at Apple, including Final Cut Studio.
Drawbacks: There's no way to delete data once it's been uploaded (you can get around this by using a Google Spreadsheet as a data source), and editing time is limited to as long as your browser is open and you haven't started a new map. Embedded OpenHeatMap.com maps may be slow to load.
The documentation doesn't make it clear whether you can set where the map is centered or what the default zoom level should be. Warden told me that the system remembers where you last positioned and zoomed the map before saving. This feature can still be buggy at times, although Warden is responsive to bug reports.
Skill level: Beginner.
Runs on: Web browsers enabled for Flash or HTML 5 Canvas.
Learn more: Its title notwithstanding, the four-minute video "How OpenHeatMap Can Help Journalists" offers a clear explanation for anyone interested in using the service. You can also view samples on the OpenHeatMap Gallery and check out this Guardian interactive map of where Facebook is used.
Drawbacks: OpenLayers is not yet as developed or as easy to use as, say, Google Maps. The project page notes that it is "still undergoing rapid development."
Skill level: Expert.
Runs on: Any web browser.
What it does: OpenStreetMap is somewhat like the Wikipedia of the mapping world, with various features such as roads and buildings contributed by users worldwide.
What's cool: The main attraction of OpenStreetMap is its community nature, which has led to a number of interesting uses. For example, it is compatible with the Ushahidi mobile platform used to crowdsource information after the earthquakes in Haiti and Japan. While Ushahidi can use several different providers for the base map layer, including Google and Yahoo, some project creators feel most comfortable sticking with an open source option.
Drawbacks: As with any project accepting public input, there can be issues with contributors' accuracy at times (such as the helicopter landing pad someone once placed in my neighborhood; it's actually quite a few miles away). To be fair, though, I've encountered more than one business listing on Google Maps that was woefully out of date. In addition, the general look and feel of the maps isn't quite as polished as commercial alternatives.
Skill level: Advanced beginner to intermediate.
Runs on: Any web browser.
Learn more: See the Quick Tutorial on the OpenLayers site.
Temporal data analysis
If time is an important component of your data, traditional timeline visualisations may show patterns, but they don't allow for sophisticated analysis or a great deal of interaction. That's where this project comes in.
What it does: This desktop software is for analysing data points that involve a time component. In a demo I wrote about last summer, creators Fernanda Viégas and Martin Wattenberg, the pair behind the Many Eyes project who are now working at Google, showed how TimeFlow can generate visual timelines from text files with entries colour- and size-coded for easy pattern spotting. It also allows the information to be sorted and filtered, and it gives some statistical summaries of the data.
What's cool: TimeFlow makes it incredibly easy to interact with data in various ways, such as switching views or filtering by criteria such as date ranges or earthquakes of magnitude 8 or more. The timeline view offers a slider so you can zero in on a time period. While many applications can plot bar graphs, fewer also offer calendar views. And unlike web-based Google Fusion Tables, TimeFlow is a desktop application that makes it quick and painless to edit individual entries.
Drawbacks: This is an alpha release designed to help individual reporters doing investigative work. There are no facilities for publishing or sharing results other than taking a screen snapshot, and additional development appears unlikely in the near future.
Skill level: Beginner.
Runs on: Desktop systems running Java 1.6, including Windows and Mac OS X.
Learn more: Check out Top tips.
Note: If you're looking to publish visualised timelines, better options include Google Fusion Tables, VIDI or the SIMILE Timeline widget.
Some data visualisation geeks think word clouds are either not very serious or not very original. You can think of them as the tiramisu of visualisations: once trendy, now overused. But I still enjoy these graphics that display each word from a text file once, with the size of the words varying depending on how often each one appears in the source.
What it does: Several tools mentioned previously can create word clouds, including Many Eyes and the Google Visualization API, as well as the website Wordle (which is a handy tool for making word clouds from websites instead of text files). But if you're looking for easy desktop software dedicated to the task, IBM's free Word-Cloud desktop application fits the bill.
What's cool: This is a quick, fun and easy way to find frequency of words in text.
Drawbacks: Because it's trying to ignore words such as "a" and "the," the basic configuration can miss some important terms. In my tests, it didn't know the difference between "it" and "IT," and completely missed "AT&T."
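The stop-word problem is easy to reproduce. The Python sketch below counts word frequencies while skipping common words, first folding everything to lower case and then keeping all-capitals tokens as distinct terms. It is not how IBM's application works internally; the stop-word list, the tokenising pattern and the function names are assumptions chosen for this example.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, invented for this sketch.
STOP_WORDS = {"a", "an", "the", "it", "and", "of", "to", "in", "is"}

# Tokens are runs of letters, optionally joined by "&" so "AT&T" stays whole.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:&[A-Za-z]+)*")


def is_acronym(token):
    """Treat any all-capitals token longer than one character as an acronym."""
    return len(token) > 1 and token.isupper()


def word_counts(text, keep_acronyms=False):
    """Count words, skipping stop words.

    With keep_acronyms=False every token is lower-cased first, so "IT"
    collapses into the stop word "it" and disappears.
    """
    counts = Counter()
    for token in TOKEN_PATTERN.findall(text):
        if keep_acronyms and is_acronym(token):
            counts[token] += 1
            continue
        word = token.lower()
        if word not in STOP_WORDS:
            counts[word] += 1
    return counts


sample = "The IT department said it is moving to AT&T. IT staff like the move."
folded = word_counts(sample)
kept = word_counts(sample, keep_acronyms=True)
print(folded.most_common(3))
print(kept.most_common(3))
```

With case folding, "IT" vanishes along with "it"; with the acronym rule it survives as a separate term. A tokeniser that split on "&" would also break "AT&T" into fragments, which may be one way such a term gets lost.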
Skill level: Advanced beginner. This app runs on the command line, so users should be able to find file paths and plug them into a sample command.
Runs on: Windows, Mac OS X and Linux running Java.
Learn more: Check the examples that come with the download.
Social and other network analysis
These tools use a pre-Facebook/Twitter definition of "social network analysis" (SNA), referring to the discipline of finding connections between people based on various data sets. Investigative journalists have used such tools to, for example, find links between people who are involved in development projects or who are members of various boards of directors.
An understanding of statistical theories of network node analysis is necessary in order to use this category of software. Since I've only had a very basic introduction to that discipline, this is one category of tools I did not test hands-on. But if you're seeking software to do such analysis, one of these might meet your needs.
What it does: Billed as a Photoshop for data, this open source beta project is designed for visualising statistical information, including relationships within networks of up to 50,000 nodes and half a million edges (connections or relationships) as well as network analyses of factors such as "betweenness," closeness and clustering coefficient.
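For readers unfamiliar with the measures named above, the following Python sketch computes two of them, closeness and clustering coefficient, on a tiny made-up network. It is only an illustration of the definitions, not the tool's own implementation; the names, the edge list and the helper functions are invented, and the closeness formula assumes a connected, undirected network. Betweenness is left out to keep the example short.

```python
from collections import deque

# Hypothetical undirected network, e.g. people who share a board seat.
edges = [
    ("Ann", "Bob"),
    ("Ann", "Cat"),
    ("Bob", "Cat"),
    ("Cat", "Dan"),
    ("Dan", "Eve"),
]


def build_adjacency(edges):
    """Map each node to the set of nodes it is directly connected to."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = sorted(adjacency[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in adjacency[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def closeness(adjacency, node):
    """Number of reachable nodes divided by the sum of shortest-path lengths."""
    distances = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search outwards from the node
        current = queue.popleft()
        for nxt in adjacency[current]:
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


adjacency = build_adjacency(edges)
for name in sorted(adjacency):
    print(name, closeness(adjacency, name), clustering_coefficient(adjacency, name))
```

In this toy network Cat sits closest to everyone else, while Ann and Bob form the only tightly knit cluster; dedicated software computes the same kind of figures across thousands of nodes.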
Runs on: Windows, Linux, Mac OS X running Java 1.6.
Learn more: Try this Quick Start tutorial (PDF).
What it does: This Excel plugin displays network graphs from a given list of connections, helping you analyse and see patterns and relationships in the data.
NodeXL merges the older and current definitions of SNA. It's "optimised for analysing online social media, it includes built-in connections to query the APIs of Twitter, Flickr and YouTube, allowing you to draw networks of users and their activity," according to Peter Aldhous, San Francisco bureau chief for New Scientist magazine.
It also handles email and conventional network analysis files (including data created by the popular, but not free, analysis tool UCINET).
Runs on: Excel 2007 and 2010 on Windows.
Learn more: Download this detailed free NodeXL tutorial (PDF) or these basic step-by-step instructions on analysing your own Facebook social network (PDF). One Facebook app for downloading your own friend information for use in NodeXL is Name Gen Web.