Feb 1, 2008

cities, maps #1: data is only infrequently orthogonal

those of us who claim to be interdisciplinary recognise that there's very little information out there that's truly orthogonal to everything else -- a fancy way of saying data usually makes more sense in the context of other data. putting two or more usually disjoint datasets together in the right space allows stories to be told more powerfully and conclusions to be drawn more accurately.

the difficulty then usually lies in finding the space in which two datasets interact to produce more information than the sum of their parts. imagine, for instance, overlaying malaria interventions, vector populations, average monthly rainfall, prevailing winds, and malaria prevalence onto a map of equatorial africa. linking preventive techniques with success indicators and other data related to the phenomenon places the data in context and helps researchers develop good hypotheses to test. (amazingly, as far as i can find online, no one has done this yet. once corrie sends me a few datasets, i'll make and post something.)

since i work on maps, one of the things that falls to me to do regularly is tell people that their lives and work could be easily improved by

  • publishing their data in an openly-accessible format
  • allowing that data to easily interact with other data
  • putting that data in geographic context

data interacts with other data in all sorts of places, but the physical world is probably one of the most useful general spaces in which to combine commonplace data. this is particularly true for cities, whose organization is fundamentally geographic: once you've gone past the city limits of newark, you're not in newark any longer. because the information that city governments generally collect describes physical locations (education or average house price statistics by neighbourhood, coverage areas for hospitals, etc), physical location can connect multiple apparently unrelated observations.
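
to make the "location as the connecting key" idea concrete, here's a rough python sketch. the neighbourhood names and all the numbers are invented for illustration, and `join_by_place` is just a helper name i made up; the point is only that two unrelated tables line up once they share a place.

```python
# made-up per-neighbourhood figures, for illustration only
avg_house_price = {"northside": 310_000, "riverside": 420_000, "old town": 280_000}
school_count = {"northside": 4, "riverside": 3, "hilltop": 5}


def join_by_place(**datasets):
    """combine several {place: value} datasets into one {place: {name: value}} table.

    places missing from a dataset get None for that column.
    """
    places = set().union(*datasets.values())
    return {
        place: {name: data.get(place) for name, data in datasets.items()}
        for place in sorted(places)
    }


combined = join_by_place(avg_house_price=avg_house_price, school_count=school_count)
for place, row in combined.items():
    print(place, row)
```

the gaps (a `None` where one dataset doesn't cover a neighbourhood) are worth seeing too: they show exactly where the datasets don't overlap.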

the geographic information systems (GIS) community long ago cottoned on to the idea that each of these types of information (house prices, school information, public services, etc) constitutes a dataset for a given area, and that these datasets can be overlaid on each other to produce precisely the kind of contextualization that makes them more valuable viewed together than in isolation. imagine searching for a house and being able to say "i only want to find houses that match my price range that are within a mile of a grade school and a half-mile of a grocery store." that's a combination of 3 sets of data: prices of houses currently for sale, school information, and store information. if you're looking to move to toronto, realosophy lets you do this easily. most other places, it's sort of a drag.
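
the house search above is simple enough to sketch in python. this is not how realosophy (or anyone else) actually does it; the listings and coordinates are invented, the function names are mine, and i'm assuming straight-line (haversine) distance rather than walking distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def miles_between(a, b):
    """great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def near_any(point, places, max_miles):
    """true if at least one place lies within max_miles of point."""
    return any(miles_between(point, place) <= max_miles for place in places)


def find_houses(houses, schools, groceries, max_price):
    """houses in budget, within 1 mile of a school and 0.5 miles of a grocery store."""
    return [
        house for house in houses
        if house["price"] <= max_price
        and near_any(house["location"], schools, 1.0)
        and near_any(house["location"], groceries, 0.5)
    ]


# three invented datasets, all keyed by (lat, lon)
schools = [(43.660, -79.380)]
groceries = [(43.655, -79.380)]
houses = [
    {"id": "a", "price": 450_000, "location": (43.656, -79.380)},  # fits everything
    {"id": "b", "price": 650_000, "location": (43.656, -79.380)},  # over budget
    {"id": "c", "price": 400_000, "location": (43.700, -79.380)},  # too far from a school
    {"id": "d", "price": 420_000, "location": (43.668, -79.380)},  # too far from groceries
]

matches = find_houses(houses, schools, groceries, max_price=500_000)
print([house["id"] for house in matches])
```

the filter itself is trivial. the hard part, as usual, is getting all three datasets published with locations attached in the first place.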

many cities employ professional GIS specialists to help them publish all sorts of data that might be interesting to the public, as a search on google for "city GIS" shows. this is a start, but most of the data continues to be published in formats that are only viewable with proprietary (and expensive) software like ArcGIS, or through a clunky web interface or application (here are the solutions denver and norfolk have come up with. points for effort, but would anyone voluntarily subject themselves to using these?). most cities also require that this information be purchased before it can be manipulated, even though the real value of a diversity of datasets comes from the ability to mash them together and see what results. if urban information is difficult to find and use, people don't use it enough.

portland's mapping department is a shining beacon: they publish urban information, for free, in a format that anyone with google earth can open (it's called KML, for Keyhole Markup Language; think HTML for geographic browsers like google earth and google maps). once the files are opened in google earth, users can easily place multiple layers of data on top of each other.
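
for a sense of how little is involved in producing KML, here's a minimal python sketch using only the standard library. the placemark name and coordinates are made up, and `build_kml` is a hypothetical helper of mine, not anything portland publishes. one thing that is real and easy to trip over: KML writes coordinates as longitude,latitude, in that order.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # namespace of the OGC KML 2.2 standard


def build_kml(points):
    """return a KML document string with one Placemark per (name, lat, lon) tuple."""
    ET.register_namespace("", KML_NS)  # serialize with KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML order is longitude first, then latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# an invented point somewhere in portland
kml_text = build_kml([("example library", 45.5231, -122.6765)])
print(kml_text)
```

save the output as a `.kml` file and it should open as a layer in google earth. a real city dataset would have many more placemarks, and probably polygons and descriptions as well.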

what happens when people publish their data and make it easy to mash it together? really nice things. for example, everyblock's made it possible for people in covered cities (new york, chicago, and san francisco) to see news in their neighbourhoods and filter it if so desired. see, for example, a news map for albany park in chicago. they made a nice user interface, but the data all comes from feeds published by governments, news outlets, and the like.

so, given that data frequently makes other data useful, what can city or urban organizations do to improve the lives of the people they serve?
step 1a: publish lots of location-specific data
step 1b: publish it in an easily and freely viewable format
step 1c: allow anyone with the inclination to use and transform your data
step 1d: use (or let people use) someone else's mapping service as a canvas for the data

next up: public transit information and some thoughts on why it mostly sucks.

