I work for a progressive, well-intentioned city, and I have become convinced that analysis and visualization of data can improve my city’s outcomes. But I immediately ran into a problem. Sure, we’ve been working to learn data analysis skills. We have sophisticated tech tools to analyze and visualize data. Our leadership is supportive.
But we weren’t sure where to find the data. We had a sort of blank page problem, a where-to-start uncertainty.
My first assumption was that the starting point would be data generation – that is, I assumed that useful data didn’t currently exist and that we needed to create it. Of course, this is a specific example of a general assumption: In organizations, if we don’t see something being done, we assume that it’s not being done.
Which means I started by meeting with department heads and telling them how exciting it would be to start tracking and measuring and comparing and benchmarking. Fun! It’s not necessary to tell the painfully long version of this story; suffice it to say that months later, after various forms of resistance, we still didn’t have big, useful, clean datasets to analyze and visualize.
So we rethought the problem and realized that our city was already creating a vast amount of potentially useful data. For example:
- Existing databases. Most of our city departments use some form of software. The finance department uses finance software, the utilities department uses billing software, the public works department uses asset and work order management software, and so on. Each software program, in turn, manages its record-keeping processes through some sort of underlying database structure. And it turns out that most data analysis and visualization software can connect directly to virtually any structured database. Not to say that this is an easy process. Our finance database, for example, has some 640 tables with galaxies of undecipherable column headers. But with time and effort, we are beginning to build dashboards that report out real-time data from the underlying databases. We have also (thanks to Karl) put together a data dictionary to track and organize these existing databases.
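The database-connection idea above can be sketched in a few lines. This is a hypothetical illustration, not our actual finance system: it uses an in-memory SQLite file as a stand-in for a vendor database, and the table and column names (`gl_txn`, `acct`, `amount`, `posted`) are invented. Real systems such as SQL Server or Oracle expose the same capability through their own connectors, and the data-dictionary step works the same way: enumerate the tables and their columns.

```python
import sqlite3

# Hypothetical example: an in-memory SQLite database stands in for a
# vendor's finance database. Table and column names are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simulate one of the many tables a finance suite might keep.
cur.execute("CREATE TABLE gl_txn (acct TEXT, amount REAL, posted TEXT)")
cur.executemany(
    "INSERT INTO gl_txn VALUES (?, ?, ?)",
    [("4100", 1250.00, "2024-01-05"), ("4100", 310.50, "2024-01-12"),
     ("5200", -89.99, "2024-01-07")],
)

# A dashboard-style query: total posted per account, straight from the
# underlying table rather than from a manually maintained report.
rows = cur.execute(
    "SELECT acct, ROUND(SUM(amount), 2) FROM gl_txn GROUP BY acct ORDER BY acct"
).fetchall()
print(rows)

# A first pass at a data dictionary: list every table and its columns.
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
dictionary = {
    t: [col[1] for col in cur.execute(f"PRAGMA table_info({t})")]
    for t in tables
}
print(dictionary)
```

The data-dictionary step is the part worth automating first: with hundreds of tables and undecipherable column headers, even a plain list of table and column names makes the later dashboard work far less painful.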
- Existing reporting. Beyond the record-keeping that is being routinely performed by software suites, many department heads already keep some form of informal reporting systems. In my city, the Building Director keeps a spreadsheet, updated monthly, that records inspections, failures, re-inspections, permits, and so on. Historically that data stayed in the spreadsheet for his own use. But it’s easy enough to grab that data and include it in other analytical functions. It might be particularly useful to combine that data with records from other departments.
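The combine-across-departments step can be sketched as a simple join on a shared key. Everything here is hypothetical: the parcel numbers, the column names, and the second department's log are invented for illustration, and the CSV text stands in for a monthly spreadsheet export.

```python
import csv
import io

# Hypothetical monthly export of the Building Director's spreadsheet.
building_csv = """parcel,inspections,failures
101,3,1
102,1,0
103,5,2
"""

# Hypothetical log from a second department (code enforcement).
enforcement_csv = """parcel,open_cases
101,2
103,1
"""

building = {r["parcel"]: r for r in csv.DictReader(io.StringIO(building_csv))}
enforcement = {r["parcel"]: r for r in csv.DictReader(io.StringIO(enforcement_csv))}

# Join the two datasets on parcel number; parcels with no enforcement
# record default to zero open cases.
combined = {
    parcel: {
        "inspections": int(rec["inspections"]),
        "failures": int(rec["failures"]),
        "open_cases": int(enforcement.get(parcel, {"open_cases": 0})["open_cases"]),
    }
    for parcel, rec in building.items()
}
print(combined)
```

The payoff of the join is the cross-department view: a parcel with repeated inspection failures *and* open enforcement cases is a different story than either dataset tells alone.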
In addition, we realized that there were other sources of data that don’t impose additional workloads on department heads.
- Outside organizations. Federal, state, and regional-level governments, not to mention NGOs, create data that are potentially useful to our city. There are obvious examples, like the US Census Bureau. Less obvious examples may exist as well; for example, our local Council of Governments provides data-gathering and research services to its member governments.
- Surveys. Traditionally, surveys were a costly and statistically suspect option. Now, with the availability of efficient, low-friction electronic survey tools, our city can gather potentially useful data in a very short time and at very low cost.
- Trained observers. We realized that we had the ability to task existing employees (or, in appropriate cases, outside observers) to gather and report data from the field. For example, in my city the planning department conducted a vacant property survey by driving all surveyed neighborhoods and identifying properties that appeared to be vacant. The resulting survey wasn’t definitive (i.e., properties may have been missed or misidentified), but it was an extremely helpful baseline. Low-cost geotagging options (including the humble smart phone) also allow instant tagging of target conditions. A city in South Carolina, for example, piloted the development of a leaf and limb pick-up tagging program. As sanitation crews ran their city-wide trash routes, they tagged brush piles for pickup. When the leaf and limb crews ran later the same day, they would route to tagged locations only.
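The tag-and-route idea behind the leaf and limb program can be sketched as follows. This is a minimal, hypothetical illustration, not the South Carolina city's actual system: the pile names and coordinates are invented, and the routing is a simple greedy nearest-neighbor pass, which is good enough at neighborhood scale even though it is not an optimal route.

```python
import math

# Hypothetical geotags logged by sanitation crews during the morning
# trash routes. Names and coordinates are invented for illustration.
tags = [
    ("Oak St pile", 34.0007, -81.0348),
    ("Elm St pile", 34.0101, -81.0402),
    ("Pine Ave pile", 34.0052, -81.0310),
]

def dist(a, b):
    # Flat-plane approximation of distance between two (name, lat, lon)
    # points; adequate at neighborhood scale.
    return math.hypot(a[1] - b[1], a[2] - b[2])

def route(start, stops):
    """Visit every tagged stop, always heading to the nearest remaining one."""
    remaining = list(stops)
    here = start
    order = []
    while remaining:
        nxt = min(remaining, key=lambda s: dist(here, s))
        remaining.remove(nxt)
        order.append(nxt[0])
        here = nxt
    return order

depot = ("depot", 34.0000, -81.0300)
print(route(depot, tags))  # visits only tagged locations, nearest-first
```

The point of the sketch is the workflow, not the algorithm: because the morning crews tagged piles as a byproduct of work they were already doing, the afternoon crews could skip untagged streets entirely.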
The danger of starting with existing or relatively low-friction data sources, of course, is that it becomes tempting to tailor the questions to the existing data. The first step, then, should always be to ask what it is you want to know. Having said that, you might be surprised (as we were) how much responsive data already exists or is relatively easy to obtain.