M. Edward (Ed) Borasky is, in order of appearance, a boy genius, computer programmer, applied mathematician, folk singer, actor, professional graduate student, armchair astronaut, algorithmic composer, supercomputer programmer, performance engineer, Linux geek, and social media inactivist. He currently develops virtual appliances for social media analytics and data journalism, and is the publisher of the Borasky Research Journal. His hobby is collecting hobbies.
First of all, your competitors are doing it. In the USA, for example, MSNBC’s EveryBlock service literally provides information on “every block” in 16 cities. As you can see from Marshall Kirkpatrick’s “The Day EveryBlock
Came to Town”, EveryBlock generates maps and other visualizations directly from publicly available government data, such as crime reports, restaurant health inspection records and emergency response service RSS feeds.
Second, your readers want to see this kind of information. They especially want to see it on mobile devices, such as smartphones, tablets and notebooks. Services like Twitter have turned us into a world of “news junkies.” Sure, we want to keep up with the world, but we also want to know what’s going on within walking distance of the coffee shop where we are sitting. Even at home, people love to see what’s happening around them.
Finally, static and interactive data visualizations and infographics are often the best way to tell a story. While some people may have learned to extract meaning from pages of tables, most of us can’t. But almost anyone can look at a crime map, or move the sliders on a local government’s economic projection, and understand the issues.
Assuming you’ve decided to start producing data-driven content, the question then becomes “How?” There are three phases to producing data-driven content. The first phase is data collection. While some data on the Internet is ready to analyze, more often you’ll find that data must be acquired carefully and painstakingly from raw web pages, Word documents or PDF files.
If that’s the case, you’ll find the tips on ProPublica’s Nerd Blog a good place to start. Another tool for data collection is a website called ScraperWiki. And for turning raw data into formats accessible to spreadsheets and other analysis tools, Google Refine is a favorite of many data journalists.
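To make the “scraping a raw web page” step concrete, here is a minimal sketch using only Python’s standard-library `html.parser` to pull table rows out of an HTML page. The sample HTML and its column layout (date, incident type, street) are invented for illustration; a real scrape would fetch a live page and deal with messier markup, which is exactly what tools like ScraperWiki and Google Refine help with.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# Hypothetical crime-report page fragment (invented for this example).
sample = """
<table>
  <tr><td>2011-05-01</td><td>Burglary</td><td>Main St</td></tr>
  <tr><td>2011-05-02</td><td>Vandalism</td><td>Oak Ave</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(sample)
```

After `feed()`, `scraper.rows` holds each report as a list of strings, ready to write out as CSV for a spreadsheet.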
Once you’ve got the data into a spreadsheet, the next phase is the actual analysis. What you do here will depend on the story you’re trying to tell, but you’ll probably want to stick with easily understood graphics like pie charts, bar charts and, of course, maps. There are plenty of more sophisticated analyses you can do if you have the time. Some examples can be found in ck12.org’s Advanced Probability and Statistics textbook. My own Project Kipling packages advanced analysis tools as well.
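As a sketch of the analysis phase, the snippet below tallies scraped records by category with the standard library’s `Counter` and renders a quick text bar chart. The records and their (date, category, location) layout are invented for illustration; in practice you’d feed in whatever your collection phase produced and hand the tallies to a real charting tool.

```python
from collections import Counter

# Hypothetical scraped records: (date, category, location) tuples.
reports = [
    ("2011-05-01", "Burglary",  "Main St"),
    ("2011-05-02", "Vandalism", "Oak Ave"),
    ("2011-05-03", "Burglary",  "Elm St"),
    ("2011-05-03", "Theft",     "Main St"),
]

def count_by_category(records):
    """Tally incidents per category -- the raw material for a bar chart."""
    return Counter(cat for _, cat, _ in records)

def text_bar_chart(counts):
    """Render a quick ASCII bar chart, largest category first."""
    lines = []
    for cat, n in counts.most_common():
        lines.append(f"{cat:<10} {'#' * n} ({n})")
    return "\n".join(lines)

counts = count_by_category(reports)
```

Printing `text_bar_chart(counts)` gives an at-a-glance summary you can sanity-check before building the polished graphic.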
The final phase is telling the story. For a tour of the tools available here, see @andybull’s “Masterclass 20: Getting started in data journalism”.