City Press and Rapport carried a story by the Media24 Investigations team today which I’d like to think is one of the first examples of data journalism and online mapping technology producing a pretty solid news story in South Africa. I thought I’d share a bit of insight into how the story came about and how we did it.
The story, if you click on the link, talks about how the ANC-linked Chancellor House Mineral Resources company has scored hundreds and hundreds of prospecting licences. It emerges at a time when the party is wracked by debate over the nationalisation of mines in South Africa. It gives some insight into the potential wealth available to the ruling party through the Chancellor House entities and, I believe, raises questions about how directly the party is entwined in the economy, a phenomenon for which I am hard pressed to find another example in any other constitutional democracy.
So, how did we get this story?
Well, this one did not come about from a deep throat informant as much investigative reporting in South Africa does (in fact, is often forced to, due to the paucity of useful information in the public domain).
A couple of months back the Department of Mineral Resources published records of all mining and prospecting right applications in South Africa over the last couple of years. As is the habit of SA government departments, the records were published in pdf format, which makes them pretty useless for any intensive examination of the data.
I decided I would try and apply the Python programming skills I have been tortuously teaching myself over the last year to see if we could do anything useful with them.
I downloaded all the pdf files, converted them into text documents and then wrote a bit of Python script which allowed us to do rapid searches of the records and extract the relevant ones into files. That in turn allowed us to start looking a bit more closely at what was going on.
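For anyone wanting to try something similar, here is a minimal sketch of that kind of search script. It is not the actual script used for the story. It assumes the converted text files sit in one folder and that each record occupies a single line, and the function name and file layout are made up for illustration.

```python
from pathlib import Path


def search_records(text_dir, term, out_path):
    """Copy every line containing `term` (case-insensitive) into out_path.

    Assumes one record per line in each .txt file (an assumption; the real
    files may be laid out differently). Returns the number of matches.
    """
    needle = term.casefold()
    out_file = Path(out_path)
    matches = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for txt_file in sorted(Path(text_dir).glob("*.txt")):
            # Never read the results file back in if it lives in the same folder.
            if txt_file.resolve() == out_file.resolve():
                continue
            with open(txt_file, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    if needle in line.casefold():
                        # Keep the source file name so a hit can be traced back.
                        out.write(f"{txt_file.name}\t{line.rstrip()}\n")
                        matches += 1
    return matches
```

A call such as `search_records("records_txt", "chancellor house", "hits.tsv")` would then write every matching line to `hits.tsv`, which opens in a spreadsheet; the folder and file names here are placeholders.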
To give you an idea of the scale of this, the documents contained in total more than 1,5 MILLION individual records, so you can see how trying to manually examine them using a pdf search was pretty useless.
Once the script was working it was pretty easy to start mining them for interesting stuff. Chancellor House was an obvious search since the entity had already declared an interest in the mining sector, but even so I was surprised at the number of records which a query produced.
Using this script we were also able to identify more ICT mining rights, which resulted in a story about the possible mining of the Vaal River a few weeks back (I didn’t blog about it at the time since we were still working on the Chancellor House story).
The useful thing about these records is that they also include geographic co-ordinates. Once I had extracted the records, I used Excel to clean them up a bit more and then used a website called Scribblemaps to import the spreadsheet and map the prospecting rights, which showed us where they were located.
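The clean-up step was done by hand in Excel, but it could also be scripted. The sketch below is one hedged way to do it: it assumes each extracted record is a line of text with a latitude and longitude in decimal degrees somewhere in it, which may not match the department's actual layout, and the pattern, function names and column names are all illustrative. It writes a plain CSV with latitude and longitude columns, the sort of spreadsheet a mapping site can typically import.

```python
import csv
import re

# Hypothetical pattern: two signed decimal numbers (latitude, then longitude)
# separated by a comma, semicolon or whitespace. Adjust to the real records.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,3}\.\d+)\s*[,;\s]\s*(-?\d{1,3}\.\d+)")


def extract_points(lines):
    """Return (label, lat, lon) tuples for lines holding a plausible coordinate pair."""
    points = []
    for line in lines:
        match = COORD_RE.search(line)
        if not match:
            continue  # no coordinate pair on this line
        lat, lon = float(match.group(1)), float(match.group(2))
        # Drop pairs that cannot be valid latitude/longitude values.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            # Treat whatever precedes the coordinates as the record's label.
            label = line[:match.start()].strip(" \t,;")
            points.append((label, lat, lon))
    return points


def write_csv(points, out_path):
    """Write the points to a CSV file with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["label", "latitude", "longitude"])
        writer.writerows(points)
```

Lines without a usable pair are silently skipped here, so it is worth comparing the number of points against the number of input records before trusting the map.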
The map below shows how it worked.
The beauty of this is that locating such information on a map opens up other avenues for reporting and also provides a great reference for graphic artists to produce a map for print, which is what happened in this case.
The rest of the reporting was straightforward.
We plan to use these tools more in the future as we build computer-aided reporting and geo-location technologies into a central pillar of the work of the Media24 Investigations team and try to produce some innovative journalism using the powerful technical tools of our time.
I’d be interested if anyone could share other examples of such techniques being used in South African journalism or if you have some thoughts on other SA-based online datasets which would be worth us looking at to see what we can develop into some decent investigative journalism.