What is open data and how to request it

About me

Since creating my first open dataset (the Commonwealth budget), I have worked directly and indirectly with government on opening specific datasets and fixing quality issues to make the data more usable. I maintain a prototype budget transparency project here that links in multiple financial datasets made available as open data (documented here).

 


GovHack, the annual hackfest aimed at encouraging the government to open data and civil society to make use of it, is on again this July. In preparation I have created a quick how-to with the information you need if you want to ask for particular datasets to be opened for release. Datasets requested this way won’t count as ‘official datasets’ for use in GovHack, and in fact are unlikely to be released within the coming weeks, but release of data has to be requested by someone at some point. The following provides an overview of what, where and how.

The Open Data Institute has offered the following diagram to summarise where ‘open data’ falls in the spectrum of statuses that can apply to data:

Put simply, the same dataset can be made available in multiple forms with different levels of access, depending on the detail or level of abstraction in the data. A common example in Australia is census data, which is made available to researchers at the individual level (hopefully de-identified) but is published at a more abstract level in the statistics we can publicly access at the ABS site and use in our open data projects.

Tim Berners-Lee, who co-founded the Open Data Institute, created a rating system for open data. It ranges from a single star for making data available in a form such as PDF (as opposed to not publishing at all), through XLS (two stars), CSV (three stars) and RDF (four stars), to five stars for linked data.

Image copied from http://5stardata.info/en/ CC0 Public Domain Dedication
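
As a rough sketch, the star levels described above can be written as a simple lookup. The function name and the links_to_other_data flag are my own illustration, not part of the 5 star scheme itself, and the sketch assumes the file format alone decides the rating. In practice the rating also depends on the data being published under an open licence.

```python
# Illustrative mapping from file format to the star levels named in the text.
# Assumption: format alone decides the rating. The real scheme also requires
# an open licence at every level.
STARS_BY_FORMAT = {
    "pdf": 1,
    "xls": 2,
    "csv": 3,
    "rdf": 4,
}


def star_rating(file_format: str, links_to_other_data: bool = False) -> int:
    """Return a rough 1-5 star rating for a published file format."""
    key = file_format.strip().lower().lstrip(".")
    # Anything not listed (e.g. a Word document) is treated as one star:
    # published, but not structured for reuse.
    stars = STARS_BY_FORMAT.get(key, 1)
    # RDF that also links out to other datasets counts as linked data.
    if stars == 4 and links_to_other_data:
        return 5
    return stars
```

For example, star_rating("csv") gives 3, while star_rating("rdf", links_to_other_data=True) gives 5.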

 

While data.gov.au boasts almost 50k ‘datasets’, these are not all open. Some are only documented on data.gov.au so that researchers know they can apply for access. The figure also includes PDF and Word documents, which most people working with data don’t consider to be usable formats.

 

To carry out a request for open data we need to ask ourselves some key questions. For a data request to be successful it first needs to go through some basic checks.


1. Is the dataset solving a need?

Firstly, we need to ask what problems we are trying to solve. Each problem needs to be articulated in a way that makes clear the role of a specific dataset in solving it.

2. Is the data collected?

Secondly, we need to ask whether the data is being collected by an agency. If the data is not currently being collected but there is a good case for why it should be, then it is still worth asking for it. It does, however, need to be understood that collecting a new dataset may not be a trivial task, so this is only recommended when lasting, long-term benefits are expected to come from the data.

Sometimes data may be available but not sufficiently fine-grained to be useful in solving problems. It may be available from some agencies or areas but not others, or it may vary in scope and granularity between jurisdictions, making it hard to use across multiple jurisdictions. Crime data is an example of this kind of problem: the laws may differ between jurisdictions, as may the way data on crime is defined and collected. Data may not be available in a timely enough fashion to be useful, or it could be of such poor quality that using it would create more problems than it would solve (see #NotMyDebt for a prominent example of how using poor quality public data can cause problems instead of creating solutions).

3. Should this data be open?

This brings us to the next question. Should the data in question actually be open data? This is a different question to whether the dataset actually exists. Are there privacy considerations which make it unsuitable for publication? An example here is the Medicare and Pharmaceutical Benefits Data that was released on data.gov.au and was subsequently removed when it was found that it was not properly anonymised.

Data about individuals at unit record level, i.e. one line per individual, is not suitable for release as open data (see the data spectrum image at the top of the page). It may be released to specialist researchers for use in a closed environment, but putting that level of data into the public domain is not justifiable, as it is not possible to protect the identities of the individuals who are the subject of that data.

Open data is about the transparency of government and organisations, not about providing information about individuals to the public. Data on government contracts received by businesses or charities is suitable for open data; data about individual prescriptions or medical consultations is not.

Data about individuals can, however, be abstracted to provide useful data to work with. Census data is not made available at unit record level as open data but there are many datasets provided by the ABS where this data has been abstracted into statistics and these open datasets are in common usage by many people and organisations.
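
To make the idea of abstraction concrete, here is a minimal sketch that turns unit records into counts by region and age band. The field names, the ten-year bands and the sample records in the usage note are all made up for illustration, and this is not how the ABS produces its statistics. Aggregation on its own also does not guarantee that nobody can be re-identified, since very small counts can still point to individuals.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate(records: list[dict], width: int = 10) -> dict:
    """Count individuals per (region, age band), dropping the individual rows.

    Each record is a hypothetical unit record with 'region' and 'age' keys.
    """
    counts = Counter(
        (record["region"], age_band(record["age"], width)) for record in records
    )
    return dict(counts)
```

Given three invented records, two people in NSW aged 34 and 38 and one in VIC aged 61, aggregate returns a count of 2 for ("NSW", "30-39") and 1 for ("VIC", "60-69"), with no row left that describes a single person by exact age.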

4. Is the data already open?

To answer this question you need to know where to look for your dataset. Data.gov.au provides a federated search of open datasets published by state and federal agencies. However, not all agencies provide all of their data to data.gov.au. The ABS is a good example: it generates so many datasets that it is probably not practical to integrate them all into data.gov.au.
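
If you prefer to search programmatically, the sketch below shows one way to do it. It assumes the portal runs CKAN and exposes the standard package_search action. That is an assumption to check against the portal's own API documentation, and the base URL is passed in as a parameter because the exact address may differ. The second function only summarises a response you have already downloaded, so you can see at a glance how many resources are in usable formats.

```python
from collections import Counter
from urllib.parse import urlencode


def build_search_url(portal_base: str, query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL for a keyword query.

    portal_base is the root of the portal's CKAN site (an assumption you
    need to confirm for each portal).
    """
    base = portal_base.rstrip("/")
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def summarise_formats(response: dict) -> Counter:
    """Count resource formats across datasets in a package_search response."""
    formats = Counter()
    for dataset in response.get("result", {}).get("results", []):
        for resource in dataset.get("resources", []):
            # Missing or empty formats are grouped under UNKNOWN.
            label = (resource.get("format") or "unknown").strip().upper()
            formats[label] += 1
    return formats
```

A result dominated by PDF rather than CSV suggests the data is published but may still be worth requesting in a more usable format.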

You can also ask for help identifying whether data is available and where from by asking

While informal inquiries may help you find out if a dataset is already published and where, it is important that you make any subsequent request to the right jurisdiction. This depends on the ‘data custodian’ which is the agency that collects the data. If it is a state government agency then you need to make the request to the state open data portal. If it is a Commonwealth dataset then you need to make the request to data.gov.au.
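
The four questions and the jurisdiction rule can be pulled together into a small checklist. This is only a sketch of the reasoning in this post: the class, field names and wording of each outcome are mine, not an official process.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    dataset: str
    solves_stated_problem: bool  # question 1
    is_collected: bool           # question 2
    suitable_for_release: bool   # question 3 (e.g. no unit-record personal data)
    already_open: bool           # question 4
    custodian_level: str         # "state" or "commonwealth"


def next_step(request: DataRequest) -> str:
    """Suggest what to do next for a candidate open data request."""
    if request.custodian_level not in ("state", "commonwealth"):
        raise ValueError("custodian_level must be 'state' or 'commonwealth'")
    if not request.solves_stated_problem:
        return "Articulate the problem and the dataset's role in solving it"
    if not request.suitable_for_release:
        return "Do not request as open data; consider an abstracted version"
    if request.already_open:
        return "Use the published dataset"
    # The request goes to the jurisdiction of the data custodian.
    if request.custodian_level == "commonwealth":
        portal = "data.gov.au"
    else:
        portal = "the state open data portal"
    if not request.is_collected:
        return f"Make the case for collection, then request via {portal}"
    return f"Request release via {portal}"
```

For a Commonwealth dataset that is collected, suitable for release and not yet published, next_step returns "Request release via data.gov.au".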

I have published this several months ahead of GovHack to give people time to think about which problems open data might help with, and to allow government to take steps to meet these requests. Some datasets may already exist but have not been put to use in the desired way in past projects. Where data has been made available (either already existing or published in time for GovHack), the spreadsheet can form a point of reference for teams competing in GovHack or anyone looking to experiment with an idea outside of the competition. The last field in the spreadsheet is for contact info. People working on similar ideas can use it to get in contact and discuss requirements, if desired.

I am happy to run additional online or face-to-face events through my meetup, guiding people through the data available and the request process, as well as hack events for people interested in working on specific projects. If you have a venue suitable for such an event, please let me know.

If you want a quick word of advice on what to request, how to request it and so on, please ping me @info_aus or use the hashtag #opendata, which is used worldwide but should still be useful here.

I’d like to point out that none of this is intended to ‘compete’ with official GovHack processes, just to provide alternative ways of participating.

