Wednesday, January 5, 2011

Citizen-centric access to statistics - project summary

The project will investigate the possibility of defining an easy-to-use, powerful, and flexible standard interface for communication of statistical data and metadata between the dissemination databases of statistics producers (statistical agencies) and citizen-centric applications, developed by entrepreneurs or by citizens themselves.


  1. Hi project team. I welcome this project, support it and wish it luck. It is important to reach a situation where free access to data in standard format across organisations is available.
    However, I find some exaggerations or doubtful statements that may be counterproductive. For instance, this:
    "many governmental agencies, including national statistical offices ... often still prefer to develop and implement new services themselves, rather than leaving this to the entrepreneurs, working in free competition and on their own risk, without the budget limitations and other restrictions existing for government agencies."
    There is no harm done in statistical offices offering good and understandable presentations of their data, as long as they allow everyone else to make better (or worse) presentations. On the contrary, statistical offices may in many cases possess insights that allow them to present data correctly, point out important aspects, and avoid the pitfalls of gross misunderstandings (I was recently shown very nice and convincing presentations of statistics made by a very clever entrepreneur, but the message was clearly a nonsense correlation). As long as this doesn't prevent entrepreneurs from doing their work, everyone should be happy. It is even beneficial for the entrepreneurs to have competition from statistical offices, and to let users choose for themselves which source they believe in and which presentations they want to look at.

  2. What's the difference from the SDMX project?

  3. SDMX is a standard developed by seven international organisations for exchange of data and metadata between national statistical institutes and international organisations. Our project aims at citizens, entrepreneurs, and others who want to access statistical data and metadata in a simple, efficient, and standardised way for further use and analysis in their own applications. One of our project members, Björgvin Sigurdsson from Statistics Iceland, has analysed these needs in a paper and a presentation that are available for download. His conclusion is that registry-based approaches like SDMX tend to become too complex for individual citizens and entrepreneurs who do not necessarily have access to technically advanced IT specialists. Maybe SDMX can be further developed to meet these requirements, but other approaches also need to be investigated, like APIs, web services, and maybe others, as discussed by Björgvin.

  4. I agree with Lars Thygesen that statistical offices have a lot of valuable knowledge and competence to offer all users of statistics, casual users as well as professional analysts and entrepreneurs, developing new methods and tools for getting more value out of the statistics produced by tax-funded statistical offices. However, I do not think that statistical offices should *compete* with their own customers. Instead they should use their tax-funded knowledge and competence to *support* all their users, when these users have good ideas about how to further exploit the benefits of official statistics. This can be done in many important ways: by ensuring that statistical data are of good quality, well documented, etc., and by ensuring that statistical data and metadata are made available to everyone, including citizens, analysts, and entrepreneurs, as easily and efficiently as possible, in most cases free of charge – as part of the public task that the citizens and taxpayers have entrusted to the statistical offices through their democratically elected leaders.

  5. NComVA is now developing and evaluating several data interfaces to statistical databases. Today a user of Statistics eXplorer who would like to analyse her own selection of indicators from, for example, the Statistics Sweden statistical database first selects indicators and time series and then saves (exports) the selected data as an Excel file. This non-standard format file must then be converted into a format suitable for further analysis, in the case of Statistics eXplorer a spreadsheet-type Unicode format. This is a rather cumbersome process, and there is a substantial risk that the region identification or its associated metadata is not correctly described. We are therefore now evaluating alternative “standard” data formats that could be directly imported into Statistics eXplorer. We have tested SDMX with support from OECD, and eXplorer can today import indicators and metadata in this format from the OECD.stat database. We are also developing an API interface for our World eXplorer to the World Bank database with thousands of indicators. This API has proven very successful: users can easily select indicators and time series from this database, analyse them, create stories, and finally publish dynamic visualizations in blogs or web sites. This API approach, however, only works for the World Bank database and their chosen API. Our last ongoing development is the PC-AXIS format, which is used today by, for example, Statistics Sweden and Statistics Denmark, but also by many other statistical organizations, and is regarded as the “default” Nordic standard. The approach is similar to the SDMX format: save your statistical data as a PC-AXIS file and then import this file, with selected indicators and metadata, into (in our case) Statistics eXplorer.
    In our software world, we could continue to develop more data interfaces and make it easy for our world-wide users to import statistical data, but our preference would be that the statistics community agrees on a single standard data format, for example through the ongoing work with SDMX, and, as Bo writes, “ensuring that statistical data and metadata are made available to everyone, including citizens, analysts, and entrepreneurs, as easily and efficiently as possible”. SDMX might be complex, but representing statistical data integrated with its metadata in a constructive format is not a simple task. SDMX will help analysts and entrepreneurs (such as NComVA) to provide statistical understanding and knowledge to citizens in a much easier way.

  6. The registry standard is only a part of the SDMX standard. SDMX can be implemented and the registry part can be ignored completely. There is the SDMX-ML format itself (based on XML), the content guidelines, and (perhaps most important for this project) the SDMX web service standard that supports SOAP and REST interfaces (REST is a simple API that can be used in a browser). It seems to me that SDMX already provides a well-supported solution to the problems pointed out in the "Systematic access to statistical data" section of the referenced document. SDMX is sponsored by seven organisations but is actually developed and used by many more. The SDMX web service guidelines are available online. Several organisations and countries have already implemented SDMX web services. Why invent yet another format for exchanging statistical data when one exists?

  7. The SDMX question is a very good one. What is the difference? I guess not very much. But just because something exists doesn't mean we should stop looking for alternatives. I think that SDMX is not suitable for everyone who wants system-to-system access to data. I promise an article about why.
    I believe there is a good opportunity "in this market" for another format aimed at smaller entities, such as small companies using specific statistics in a specific, systematic way, and at entrepreneurs, universities, and others that want to download large quantities of statistical data.
    I think the journey and the discussions are perhaps the most important things. Isn't it time we ask those who are downloading how they want the data? I know there are many "complex" things that make up a statistical dataset, but I also know that a large proportion of users don't care about all that stuff. They want to get the CPI, GDP, or whatever they are looking for. They know what they want, what it is, and how it is defined. They just can't download it in an efficient way.
    This project needs to address different types of needs and problems. That will make us better, I am sure.
    Regarding the question about the role of entrepreneurs and statistical offices: at least in my country, the statistical office is bound by law to make statistics publicly available. Therefore we will always maintain systems for users to get our data. I don't see enterprises, commercial or not, as competition. I see them as a great way to expand the use of statistics. There can be competition over who is the most user-friendly, who has the best data coverage, etc. Statistical offices are in competition in those areas and should compete. That is what good service is all about.

  8. It's a balancing act to offer the data in the way that people want it, and to benefit from standards so as to avoid duplication of work and weakening of existing standards. The paper proposes a new format based on XML. The comment states that 'users can't download it [the data] in an efficient way'. The SDMX compact format is designed to solve that. There are tools that exist or that are being developed to support this. The reason that I'm bringing this up is that SDMX seems to have already been dismissed in the paper without consideration of all of its features, certainly not in the document or the comments here anyway. Yes, it is a complex standard when you look at the whole specification, but to actually create an SDMX document or consume one you don't have to care about most of the standard. I look forward to your article.

    David Barraclough
    STD/SIMS IT Unit

  9. I fully agree with David. It would be sad if we went ahead and invented something different. Would we then expect, e.g., statistical offices to offer support for this new thing (and maybe many others) alongside SDMX? On the other hand, if there are obstacles to the proliferation of SDMX, we should make sure they are eliminated. As with many other useful standards, you don't need to use all of its features, and you don't need to know them all as a common user.

  10. Very interesting discussion. Has anyone done a study comparing how useful the different prevailing standards are against different criteria?
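
Several comments above argue that a data consumer does not need the whole SDMX standard in order to read a data file. The following is a minimal Python sketch of that idea. It assumes a simplified, namespace-free XML layout loosely modelled on a compact series/observation structure. The element names, attribute names, and values are hypothetical and made up for illustration; they are not taken from the SDMX specification or from any real database.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample message: one <Series> per region and indicator,
# one <Obs> per time period. All names and values are invented.
SAMPLE = """
<DataSet>
  <Series REF_AREA="IS" INDICATOR="CPI">
    <Obs TIME_PERIOD="2008" OBS_VALUE="100.0"/>
    <Obs TIME_PERIOD="2009" OBS_VALUE="105.0"/>
  </Series>
  <Series REF_AREA="SE" INDICATOR="CPI">
    <Obs TIME_PERIOD="2008" OBS_VALUE="100.0"/>
    <Obs TIME_PERIOD="2009" OBS_VALUE="101.5"/>
  </Series>
</DataSet>
"""


def read_series(xml_text):
    """Return {(area, indicator): {period: value}} from the sample layout."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for series in root.iter("Series"):
        # The series-level attributes identify what is being measured and where.
        key = (series.get("REF_AREA"), series.get("INDICATOR"))
        # The observation-level attributes carry the period and the value.
        result[key] = {
            obs.get("TIME_PERIOD"): float(obs.get("OBS_VALUE"))
            for obs in series.iter("Obs")
        }
    return result


data = read_series(SAMPLE)
print(data[("IS", "CPI")]["2009"])
```

A real SDMX-ML message also carries a header, XML namespaces, and a reference to a data structure definition, so a production reader would need to handle those. The sketch only shows that picking out series and observations can be done with a standard XML parser.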


