Wednesday, February 23, 2011

Testing-SDI: E-government Prospective, Requirements, and Challenges

International Journal of Public Information Systems (IJPIS), Vol. 2011:1, pp. 11-47

Authors: Imad Abugessaisa, Anders Östman
Keywords: INSPIRE, Performance testing, SDI, Geoweb services, Usability testing
Abstract
Spatial Data Infrastructure (SDI) denotes the collection of technologies, policies and institutional arrangements that facilitate the availability of and access to spatial information. During the last few years the development of spatial data infrastructure in Sweden has been influenced by two actions. The first was the European directive on spatial data infrastructure, namely the Infrastructure for Spatial Information in Europe (INSPIRE), and the second was the Swedish parliament's directive on e-Government early in 2008. In a modern society, spatial data play major roles and have many applications, such as information support during disaster prevention and management. These two milestones in Geodata development have created huge demands and represent great challenges for researchers in the area of spatial data infrastructure. One of these challenges concerns the methodologies for testing the data specifications proposed by INSPIRE. This paper addresses that challenge and introduces a framework for testing Geodata. The testing of Geodata includes the testing of the data specifications for different geographical themes and data structures, the performance testing of Open Geospatial Web Services (OWS) and the usability testing of geoportals and services. The proposed methods were evaluated during a pilot test of a regional geoportal in Sweden, and the results reported in this paper show the feasibility and applicability of the methods used. The methods assisted in identifying performance-related defects and bottlenecks with respect to response time, stress and load, and they support the detection of different types of errors that occur during testing, such as HTTP errors, timeout errors and socket errors. During the pilot test of the geoportal, it was discovered that the response time was 30 seconds with 500 virtual users accessing the system and performing a specific task, which is 6 times longer than the time required by INSPIRE (a maximum of 5 seconds). A usability test was also conducted, focusing on user acceptance and the "think aloud" method. The usability testing enabled the identification of user-interface related problems, and the results were quantified so that comparisons can be made between current results and those from new tests.
Download full article.
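
The performance figures quoted in the abstract come from the kind of load test where a number of simulated users hit a geoportal service at once and the response times are checked against INSPIRE's 5-second target. Below is a minimal sketch of that idea, assuming a hypothetical WMS endpoint; the URL is a placeholder and this is not the authors' actual test harness, only an illustration of the pattern (virtual users, response-time measurement, and the HTTP/timeout/socket error categories the abstract mentions).

```python
# Minimal "virtual user" load-test sketch against an OGC web service.
# NOT the paper's test framework; endpoint URL is a placeholder and the
# numbers (500 users, 5-second INSPIRE target) are taken from the abstract.
import socket
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "https://example-geoportal.se/ows?SERVICE=WMS&REQUEST=GetCapabilities"  # placeholder
VIRTUAL_USERS = 500
INSPIRE_LIMIT_S = 5.0

def one_virtual_user(_):
    """Send one request and classify the outcome (ok / http / timeout / socket error)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=60) as response:
            response.read()
        return ("ok", time.perf_counter() - start)
    except urllib.error.HTTPError:
        return ("http_error", None)
    except socket.timeout:
        return ("timeout_error", None)
    except (urllib.error.URLError, OSError):
        return ("socket_error", None)

with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
    results = list(pool.map(one_virtual_user, range(VIRTUAL_USERS)))

ok_times = [t for kind, t in results if kind == "ok"]
too_slow = [t for t in ok_times if t > INSPIRE_LIMIT_S]
errors = [kind for kind, _ in results if kind != "ok"]
if ok_times:
    print(f"slowest response: {max(ok_times):.1f} s (INSPIRE target {INSPIRE_LIMIT_S} s)")
    print(f"responses over target: {len(too_slow)} of {len(ok_times)}")
print(f"errors (http/timeout/socket): {len(errors)} of {VIRTUAL_USERS}")
```

A real test would of course use a dedicated load-testing tool, ramp users up gradually and separate stress scenarios from load scenarios, but the structure is the same.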

Monday, February 21, 2011

What problem are we solving?

My interest in standards and exchange formats is driven by two facts. First, my colleagues and I are in the process of creating a statistical publication data warehouse, that is, a centralized repository for all published statistical data. One of the big reasons for doing so is to be able to centralize and effectively build services for our customers. We want to provide system-to-system access, and we want customers to be able to subscribe to and browse our data and metadata, free to use it efficiently for whatever purpose they see fit. Second, we have seen a lot of interest in system-to-system access to data from all over society. We would like to provide our customers with the service they want and need.

If you look at the spectrum of those who want to harvest data in a systematic way, at one end there are big users that are interested in large portions of data, either in a specific domain or as a whole. These are typically scientists, such as universities doing research. Entrepreneurs are also part of this group, and there is already a company called Datamarket (http://datamarket.com) that has downloaded an incredible amount of data from our website. This was done in good cooperation with us. The main characteristics of the scientific user are large and complex datasets with metadata, requested infrequently.

At the other end there is a quite different type of user. They are typically interested in some small part of our data, for instance the monthly CPI or the export of a set of goods to a certain country. In my experience these are typically enterprises interested in getting statistical data to use within their own information systems. Due to the indexation of loans in Iceland, banks and other financial institutions have a great interest in getting indices in a systematic way as soon as they are published. In addition, software developers are interested in adding functionality to their applications. Let's call this group small-data enterprises. Then we have all kinds of users in between, in all kinds of flavours. I think it is helpful to look at the requests of the users from this perspective.

In order to succeed we need to come up with ways to serve the data so that customers are content with it and gain something by using the system-to-system service. It will always be measured against going to a website, selecting the data and clicking the Excel, CSV or XML button. If the system-to-system solution doesn't beat that, it will not be used at all. What, then, are the key questions system developers ask when faced with a task such as this?

There are two key elements at work: the frequency of the data transfer and the development time/cost. If one is to write a program that harvests data every month and the lifetime of the program is five years, then we have 60 transmissions. If it takes one day (8 hours) to write it, each transmission costs 8 minutes of development time. So if it takes less than 8 minutes to get the data by hand ...

There are other things to take into account: is the timing critical? What are the consequences if the person responsible forgets? And there is the simple fact that these kinds of tasks are boring. But from a cost/benefit point of view this is largely the case. Statistical data is seldom published more frequently than once a month, even though some series have weekly publications. What if the developer doesn't have to download the data and store it in their own system? What if they could simply create a program that gets the data from the statistical office every time someone in their office needs it? The frequency is much higher; let's say the data is used within the enterprise only twice a day. Over the same 5 years the data would be transmitted around 2,000 times, and the development time is likely to be less than before, because you don't have to design and create storage for the data. But even with the same development time, 8 hours, each transmission would cost around 15 seconds.
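
To make the arithmetic concrete, here is a tiny sketch of the break-even calculation using the figures from the two paragraphs above (8 hours of development, a 5-year lifetime, monthly harvesting versus roughly 2,000 on-demand requests). The numbers are purely illustrative.

```python
# Back-of-the-envelope version of the calculation above, using the post's
# own figures: 8 hours of development, a 5-year lifetime, and either monthly
# harvesting or roughly 2,000 on-demand requests. Purely illustrative.

DEV_HOURS = 8
LIFETIME_YEARS = 5

def dev_cost_per_transfer_seconds(dev_hours: float, transfers: int) -> float:
    """Development time, in seconds, attributed to each data transfer."""
    return dev_hours * 3600 / transfers

# Scenario 1: a program that harvests and stores the data once a month.
monthly_transfers = 12 * LIFETIME_YEARS  # 60 transmissions
print(f"monthly harvest: "
      f"{dev_cost_per_transfer_seconds(DEV_HOURS, monthly_transfers) / 60:.0f} min per transfer")

# Scenario 2: fetch on demand, used about twice a working day (~2,000 requests).
on_demand_transfers = 2000
print(f"on demand:       "
      f"{dev_cost_per_transfer_seconds(DEV_HOURS, on_demand_transfers):.1f} s per transfer")
```

The point is the same as in the text: once the per-transfer development cost drops to a matter of seconds, the manual website-and-Excel route stops being competitive.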

This is important because we need to think about the motives and the usability of the service that we want to create. From this simple example it seems that we need to design the service for small-data enterprises in such a way that they can program their systems to get the data directly from the service. There isn't a very strong case for them to create a program that fetches a small amount of data and stores it in their own systems; the programming isn't likely to save them much. You need considerable frequency to justify the program, which is hardly the case with official statistical data.

What about the scientists, what is their biggest concern? They usually want data for the purpose of storing it within their own systems, so they will probably not consider fetching the data on demand. They usually get large volumes of data and they are interested in details. They are likely to be interested in all the metadata, particularly classifications. So they will quite likely accept a longer implementation time for getting the data if they get access to good metadata. Since frequency is of little concern, the main factor is development cost: how much time will it take to get the data from the statistical office and import it into my information structure, whether that is a database or something else?

Thursday, February 17, 2011

Official Google Blog: Visualize your own data in the Google Public Data Explorer

From: Official Google Blog: Visualize your own data in the Google Public Data Explorer
2/16/2011 11:01:00 AM
(Cross-posted on the Google Code Blog)

Over the past two years, we’ve made public data easier to find, explore and understand in several ways, providing unemployment figures, population statistics and world development indicators in search results, and introducing the Public Data Explorer tool. Together with our data provider partners, we’ve curated 27 datasets including more than 300 data metrics. You can now use the Public Data Explorer to visualize everything from labor productivity (OECD) to Internet speed (Ookla) to gender balance in parliaments (UNECE) to government debt levels (IMF) to population density by municipality (Statistics Catalonia), with more data being added every week.