Tuesday, July 17, 2012

ScienceDaily - Getting to the Bottom of Statistics: Software Utilizes Data from the Internet for Interpreting Statistics


ScienceDaily (July 16, 2012) - Interpreting the results of statistical surveys, such as Transparency International's corruption indices, is not always a simple matter. As Dr. Heiko Paulheim of the Knowledge Engineering Group at TU Darmstadt's Computer Science Department put it, "Although methods that will unearth explanations for statistics are available, they are confined to utilizing data contained in the statistics involved. Further, background information will not be taken into account. That is what led us to the idea of applying data-mining methods that we had been studying here to the semantic web in order to obtain further, background information that will allow us to learn more from statistics."

The "Explain-a-LOD" tool that Paulheim developed accesses linked open data (LOD), i.e., enormous compilations of publicly available, semantically linked data accessible on the Internet, and uses that data to automatically formulate hypotheses about how arbitrary statistics might be interpreted. First, the statistics to be interpreted are read into Explain-a-LOD. The tool then automatically searches the pools of linked open data for associated records and adds them to the initial set. Paulheim explained, "If, for example, the country 'Germany' is listed in the corruption-index data, LOD records that contain information on Germany will be identified and further attributes, such as its population, its membership in the EU and OECD, or the total number of companies domiciled there, generated." Attributes that are unlikely to yield useful hypotheses are then deleted automatically in order to reduce the volume of the enriched statistics.
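The enrich-then-prune idea described above can be illustrated with a small sketch. This is not Explain-a-LOD's actual code: the names `TOY_LOD`, `enrich`, and `prune` are hypothetical, the dictionary stands in for a real linked-open-data lookup, and the pruning rule (drop attributes that are missing or identical for every entity) is only an assumed heuristic, since the article does not say how unhelpful attributes are identified.

```python
# Hypothetical stand-in for a linked open data source (illustrative values only).
TOY_LOD = {
    "Germany": {"eu_member": 1, "oecd_member": 1, "planet": "Earth"},
    "Norway": {"eu_member": 0, "oecd_member": 1, "planet": "Earth"},
    "Brazil": {"eu_member": 0, "oecd_member": 0, "planet": "Earth"},
}


def enrich(rows, lod):
    """Attach every attribute the LOD stand-in holds for each row's country."""
    return [{**row, **lod.get(row["country"], {})} for row in rows]


def prune(rows, keep=("country", "score")):
    """Drop added attributes that are missing for some row or constant across rows.

    Assumed heuristic: such attributes cannot help explain differences in score.
    """
    candidates = set().union(*(row.keys() for row in rows)) - set(keep)
    useful = set()
    for attr in candidates:
        values = [row.get(attr) for row in rows]
        if None not in values and len(set(values)) > 1:
            useful.add(attr)
    return [
        {key: value for key, value in row.items() if key in keep or key in useful}
        for row in rows
    ]


# Placeholder scores, not real index values.
statistics_rows = [
    {"country": "Germany", "score": 1.0},
    {"country": "Norway", "score": 2.0},
    {"country": "Brazil", "score": 3.0},
]
enriched_rows = prune(enrich(statistics_rows, TOY_LOD))
```

In this toy run the `planet` attribute is discarded because it is the same for every country, while the two membership flags survive as candidate explanatory attributes. A real system would query LOD endpoints rather than a dictionary and would likely use a more sophisticated selection step.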


Explain-a-LOD helps to interpret statistics such as Transparency International's Corruption Perceptions Index. (Credit: Diagram: Transparency International)
