Access to Statistics: 2014

Friday, August 22, 2014

UNECE-coordinated work relating to Big Data

From: http://www1.unece.org/stat/platform/display/bigdata/Big+Data+in+Official+Statistics;jsessionid=AE7DF06FDB27C80A30DACD65F6BDADB6

Preliminary results of the survey "Skills necessary for people working with Big Data in Statistical Organisations". More detailed analysis will be prepared by October 2014.

International collaboration project on The Role of Big Data in the Modernisation of Statistical Production - This project, overseen by the High-Level Group for the Modernisation of Statistical Production and Services, will run during 2014, and will:
- identify, examine and provide guidance for statistical organizations to act upon the main strategic and methodological issues that Big Data poses for the official statistics industry
- demonstrate the feasibility of efficient production of both novel products and ‘mainstream’ official statistics using Big Data sources, and the possibility to replicate these approaches across different national contexts
- facilitate the sharing across organizations of knowledge, expertise, tools and methods for the production of statistics using Big Data sources.

Position paper "What does Big Data mean for official statistics?" (March 2013), drafted for the High-Level Group for the Modernisation of Statistical Production and Services (HLG).

Draft classification of types of Big Data

Thursday, July 3, 2014

Flowing Data: Data science, big data, and statistics – all together now

From: http://flowingdata.com/2014/07/02/data-science-big-data-and-statistics-all-together-now/

JULY 2, 2014 | STATISTICS

Terry Speed, an emeritus professor of statistics at the University of California, Berkeley, gave an excellent talk on how statisticians can play nice with big data and data science. Usually these talks go in the direction of saying data science is statistics. This one is more on the useful, non-snarky side.

Tuesday, April 15, 2014

The Guardian - Big data and open data: what's what and why does it matter?

From: http://www.theguardian.com/public-leaders-network/2014/apr/15/big-data-open-data-transform-government

Both types of data can transform the world, but when government turns big data into open data it's especially powerful

Joel Gurin, New York University
Guardian Professional, Tuesday 15 April 2014 10.49 BST

Big data and the new phenomenon open data are closely related but they're not the same. Open data brings a perspective that can make big data more useful, more democratic, and less threatening.

While big data is defined by size, open data is defined by its use. Big data is the term used to describe very large, complex, rapidly-changing datasets. But those judgments are subjective and dependent on technology: today's big data may not seem so big in a few years when data analysis and computing technology improve.

Open data is accessible public data that people, companies, and organisations can use to launch new ventures, analyse patterns and trends, make data-driven decisions, and solve complex problems. All definitions of open data include two basic features: the data must be publicly available for anyone to use, and it must be licensed in a way that allows for its reuse. Open data should also be relatively easy to use, although there are gradations of "openness". And there's general agreement that open data should be available free of charge or at minimal cost.

The relationship between big data and open data

Source: Joel Gurin

This Venn diagram maps the relationship between big data and open data, and how they relate to the broad concept of open government. More....

Friday, March 21, 2014

Report of MPs’ inquiry into UK statistics and open data published - StatsLife

From: http://www.statslife.org.uk/news/1299-report-of-mps-inquiry-into-uk-statistics-and-open-data-published?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+rss-enews+%28StatsLife+-+all+the+latest+statistics+news%29

Written by Web News Editor on 20 March 2014. Posted in News

Access to public sector data must never be sold or given away, and should be made open by default, according to a report on Statistics and Open Data published on 17 March 2014 by the Public Administration Select Committee (PASC).

One of the report's key recommendations is that data should be made 'open' by default, i.e. accessible to all, free of restrictions on use or redistribution, and in a digital, machine-readable format. 'There should be a presumption that restrictions on government data releases should be abolished,' the report notes. 'It may be necessary to exempt certain data sets from this presumption, but this should be on a case-by-case basis.' The report also said that charging for some data may occasionally be appropriate, 'but this should become the exception rather than the rule.'

More....

Saturday, March 15, 2014

Why the wealthiest countries are also the most open with their data - Washington Post

From: http://www.washingtonpost.com/blogs/wonkblog/wp/2014/03/14/why-the-wealthiest-countries-are-also-the-most-open-with-their-data/?tid=hpModule_79c38dfc-8691-11e2-9d71-f0feafdd1394

BY EMILY BADGER
March 14 at 1:29 pm

The Oxford Internet Institute this week posted a nice visualization of the state of open data in 70 countries around the world, reflecting the willingness of national governments to release everything from transportation timetables to election results to machine-readable national maps. The tool is based on the Open Knowledge Foundation's Open Data Index, an admittedly incomplete but telling assessment of who willingly publishes updated, accurate national information on, say, pollutants (Sweden) and who does not (ahem, South Africa).

Tally up the open data scores for these 70 countries, and the picture looks like this, per the Oxford Internet Institute:

Oxford Internet Institute

That's Great Britain in the lead at left, followed by the U.S., Denmark, Norway, the Netherlands and Australia. Each segment in the above chart corresponds to a country's score on one of the component metrics (election results, government budget, etc.). The orange outlier in that left group is Israel. Meanwhile, Kenya, Yemen and Bahrain are among the countries at the far right. More.....

Friday, March 14, 2014

Big Data - The 5 Vs Everyone Must Know

Big Data - The 5 Vs Everyone Must Know from Bernard Marr

Monday, March 10, 2014

VB News - Statwing picks up funding from data science luminary Hammerbacher

From: http://venturebeat.com/2014/01/30/statwing-picks-up-funding-from-data-science-luminary-hammerbacher/

Above: A correlation as shown in Statwing's software.
Image Credit: Statwing
January 30, 2014 3:01 PM
Jordan Novet

Big data projects are trendy, but they can be hard to pull off.

Venture capitalists understand the problem. They’ve been betting on startups like DataHero and Chartio that aim to make analysis and visualization of data fast and simple. Now another startup, Statwing, has revealed new backing, and it comes from a leading figurehead in the big data world, Jeff Hammerbacher, a cofounder of fast-growing big data company Cloudera.

Statwing uses a clean point-and-click interface, as opposed to a clunky and overly complicated tool like Microsoft Excel. Users can drop in data from a spreadsheet and then get super-clear statements that tell users what they’re looking for, alongside visualizations and high-level statistics.

More......

Sunday, March 2, 2014

BusinessNewsDaily Reference: What is Statistical Analysis?

From: http://www.businessnewsdaily.com/6000-statistical-analysis.html
By Chad Brooks, BusinessNewsDaily Contributor | February 28, 2014 12:07am ET

In an effort to organize their data and predict future trends based on the information, many businesses rely on statistical analysis.

While organizations have lots of options on what to do with their big data, statistical analysis is a way for it to be examined as a whole, as well as broken down into individual samples.

The online technology firm TechTarget.com describes statistical analysis as an aspect of business intelligence that involves the collection and scrutiny of business data and the reporting of trends.

Thursday, January 9, 2014

Statistics eXplorer accessible for education and research

From: http://ncva.itn.liu.se/explorer?l=en

Statistics eXplorer integrates many common InfoVis and GeoVis methods required to make sense of statistical data, uncover patterns of interests, gain insight, tell-a-story and finally communicate knowledge. Statistics eXplorer was developed based on a component architecture and includes a wide range of visualization techniques enhanced with various interaction techniques and interactive features to support better data exploration and analysis. It also supports multiple linked views and a snapshot mechanism for capturing discoveries made during the exploratory data analysis process which can be used for sharing gained knowledge.

Statistics eXplorer Overview

More.....

Tuesday, January 7, 2014

Text Mining: The Next Data Frontier - Scientific Computing

From: http://www.scientificcomputing.com/articles/2014/01/text-mining-next-data-frontier#.UswIHNLuLTo

Mon, 01/06/2014 - 2:04pm
Mark Anawis

By some estimates, 80 percent of available information occurs as free-form text


Figure 1: Text Mining and Related Fields

Josiah Stamp said: “The individual source of the statistics may easily be the weakest link.” Nowhere is this more true than in the new field of text mining, given the wide variety of textual information. By some estimates, 80 percent of the information available occurs as free-form text which, prior to the development of text mining, needed to be read in its entirety in order for information to be obtained from it. Text mining has been applied to spam filters, fraud detection, sentiment analysis, identification of trends and authorship.

Text mining can be defined as the analysis of semi-structured or unstructured text data. The goal is to turn text information into numbers so that data mining algorithms can be applied. It arose from the related fields of data mining, artificial intelligence, statistics, databases, library science, and linguistics (Figure 1).
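
To make "turning text information into numbers" concrete, here is a minimal sketch in Python of one common approach, a bag-of-words count matrix. It is an illustration only and not taken from the article: the function names tokenize and bag_of_words and the two sample messages are hypothetical, and real text-mining pipelines usually add steps such as stop-word removal, stemming and term weighting.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct token seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample messages, invented for this illustration.
documents = [
    "Win a free prize now",
    "Meeting notes for the free workshop",
]
vocabulary, matrix = bag_of_words(documents)

for doc, row in zip(documents, matrix):
    print(row, "<-", doc)
```

Each row of the matrix is a numeric vector for one document, so a standard data mining algorithm (a spam classifier, for instance) can be trained on the rows without ever reading the raw text.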