Friday, August 22, 2014

UNECE-coordinated work relating to Big Data


Thursday, July 3, 2014

Flowing Data: Data science, big data, and statistics – all together now



Terry Speed, an emeritus professor of statistics at the University of California, Berkeley, gave an excellent talk on how statisticians can play nice with big data and data science. Usually these talks go in the direction of claiming that data science is just statistics. This one stays on the useful, non-snarky side.

Tuesday, April 15, 2014

The Guardian - Big data and open data: what's what and why does it matter?


Both types of data can transform the world, but when government turns big data into open data it's especially powerful

Joel Gurin, New York University
Guardian Professional, Tuesday 15 April 2014 10.49 BST

Big data and the newer phenomenon of open data are closely related, but they're not the same. Open data brings a perspective that can make big data more useful, more democratic, and less threatening.

While big data is defined by its size, open data is defined by its use. Big data is the term used to describe very large, complex, rapidly changing datasets. But those judgments are subjective and depend on technology: today's big data may not seem so big in a few years, when data analysis and computing technology have improved.

Open data is accessible public data that people, companies, and organisations can use to launch new ventures, analyse patterns and trends, make data-driven decisions, and solve complex problems. All definitions of open data include two basic features: the data must be publicly available for anyone to use, and it must be licensed in a way that allows for its reuse. Open data should also be relatively easy to use, although there are gradations of "openness". And there's general agreement that open data should be available free of charge or at minimal cost.

The relationship between big data and open data
Source: Joel Gurin

This Venn diagram maps the relationship between big data and open data, and how they relate to the broad concept of open government. More....

Friday, March 21, 2014

Report of MPs’ inquiry into UK statistics and open data published - StatsLife

Access to public sector data must never be sold or given away, and should be made open by default, according to a report on Statistics and Open Data published on 17 March 2014 by the Public Administration Select Committee (PASC).
One of the report's key recommendations is that data should be made 'open' by default, i.e. accessible to all, free of restrictions on use or redistribution, and in a digital, machine-readable format. 'There should be a presumption that restrictions on government data releases should be abolished,' the report notes. 'It may be necessary to exempt certain data sets from this presumption, but this should be on a case-by-case basis.' The report also said that charging for some data may occasionally be appropriate, 'but this should become the exception rather than the rule.'

Saturday, March 15, 2014

Why the wealthiest countries are also the most open with their data - Washington Post


The Oxford Internet Institute this week posted a nice visualization of the state of open data in 70 countries around the world, reflecting the willingness of national governments to release everything from transportation timetables to election results to machine-readable national maps. The tool is based on the Open Knowledge Foundation's Open Data Index, an admittedly incomplete but telling assessment of who willingly publishes updated, accurate national information on, say, pollutants (Sweden) and who does not (ahem, South Africa).
Tally up the open data scores for these 70 countries, and the picture looks like this, per the Oxford Internet Institute:
Source: Oxford Internet Institute
That's Great Britain in the lead at left, followed by the U.S., Denmark, Norway, the Netherlands and Australia. Each segment in the above chart corresponds to a country's score on one of the component metrics (election results, government budget, etc.). The orange outlier in that left group is Israel. Meanwhile, Kenya, Yemen and Bahrain are among the countries at the far right. More.....

Monday, March 10, 2014

VB News - Statwing picks up funding from data science luminary Hammerbacher


Image: a correlation as shown in Statwing's software (credit: Statwing)
January 30, 2014 3:01 PM 
Jordan Novet

Big data projects are trendy, but they can be hard to pull off.

Venture capitalists understand the problem. They’ve been betting on startups like DataHero and Chartio that aim to make analysis and visualization of data fast and simple. Now another startup, Statwing, has revealed new backing, and it comes from a leading figure in the big data world: Jeff Hammerbacher, a cofounder of fast-growing big data company Cloudera.

Statwing offers a clean point-and-click interface, as opposed to clunky and overly complicated tools like Microsoft Excel. Users can drop in data from a spreadsheet and get clear statements about what they’re looking at, alongside visualizations and high-level statistics.

Sunday, March 2, 2014

BusinessNewsDaily Reference: What is Statistical Analysis?

By Chad Brooks, BusinessNewsDaily Contributor   |   February 28, 2014 12:07am ET

Statistical analysis software

In an effort to organize their data and predict future trends based on the information, many businesses rely on statistical analysis.

While organizations have lots of options for what to do with their big data, statistical analysis is a way to examine that data as a whole, as well as to break it down into individual samples.

One online technology firm describes statistical analysis as an aspect of business intelligence that involves the collection and scrutiny of business data and the reporting of trends.
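To make the whole-versus-sample idea above concrete, here is a minimal sketch in plain Python. The dataset is made up for illustration; real business data would come from a database or spreadsheet, and real analysis would go well beyond a mean.

```python
import random
import statistics

# A made-up "population" of 10,000 daily sales figures (purely illustrative).
random.seed(42)
population = [random.gauss(100, 15) for _ in range(10_000)]

# Examine the data as a whole...
whole_mean = statistics.mean(population)

# ...and break it down into an individual sample of 100 observations.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population mean: {whole_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

With a reasonably sized random sample, the sample mean lands close to the population mean, which is what lets analysts work with samples when the full dataset is too large or costly to process.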

Thursday, January 9, 2014

Statistics eXplorer accessible for education and research

Statistics eXplorer
Statistics eXplorer integrates many common InfoVis and GeoVis methods required to make sense of statistical data, uncover patterns of interest, gain insight, tell a story and, finally, communicate knowledge. Statistics eXplorer was developed on a component architecture and includes a wide range of visualization techniques, enhanced with various interaction techniques and interactive features, to support better data exploration and analysis. It also supports multiple linked views and a snapshot mechanism for capturing discoveries made during exploratory data analysis, which can be used to share the knowledge gained.

Statistical eXplorer Overview


Tuesday, January 7, 2014

Text Mining: The Next Data Frontier - Scientific Computing


Mon, 01/06/2014 - 2:04pm
Mark Anawis

By some estimates, 80 percent of available information occurs as free-form text

Figure 1: Text Mining and Related Fields

Josiah Stamp said: “The individual source of the statistics may easily be the weakest link.” Nowhere is this more true than in the young field of text mining, given the wide variety of textual information. By some estimates, 80 percent of the information available occurs as free-form text, which, prior to the development of text mining, had to be read in its entirety for information to be extracted from it. Text mining has been applied to spam filtering, fraud detection, sentiment analysis, trend identification, and authorship attribution.

Text mining can be defined as the analysis of semi-structured or unstructured text data. The goal is to turn text information into numbers so that data mining algorithms can be applied. It arose from the related fields of data mining, artificial intelligence, statistics, databases, library science, and linguistics (Figure 1).
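The "turn text into numbers" step described above can be sketched with a simple bag-of-words count, one of the most basic text-mining representations. This is a minimal illustration in plain Python; real pipelines add proper tokenization, stemming, stop-word removal, and weighting schemes such as tf-idf.

```python
from collections import Counter

def bag_of_words(documents):
    """Turn raw text documents into count vectors (a minimal bag-of-words)."""
    # Tokenize each document by lowercasing and splitting on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is the sorted set of all words seen across the corpus.
    vocabulary = sorted({word for doc in tokenized for word in doc})
    # Each document becomes a vector of word counts over that vocabulary.
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

docs = ["the cat sat", "the cat sat on the mat"]
vocab, vecs = bag_of_words(docs)
# vocab: ['cat', 'mat', 'on', 'sat', 'the']
# vecs:  [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```

Once text is reduced to numeric vectors like these, standard data mining algorithms (clustering, classification, and so on) can be applied directly.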