Google's BigQuery offers infrastructure to crunch big data
The BigQuery service is an online analytical processing (OLAP) system designed for terabyte-scale datasets
By Thor Olavsrud | CIO US | Published: 16:15, 02 May 2012
Few companies in the world have access to datasets as large as Google's, and, unsurprisingly, Google is at the forefront of Big Data analytics. Now Google plans to share the wealth by giving others access to its data-crunching infrastructure with its new Google BigQuery Service.
The BigQuery service is an online analytical processing (OLAP) system designed for terabyte-scale datasets. It gives customers the capability to run SQL-like queries against massive datasets that potentially have billions of rows, without the hardware and software costs associated with an on-premises solution. BigQuery has been in beta test, or what Google calls "limited preview," since last November. Now Google believes it's ready for prime time.
"The service is conceived so customers can upload their own data," says Ju-Kay Kwek, product manager of BigQuery and leader of some of Google's other Big Data efforts as well. "They can store it all in Google and then, either through a RESTful API or very simple Web UI, they can interrogate their data."
"Imagine a big pharmaceutical company optimizing daily marketing spend using worldwide sales and advertisement data," Kwek adds. "Or think of a small online retailer that makes product recommendations based on user clicks."
Kwek notes that one BigQuery customer, social and mobile analytics specialist Claritics, used the service to build a Web application that gives game developers real-time insights into user behavior. Another customer, Amsterdam-based analytics firm Crystalloids, built a cloud-based application to help a resort network analyze customer reservations, optimize marketing and maximize revenue.
BigQuery ingestion API speeds data uploads
Customers upload their data to Google as CSV files using a data ingestion API. Kwek says the API uses concurrent compressed streams that allow customers to upload several hundred gigabytes in about 15 or 20 minutes.
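The preparation side of that ingestion flow can be sketched in Python: serializing records to CSV and gzip-compressing the result before it is sent over one of the compressed streams. The field names and sample rows below are illustrative, not from Google's documentation, and the actual upload call (with authentication) is omitted:

```python
import csv
import gzip
import io

def prepare_upload(rows, fieldnames):
    """Serialize rows to CSV and gzip-compress the result,
    producing the kind of compressed payload the ingestion
    API accepts over its concurrent streams."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))

# Illustrative click data a small retailer might upload
rows = [
    {"sku": "A100", "clicks": 42, "region": "EU"},
    {"sku": "B200", "clicks": 17, "region": "US"},
]
payload = prepare_upload(rows, ["sku", "clicks", "region"])
```

Compressing before upload is what makes the quoted throughput plausible: several hundred gigabytes of raw CSV shrinks considerably under gzip, and splitting it across concurrent streams parallelizes the transfer.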
"Because you're using Google's data centers, the amount of data you can put into the system is for all practical purposes unlimited," Kwek says.
Kwek notes the data is protected with multiple layers of security, replicated across multiple data centers and can easily be exported. Access to the data is managed via group- and user-based permissions using Google accounts.
Unlike many Big Data systems, the service does not use Apache Hadoop (the open-source framework inspired by Google's File System (GFS) and MapReduce papers), but Kwek says it does use a distributed query and data storage architecture. The service abstracts the guts of the analytics operation away from the user, sharding the data, distributing it and managing it automatically.
Once the data is uploaded to Google's storage, customers can interrogate it with a SQL-like query language. They can use BigQuery through a Web UI called the BigQuery browser tool, the bq command-line tool, or by making calls to the REST API using client libraries in multiple languages, including Java and Python. Google's infrastructure can analyze billions of rows in seconds, Kwek says, adding that it is an ideal tool for ad-hoc analysis, standardized reporting, data exploration and Web applications.
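A minimal sketch of the REST route, in Python: building the URL and JSON body for a synchronous query against BigQuery's v2 "queries" endpoint. The project name is hypothetical, the Shakespeare sample table is one of Google's public datasets, and the OAuth-authenticated HTTP call itself is deliberately left out:

```python
import json

def build_query_request(project_id, sql):
    """Construct the URL and JSON body for a synchronous
    query against the BigQuery v2 REST API."""
    url = (
        "https://www.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    body = json.dumps({"query": sql})
    return url, body

# Hypothetical project; the query uses BigQuery's SQL-like dialect
url, body = build_query_request(
    "my-project",
    "SELECT word, COUNT(*) AS n FROM publicdata:samples.shakespeare "
    "GROUP BY word ORDER BY n DESC LIMIT 10",
)
```

In practice a client library such as the Python one handles the authentication and HTTP plumbing, so application code only supplies the project ID and the query string, as above.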
Pricing is based on two components: storage and queries. Developers and businesses can sign up for BigQuery online and query up to 100 GB per month for free. Under the basic plan, storage costs 12 cents per gigabyte with a limit of 2 TB, and queries cost 35 cents per gigabyte, with a limit of 1,000 queries per day and 200 TB of data processed per day. Google says customers that require more storage and more processing should speak to a Google sales representative about its premium plan.
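Those two components make a monthly bill easy to estimate. The sketch below applies the basic-plan rates quoted above; treating the free 100 GB of queries as deductible under the paid plan is an assumption, since the article doesn't say whether the free tier stacks with it:

```python
def monthly_cost_basic(storage_gb, query_gb):
    """Estimate a monthly BigQuery bill under the basic plan:
    $0.12/GB stored plus $0.35/GB processed by queries.
    Assumption: the first 100 GB queried per month stays free."""
    billable_query_gb = max(query_gb - 100, 0)
    return storage_gb * 0.12 + billable_query_gb * 0.35

# e.g. 500 GB stored and 1,000 GB queried in a month:
# 500 * 0.12 + 900 * 0.35 = 375.0 dollars
cost = monthly_cost_basic(500, 1000)
```

At these rates the query side dominates for analytics-heavy workloads, which is consistent with the daily caps Google places on queries rather than on storage.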