Query a public dataset with the bq tool

Learn how to examine and query a public dataset with the bq command-line tool.

Before you begin

  1. In the Trusted Cloud console, on the project selector page, select or create a Trusted Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Trusted Cloud project.

  3. If you don't enable billing for the Trusted Cloud project that you use in this tutorial, then you will work with data in the BigQuery sandbox. The BigQuery sandbox lets you learn BigQuery with a limited set of BigQuery features at no charge.

  4. Ensure that the BigQuery API is enabled.

    Enable the API

    If you created a new project, the BigQuery API is automatically enabled.

  5. In the Trusted Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Trusted Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

Examine a public dataset

BigQuery offers several sample tables in the bigquery-public-data.samples dataset that you can query. In this tutorial, you run queries on the shakespeare table, which contains an entry for every word in every Shakespeare play.

Examine the shakespeare table in the samples dataset:

bq show bigquery-public-data:samples.shakespeare

The output is similar to the following. Some columns are omitted to simplify the output.

  Last modified                  Schema                 Total Rows   Total Bytes
----------------- ------------------------------------ ------------ ------------
 14 Mar 17:16:45   |- word: string (required)           164656       6432064
                   |- word_count: integer (required)
                   |- corpus: string (required)
                   |- corpus_date: integer (required)

Query a public dataset

Use the bq query command to run SQL queries on data.

  1. Determine how many times the substring raisin appears in Shakespeare's works:

    bq query --use_legacy_sql=false \
        'SELECT
          word,
          SUM(word_count) AS count
        FROM
          `bigquery-public-data.samples.shakespeare`
        WHERE
          word LIKE "%raisin%"
        GROUP BY
          word;'
    

    The output is similar to the following:

    +---------------+-------+
    |     word      | count |
    +---------------+-------+
    | praising      |     8 |
    | Praising      |     4 |
    | raising       |     5 |
    | dispraising   |     2 |
    | dispraisingly |     1 |
    | raisins       |     1 |
    +---------------+-------+
    
  2. Search for the substring huzzah in Shakespeare's works:

    bq query --use_legacy_sql=false \
        'SELECT
          word
        FROM
          `bigquery-public-data.samples.shakespeare`
        WHERE
          word = "huzzah";'
    

    Because the substring doesn't appear in Shakespeare's works, no results are returned.

Clean up

To avoid incurring charges to your Trusted Cloud account for the resources used on this page, delete the Trusted Cloud project with the resources.

Delete the project

If you used the BigQuery sandbox to query the public dataset, then billing is not enabled for your project.

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Trusted Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next