Google BigQuery lets you analyse your data at scale. You can build a data warehouse on top of it and run BI tools for analysis. In this post we will discuss a simple pipeline to publish your event data to Google BigQuery using Google Cloud Pub/Sub.
Step-by-step guide
1. Create a table in Google BigQuery. You can use the Google Cloud console or the `bq` CLI.
Below is a simple schema for a fact table.
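The original schema is not reproduced here; a minimal, hypothetical example in BigQuery's JSON schema format (the field names are illustrative, not from the original post) could look like:

```json
[
  {"name": "event_name", "type": "STRING", "mode": "REQUIRED"},
  {"name": "user_id", "type": "STRING", "mode": "NULLABLE"},
  {"name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"}
]
```

Saved as `schema.json`, a file like this can be passed to `bq mk --table dataset.events schema.json`.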
2. Create a Google Cloud Function to insert the data into BigQuery. The generic Cloud Function we created works for any JSON-serialized object.
Below is the example
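A minimal sketch of such a function, assuming a Pub/Sub-triggered entry point and the message layout described below (the names `insert_into_bigquery` and `parse_message` are our own, not necessarily those in the original code):

```python
import base64
import json


def parse_message(dw_message):
    """Extract dataset info and rows from the decoded message body.

    The message is expected to carry 'datasetInfo' (dataset and table
    names) plus a base64-encoded JSON array of rows under 'payload'.
    """
    dataset_info = dw_message["datasetInfo"]
    payload = base64.b64decode(dw_message["payload"]).decode('utf-8')
    rows_to_insert = json.loads(payload)
    return dataset_info, rows_to_insert


def insert_into_bigquery(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    # Imported lazily so the parsing helper stays testable without GCP.
    from google.cloud import bigquery

    # Pub/Sub delivers the message body base64-encoded under 'data'.
    dw_message = json.loads(base64.b64decode(event["data"]).decode('utf-8'))
    dataset_info, rows_to_insert = parse_message(dw_message)

    client = bigquery.Client()
    table_ref = client.dataset(dataset_info["dataset"]).table(dataset_info["table"])
    table = client.get_table(table_ref)

    errors = client.insert_rows(table, rows_to_insert)
    if errors:
        raise RuntimeError("BigQuery insert failed: {}".format(errors))
```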
The function is generic: it expects the message to contain a `datasetInfo` object specifying the dataset and table into which the rows should be inserted.
The object to be inserted is sent as a base64-encoded string as part of the JSON:
payload = base64.b64decode(dw_message["payload"]).decode('utf-8')
You can send an array of items to insert into BigQuery:
rows_to_insert = json.loads(payload)
errors = client.insert_rows(table, rows_to_insert)
3. Once the above code is ready, go to the console, create the function, and subscribe it to a Pub/Sub topic (from the CLI, this is `gcloud functions deploy` with the `--trigger-topic` flag).
4. Once you have a function listening on the topic and inserting into BigQuery, you need to write a publisher to publish the event data.
Our publisher is written in Go and publishes to a Google Cloud Pub/Sub topic.
You can also find the code in our Coral server repository
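While the original publisher is written in Go, the same flow can be sketched in Python with the `google-cloud-pubsub` client (the project, topic, and function names below are placeholders, not from the original code):

```python
import base64
import json


def build_message(dataset, table, rows):
    """Build the JSON envelope the Cloud Function expects:
    datasetInfo plus a base64-encoded JSON array of rows."""
    payload = base64.b64encode(json.dumps(rows).encode('utf-8')).decode('utf-8')
    return json.dumps({
        "datasetInfo": {"dataset": dataset, "table": table},
        "payload": payload,
    }).encode('utf-8')


def publish_events(project_id, topic_name, dataset, table, rows):
    """Publish a batch of rows to the Pub/Sub topic."""
    # Imported here so build_message stays usable without GCP installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_name)
    future = publisher.publish(topic_path, build_message(dataset, table, rows))
    return future.result()  # blocks until the message is accepted
```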
Make sure the column names in the table match the fields of the struct being serialized.
With these simple steps you can quickly build a data warehouse for your real-time apps in a cost-effective manner.
Feel free to reach out to us at email@example.com.
Follow us on Twitter: https://twitter.com/k8scaleio