Getting Data to Big Query

Co-posted on the Information Lab blog

The Problem

I have recently been working with a client who wanted to get their data into a Google Big Querytable with Alteryx. It sounded like a simple problem and should have been pretty easy for me to get working. 
Unfortunately that was not to be the case. Connecting to a big query table with the Simba ODBC driver is pretty easy, but that driver is just read only. Dead end there.
I found that uploading a single record was relatively easy but slow, uploading thousands of records should be easier. 
Deeper digging I went. Using the web interface I can upload a csv to Google Cloud Storage, then load that file into Big Query, the challenge I had was to automate the process.

Step 1 – Throw it into the (Google) Cloud

So I was quiet lucky that my colleagues Johanthan MacDonald and Craig Bloodworth had already made a functioning cloud uploader using the cURL program. All I had to do was doctor the URL to work for me. Ideally I wanted to make the entire process native to Alteryx. 
I thought it would be easy but alas I found a gap in my Alteryx skills I had to fill. So I wanted shelf that upgrade of the process for now and look at the next phase of the automation, importing the csv into big query.

Step 2 – Big Query your Data

So the second step was to get the data into big query. I found this a bit of a humdinger and it really stretched my API foo. So what did I have to do?

What query do I run?

I struggled with this for a while until I attended the Tableau User Group held at the Google Town Hall in London, I was able to get in touch with Reza Rokni who pointed me at the example builder at the bottom of the Jobs List Page where you could build an example of the query to send.
The biggest problem with this is getting the JSON file with the table schema right. I got it eventually, but there was a fair few challenges.

Step 3 – Get it to the Cloud with Alteryx

Once I had the whole process working I wanted to go back to the problem of uploading a file with Alteryx. I knew it was surely possible to upload a file with with the download tool. I just couldn’t work out how to get file in a row to upload.
What settings do I use?

Enter the Blob Family

What made this the upload work was finding what the blob tools do and how they work. So what is a blob? It’s a Binary Large Object, basically a file without the extension.
Blob Family
So now using the blob input tool I’m able to read in a file to a single field. This is exactly what I needed. Its relatively simple from here to read the file into the into a single cell and use the normal download tool for the upload process.

The Final Furlong

Now to pull this all together the last step is to put in the details needed to do the upload. So what are the settings? 
Well on the ‘Basic’ we simply want to set the target URL (check the API documentation to work out what that would be), and what we want to do with the response from the server.
On the headers page we need to define the ‘Authorization’ parameter and the ‘Content Type’ parameter that is needed by Google’s API.
The last set of configuration is the payload, and that is where we define the use of a POST command and what column contains the Blob.

Success!

And after all that I have managed to have success. By combining these two processes the records are uploaded as csv, then imported into a pre-defined Big Query table.
I have uploaded this to my Alteryx Gallery Page and would love to get feedback from any one who uses it in its current configuration. I would appreciate any feedback on how it works as it is and how easy it is to configure. I would also love to get some ideas on how to improve the tool. I already have ideas to automate the table schema file creation, develop separate up-loaders for Big Query and Google Cloud, see if changing the upload query will make it more robust, possibly build in compression, and who knows what else would be useful.
Till next time (and hopefully quicker than the last gap)

Data Engineering with Alteryx

  • Learn DataOps principles to build data pipelines with Alteryx
  • Build robust data pipelines with Alteryx Designer
  • Use Alteryx Server and Alteryx Connect to share and deploy your data pipelines

1 thought on “Getting Data to Big Query”

  1. Hi, Paul, thanks for posting this. I'm looking at the flow, but I can't figure out what "Macro Output (73)" is doing. Selecting it doesn't give any useful configuration options. Is there a dependency I need to install?

    Thanks!

    Reply

Leave a comment