So far in this series we have looked at the tools to get data into Alteryx and to make some changes. In this post we are going to look at the tools to bring many different data sources together. The posts in this series so far give an overview of the data and it is going to provide the basis for a bit of a 101 case study.
So the next step in the Alteryx tools framework is looking at the Join Palette.
So the first tool in the Join palette is the append field. This tool has two input the Target and the Source. What happens is that for every record in the target input all the records in the source will be added. This means that you will duplicate the target input records for each of the source records. This is really useful when you have a large set of records and you need to attach a few records to each entry, for example, you have some demographic information for everyone.
I think the Find Replace tool is hugely under-rated (so much that I forget about it all the time) but it is incredibly powerful. This allows you to search through fields looking for specific phrases, either in part, whole or just at the beginning of the field. Once the record has been matched you have the choice to either replace the found phrase with a chosen field from the replacement input. You can also chose to append fields from the replace input to each record.
The fuzzy matching is a pretty complex and useful way of matching a single stream of records and finding other records that are mostly the same. This tool on its own deserves more explanation and time than this short paragraph so I will leave that for another day. There is a really good video here by Chris Love that walks through the a way to find all the duplicated records in a list and many of the settings needed.
The make group tool is one that I don’t use that often and I would love to hear from anyone who gets much use from it. What the tool does is it creates a new grouping for records based on the combination of two key fields. I figure the best way to see this is from the Alteryx Online Help
The Make Group tool takes data relationships and assembles the data into groups based on those relationships. For instance:
Key_1 Key_2 X A Y B Z A B X L M
So, Group A would contain: A, B, X, Y, Z because X=A, B=X, Z=A, and Y=B
L and M would be their own group, L, as they do not relate to the other values in Group A.
The last three tools in the join palette are the meat and potatoes of data blending. Join, Join Multiple and Union.
The join tool takes 2 data streams and joins them together based on a set of key fields you define. This results in a wider table with more fields available. There is a really good post over on the Information lab blog (click the image below) which goes through all the different configurations for joins in Alteryx and Tableau. The key thing to remember is that Alteryx creates a ‘pure’ left or right join. This means that only it only finds records that are unique to each data stream (based on the key fields you define).
The Join Multiple tool extends on the normal join tool by taking the same output you would get from the inner join option but allowing you to do an inner join of 2 or more data streams. There is also the option to make the join based on record position (i.e. the first record of each stream is matched to the first of the others and so on) or by creating a Cartesian Join were every record from every data stream is matched to every other stream record.
The last tool in the Join palette is the Union tool. This tool allows you to bring multiple data streams together (similar to the join multiple) by stacking matching columns on top of each other. If there is a column which doesn’t appear in the other data streams it will return a warning message and populate all other records with Nulls.
Records are stacked on top of each other to make your data set longer. You can chose to match the columns automatically by name (so when the column names match including case), by position (the first column in each data stream is match to each other an so on) or you can match them manually defining that the what columns match to which columns.
So that covers off the first few tool palettes that are the keys to getting started with Alteryx. Next time in this series I’m going to start looking at a case study on how to actually create an analysis in Alteryx. I think the best way learn how to use Alteryx is to follow actual use cases and to understand they why each tool is used in each place.