What is the VLOOKUP formula?
Merging records or data tables into a single source is a common process that Analysts have to do every day. In Excel this is often achieved using the VLOOKUP formula. According to the Microsoft documentation you would:
Use VLOOKUP when you need to find things in a table or a range by row.
This means that if you have some information or data (like the prices of product parts) in a separate table or range and you want to match that against your main data table the you can use the VLOOKUP formula:
VLOOKUP (lookup_value, table_array, col_index_num, [range_lookup])
Which translates to: VLOOKUP(What you want to look up, where you want to look for it, the column number in the range containing the value to return, return an Approximate or Exact match – indicated as 1/TRUE, or 0/FALSE).
How a VLOOKUP is done in Alteryx
The most direct comparison of Excel’s VLOOKUP function in Alteryx is the Find Replace tool but it is most often completed with the Join tool. The Find Replace tool allows you to match the contents of one field in your targets against field in the source input.
The common alternative to the VLOOKUP is to use the Join tool. The join tool is more comparable to a Join in SQL for combining two tables. While it is similar to the Find Replace tool (it allows you to append fields based on a matching condition) it is not a direct comparison. However, often when a VLOOKUP is being used it is actually for joining tables together so a Join Tool is the better for the job.
Configuring the Find Replace tool
To achieve a VLOOKUP like configuration in the Find Replace tool, there would be three parameters to match; the Lookup value, the table array, and the column index number. In the Find Replace the Lookup value is the “Find Value” (#2 in image), the Column Index number and table array are defined in the “Append Field(s) to Record” (#3 in image). Unlike the VLOOKUP formula, you are able to return more than one field to each match.
A key difference in the configuration of the Find Replace tool, when compared to the VLOOKUP, is that you have to define the Lookup target. In the VLOOKUP formula this is defined as the first value in the table array, in the Find Replace tool it is the “Find Within Field” (#1 in image). This additional configuration means you can match column with any other column, so the column order is not important.
The Lookup condition could can be defined as matching the entire field, part of the field, or even parts of words. The matches can also be defined as case sensitive, so “The” would be treated as different to “the”. These additional matching options give much more flexibility in how you want do the lookup and what fields get appended or replaced.
The Find Replace tool can also be used to update values within a string based on the match condition. This is the functionality you would expect from the tool’s name.
Configuring the Join Tool
If you were trying to bring two tables together with a VLOOKUP then a join tool will achieve this better. There are a some significant differences in the way the two processes work and because of this you need to approach it slightly differently.
Join Input Configuration
On the Input side you two input anchors, the Left (L) and the Right (R) Input anchor. The left input is typically the primary or larger data set. The right input is what you are trying to merge into the main data set. In Alteryx there is no fundamental difference between the left and right inputs, changing what records are connected to each will just inverted the output based on the which data stream is connected to either the left or right input (I will clarify this later).
Once you have connected the inputs you need to identify how the lookup will take place. In a join tool the value you are matching (the lookup value) in each table is found in the “key fields”. The matching between these key fields must be an identical match, meaning that for string fields the case must match. Additionally, you do have the ability to use multiple fields for the match criteria and, in those cases, it is the combination of the fields together that will deliver the join.
Join Output Configuration
The output of the Join tool is significantly different from what you would see in a VLOOKUP table. Rather than a single unified table with the matched records getting populated and #N/A values when there is no match, the Join tool has three different output streams that give slightly different datasets for further use.
The matched lookups (where records in the key field appear in both datasets) are streamed out of the Join, or J, output anchor. This output would be all the VLOOKUP records where a match has been found.
The Left, or L, output anchor is all the records from the left input (usually your main table) that have no matching records in the Right table, this would represent all the #N/A values you would see in a VLOOKUP.
Finally the Right, or R, output anchor is anything from the right input that doesn’t have a match in the main (left input) table. In this case it would be anything from your VLOOKUP’s table array target, that don’t appear in the main table.
This flexibility of three outputs, means that you have the option to process the different outputs differently.
For example, if your main table (Left input) was a set of Product IDs and description, and the right table was the prices and sales figures for them then you can do different operations for each output. The Join output would allow for further analytics as you were originally planning. The Left output would be all the products with no prices or sales. You could then set up a process for identifying why there were no sales for those items. Finally, the right output is any value that you have prices and sales for but don’t have the product details for. For these records you can implement a system to identify what product information is missing and update the main table. This final case would have been lost from a VLOOKUP process and could result in misreporting if it wasn’t seen or found to be an obvious error.
Configuring the Join Multiple
The last option for doing a VLOOKUP using the Join Multiple tool is the most specialist. The Join Multiple tool is configured the same as a join tool using key fields to match between them but rather than only having two inputs, you can have as many inputs as you want. The key fields that you want to match against must be present in all tables but it does allow you to complete multiple VLOOKUPs (Joins in Alteryx) in a single tool. The main limitation is that the output is only the matches across all tables. So if you are trying to join three tables by product ID, any product IDs that don’t appear in all tables will be lost.
How to choose between the options
Choosing between the three options I have presented will be dependant on what you are trying to achieve. If you are trying to merge a handful of fields to enhance a primary table then a Find Replace is probably your go to tool. Find Replace is also the go to tool if you want the flexibility to match part of a field (like matching a surname to a full name field).
If, on the other hand you want to merge two entire datasets (or at least most of them) the you will probably want to look at the Join tool. This will give you options on how you want to address and manage any records from either table that do not match as well.
Finally, if you have multiple tables you want to bring together and don’t care about any records that are not present across the entire data set, then the Join Multiple is the tool you will want to use.
The other consideration you might consider is the performance differences between the Find and Replace operation compared to the Join operation (used by both the Join and Join Multiple). Once you have a process that is working, and you are trying to optimise the performance of your workflow, then comparing how each tool works when compared to the other can help decide on which would give the best result. There is no hard and fast tool and it will depend on your specific use case and what records are getting matched.
Putting it into practice
Learning the nuances between the different ways of completing VLOOKUPs in Alteryx will take practice but hopefully you now have a staring point on what tools to use. You can have a look at the One tool examples built into Alteryx, those examples (available for many tools in Alteryx) give a good demonstration of the common uses of the tool you are wanting to investigate.
Data Engineering with Alteryx
- Learn DataOps principles to build data pipelines with Alteryx
- Build robust data pipelines with Alteryx Designer
- Use Alteryx Server and Alteryx Connect to share and deploy your data pipelines