One of the most fundamental actions an analyst can perform is aggregating their data. Simplifying large data sets to support specific investigations and helping decision-makers to understand the overall trends in their business area are central activities that a data analyst will deliver.
To Aggregate data in Alteryx, you will generally be working with two tools, the Summarize tool or the CrossTab tool. The Summarize tool will provide the best flexibility and will be the go too for most aggregation activities.
The CrossTab tool requires aggregation when you can leverage the pivot activities for your analysis. The CrossTab is also a requirement for creating a Pivot Table Report Output.
The summarize tool
The summarize tool is the primary method for aggregating data sets. When configuring the summarize tool, there are two parts of the process: the Grouping and Aggregation options.
The above image shows the top of the Summarize configuration. In this section, you can either select fields individually, select multiple fields by holding the CTRL or SHIFT button or select all fields of a particular type using the “SELECT” drop-down in the top right corner.
Once fields are selected, you decide what action to apply with the “ADD” drop-down (seen in the following image). The first option in the drop-down is “Group By”. This identifies the key fields that the aggregations are calculated over. If you Group By multiple fields, and there is no limit to the number of fields you group by, Alteryx will calculate every unique combination of the selected fields.
The next set of entries (up to the horizontal divider seen in the image above) is the aggregation type you can apply to most fields.
The count options all provide record-counting aggregations. Any count with “Non Null” in the name will only skip any records that do not have a definite value in the target field. The “Distinct” suffix allows counting unique entries in a particular field.
The next four options, Min, Max, First and Last, provide aggregations that can be applied to any field, irrespective of the data type.
Once these global options are understood, the bottom half of the “Add” menu lists the aggregations available for each data type. The Alteryx documentation details every aggregation option available, including any optional parameters.
Numeric fields have several different aggregation options, split into two Menu lists; Finance and Numeric
The finance aggregation options apply common financial calculations across the dataset. For example, the “Net Present Value w/ Dates (XNPV)” will calculate the net present value with a configurable Finance Rate (which defaults to 8%) and apply this over a Date Field you define (the default is the first date type field).
The optional parameters for any aggregation always appear at the bottom of the configuration window. For example, in the image above, you can configure the XMPV properties from the bottom of the configuration window.
The second menu is for general numeric aggregations, including calculations like the Average (mean), Median or Standard Deviation. When looking at the aggregations, it is critical to note that Null values are excluded from the result. If you want Nulls included in your aggregation then you need to complete some pre-processing to impute the missing values (possibly with a zero), or you would need to manually calculate the aggregation by using first principles (Average (mean) = sum of all records/number of records)
When you try to aggregate string fields, you have different options. First, there are a couple of String-specific Count options where you decide how to count Blank values.
The other options are merging all the individual records (concatenate), identifying the most common value (mode) or the longest or shortest value. With these options, you can aggregate the string fields by what is the most useful for you.
Alteryx has always had a robust set of spatial operators, and the Summarize field leverages this capability in analytics.
Within the spatial options, you can merge spatial fields, identify the central point of the aggregated fields and simplify the spatial information to show the general area covered in the aggregated records.
One reason you might do this is, after the aggregation, you can pass that spatial field into a Spatial Info tool to understand the geographic coverage of a particular pattern of use.
The Other Special Fields
You can also aggregate outside the main numeric aggregations based on the behaviour records (after using any of the tools in the behavior analysis tool pallet) or any Report records.
In these two situations, you can combine the individual records for a group into a single summarised value.
The CrossTab Tool
The second method for aggregating is to use the CrossTab tool takes a different approach to aggregating as it will also reshape the data to generate a pivot or text table. This is needed to create a pivot table in the Table Report tool for export into reports or final outputs.
You configure the CrossTab tool by setting parameters in the three sections in the image below. The first section covers the Group By field. This section performs the same operation as the Group By from the summarize tool. This section is where you set the key fields that will repeat for each unique combination of grouped records.
The second section is where you select the new headings and the values that will be aggregated. Each unique value will create a new column for the heading field, effectively creating an additional grouping field. The value option will let you choose the field you want to aggregate. A limitation of using the CrossTab tool is that you can only aggregate one field at a time. Of course, the CrossTab tool is used in specific applications, especially when aggregating a single field.
The final section allows you to choose how you want to aggregate. If you are Cross Tabbing a numeric field, you can choose between Sum, Average, Count (with or without Nulls), and Percent row or column. The percent options will be a percent of total calculation for your chosen direction.
If you are aggregating for any data type other than numeric, you only have the three options of Concatenate, First or Last. These options behave the same as they do in the summarize tool.
“Summing Up” the Benefits of Aggregation in Alteryx
Aggregation is a fundamental technique for data analysts to simplify large datasets and identify overall trends. Alteryx provides two tools to perform aggregation: the Summarize tool and the CrossTab tool. While the Summarize tool is the primary method for aggregating data sets, the CrossTab tool is essential for generating pivot tables. Both tools provide flexible grouping and aggregation options, with the Summarize tool allowing for greater customization. Aggregation options vary based on data types, with numeric fields offering financial and general aggregations and string fields offering concatenation, mode, and longest/shortest value aggregations. Alteryx also provides spatial aggregations and aggregations based on behavior and report records. By leveraging Alteryx’s aggregation capabilities, analysts can gain valuable insights into their data and produce high-quality reports.
I hope this blog post has provided a clear understanding of performing aggregation in Alteryx using the Summarize and CrossTab tools. If you have any feedback or other methods for aggregation that you would like to share, please don’t hesitate to message me on LinkedIn. I am always eager to learn from other Alteryx users and improve my skills. Thanks for reading!