A while ago I looked at the tools to get data into Alteryx. While getting data into a workflow is the first step of an analysis the preparation tools is where the real work starts.
The Tools of the Trade
The auto field takes your input data and automatically decides and sets the field to the smallest size and type for what is contained in the column. It examines the records in each field and works out whether the field should be set to a Int16, V_String or any field type in between.
I would caution when using the auto field not to use it in production. It’s a great tool for development workflows or situations where you aren’t sure what the best data type and size is, but it can be pretty slow at times especially on larger data sets. when you head off and use this in production I suggest replacing the auto field with a standard select (see down the page) and hard code the changes that are needed.
The data cleansing tool is pretty new (just added in 10.5) but it is incredibly powerful. The macro applys a range of common data cleansing operations, including: replacing nulls, removing unwanted characters, and modifying the case of text.
Filter & Date Filter
The filter and date filter tools allow you to manage the records included in your data set. The more generic filter tool allows you define any condition to complete the separate your data into two data flows.
This is were a lot of the grunt work of analysis is done. With this tool you can perform any operation that you want to on a single record value. This can also specify different columns, constants, manual entry fields and functions.
The Formula tool also provides the basic interface for any other formula that you might need else where, from filter tools to multi-field tools to row generation, the layout and logic all stem from the Formula tool.
The generate rows tool is great when you have a condition and you want to fill in the gaps. For example, if you had a set of dates when sales happened but you want to fill in the gaps where a date is missing for some reason this is the tool for you.
Using the generate you specify the start of the field (e.g. the first date), what the condition that says you have enough records (e.g. today’s date), and how you are going to progress from one to the other, usually by an increment (e.g. date + 1 day).
The imputation tool is a key tool for managing the appearance of nulls in field. This tool (and its acutally a macro) allows you to update a column and replace any NULL fields with a result of your choice. That choice could be the mean, median, a zero or a value of the users choice. The imputation tool can be used on either numeric or text field
The multi-row formula tool is a hugely powerful tool that allows the referencing of rows (and therefore records) other than the one that it currently being looked at. The multi-row tool has the flexibility to reference any number of rows back or forward from the current record and allows the user to undertake actions based on a grouping field as well.
Random % Sample
The Random % Sample tool is allows the user to reduce the data stream (by sampling the data) and extracting a randomised proportion of the data for further analysis.
The record ID simply adds an incremeting integer value to each record in the data set.
The sample tool
Select tool is all about choosing the fields available for analysis and the form that those fields take. The capability to manually force data types onto a field and modifying the name and description of the field is build into the tool. This doesn’t take into account the contents of any field and will only prevent trying to duplicate field names. No other logic is applied in the Select tool and it is completely manual.
The Sort tool simply applies sorting rules onto the data set. This can be a multi level sorting where the data is sorted by one field, followed by another field and so on until the sort rules are all applied.
The tile tools is a fasinating, and in my experience, under used tool which applies a binning or grouping to each record based on the rules defined.
The unique tool simply takes the first appearance of each unique combination of fields defined in the tool. All other records are filtered out into the Duplicate output.