I love that I can run huge amounts of data thru Alteryx. However, I also have to think about how much data I’m going to take from Alteryx and push to a BI application like Spotfire or PowerBI. As a general rule, less is more. Now, you may think you can’t do anything to reduce the size of your data set without cutting columns or rows, but that’s not true. Read on to learn two quick tricks to materially reduce the size of your data set.
Blog or Video?
If you prefer watching to reading, you can find this same content on my YouTube channel. If you like the content, spread it around for the rest of the world to learn from. Please subscribe and share!
For this post, we’ll work with a small set of tools, and I want you to pay particular attention to the size of the output. If you don’t see the size when running your own workflows, click on the canvas, go to the Canvas tab and choose Show from Connection Progress.
First, I use a Select tool to rename and change data types. 1.3 GB of data exit the tool. Then, I perform a calculation that increases the size of the data set from 1.3 GB to 1.5 GB. Next, let’s see what I do to cut that down.
Tip No 1 – Trimming
First, I use a Multi-Field Formula tool to efficiently trim away whitespace from text fields. I use the Multi-Field formula tool instead of a regular Formula tool because I can apply an expression to all of my Text fields. This tool allows me to select all Text fields (including Unknown fields) and then apply an expression with the Trim function. The Trim function removes whitespace from the start and end of the string.
I used to use the Data Cleansing tool for this purpose, but that tool is unreliable because you cannot select Dynamic or Unknown fields. Furthermore, if you rename fields, they are no longer selected in the Data Cleansing tool. I got so tired of going back to the Data Cleasing tool over and over to update it that I have stopped using it altogether. Anyway, trimming whitespace takes .1 GB off the size of my data set. That’s not a lot, but it’s something. Now, let’s take a look at what I do with the Auto Field tool.
Tip No 2 – Auto Field
The Auto Field tool is easily overlooked by new users, but it is incredibly helpful in several ways (and there will be future blog posts on it). This tool reads through all the records, and for a given column, it sets the field type to the smallest possible size relative to the data contained within the column.
You can see that 1.4 GB of data flows into the Auto Field tool, but only 936 MB of data flows out. The tool shaved off almost half a gig just by shortening the field size.
And, compare the field sizes in the first Select tool to another Select tool placed after the Auto Field tool. See how they have shrunk!
Lastly, I just want to note that I use trim and Auto Field at the start and end of all my workflows, since new columns are usually calculated during workflows that might not be included in this process at the start. Now, you know two ways to reduce the size of data sets in Alteryx!
Content created with Alteryx 2020.3.