Last week, I wrote a post showing how the Auto Field tool could be used to reduce the size of data sets. This week, I’m going to show how the same tool can be used to programmatically remove columns with zeros or nulls. This can also help reduce the size of your data sets and get rid of columns. Read on to learn how.
Blog or Video
My Use Case & Workflow
When drilling and completing wells, we receive data from service providers in CSV format. The CSVs and their schemas (i.e. columns) differ by vendor but need to be combined into one table. Last week, I built a workflow that merged all the CSV files, and as expected, my final table contained over 400 columns. Many of those columns contain only zero or null values, and the entire column can be removed. However, I have no way of knowing which columns will be null or zero, so I needed a programmatic way to remove them. This week’s post will show you how to do that with the Auto Field and Dynamic Select tools.
The screenshot below shows my workflow, and the tools we’ll step thru in detail are highlighted in red. They are (in order) a Multi-Field Formula tool, Auto Field tool, Select tool, (skip the DateTime tool), Dynamic Select tool, and a Multi-Field formula tool. The Select tool at the end is optional just to see how data types have changed.
Why the Auto Field Tool?
The Auto Field tool reads through all records, and for a given column, it sets the field type to the smallest possible size relative to the data contained within the column. While that doesn’t seem helpful, what you might not know is that if all of the values are null, it will set the data type to Boolean. From there, you can use the Dynamic Select tool to remove all Boolean data type fields. And that will programmatically remove all columns where the values are all null. To also include zeros in this process, review the detailed steps below.
The Detailed Process
There are 5 steps in this process, and the configuration for each tool is shown below with an explanation.
- Use the Multi-Field Formula tool to convert all zeros to null.
- Apply the Auto Field tool to change the data type.
- Use the Select tool to review the data types.
- Use the Dynamic Select tool to remove all Boolean data type fields.
- Insert another Multi-Field Formula tool to convert the remaining null back to zero.
The Multi-Field formula tool converts zero values to null. They need to be null for the Auto Field tool to do it’s thing. Now, you’ll notice that I have selected all Texts fields and that my expression accounts for zero in a few different ways. Why did I do that? Why didn’t I convert these columns to numbers first to make this easier and more foolproof?
I had to do it this way for two reasons. First, when reading data from CSVs, Alteryx imports all columns as String data types. I left them as strings because, as you’ll see in the screenshot below, the Auto Field tool only converts string data types. If I changed the data type to numbers, the process wouldn’t work.
Then, use the Auto Field tool on all columns….and I have a lot of columns. (See how it says Select String Fields to Auto Change Field Type).
Next, the Select tool after the Auto Field tools to confirms the strings converted to numbers. If a column contained only null values, the data type was converted to a Boolean field.
Now, the Dynamic Select tool is configured to select all field types except Boolean, so the Boolean columns will be removed.
Lastly, I use another Multi-Field Formula tool, this time selecting only Numeric fields, to replace all nulls with zeros. Since the fields are numeric, my zero is no longer in quotes.
Of course, if you have valid Boolean data type columns, you’ll want to perform this process in a separate stream of the workflow where they are not included. If you don’t have any Boolean data type columns, this will work perfectly.
And that is how you can programmatically remove columns with zeros or nulls from an Alteryx workflow. Next week, I’ll explain how I used a File Directory tool and a Batch macro to merge all those CSVs with different schemas.
Content created with Alteryx 2020.3.
Pingback: Runtime Settings To Make Workflows Run Faster » The Analytics Corner
Pingback: Using An Output as An Input in Alteryx » The Analytics Corner
Pingback: 3 Tools to Start & End Every Alteryx Workflow » The Analytics Corner
Pingback: How to Fix Query Timeouts in Alteryx » The Analytics Corner