Introduction
Hi, I am Akira, the editor-in-chief of Data Without Code. In our previous tutorial, we learned how to split text into columns using the Cell Splitter node. Now that your data is perfectly clean and categorized, it is time for the fun part: summarizing it.
If you come from an Excel background like I did, you already know the holy grail of data analysis: The Pivot Table. It is the fastest way to turn a massive list of transactions into a neat summary of sales by region, product, or month.
But Excel pivot tables have a major flaw. Every time you get new data, you have to change the data source range, click “Refresh,” and pray that your formulas don’t break. As a DX manager, I can tell you that manual refreshing is the enemy of automation.
In this guide, I will show you how to recreate the magic of Excel Pivot Tables in KNIME Analytics Platform using two essential tools: the GroupBy node and the Pivoting node.
Excel Pivot Table vs. KNIME Logic
In Excel, a Pivot Table handles everything in one menu: Rows, Columns, and Values. In KNIME, we split this functionality into two different nodes depending on what you want your final table to look like.
- If you only need Rows and Values (e.g., Total Sales per Region), you use the GroupBy node.
- If you need a 2D matrix with Rows, Columns, and Values (e.g., Total Sales per Region, broken down by Month across the top), you use the Pivoting node.
1. The Simple Summary: GroupBy Node
We briefly used the GroupBy node when we built our RFM customer segmentation model. It is the most frequently used data aggregation tool in KNIME.
How to Configure the GroupBy Node
- Drag the GroupBy node onto your canvas and connect it to your dataset.
- Double-click to open it. The first tab is Groups. This is exactly like the “Rows” box in an Excel Pivot Table. Select the column you want to group by (e.g., “Region”) and move it to the right side.
- Go to the second tab: Manual Aggregation. This is your “Values” box in Excel. Select the column you want to calculate (e.g., “Revenue”), move it to the right, and change the aggregation method to Sum, Mean (Average), or Count.
Execute the node. You instantly have a clean, one-dimensional summary table. Simple, right?
2. The True Matrix: Pivoting Node
Now, let’s say your boss wants to see the “Region” on the left side (Rows), the “Product Category” across the top (Columns), and the “Total Revenue” in the middle (Values). For this, we need the true Pivoting node.
How to Configure the Pivoting Node
Search for the Pivoting node in your Node Repository, connect it to your data, and open the configuration window. You will notice it has three main tabs that perfectly match Excel’s layout:
- Groups: Select your Row headers here. (e.g., Move “Region” to the right).
- Pivots: Select your Column headers here. (e.g., Move “Product Category” to the right). KNIME will create a new column for every unique category it finds.
- Manual Aggregation: Select your Values here. (e.g., Move “Revenue” to the right and set it to Sum).
Click OK and press F8 to execute. When you right-click and view the “Pivot table” output, you will see a beautiful 2D matrix, perfectly summarized.
Why KNIME Pivots Are a Game Changer
Why go through the trouble of learning this when Excel already does it? Total Automation.
Once you configure the GroupBy or Pivoting node, the rule is locked in. Next week, when you drop a new file with 500,000 rows into your workflow, KNIME will automatically aggregate the data, pivot it, and spit out the final report in seconds. You never have to click “Refresh” or adjust a data range ever again.
Conclusion: Your Next Steps
Mastering the GroupBy and Pivoting nodes is a huge milestone. You can now slice, dice, and summarize millions of rows of data instantly without crashing your computer.
However, when you use the Pivoting node, you might notice something annoying. If a certain Region didn’t sell a specific Product Category, KNIME puts a red question mark (?) in that cell. In the data world, this is called a Null (Missing Value), and it can break your future calculations if you don’t handle it properly.
Ready to clean up those pesky question marks? Join me in our next tutorial where I explain exactly how to handle missing values (Nulls) in KNIME to keep your data perfectly clean!
