Introduction
Hi, I am Akira, the editor-in-chief of Data Without Code. In our previous tutorial, we mastered how to clean and format messy text using the String Manipulation node. Your dataset is now perfectly standardized.
But having clean data is only the beginning. Eventually, your boss or client is going to ask you a very common question: “Who are our top 10 customers this month?” or “What are our 5 worst-performing products?”
If you are still using Excel, you know the drill: Apply an auto-filter, sort the revenue column from largest to smallest, highlight the top 10 rows, copy them, and paste them into a new presentation. Doing this manually every week is tedious and prone to human error.
As a DX manager, I want to show you how to put this reporting task on autopilot. In this tutorial, we will learn how to sort data and automatically extract the Top N records using two brilliant nodes in KNIME Analytics Platform.
1. How to Sort Data (The Sorter Node)
Before you can find the top records, you usually need to sort them. In KNIME, this is incredibly straightforward using the Sorter node.
Drag the Sorter node from your Node Repository and connect it to your dataset. Double-click to open the configuration window.
- In the “Sort by” box, select the column you want to evaluate. For example, select “Total Revenue”.
- Choose your order. Select Descending if you want the highest numbers at the top (to find your best customers), or Ascending to put the lowest numbers at the top (to find your worst performers).
- Bonus Tip: You can sort by multiple columns! Click the “Add Rule” button. You could sort first by “Region” (Ascending: A to Z), and then by “Total Revenue” (Descending: Largest to Smallest) to see the best customers perfectly organized by region.
Execute the node, and your entire dataset is now perfectly ordered. But what if you only want to keep the top 10 rows and delete the rest?
2. How to Find the Top N Records (The Top k Selector Node)
You could theoretically use a Row Filter node to keep only rows 1 through 10, but KNIME has a dedicated node that does the sorting and the filtering all in one single step.
It is called the Top k Selector node. (In data science, “k” is just a mathematical variable that means “a specific number.” So “Top k” just means “Top N” or “Top 10”).
Step-by-Step Configuration
- Add the Top k Selector node to your canvas and connect it to your raw (unsorted) data.
- Double-click to open it. In the top section, type the number of rows you want to keep. If your boss asked for the Top 10 customers, type
10. - In the bottom section, select the criteria. Choose your “Total Revenue” column and set it to Descending.
Execute the node. When you right-click and view the output table, KNIME has instantly scanned your massive dataset, sorted it in the background, and output exactly 10 rows. Your report is done in one click!
Why Automating This Matters
In Excel, sorting is a static action. If you add new data next week, you have to remember to click “Sort” again.
In KNIME, the Top k Selector node acts as a permanent rule. When you automate your workflow to combine new Excel sheets next month, this node will automatically evaluate the new data, sort it, and spit out the new Top 10 list without you ever touching the mouse.
Conclusion: Your Next Steps
Congratulations! You have just learned how to automatically generate “Top N” leaderboards for your business reports. Whether you are extracting the top 5 sales reps or the top 20 best-selling products, the Top k Selector is your best friend.
We are getting very close to mastering the Data Prep module. We have handled text, numbers, missing values, and sorting. But there is one data type that strikes fear into the hearts of all Excel users: Dates.
Different formats, international time zones, and extracting specific months can be a nightmare. Ready to conquer time itself? Join me in our next tutorial where I show you date and time manipulation in KNIME: Extracting months and years effortlessly.
