String manipulation in KNIME: Replace extract and format text

Introduction

Hi, I am Akira, the editor-in-chief of Data Without Code. In our previous tutorial, we covered how to handle missing values (Nulls) to keep your data calculation-ready. Now that your empty cells are fixed, we need to talk about the data that is there—specifically, messy text data.

If you have ever used Excel’s LEFT, RIGHT, MID, or SUBSTITUTE formulas to extract a specific product code or fix a misspelled city name, you know how frustrating it can be when a formula breaks because of one extra space.

As a DX manager, I used to waste hours writing nested text formulas. But in KNIME, you can do all of this in a much cleaner, visual way without writing complex code. In this tutorial, we will master the String Manipulation node to replace, extract, and format text effortlessly.

The Swiss Army Knife of Text: String Manipulation Node

In KNIME, a “String” simply means text. The String Manipulation node is the ultimate tool for modifying text columns. Instead of using five different Excel formulas, this single node contains dozens of built-in functions that you can just click and apply.

Drag the node onto your canvas, connect it to your data, and double-click to open it. You will see your columns on the left, a list of functions in the middle, and an “Expression” box at the bottom where the magic happens.

Let’s look at the three most common text operations you will use in business automation.

1. Replacing Text (The SUBSTITUTE Alternative)

Imagine your CRM exported data with product codes like PROD-1234-XYZ, but your boss wants the hyphens (-) replaced with underscores (_).

In the function list, double-click the replace() function. It will appear in your Expression box. The syntax is simple: replace(string, search, replace).

  1. Replace the word “string” by double-clicking your “Product Code” column from the left panel.
  2. Replace “search” with "-" (the character you want to find).
  3. Replace “replace” with "_" (what you want to change it to).

Your final expression will look like this: replace($Product Code$, "-", "_"). Choose whether to append a new column or replace the existing one, and execute!

2. Extracting Text (The LEFT/RIGHT/MID Alternative)

What if you only want the first 4 letters of a customer’s ID? In Excel, you would use LEFT(A2, 4). In KNIME, we use the substr() function.

The substr() function lets you extract text by defining the starting position and the length. Remember, computers start counting at zero!

To get the first 4 characters of a “Customer ID” column, your expression will be: substr($Customer ID$, 0, 4). This tells KNIME: “Start at the very beginning (0) and grab the next 4 characters.”

3. Formatting Text (The UPPER/LOWER/PROPER Alternative)

Customer names submitted via web forms are notoriously messy. Sometimes they type “JOHN SMITH”, sometimes “john smith”. To make your reports look professional, you need to standardize the capitalization.

KNIME makes this incredibly easy with one-click functions:

  • To make everything uppercase: upper($Customer Name$)
  • To make everything lowercase: lower($Customer Name$)
  • To capitalize just the first letter of each word (Proper case): capitalize($Customer Name$)

Nesting Functions: The Ultimate DX Hack

The true power of the String Manipulation node is that you can combine (nest) these functions together in a single step.

For example, if you want to remove invisible spaces from a name AND make it uppercase at the same time, you just wrap them like an onion. We briefly touched on this in our early data types and string cleaning guide.

The expression looks like this: upper(strip($Customer Name$)). KNIME will first strip the spaces, then convert the result to uppercase. One node, multiple cleanups!

Conclusion: Your Next Steps

Congratulations! You just replaced the most annoying Excel text formulas with a single, highly visual KNIME node. By standardizing your text data, you guarantee that your future VLOOKUPs (Joiner nodes) and GroupBy summaries will be 100% accurate.

Now that your data is clean, formatted, and calculation-ready, how do you present it? Sometimes, your boss doesn’t want to see a million rows—they just want to see the Top 10 Customers or the Top 5 Products.

Are you ready to filter for the best of the best? Join me in our next tutorial where I will show you exactly how to sort data and find the Top N records in KNIME!

Copied title and URL