Introduction
Hi, I am Akira, the editor-in-chief of Data Without Code. Over the last few tutorials, we have built some incredibly powerful tools. You now know how to combine multiple files, clean the data, and even automate email sending directly from a KNIME workflow.
But all of these workflows assume one thing: you already have the data in a CSV, Excel file, or Google Sheet.
What happens when the data you need is trapped on a public website, such as competitor prices you want to monitor, financial tables on Yahoo Finance, or a list of real estate properties? If you are still highlighting text in your browser, hitting Ctrl+C, and pasting it into Excel, you are stuck in the ultimate manual data entry nightmare.
In the tech world, automating this is called Web Scraping. People assume you need to learn Python to do this, but as a DX manager, I am here to tell you that you don’t. In this Automation Hack, I will show you the basics of how to extract data from a website using KNIME Analytics Platform.
Step 1: Install the Web Scraping Extensions
To communicate with websites, we need to add a few special “Lego blocks” to our workspace. Just as in my guide on how to install extensions in KNIME Analytics Platform, open the installation menu and search for these two extensions:
- KNIME REST Client Extension (Contains the GET Request node).
- KNIME XML-Processing (Contains the XPath node).
Install these, restart KNIME, and you are ready to scrape the web!
Step 2: Download the Webpage (GET Request Node)
When you visit a website in your browser, your computer sends a request to a server, and the server sends back the website’s code (HTML). We need KNIME to do exactly the same thing.
Search for the GET Request node and drag it onto your canvas. Double-click to open the configuration.
In the URL box, paste the web address of the page you want to scrape (e.g., https://example.com/pricing). Click OK, then press F7 to execute the node.
When you view the output table, you will see a single row. One of the columns will contain a massive block of confusing text. That is the raw HTML code of the website! KNIME has successfully downloaded the page.
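If you are curious what the GET Request node is doing behind the scenes, here is a rough Python equivalent using only the standard library. You do not need it for the KNIME workflow. To keep the sketch self-contained and runnable offline, it starts a tiny local server that returns a hypothetical pricing page instead of contacting a real website; `PAGE`, `PricingPage`, and `fetch_html` are illustrative names, not part of KNIME.

```python
import http.server
import threading
import urllib.request

# Hypothetical stand-in for a real page such as https://example.com/pricing
PAGE = b"<html><body><h1>Pricing</h1><span class='price'>$99</span></body></html>"


class PricingPage(http.server.BaseHTTPRequestHandler):
    """A tiny local web server that returns PAGE for every GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default request log


# Ignore any system proxy so the request goes straight to the local server.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_html(url: str) -> str:
    """Send an HTTP GET request and return the response body as text."""
    with DIRECT.open(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Start the local server on a free port (port 0 lets the OS choose one).
server = http.server.HTTPServer(("127.0.0.1", 0), PricingPage)
threading.Thread(target=server.serve_forever, daemon=True).start()

html = fetch_html(f"http://127.0.0.1:{server.server_port}/pricing")
server.shutdown()
server.server_close()
print(html)  # the raw HTML: one big block of text, just like the node's output
```

In a real scrape you would pass the site's URL to `fetch_html` directly; the local server only exists so the example runs anywhere.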
Step 3: Extract the Data You Need (XPath Node)
Now that we have the entire webpage inside KNIME, we need to extract the specific piece of data we want (like a price or a product name) from that giant block of code.
We do this using the XPath node. XPath is a small query language that works like a map: it tells the computer exactly where to look inside the HTML code.
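To see the “map” idea in action, here is a minimal sketch with Python's built-in `xml.etree.ElementTree`, which understands a limited subset of XPath. The HTML snippet and its `price` class are made up for illustration. The snippet is deliberately well-formed, because ElementTree rejects markup that is not valid XML, and many real pages are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet of a pricing page.
HTML = """
<html>
  <body>
    <div class="header"><h1>Pricing</h1></div>
    <div class="plans">
      <span class="plan">Basic</span>
      <span class="price">$99</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(HTML.strip())

# ".//span[@class='price']" reads as: anywhere below the root,
# find a <span> whose class attribute is "price".
price = root.find(".//span[@class='price']").text
print(price)  # $99
```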
Akira’s DX Hack: Getting the XPath Without Coding
You do not need to learn how to write XPath. Here is the ultimate non-programmer trick:
- Open the website in your Google Chrome browser.
- Right-click on the exact piece of data you want (e.g., the $99 price tag) and select “Inspect”.
- A panel will open highlighting a specific line of code. Right-click that highlighted code, select “Copy”, and then click “Copy XPath”.
Now go back to KNIME, add an XPath node, connect it to your GET Request node, and double-click it to open its configuration. Click “Add XPath”, paste the code you just copied from Chrome into the XPath box, and set the output type to “String”.
Execute the node. Boom! KNIME instantly searches the messy HTML and outputs a clean new column containing only the $99 price tag.
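Chrome's “Copy XPath” often returns an absolute path such as `/html/body/div[2]/span[2]` when the element has no `id` (with an `id`, you typically get something like `//*[@id="price"]` instead). The sketch below, again using Python's standard library and a made-up page, walks one of these absolute paths and returns the element's text, roughly what choosing the “String” output type gives you. The helper `extract_string` is hypothetical, and it only handles absolute paths that start at the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed pricing page.
HTML = """<html>
  <body>
    <div class="header"><h1>Pricing</h1></div>
    <div class="plans">
      <span class="plan">Basic</span>
      <span class="price">$99</span>
    </div>
  </body>
</html>"""

# The kind of path "Copy XPath" might give for the price (illustrative value).
copied_xpath = "/html/body/div[2]/span[2]"


def extract_string(document: str, absolute_xpath: str) -> str:
    """Evaluate a Chrome-style absolute XPath and return the element's text.

    ElementTree only accepts paths relative to an element, so the leading
    "/html/" step (the root itself) is stripped before searching.
    """
    root = ET.fromstring(document)
    prefix = f"/{root.tag}/"
    if not absolute_xpath.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}")
    element = root.find(absolute_xpath[len(prefix):])
    if element is None:
        raise LookupError(f"nothing matches {absolute_xpath!r}")
    # Like a "String" result: join all text inside the element.
    return "".join(element.itertext()).strip()


print(extract_string(HTML, copied_xpath))  # $99
```

Note that `div[2]` means “the second `div` child”, counting from 1, which is why a copied path can break if the site later adds or removes a block above your price.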
Conclusion: Your Next Steps
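Congratulations! You have just executed your first web scraping workflow without writing a single line of Python. By combining the GET Request and XPath nodes, you can pull text, links, and prices from most standard websites. The main exception is content that a page loads later with JavaScript: the GET Request node only receives the HTML the server sends, so anything your browser adds afterwards will not be there.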
(Disclaimer: Always check a website’s Terms of Service before scraping, as some sites explicitly forbid automated data extraction!)
Now you know how to get data from local folders, Google Sheets, and the public internet. But what about the heart of your own company’s IT infrastructure?
If your company stores its official data in a secure internal database (like MySQL or PostgreSQL), you don’t need to beg the IT team for a CSV export every week. Join me in the next tutorial, where I will show you how to connect KNIME to SQL databases without code!
