You can do it! Enhance your projects with XPath, regex and more
Mozenda is designed to let almost anyone harvest data from the internet, regardless of technical skill or coding experience. However, a basic understanding of a few common web languages can help you create more precise agents and tackle more complex projects.
Hypertext Markup Language (HTML) creates the structure or outline of a website. HTML uses nested sections to define and organize the content of the site.
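To make the idea of nested sections concrete, here is a minimal sketch in Python that prints the tag outline of a small page. The sample markup and the names `OutlineParser` and `html_outline` are made up for illustration and are not part of Mozenda.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects an indented outline of opening tags to show how HTML nests.

    Assumes every opening tag has a matching closing tag, which holds for
    the sample below but not for void tags such as <br> or <img>.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        # Indent each tag by two spaces per level of nesting.
        self.outline.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


# Hypothetical page used only for this illustration.
SAMPLE_HTML = "<html><body><div><h1>Products</h1><p>Widget</p></div></body></html>"


def html_outline(html):
    """Return the nested tag outline of an HTML string as a list of lines."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


print("\n".join(html_outline(SAMPLE_HTML)))
```

Each level of indentation in the printed outline corresponds to one section sitting inside another.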
Cascading Style Sheets (CSS) is a language used to define a website's visual presentation, including its layout, fonts, and colors.
JavaScript is a programming language that adds dynamic behavior to a website. It can be used to:
- Load new content without reloading a page
- Animate text or images on a page
- Record and transmit user actions for web analytics
- Enable interactive elements like games and media players
The XML Path Language (XPath) pinpoints locations in the HTML pages your agents interact with.
Mozenda automatically generates XPath expressions to identify the items you select in the Agent Builder. Most of the time, these paths work perfectly. Depending on the complexity of the target website, however, there may be times when it's helpful to edit an existing path or create a new one.
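As a rough illustration of what an XPath expression does, the sketch below runs a few paths against a small, made-up page. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup. The sample page, the `select_text` helper, and the paths are assumptions for this example; they are not the expressions the Agent Builder generates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for this illustration.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <span class="name">Widget</span>
      <span class="price">$1,299.00</span>
    </div>
    <div class="product">
      <span class="name">Gadget</span>
      <span class="price">$45.50</span>
    </div>
  </body>
</html>
"""


def select_text(markup, path):
    """Return the text of every element matched by an XPath-style path."""
    root = ET.fromstring(markup.strip())
    return [element.text for element in root.findall(path)]


# Attribute-based path: matches every product name, however many there are.
names = select_text(SAMPLE_PAGE, ".//div[@class='product']/span[@class='name']")

# Shorter path that targets the price spans directly.
prices = select_text(SAMPLE_PAGE, ".//span[@class='price']")

# Position-based path: matches only the second div, so it can break
# if the site reorders its content.
second_name = select_text(SAMPLE_PAGE, ".//div[2]/span[@class='name']")

print(names)
print(prices)
print(second_name)
```

The contrast between the attribute-based and position-based paths shows one reason to edit a path by hand: a path tied to an element's position is more fragile than one tied to a stable attribute.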
Text refinement & regular expressions
After you've zeroed in on the text you want to harvest, you have the option of refining its format. Mozenda features a built-in text refinement tool and also supports regular expressions (often called regex).
You can use these tools to identify patterns, change the display format, and remove excess data.
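The three uses above can be sketched with Python's `re` module. This is an illustration only: the sample strings, patterns, and function names are assumptions, and the regex syntax Mozenda accepts may differ in its details from Python's.

```python
import re

# Matches a dollar amount such as $45.50 or $1,299.00 and captures the digits.
PRICE_PATTERN = re.compile(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)")

# Matches a US-style date (MM/DD/YYYY) and captures month, day, and year.
DATE_PATTERN = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def first_price(text):
    """Identify a pattern: return the first price without '$' or commas."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def to_iso_date(text):
    """Change the display format: rewrite MM/DD/YYYY as YYYY-MM-DD."""
    return DATE_PATTERN.sub(r"\3-\1-\2", text)


def collapse_whitespace(text):
    """Remove excess data: squeeze runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical harvested strings.
print(first_price("  Price: $1,299.00 USD (was $1,499.00)  "))
print(to_iso_date("Updated 03/15/2024"))
print(collapse_whitespace("  Widget \n  Deluxe\t Edition "))
```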