You can do it! Enhance your projects with XPath, regex and more
Mozenda is designed to let almost anyone harvest data from the internet, regardless of technical skill or coding experience. However, a basic understanding of a few common web languages can help you create more precise agents and tackle larger, more complex projects.
Website elements
Almost every website uses some combination of three cornerstone languages: HTML, which defines the structure of the site; CSS, which defines its appearance; and JavaScript, which defines its behavior.
HTML
Hypertext Markup Language (HTML) creates the structure or outline of a website. HTML uses nested sections to define and organize the content of the site.
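To make the idea of nested sections concrete, here is a minimal Python sketch that prints the outline of a small page. The page fragment, the `OutlineParser` class, and the `outline` function are invented for illustration and are not part of Mozenda; only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for illustration.
SAMPLE_HTML = """
<html>
  <body>
    <div class="product">
      <h2>Blue Widget</h2>
      <span class="price">$19.99</span>
    </div>
  </body>
</html>
"""


class OutlineParser(HTMLParser):
    """Records each opening tag, indented by how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Two spaces of indentation per nesting level.
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(html_text):
    """Return the tag outline of html_text as a list of indented strings."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE_HTML)))
```

Each level of indentation in the printed outline corresponds to one element sitting inside another, which is the structure an agent navigates when it locates items on a page.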
CSS
Cascading Style Sheets (CSS) is a language used to define a website's visual presentation, including its layout, fonts, and colors.
JavaScript
JavaScript (often referred to as JS) governs how a website behaves and interacts with visitors. JS scripts can do things like:
- Load new content without reloading a page
- Animate text or images on a page
- Record and transmit user actions for web analytics
- Enable interactive elements like games and media players
XPath
The XML Path Language (XPath) pinpoints the locations in a page's HTML where your agents interact.
Mozenda automatically generates XPath expressions to identify the items you select in the Agent Builder. Most of the time these paths work well, although complex websites can cause problems. In those cases, it can be helpful to edit an existing path or create a new one.
Learn more here.
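As a rough illustration of how an XPath expression selects items, the sketch below runs two paths against a small, well-formed page using Python's standard `xml.etree.ElementTree` module. The page fragment, the `select_text` helper, and both paths are assumptions made up for this example; ElementTree supports only a subset of XPath, and the expressions Mozenda generates may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment invented for illustration.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <h2>Blue Widget</h2>
      <span class="price">$19.99</span>
    </div>
    <div class="product">
      <h2>Red Widget</h2>
      <span class="price">$24.50</span>
    </div>
  </body>
</html>
"""

# Every price span that sits directly inside a product div.
PRICE_PATH = ".//div[@class='product']/span[@class='price']"

# The heading of the first div under body (positions start at 1).
FIRST_NAME_PATH = "./body/div[1]/h2"


def select_text(page, xpath):
    """Return the text of every element in page matched by xpath."""
    root = ET.fromstring(page.strip())
    return [element.text for element in root.findall(xpath)]


if __name__ == "__main__":
    print(select_text(SAMPLE_PAGE, PRICE_PATH))
    print(select_text(SAMPLE_PAGE, FIRST_NAME_PATH))
```

The first path relies on attributes (`class`), so it keeps working if more products are added. The second relies on position, so it can break if the page layout changes. That trade-off is often the reason to edit a generated path by hand.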
Text refinement & regular expressions
After you've zeroed in on the text you want to harvest, you can refine its format. Mozenda features a built-in text refinement tool and also supports regular expressions (often called RegEx).
You can use these tools to identify patterns, change the display format, and remove excess data.
Learn more about RegEx here.
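The three uses mentioned above (identifying a pattern, changing the display format, and removing excess data) can be sketched with Python's standard `re` module. The sample strings and function names are hypothetical, and the regex syntax Mozenda accepts may differ in places from Python's, so treat this as an illustration of the ideas rather than text to paste into an agent.

```python
import re

# Hypothetical harvested strings, invented for illustration.
RAW_PRICE = "Price: $1,299.00 (free shipping)"
RAW_DATE = "Posted on 03/15/2024"


def extract_price(text):
    """Identify a pattern: find a dollar amount and drop thousands separators."""
    match = re.search(r"\$([\d,]+\.\d{2})", text)
    return match.group(1).replace(",", "") if match else None


def reformat_date(text):
    """Change the display format: rewrite MM/DD/YYYY as YYYY-MM-DD."""
    return re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)


def strip_label(text):
    """Remove excess data: delete a leading 'Posted on' label."""
    return re.sub(r"^Posted on\s+", "", text)


if __name__ == "__main__":
    print(extract_price(RAW_PRICE))
    print(reformat_date(RAW_DATE))
    print(strip_label(RAW_DATE))
```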