You can do it! Enhance your projects with XPath, regex and more
  • 25 May 2021

Mozenda is designed to allow almost anyone to harvest data from the internet, regardless of technical skill or coding experience. However, a basic understanding of a few common web languages can help you create more precise agents and tackle more complex projects.

Website elements

Almost every website on the internet uses some combination of three cornerstone languages: HTML, which defines the structure of the site; CSS, which defines its appearance; and JavaScript, which defines its behavior.

HTML

Hypertext Markup Language (HTML) creates the structure or outline of a website. HTML uses nested sections to define and organize the content of the site.
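
To make the idea of nesting concrete, here is a minimal sketch in Python that prints the tag outline of a small page. The sample markup, the `OutlineParser` class, and the `outline` function are hypothetical names made up for illustration; they are not part of Mozenda. The sketch assumes every tag is explicitly closed, so it would miscount depth on unclosed tags such as a bare `<br>`.

```python
from html.parser import HTMLParser

# Hypothetical sample page, used only for illustration.
SAMPLE = "<html><body><div><h1>Title</h1><p>Text</p></div></body></html>"


class OutlineParser(HTMLParser):
    """Records each opening tag, indented by how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        # Two spaces of indentation per nesting level.
        self.outline.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes every tag in the input is explicitly closed.
        self.depth -= 1


def outline(markup):
    parser = OutlineParser()
    parser.feed(markup)
    return parser.outline


for line in outline(SAMPLE):
    print(line)
```

Each level of indentation in the printed outline corresponds to one element sitting inside another, which is the structure an agent navigates when it selects items on a page.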

CSS

Cascading Style Sheets (CSS) is a language used to define a website's visual presentation, including its layout, fonts, and colors.

JavaScript

JavaScript (often referred to as JS) governs how a website behaves and interacts with visitors. JS scripts can do things like:

  • Load new content without reloading a page
  • Animate text or images on a page
  • Record and transmit user actions for web analytics
  • Enable interactive elements like games and media players

XPath

The XML Path Language (XPath) identifies specific locations in a page's HTML, such as the elements your agents interact with.

Mozenda automatically generates XPath expressions to identify the items you select in the Agent Builder. These paths usually work as generated, though how well they work depends on the complexity of the target website. At times it's helpful to edit an existing path or create a new one.
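
As a rough illustration of what an XPath expression does, the sketch below selects elements from a small, well-formed page using Python's standard library. The sample markup, the class names, and the `select_text` helper are assumptions made up for this example, and the paths are not the ones Mozenda would generate. Note also that `xml.etree.ElementTree` supports only a subset of XPath and expects well-formed XML, which many real web pages are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for illustration.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="name">Widget</h2>
      <span class="price">$19.99</span>
    </div>
    <div class="product">
      <h2 class="name">Gadget</h2>
      <span class="price">$5.00</span>
    </div>
  </body>
</html>
"""


def select_text(markup, path):
    """Return the text of every element matched by an XPath-style path."""
    root = ET.fromstring(markup.strip())
    return [element.text for element in root.findall(path)]


# Every <h2> directly inside a <div class="product">, at any depth.
names = select_text(PAGE, ".//div[@class='product']/h2")

# Narrow further by attribute on the final step.
prices = select_text(PAGE, ".//div[@class='product']/span[@class='price']")

print(names)
print(prices)
```

Editing a path usually means loosening or tightening a step like the ones above, for example by matching on a class attribute instead of a fixed position in the page.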

Learn more here.

Text refinement & regular expressions

After you've zeroed in on the text you want to harvest, you can refine its format. Mozenda features a built-in text refinement tool and also supports regular expressions (often called RegEx).

You can use these tools to identify patterns, change the display format, and remove excess data.
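
The three uses above (identifying a pattern, changing the display format, and removing excess data) can be sketched with Python's `re` module. The function names and sample strings are hypothetical, and the regex flavor Mozenda uses may differ from Python's in some details, so treat the patterns as illustrations and not as drop-in expressions.

```python
import re


def extract_price(text):
    """Identify a pattern: pull a dollar amount out of surrounding text."""
    match = re.search(r"\$\s*([\d,]+(?:\.\d{2})?)", text)
    if match is None:
        return None
    # Drop thousands separators so the value is easier to store as a number.
    return match.group(1).replace(",", "")


def reformat_date(text):
    """Change the display format: MM/DD/YYYY becomes YYYY-MM-DD."""
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\1-\2", text)


def collapse_whitespace(text):
    """Remove excess data: squeeze runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


print(extract_price("Price: $1,299.00 (incl. tax)"))
print(reformat_date("Updated 05/25/2021"))
print(collapse_whitespace("  Blue   Widget \n Deluxe "))
```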

Learn more about RegEx here.

