Israeli Developers Community Conference 2009


Crawling and Parsing the Web

Uri Lavi
How to crawl and parse the web? During the session we will review web parsing technologies and web crawling architectures that will allow us to easily extract required information.

The session will outline:

Parsers
  - Methodologies: DOM, Streams, Regexes
  - Open source solutions
  - Examples

Crawling
  - Policies (selection, revisit, politeness)
  - Web-traps
  - Distributed Architecture
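To give a feel for the topics in the outline, here is a minimal Python sketch (not material from the session itself) that extracts links from a page in two ways, with a stream-style parser and with a regex, and keeps a small crawl frontier that applies a per-host politeness delay and a depth cap as a simple guard against web traps. The names `LinkParser`, `links_by_stream`, `links_by_regex`, and `PoliteFrontier`, along with the delay and depth values, are hypothetical and chosen only for illustration. Only the standard library is used and no network access is performed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import re


class LinkParser(HTMLParser):
    """Stream-style parsing: collect href values as start tags arrive."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def links_by_stream(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Regex approach: short, but only handles double-quoted href attributes.
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)


def links_by_regex(html, base_url):
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


class PoliteFrontier:
    """Hypothetical crawl frontier: dedupes URLs, caps depth, spaces out hits per host."""

    def __init__(self, delay_seconds=1.0, max_depth=3):
        self.delay_seconds = delay_seconds
        self.max_depth = max_depth
        self.seen = set()
        self.queue = deque()
        self.last_hit = {}  # host -> time of the last fetch

    def add(self, url, depth=0):
        # The depth cap is a crude guard against traps such as endless calendars.
        if depth > self.max_depth or url in self.seen:
            return False
        self.seen.add(url)
        self.queue.append((url, depth))
        return True

    def seconds_until_allowed(self, url, now):
        last = self.last_hit.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.delay_seconds - (now - last))

    def mark_fetched(self, url, now):
        self.last_hit[urlparse(url).netloc] = now
```

A real crawler would also consult robots.txt and use a proper DOM or HTML5 parser for malformed pages; the regex variant in particular is shown only to illustrate why it is the most fragile of the three methodologies.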