Crawling and Parsing the Web
Uri Lavi
How do you crawl and parse the web? In this session we will review web parsing technologies and web crawling architectures that make it easy to extract the information we need. The session will cover:

- Parsers
  - Methodologies: DOM, streams, regexes
  - Open source solutions
  - Examples
- Crawling
  - Policies (selection, revisit, politeness)
  - Web traps
  - Distributed architecture