
Screen Scraping

Everyone uses regular expressions to match and pull text out of web documents. This usually works, but it is painful and tends to produce a pile of special cases rather than anything that generalizes. The 34th time you do it takes nearly as much effort as the first, though by then you hate yourself a lot more for it. Open question: HTML generally carries enough structure to render, and the rendered appearance is often the layer of structure being targeted (people tend to encode this kind of meaning for other people, not for machine readability). So shouldn't there be a graceful way to do this from within the browser?
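To make the special-case problem concrete, here is a minimal sketch in Python using only the standard library. It is not an answer to the browser question, just a contrast between matching raw text and leaning on the document's own structure. The sample markup, the "item" class name, and the names ItemParser and scrape_items are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page; the markup and the "item" class are invented.
SAMPLE = """
<html><body>
<h1>Prices</h1>
<ul>
  <li class="item">Widget <b>$4.00</b></li>
  <li class="item"
      id="x">Gadget <b>$9.50</b></li>
</ul>
</body></html>
"""

# Regex approach: matches only while the markup keeps this exact shape.
# The second <li> has an extra attribute and a line break, so it is missed.
REGEX = re.compile(r'<li class="item">(\w+) <b>(\$[\d.]+)</b></li>')


class ItemParser(HTMLParser):
    """Collect the visible text of each <li class="item"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0  # tag nesting depth inside the current item; 0 = outside
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already inside an item: track nesting so we know when it closes.
            # Caveat: unclosed void tags such as <br> would throw this count off.
            self._depth += 1
        elif tag == "li" and ("class", "item") in attrs:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Item closed: join its text and collapse runs of whitespace.
                self.items.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def scrape_items(html):
    """Return the text of every item found in the given HTML string."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(REGEX.findall(SAMPLE))
    print(scrape_items(SAMPLE))
```

The parser version keys on the element and its class rather than on the exact characters around it, so attribute order and whitespace stop mattering. It still knows nothing about how the page looks when rendered, which is the part the open question is really after.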

<Brennen> One conclusion is that I probably need to learn JavaScript.

last edited March 6, 2007