Crawler Python API

Getting started with Crawler is easy. The main helper you need to care about is

crawler.utils.should_ignore(ignore_list, url)

Returns True if the URL should be ignored.

Parameters:
  • ignore_list – The list of regexes that mark URLs to ignore.
  • url – The fully qualified URL to compare against.

>>> should_ignore(['blog/$'], 'http://ericholscher.com/blog/')
True

>>> should_ignore(['home'], 'http://ericholscher.com/blog/')
False
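
Behind the doctest above, should_ignore only has to try each pattern in turn with re.search. A minimal sketch, assuming the ignore list holds plain re-style expressions (an illustration, not necessarily the shipped implementation):

import re

def should_ignore(ignore_list, url):
    # A URL is ignored as soon as any pattern in the list matches it.
    for pattern in ignore_list:
        if re.search(pattern, url):
            return True
    return False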

The log helper prints the status of each crawled URL, prefixed with OK for successful responses and ERR for errors:

>>> log('http://ericholscher.com/blog/', 200)
OK: 200 http://ericholscher.com/blog/

>>> log('http://ericholscher.com/blog/', 500)
ERR: 500 http://ericholscher.com/blog/
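
A minimal sketch of log that reproduces the output above (treating 200 as the only success code is an assumption made here for illustration):

def log(url, status):
    # Assumption: 200 counts as success, anything else is an error.
    prefix = 'OK' if status == 200 else 'ERR'
    print('%s: %s %s' % (prefix, status, url))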

The other directive is testcode:

log('http://ericholscher.com/blog/', 500)

which requires a separate testoutput block with the expected output:

ERR: 500 http://ericholscher.com/blog/
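
In the reStructuredText source, the pair would look like this, using the testcode and testoutput directives provided by sphinx.ext.doctest:

.. testcode::

   log('http://ericholscher.com/blog/', 500)

.. testoutput::

   ERR: 500 http://ericholscher.com/blog/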
