Crawler Python API
Getting started with Crawler is easy. The main class you need to care about is
crawler.utils.should_ignore(ignore_list, url)
Returns True if the URL should be ignored.
Parameters:
- ignore_list – The list of regexes to ignore.
- url – The fully qualified URL to compare against.
>>> should_ignore(['blog/$'], 'http://ericholscher.com/blog/')
True
>>> should_ignore(['home'], 'http://ericholscher.com/blog/')
False
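The behaviour shown in the examples above can be reproduced with a short sketch. This is an assumption about how such a check might be written, not the library's actual source: it treats every entry in ignore_list as a regular expression and searches the whole URL for it, which is what lets 'blog/$' match at the end of the string.

```python
import re


def should_ignore(ignore_list, url):
    """Return True if any regex in ignore_list matches somewhere in url.

    Illustrative sketch only; the real crawler.utils.should_ignore
    may be implemented differently.
    """
    # re.search scans the entire URL, so a pattern such as 'blog/$'
    # can match at the end; re.match would only try the start.
    return any(re.search(pattern, url) for pattern in ignore_list)


print(should_ignore(['blog/$'], 'http://ericholscher.com/blog/'))
print(should_ignore(['home'], 'http://ericholscher.com/blog/'))
```

With an empty ignore_list this sketch returns False, so nothing is ignored by default.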
>>> log('http://ericholscher.com/blog/', 200)
OK: 200 http://ericholscher.com/blog/
>>> log('http://ericholscher.com/blog/', 500)
ERR: 500 http://ericholscher.com/blog/
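A minimal sketch of a logger that produces the two lines above might look like the following. The helper format_log_line is hypothetical, and the cut-off between OK and ERR (status codes below 400 count as OK) is an assumption; the examples only show 200 and 500.

```python
def format_log_line(url, status):
    """Build the log line for a response (hypothetical helper)."""
    # Assumption: any status below 400 is reported as OK,
    # everything else as ERR.
    prefix = "OK" if status < 400 else "ERR"
    return "%s: %d %s" % (prefix, status, url)


def log(url, status):
    """Print the formatted line to the console."""
    print(format_log_line(url, status))


log('http://ericholscher.com/blog/', 200)
log('http://ericholscher.com/blog/', 500)
```

Keeping the formatting separate from the print call makes the output easy to check in a test without capturing stdout.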
The other directive is testcode:
log('http://ericholscher.com/blog/', 500)
It requires a separate testoutput block:
ERR: 500 http://ericholscher.com/blog/
If I add this text and push, will it automatically appear in the docs?