Thursday, 22 August 2013

Scrapy - accessing spider log within custom middleware

I've written some custom middleware - a replacement for the
scrapy.contrib.downloadermiddleware.stats module. It works, but I'm having
an OOP failure: I can't figure out how to access the project log from
within the module I'm adding.
The log is defined and instantiated in my spider, like this:
import logging
from scrapy.log import ScrapyFileLogObserver
from scrapy import log
# ...other imports here...

class MySpider(BaseSpider):
    name = 'myspider'

    def __init__(self):
        ScrapyFileLogObserver(open('mylogfile.log', 'wb'),
                              level=logging.DEBUG).start()
The log works fine for everything in my spider's class. My middleware is
in a different class, and a different module. For argument's sake, let's
say both class and module are carbon copies of
scrapy.contrib.downloadermiddleware.stats sitting in my project,
pre-empting the 'factory' one. To get any of the methods to write to the
log, I suspect I would have to do something with this:
@classmethod
def from_crawler(cls, crawler):
    if not crawler.settings.getbool('DOWNLOADER_STATS'):
        raise NotConfigured
    return cls(crawler.stats)
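Roughly what I'm after, as a stdlib-only sketch with made-up names (this is not Scrapy's real stats middleware, just the shape of it): a module-level logger in the middleware module, relying on the fact that Python's logging tree is global, so any handler set up elsewhere in the project also receives these records - no instance wiring between spider and middleware needed.

```python
import logging

# Module-level logger: the logging tree is process-global, so a handler
# attached in the spider (or on the root logger) also sees these records.
logger = logging.getLogger("myproject.middleware")


class ExceptionLoggingMiddleware(object):
    """Sketch of a downloader middleware that logs every exception
    together with the request's metadata. Names are illustrative."""

    def process_exception(self, request, exception, spider):
        # type(exception).__name__ gives the exception class string
        # without needing to know the exception types in advance.
        logger.error("exception %s for %s meta=%r",
                     type(exception).__name__,
                     getattr(request, "url", "?"),
                     getattr(request, "meta", {}))
        return None  # let other middlewares / errbacks see it too
```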
I've tried a lot of different combinations of importing the various logging
pieces within the module, then calling self.log.msg or log.msg and so on,
but I'm just not hitting on it. I need to log from the process_exception
method in particular. I'm also not sure whether I need to import all the
same logging machinery again at all, or whether I should be logging in a
completely different way project-wide (best practices?).
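If the project-wide route is the right one, the plain-stdlib approach I'm considering (assuming it's acceptable to bypass ScrapyFileLogObserver entirely; the file name just matches the spider above) is: attach one file handler to the root logger at startup, then have every module grab its own named logger.

```python
import logging

# One-time setup (e.g. in the spider's __init__): attach a single file
# handler to the root logger so every module's logger feeds the same file.
handler = logging.FileHandler("mylogfile.log", mode="w")
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(name)s %(levelname)s %(message)s"))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Then any module -- middleware included -- just does:
logger = logging.getLogger(__name__)
logger.debug("middleware saw a response")
```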
Last thing. My actual problem is that I need to capture ALL Twisted
exceptions (without knowing beforehand what they might be) and match them
with metadata that I've sent through in the request. Logging from stats,
and simply logging the actual ex_class string alongside my metadata, seems
like the way to go, but maybe there's a better approach. I tried adapting
this, but again, I don't know in advance what all the errors might be, and
I also believe that some errors are silently discarded by Scrapy.
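To sketch what I mean by matching the ex_class string with request metadata (made-up names, stdlib only; in real code this would presumably write to crawler.stats rather than a dict):

```python
import collections


class ExceptionStats(object):
    """Stand-in for a stats object: group each request's metadata under
    the full class name of the exception it raised."""

    def __init__(self):
        self.by_class = collections.defaultdict(list)

    def record(self, request_meta, exception):
        # No advance knowledge of exception types is needed: the class
        # name comes from the exception instance itself.
        ex_class = "%s.%s" % (type(exception).__module__,
                              type(exception).__name__)
        self.by_class[ex_class].append(dict(request_meta))
```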
