A dict containing the item pipelines to use and their orders. Order values are arbitrary, but it is customary to define them in the 0-1000 range. Pipelines with lower orders process items before those with higher orders.

File name to use for logging output. If None, standard error will be used.

Refer to the Python logging documentation for the full list of available directives.
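The pipeline ordering described above can be sketched as a settings fragment. This is a minimal illustration, assuming the dict is assigned to Scrapy's ITEM_PIPELINES setting; the module paths and class names are hypothetical placeholders, not part of any real project.

```python
# settings.py (sketch): the pipeline paths below are hypothetical placeholders.
ITEM_PIPELINES = {
    "myproject.pipelines.StoreItemPipeline": 800,
    "myproject.pipelines.ValidateItemPipeline": 300,
}

# Lower order values process first, so sorting by value yields the
# order in which an item passes through the pipelines.
processing_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(processing_order)
```

Leaving gaps between order values (300, 800) makes it easy to slot another pipeline in between later without renumbering.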

The class to use for formatting log messages for different actions.

Minimum level to log. Available levels are: CRITICAL, ERROR, WARNING, INFO, DEBUG. For more info see Logging.

If True, all standard output (and error) of your process will be redirected to the log. For example, if you call print('hello'), it will appear in the Scrapy log.

If True, the logs will contain only the root path. If set to False, they display the component responsible for the log output.

The interval (in seconds) between each logging printout of the stats.

When memory debugging is enabled, a memory report will be sent to the specified addresses if this setting is not empty; otherwise the report will be written to the log.
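The logging options above might be combined as follows. This is a sketch only: the setting names (LOG_FILE, LOG_LEVEL, LOG_STDOUT, LOG_SHORT_NAMES, LOGSTATS_INTERVAL) are assumed from Scrapy's settings reference rather than stated in the text above, and the values are arbitrary examples. The last lines use Python's standard logging module to show which of the listed levels pass a given minimum level.

```python
import logging

# settings.py (sketch): example values, not recommendations.
LOG_FILE = None            # None: log to standard error
LOG_LEVEL = "INFO"         # minimum level to log
LOG_STDOUT = False         # True would redirect print() output to the log
LOG_SHORT_NAMES = False    # False: show the component responsible for each line
LOGSTATS_INTERVAL = 60.0   # seconds between stats printouts

# Illustration: which of the available levels are at or above LOG_LEVEL.
levels = ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"]
threshold = logging.getLevelName(LOG_LEVEL)  # numeric value of the level name
emitted = [name for name in levels if logging.getLevelName(name) >= threshold]
print(emitted)
```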

This extension keeps track of the peak memory used by the process (it writes it to stats). See Memory usage extension.

If the memory limit is zero, no check will be performed.

If the warning threshold is zero, no warning will be produced.

Module where new spiders are created by the genspider command.
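The "zero disables the check" behaviour above can be illustrated with a small helper. This is a hypothetical sketch, not the extension's code: the setting names (MEMUSAGE_ENABLED, MEMUSAGE_LIMIT_MB, MEMUSAGE_WARNING_MB, NEWSPIDER_MODULE) are assumed from Scrapy's settings reference, the values are arbitrary, and `memusage_action` is an invented function used only for illustration.

```python
# settings.py (sketch): example values; "myproject.spiders" is a placeholder.
MEMUSAGE_ENABLED = True
MEMUSAGE_LIMIT_MB = 2048      # 0 would disable the limit check
MEMUSAGE_WARNING_MB = 1536    # 0 would disable the warning
NEWSPIDER_MODULE = "myproject.spiders"


def memusage_action(peak_mb, limit_mb, warning_mb):
    """Hypothetical helper: decide what a memory check would do.

    A threshold of zero is treated as "disabled", mirroring the text above.
    """
    if limit_mb and peak_mb > limit_mb:
        return "limit"
    if warning_mb and peak_mb > warning_mb:
        return "warning"
    return None


print(memusage_action(1800, MEMUSAGE_LIMIT_MB, MEMUSAGE_WARNING_MB))
print(memusage_action(1800, 0, 0))
```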

This randomization decreases the chance of the crawler being detected (and subsequently blocked) by sites that analyze requests looking for statistically significant similarities in the time between them. The randomization policy is the same one used by the wget --random-wait option.

The maximum size of the Twisted Reactor thread pool. This is a common multi-purpose thread pool used by various Scrapy components, such as the threaded DNS resolver, BlockingFeedStorage, and S3FilesStore, to name a few.
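The delay randomization mentioned above can be sketched in a few lines. This assumes the range given in Scrapy's settings reference (a random wait between 0.5 and 1.5 times the configured download delay), which the text above does not state; `next_delay` is a hypothetical helper written for illustration, not Scrapy's own function.

```python
import random

# settings.py (sketch): example values.
DOWNLOAD_DELAY = 2.0
RANDOMIZE_DOWNLOAD_DELAY = True


def next_delay(base_delay, randomize=True, rng=random):
    """Hypothetical helper: pick the wait before the next request.

    Assumes the randomized wait is drawn uniformly from
    [0.5 * base_delay, 1.5 * base_delay].
    """
    if randomize:
        return rng.uniform(0.5 * base_delay, 1.5 * base_delay)
    return base_delay


delays = [next_delay(DOWNLOAD_DELAY, RANDOMIZE_DOWNLOAD_DELAY) for _ in range(5)]
print(delays)
```

Because each wait is drawn independently, the gaps between requests no longer form a fixed, easily recognisable interval.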

For more information see RobotsTxtMiddleware. While the default value is False for historical reasons, this option is enabled by default in settings.

The parser backend to use for parsing robots.txt.

The user agent string to use for matching in the robots.txt file.

Setting to True will log debug information about the requests scheduler.
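How a user agent string is matched against robots.txt rules can be illustrated with Python's standard-library parser. This is a sketch under assumptions: the setting names (ROBOTSTXT_OBEY, ROBOTSTXT_USER_AGENT) come from Scrapy's settings reference, the bot name and rules are made up, and `urllib.robotparser` stands in for whichever parser backend a project actually configures.

```python
from urllib.robotparser import RobotFileParser

# settings.py (sketch): "examplebot" is a made-up user agent.
ROBOTSTXT_OBEY = True
ROBOTSTXT_USER_AGENT = "examplebot"

# A made-up robots.txt: examplebot may fetch everything except /private/,
# while all other agents are disallowed entirely.
ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

public_ok = parser.can_fetch(ROBOTSTXT_USER_AGENT, "https://example.com/public/page")
private_ok = parser.can_fetch(ROBOTSTXT_USER_AGENT, "https://example.com/private/page")
other_ok = parser.can_fetch("otherbot", "https://example.com/public/page")
print(public_ok, private_ok, other_ok)
```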

This currently logs (only once) if the requests cannot be serialized to disk.

Type of disk queue that will be used by the scheduler. Other available types are scrapy.

Type of memory queue used by the scheduler. Other available type is: scrapy.

Type of priority queue used by the scheduler. Another available type is scrapy. DownloaderAwarePriorityQueue works better than ScrapyPriorityQueue when you crawl many different domains in parallel.

While the sum of the sizes of all requests being processed is above this value, Scrapy does not process new requests.

A dict containing the spider contracts enabled in your project, used for testing spiders. For more info see Spiders Contracts.

The class that will be used for loading spiders, which must implement the SpiderLoader API.

Some Scrapy commands already run with this setting set to True.

A dict containing the spider middlewares enabled in your project, and their orders. For more info see Activating a spider middleware. Low orders are closer to the engine, high orders are closer to the spider.

The class to use for collecting stats, which must implement the Stats Collector API.

Dump the Scrapy stats (to the Scrapy log) once the spider finishes. For more info see Stats Collection.

See StatsMailer for more info.
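The spider middleware ordering described above might look like this in a settings module. This is a sketch: the setting names (SPIDER_MIDDLEWARES, STATS_DUMP) are assumed from Scrapy's settings reference, and the middleware paths are hypothetical placeholders.

```python
# settings.py (sketch): the middleware paths below are hypothetical placeholders.
SPIDER_MIDDLEWARES = {
    "myproject.middlewares.NearSpiderMiddleware": 900,
    "myproject.middlewares.NearEngineMiddleware": 100,
}

# Dump the collected stats to the log once the spider finishes.
STATS_DUMP = True

# Low orders sit closer to the engine and high orders closer to the spider,
# so sorting by value lists the middlewares from the engine side to the spider side.
engine_to_spider = sorted(SPIDER_MIDDLEWARES, key=SPIDER_MIDDLEWARES.get)
print(engine_to_spider)
```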



