Managing multiple Scrapy crawlers with scrapyd

Note: the environment preparation below is based on Ubuntu 16.04

1, Installation

  1. sudo pip install scrapyd
  2. sudo pip install scrapyd-client
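
To confirm that both packages are present, pip itself can report them (a quick sanity check; pip show is a standard pip command):

  pip show scrapyd scrapyd-client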

2, Verification

At the command line, enter: scrapyd
Output like the following indicates it started successfully:

bdccl@bdccl-virtual-machine:~$ scrapyd
Removing stale pidfile /home/bdccl/twistd.pid
2017-12-15T19:01:09+0800 [-] Removing stale pidfile /home/bdccl/twistd.pid
2017-12-15T19:01:09+0800 [-] Loading /usr/local/lib/python2.7/dist-packages/scrapyd/txapp.py...
2017-12-15T19:01:10+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2017-12-15T19:01:10+0800 [-] Loaded.
2017-12-15T19:01:10+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.9.0 (/usr/bin/python 2.7.12) starting up.
2017-12-15T19:01:10+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-12-15T19:01:10+0800 [-] Site starting on 6800
2017-12-15T19:01:10+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x7f9589b0fa28>
2017-12-15T19:01:10+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=4, runner=u'scrapyd.runner'
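
With the daemon running, it can also be checked over HTTP. The sketch below assumes the daemonstatus.json endpoint, which scrapyd provides as of version 1.2:

  curl http://127.0.0.1:6800/daemonstatus.json
  # a healthy daemon replies with JSON along the lines of:
  # {"status": "ok", "running": 0, "pending": 0, "finished": 0, "node_name": "..."}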

4, Publishing a crawler

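Before a crawler can be published, the project's scrapy.cfg needs a deploy target pointing at the scrapyd server. A minimal sketch, assuming a placeholder project name myproject and a target named default:

  [deploy:default]
  url = http://127.0.0.1:6800/
  project = myproject

The project is then packaged and pushed to scrapyd from the project root using scrapyd-client:

  scrapyd-deploy default -p myproject
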
Common commands:
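
The following is a representative sketch of scrapyd's JSON API as driven by curl; myproject, myspider, JOBID, and the version r1 are placeholders:

  # schedule a spider run; the response contains a jobid
  curl http://127.0.0.1:6800/schedule.json -d project=myproject -d spider=myspider

  # cancel a running job, using the jobid returned by schedule.json
  curl http://127.0.0.1:6800/cancel.json -d project=myproject -d job=JOBID

  # list projects, and the spiders / versions / jobs of one project
  curl http://127.0.0.1:6800/listprojects.json
  curl "http://127.0.0.1:6800/listspiders.json?project=myproject"
  curl "http://127.0.0.1:6800/listversions.json?project=myproject"
  curl "http://127.0.0.1:6800/listjobs.json?project=myproject"

  # delete one version of a project, or the whole project
  curl http://127.0.0.1:6800/delversion.json -d project=myproject -d version=r1
  curl http://127.0.0.1:6800/delproject.json -d project=myproject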

PS:
*This article only records some of scrapyd's common commands. To go further, reading the official scrapyd documentation is recommended.*
