Note: the environment setup below is based on Ubuntu 16.04
1, Installation
- sudo pip install scrapyd
- sudo pip install scrapyd-client

2, Verification
Enter at the command line: scrapyd
Output like the following indicates that scrapyd started successfully:
```
bdccl@bdccl-virtual-machine:~$ scrapyd
Removing stale pidfile /home/bdccl/twistd.pid
2017-12-15T19:01:09+0800 [-] Removing stale pidfile /home/bdccl/twistd.pid
2017-12-15T19:01:09+0800 [-] Loading /usr/local/lib/python2.7/dist-packages/scrapyd/txapp.py...
2017-12-15T19:01:10+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2017-12-15T19:01:10+0800 [-] Loaded.
2017-12-15T19:01:10+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.9.0 (/usr/bin/python 2.7.12) starting up.
2017-12-15T19:01:10+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-12-15T19:01:10+0800 [-] Site starting on 6800
2017-12-15T19:01:10+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site instance at 0x7f9589b0fa28>
2017-12-15T19:01:10+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=4, runner=u'scrapyd.runner'
```
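Besides reading the log, you can confirm that the service answers over HTTP. Below is a minimal sketch, assuming the third-party requests package is installed; scrapyd 1.2 exposes a daemonstatus.json endpoint for this:

```python
import requests

# A healthy scrapyd daemon answers with status "ok" plus job counters.
status = requests.get("http://127.0.0.1:6800/daemonstatus.json").json()
print(status)  # e.g. {"status": "ok", "pending": 0, "running": 0, "finished": 0}
```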
4, Publish crawler

Common commands:
- Deploy a crawler to scrapyd:
First, switch to the root directory of the crawler project and edit scrapy.cfg, uncommenting the following line: url = http://localhost:6800/
Then run the following command in the terminal:
scrapyd-deploy <target> -p PROJECT_NAME
(target is the deploy target label; it corresponds to the [deploy] section in scrapy.cfg and is optional. A sample scrapy.cfg follows this list.)
Then open http://localhost:6800/ or http://127.0.0.1:6800/ in the browser to view the execution status of crawler tasks and the job_id of each run.
- List the available deploy targets:
scrapyd-deploy -l
- Start a crawler (see the Python sketch after this list):
curl http://localhost:6800/schedule.json -d project=PROJECT_NAME -d spider=SPIDER_NAME
- Stop a running crawler:
curl http://localhost:6800/cancel.json -d project=PROJECT_NAME -d job=JOB_ID
- Delete a project:
curl http://localhost:6800/delproject.json -d project=PROJECT_NAME
- List the deployed projects:
curl http://localhost:6800/listprojects.json
- List the spiders in a project:
curl http://localhost:6800/listspiders.json?project=PROJECT_NAME
- List the jobs of a project (see the polling sketch after this list):
curl http://localhost:6800/listjobs.json?project=PROJECT_NAME
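For the deploy step above, a minimal scrapy.cfg might look like the sketch below. PROJECT_NAME is a placeholder, and the [deploy:demo] variant with the label demo is only an illustration:

```ini
[settings]
default = PROJECT_NAME.settings

# A plain [deploy] section is used when scrapyd-deploy runs without a target.
# Naming it [deploy:demo] instead would make "demo" the <target> argument.
[deploy]
url = http://localhost:6800/
project = PROJECT_NAME
```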
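The schedule.json and cancel.json calls above can also be made from Python instead of curl. A minimal sketch, again assuming the requests package and the placeholder names PROJECT_NAME and SPIDER_NAME:

```python
import requests

BASE = "http://localhost:6800"

# schedule.json starts a spider; the response carries the job id of the new run.
resp = requests.post(BASE + "/schedule.json",
                     data={"project": "PROJECT_NAME", "spider": "SPIDER_NAME"})
job_id = resp.json()["jobid"]
print("started job: %s" % job_id)

# cancel.json stops that run, identified by project and job id.
requests.post(BASE + "/cancel.json",
              data={"project": "PROJECT_NAME", "job": job_id})
```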
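Since listjobs.json reports pending, running, and finished jobs separately, it can be polled to wait for a run to complete. A sketch under the same assumptions; wait_for_job is a hypothetical helper, not part of scrapyd:

```python
import time

import requests

def wait_for_job(project, job_id, base="http://localhost:6800"):
    """Poll listjobs.json until the given job appears in the finished list."""
    while True:
        jobs = requests.get(base + "/listjobs.json",
                            params={"project": project}).json()
        if any(job["id"] == job_id for job in jobs.get("finished", [])):
            return
        time.sleep(5)  # be gentle with the daemon

# Usage: wait_for_job("PROJECT_NAME", job_id)
```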
PS:
*This article only records some common scrapyd commands. To learn more about scrapyd, it is recommended to read the official scrapyd documentation.*