{"id":209,"date":"2020-08-13T10:01:53","date_gmt":"2020-08-13T01:01:53","guid":{"rendered":"http:\/\/localhost:8000\/?p=209"},"modified":"2021-01-16T10:07:31","modified_gmt":"2021-01-16T01:07:31","slug":"scrapy-manual","status":"publish","type":"post","link":"http:\/\/localhost:8000\/2020\/08\/scrapy-manual.html","title":{"rendered":"How to Use Scrapy"},"content":{"rendered":"

Scrapy<\/a> is a web scraping framework used to crawl websites and extract structured data from their pages. There are already plenty of easy-to-follow articles about Scrapy, so this post focuses on presenting lots of implementation samples.
\n<\/p>\n

Installation<\/h2>\n
pip install scrapy\n# or\npoetry add scrapy<\/code><\/pre>\n

Trying the Tutorial<\/h2>\n

Following the steps here<\/a>, let's work through the tutorial.<\/p>\n

scrapy startproject tutorial\n# or\npoetry run scrapy startproject tutorial<\/code><\/pre>\n

Running this creates a tutorial<\/code> folder containing a complete set of template source files.<\/p>\n

Inside the tutorial\/spiders<\/code> folder, create quotes_spider.py<\/code> with the following content:<\/p>\n

import scrapy\nfrom typing import List\n\nclass QuotesSpider(scrapy.Spider):\n    # Unique name used to run this spider: scrapy crawl quotes\n    name: str = "quotes"\n\n    def start_requests(self):\n        # URLs to request first\n        urls: List[str] = [\n            'http:\/\/quotes.toscrape.com\/page\/1\/',\n            'http:\/\/quotes.toscrape.com\/page\/2\/',\n        ]\n        for url in urls:\n            # parse() is called with the Response of each Request\n            yield scrapy.Request(url=url, callback=self.parse)\n\n    def parse(self, response):\n        # '...\/page\/1\/' -> '1'\n        page: str = response.url.split("\/")[-2]\n        filename: str = 'quotes-%s.html' % page\n        # Save the raw HTML of the page\n        with open(filename, 'wb') as f:\n            f.write(response.body)\n        self.log('Saved file %s' % filename)<\/code><\/pre>\n

Run the crawl from inside the project directory:<\/p>\n

scrapy crawl quotes\n# or\npoetry run scrapy crawl quotes<\/code><\/pre>\n

If quotes-1.html<\/code> and quotes-2.html<\/code> have been created in the directory where you ran the command, the crawl succeeded.<\/p>\n

Scrapy's Components<\/h2>\n

We ran the tutorial without giving it much thought, but Scrapy is actually made up of several distinct components.<\/p>\n

Spiders<\/a><\/h3>\n

The Spider, which we implemented in the tutorial above, is the central class that defines how a site is scraped. A Spider class alone is enough to carry out a complete scraping job.<\/p>\n

A Spider class processes a crawl in the following order:<\/p>\n

    \n
  1. start_requests()<\/code> returns an iterable (for example a list, or a generator using yield) of the initial Request objects.<\/li>\n
  2. Each Request is executed, and its callback function, parse()<\/code>, is called with the resulting Response object as its argument.<\/li>\n
  3. The callback parse()<\/code> processes the Response (for example, writing it to a file or saving data to a database) and returns further Request objects, Item objects, or both.<\/li>\n
  4. Any returned Request objects are executed in turn, and their callback (here, parse()<\/code> again) is called with each new response.<\/li>\n
  5. Any returned Item objects are passed through the Item Pipelines enabled in the ITEM_PIPELINES setting.<\/li>\n<\/ol>\n

    Item Pipeline<\/a><\/h3>\n

    Item Pipelines are used when you want to perform some action on the items scraped by a Spider. Using them is entirely optional.
    \nAccording to the documentation, typical uses include the following:<\/p>\n