{"id":163,"date":"2020-11-23T10:03:01","date_gmt":"2020-11-23T01:03:01","guid":{"rendered":"http:\/\/localhost:8000\/?p=163"},"modified":"2021-01-16T00:30:08","modified_gmt":"2021-01-15T15:30:08","slug":"python-asyncio","status":"publish","type":"post","link":"http:\/\/localhost:8000\/2020\/11\/python-asyncio.html","title":{"rendered":"Asynchronous Processing with asyncio in Python"},"content":{"rendered":"

When implementing asynchronous processing in Python, the asyncio<\/a> module is, I think, the usual choice. Running tasks concurrently or nesting them does have a few pitfalls, though, so I am leaving notes here on how to do it.<\/p>\n

Sequential asynchronous processing<\/h2>\n

As a simple first case, I wrote code that fetches the HTML of six pages in total across three sites (qiita, google, yahoo) using pyppeteer, a library that launches Chromium and scrapes pages asynchronously.<\/p>\n

from pyppeteer.browser import Browser\nfrom pyppeteer.page import Page\nfrom pyppeteer.launcher import launch\nfrom typing import List\nimport datetime\nfrom asyncio import AbstractEventLoop\nimport asyncio\n\nURLS: List[str] = [\n    'https:\/\/qiita.com\/search?q=1',\n    'https:\/\/qiita.com\/search?q=2',\n    'https:\/\/www.google.com\/search?q=1',\n    'https:\/\/www.google.com\/search?q=2',\n    'https:\/\/search.yahoo.co.jp\/search?p=1',\n    'https:\/\/search.yahoo.co.jp\/search?p=2',\n]\n\ndef log(message: str):\n    print(f'{datetime.datetime.now()} {message}')\n\nasync def main():\n    log('main started.')\n    browser: Browser = await launch()\n    try:\n        htmls: List[str] = []\n        # Await each page in turn: the next fetch starts only after the previous one completes.\n        for url in URLS:\n            html: str = await scrape_one_page(browser, url)\n            htmls.append(html)\n    finally:\n        await browser.close()\n    log(f'htmls count: {len(htmls)}')\n    log('main finished.')\n\nasync def scrape_one_page(browser: Browser, url: str) -> str:\n    """Scrape a single page and return its HTML."""\n    log(f'scrape_one_page started. url: {url}')\n    # Number of unfinished tasks on the running event loop.\n    log(f'task count: {len(asyncio.all_tasks())}')\n    page: Page = await browser.newPage()\n    try:\n        await page.goto(url)\n        html: str = await page.content()\n        return html\n    finally:\n        await page.close()\n        log(f'scrape_one_page finished. url: {url}')\n\nif __name__ == "__main__":\n    event_loop: AbstractEventLoop = asyncio.get_event_loop()\n    event_loop.run_until_complete(main())<\/code><\/pre>\n
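To see what "sequential" means here without launching a browser, the following is a minimal sketch rather than the code from this post. It assumes a hypothetical stand-in, fake_fetch, which sleeps instead of scraping; the names fake_fetch, fetch_sequentially and run_demo, along with the 0.1 second delay, are made up for illustration. Because each await finishes before the loop moves on, the total time should be roughly the sum of the individual delays.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for scrape_one_page: sleeps instead of driving a browser."""
    await asyncio.sleep(delay)
    return f'<html>{url}</html>'


async def fetch_sequentially(urls: List[str]) -> List[str]:
    """Await each fetch in turn, so only one fetch is in flight at a time."""
    htmls: List[str] = []
    for url in urls:
        # The next iteration does not start until this await has completed.
        htmls.append(await fake_fetch(url))
    return htmls


def run_demo() -> Tuple[List[str], float]:
    """Run the sequential fetch and return the results with the elapsed seconds."""
    urls = ['a', 'b', 'c']  # placeholder values, not real URLs
    started = time.perf_counter()
    htmls = asyncio.run(fetch_sequentially(urls))
    elapsed = time.perf_counter() - started
    return htmls, elapsed


if __name__ == '__main__':
    htmls, elapsed = run_demo()
    print(f'{len(htmls)} pages in {elapsed:.2f}s')
```

With three placeholder URLs and a 0.1 second delay each, the elapsed time should come out at around 0.3 seconds, and the results come back in the same order as the input list.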