{"id":212,"date":"2020-08-09T10:09:02","date_gmt":"2020-08-09T01:09:02","guid":{"rendered":"http:\/\/localhost:8000\/?p=212"},"modified":"2021-01-16T10:14:48","modified_gmt":"2021-01-16T01:14:48","slug":"pyppeteer-manual","status":"publish","type":"post","link":"http:\/\/localhost:8000\/2020\/08\/pyppeteer-manual.html","title":{"rendered":"How to use pyppeteer"},"content":{"rendered":"
pyppeteer<\/a> is a Python port of puppeteer<\/a>, an npm module. I wrote a sample that logs in to this<\/a> site.<\/p>\n The key points are as follows.<\/p>\n When scraping a site with many pages, you may want to distribute the work and run it all at once. I ran the simple case above in multiple processes to check that it actually works, and it ran without problems.<\/p>\n The key points are as follows.<\/p>\n I use Dataproc to run the processing in a distributed fashion on the worker nodes of a PySpark cluster. It worked fine on Mac OS X, but on Linux (Ubuntu 18.04 LTS) many things failed, which was painful. The following two problems took some time to solve, so I describe them here. 
So I wrote code that runs the chromium launch command directly and executed it.<\/p>\n The following log was then output, revealing that a missing In the end, I solved this problem by adding libraries to Ubuntu as shown below.<\/p>\n This has also been raised as an issue<\/a>, where A patch is provided, so I tried it. Having no other option, I downgraded urllib3 to version 1.24.3, which is a little worrying from a security standpoint, and it started working.<\/p>\n Chromium has many command-line options<\/a>, and they can be specified as shown below. The idea is to first ignore the default arguments with ignoreDefaultArgs and then set the options you want to add.<\/p>\n 
For some reason, an a tag was sometimes not rendered properly, and calling click() on the a tag's HtmlElement did not trigger a page transition. Since the a tag's HtmlElement itself was retrieved correctly, in that case In the sample above, To be on the safe side, I wait until all images and scripts have been loaded: For example, to wait until the next link appears, implement it as shown below with You can also wait until a JavaScript function returns True. When I tried this before, I gave up on file downloads because I had not investigated enough, but I was able to implement file download by capturing the contents of the request event and sending a separate request with the requests module based on them. I wrote a separate article about it, so please refer to that. 
All the source code verified in this article is on github, so please refer to it here<\/a> if needed.<\/p>\n In the javascript world, puppeteer is faster, more stable, newer, and overwhelmingly more popular than selenium, so I chose pyppeteer without thinking too deeply about it, and so far there do not seem to be any problems. pyppeteer is a Python port of puppeteer, an npm module. For example, it can do the following. Open a headless browser (chromium) Actually load pages inside the browser Get HTML elements with CSS selectors Click HTML elements and navigate between pages Get the HTML after dynamic rendering, since JavaScript is also executed Scrape sites that require cookies or sessions, since cookies are also shared Installation pip install pyppeteer # or poetry add pyppeteer Simple usage 
I wrote a sample that logs in to this site. import asyncio from pyppeteer.launcher import launch from pyppeteer. <\/span>Continue Reading<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[8,7,9],"tags":[],"_links":{"self":[{"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/posts\/212"}],"collection":[{"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/comments?post=212"}],"version-history":[{"count":1,"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/posts\/212\/revisions"}],"predecessor-version":[{"id":213,"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/posts\/212\/revisions\/213"}],"wp:attachment":[{"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/media?parent=212"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/categories?post=212"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/localhost:8000\/wp-json\/wp\/v2\/tags?post=212"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
\nFor example, it can do the following.<\/p>\n\n
Installation<\/h2>\n
pip install pyppeteer\n# or\npoetry add pyppeteer<\/code><\/pre>\n
Simple usage<\/h2>\n
import asyncio\nfrom pyppeteer.launcher import launch\nfrom pyppeteer.page import Page, Response\nfrom pyppeteer.browser import Browser\nfrom pyppeteer.element_handle import ElementHandle\nfrom typing import List, Any\n\ndef main():\n    html: str = asyncio.get_event_loop().run_until_complete(extract_html())\n    print(html)\n\nasync def extract_html() -> str:\n    # Launch the browser. Set headless=False to actually display it\n    browser: Browser = await launch()\n    try:\n        page: Page = await browser.newPage()\n\n        # Navigate to the login page\n        response: Response = await page.goto('http:\/\/quotes.toscrape.com\/login')\n        if response.status != 200:\n            raise RuntimeError(f'site is not available. status: {response.status}')\n\n        # Enter Username and Password\n        await page.type('#username', 'hoge')\n        await page.type('#password', 'fuga')\n        login_btn: ElementHandle = await page.querySelector('form input.btn')\n\n        # Click the Login button and wait for the resulting navigation\n        results: List[Any] = await asyncio.gather(login_btn.click(), page.waitForNavigation())\n        if results[1].status != 200:\n            raise RuntimeError(f'site is not available. status: {results[1].status}')\n\n        # Get the HTML of the page after login\n        html: str = await page.content()\n\n        return html\n    finally:\n        await browser.close()\n\nif __name__ == "__main__":\n    main()<\/code><\/pre>\n
\n
await<\/code> is required<\/li>\n
goto<\/code> returns a Response object when it navigates. However, when no Response can be obtained, for example on a network error, an error is raised<\/li>\n
asyncio.gather()<\/code> is needed to wait for multiple asynchronous operations together\n
\n
asyncio.gather()<\/code> returns the results of the asynchronous operations as a list, in the order the arguments were given<\/li>\n
page.waitForNavigation()<\/code> is used here; to wait for a specific element to be rendered, use
page.waitForSelector()<\/code> instead<\/li>\n
page.content()<\/code>, when called to get the HTML before the navigation has finished, raises
pyppeteer.errors.NetworkError: Execution context was destroyed, most likely because of a navigation.<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n
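The ordering guarantee of asyncio.gather() described above can be checked without a browser. The following is a minimal sketch, assuming two stand-in coroutines (fake_click and fake_wait_for_navigation are hypothetical names, not pyppeteer APIs) that finish in the opposite order from the one in which they are passed.

```python
import asyncio
from typing import Any, List


async def fake_click() -> None:
    # Hypothetical stand-in for login_btn.click(); finishes last, returns None
    await asyncio.sleep(0.02)
    return None


async def fake_wait_for_navigation() -> str:
    # Hypothetical stand-in for page.waitForNavigation(); finishes first
    await asyncio.sleep(0.01)
    return "response"


async def click_and_wait() -> List[Any]:
    # Both coroutines run concurrently; results follow argument order,
    # not completion order
    return await asyncio.gather(fake_click(), fake_wait_for_navigation())


results = asyncio.run(click_and_wait())
print(results)
```

Index 1 holds the value from the second argument even though that coroutine completed first, which is why the sample reads the navigation result from results[1].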
Running in multiple processes<\/h2>\n
import asyncio\nfrom pyppeteer.launcher import launch\nfrom pyppeteer.page import Page, Response\nfrom pyppeteer.browser import Browser\nfrom pyppeteer.element_handle import ElementHandle\nfrom typing import List, Any\n\nfrom concurrent import futures\nfrom concurrent.futures import Future\n\ndef main():\n    future_list: List[Future] = []\n    htmls: List[str] = []\n    with futures.ProcessPoolExecutor(max_workers=2) as executor:\n        for i in range(10):\n            future_list.append(executor.submit(pallalel_func))\n\n        for future in futures.as_completed(fs=future_list):\n            htmls.append(future.result())\n\n    print(f'htmls count: {len(htmls)}')\n\ndef pallalel_func() -> str:\n    # Regular function that runs the coroutine inside the worker process\n    return asyncio.get_event_loop().run_until_complete(extract_html())\n\nasync def extract_html() -> str:\n    # Same as in the simple usage section, so omitted here\n    ...\n\nif __name__ == "__main__":\n    main()<\/code><\/pre>\n
\n
ProcessPoolExecutor<\/code> runs the work in multiple processes, but
executor.submit()<\/code> cannot take a coroutine function (a function declared with async), so the regular function
pallalel_func<\/code> is inserted in between\n
\n
TypeError: cannot pickle 'coroutine' object<\/code> is the error that occurs<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n
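The reason for the wrapper can be sketched without pyppeteer. This is a minimal example under stated assumptions: fetch_page is a hypothetical coroutine standing in for extract_html(), run_in_worker and scrape_all are hypothetical names, and asyncio.run() is used here in place of get_event_loop().run_until_complete().

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import List


async def fetch_page(page_no: int) -> str:
    # Hypothetical stand-in for extract_html()
    await asyncio.sleep(0.01)
    return f"html-{page_no}"


def run_in_worker(page_no: int) -> str:
    # Plain function: it is pickled by reference, and the coroutine object
    # is only created inside the worker process
    return asyncio.run(fetch_page(page_no))


def scrape_all(page_count: int, max_workers: int = 2) -> List[str]:
    # Submit the plain wrapper, never the coroutine function itself
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_in_worker, n) for n in range(page_count)]
        return sorted(f.result() for f in as_completed(futures))
```

When used as a script, scrape_all() should be called from under an if __name__ == "__main__": guard, as in the sample above, so that worker processes do not re-run it on import.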
Running on Linux<\/h2>\n
\nIncidentally, the implementation itself is unchanged from when it runs on Mac OS X.<\/p>\npyppeteer.errors.BrowserError: Browser closed unexpectedly<\/h3>\n
launch<\/code> raised the following error, but the log gives no indication of the cause. Looking at the library implementation, it appears to time out while trying to obtain the browser's WebSocket endpoint.<\/p>\n
File "\/opt\/conda\/default\/lib\/python3.7\/site-packages\/pyppeteer\/launcher.py", line 305, in launch\n return await Launcher(options, **kwargs).launch()\n File "\/opt\/conda\/default\/lib\/python3.7\/site-packages\/pyppeteer\/launcher.py", line 166, in launch\n self.browserWSEndpoint = get_ws_endpoint(self.url)\n File "\/opt\/conda\/default\/lib\/python3.7\/site-packages\/pyppeteer\/launcher.py", line 225, in get_ws_endpoint\n raise BrowserError('Browser closed unexpectedly:\\n')\npyppeteer.errors.BrowserError: Browser closed unexpectedly:<\/code><\/pre>\n
from pyppeteer.launcher import Launcher\nimport os\n\ncmd: str = " ".join(Launcher().cmd)\nprint(f'cmd: {cmd}')\nos.system(cmd)<\/code><\/pre>\n
libatk-1.0.so.0<\/code> was the real cause.<\/p>\n
A 2020-08-07T01:46:22.313971522Z cmd: \/home\/.local\/share\/pyppeteer\/local-chromium\/588429\/chrome-linux\/chrome --disable-background-networking --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic --use-mock-keychain --headless --hide-scrollbars --mute-audio about:blank --remote-debugging-port=42815 --user-data-dir=\/home\/.local\/share\/pyppeteer\/.dev_profile\/tmp2xd0ki_i \nA 2020-08-07T01:46:22.318604089Z error while loading shared libraries: libatk-1.0.so.0: cannot open shared object file: No such file or directory <\/code><\/pre>\n
sudo apt-get update\nsudo apt-get install -y libatk1.0-0\nsudo apt-get install -y libatk-bridge2.0-0\nsudo apt-get install -y libgtk-3-0<\/code><\/pre>\n
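Missing shared libraries like this can also be listed up front with ldd, which prints "not found" next to each library it cannot resolve. The following is a minimal sketch, assuming ldd is available on the Linux host. missing_libraries and find_missing are hypothetical helper names, and the path passed to find_missing must be taken from the cmd: line in your own log.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    # ldd prints unresolved entries as "<name> => not found"
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def find_missing(binary_path: str) -> List[str]:
    # Runs ldd against the given binary (for example the chrome executable)
    completed = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return missing_libraries(completed.stdout)


# Sample ldd output, used here only to exercise the parser
sample = (
    "\tlibatk-1.0.so.0 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libraries(sample))
```

Checking the whole list at once avoids discovering the missing packages one launch failure at a time.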
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])",)<\/h3>\n
urllib3 (1.25).<\/code> appears to be the cause.<\/p>\n
pip install pyppdf<\/code> was run first, and the patch was applied before importing
launch<\/code>, but perhaps I applied it incorrectly, because the error was not resolved.<\/p>\n
import pyppdf.patch_pyppeteer\nfrom pyppeteer import launch<\/code><\/pre>\n
Other tips<\/h2>\n
Specifying chromium command-line options<\/h3>\n
browser: Browser = await launch(\n ignoreDefaultArgs=True,\n args=['--disable-gpu', '--user-data-dir=\/tmp']\n)<\/code><\/pre>\n
Clicking an a tag element with click() sometimes does not navigate for some reason<\/h3>\n
element.click()<\/code> is skipped and
page.goto()<\/code> is used to navigate instead. It looks like this.<\/p>\n
link_element: ElementHandle = await page.querySelector('.hoge .fuga a')\nhref_property: JSHandle = await link_element.getProperty('href')\nhref_value: str = str(await href_property.jsonValue()).strip()\nif href_value.startswith(('http:\/\/', 'https:\/\/')):\n    await page.goto(href_value)\nelse:\n    await link_element.click()<\/code><\/pre>\n
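The branching condition above can be isolated into a small pure function, which makes it easy to check without a browser. This is a minimal sketch, and should_use_goto is a hypothetical name.

```python
def should_use_goto(href_value: str) -> bool:
    # Absolute http(s) URLs can be passed to page.goto(); anything else
    # (for example "javascript:void(0)" or an empty string) falls back to click()
    return href_value.strip().startswith(("http://", "https://"))


print(should_use_goto("http://quotes.toscrape.com/page/2/"))
print(should_use_goto("javascript:void(0)"))
```

The first call prints True and the second prints False, matching the two branches of the snippet above.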
page.goto()<\/code>: waiting for navigation<\/h3>\n
element.click()<\/code> was the case whose waiting method I briefly introduced; here I show how to wait when navigating with
page.goto()<\/code>.<\/p>\n
Wait for specific events with waitUntil<\/h4>\n
page.goto()<\/code> takes a
waitUntil<\/code> argument that specifies how long to wait. Up to four events can be given to waitUntil (load, domcontentloaded, networkidle0, networkidle2).<\/p>\n
\n
load<\/code> event firing plus 0 network connections for at least 500 msec is the condition I use, as shown below.<\/p>\n
await page.goto('http:\/\/quotes.toscrape.com', waitUntil=['load', 'networkidle0'])<\/code><\/pre>\n
page.waitForSelector<\/code>: wait for a specific element to appear<\/h4>\n
asyncio.gather()<\/code>. The return values are stored in argument order, so here index=0 is the Response returned by
page.goto()<\/code>.<\/p>\n
results: List[Any] = await asyncio.gather(\n    page.goto('http:\/\/quotes.toscrape.com', waitUntil=['load', 'networkidle0']),\n    page.waitForSelector('li.next')\n)\nresponse: Optional[Response] = results[0]<\/code><\/pre>\n
page.waitForFunction<\/code>: wait until the given JavaScript function returns True<\/h4>\n
waitForSelector<\/code> was used above to wait for the next link; the same wait written with waitForFunction looks like this.<\/p>\n
await asyncio.gather(\n page.goto('http:\/\/quotes.toscrape.com', waitUntil=['load', 'networkidle0']),\n page.waitForFunction('() => { return document.querySelector("li.next") != null; }')\n)<\/code><\/pre>\n
Downloading files<\/h3>\n
\nhttps:\/\/rinoguchi.net\/2020\/12\/pyppeteer-fire-download.html<\/a><\/p>\nRepository<\/h2>\n
Conclusion<\/h2>\n
\nThis was my first time writing asynchronous code in a synchronous style with async\/await in python, but it can be written in almost the same way as in javascript, so I got used to it quickly.
\nIf you need a headless browser in python, I think pyppeteer is a perfectly good first choice.<\/p>\n","protected":false},"excerpt":{"rendered":"