Latest posts on the topic "Can't parse Skyscanner" https://python.su/forum/topic/40567/ 2021-08-09T03:15:54+03:00
2021-08-09T03:15:54+03:00 TERRA.NOVA_S217784<br/>The URL opens correctly in the browser, but only a fragment of the page ends up in the file.<br/>Code:<br/><div class="code"><pre><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
<span class="n">url</span> <span class="o">=</span> <span class="s1">'https://www.skyscanner.ru/transport/flights/mosc/znz/?adults=1&adultsv2=1&cabinclass=economy&children=0&childrenv2=&destinationentityid=39828399&inboundaltsenabled=false&infants=0&originentityid=27539438&outboundaltsenabled=false&preferdirects=false&preferflexible=false&ref=home&rtn=1&oym=2110&selectedoday=01&iym=2110&selectediday=01'</span> <span class="c1"># page URL</span>
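<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">r</span><span class="o">.</span><span class="n">raise_for_status</span><span class="p">()</span>  <span class="c1"># raise an exception on HTTP errors (4xx/5xx)</span>
<span class="c1"># an explicit encoding avoids UnicodeEncodeError where the default encoding is not UTF-8</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'2.txt'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s1">'utf-8'</span><span class="p">)</span> <span class="k">as</span> <span class="n">output_file</span><span class="p">:</span>
    <span class="n">output_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>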
</pre></div><br/>It looks like the site uses JavaScript-based protection against scraping. What should I do?<br/><br/>This is what I get when parsing:<br/><div class="code"><pre><span class="cp"><!doctype html></span><span class="p"><</span><span class="nt">html</span> <span class="na">lang</span><span class="o">=</span><span class="s">"en"</span><span class="p">><</span><span class="nt">head</span><span class="p">><</span><span class="nt">meta</span> <span class="na">charset</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">><</span><span class="nt">meta</span> <span class="na">http-equiv</span><span class="o">=</span><span class="s">"x-ua-compatible"</span> <span class="na">content</span><span class="o">=</span><span class="s">"ie=edge"</span><span class="p">><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"viewport"</span> <span class="na">content</span><span class="o">=</span><span class="s">"width=device-width,initial-scale=1,shrink-to-fit=no"</span><span class="p">><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"theme-color"</span> <span class="na">content</span><span class="o">=</span><span class="s">"#000000"</span><span class="p">><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"manifest"</span> <span class="na">href</span><span class="o">=</span><span class="s">"./manifest.json"</span><span class="p">><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"shortcut icon"</span> <span class="na">href</span><span class="o">=</span><span class="s">"./favicon.ico"</span><span class="p">><</span><span class="nt">title</span><span class="p">></span>Skyscanner<span class="p"></</span><span class="nt">title</span><span class="p">><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"icon"</span> <span 
class="na">href</span><span class="o">=</span><span class="s">"/favicon.ico"</span><span class="p">><</span><span class="nt">script</span> <span class="na">type</span><span class="o">=</span><span class="s">"text/javascript"</span><span class="p">></span><span class="nb">window</span><span class="p">.</span><span class="nx">__pageLoadedTime</span><span class="o">=</span><span class="nb">Date</span><span class="p">.</span><span class="nx">now</span><span class="p">()</</span><span class="nt">script</span><span class="p">><</span><span class="nt">link</span> <span class="na">href</span><span class="o">=</span><span class="s">"./static/css/main.83f6a466.css"</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span><span class="p">></</span><span class="nt">head</span><span class="p">><</span><span class="nt">body</span><span class="p">><</span><span class="nt">noscript</span><span class="p">></span>You need to enable JavaScript to run this app.<span class="p"></</span><span class="nt">noscript</span><span class="p">><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"root"</span><span class="p">></</span><span class="nt">div</span><span class="p">><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"./static/js/main.768c00c0.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></</span><span class="nt">body</span><span class="p">></</span><span class="nt">html</span><span class="p">></span>
</pre></div>
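The response above looks like a client-side app shell: a `<head>`, a `<noscript>` notice ("You need to enable JavaScript to run this app.") and an empty `<div id="root">`, with the visible content presumably filled in later by `main.768c00c0.js` in the browser. Below is a minimal sketch for checking this programmatically, assuming that "empty `id="root"` plus a `<noscript>` tag" is a good enough signal. `ShellDetector` and `looks_like_js_shell` are hypothetical names, and the standard-library `html.parser` is used instead of BeautifulSoup only to keep the example dependency-free. In the script above, the input would be `r.text`.

```python
from html.parser import HTMLParser


class ShellDetector(HTMLParser):
    """Hypothetical helper: records whether the page has a <noscript>
    notice and whether the element with id="root" contains anything."""

    def __init__(self) -> None:
        super().__init__()
        self.has_noscript = False
        self.root_found = False
        self.root_has_content = False
        self._root_depth = 0  # > 0 while parsing inside the id="root" element

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.has_noscript = True
        if self._root_depth:
            # any child tag inside #root means it is not an empty shell
            self.root_has_content = True
            self._root_depth += 1
        elif dict(attrs).get("id") == "root":
            self.root_found = True
            self._root_depth = 1

    def handle_endtag(self, tag):
        if self._root_depth:
            self._root_depth -= 1

    def handle_data(self, data):
        # non-whitespace text anywhere inside #root also counts as content
        if self._root_depth and data.strip():
            self.root_has_content = True


def looks_like_js_shell(html: str) -> bool:
    """Heuristic (assumption): an empty #root plus a <noscript> notice
    suggests the real content is rendered by JavaScript."""
    detector = ShellDetector()
    detector.feed(html)
    detector.close()
    return detector.has_noscript and detector.root_found and not detector.root_has_content


# Shortened copy of the response shown above; in the script this would be r.text
shell = (
    '<!doctype html><html lang="en"><head><title>Skyscanner</title></head>'
    '<body><noscript>You need to enable JavaScript to run this app.</noscript>'
    '<div id="root"></div><script src="./static/js/main.768c00c0.js"></script>'
    '</body></html>'
)
print(looks_like_js_shell(shell))  # True
```

If this returns `True` for `r.text`, the file is most likely not truncated: it holds the whole HTML document the server sent (it ends with `</html>`), and the flight data simply is not part of it.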