python.su forum
I am just starting to learn web scraping.
I want to download song lyrics from a website. The initial piece of code works: it reads the page and prints its HTML.
But when I try to print the songs by tag (….startswith('href')), I get an error saying that startswith cannot be used on NoneType. I tried to work around it with a loop, but that doesn't work. Thanks in advance.
A fragment of the HTML document:
<a href="../lyrics/a1/takeonme.html" target="_blank">Take On Me</a><br/> <a href="../lyrics/a1/sameoldbrandnewyou.html" target="_blank">Same Old Brand New You</a><br/> <a href="../lyrics/a1/nomore.html" target="_blank">No More</a><br/> <a href="../lyrics/a1/onemoretry.html" target="_blank">One More Try</a><br/> <a href="../lyrics/a1/thethingsweneverdid.html" target="_blank">The Things We Never Did</a><br/> <a href="../lyrics/a1/toobadbaby.html" target="_blank">Too Bad Baby</a><br/> <a href="../lyrics/a1/nothingbuttrouble.html" target="_blank">Nothing But Trouble</a><br/> <a href="../lyrics/a1/tomorrow.html" target="_blank">Tomorrow</a><br/> <a href="../lyrics/a1/shedidntseeme.html" target="_blank">She Didn't See Me</a><br/> <a href="../lyrics/a1/scared.html" target="_blank">Scared</a><br/> <a href="../lyrics/a1/celebrateourlove.html" target="_blank">Celebrate Our Love</a><br/> <a href="../lyrics/a1/livinthedream.html" target="_blank">Livin' The Dream</a><br/> <a href="../lyrics/a1/iwonderwhy.html" target="_blank">I Wonder Why</a><br/> <a href="../lyrics/a1/illtakethetear.html" target="_blank">I'll Take The Tear</a><br/> <a href="../lyrics/a1/oneinlove.html" target="_blank">One In Love</a><br/> <a id="100"></a><div class="album">album: <b>"Make It Good"</b> (2002)</div> <a href="../lyrics/a1/caughtinthemiddle.html" target="_blank">Caught In The Middle</a><br/> <a href="../lyrics/a1/makeitgood.html" target="_blank">Make It Good</a><br/> <a href="../lyrics/a1/herecomestherain.html" target="_blank">Here Comes The Rain</a><br/> <a href="../lyrics/a1/whenimmissingyou.html" target="_blank">When I'm Missing You</a><br/> <a href="../lyrics/a1/thisaintwhatloveisabout.html" target="_blank">This Ain't What Love Is About</a><br/> <a href="../lyrics/a1/crazyforleavingyou.html" target="_blank">Crazy For Leaving You</a><br/> <a href="../lyrics/a1/learntofly.html" target="_blank">Learn To Fly</a><br/> <a href="../lyrics/a1/isntitcheap.html" target="_blank">Isn't It Cheap</a><br/> 
<a href="../lyrics/a1/ificanthaveyou.html" target="_blank">If I Can't Have You</a><br/> <a href="../lyrics/a1/makeitthroughthenight.html" target="_blank">Make It Through The Night</a><br/> <a href="../lyrics/a1/cherishthislove.html" target="_blank">Cherish This Love</a><br/> <a href="../lyrics/a1/doyouremember.html" target="_blank">Do You Remember?</a><br/> <a href="../lyrics/a1/onelastsong.html" target="_blank">One Last Song</a><br/> <a href="../lyrics/a1/letitout.html" target="_blank">Let It Out</a><br/> <a href="../lyrics/a1/nosdifferences.html" target="_blank">Nos Differences</a><span class="comment">[French Bonus Track]</span><br/> <a id="10749"></a><div class="album">album: <b>"Waiting For Daylight"</b> (2010)</div> <a href="../lyrics/a1/ithappenseveryday.html" target="_blank">It Happens Everyday</a><br/> <a href="../lyrics/a1/dontwannalooseyouagain.html" target="_blank">Don't Wanna Loose You Again</a><br/> <a href="../lyrics/a1/inloveandihateit.html" target="_blank">In Love And I Hate It</a><br/> <a href="../lyrics/a1/badenough.html" target="_blank">Bad Enough</a><br/> <a href="../lyrics/a1/nothingincommon.html" target="_blank">Nothing In Common</a><br/> <a href="../lyrics/a1/takeyouhome.html" target="_blank">Take You Home</a><br/> <a href="../lyrics/a1/sixfeetunder.html" target="_blank">Six Feet Under</a><br/> <a href="../lyrics/a1/goodthingsbadpeople.html" target="_blank">Good
for link in soup.find_all('a'):
    if (link.get('href').startswith('/lyrics/a1')):
        print(link.get('href'))
Lena13_08 wrote:
    if (link.get('href').startswith('/lyrics/a1')):
Shouldn't it be startswith('../lyrics/a1')?
vic57, I have already tried that:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-69-e66db351bb09> in <module>()
      1 for link in soup.find_all('a'):
----> 2     if (link.get('href').startswith('../lyrics/a1')):
      3         print(link.get('href'))
      4
      5
AttributeError: 'NoneType' object has no attribute 'startswith'
IMHO, lxml is better.
from lxml import html

s='''<a href="../lyrics/a1/sameoldbrandnewyou.html" target="_blank">Same Old Brand New You</a><br/> <a href="../lyrics/a1/nomore.html" target="_blank">No More</a><br/> <a href="../lyrics/a1/onemoretry.html" target="_blank">One More Try</a><br/> <a href="../lyrics/a1/thethingsweneverdid.html" target="_blank">The Things We Never Did</a><br/> <a href="../lyrics/a1/toobadbaby.html" target="_blank">Too Bad Baby</a><br/> <a href="../lyrics/a1/nothingbuttrouble.html" target="_blank">Nothing But Trouble</a><br/> <a href="../lyrics/a1/tomorrow.html" target="_blank">Tomorrow</a><br/> <a href="../lyrics/a1/shedidntseeme.html" target="_blank">She Didn't See Me</a><br/> <a href="../lyrics/a1/scared.html" target="_blank">Scared</a><br/> <a href="../lyrics/a1/celebrateourlove.html" target="_blank">Celebrate Our Love</a><br/> <a href="../lyrics/a1/livinthedream.html" target="_blank">Livin' The Dream</a><br/> <a href="../lyrics/a1/iwonderwhy.html" target="_blank">I Wonder Why</a><br/> <a href="../lyrics/a1/illtakethetear.html" target="_blank">I'll Take The Tear</a><br/> <a href="../lyrics/a1/oneinlove.html" target="_blank">One In Love</a><br/> <a id="100"></a><div class="album">album: <b>"Make It Good"</b> (2002)</div> <a href="../lyrics/a1/caughtinthemiddle.html" target="_blank">Caught In The Middle</a><br/> <a href="../lyrics/a1/makeitgood.html" target="_blank">Make It Good</a><br/>'''

htm = html.fromstring(s)
path = htm.xpath('//a')
for i in path:
    print(i.tag, i.attrib, i.text)
a {'href': '../lyrics/a1/sameoldbrandnewyou.html', 'target': '_blank'} Same Old Brand New You
a {'href': '../lyrics/a1/nomore.html', 'target': '_blank'} No More
a {'href': '../lyrics/a1/onemoretry.html', 'target': '_blank'} One More Try
a {'href': '../lyrics/a1/thethingsweneverdid.html', 'target': '_blank'} The Things We Never Did
a {'href': '../lyrics/a1/toobadbaby.html', 'target': '_blank'} Too Bad Baby
a {'href': '../lyrics/a1/nothingbuttrouble.html', 'target': '_blank'} Nothing But Trouble
a {'href': '../lyrics/a1/tomorrow.html', 'target': '_blank'} Tomorrow
a {'href': '../lyrics/a1/shedidntseeme.html', 'target': '_blank'} She Didn't See Me
a {'href': '../lyrics/a1/scared.html', 'target': '_blank'} Scared
a {'href': '../lyrics/a1/celebrateourlove.html', 'target': '_blank'} Celebrate Our Love
a {'href': '../lyrics/a1/livinthedream.html', 'target': '_blank'} Livin' The Dream
a {'href': '../lyrics/a1/iwonderwhy.html', 'target': '_blank'} I Wonder Why
a {'href': '../lyrics/a1/illtakethetear.html', 'target': '_blank'} I'll Take The Tear
a {'href': '../lyrics/a1/oneinlove.html', 'target': '_blank'} One In Love
a {'id': '100'} None
a {'href': '../lyrics/a1/caughtinthemiddle.html', 'target': '_blank'} Caught In The Middle
a {'href': '../lyrics/a1/makeitgood.html', 'target': '_blank'} Make It Good
Edited by vic57 (Nov. 17, 2017 14:50:25)
Lena13_08 wrote:
    I get an error saying that startswith cannot be used on NoneType
Not every “a” tag has an “href” attribute, and those are the tags the program crashes on. You need to separate getting the attribute from checking its value.
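The advice to separate fetching the attribute from checking its value could look roughly like the sketch below. It assumes BeautifulSoup 4 with the built-in "html.parser" backend; the shortened `sample` string and the `song_links` name are illustrative and not taken from the original code.

```python
from bs4 import BeautifulSoup

# Shortened sample of the page from the question (illustrative, not the full page).
sample = (
    '<a href="../lyrics/a1/takeonme.html" target="_blank">Take On Me</a><br/> '
    '<a id="100"></a><div class="album">album: <b>"Make It Good"</b> (2002)</div> '
    '<a href="../lyrics/a1/makeitgood.html" target="_blank">Make It Good</a><br/>'
)

soup = BeautifulSoup(sample, "html.parser")

song_links = []
for link in soup.find_all("a"):
    # Step 1: fetch the attribute; this is None for tags like <a id="100">.
    href = link.get("href")
    # Step 2: check the value only when the attribute is actually present.
    if href is not None and href.startswith("../lyrics/a1"):
        song_links.append(href)

print(song_links)
```

Another option is to let BeautifulSoup skip anchors without the attribute by calling `soup.find_all("a", href=True)`, which leaves only the prefix check inside the loop.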
py.user.next wrote:
    Not every “a” tag has an “href” attribute, and those are the tags the program crashes on. You need to separate getting the attribute from checking its value.
It can be done more simply:
path = htm.xpath('//a[@href][@target="_blank"]')
# or even simpler
path = htm.xpath('//@href')
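As a possible extension of the XPath approach, the prefix check from the original question can be moved into the expression itself with the XPath 1.0 `starts-with()` function. This is only a sketch: the shortened `sample` string and the names `doc`, `song_anchors` and `songs` are assumptions made for illustration.

```python
from lxml import html

# Shortened sample of the page from the question (illustrative, not the full page).
sample = (
    '<a href="../lyrics/a1/takeonme.html" target="_blank">Take On Me</a><br/> '
    '<a id="100"></a><div class="album">album: <b>"Make It Good"</b> (2002)</div> '
    '<a href="../lyrics/a1/makeitgood.html" target="_blank">Make It Good</a><br/>'
)

doc = html.fromstring(sample)

# Keep only <a> elements whose href begins with the lyrics prefix.
# Anchors without an href (such as <a id="100">) simply do not match.
song_anchors = doc.xpath('//a[starts-with(@href, "../lyrics/a1/")]')

# Pair each song title with its relative link.
songs = [(a.text, a.get("href")) for a in song_anchors]

for title, href in songs:
    print(title, href)
```

Because the filtering happens inside the XPath expression, no element without an "href" ever reaches the Python loop, so the NoneType error cannot occur there.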