#1 Sep 18, 2014 23:03:41

mikhaild
Registered: 2014-09-18
Posts: 1
Reputation: 0

Scrapy: correct way to parse a product attributes table with attribute groups and save the result to 4 MySQL tables

I need advice: what is the correct way to parse an HTML table of product attributes that are split into attribute groups, and to save the results to four MySQL tables: attribute, attribute_description, attribute_group, attribute_group_description?
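
A rough sketch of how the four tables might be filled, not a definitive schema. It uses sqlite3 from the standard library as a stand-in for MySQL so that it runs anywhere; the column names (attribute_group_id, attribute_id, name) and the helper functions are assumptions, since only the table names are given here.

```python
import sqlite3

# Hypothetical columns: only the four table names come from the question.
SCHEMA = """
CREATE TABLE attribute_group (attribute_group_id INTEGER PRIMARY KEY);
CREATE TABLE attribute_group_description (
    attribute_group_id INTEGER REFERENCES attribute_group,
    name TEXT UNIQUE
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    attribute_group_id INTEGER REFERENCES attribute_group
);
CREATE TABLE attribute_description (
    attribute_id INTEGER REFERENCES attribute,
    name TEXT
);
"""


def get_or_create_group(conn, name):
    """Return the id of the group called `name`, inserting it if missing."""
    row = conn.execute(
        "SELECT attribute_group_id FROM attribute_group_description"
        " WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    group_id = conn.execute(
        "INSERT INTO attribute_group DEFAULT VALUES").lastrowid
    conn.execute("INSERT INTO attribute_group_description VALUES (?, ?)",
                 (group_id, name))
    return group_id


def get_or_create_attribute(conn, group_id, name):
    """Return the id of attribute `name` inside a group, inserting if missing."""
    row = conn.execute(
        "SELECT a.attribute_id FROM attribute a"
        " JOIN attribute_description d ON d.attribute_id = a.attribute_id"
        " WHERE a.attribute_group_id = ? AND d.name = ?",
        (group_id, name)).fetchone()
    if row:
        return row[0]
    attribute_id = conn.execute(
        "INSERT INTO attribute (attribute_group_id) VALUES (?)",
        (group_id,)).lastrowid
    conn.execute("INSERT INTO attribute_description VALUES (?, ?)",
                 (attribute_id, name))
    return attribute_id


def save_groups(conn, groups):
    """Store {group name: {attribute name: value}}; only names are saved."""
    for group_name, attributes in groups.items():
        group_id = get_or_create_group(conn, group_name)
        for attribute_name in attributes:
            get_or_create_attribute(conn, group_id, attribute_name)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
parsed = {
    "Audio": {"Speakers": "Stereo Speakers", "Mic In": "Yes"},
    "Battery": {"Battery Type": "4 Cell Li-ion"},
}
save_groups(conn, parsed)
save_groups(conn, parsed)  # running it again inserts no duplicates
```

With a MySQL driver the placeholder style and some SQL details would differ, and the attribute values themselves would need a home that the four tables listed here do not obviously provide.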

The attribute group names and attribute names are not known in advance and differ from product to product. The number of attribute groups in the HTML table is also unknown, but we can count them with:

# XPath count() returns a float rendered as a string, e.g. '10.0'
product_attribute_group_number = int(float(response.xpath('count(//th)').extract()[0]))
print('###product_attribute_group_number###', product_attribute_group_number)

We can loop over every attribute group with:

for x in range(1, product_attribute_group_number):
    # Select the x-th tr[th] row plus the sibling rows that lie between it
    # and the (x+1)-th tr[th] row.
    product_attributes = response.xpath(
        '//tr[th][%d]/following-sibling::tr'
        '[count(.|//tr[th][%d]/preceding-sibling::tr)'
        '=count(//tr[th][%d]/preceding-sibling::tr)]'
        '|//tr[th][%d]' % (x, x + 1, x + 1, x))
    product_attribute_group_name = product_attributes[0].xpath('th/text()').extract()[0]
    print('###product_attribute_group_name###', product_attribute_group_name)
    item = {}
    for prop_row in product_attributes[1:]:
        try:
            prop = prop_row.xpath('th/text()').extract()[0].strip()
            val = prop_row.xpath('td/text()').extract()[0].strip()
        except IndexError as e:
            print(e)  # or pass: just ignore a row without a name or a value
            continue
        item[prop] = val
    yield item

Is this the correct approach, and is the XPath selector correct? Next question: what is the correct XPath selector for the last attribute group? (It has no following-sibling::tr group header after it.) Is there a more elegant way to parse an HTML table whose product attributes are grouped into attribute groups?

Table example:
Operating System (attribute group name)
OS (attribute name): Windows 8 (attribute value)
OS Language (attribute name): English (attribute value)
Audio (attribute group name)
Speakers (attribute name): Stereo Speakers (attribute value)
Mic In (attribute name): Yes (attribute value)
Headphone (attribute name): Yes (attribute value)
Battery (attribute group name)
Battery Type (attribute name): 4 Cell Li-ion (attribute value)
Battery life (attribute name): 41 WHr (attribute value)

<div class="parameters-wrapper">
<table class="techSpecs">
<tr>
<th class="tech-specs-category" colspan="2">Operating System:</th>
</tr>
<tr>
<th>OS</th>
<td>Windows 8</td>
</tr>
<tr>
<th>OS Language</th>
<td>English</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Audio:</th>
</tr>
<tr>
<th>Speakers</th>
<td>Stereo Speakers</td>
</tr>
<tr>
<th>Mic In</th>
<td>Yes</td>
</tr>
<tr>
<th>Headphone</th>
<td>Yes</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Battery:</th>
</tr>
<tr>
<th>Battery Type</th>
<td>4 Cell Li-ion</td>
</tr>
<tr>
<th>Battery life</th>
<td>41 WHr</td>
</tr>
</table>
</div>
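
One way to sidestep the last-group problem, sketched under assumptions: walk the rows once and start a new group whenever a header cell carries the tech-specs-category class. The sketch uses the standard library's xml.etree.ElementTree on a shortened copy of the table purely so that it runs without Scrapy; in a spider the same loop would presumably iterate over response.xpath('//table[@class="techSpecs"]//tr') and read the cells with row.xpath(...). The names SAMPLE and group_attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the table, with straight quotes so it parses as XML.
SAMPLE = """
<table class="techSpecs">
  <tr><th class="tech-specs-category" colspan="2">Operating System:</th></tr>
  <tr><th>OS</th><td>Windows 8</td></tr>
  <tr><th>OS Language</th><td>English</td></tr>
  <tr><th class="tech-specs-category" colspan="2">Battery:</th></tr>
  <tr><th>Battery Type</th><td>4 Cell Li-ion</td></tr>
</table>
"""


def group_attributes(table):
    """Return {group name: {attribute name: value}} in one pass over the rows."""
    groups = {}
    current = None  # attributes of the group we are currently inside
    for row in table.iter("tr"):
        th = row.find("th")
        td = row.find("td")
        if th is None:
            continue  # no header cell: nothing to record for this row
        name = (th.text or "").strip()
        if th.get("class") == "tech-specs-category":
            # Group header row: open a new group, dropping the trailing colon.
            current = groups.setdefault(name.rstrip(":"), {})
        elif td is not None and current is not None:
            # Ordinary row: attribute name in <th>, value in <td>.
            current[name] = (td.text or "").strip()
    return groups


groups = group_attributes(ET.fromstring(SAMPLE.strip()))
for group_name, attributes in groups.items():
    print(group_name, attributes)
```

Because a group simply runs until the next category header or the end of the table, the last group needs no special selector.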


#2 Sep 25, 2014 14:20:07

lorien
Registered: 2006-08-20
Posts: 755
Reputation: 37


I'm afraid you've chosen the wrong place to ask questions in English :) This is a Russian-language board. Try the official Scrapy mailing list.
