Python-сообщество (Python community)

vmprog · Sep 22, 2011 22:08:38

Hello.
I'm on Windows XP with Python 2.7, using the PyScripter IDE.
Here is the code:

# coding: cp1251
import urllib2
import html5lib
import lxml.etree as etree
import codecs, sys

xpath_abr='/html/body/div[3]/div[1]/div[2]/ul[1]/text()[1]'

builder = html5lib.getTreeBuilder('lxml')
parser  = html5lib.HTMLParser(builder, namespaceHTMLElements = False)
doc_tree= parser.parse(urllib2.urlopen('http://vmcorp.ru/').read())

r = doc_tree.xpath(xpath_abr.decode('utf8'))
n = str(r)

#print n.decode('utf8').encode('cp1251').decode('cp866')
#n.encode('utf-8', 'ignore')
#n.decode("cp1251").encode("utf8")
#n.decode("utf-8").encode("cp1251")

print n.encode('cp1251')
#print n

When I run it, it prints:
>>>

>>>

What should I do?

pill · Sep 22, 2011 23:03:09

Change n = str(r) to n = r
r is a list, after all.
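
To illustrate the point about lists, here is a minimal sketch. The value of r is a hypothetical stand-in for what doc_tree.xpath(...) returns; the u'' literals run on Python 2.7 and 3 alike.

```python
# r stands in for the list returned by the XPath query (hypothetical value).
r = [u'\xd1\xe4\xe5\xeb\xe0\xf2\xfc']

# str() of a list gives the repr of the whole list, brackets and quotes
# included, not the text stored inside it.
as_text = str(r)

# The actual string is an element of the list.
first = r[0]

# When the query returns several text nodes, joining them is one option.
joined = u''.join(r)
```

Note that n = r still leaves n as a list; indexing or joining is what yields a string that can be re-encoded.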

vmprog · Sep 26, 2011 21:36:26

Fixed that. Now it throws an error:

Message File Name Line Position
Traceback
<module> G:\base_known\c++ project\Pyton\3-1.py 22
encode C:\Python27\lib\encodings\cp1251.py 12
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>

If I leave just print n, the output is completely empty.
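
The error above can be reproduced in isolation. This is a sketch under the assumption that n holds the same code points as the thread title shows (u'\xd1\xe4\xe5\xeb...'): they fall in the Latin-1 range U+00C0..U+00FF, and cp1251 has no byte for any of them.

```python
# First word of the parsed text, using the code points quoted in the thread.
n = u'\xd1\xe4\xe5\xeb\xe0\xf2\xfc'

try:
    n.encode('cp1251')
    failed = False
    bad_range = None
except UnicodeEncodeError as exc:
    failed = True
    # exc.start is the index of the first character that cannot be encoded;
    # exc.end is the index just past the last one in that run.
    bad_range = (exc.start, exc.end)
```

The strings are already unicode objects, but with the wrong characters in them, so encoding to cp1251 cannot succeed until the text itself is repaired.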

py.user.next · Sep 27, 2011 04:07:28

>>> s = '\xd1\xe4\xe5\xeb\xe0\xf2\xfc \xe0\xf0\xf5\xe8\xe2\xed\xf3\xfe \xea\xee\xef\xe8\xfe \xe1\xe0\xe7\xfb \xe4\xe0\xed\xed\xfb\xf5.'
>>> print s.decode('cp1251')
Сделать архивную копию базы данных.
>>>

Another example:

>>> s = u'\xd1\xe4\xe5\xeb\xe0\xf2\xfc \xe0\xf0\xf5\xe8\xe2\xed\xf3\xfe \xea\xee\xef\xe8\xfe \xe1\xe0\xe7\xfb \xe4\xe0\xed\xed\xfb\xf5.'
>>> print s.encode('latin1').decode('cp1251')
Сделать архивную копию базы данных.
>>>

Edited (Sep 27, 2011 04:21:19)
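
The second recipe works because Latin-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so encode('latin1') recovers the original bytes, which can then be decoded with the codec the page really used. A minimal sketch, where fix_mojibake is a hypothetical helper name and the cp1251 default is an assumption about the page's encoding:

```python
def fix_mojibake(text, wrong='latin1', right='cp1251'):
    """Undo a decode that was done with the wrong codec (hypothetical helper)."""
    # encode() with the wrong codec restores the original byte sequence;
    # decode() with the right codec then interprets those bytes correctly.
    return text.encode(wrong).decode(right)


garbled = u'\xd1\xe4\xe5\xeb\xe0\xf2\xfc'
fixed = fix_mojibake(garbled)  # the first word of the sentence shown above
```

This only repairs text whose characters all fit in Latin-1; anything else raises UnicodeEncodeError at the encode step.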

vmprog · Sep 27, 2011 20:55:32

Thanks :))) It worked like this: print s.encode('latin1').decode('cp1251')

vmprog · Oct 1, 2011 17:54:30

Please help, here is another problem.
I can't re-encode and print a variable properly.
The output looks like this:
nt (unicode) u'\xce\xf1\xed\xee\xe2\xed\xe0\xff'
nt1 '\xc3\x8e\xc3\xb1\xc3\xad\xc3\xae\xc3\xa2\xc3\xad\xc3\xa0\xc3\xbf'
nt2 u'\u0413\u040b\u0413\xb1\u0413\xad\u0413\xae\u0413\u045e\u0413\xad\u0413\xa0\u0413\u0457'

What should I do?

# -*- coding: utf-8 -*-
import urllib2
import html5lib
import lxml.etree as etree
import codecs, sys
import platform
xpath_const='/html/body/div[2]/div[1]/a/text()[1]'
xpath_abr="//a"
builder = html5lib.getTreeBuilder('lxml')
parser  = html5lib.HTMLParser(builder, namespaceHTMLElements = False)
doc_tree= parser.parse(urllib2.urlopen('http://vmcorp.ru/').read())

root = doc_tree.getroot()
xpath_me = etree.XPath(xpath_abr.decode('utf8'))
nodes    = xpath_me(root)
for node in nodes:
    h  = doc_tree.getpath(node)
    nt = node.text
    nt1 = nt.encode('utf-8')
    nt2 = nt.encode('utf-8').decode('cp1251')
    print nt
    break

shep · Oct 1, 2011 21:22:25

Try replacing
xpath_abr.decode('utf8')
with
xpath_abr.decode('cp1251')

After that, encode and decode are probably no longer needed.
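
One caveat worth checking before relying on this: the XPath expression in the code above is pure ASCII, and UTF-8 and cp1251 agree on all ASCII bytes, so the two decode calls may well produce the same string. A quick sketch (the b'' prefix makes the literal a byte string on Python 3 too, matching a plain '...' literal in 2.7):

```python
# The XPath expression from the post, as a byte string.
xpath_abr = b'//a'

# Both codecs decode ASCII bytes to the same characters.
as_utf8 = xpath_abr.decode('utf8')
as_cp1251 = xpath_abr.decode('cp1251')
same = as_utf8 == as_cp1251
```

If same is true, the garbling comes from how the page bytes were decoded, not from the XPath string.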

py.user.next · Oct 2, 2011 08:30:15

>>> nt = u'\xce\xf1\xed\xee\xe2\xed\xe0\xff'
>>> print nt.encode('latin1').decode('cp1251')
Основная
>>>
>>> nt1 = '\xc3\x8e\xc3\xb1\xc3\xad\xc3\xae\xc3\xa2\xc3\xad\xc3\xa0\xc3\xbf'
>>> print nt1.decode('utf-8').encode('latin1').decode('cp1251')
Основная
>>>
>>> nt2 = u'\u0413\u040b\u0413\xb1\u0413\xad\u0413\xae\u0413\u045e\u0413\xad\u0413\xa0\u0413\u0457'
>>> print nt2.encode('cp1251').decode('utf-8').encode('latin1').decode('cp1251')
Основная
>>>
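
The three recipes above differ only in how many wrong layers have to be peeled off. The sketch below rebuilds nt1 and nt2 from nt exactly as the loop in the question does, then undoes each step in reverse order; the variable names step1, step2 and text are illustrative.

```python
nt = u'\xce\xf1\xed\xee\xe2\xed\xe0\xff'   # cp1251 bytes misread as Latin-1

# The two extra layers added in the question's loop.
nt1 = nt.encode('utf-8')        # layer 2: UTF-8 bytes of the garbled text
nt2 = nt1.decode('cp1251')      # layer 3: those bytes misread as cp1251

# Undo the layers one at a time, last applied first.
step1 = nt2.encode('cp1251')    # back to the UTF-8 bytes (equals nt1)
step2 = step1.decode('utf-8')   # back to the garbled unicode (equals nt)
text = step2.encode('latin1').decode('cp1251')   # the intended word
```

Each encode/decode pair is the exact inverse of one mistaken step, which is why the chain for nt2 is the longest.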

vmprog · Oct 2, 2011 12:28:03

Thank you very much. It worked.

#1 Sep 22, 2011 22:08:38

print outputs [u'\xd1\xe4\xe5\xeb\ what to do