Python 2.7 + Eclipse + PyDev
Code:
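import urllib
import re

# Capture the text between <title> and </title> (non-greedy)
regex = '<title>(.+?)</title>'
pattern = re.compile(regex)

# Fetch the page and read its raw bytes
htmlfile = urllib.urlopen("http://ya.ru")
htmltext = htmlfile.read()

titles = re.findall(pattern, htmltext)
print titles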
['\xd0\xaf\xd0\xbd\xd0\xb4\xd0\xb5\xd0\xba\xd1\x81']
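
The escapes in that output look like UTF-8 bytes: in Python 2.7, printing a list shows the repr of each string rather than the text itself. Below is a minimal sketch of decoding them, assuming the page really is served as UTF-8. The names raw_titles and decoded_titles are made up for illustration, and the byte string is copied from the output above instead of being fetched again.

```python
# -*- coding: utf-8 -*-
# Byte string copied from the printed list above.
# The b'' prefix is accepted by both Python 2.7 and Python 3.
raw_titles = [b'\xd0\xaf\xd0\xbd\xd0\xb4\xd0\xb5\xd0\xba\xd1\x81']

# Assumption: the page is UTF-8 encoded, so the bytes decode as UTF-8.
decoded_titles = [title.decode('utf-8') for title in raw_titles]

# Print each element on its own: printing the whole list would show
# escaped reprs again instead of readable text.
for title in decoded_titles:
    print(title)
```

With those bytes the loop should print Яндекс. Whether it displays correctly in the PyDev console also depends on the console encoding configured in Eclipse, which this sketch does not cover.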