python.su forum
Please help me get this regular expression working with Russian text.
import re
word = "Ветер++,;"
non_word_regex = re.compile('\W+')
word = non_word_regex.sub('', word)
print word
# -*- coding: Windows-1251 -*-
import re
word = u"Ветер++,;"  # a unicode literal, not a byte string
non_word_regex = re.compile(r'\W+', re.U)  # re.U makes \W Unicode-aware
word = non_word_regex.sub('', word)
print word
Edited (Jan. 7, 2008 15:51:03)
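For comparison, a minimal sketch of the same cleanup in Python 3 (an assumption: this thread is about Python 2, and the sketch targets Python 3.x). In Python 3, str literals are already Unicode and \w/\W are Unicode-aware for str patterns by default, so neither the u prefix nor re.U is needed. The re.ASCII flag limits \w to [a-zA-Z0-9_], which roughly reproduces the original problem.

```python
import re

# Python 3: str is Unicode and \W is Unicode-aware by default for str
# patterns, so no u"" prefix or re.U flag is needed (assumes Python 3.x).
word = "Ветер++,;"

non_word_regex = re.compile(r"\W+")
cleaned = non_word_regex.sub("", word)
print(cleaned)  # Ветер

# re.ASCII restricts \w to [a-zA-Z0-9_]; Cyrillic letters then count as
# "non-word" characters and are stripped too, like in the original snippet.
ascii_regex = re.compile(r"\W+", re.ASCII)
print(repr(ascii_regex.sub("", word)))  # ''
```

The second pattern is only there to show why the flag matters: the regex itself is the same, and only the definition of a "word character" changes.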
Thanks a lot. One more question: what does the r in r'\W+' mean, and is there a detailed description of the escape characters used in string literals?
http://docs.python.org/ref/strings.html
String literals may optionally be prefixed with a letter “r” or “R”; such strings are called raw strings and use different rules for interpreting backslash escape sequences.
…
When an “r” or “R” prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase “n”. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote;
>>> print "\naaa\n"

aaa

>>> print r"\naaa\n"
\naaa\n
>>>
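A short sketch of what the quoted rules mean in practice, and why the regex above is written as r'\W+'. It assumes Python 3.x rather than the Python 2 used in the thread; the sample text "wind rain" is purely illustrative.

```python
import re

# A normal literal turns \n into one newline character;
# a raw literal keeps the backslash and the "n" as two characters.
plain = "\n"
raw = r"\n"
print(len(plain), len(raw))  # 1 2

# In a raw literal, \" still escapes the quote, but the backslash stays.
quote = r"\""
print(len(quote), list(quote))  # 2 ['\\', '"']

# For regexes this matters: "\b" is a backspace character, while r"\b"
# passes the two characters \ and b to re, which reads them as a word boundary.
print(re.findall("\b\\w+", "wind rain"))  # []
print(re.findall(r"\b\w+", "wind rain"))  # ['wind', 'rain']
```

Writing patterns as raw strings means every backslash reaches the re module unchanged, so there is no need to double them.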