`utf8` Module

This file is part of the web2py Web Framework
Copyrighted by Massimo Di Pierro <mdipierro@cs.depaul.edu>
License: LGPLv3 (http://www.gnu.org/licenses/lgpl.html)
Created by Vladyslav Kozlovskyy (Ukraine) <dbdevelop©gmail.com>
for Web2py project

Utilities and a class for managing UTF-8 strings

class gluon.utf8.Utf8

Bases: str

Class for storing and manipulating UTF-8 strings.

The basic assumption behind this class is: “ALL strings in the application are either UTF-8 or unicode, even when the plain str type is used. UTF-8 is only a ‘packed’ version of unicode, so UTF-8 and unicode strings are interchangeable.”
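The "packed" relationship can be sketched with standard types. This is an illustration only: it assumes Python 3, where `bytes` stands in for a UTF-8 encoded `str` and `str` stands in for `unicode`, and it does not import `gluon.utf8`. The sample word is arbitrary.

```python
# Assumption: Python 3 bytes/str stand in for Python 2 str/unicode.
text = "їжак"                        # four characters
packed = text.encode("utf-8")        # the "packed" UTF-8 form

char_count = len(text)               # counts characters
byte_count = len(packed)             # counts bytes; each Cyrillic letter takes two
round_trip = packed.decode("utf-8")  # unpacking restores the original text
```

The two forms carry the same content, but length and indexing differ between them, which is the gap a character-aware string class is meant to hide.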

CAUTION! This class is slower than str/unicode. Do NOT use it inside intensive loops. Instead, decode the string(s) to unicode before the loop and encode the result back to UTF-8 after the intensive calculation.
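The decode-before, encode-after pattern might look like the following sketch. It assumes Python 3 (`bytes` for UTF-8 data, `str` for unicode) rather than the `Utf8` class itself; the function name `capitalize_all` and the sample words are hypothetical.

```python
# Assumption: Python 3 bytes/str stand in for Python 2 str/unicode.
def capitalize_all(utf8_words):
    """Capitalize UTF-8 encoded words, converting only at the boundaries."""
    # Decode every input exactly once, before the loop.
    words = [w.decode("utf-8") for w in utf8_words]
    result = []
    for word in words:               # the loop touches native text only
        result.append(word.capitalize())
    # Encode back to UTF-8 once, after the loop.
    return [w.encode("utf-8") for w in result]

encoded = ["привіт".encode("utf-8"), "світ".encode("utf-8")]
capitalized = capitalize_all(encoded)
```

The conversion cost is paid twice per string instead of once per operation inside the loop.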

The module's doctests show the benefits of this class.

capitalize() → string: Return a copy of the string S with only its first character capitalized.

center(width[, fillchar]) → string: Return S centered in a string of length width. Padding is done using the specified fill character (default is a space).
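Why character-based padding matters can be seen by comparing text and byte centering. The sketch assumes Python 3 `str` and `bytes` as stand-ins and is not taken from the module.

```python
# Assumption: Python 3 str (characters) vs bytes (UTF-8) as stand-ins.
word = "мир"                                      # 3 characters, 6 UTF-8 bytes

by_chars = word.center(7, "*")                    # '**мир**'
by_bytes = word.encode("utf-8").center(7, b"*")   # width measured in bytes

stars_by_chars = by_chars.count("*")
stars_by_bytes = by_bytes.decode("utf-8").count("*")
```

Measured in bytes, the word already fills six of the seven columns, so only one fill character is added and the visible result is not centered.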

count(sub[, start[, end]]) → int: Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

decode([encoding[, errors]]) → object: Decodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is ‘strict’ meaning that encoding errors raise a UnicodeDecodeError. Other possible values are ‘ignore’ and ‘replace’ as well as any other name registered with codecs.register_error that is able to handle UnicodeDecodeErrors.
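The error handling schemes can be compared on a deliberately truncated byte sequence. This sketch uses Python 3 `bytes.decode`, which accepts the same handler names; the sample data is made up.

```python
# Truncated UTF-8: the lead byte of "é" without its continuation byte.
broken = b"caf\xc3"

try:
    broken.decode("utf-8")                     # errors defaults to 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = broken.decode("utf-8", "ignore")     # drops the undecodable byte
replaced = broken.decode("utf-8", "replace")   # substitutes U+FFFD
```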

encode([encoding[, errors]]) → object: Encodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that is able to handle UnicodeEncodeErrors.
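A short comparison of the encoding error handlers, sketched with Python 3 `str.encode` and an ASCII target so that the non-ASCII character cannot be represented. The sample word is arbitrary.

```python
text = "café"

try:
    text.encode("ascii")                          # errors defaults to 'strict'
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode("ascii", "ignore")          # b'caf'
replaced = text.encode("ascii", "replace")        # b'caf?'
xml = text.encode("ascii", "xmlcharrefreplace")   # b'caf&#233;'
```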

endswith(suffix[, start[, end]]) → bool: Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs([tabsize]) → string: Return a copy of S where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[, start[, end]]) → int: Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.

format(*args, **kwargs) → string: Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).

index(sub[, start[, end]]) → int: Like S.find() but raise ValueError when the substring is not found.
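The difference between find() and index(), and between character and byte offsets, can be sketched as follows. Python 3 `str` and `bytes` are used as stand-ins for the character-based and byte-based views; the sample phrase is arbitrary.

```python
# Assumption: Python 3 str (characters) vs bytes (UTF-8) as stand-ins.
s = "добрий день"

char_index = s.find("день")                                   # 7
byte_index = s.encode("utf-8").find("день".encode("utf-8"))   # 13
missing = s.find("ніч")                                       # -1, no exception

try:
    s.index("ніч")            # same search, but failure raises
    raised = False
except ValueError:
    raised = True
```

The byte offset is larger because every Cyrillic letter before the match occupies two bytes.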

isalnum() → bool: Return True if all characters in S are alphanumeric and there is at least one character in S, False otherwise.

isalpha() → bool: Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.

isdigit() → bool: Return True if all characters in S are digits and there is at least one character in S, False otherwise.

islower() → bool: Return True if all cased characters in S are lowercase and there is at least one cased character in S, False otherwise.

isspace() → bool: Return True if all characters in S are whitespace and there is at least one character in S, False otherwise.

istitle() → bool: Return True if S is a titlecased string and there is at least one character in S, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise.

isupper() → bool: Return True if all cased characters in S are uppercase and there is at least one cased character in S, False otherwise.

join(iterable) → string: Return a string which is the concatenation of the strings in the iterable. The separator between elements is S.

ljust(width[, fillchar]) → string: Return S left-justified in a string of length width. Padding is done using the specified fill character (default is a space).

lower() → string: Return a copy of the string S converted to lowercase.

lstrip([chars]) → string or unicode: Return a copy of the string S with leading whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping.

partition(sep) → (head, sep, tail): Search for the separator sep in S, and return the part before it, the separator itself, and the part after it. If the separator is not found, return S and two empty strings.

replace(old, new[, count]) → string: Return a copy of string S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[, start[, end]]) → int: Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.

rindex(sub[, start[, end]]) → int: Like S.rfind() but raise ValueError when the substring is not found.

rjust(width[, fillchar]) → string: Return S right-justified in a string of length width. Padding is done using the specified fill character (default is a space).

rpartition(sep) → (head, sep, tail): Search for the separator sep in S, starting at the end of S, and return the part before it, the separator itself, and the part after it. If the separator is not found, return two empty strings and S.
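partition() and rpartition() differ in which occurrence of the separator they use and in where the unsplit string lands when no separator is found. A brief sketch with the built-in Python 3 `str`, which follows the same contract; the sample value is arbitrary.

```python
name = "a.b.c"

first = name.partition(".")      # splits at the first dot
last = name.rpartition(".")      # splits at the last dot

# Separator absent: the whole string goes to opposite ends of the tuple.
no_sep = name.partition("/")
no_sep_r = name.rpartition("/")
```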

rsplit([sep[, maxsplit]]) → list of strings: Return a list of the words in the string S, using sep as the delimiter string, starting at the end of the string and working to the front. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator.

rstrip([chars]) → string or unicode: Return a copy of the string S with trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping.

split([sep[, maxsplit]]) → list of strings: Return a list of the words in the string S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.
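The effect of omitting sep, and the direction difference between split() and rsplit() under maxsplit, can be sketched with the built-in Python 3 `str`, which follows the same rules. The sample strings are arbitrary.

```python
line = "  a b  c "

default = line.split()        # whitespace runs collapse, no empty strings
explicit = line.split(" ")    # every single space is a delimiter

csv = "a,b,c,d"
from_left = csv.split(",", 2)     # the two splits are taken from the left
from_right = csv.rsplit(",", 2)   # the two splits are taken from the right
```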

splitlines(keepends=False) → list of strings: Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
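A small sketch of the keepends flag, using the built-in Python 3 `str` as a stand-in; the sample text mixes two line-ending styles on purpose.

```python
text = "one\ntwo\r\nthree"

without_ends = text.splitlines()      # line breaks stripped
with_ends = text.splitlines(True)     # line breaks preserved as written
```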

startswith(prefix[, start[, end]]) → bool: Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

strip([chars]) → string or unicode: Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is unicode, S will be converted to unicode before stripping.
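Note that chars is treated as a set of characters to remove, not as a prefix or suffix. A sketch with the built-in Python 3 `str`, which behaves the same way in this respect; the sample strings are arbitrary.

```python
padded = "  hi  "
wrapped = "abchiabc"

both = padded.strip()            # whitespace removed from both ends
left_only = padded.lstrip()      # trailing spaces kept
right_only = padded.rstrip()     # leading spaces kept
by_set = wrapped.strip("cba")    # order of chars is irrelevant
```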

swapcase() → string: Return a copy of the string S with uppercase characters converted to lowercase and vice versa.

title() → string: Return a titlecased version of S, i.e. words start with uppercase characters and all remaining cased characters are lowercase.

translate(table[, deletechars]) → string: Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256 or None. If the table argument is None, no translation is applied and the operation simply removes the characters in deletechars.
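The 256-entry table form described here corresponds most closely to `bytes.translate` in Python 3, so the sketch below uses `bytes` as a stand-in. It is an assumption that this mirrors the documented method; `bytes.maketrans` is only a convenient way to build the table.

```python
# Assumption: Python 3 bytes.translate stands in for the byte-table API.
table = bytes.maketrans(b"abc", b"xyz")    # 256-byte translation table
data = b"aabbcc-dd"

mapped = data.translate(table)                    # map a->x, b->y, c->z
mapped_deleted = data.translate(table, b"-")      # delete "-" first, then map
only_deleted = data.translate(None, b"d")         # table None: deletion only
```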

upper() → string: Return a copy of the string S converted to uppercase.

zfill(width) → string: Pad a numeric string S with zeros on the left, to fill a field of the specified width. The string S is never truncated.
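A quick sketch of zfill() with the built-in Python 3 `str`, covering a leading sign and a string already wider than the field; the values are arbitrary.

```python
padded = "42".zfill(5)           # zeros added on the left
signed = "-42".zfill(5)          # the sign stays in front of the zeros
too_long = "123456".zfill(3)     # wider than the field: returned unchanged
```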
