String Operations
Built-in methods of str, bytes, and bytearray are provided as functions.
Some functions accept additional parameters beyond those of the built-in methods, such as minsplit and fillvalue in split.
All functions support function composition modes. See Function Composition for details.
List of APIs
- calcpy.str.capitalize(value, /)
Capitalize the first character of the string.
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> capitalize("hello world")
'Hello world'
>>> capitalize(["hello", "world"])
['Hello', 'World']
>>> import pandas as pd
>>> capitalize(pd.Series(["hello", "world"]))
0    Hello
1    World
dtype: object
- calcpy.str.capwords(value, /)
Capitalize the first character of each word in the string.
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> capwords("hello world")
'Hello World'
>>> capwords(["hello", "world"])
['Hello', 'World']
>>> import pandas as pd
>>> capwords(pd.Series(["hello", "world"]))
0    Hello
1    World
dtype: object
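The difference between capitalize and capwords is easy to miss, so a small sketch may help. It uses only the standard library (str.capitalize and string.capwords) rather than calcpy itself, and apply_elementwise is a hypothetical helper that imitates how the functions above appear to map over lists and tuples; the real dispatch in calcpy may differ.

```python
import string


def apply_elementwise(func, value):
    """Hypothetical helper: apply func to one string or to each item of a list/tuple."""
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type from the transformed items.
        return type(value)(func(item) for item in value)
    return func(value)


# str.capitalize only touches the first character of the whole string.
first_only = apply_elementwise(str.capitalize, "hello world")
# string.capwords capitalizes every whitespace-separated word.
every_word = apply_elementwise(string.capwords, "hello world")
as_list = apply_elementwise(str.capitalize, ["hello", "world"])
as_tuple = apply_elementwise(string.capwords, ("hello world", "foo bar"))

print(first_only)  # Hello world
print(every_word)  # Hello World
print(as_list)     # ['Hello', 'World']
print(as_tuple)    # ('Hello World', 'Foo Bar')
```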
- calcpy.str.casefold(value, /)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> casefold("Hello World")
'hello world'
>>> casefold(["Hello", "World"])
['hello', 'world']
>>> import pandas as pd
>>> casefold(pd.Series(["Hello", "World"]))
0    hello
1    world
dtype: object
- calcpy.str.center(value, /, width, fillchar=' ')
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
width (int) – width of the string
fillchar (str) – fill character
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> center("Hello World", 20)
'    Hello World     '
>>> center(["Hello", "World"], 20)
['       Hello        ', '       World        ']
>>> import pandas as pd
>>> center(pd.Series(["Hello", "World"]), 20)
0           Hello        
1           World        
dtype: object
- calcpy.str.contains(value, /, pat, case=True, flags=0, na=None, regex=True)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
pat (str) – pattern to match
case (bool) – whether matching is case-sensitive
flags (int) – regex flags, such as re.IGNORECASE
na (bool) – value to use for missing entries
regex (bool) – whether pat is treated as a regular expression
- Return type:
bool | (list | tuple | pd.Series)[bool]
Examples
>>> contains("Hello World", "World")
True
>>> contains(["Hello", "World"], "World")
[False, True]
>>> import pandas as pd
>>> contains(pd.Series(["Hello", "World"]), "World")
0    False
1     True
dtype: bool
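The parameter names of contains match those of pandas' Series.str.contains, so the sketch below assumes they carry the same meaning for a single string. contains_sketch is a hypothetical stand-in written with the standard re module; it is not calcpy's implementation and ignores the na parameter.

```python
import re


def contains_sketch(value, pat, case=True, flags=0, regex=True):
    """Hypothetical scalar version of contains(); assumes pandas-like semantics."""
    if regex:
        if not case:
            flags |= re.IGNORECASE
        return re.search(pat, value, flags) is not None
    # Literal substring test when regex matching is switched off.
    if not case:
        return pat.lower() in value.lower()
    return pat in value


print(contains_sketch("Hello World", "world"))                 # False
print(contains_sketch("Hello World", "world", case=False))     # True
print(contains_sketch("Hello World", r"W.rld"))                # True
print(contains_sketch("Hello World", r"W.rld", regex=False))   # False
```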
- calcpy.str.count(value, /, sub, start=0, end=None)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
sub (str) – substring to count
start (int) – start index
end (int) – end index
- Return type:
int | (list | tuple | pd.Series)
Examples
>>> count('Hello World', 'o')
2
>>> count(['Hello', 'World'], 'o')
[1, 1]
>>> import pandas as pd
>>> count(pd.Series(['Hello', 'World']), 'o')
0    1
1    1
dtype: int64
- calcpy.str.decode(value, /, encoding='utf-8', errors='strict')
- Parameters:
value (bytes | bytearray | (list | tuple | pd.Series)[bytes | bytearray])
encoding (str) – name of the encoding
errors (str) – error handling scheme, such as 'strict', 'ignore', or 'replace'
- Return type:
str | list | tuple | pd.Series
Examples
>>> decode(b'Hello World', 'utf-8', 'strict')
'Hello World'
>>> decode([b'Hello', b'World'], 'utf-8', 'strict')
['Hello', 'World']
>>> import pandas as pd
>>> decode(pd.Series([b'Hello', b'World']), 'utf-8', 'strict')
0    Hello
1    World
dtype: object
- calcpy.str.dedent(text)
Remove any common leading whitespace from every line in text.
This can be used to make triple-quoted strings line up with the left edge of the display, while still presenting them in the source code in indented form.
Note that tabs and spaces are both treated as whitespace, but they are not equal: the lines "  hello" and "\thello" are considered to have no common leading whitespace.
Entirely blank lines are normalized to a newline character.
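The description above matches the standard library's textwrap.dedent, so the following sketch uses that function directly, on the assumption that calcpy.str.dedent behaves the same way. It shows the common indentation being removed, tabs and spaces not counting as common whitespace, and a whitespace-only line being normalized.

```python
import textwrap

block = """
    def greet():
        return "hi"
"""
# The four spaces shared by both code lines are stripped; relative
# indentation inside the block is preserved.
dedented = textwrap.dedent(block)
print(dedented)

# Two spaces versus one tab: no common prefix, so nothing is removed.
mixed = "  hello\n\thello"
unchanged = textwrap.dedent(mixed)

# A line holding only spaces becomes an empty line.
normalized = textwrap.dedent("    a\n    \n    b")
print(repr(normalized))  # 'a\n\nb'
```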
- calcpy.str.encode(value, /, encoding='utf-8', errors='strict')
- Parameters:
value (str | (list | tuple | pd.Series)[str])
encoding (str) – name of the encoding
errors (str) – error handling scheme, such as 'strict', 'ignore', or 'replace'
- Return type:
bytes | list | tuple | pd.Series
Examples
>>> encode('Hello World', 'utf-8', 'strict')
b'Hello World'
>>> encode(['Hello', 'World'], 'utf-8', 'strict')
[b'Hello', b'World']
>>> import pandas as pd
>>> encode(pd.Series(['Hello', 'World']), 'utf-8', 'strict')
0    b'Hello'
1    b'World'
dtype: object
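The encoding and errors parameters matter most when the text is not plain ASCII. The sketch below uses the built-in str.encode and bytes.decode, which encode and decode presumably wrap, to show a round trip and the effect of the 'replace' and 'ignore' error handlers.

```python
text = "caf\u00e9"  # "café", written with an escape to keep the source ASCII

# UTF-8 round trip: the accented letter takes two bytes.
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# ASCII cannot represent the accented letter, so the handler decides the outcome.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")

# Decoding UTF-8 bytes as ASCII replaces each invalid byte with U+FFFD.
lossy = data.decode("ascii", errors="replace")

print(data)      # b'caf\xc3\xa9'
print(replaced)  # b'caf?'
print(ignored)   # b'caf'
```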
- calcpy.str.endswith(value, /, suffix, start=0, end=None)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
suffix (str) – suffix to test for
start (int) – start index
end (int) – end index
- Return type:
bool | (list | tuple | pd.Series)[bool]
Examples
>>> endswith('Hello World', 'World')
True
>>> endswith(['Hello', 'World'], 'World')
[False, True]
>>> import pandas as pd
>>> endswith(pd.Series(['Hello', 'World']), 'World')
0    False
1     True
dtype: bool
- calcpy.str.expandtabs(value, /, tabsize=8)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
tabsize (int) – number of columns between tab stops
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> expandtabs('Hello World', 4)
'Hello World'
>>> expandtabs(['Hello World', 'Hello World'], 4)
['Hello World', 'Hello World']
>>> import pandas as pd
>>> expandtabs(pd.Series(['Hello World', 'Hello World']), 4)
0    Hello World
1    Hello World
dtype: object
- calcpy.str.fill(text, width=70, **kwargs)
Fill a single paragraph of text, returning a new string.
Reformat the single paragraph in ‘text’ to fit in lines of no more than ‘width’ columns, and return a new string containing the entire wrapped paragraph. As with wrap(), tabs are expanded and other whitespace characters converted to space. See TextWrapper class for available keyword args to customize wrapping behaviour.
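The wording above is that of the standard library's textwrap.fill, so this sketch calls that function directly, assuming calcpy.str.fill forwards to it. It shows a plain fill and one TextWrapper keyword pair (initial_indent and subsequent_indent) passed through **kwargs.

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog"

# Wrap to at most 20 columns and join the lines with newlines.
plain = textwrap.fill(text, width=20)
print(plain)

# The indent strings count toward the 20-column limit.
quoted = textwrap.fill(text, width=20, initial_indent="> ", subsequent_indent="> ")
print(quoted)
```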
- calcpy.str.find(value, /, sub, start=0, end=None)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
sub (str) – substring to find
start (int) – start index
end (int) – end index
- Return type:
int | list | tuple | pd.Series
Examples
>>> find('Hello World', 'World')
6
>>> find(['Hello', 'World'], 'World')
[-1, 0]
>>> import pandas as pd
>>> find(pd.Series(['Hello', 'World']), 'World')
0   -1
1    0
dtype: int64
- calcpy.str.format_(value, /, *args, **kwargs)
Perform a string formatting operation.
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
*args
**kwargs
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> format_('Hello {0}', 'World')
'Hello World'
>>> format_(['Hello', 'World'], '{0}')
['Hello', 'World']
>>> import pandas as pd
>>> format_(pd.Series(['Hello', 'World']), '{0}')
0    Hello
1    World
dtype: object
- calcpy.str.format_map(value, /, mapping)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
mapping (dict) – mapping
- Return type:
str | bytes | bytearray | (list | tuple | pd.Series)[str]
Examples
>>> format_map('Hello {w}', {'w': 'World'})
'Hello World'
>>> format_map(['Hello', '{w}'], {'w': 'World'})
['Hello', 'World']
>>> import pandas as pd
>>> format_map(pd.Series(['Hello', '{w}']), {'w': 'World'})
0    Hello
1    World
dtype: object
- calcpy.str.index(value, /, sub, start=0, end=None)
- Parameters:
value (str | bytes | bytearray | (list | tuple | pd.Series)[str])
sub (str) – substring to locate
start (int) – start index
end (int) – end index
- Return type:
int | list | tuple | pd.Series
- Raises:
ValueError – if sub is not found
Examples
>>> index('Hello World', 'World')
6
>>> index(['Hello', 'World'], 'World')
Traceback (most recent call last):
ValueError: substring not found
>>> import pandas as pd
>>> index(pd.Series(['Hello', 'World']), 'World')
Traceback (most recent call last):
ValueError: substring not found
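As the examples show, find and index differ only in how they report a miss: find returns -1, while index raises ValueError, and a single miss in a list makes the whole call fail. The sketch below reproduces this with the built-in str methods; safe_index is a hypothetical helper, not part of calcpy.

```python
words = ["Hello", "World"]

# find never raises: a miss is reported as -1.
positions = [word.find("World") for word in words]
print(positions)  # [-1, 0]


def safe_index(text, sub):
    """Hypothetical helper: like str.index, but returns None on a miss."""
    try:
        return text.index(sub)
    except ValueError:
        return None


safe_positions = [safe_index(word, "World") for word in words]
print(safe_positions)  # [None, 0]
```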
- calcpy.str.indent(text, prefix, predicate=None)
Adds ‘prefix’ to the beginning of selected lines in ‘text’.
If ‘predicate’ is provided, ‘prefix’ will only be added to the lines where ‘predicate(line)’ is True. If ‘predicate’ is not provided, it will default to adding ‘prefix’ to all non-empty lines that do not consist solely of whitespace characters.
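The default predicate and a custom one can be compared with the standard library's textwrap.indent, which this function appears to mirror (an assumption based on the matching description).

```python
import textwrap

text = "first\n\n  \nsecond\n"

# Default: empty and whitespace-only lines are left without a prefix.
default = textwrap.indent(text, "> ")
print(repr(default))  # '> first\n\n  \n> second\n'

# A predicate that accepts every line prefixes the blank ones as well.
every_line = textwrap.indent(text, "> ", predicate=lambda line: True)
print(repr(every_line))  # '> first\n> \n>   \n> second\n'
```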
- calcpy.str.isalnum(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isalnum('Hello World')
False
>>> isalnum(['Hello', 'World'])
[True, True]
>>> import pandas as pd
>>> isalnum(pd.Series(['Hello', 'World']))
0    True
1    True
dtype: bool
- calcpy.str.isalpha(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isalpha('Hello World')
False
>>> isalpha(['Hello', 'World'])
[True, True]
>>> import pandas as pd
>>> isalpha(pd.Series(['Hello', 'World']))
0    True
1    True
dtype: bool
- calcpy.str.isascii(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isascii('Hello World')
True
>>> isascii(['Hello', 'World'])
[True, True]
>>> import pandas as pd
>>> isascii(pd.Series(['Hello', 'World']))
0    True
1    True
dtype: bool
- calcpy.str.isdecimal(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isdecimal('Hello World')
False
>>> isdecimal(['Hello', 'World'])
[False, False]
>>> import pandas as pd
>>> isdecimal(pd.Series(['Hello', 'World']))
0    False
1    False
dtype: bool
- calcpy.str.isidentifier(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isidentifier('Hello World')
False
>>> isidentifier(['Hello', 'World'])
[True, True]
>>> import pandas as pd
>>> isidentifier(pd.Series(['Hello', 'World']))
0    True
1    True
dtype: bool
- calcpy.str.islower(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> islower('Hello World')
False
>>> islower(['Hello', 'World'])
[False, False]
>>> import pandas as pd
>>> islower(pd.Series(['Hello', 'World']))
0    False
1    False
dtype: bool
- calcpy.str.iskeyword(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> iskeyword('Hello World')
False
>>> iskeyword(['Hello', 'World'])
[False, False]
>>> import pandas as pd
>>> iskeyword(pd.Series(['Hello', 'World']))
0    False
1    False
dtype: bool
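The examples above never return True, so a case where isidentifier and iskeyword disagree may be more informative. The sketch assumes iskeyword follows the standard library's keyword.iskeyword (str itself has no such method); is_valid_name is a hypothetical helper that combines the two checks.

```python
import keyword


def is_valid_name(name):
    """Hypothetical helper: True if name can be used as a Python variable name."""
    # A reserved word such as "class" is a syntactically valid identifier,
    # so both checks are needed.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["total", "class", "Hello World", "2fast"]:
    print(candidate, candidate.isidentifier(), keyword.iskeyword(candidate),
          is_valid_name(candidate))
```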
- calcpy.str.isnumeric(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isnumeric('Hello World')
False
>>> isnumeric(['Hello', 'World'])
[False, False]
>>> import pandas as pd
>>> isnumeric(pd.Series(['Hello', 'World']))
0    False
1    False
dtype: bool
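isdecimal and isnumeric give the same answer for ordinary digits and differ on other numeric characters. The sketch below uses the built-in str methods, assuming the calcpy functions delegate to them, with the non-ASCII characters written as escapes.

```python
samples = {
    "42": "ASCII digits",
    "\u00b2": "superscript two",
    "\u00bd": "vulgar fraction one half",
    "4.2": "digits with a decimal point",
}

# Map each sample to (isdecimal, isnumeric).
results = {text: (text.isdecimal(), text.isnumeric()) for text in samples}

for text, label in samples.items():
    print(f"{label}: isdecimal={results[text][0]}, isnumeric={results[text][1]}")
```

Only characters that can form base-10 numbers pass isdecimal, whereas isnumeric also accepts characters such as superscripts and fractions; neither accepts a decimal point.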
- calcpy.str.isprintable(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
Examples
>>> isprintable('Hello World')
True
>>> isprintable(['Hello', 'World'])
[True, True]
>>> import pandas as pd
>>> isprintable(pd.Series(['Hello', 'World']))
0    True
1    True
dtype: bool
- calcpy.str.isspace(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
- calcpy.str.istitle(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
- calcpy.str.isupper(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
bool | pd.Series | pd.DataFrame
- calcpy.str.join(value, /, sep='')
- Parameters:
value (list[str] | tuple[str] | pd.Series[str]) – strings to join
sep (str) – separator
- Returns:
str
Examples
>>> join(['Hello', 'World'], ' ')
'Hello World'
>>> import pandas as pd
>>> join(pd.Series(['Hello', 'World']), ' ')
'Hello World'
- calcpy.str.ljust(value, /, width, fillchar=' ')
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
width (int)
fillchar (str)
- Return type:
str | pd.Series | pd.DataFrame
- calcpy.str.lower(value, /)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
- Return type:
str | pd.Series | pd.DataFrame
- calcpy.str.lstrip(value, /, chars=None)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
chars (str)
- Return type:
str | pd.Series | pd.DataFrame
- calcpy.str.partition(value, /, sep, expand=True)
- Parameters:
value (str | bytes | bytearray | list | pd.Series[str]) – string to partition
sep (str) – separator
expand (bool) – whether to expand the result. Only used when the input is an NDFrame. Should be False when value is a pd.DataFrame.
- Return type:
tuple | list | pd.DataFrame
Examples
>>> partition('Hello World', ' ')
('Hello', ' ', 'World')
>>> partition(['Hello', 'World'], ' ')
[('Hello', '', ''), ('World', '', '')]
>>> import pandas as pd
>>> partition(pd.Series(['Hello', 'World']), 'l')
     0  1   2
0   He  l  lo
1  Wor  l   d
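The three-part result is easiest to read when unpacked. The sketch below uses the built-in str.partition, which the scalar case presumably follows: the split happens at the first occurrence of the separator only, and a missing separator yields two empty strings.

```python
# Unpack into the text before, the separator itself, and the text after.
head, sep, tail = "Hello World".partition(" ")
print(head, repr(sep), tail)  # Hello ' ' World

# Only the first separator is used; the rest stays in the tail.
first_only = "a=b=c".partition("=")
print(first_only)  # ('a', '=', 'b=c')

# No separator found: the whole string comes back in the first slot.
missing = "Hello".partition(" ")
print(missing)  # ('Hello', '', '')
```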
- calcpy.str.removeprefix(value, /, prefix)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
prefix (str)
- Return type:
str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame
- calcpy.str.removesuffix(value, /, suffix)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
suffix (str)
- Return type:
str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame
- calcpy.str.replace(value, pattern, new=None, /, count=inf)
- Parameters:
value (str | list[str] | tuple[str] | pd.Series[str]) – string to replace
pattern (str | dict) – old string, or a dict from old string to new string
new (str | None) – new string if pattern is the old string
count (int | inf) – maximum number of replacements. The default value (inf) means the number of replacements is not limited. 0 disables replacements.
- Returns:
the string with replacements applied.
Notes
The parameters differ from those of both the built-in str.replace method and the pd.Series.str.replace method.
Examples
>>> replace('Hello World', 'World', 'Earth')
'Hello Earth'
>>> replace(['Hello', 'World'], 'World', 'Earth')
['Hello', 'Earth']
>>> import pandas as pd
>>> replace(pd.Series(['Hello', 'World']), 'World', 'Earth')
0    Hello
1    Earth
dtype: object
>>> replace('aaaa', {'a': 'b'}, count=0)
'aaaa'
>>> replace('aaaa', {'a': 'b'}, count=2)
'bbaa'
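To make the dict form and the count=inf default concrete, here is a rough sketch for a single string built on the built-in str.replace. replace_sketch is hypothetical: it assumes the dict entries are applied one after another and that count limits each entry separately, neither of which is confirmed by the text above.

```python
import math


def replace_sketch(value, pattern, new=None, count=math.inf):
    """Hypothetical scalar version of replace(); not calcpy's implementation."""
    # Normalize both call styles to a sequence of (old, new) pairs.
    pairs = pattern.items() if isinstance(pattern, dict) else [(pattern, new)]
    # str.replace uses -1 for "no limit", so translate inf accordingly.
    limit = -1 if count == math.inf else int(count)
    for old, replacement in pairs:
        value = value.replace(old, replacement, limit)
    return value


print(replace_sketch("Hello World", "World", "Earth"))  # Hello Earth
print(replace_sketch("aaaa", {"a": "b"}, count=0))      # aaaa
print(replace_sketch("aaaa", {"a": "b"}, count=2))      # bbaa
```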
- calcpy.str.rfind(value, /, sub, start=0, end=None)
- Parameters:
value (str | list[str] | tuple[str] | pd.Series[str]) – string to search
sub (str) – substring to find
start (int) – start index
end (int) – end index
- Returns:
Found indices.
Examples
>>> rfind('Hello World', 'World')
6
>>> rfind(['Hello', 'World'], 'World')
[-1, 0]
>>> import pandas as pd
>>> rfind(pd.Series(['Hello', 'World']), 'World')
0   -1
1    0
dtype: int64
- calcpy.str.rindex(value, /, sub, start=0, end=None)
- Parameters:
value (str | bytes | bytearray | list | tuple | pd.Series | pd.DataFrame)
sub (str) – substring to locate
start (int) – start index
end (int) – end index
- Returns:
indices
- Raises:
ValueError – if sub is not found
Examples
>>> rindex('Hello World', 'World')
6
>>> rindex(['Hello', 'World'], 'World')
Traceback (most recent call last):
ValueError: substring not found
>>> import pandas as pd
>>> rindex(pd.Series(['Hello', 'World']), 'World')
Traceback (most recent call last):
ValueError: substring not found
- calcpy.str.rsplit(value, /, sep=None, maxsplit=-1, minsplit=0, fillvalue=None, expand=False)
Split string by separator, starting from the right.
- Parameters:
value (str | pd.Series) – string to split
sep (str, optional) – separator. By default split on whitespace
maxsplit (int) – maximum number of splits
minsplit (int) – minimum number of splits
fillvalue (Optional) – fill value if not enough splits
expand (bool) – expand the resulting pd.Series to a pd.DataFrame. Can be True only when the input is a pd.Series
- Returns:
Split strings
Examples
>>> rsplit('abc def ghi')
['abc', 'def', 'ghi']
>>> rsplit('abc def ghi', ' ', maxsplit=1)
['abc def', 'ghi']
>>> rsplit('abc def ghi', ' ', minsplit=2)
['abc', 'def', 'ghi']
>>> rsplit('abc def ghi', ' ', minsplit=4, fillvalue="")
['abc', 'def', 'ghi', '']
>>> import pandas as pd
>>> rsplit(pd.Series(['abc def', 'ABC']), ' ', minsplit=3, fillvalue="", expand=True)
     0    1 2
0  abc  def
1  ABC
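rsplit and split only differ once maxsplit limits the number of cuts. The sketch below shows this with the built-in str methods on a file name, where cutting from the right isolates the final extension.

```python
name = "archive.tar.gz"

# One cut from the left keeps everything after the first dot together.
from_left = name.split(".", 1)
print(from_left)   # ['archive', 'tar.gz']

# One cut from the right isolates the last extension.
from_right = name.rsplit(".", 1)
print(from_right)  # ['archive.tar', 'gz']

# Without a limit, both directions give the same pieces.
same = name.split(".") == name.rsplit(".")
print(same)        # True
```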
- calcpy.str.shorten(text, width, **kwargs)
Collapse and truncate the given text to fit in the given width.
The text first has its whitespace collapsed. If it then fits in the width, it is returned as is. Otherwise, as many words as possible are joined and then the placeholder is appended:
>>> textwrap.shorten("Hello world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello world!", width=11)
'Hello [...]'
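Two details of the description are worth seeing in action: whitespace is collapsed before the width check, and the placeholder can be changed through a keyword argument. The sketch calls the standard library's textwrap.shorten, assuming calcpy.str.shorten passes its arguments through unchanged.

```python
import textwrap

# Leading, trailing, and repeated whitespace (including newlines) is collapsed first,
# so the 12-character result fits without truncation.
collapsed = textwrap.shorten("  Hello \n  world!  ", width=12)
print(collapsed)  # Hello world!

# A shorter placeholder leaves more room for the text itself.
custom = textwrap.shorten("Hello world", width=10, placeholder="...")
print(custom)     # Hello...
```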
- calcpy.str.split(value, /, sep=None, maxsplit=-1, minsplit=0, fillvalue=None, expand=False)
Split string by separator.
- Parameters:
value (str | pd.Series) – string to split
sep (str, optional) – separator. By default split on whitespace
maxsplit (int) – maximum number of splits
minsplit (int) – minimum number of splits
fillvalue (Optional) – fill value if not enough splits
expand (bool) – expand the resulting pd.Series to a pd.DataFrame. Can be True only when the input is a pd.Series
- Returns:
Split strings
Examples
>>> split('abc def ghi')
['abc', 'def', 'ghi']
>>> split('abc def ghi', ' ', maxsplit=1)
['abc', 'def ghi']
>>> split('abc def ghi', ' ', minsplit=2)
['abc', 'def', 'ghi']
>>> split('abc def ghi', ' ', minsplit=4, fillvalue="")
['abc', 'def', 'ghi', '']
>>> import pandas as pd
>>> split(pd.Series(['abc def', 'ABC']), ' ', minsplit=3, fillvalue="", expand=True)
     0    1 2
0  abc  def
1  ABC
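The minsplit and fillvalue parameters have no counterpart in str.split, so a rough sketch of the padding may help. split_sketch is hypothetical and handles a single string only; its rule (pad with fillvalue until the result has at least minsplit items) is inferred from the examples above rather than taken from calcpy's source.

```python
def split_sketch(value, sep=None, maxsplit=-1, minsplit=0, fillvalue=None):
    """Hypothetical scalar version of split() with padding; an inferred behaviour."""
    parts = value.split(sep, maxsplit)
    # Multiplying a list by a non-positive number gives [], so nothing is
    # appended when there are already enough items.
    parts.extend([fillvalue] * (minsplit - len(parts)))
    return parts


print(split_sketch("abc def ghi", " ", minsplit=2))                # ['abc', 'def', 'ghi']
print(split_sketch("abc def ghi", " ", minsplit=4, fillvalue=""))  # ['abc', 'def', 'ghi', '']

# Padding makes fixed-size unpacking safe even when the separator is absent.
key, setting = split_sketch("verbose", "=", maxsplit=1, minsplit=2)
print(key, setting)  # verbose None
```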
- calcpy.str.sub(value, pattern, new=None, count=inf, flags=0)
Replace using regex.
- Parameters:
value (str | list[str] | tuple[str] | pd.Series[str]) – string to replace
pattern (str | dict) – regex pattern, or a dict from pattern to new string
new (str | None) – new string if pattern is a single regex pattern
count (int | inf) – maximum number of replacements. By default, there is no limit on the number of replacements. 0 disables replacements.
flags (re.RegexFlag)
- Returns:
the string with replacements applied.
Notes
The parameters differ from those of the standard library's re.sub function.
Examples
>>> sub('Hello World', 'World', 'Earth')
'Hello Earth'
>>> sub(['Hello', 'World'], 'World', 'Earth')
['Hello', 'Earth']
>>> import pandas as pd
>>> sub(pd.Series(['Hello', 'World']), 'World', 'Earth')
0    Hello
1    Earth
dtype: object
>>> sub('aaaa', {'a': 'b'}, count=0)
'aaaa'
>>> sub('aaaa', {'a': 'b'}, count=2)
'bbaa'
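One difference from re.sub deserves attention: in re.sub, count=0 means "replace everything", whereas here 0 disables replacement and inf means no limit. The sketch below shows one way to bridge the two conventions for a single string. sub_sketch is hypothetical, and applying dict entries one after another is an assumption.

```python
import math
import re


def sub_sketch(value, pattern, new=None, count=math.inf, flags=0):
    """Hypothetical scalar version of sub(); not calcpy's implementation."""
    if count == 0:
        # re.sub would treat 0 as "unlimited", so handle it before calling re.sub.
        return value
    limit = 0 if count == math.inf else int(count)
    pairs = pattern.items() if isinstance(pattern, dict) else [(pattern, new)]
    for old, replacement in pairs:
        value = re.sub(old, replacement, value, count=limit, flags=flags)
    return value


print(sub_sketch("aaaa", {"a": "b"}, count=0))  # aaaa
print(sub_sketch("aaaa", {"a": "b"}, count=2))  # bbaa
print(sub_sketch("Hello World", r"w\w+", "Earth", flags=re.IGNORECASE))  # Hello Earth
```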
- calcpy.str.wrap(text, width=70, **kwargs)
Wrap a single paragraph of text, returning a list of wrapped lines.
Reformat the single paragraph in ‘text’ so it fits in lines of no more than ‘width’ columns, and return a list of wrapped lines. By default, tabs in ‘text’ are expanded with string.expandtabs(), and all other whitespace characters (including newline) are converted to space. See TextWrapper class for available keyword args to customize wrapping behaviour.
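wrap returns the individual lines rather than one joined string, which is convenient when each line needs further processing. The sketch uses the standard library's textwrap.wrap, assuming calcpy.str.wrap forwards to it.

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog"

# Each list item is one output line of at most 20 columns.
lines = textwrap.wrap(text, width=20)
print(lines)  # ['The quick brown fox', 'jumps over the lazy', 'dog']

# Working on the list makes per-line decoration straightforward.
numbered = [f"{number}: {line}" for number, line in enumerate(lines, start=1)]
print(numbered)

# Joining the lines with newlines gives the same result as fill().
joined = "\n".join(lines)
```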