Pandas Operations
Operations on Series and DataFrames
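- calcpy.pd.extend(frame, /, labels=None, *, index=None, columns=None, axis=None, **kwargs)
Add index (or column) labels that are not already present. Existing labels keep their order and data; missing labels are appended at the end and filled with NaN by default.
This API is similar to pd.DataFrame.reindex().
- Parameters: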
frame (pd.Series | pd.DataFrame) – Input data.
labels (list | tuple, optional) – Labels to add along the axis specified by axis.
index (list | tuple, optional) – Index labels to add.
columns (list | tuple, optional) – Column labels to add. Only works for DataFrame.
axis (int | str, optional) – Axis to extend (0: index, 1: columns). Only works for DataFrame.
kwargs – Keyword arguments to be passed to pd.DataFrame.reindex(), including copy, level, fill_value, limit, and tolerance.
- Return type:
pd.Series | pd.DataFrame
Example
>>> import pandas as pd
>>> s = pd.Series(1, index=[0, 1])
>>> extend(s, index=[1, 2])
0    1.0
1    1.0
2    NaN
dtype: float64
>>> df = pd.DataFrame({"A": 1, "B": 2}, index=[0, 1])
>>> extend(df, index=[1, 2], columns=["A", "C"])
     A    B   C
0  1.0  2.0 NaN
1  1.0  2.0 NaN
2  NaN  NaN NaN
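The label-merging rule above can be sketched with plain pandas. This is a minimal illustration, not calcpy's implementation: the helper name extend_sketch is hypothetical, it only handles the index and columns keywords, and it assumes the labels are unique.

```python
import pandas as pd


def extend_sketch(frame, index=None, columns=None):
    """Hypothetical stand-in for calcpy.pd.extend, built on reindex()."""

    def merged(existing, new):
        # Keep every existing label in order, then append labels not yet present.
        return list(existing) + [label for label in new if label not in existing]

    if index is not None:
        frame = frame.reindex(index=merged(frame.index, index))
    if columns is not None:
        # Only DataFrames have a column axis.
        frame = frame.reindex(columns=merged(frame.columns, columns))
    return frame


df = pd.DataFrame({"A": 1, "B": 2}, index=[0, 1])
print(extend_sketch(df, index=[1, 2], columns=["A", "C"]))
```

Because only missing labels are appended, no existing row or column is ever dropped, which is the main difference from calling reindex() with the new labels alone.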
- calcpy.pd.prioritize(frame, /, labels=None, *, index=None, columns=None, axis=None, **kwargs)
Put some labels at the beginning of the index (or columns).
Labels that are already present are moved to the beginning; labels that are not present are added at the beginning and filled with NaN by default. The remaining labels follow in their original order.
This API is similar to pd.Series.reindex() and pd.DataFrame.reindex().
- Parameters:
frame (pd.Series | pd.DataFrame) – Input data.
labels (list | tuple, optional) – Labels to put first along the axis specified by axis.
index (list | tuple, optional) – Index labels to put first.
columns (list | tuple, optional) – Column labels to put first. Only works for DataFrame.
axis (int | str, optional) – Axis to reorder (0: index, 1: columns). Only works for DataFrame.
kwargs – Keyword arguments to be passed to pd.DataFrame.reindex(), including copy, level, fill_value, limit, and tolerance.
- Return type:
pd.Series | pd.DataFrame
Example
>>> import pandas as pd
>>> s = pd.Series(1, index=[0, 1])
>>> prioritize(s, index=[1, 2])
1    1.0
2    NaN
0    1.0
dtype: float64
>>> df = pd.DataFrame({"A": 1, "B": 2}, index=[0, 1])
>>> prioritize(df, index=[1, 2], columns=["A", "C"])
     A   C    B
1  1.0 NaN  2.0
2  NaN NaN  NaN
0  1.0 NaN  2.0
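The reordering rule can likewise be sketched on top of reindex(). The helper prioritize_sketch is hypothetical, handles only the index and columns keywords, and assumes unique labels; it is not calcpy's code.

```python
import pandas as pd


def prioritize_sketch(frame, index=None, columns=None):
    """Hypothetical stand-in for calcpy.pd.prioritize, built on reindex()."""

    def front(existing, first):
        first = list(first)
        # Requested labels come first; the remaining labels keep their original order.
        return first + [label for label in existing if label not in first]

    if index is not None:
        frame = frame.reindex(index=front(frame.index, index))
    if columns is not None:
        frame = frame.reindex(columns=front(frame.columns, columns))
    return frame


df = pd.DataFrame({"A": 1, "B": 2}, index=[0, 1])
print(prioritize_sketch(df, index=[1, 2], columns=["A", "C"]))
```

Requested labels that do not exist yet are created by reindex() and filled with NaN, matching the rows and columns shown in the example above.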
- calcpy.pd.stack(frame, /, **kwargs)
Stack a pd.Series or pd.DataFrame with the future_stack behavior, silencing the FutureWarning “The previous implementation of stack is deprecated”.
- Parameters:
frame (pd.DataFrame) – Input data.
**kwargs – Keyword arguments to be passed to pd.DataFrame.stack().
- Return type:
pd.Series | pd.DataFrame
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"A": 1, "B": 2}, index=[0])
>>> stack(df)
0  A    1
   B    2
dtype: int64
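The warning-silencing half of this function can be sketched with the standard warnings module. The helper name stack_quietly is hypothetical, and the sketch only mutes the warning: it keeps pandas' default stack() behavior and does not reproduce the future_stack semantics. On pandas 2.1 and later, passing future_stack=True to DataFrame.stack() opts into the new implementation directly.

```python
import warnings

import pandas as pd


def stack_quietly(frame, **kwargs):
    """Hypothetical helper: call DataFrame.stack() with the stack deprecation notice muted."""
    with warnings.catch_warnings():
        # The filter is scoped to this block and matches only messages starting with this text.
        warnings.filterwarnings(
            "ignore",
            message="The previous implementation of stack is deprecated",
            category=FutureWarning,
        )
        return frame.stack(**kwargs)


df = pd.DataFrame({"A": 1, "B": 2}, index=[0])
print(stack_quietly(df))
```

Scoping the filter with catch_warnings leaves every other FutureWarning in the program visible.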
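Calculations on Series and DataFrames
- calcpy.pd.mdd(inputs)
Maximum drawdown: the largest decline from a running peak to any later value, in the same units as the input rather than as a percentage.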
- Parameters:
inputs (pd.Series | pd.DataFrame) – Input time series of levels, not of differences.
- Return type:
float | pd.Series
Examples
Calculate maximum drawdown for a DataFrame.
>>> from math import nan
>>> import pandas as pd
>>> data = {"___": [nan, nan, nan],
...         "1__": [1.0, nan, nan],
...         "_1_": [nan, 1.0, nan],
...         "__1": [nan, nan, 1.0],
...         "12_": [1.0, 2.0, nan],
...         "21_": [2.0, 1.0, nan],
...         "1_2": [1.0, nan, 2.0],
...         "2_1": [2.0, nan, 1.0],
...         "_12": [nan, 1.0, 2.0],
...         "_21": [nan, 2.0, 1.0],
...         "123": [1.0, 2.0, 3.0],
...         "132": [1.0, 3.0, 2.0],
...         "213": [2.0, 1.0, 3.0],
...         "231": [2.0, 3.0, 1.0],
...         "312": [3.0, 1.0, 2.0],
...         "321": [3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(data, index=pd.date_range("2000-01-01", "2000-01-03"))
>>> mdd(df)
___    NaN
1__    0.0
_1_    0.0
__1    0.0
12_    0.0
21_    1.0
1_2    0.0
2_1    1.0
_12    0.0
_21    1.0
123    0.0
132    1.0
213    1.0
231    2.0
312    2.0
321    2.0
dtype: float64
Calculate MDD for a Series.
>>> mdd(pd.Series([4, 2, 3, 1, 4]))
np.int64(3)
Empty inputs.
>>> df = pd.DataFrame(columns=["A"])
>>> mdd(df)
A    NaN
dtype: object
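The Series and DataFrame results above are consistent with simple peak-minus-value arithmetic. The following minimal sketch re-derives them under that assumption; mdd_sketch is a hypothetical name, it is not calcpy's implementation, and it does not try to reproduce the empty-input case.

```python
import pandas as pd


def mdd_sketch(inputs):
    """Hypothetical re-derivation of maximum drawdown; not calcpy's code."""
    running_peak = inputs.cummax()    # highest value seen so far; NaN positions stay NaN
    drawdown = running_peak - inputs  # distance below the running peak at each step
    return drawdown.max()             # scalar for a Series, one value per column for a DataFrame


print(mdd_sketch(pd.Series([4, 2, 3, 1, 4])))
```

For the Series example the running peak stays at 4, the lowest later value is 1, so the drawdown is 4 − 1 = 3.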
- calcpy.pd.mdd_recover(inputs, fillinf=None)
Recovery duration for maximum drawdown.
- Parameters:
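inputs (pd.Series | pd.DataFrame) – Input time series of levels, not of differences.
fillinf (optional) – Value used as the duration when the drawdown is never recovered. In the example below, where fillinf is not given, such durations appear as 106751 days 23:47:16.854775807, the largest representable pd.Timedelta.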
- Returns:
Recovery durations, placed at the positions where the maximum drawdown begins. All other positions are NaN (shown as NaT for durations). The results can be further processed with operations such as mean and max to get the average and the maximum recovery duration.
- Return type:
pd.Series | pd.DataFrame
Examples
>>> from math import nan
>>> import pandas as pd
>>> data = {"___": [nan, nan, nan],
...         "1__": [1.0, nan, nan],
...         "_1_": [nan, 1.0, nan],
...         "__1": [nan, nan, 1.0],
...         "12_": [1.0, 2.0, nan],
...         "21_": [2.0, 1.0, nan],
...         "1_2": [1.0, nan, 2.0],
...         "2_1": [2.0, nan, 1.0],
...         "_12": [nan, 1.0, 2.0],
...         "_21": [nan, 2.0, 1.0],
...         "123": [1.0, 2.0, 3.0],
...         "132": [1.0, 3.0, 2.0],
...         "213": [2.0, 1.0, 3.0],
...         "231": [2.0, 3.0, 1.0],
...         "312": [3.0, 1.0, 2.0],
...         "321": [3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(data, index=pd.date_range("2000-01-01", "2000-01-03"))
>>> with pd.option_context("display.max_rows", None, "display.max_columns", None):
...     mdd_recover(df)
            ___  1__  _1_  __1  12_                             21_  1_2  \
2000-01-01  NaT  NaT  NaT  NaT  NaT  106751 days 23:47:16.854775807  NaT
2000-01-02  NaT  NaT  NaT  NaT  NaT                             NaT  NaT
2000-01-03  NaT  NaT  NaT  NaT  NaT                             NaT  NaT

                                       2_1  _12                             _21  \
2000-01-01  106751 days 23:47:16.854775807  NaT                             NaT
2000-01-02                             NaT  NaT  106751 days 23:47:16.854775807
2000-01-03                             NaT  NaT                             NaT

            123                             132     213  \
2000-01-01  NaT                             NaT  2 days
2000-01-02  NaT  106751 days 23:47:16.854775807     NaT
2000-01-03  NaT                             NaT     NaT

                                       231                             312  \
2000-01-01                             NaT  106751 days 23:47:16.854775807
2000-01-02  106751 days 23:47:16.854775807                             NaT
2000-01-03                             NaT                             NaT

                                       321
2000-01-01  106751 days 23:47:16.854775807
2000-01-02                             NaT
2000-01-03                             NaT
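The placement rule can be illustrated for a single Series. This is a minimal sketch under stated assumptions, not calcpy's implementation: mdd_recover_sketch is a hypothetical name, it assumes a sorted DatetimeIndex, it reports only the first occurrence of the maximum drawdown, and it treats "recovered" as the first later value that is at least as high as the peak before the drawdown. With those assumptions it reproduces columns such as 213, 132, and 2_1 above.

```python
import pandas as pd


def mdd_recover_sketch(series, fillinf=pd.Timedelta.max):
    """Hypothetical single-Series sketch of mdd_recover; not calcpy's code."""
    running_peak = series.cummax()
    drawdown = running_peak - series
    result = pd.Series(pd.NaT, index=series.index, dtype="timedelta64[ns]")
    worst = drawdown.max()
    if pd.isna(worst) or worst <= 0:
        return result  # no drawdown, so nothing to report

    trough = drawdown.idxmax()  # first time the maximum drawdown is reached
    peak_value = running_peak.loc[trough]
    before = series.loc[:trough]
    # The drawdown begins at the last time the peak value was set before the trough.
    peak = before[before == peak_value].index[-1]
    # Recovery is the first later time the series regains the peak value.
    after = series[series.index > trough]
    recovered = after[after >= peak_value]
    result.loc[peak] = recovered.index[0] - peak if len(recovered) else fillinf
    return result


idx = pd.date_range("2000-01-01", "2000-01-03")
print(mdd_recover_sketch(pd.Series([2.0, 1.0, 3.0], index=idx)))
```

For [2.0, 1.0, 3.0] the peak is on 2000-01-01 and the value regains 2.0 on 2000-01-03, so 2 days is written at the peak's position and every other position stays NaT.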
Conversions between Series/DataFrames and Dictionaries
- calcpy.convert_nested_dict_to_dataframe(data, /, *, index_cols=None, columns=None)
Convert a nested dictionary to a pd.DataFrame.
- Parameters:
data (dict) – Nested dict.
index_cols (int | str | (list | tuple)[str], optional) – Index names.
columns (int | (list | tuple)[str], optional) – Column names.
- Return type:
pd.DataFrame
Example
>>> data = {"A": {"H": 1, "J": 2}, "E": {"D": 3, "T": 4}}
>>> convert_nested_dict_to_dataframe(data)
   0  1  2
0  A  H  1
1  A  J  2
2  E  D  3
3  E  T  4
>>> convert_nested_dict_to_dataframe(data, index_cols=["v", "c"], columns=["x"])
     x
v c
A H  1
  J  2
E D  3
  T  4
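The conversion shown above amounts to flattening each path from the outer key to a leaf value into one row. The sketch below assumes every branch of the dictionary has the same depth; the names _flatten, nested_dict_to_frame_sketch, index_names, and value_names are hypothetical and only loosely mirror index_cols and columns.

```python
import pandas as pd


def _flatten(data, prefix=()):
    # Walk the nested dict depth-first, yielding one tuple per leaf: (key1, key2, ..., value).
    for key, value in data.items():
        if isinstance(value, dict):
            yield from _flatten(value, prefix + (key,))
        else:
            yield prefix + (key, value)


def nested_dict_to_frame_sketch(data, index_names=None, value_names=None):
    """Hypothetical stand-in for convert_nested_dict_to_dataframe."""
    frame = pd.DataFrame(list(_flatten(data)))  # columns are 0, 1, 2, ...
    if index_names is not None:
        index_names = list(index_names)
        if value_names is None:
            value_names = list(frame.columns[len(index_names):])
        frame.columns = index_names + list(value_names)
        frame = frame.set_index(index_names)  # leading key columns become a MultiIndex
    return frame


data = {"A": {"H": 1, "J": 2}, "E": {"D": 3, "T": 4}}
print(nested_dict_to_frame_sketch(data, index_names=["v", "c"], value_names=["x"]))
```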
- calcpy.convert_series_to_nested_dict(series, /)
Convert a pd.Series to a nested dictionary, with one level of nesting per index level.
- Parameters:
series (pd.Series)
- Return type:
dict
Example
>>> import pandas as pd
>>> s = pd.DataFrame({"A": 1, "B": [2, 3], "C": [4, 5]}).set_index(["A", "B"])["C"]
>>> convert_series_to_nested_dict(s)
{1: {2: 4, 3: 5}}
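The reverse direction can be sketched by walking each index tuple and creating intermediate dictionaries on demand. series_to_nested_dict_sketch is a hypothetical name, not calcpy's implementation; it assumes that a MultiIndex yields tuple keys and that a flat index yields single labels.

```python
import pandas as pd


def series_to_nested_dict_sketch(series):
    """Hypothetical stand-in for convert_series_to_nested_dict."""
    result = {}
    for key, value in series.items():
        # With a MultiIndex each key is a tuple; a flat index gives a single label.
        *outer, last = key if isinstance(key, tuple) else (key,)
        node = result
        for level in outer:
            node = node.setdefault(level, {})  # create intermediate dicts on demand
        node[last] = value
    return result


s = pd.DataFrame({"A": 1, "B": [2, 3], "C": [4, 5]}).set_index(["A", "B"])["C"]
print(series_to_nested_dict_sketch(s))
```

Values keep the scalar type pandas returns, so they may print as NumPy integers even though they compare equal to plain Python ints.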