Pandas Operations

Operations on Series and DataFrames

calcpy.pd.extend(frame, /, labels=None, *, index=None, columns=None, axis=None, **kwargs)[source]

Add index labels that are not already present.

This API is similar to pd.DataFrame.reindex().

Parameters:
  • frame (pd.Series | pd.DataFrame) – Input data.

  • labels (list | tuple, optional) – New labels / index that the axis specified by axis should conform to.

  • index (list | tuple, optional) – Index labels to add.

  • columns (list | tuple, optional) – Column labels to add. Only works for DataFrame.

  • axis (int | str, optional) – Axis to extend. 0: index, 1: columns. Only works for DataFrame.

  • kwargs – keyword arguments to be passed to pd.DataFrame.reindex(), including copy, level, fill_value, limit, and tolerance.

Return type:

pd.Series | pd.DataFrame

Example

>>> import pandas as pd
>>> s = pd.Series(1, index=[0, 1])
>>> extend(s, index=[1, 2])
0    1.0
1    1.0
2    NaN
dtype: float64
>>> df = pd.DataFrame({"A": 1, "B": 2}, index=[0, 1])
>>> extend(df, index=[1, 2], columns=["A", "C"])
      A    B    C
0   1.0  2.0  NaN
1   1.0  2.0  NaN
2   NaN  NaN  NaN
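The Series example above can be reproduced with plain pandas. The following is a minimal sketch, assuming extend keeps the existing labels in place and appends the missing ones; extend_sketch is a hypothetical helper, not calcpy's implementation.

```python
import pandas as pd


def extend_sketch(series, labels):
    """Hypothetical stand-in: append labels that the index does not contain yet."""
    missing = [label for label in labels if label not in series.index]
    # Existing labels keep their position; new labels get NaN values.
    return series.reindex(list(series.index) + missing)


s = pd.Series(1, index=[0, 1])
extended = extend_sketch(s, [1, 2])
print(extended)
```

Unlike a bare reindex([1, 2]), this keeps label 0, which is the behavior the documented output shows.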
calcpy.pd.prioritize(frame, /, labels=None, *, index=None, columns=None, axis=None, **kwargs)[source]

Put some index labels at the beginning of the index.

Labels that are already in the index are moved to the beginning. Labels that are not yet in the index are added.

This API is similar to pd.Series.reindex() and pd.DataFrame.reindex().

Parameters:
  • frame (pd.Series | pd.DataFrame) – Input data.

  • labels (list | tuple, optional) – New labels / index that the axis specified by axis should conform to.

  • index (list | tuple, optional) – Index labels to put first.

  • columns (list | tuple, optional) – Column labels to put first. Only works for DataFrame.

  • axis (int | str, optional) – Axis to operate on. 0: index, 1: columns. Only works for DataFrame.

  • kwargs – keyword arguments to be passed to pd.DataFrame.reindex(), including copy, level, fill_value, limit, and tolerance.

Return type:

pd.Series | pd.DataFrame

Example

>>> import pandas as pd
>>> s = pd.Series(1, index=[0, 1])
>>> prioritize(s, index=[1, 2])
1    1.0
2    NaN
0    1.0
dtype: float64
>>> df = pd.DataFrame({"A": 1, "B": 2}, index=[0, 1])
>>> prioritize(df, index=[1, 2], columns=["A", "C"])
     A   C    B
1  1.0 NaN  2.0
2  NaN NaN  NaN
0  1.0 NaN  2.0
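The reordering in the Series example above can be sketched with plain pandas. This assumes the prioritized labels come first, in the order given, followed by the remaining existing labels; prioritize_sketch is a hypothetical helper, not calcpy's implementation.

```python
import pandas as pd


def prioritize_sketch(series, labels):
    """Hypothetical stand-in: put the given labels first, then the remaining ones."""
    labels = list(labels)
    rest = [label for label in series.index if label not in labels]
    # Labels that were not in the index appear with NaN values.
    return series.reindex(labels + rest)


s = pd.Series(1, index=[0, 1])
reordered = prioritize_sketch(s, [1, 2])
print(reordered)
```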
calcpy.pd.stack(frame, /, **kwargs)[source]

Stack a pd.Series or pd.DataFrame with future_stack behavior.

Stack and silence the FutureWarning “The previous implementation of stack is deprecated”.

Parameters:
  • frame (pd.DataFrame)

  • **kwargs – Keyword arguments to be passed to pd.DataFrame.stack().

Return type:

pd.Series | pd.DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"A": 1, "B": 2}, index=[0])
>>> stack(df)
0 A  1
  B  2
dtype: int64
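One way to get a warning-free stack in plain pandas is to suppress FutureWarning around the call. The sketch below is an assumption about how such a wrapper could look; stack_quietly is a hypothetical name, and calcpy's own implementation may differ.

```python
import warnings

import pandas as pd


def stack_quietly(frame, **kwargs):
    """Hypothetical stand-in: call DataFrame.stack() with FutureWarning suppressed."""
    with warnings.catch_warnings():
        # Only FutureWarning is ignored; other warnings still surface.
        warnings.simplefilter("ignore", FutureWarning)
        return frame.stack(**kwargs)


df = pd.DataFrame({"A": 1, "B": 2}, index=[0])
stacked = stack_quietly(df)
print(stacked)
```

On pandas versions that accept it, future_stack=True can be passed through the keyword arguments to opt into the new implementation explicitly.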

Calculation on Series and DataFrames

calcpy.pd.mdd(inputs)[source]

Maximum drawdown.

Parameters:

inputs (pd.Series | pd.DataFrame) – Input time series (not difference).

Return type:

float | pd.Series

Examples

Calculate maximum drawdown for a DataFrame.

>>> from math import nan
>>> import pandas as pd
>>> data = {"___": [nan, nan, nan],
...         "1__": [1.0, nan, nan],
...         "_1_": [nan, 1.0, nan],
...         "__1": [nan, nan, 1.0],
...         "12_": [1.0, 2.0, nan],
...         "21_": [2.0, 1.0, nan],
...         "1_2": [1.0, nan, 2.0],
...         "2_1": [2.0, nan, 1.0],
...         "_12": [nan, 1.0, 2.0],
...         "_21": [nan, 2.0, 1.0],
...         "123": [1.0, 2.0, 3.0],
...         "132": [1.0, 3.0, 2.0],
...         "213": [2.0, 1.0, 3.0],
...         "231": [2.0, 3.0, 1.0],
...         "312": [3.0, 1.0, 2.0],
...         "321": [3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(data, index=pd.date_range("2000-01-01", "2000-01-03"))
>>> mdd(df)
___    NaN
1__    0.0
_1_    0.0
__1    0.0
12_    0.0
21_    1.0
1_2    0.0
2_1    1.0
_12    0.0
_21    1.0
123    0.0
132    1.0
213    1.0
231    2.0
312    2.0
321    2.0
dtype: float64

Calculate MDD for a Series.

>>> mdd(pd.Series([4, 2, 3, 1, 4]))
np.int64(3)

Empty inputs.

>>> df = pd.DataFrame(columns=["A"])
>>> mdd(df)
A    NaN
dtype: object
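The values in the examples above are consistent with an absolute (not percentage) drawdown: the largest gap between a running peak and a later value. A minimal sketch under that assumption follows; mdd_sketch is a hypothetical helper, not calcpy's implementation.

```python
import pandas as pd


def mdd_sketch(values):
    """Hypothetical stand-in: largest drop below the running peak."""
    running_peak = values.cummax()    # highest value so far; NaN positions stay NaN
    drawdown = running_peak - values  # distance below the peak at each position
    return drawdown.max()             # NaN entries are skipped by default


series_result = mdd_sketch(pd.Series([4, 2, 3, 1, 4]))

# The same expression works column-wise on a DataFrame.
frame = pd.DataFrame({"231": [2.0, 3.0, 1.0], "2_1": [2.0, float("nan"), 1.0]})
frame_result = mdd_sketch(frame)
print(series_result)
print(frame_result)
```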
calcpy.pd.mdd_recover(inputs, fillinf=None)[source]

Recovery duration for maximum drawdown.

Parameters:
  • inputs (pd.Series | pd.DataFrame) – Input time series (not difference).

  • fillinf (optional) – Value to use as the duration when the drawdown has not recovered.

Returns:

Recovery durations, placed at the positions where the maximum drawdown begins.

All other positions hold a missing value (NaN or NaT). Results can be further processed with operations such as mean and max to get the average duration and the maximum duration.

Return type:

pd.Series | pd.DataFrame

Examples

>>> from math import nan
>>> import pandas as pd
>>> data = {"___": [nan, nan, nan],
...         "1__": [1.0, nan, nan],
...         "_1_": [nan, 1.0, nan],
...         "__1": [nan, nan, 1.0],
...         "12_": [1.0, 2.0, nan],
...         "21_": [2.0, 1.0, nan],
...         "1_2": [1.0, nan, 2.0],
...         "2_1": [2.0, nan, 1.0],
...         "_12": [nan, 1.0, 2.0],
...         "_21": [nan, 2.0, 1.0],
...         "123": [1.0, 2.0, 3.0],
...         "132": [1.0, 3.0, 2.0],
...         "213": [2.0, 1.0, 3.0],
...         "231": [2.0, 3.0, 1.0],
...         "312": [3.0, 1.0, 2.0],
...         "321": [3.0, 2.0, 1.0]}
>>> df = pd.DataFrame(data, index=pd.date_range("2000-01-01", "2000-01-03"))
>>> with pd.option_context("display.max_rows", None, "display.max_columns", None):
...     mdd_recover(df)
           ___ 1__ _1_ __1 12_                            21_ 1_2  \
2000-01-01 NaT NaT NaT NaT NaT 106751 days 23:47:16.854775807 NaT
2000-01-02 NaT NaT NaT NaT NaT                            NaT NaT
2000-01-03 NaT NaT NaT NaT NaT                            NaT NaT

                                      2_1 _12                            _21  \
2000-01-01 106751 days 23:47:16.854775807 NaT                            NaT
2000-01-02                            NaT NaT 106751 days 23:47:16.854775807
2000-01-03                            NaT NaT                            NaT

           123                            132    213  \
2000-01-01 NaT                            NaT 2 days
2000-01-02 NaT 106751 days 23:47:16.854775807    NaT
2000-01-03 NaT                            NaT    NaT

                                      231                            312  \
2000-01-01                            NaT 106751 days 23:47:16.854775807
2000-01-02 106751 days 23:47:16.854775807                            NaT
2000-01-03                            NaT                            NaT

                                      321
2000-01-01 106751 days 23:47:16.854775807
2000-01-02                            NaT
2000-01-03                            NaT
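The post-processing mentioned under Returns can be sketched with plain pandas. The durations below are hypothetical values shaped like one column of an mdd_recover result (a duration where a drawdown begins, NaT elsewhere); they are not output of calcpy.

```python
import pandas as pd

# Hypothetical column: durations at the start of drawdowns, NaT everywhere else.
durations = pd.Series(
    [pd.Timedelta(days=2), pd.NaT, pd.Timedelta(days=5), pd.NaT],
    index=pd.date_range("2000-01-01", periods=4),
    dtype="timedelta64[ns]",
)

longest = durations.max()   # NaT entries are skipped
average = durations.mean()  # mean over the non-missing durations only
print(longest, average)
```

In the example output above, unrecovered drawdowns appear as 106751 days 23:47:16.854775807, which is pd.Timedelta.max. Such entries would dominate a mean, so it may be worth filtering them out or supplying fillinf before aggregating.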

Conversions between Series/DataFrames and Dictionaries

calcpy.convert_nested_dict_to_dataframe(data, /, *, index_cols=None, columns=None)[source]

Convert a nested dictionary to a pd.DataFrame.

Parameters:
  • data (dict) – Nested dict.

  • index_cols (int | str | (list | tuple)[str], optional) – Index names.

  • columns (int | (list | tuple)[str], optional) – Column names.

Return type:

pd.DataFrame

Example

>>> data = {"A": {"H": 1, "J": 2}, "E": {"D": 3, "T": 4}}
>>> convert_nested_dict_to_dataframe(data)
   0  1  2
0  A  H  1
1  A  J  2
2  E  D  3
3  E  T  4
>>> convert_nested_dict_to_dataframe(data, index_cols=["v", "c"], columns=["x"])
     x
v c
A H  1
  J  2
E D  3
  T  4
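The flattening shown above can be sketched with plain pandas: each leaf of the nested dict becomes one row holding the keys along its path followed by the value. The flatten helper is hypothetical and only illustrates the idea; it is not calcpy's implementation.

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Hypothetical helper: yield (key, ..., key, value) for every leaf."""
    for key, value in nested.items():
        if isinstance(value, dict):
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key, value)


data = {"A": {"H": 1, "J": 2}, "E": {"D": 3, "T": 4}}
rows = list(flatten(data))

# Default integer column names, as in the first documented call.
table = pd.DataFrame(rows)

# Named columns with the key columns moved into the index, as in the second call.
indexed = pd.DataFrame(rows, columns=["v", "c", "x"]).set_index(["v", "c"])
print(table)
print(indexed)
```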
calcpy.convert_series_to_nested_dict(series, /)[source]

Convert a pd.Series to a nested dictionary.

Parameters:

series (pd.Series)

Return type:

dict

Example

>>> import pandas as pd
>>> s = pd.DataFrame({"A": 1, "B": [2, 3], "C": [4, 5]}).set_index(["A", "B"])["C"]
>>> convert_series_to_nested_dict(s)
{1: {2: 4, 3: 5}}
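The reverse direction can be sketched by walking the Series items and nesting one dict per index level. The helper series_to_nested is hypothetical and assumes that each index level maps to one level of nesting, as the example above suggests.

```python
import pandas as pd


def series_to_nested(series):
    """Hypothetical stand-in: one dict level per index level, values at the leaves."""
    result = {}
    for key, value in series.items():
        # MultiIndex keys arrive as tuples; wrap flat keys so both cases match.
        keys = key if isinstance(key, tuple) else (key,)
        node = result
        for part in keys[:-1]:
            node = node.setdefault(part, {})
        node[keys[-1]] = value
    return result


s = pd.DataFrame({"A": 1, "B": [2, 3], "C": [4, 5]}).set_index(["A", "B"])["C"]
nested = series_to_nested(s)
print(nested)
```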