Integrating Decorators

Let’s look at how decorators let you reuse function definitions.

This follows up on <15 Minutes to Mastery.

Hamilton relies on Python decorators to enable easy code reuse. Taking the previous example, let’s say we care about the running average spend per signup with both a 2-week and a 3-week lookback. Rather than writing several functions with almost identical definitions, we can parameterize! The following applies the parameterize decorator twice, currying each function into multiple nodes.

import pandas as pd

from hamilton import function_modifiers
from hamilton.function_modifiers import value, source


@function_modifiers.parameterize(
    avg_2wk_spend={'rolling_lookback' : value(2)},
    avg_3wk_spend={'rolling_lookback' : value(3)}
)
def avg_nwk_spend(spend: pd.Series, rolling_lookback: int) -> pd.Series:
    """Average marketing spend looking back {rolling_lookback} weeks."""
    return spend.rolling(rolling_lookback).mean()


@function_modifiers.parameterize(
    acquisition_cost_2wk={'spend' : source('avg_2wk_spend')},
    acquisition_cost_3wk={'spend' : source('avg_3wk_spend')}
)
def acquisition_cost(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """The cost per signup in relation to {spend}."""
    return spend / signups

In this case we have two separate parameterizations:

  1. Parameterizing the value of rolling_lookback in avg_nwk_spend (currying the function)

  2. Parameterizing the source of the spend argument in acquisition_cost
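
To make the two parameterizations concrete, here is a rough hand-written equivalent of the four nodes the decorators generate. It is a sketch of the resulting graph, not of Hamilton's internals. It assumes Hamilton's convention that a parameter name refers to the node of the same name, which is why acquisition_cost_2wk takes an argument called avg_2wk_spend.

```python
import pandas as pd


# value(2) and value(3): the lookback is fixed, so it is no longer an argument.
def avg_2wk_spend(spend: pd.Series) -> pd.Series:
    """Average marketing spend looking back 2 weeks."""
    return spend.rolling(2).mean()


def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Average marketing spend looking back 3 weeks."""
    return spend.rolling(3).mean()


# source('avg_2wk_spend') and source('avg_3wk_spend'): the spend argument
# is rewired to read from another node instead of the raw spend input.
def acquisition_cost_2wk(avg_2wk_spend: pd.Series, signups: pd.Series) -> pd.Series:
    """The cost per signup in relation to the 2-week average spend."""
    return avg_2wk_spend / signups


def acquisition_cost_3wk(avg_3wk_spend: pd.Series, signups: pd.Series) -> pd.Series:
    """The cost per signup in relation to the 3-week average spend."""
    return avg_3wk_spend / signups
```

Writing these out by hand is the duplication that parameterize avoids: adding a 4-week variant would mean two more near-identical functions here, but only two more lines in the decorated version.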

All we have to do is modify our driver to run the right module and ask for the right outputs, and we’re good to go!

import logging
import sys

import pandas as pd

import with_decorators  # we import the module here!
from hamilton import driver

logger = logging.getLogger(__name__)
logging.basicConfig(stream=sys.stdout)

if __name__ == '__main__':
    # Instantiate a common spine for your pipeline
    index = pd.date_range("2022-01-01", periods=6, freq="W")
    initial_columns = {  # load from actuals or wherever -- this is our initial data we use as input.
        # Note: these do not have to be all series, they could be scalar inputs.
        'signups': pd.Series([1, 10, 50, 100, 200, 400], index=index),
        'spend': pd.Series([10, 10, 20, 40, 40, 50], index=index),
    }
    # we need to tell hamilton where to load function definitions from
    dr = driver.Driver(initial_columns, with_decorators)  # can pass in multiple modules
    # we need to specify what we want in the final dataframe.
    output_columns = [
        'spend',
        'signups',
        'acquisition_cost_2wk',
        'acquisition_cost_3wk',
    ]
    # let's create the dataframe!
    df = dr.execute(output_columns)
    # If you ran `pip install sf-hamilton[visualization]` earlier, you can also do:
    # dr.visualize_execution(output_columns,'./my_dag.dot', {})
    print(df)

Running the driver now gives you the following:

   spend  signups  acquisition_cost_2wk  acquisition_cost_3wk
0     10        1                   NaN                   NaN
1     10       10                1.0000                   NaN
2     20       50                0.3000              0.266667
3     40      100                0.3000              0.233333
4     40      200                0.2000              0.166667
5     50      400                0.1125              0.108333
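
The NaN entries come from the rolling windows rather than from Hamilton: a window of n weeks has no value until n observations are available. The intermediate averages are not among the requested outputs, so the following plain-pandas sketch recomputes them for the same input data. It uses a default integer index for brevity, which is an assumption of this example, and the variable names are illustrative.

```python
import pandas as pd

spend = pd.Series([10, 10, 20, 40, 40, 50])
signups = pd.Series([1, 10, 50, 100, 200, 400])

# Rolling means stay NaN until the window is full.
avg_2wk = spend.rolling(2).mean()  # NaN, 10, 15, 30, 40, 45
avg_3wk = spend.rolling(3).mean()  # NaN, NaN, then the 3-week means

cost_2wk = avg_2wk / signups
cost_3wk = avg_3wk / signups

# Last row by hand: (40 + 50) / 2 = 45, and 45 / 400 = 0.1125
print(
    pd.DataFrame(
        {
            "avg_2wk": avg_2wk,
            "avg_3wk": avg_3wk,
            "cost_2wk": cost_2wk,
            "cost_3wk": cost_3wk,
        }
    )
)
```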