Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistent function inspection for @decorated functions #2246

Merged
merged 5 commits into from
Jan 24, 2025

Conversation

rashidakanchwala
Copy link
Contributor

@rashidakanchwala rashidakanchwala commented Jan 22, 2025

Description

Resolves #2240

When a function is decorated, the decorator often wraps the original function with additional logic. Some decorators, such as @pandera.check_output, use functools.wraps, which preserves the original function’s metadata. This allows inspection tools like inspect.getsource to return the code of the original function instead of the wrapper.

However, decorators that do not use functools.wraps replace the original function entirely with the wrapper. As a result, inspection tools will return the decorator code instead of the original function's code, which can cause confusion.

This PR addresses this inconsistency by:

  • Checking if the decorated function has the __wrapped__ attribute (set by functools.wraps) and returning the original function if it exists.
  • If the function does not have __wrapped__ falling back to inspecting the closure to retrieve the original function, if available.

QA notes

  • In demo_project, replace feature_engineering/nodes.py with the below files which consists of both user created decorator and pandera decorator. Run kedro viz with the old code and you will see how one @pandera.decorator displays decorator when you do Show Code and the user created decorator displays the original code.
  • Pull the changes of this PR, and try again and you will see both returning the original code when you do `Show Code'
Click to expand the the nodes.py file
"""
This is a boilerplate pipeline 'feature_engineering'
generated using Kedro 0.18.1
"""

from functools import reduce
from typing import Dict, List, Any

import numpy as np
import pandas as pd
import pandera as pa

ID_COLS_TO_NUMERIC = ["cust_id", "customer_id"]

def cast_df(func: callable) -> callable:
    """Decorator to cast DataFrame ID columns to numeric and date columns to datetime."""
    def _new_fun(*args, **kwargs):
        args = [_cast_if_dataframe(a) for a in args]
        kwargs = {k: _cast_if_dataframe(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)

    _new_fun.__name__ = func.__name__
    return _new_fun


def _cast_if_dataframe(df: Any) -> Any:
    if isinstance(df, pd.DataFrame):
        df = _cast_df_columns(df)
    return df


def _cast_df_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Casts columns: date columns and ID columns."""
    date_columns = [col for col in df.columns if "date" in col.lower()]
    id_columns = [col for col in df.columns if col.lower() in ID_COLS_TO_NUMERIC]
    for date in date_columns:
        df[date] = pd.to_datetime(df[date], errors="coerce")
    for col in id_columns:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df



class FeatureSchema(pa.DataFrameSchema):
    """Define a schema for validating feature importance."""
    columns = {
        "Feature": pa.Column(str),
        "Score": pa.Column(float, checks=pa.Check.ge(0))
    }


def _get_id_columns(data: pd.DataFrame) -> List[str]:
    return [x for x in data.columns if x.endswith("_id")]

@cast_df
def create_static_features(data: pd.DataFrame, column_names: List[str]) -> pd.DataFrame:
    """This function accepts a pandas DataFrame as well as a list of
    columns to keep in scope. A DataFrame limited to any ID columns as well
    as the provided column names will be returned

    Args:
        data (pd.DataFrame): The DataFrame to process
        column_names (List[str]): The column names to keep in scope

    Returns:
        (pd.DataFrame): The limited DataFrame to return
    """
    id_columns = _get_id_columns(data)
    columns_to_select = id_columns + column_names
    return data[columns_to_select]


def _create_metric_column(
    data: pd.DataFrame,
    column_a: str,
    column_b: str,
    numpy_method: str,
    conjunction: str,
) -> pd.DataFrame:
    """This method will retrieve a numpy function, combine two columns and make
    a new column. it then retruns a new DataFrame which is the available ID
    columns plus the new column.

    Args:
        data (pd.DataFrame): The DataFrame to work with
        column_a (str): The left operand to the numpy function
        columb_b (str): The right operand to the numpy function
        numpy_method (str): The numpy function to use such as `numpy.divide`
        conjunction (str): This is used to name the new column
            i.e. {a}_{conjunction}_{b}

    Returns:
        pd.DataFrame: A new feature table
    """
    column_operation = getattr(np, numpy_method)
    new_column = column_operation(data[column_a], data[column_b])
    id_columns = _get_id_columns(data=data)
    working_df = data[id_columns]
    working_df = working_df.assign(**{f"{column_a}_{conjunction}_{column_b}": new_column})
    return working_df


def create_derived_features(
    spine_df: pd.DataFrame, data: pd.DataFrame, derived_params: Dict[str, str]
) -> pd.DataFrame:
    """[summary]

    Args:
        spine_df (pd.DataFrame): [description]
        data (pd.DataFrame): [description]
        derived_params (Dict[str, str]): [description]
    """
    new_columns = [_create_metric_column(data, **kwargs) for kwargs in derived_params]
    combined_df = joiner(spine_df, *new_columns)
    return combined_df


def joiner(spine_df: pd.DataFrame, *dfs: pd.DataFrame) -> pd.DataFrame:
    """This function takes an arbitrary number of DataFrames and will
    keep left-joining them to themselves along any columns suffixed
    with "id". There is an assumption that the tables passed in share
    the same identifiers and grain.

    Args:
        spine_df (pd.DataFrame): The first argument should simply contain
        the identifier columns at the correct grain.
        *dfs (pd.DataFrame): Any subsequent tables are joined to the spine

    Returns:
        pd.DataFrame: A single data-frame where all inputs to this function
            have been left joined together.
    """
    id_columns = _get_id_columns(data=spine_df)

    merged_dfs = reduce(
        lambda df, df2: df.merge(df2, on=id_columns, how="left"), dfs, spine_df
    )
    # Confirm that the number of rows is unchanged after the operation has completed
    assert spine_df.shape[0] == merged_dfs.shape[0]
    return merged_dfs


@pa.check_output(FeatureSchema)
def create_feature_importance(data: pd.DataFrame) -> pd.DataFrame:
    feature_name = [f"feature_{i}" for i in range(15)]
    feature_score = np.random.rand(15)
    feature_importance_df = pd.DataFrame(
        {"Feature": feature_name, "Score": feature_score}
    )

    return feature_importance_df

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com>
Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com>
Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com>
@ravi-kumar-pilla ravi-kumar-pilla self-requested a review January 22, 2025 17:51
Copy link
Contributor

@SajidAlamQB SajidAlamQB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rashida, LGTM.

nit: maybe we should add a test for non-wraps Decorators too?

@rashidakanchwala
Copy link
Contributor Author

Thanks Rashida, LGTM.

nit: maybe we should add a test for non-wraps Decorators too?

both tests are there, there's one for non-wraps which was already there. i added one for wrap

Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well !! should we add to release note ?
Thank you @rashidakanchwala

Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com>
@rashidakanchwala rashidakanchwala merged commit 160dbec into main Jan 24, 2025
37 checks passed
@rashidakanchwala rashidakanchwala deleted the fix-decorator-issue branch January 24, 2025 12:48
@SajidAlamQB SajidAlamQB mentioned this pull request Jan 30, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node code show decorators instead of the node itself
3 participants