Full model: Issue with the fit() function when using naive 3-step and 3-step BCH approaches #39

FelixLaliberte · 2023-06-20T20:09:28Z

When using the fit() function on a full model (with mixed descriptors), the naive 3-step and 3-step BCH approaches raise an error. The 1-step, 2-step and 3-step ML approaches do not.

With the bootstrap() function, however, no approach produces an error message.

Here is an example with the Iris dataset:

#packages
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score
from stepmix.stepmix import StepMix
from stepmix.utils import get_mixed_descriptor
from stepmix.bootstrap import bootstrap

#IRIS in a dataframe
data, target = load_iris(return_X_y=True, as_frame=True)

#Create categorical and binary data based on the Iris data quantiles
for c in data:
    c_categorical = c.replace("cm", "cat")
    data[c_categorical] = pd.qcut(data[c], q=3).cat.codes
    c_binary = c.replace("cm", "binary")
    data[c_binary] = pd.qcut(data[c], q=2).cat.codes

#Create a fake covariate
data['Total length (cm)'] = data["sepal length (cm)"] + data["petal length (cm)"]

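#Add missing values in all variables, except the covariate
#Replace 50% of values with missing values
#sample() keeps a random half of the rows; assigning the result back aligns
#on the index, so the rows that were not sampled become NaN
for i, c in enumerate(data.columns):
    if c != 'Total length (cm)':
        data[c] = data[c].sample(frac=.5, random_state=42*i)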

#Measurement model definition
mm_data, mm_descriptor = get_mixed_descriptor(
    dataframe=data,
    continuous_nan=['sepal length (cm)', 'sepal width (cm)'],
    binary_nan=['sepal length (binary)', 'sepal width (binary)'],
    categorical_nan=['sepal length (cat)', 'sepal width (cat)'],
)

#Structural model definition
sm_data, sm_descriptor = get_mixed_descriptor(
    dataframe=data,
    #Covariate
    covariate=['Total length (cm)'],
    #Outcomes
    continuous_nan=['petal length (cm)', 'petal width (cm)'],
    binary_nan=['petal length (binary)', 'petal width (binary)'],
    categorical_nan=['petal length (cat)', 'petal width (cat)'],
)

#models to be fit
model1 = StepMix(n_components=3,
                 measurement=mm_descriptor,
                 structural=sm_descriptor,
                 verbose=1,
                 random_state=123,
                 n_steps=3) #naive using bootstrap

model2 = StepMix(n_components=3,
                 measurement=mm_descriptor,
                 structural=sm_descriptor,
                 verbose=1,
                 random_state=123,
                 n_steps=3) #naive using fit()

model3 = StepMix(n_components=3,
                 measurement=mm_descriptor,
                 structural=sm_descriptor,
                 verbose=1,
                 random_state=123,
                 n_steps=3, correction='BCH') #BCH using fit()

model4 = StepMix(n_components=3,
                 measurement=mm_descriptor,
                 structural=sm_descriptor,
                 verbose=1,
                 random_state=123,
                 n_steps=3, correction='ML') #ML using fit()

#results
model1, bootstrapped_params = bootstrap(model1, mm_data, sm_data, n_repetitions=10) #OK

model2.fit(mm_data, sm_data) #error

model3.fit(mm_data, sm_data) #error

model4.fit(mm_data, sm_data) #OK
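
The script above stops at the first failing fit() call, so only one error is ever shown. A minimal sketch of one way to run every approach and record the message each one raises follows. The helper name `collect_fit_errors` and the stand-in callables `_ok` and `_fails` are hypothetical and not part of StepMix; the commented-out usage assumes the models and data defined above.

```python
import traceback
from typing import Callable, Dict, Optional


def collect_fit_errors(fits: Dict[str, Callable[[], object]]) -> Dict[str, Optional[str]]:
    """Run each zero-argument callable and record the error it raises, if any."""
    results: Dict[str, Optional[str]] = {}
    for label, run in fits.items():
        try:
            run()
            results[label] = None  # the call completed without raising
        except Exception as exc:
            # Store the exception type and message, and print the full traceback
            # so it can be pasted into the report.
            results[label] = f"{type(exc).__name__}: {exc}"
            traceback.print_exc()
    return results


# Intended use with the models defined above (not executed in this sketch):
# report = collect_fit_errors({
#     "naive 3-step": lambda: model2.fit(mm_data, sm_data),
#     "3-step BCH": lambda: model3.fit(mm_data, sm_data),
#     "3-step ML": lambda: model4.fit(mm_data, sm_data),
# })


# Stand-in callables so the sketch runs on its own; they are not StepMix calls.
def _ok():
    return "fitted"


def _fails():
    raise ValueError("placeholder error")


demo = collect_fit_errors({"works": _ok, "breaks": _fails})
print(demo)
```

Wrapping each call in a lambda delays the fit until the helper runs it, so one failing approach does not prevent the others from being tried.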


FelixLaliberte added the bug label Jun 20, 2023

sachaMorin added a commit that referenced this issue Jun 22, 2023

Test for dataframe with 3-step uncorrected and BCH

a789b74


sachaMorin closed this as completed in bbd179f Jun 26, 2023
