feols() incorrectly says FE's are colinear with regressors. #538

Open
ja-ortiz-uniandes opened this issue Feb 7, 2025 · 9 comments

Comments

@ja-ortiz-uniandes

As always, thanks for developing such an incredible program. I use it almost daily!

Reproducible example

library(data.table)
library(fixest)

dt <- data.table(y = rnorm(100), x = rnorm(100), fe = rnorm(100, 1, 3))

feols(y ~ x | fe, dt)
# Error: in feols(y ~ x | fe, dt): 
# The only variable, 'x', is collinear with the fixed effects. Without doubt, your
# model is misspecified.

Expected behavior

Change the error message to cover cases where the FE variable takes a distinct value for every observation. This could be done either by amending the current error so that it says "'x' is collinear with the fixed effects, or there is not enough within-group variation in the FE variable", or preferably by adding a new error message: "Insufficient within-group variation in variable 'fe' to estimate fixed effects. Please check your model specification."

Why this matters

This is not a high-priority issue, but fixing it would make the program easier to use when you accidentally write an incorrect model specification. Such mistakes can be rather difficult to diagnose, particularly when regressions are nested, wrapped inside functions, or both. In those cases formulas are typically constructed programmatically, and it is not always obvious why the program failed. Correcting the error message makes the problem clearer and can save developers time in such situations (ahem...).

@ja-ortiz-uniandes
Author

ja-ortiz-uniandes commented Feb 12, 2025

A small note on the comment above, which I thought was clear but did not state explicitly: x is not collinear with fe.

@caleb-kwon

I'm not sure why my comment disappeared, sorry. But what I meant was that you essentially have one fixed effect for every observation. There is no remaining variation to estimate the coefficient for x.
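
One way to see this is through the within transformation that fixed-effects estimators rely on, where each variable is demeaned within its fixed-effect group. The base-R sketch below does not call fixest, and the variable names (x_group_mean, x_within) are illustrative only. It shows that with one group per observation the demeaned x is identically zero, so nothing is left to identify its coefficient.

```r
set.seed(1)
n  <- 100
x  <- rnorm(n)
fe <- rnorm(n, 1, 3)  # continuous draw: every value forms its own group

# Mean of x within each level of fe; every group has a single member,
# so each "group mean" is just the observation itself
x_group_mean <- ave(x, fe)

# Within-transformed (demeaned) regressor
x_within <- x - x_group_mean

one_group_per_obs <- length(unique(fe)) == n  # as many groups as rows
no_variation_left <- all(x_within == 0)       # demeaned x is all zeros

print(c(one_group_per_obs = one_group_per_obs,
        no_variation_left = no_variation_left))
```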

@Alejandro-Ortiz-WBG

You are right. I'm just saying the error message doesn't reflect that fact and it would be helpful to developers if the error message described the problem better.

@caleb-kwon

Right! That would be helpful.

@Alejandro-Ortiz-WBG

Thank you!

@kylebutts
Contributor

@Alejandro-Ortiz-WBG, the message is not saying that x is collinear with the variable fe!

"Fixed effects" are indicator variables for each unique value of fe:

df <- data.frame(y = rnorm(3), x = rnorm(3), fe = rnorm(3, 1, 3))
model.matrix(~ 0 + x + factor(fe), data = df)
#>            x factor(fe)0.37770054215013 factor(fe)0.51974892031667
#> 1 -0.9281797                          1                          0
#> 2  0.6688002                          0                          1
#> 3  0.8449222                          0                          0
#>   factor(fe)1.18450468893623
#> 1                          0
#> 2                          0
#> 3                          1

The three indicators are collinear with x. I think you mean the numeric vector fe and x are not collinear, which is true (but not what you're trying to estimate).

This is very much an edge case, because it only happens when every value of the incorrectly specified fixed effect is unique. The error wouldn't occur if, say, fe were discrete but with many values. In other words, there isn't really a good way to "detect" when a person shouldn't be using fixed effects.
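
The contrast can be checked with a rank comparison in base R. This is only a sketch: x_in_fe_span is a hypothetical helper written for this example, not a fixest function. It reports whether adding x to the matrix of fixed-effect indicators leaves the rank unchanged, which is what "collinear with the fixed effects" means here.

```r
set.seed(1)
n <- 12
x <- rnorm(n)
fe_unique   <- rnorm(n, 1, 3)                    # every value distinct
fe_discrete <- rep(c("a", "b", "c"), each = 4)   # 3 groups of 4 observations

# Hypothetical helper: TRUE when x lies in the column span of the FE indicators
x_in_fe_span <- function(x, fe) {
  D <- model.matrix(~ 0 + factor(fe))   # one indicator column per unique value
  qr(cbind(D, x))$rank == qr(D)$rank    # appending x does not raise the rank
}

span_unique   <- x_in_fe_span(x, fe_unique)    # indicators already span everything
span_discrete <- x_in_fe_span(x, fe_discrete)  # x keeps within-group variation

print(c(unique_fe = span_unique, discrete_fe = span_discrete))
```

With unique values the indicator matrix has as many columns as rows, so any x is an exact linear combination of its columns. With repeated values x retains variation inside each group and stays linearly independent of the indicators.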

@ja-ortiz-uniandes
Author

ja-ortiz-uniandes commented Feb 19, 2025

@kylebutts You are right that the error would not happen if there were more observations than FEs. The point, however, is that the error message reads "'x', is collinear with the fixed effects", which is inaccurate and thus leads to confusion when debugging.

@kylebutts
Contributor

@ja-ortiz-uniandes You are mixing up the variable you called fe and the "fixed effects", which are a set of mutually exclusive indicator variables. 'x' is collinear with the fixed effects; 'x' is not collinear with the variable you called fe.

@ja-ortiz-uniandes
Author

ja-ortiz-uniandes commented Feb 19, 2025

@kylebutts Thanks, you are right. I do believe that adding clarity to the message would be useful, something along the lines of "As many FE as observations makes estimation not possible." Saying simply "x is collinear with the FEs" suggests that the variable fe is collinear with x. Additionally, the message is just (slightly) incorrect: x is not collinear with the FEs; x is collinear with the FEs and the intercept.
To be clear, I appreciate the time it took to develop these custom error messages and know this required effort beyond simply saying "your model is rank deficient". Now that these messages have been incorporated, making sure they are accurate and easy to understand can further help developers when issues arise. A check of the sort if (length(unique(fe)) == NROW(dt)) stop(...) would further increase the usefulness of these errors.
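
As a rough sketch of what such a check could look like on the user side, the snippet below wraps the condition in a small function. The name check_fe_saturation and the wording of the message are assumptions made for illustration; nothing here is part of fixest, and a real implementation inside feols() would likely look different.

```r
# Hypothetical helper, not part of fixest: stop early when a fixed-effect
# variable takes a distinct value for every observation.
check_fe_saturation <- function(data, fe_name) {
  fe <- data[[fe_name]]
  if (length(unique(fe)) == NROW(data)) {
    stop(sprintf(
      "'%s' has as many unique values as observations (%d); estimation is not possible.",
      fe_name, NROW(data)
    ), call. = FALSE)
  }
  invisible(TRUE)
}

set.seed(1)
df <- data.frame(y = rnorm(100), x = rnorm(100), fe = rnorm(100, 1, 3))

# Capture the error message instead of aborting the script
msg <- tryCatch(check_fe_saturation(df, "fe"), error = conditionMessage)
print(msg)
```

Calling the helper before feols() inside a wrapper function would surface the problem with the name of the offending variable, which is the part that is hard to recover when formulas are built programmatically.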
