Deprecate text file input. #9472

Open
trivialfis opened this issue Aug 11, 2023 · 3 comments

Comments

@trivialfis
Member

I propose we deprecate the use of text file input and remove the text parsers in XGBoost, including the libsvm parser and the csv parser from dmlc-core. Nowadays, there's a wealth of third-party libraries focused on feature engineering that can handle these formats with high efficiency. Loading the data inside XGBoost does not provide much value, as users are likely to need to perform tasks like cross-validation and hyper-parameter optimization.

At the moment, there are three use cases for the text input:

  • CLI: I propose removing the CLI in "Deprecate the command line interface" (#9471).
  • External memory: We have largely replaced the external memory with a custom data iterator. Even with text input, the underlying implementation uses a data iterator.
  • Federated learning: I believe we will move toward in-memory input as we progress, for better integration with frameworks like nvflare.
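
To illustrate the in-memory path the proposal favors, here is a minimal sketch in which the text file is parsed outside XGBoost and only arrays are handed over. The helper names `parse_csv_text` and `build_dmatrix` are hypothetical and exist only for this example. The parser uses the Python standard library to stay self-contained; in practice something like `pandas.read_csv` or `sklearn.datasets.load_svmlight_file` would likely take its place. It also assumes a purely numeric CSV with the label in the first column.

```python
import csv
import io


def parse_csv_text(text, label_column=0):
    """Split CSV text into feature rows and labels, all as floats.

    Hypothetical helper: assumes every field is numeric.
    """
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        values = [float(v) for v in row]
        labels.append(values[label_column])
        features.append(values[:label_column] + values[label_column + 1:])
    return features, labels


def build_dmatrix(features, labels):
    """Hand already-parsed, in-memory data to XGBoost (hypothetical helper)."""
    # Imported lazily so the parsing step above has no third-party dependency.
    import numpy as np
    import xgboost as xgb

    X = np.asarray(features, dtype=np.float32)
    y = np.asarray(labels, dtype=np.float32)
    return xgb.DMatrix(X, label=y)


# Parsing happens entirely outside XGBoost.
sample = "1,0.5,2.0\n0,1.5,0.25\n"
features, labels = parse_csv_text(sample)
```

Calling `build_dmatrix(features, labels)` afterwards would give a `DMatrix` that can be passed to `xgboost.train`, and the same arrays remain available for cross-validation splits or hyper-parameter search.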
@trivialfis
Member Author

Reminder: remove any references in the documentation, including the C tutorial.

@saxena-vinayak36

I think I can contribute to this aspect

@trivialfis
Member Author

I think the work to deprecate it is onerous, so it might not be a good place to start your first PR. :-)
