Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Review model documentation strings #898

Closed
ablaom opened this issue Jan 27, 2022 · 3 comments
Closed

[Discussion] Review model documentation strings #898

ablaom opened this issue Jan 27, 2022 · 3 comments
Labels

Comments

@ablaom
Copy link
Member

ablaom commented Jan 27, 2022

Generally MLJ has taken the point-of-view that model documentation is the responsibility of the third-party package providing the model. The main reason has been to reduce duplication of documentation and the maintenance burden for MLJ. While I think this is fair enough, there are two basic problems the status quo does not address:

  1. The MLJ user may not be abler quickly find the 3rd party pkg documentation she needs. This is for a number of reasons, which include:

    • The MLJ model struct may only be a wrapper for the the 3rd party pkg object to which a helpful docstring is attached
    • The 3rd party pkg may not even provide such a docstring
    • Although the model trait pkg_url has a link to the 3rd party pkg repo, users are likely unaware of this fact, and it's not that convenient to dig this up and paste in a browser
    • The trait docstring, providing a separate "interface" docstring is similarly inconspicuous, with a fallback that isn't terribly helpful.
  2. It's difficult to understand a model's data type requirements by inspecting scitype traits. For example, we have input_scitype(DecisionTreeeClassifier()) = Table{var"#s28"} where var"#s28"<:Union{AbstractVector{var"#s29"} where var"#s29"<:Continuous, AbstractVector{var"#s29"} where var"#s29"<:Count, AbstractVector{var"#s29"} where var"#s29"<:OrderedFactor} . This means the tree accepts tabular data with columns of Continuous, Count or OrderedFactor, but this is very hard to grep from the string. Only die-hard novices would be able to extract this information! I suspect this is the most serious user-unfriendly feature of MLJ for beginners, struggling to understand warnings thrown by the machine(model, X, y) constructor.

A couple of ideas:

  • For 2, we review the docstring traits of models to ensure they include data type requirements, written in English. When a data requirements are not satisfied in the constructor machine(model, data...) then we quote this docstring.
  • To help with 1, each new concrete subtype of Model should get a docstring which quotes the docstring of the appropriate 3rd party package object

Other suggestions or comments anyone?

@ablaom
Copy link
Member Author

ablaom commented Jan 28, 2022

Might be useful for opening documentation in a github repo: https://github.com/tpapp/DefaultApplication.jl

@ablaom
Copy link
Member Author

ablaom commented Feb 11, 2022

After further reflection and one offline discussion, it seems that we should, after all:

  • Maintain a detailed julia doc-string for every concrete subtype of Model
  • Make this string the fallback value for the docstring trait, to make this full string available in the (searchable) model metadata.

It makes sense to standardise model docstrings, and document what that standard is. To that end I am opening an separate issue to invite feedback on one proposal for such a standard:

#901

@ablaom ablaom added the docs label Feb 11, 2022
@ablaom
Copy link
Member Author

ablaom commented Mar 1, 2022

Closing in favour of #901

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant