[Discussion] Review model documentation strings #898
Comments
Might be useful for opening documentation in a GitHub repo: https://github.com/tpapp/DefaultApplication.jl
After further reflection and one offline discussion, it seems that we should, after all:
It makes sense to standardise model docstrings, and to document what that standard is. To that end I am opening a separate issue to invite feedback on one proposal for such a standard:
Closing in favour of #901
Generally, MLJ has taken the point of view that model documentation is the responsibility of the third-party package providing the model. The main reason has been to reduce duplication of documentation and the maintenance burden for MLJ. While I think this is fair enough, there are two basic problems the status quo does not address:
- The MLJ user may not be able to quickly find the 3rd party pkg documentation she needs. This is for a number of reasons, which include:
  - Although the `pkg_url` trait has a link to the 3rd party pkg repo, users are likely unaware of this fact, and it's not that convenient to dig this up and paste it into a browser.
  - The `docstring` trait, providing a separate "interface" docstring, is similarly inconspicuous, with a fallback that isn't terribly helpful.
- It's difficult to understand a model's data type requirements by inspecting scitype traits. For example, we have `input_scitype(DecisionTreeClassifier()) = Table{var"#s28"} where var"#s28"<:Union{AbstractVector{var"#s29"} where var"#s29"<:Continuous, AbstractVector{var"#s29"} where var"#s29"<:Count, AbstractVector{var"#s29"} where var"#s29"<:OrderedFactor}`. This means the tree accepts tabular data with columns of `Continuous`, `Count` or `OrderedFactor`, but this is very hard to grep from the string. Only die-hard novices would be able to extract this information! I suspect this is the most serious user-unfriendly feature of MLJ for beginners, struggling to understand warnings thrown by the `machine(model, X, y)` constructor.

A couple of ideas:

- Review the `docstring` traits of models to ensure they include data type requirements, written in English. When the data requirements are not satisfied in the constructor `machine(model, data...)`, we then quote this docstring.
- Each `Model` should get a docstring which quotes the docstring of the appropriate 3rd party package object.

Other suggestions or comments anyone?
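To make the first idea a bit more concrete, here is a rough, package-free sketch of what "quote the data requirements in English when the check fails" could look like. Everything in it is an assumption for illustration only: the scitype stand-ins are redefined locally rather than taken from ScientificTypes.jl, and `describe_requirements` and `check_columns` are hypothetical helper names, not part of MLJ's API.

```julia
# Stand-ins for the scientific types named above. In real MLJ code these come
# from ScientificTypes.jl; they are redefined here only so the sketch runs
# without any packages.
abstract type Continuous end
abstract type Count end
abstract type OrderedFactor end
abstract type Multiclass end

# Hypothetical helper: state the accepted column scitypes in plain English.
function describe_requirements(allowed)
    names = join((string(nameof(T)) for T in allowed), ", ", " or ")
    return "Input must be a table whose columns each have element scitype $names."
end

# Hypothetical check: `column_scitypes` is a vector of `name => scitype` pairs.
# Returns `nothing` when every column is acceptable; otherwise returns a
# message that names the offending columns and quotes the requirement.
function check_columns(column_scitypes, allowed)
    bad = [name for (name, T) in column_scitypes if !any(A -> T <: A, allowed)]
    isempty(bad) && return nothing
    return "Columns " * join(bad, ", ") * " have unsupported scitypes. " *
           describe_requirements(allowed)
end

allowed = [Continuous, Count, OrderedFactor]
println(check_columns([:age => Count, :colour => Multiclass], allowed))
```

The point of the sketch is only that the user sees a sentence listing `Continuous`, `Count` and `OrderedFactor` instead of the raw `Table{...} where ...` type expression; how such a message would be wired into `machine(model, data...)` is left open.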