Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30630][ML] Remove numTrees in GBT in 3.0.0 #27330

Closed
wants to merge 3 commits into from

Conversation

huaxingao
Copy link
Contributor

@huaxingao huaxingao commented Jan 23, 2020

What changes were proposed in this pull request?

Remove numTrees in GBT in 3.0.0.

Why are the changes needed?

Currently, GBT has

  /**
   * Number of trees in ensemble
   */
  @Since("2.0.0")
  val getNumTrees: Int = trees.length

and

  /** Number of trees in ensemble */
  val numTrees: Int = trees.length

I think we should remove one of them. We deprecated it in 2.4.5 via #27352.

Does this PR introduce any user-facing change?

Yes, remove numTrees in GBT in 3.0.0

How was this patch tested?

existing tests

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117268 has finished for PR 27330 at commit f95744f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* Number of trees in ensemble
*
* @deprecated Use [[getNumTrees]] instead. This method will be removed in 3.1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit. instead. This -> instead. This?

*
* @deprecated Use [[getNumTrees]] instead. This method will be removed in 3.1.0
*/
@deprecated("Use getNumTrees instead. This method will be removed in 3.1.0.", "3.0.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can deprecate this in 2.4.5 and remove in 3.0.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so deprecate this in v2.4.5-rc1, right?

Copy link
Member

@dongjoon-hyun dongjoon-hyun Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 2.4.5 because RC is not an official release.
RC1 vote failed and we are preparing 2.4.5 RC2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

@dongjoon-hyun
Copy link
Member

@huaxingao . Could you file a JIRA for this?

@huaxingao
Copy link
Contributor Author

@dongjoon-hyun

Could you file a JIRA for this?

Yes, I will open a JIRA. Thank you very much for reviewing this PR.

@huaxingao huaxingao changed the title [MINOR][ML] Deprecate numTrees in GBT [SPARK-30630][ML] Deprecate numTrees in GBT Jan 24, 2020
@huaxingao huaxingao changed the title [SPARK-30630][ML] Deprecate numTrees in GBT [SPARK-30630][ML] Remove numTrees in GBT in 3.0.0 Jan 24, 2020
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 24, 2020

@huaxingao . We need to deprecate this in branch-2.4 first. We can not proceed in the reverse order. Could you make a PR for branch-2.4 too? We will review that first.

I will deprecate it in 2.4.5 and remove it in 3.0.0

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117337 has finished for PR 27330 at commit 8744e17.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Copy link
Contributor Author

@dongjoon-hyun Yes. I am trying to make a PR for 2.4. I will also fix the MiMa problem.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117340 has finished for PR 27330 at commit 0b81823.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

cc @zhengruifeng

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117345 has finished for PR 27330 at commit 0b81823.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jan 24, 2020

OK by me. I think there was also no big hurry to deprecate in 2.4 to remove in 3.0, but it's pretty trivial either way.

@dongjoon-hyun
Copy link
Member

Thank you, @srowen .

@dongjoon-hyun
Copy link
Member

Merged to master. Thank you, @huaxingao and @srowen .

@huaxingao
Copy link
Contributor Author

Thank you very much! @dongjoon-hyun @srowen

@huaxingao huaxingao deleted the spark-numTrees branch January 24, 2020 20:18
@zhengruifeng
Copy link
Contributor

Sorry to come late. LGTM

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 25, 2020

It's not late. Thank you, @zhengruifeng . :)

@gatorsmile
Copy link
Member

@dongjoon-hyun @zhengruifeng @huaxingao @srowen

We normally do not make such an exception. Can anyone explain how critical it is?

@gatorsmile
Copy link
Member

Also, I did not even see the migration guide in the PR.

@srowen
Copy link
Member

srowen commented Jan 29, 2020

Not at all critical, but what's the exception here?

@huaxingao
Copy link
Contributor Author

@gatorsmile It's not critical. It was deprecated in 2.4.5. I will add the release notes.

@srowen
Copy link
Member

srowen commented Jan 29, 2020

It can't hurt, but I think this is too trivial for a migration guide. It's already in the deprecation message: use method X instead of Y.

@dongjoon-hyun
Copy link
Member

@gatorsmile . As mentioned in the PR description, we deprecated this at 2.4.5 which will be released before 3.0.0. Do you mean the deprecation in 2.4.5 is also exceptional?

@gatorsmile
Copy link
Member

gatorsmile commented Jan 29, 2020

Did we deprecate any API in the maintenance releases and then remove the APIs in the next [major] release? Based on my understanding, all the APIs are deprecated in the major releases. Could any of you give me some examples in our previous releases?

@dongjoon-hyun
Copy link
Member

We don't remove the API at 2.4.6, @gatorsmile . The next of 2.4.5 is 2.4.6.

Did we deprecate any API in the maintenance releases and then remove the APIs in the next release?

@gatorsmile
Copy link
Member

I mean the next major release.

@gatorsmile
Copy link
Member

Also, we should never remove any API in the maintenance release

@dongjoon-hyun
Copy link
Member

We didn't remove any API in the maintenance release.

@dongjoon-hyun
Copy link
Member

This is removed at 3.0.0 at the major version, isn't it?

@dongjoon-hyun
Copy link
Member

Let me rephrase your words~ So, only 2.0.0 or 3.0.0 can do deprecation . Then, 3.0.0 or 4.0.0 can do the removal?

@gatorsmile
Copy link
Member

Let me make my points clear. Did we deprecate any API in the maintenance releases?

For example, did we deprecate an API in 2.3.4 then remove the API in 2.4?

@dongjoon-hyun
Copy link
Member

SPARK-24918 adds Executor Plugin at 2.4.0.
SPARK-29399 removed it at 3.0.0.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 29, 2020

SPARK-28780 Delete the incorrect setWeightCol method in LinearSVCModel at 2.3.4, 2.4.4, 3.0.0?

@gatorsmile
Copy link
Member

Deprecating the APIs in the major/feature releases, like 2.4.0, 2.3.0, 2.2.0, is common; but deprecating the APIs in the maintenance releases, like 2.2.1, 2.2.2, 2.3.4 is rare.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 29, 2020

We deleted public API even in minor maintenance versions like SPARK-28780.

@gatorsmile
Copy link
Member

To be honest, this is a big surprise to me. We might need more public discussions for such changes. This will easily break the applications and make our end users frustrated.

@dongjoon-hyun
Copy link
Member

I totally agree with you. Your guideline also sounds reasonable.

@gatorsmile
Copy link
Member

Maybe we need some public discussions about the general guideline about the API deprecation/removal in the dev list; and then everyone can follow the same rule in the future?

I do not want to offend anyone. Will not give an example in the dev-list discussion.

@dongjoon-hyun
Copy link
Member

Please add the above examples. Otherwise, people don't understand what we did. Additionally, please add the following.

[SPARK-28556][SQL] QueryExecutionListener should also notify Error

This changes the public API at 3.0.0 last July, but we didn't mentioned it at any 2.4.x releases.

@dongjoon-hyun
Copy link
Member

I don't think it's an offend. (including your arguments). All comments are valid opinions.

@dongjoon-hyun
Copy link
Member

I gave +1 for the above one and this PR .

@srowen
Copy link
Member

srowen commented Jan 29, 2020

The policy, such as it exists, has always been roughly to remove deprecated items in a major release (fine of course), or in a minor release after it's been deprecated for at least one prior minor release (aggressive, but can be fine). Yes, we've done both many times. This is the first situation.

@gatorsmile what do you think the rule has been? I'm not sure I have perceived any other disagreement about that.

This particular change is really minor either way. Deprecating is obviously correct. Whether it's removed in 3.0 or 3.1 doesn't matter much, but, why keep it in 3.0? it's obviously an old API oversight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants