Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20926][SQL] Removing exposures to guava library caused by directly accessing SessionCatalog's tableRelationCache #18148

Closed
wants to merge 4 commits into from

Conversation

rezasafi
Copy link
Contributor

There could be test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to guava library by directly accessing SessionCatalog's tableRelationCacheg. These failures occur when guava shading is in place.

What changes were proposed in this pull request?

This change removes those guava exposures by introducing new methods in SessionCatalog and also changing DataStorageStrategy, HiveMetastoreCatalog and HiveSchemaInferenceSuite so that they use those proxy methods.

How was this patch tested?

Unit tests passed after applying these changes.

…ly accessing SessionCatalog's tableRelationCache

There were test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to the shaded Guava library. This change removes those exposures by introducing new methods in SessionCatalog.
@rezasafi rezasafi changed the title [SPARK-20926][SQL] Removing exposures to guava library caused by directly accessing SessionCatalog's tableRelationCacheg [SPARK-20926][SQL] Removing exposures to guava library caused by directly accessing SessionCatalog's tableRelationCache May 30, 2017
@vanzin
Copy link
Contributor

vanzin commented Jun 1, 2017

ok to test

Copy link
Contributor

@vanzin vanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marmbrus as far as I can tell, SessionCatalog is a public API right? If so we shouldn't ship 2.2 without fixing this, since it's exposing Guava.

* A cache of qualified table names to table relation plans.
* Accessing tableRelationCache directly is not recommended,
* since it will introduce exposures to guava libraries.
*/
val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have been introduced in 2.2, can you make it private instead, and change all call sites?

@SparkQA
Copy link

SparkQA commented Jun 1, 2017

Test build #77659 has finished for PR 18148 at commit 8253bbe.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val tableRelationCache: Cache[QualifiedTableName, LogicalPlan] = {
val cacheSize = conf.tableRelationCacheSize
CacheBuilder.newBuilder().maximumSize(cacheSize).build[QualifiedTableName, LogicalPlan]()
}

/**
* This method provides a way to get a cached plan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "without exposing components to Guava" part of all these comments is unnecessary.

@zsxwing
Copy link
Member

zsxwing commented Jun 1, 2017

@vanzin the whole catalyst project is private.

@vanzin
Copy link
Contributor

vanzin commented Jun 1, 2017

Ah, yes. I read 'filter' instead of 'filterNot' in SparkBuild.scala where it sets mimaProjects.

@SparkQA
Copy link

SparkQA commented Jun 2, 2017

Test build #77664 has finished for PR 18148 at commit 9821ea1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 2, 2017

Test build #77666 has finished for PR 18148 at commit 9421372.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@vanzin vanzin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of suggestions for the names but otherwise LGTM.

}

/** This method provides a way to get a cached plan if the key exists. */
def getCachedTableIfPresent(key: QualifiedTableName): LogicalPlan = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCachedTable instead. I'd even just use cachedTable but this class at least seems to use verbs more than the rest of the code.

}

/** This method provides a way to cache a plan. */
def putTableInCache(t: QualifiedTableName, l: LogicalPlan): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cacheTable

@SparkQA
Copy link

SparkQA commented Jun 3, 2017

Test build #77696 has finished for PR 18148 at commit 2832253.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rezasafi
Copy link
Contributor Author

rezasafi commented Jun 3, 2017

Not clear why the tests were failed.

@vanzin
Copy link
Contributor

vanzin commented Jun 3, 2017

Fun, but probably unrelated:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f115a81f0e9, pid=125279, tid=139711419225856
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x88a0e9]  LoadKlassNode::make(PhaseGVN&, Node*, Node*, Node*, TypePtr const*, TypeKlassPtr const*)+0x49
#

@vanzin
Copy link
Contributor

vanzin commented Jun 3, 2017

retest this please

@SparkQA
Copy link

SparkQA commented Jun 3, 2017

Test build #77704 has finished for PR 18148 at commit 2832253.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Jun 5, 2017

Merging to master.

asfgit pushed a commit that referenced this pull request Jun 5, 2017
…ctly accessing SessionCatalog's tableRelationCache

There could be test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to guava library by directly accessing SessionCatalog's tableRelationCacheg. These failures occur when guava shading is in place.

## What changes were proposed in this pull request?
This change removes those guava exposures by introducing new methods in SessionCatalog and also changing DataStorageStrategy, HiveMetastoreCatalog and HiveSchemaInferenceSuite so that they use those proxy methods.

## How was this patch tested?

Unit tests passed after applying these changes.

Author: Reza Safi <rezasafi@cloudera.com>

Closes #18148 from rezasafi/branch-2.2.
@rezasafi rezasafi closed this Jun 6, 2017
@rezasafi rezasafi reopened this Jun 6, 2017
@vanzin
Copy link
Contributor

vanzin commented Jun 6, 2017

@rezasafi please close this.

@vanzin
Copy link
Contributor

vanzin commented Jun 6, 2017

Oh crap, I didn't notice the branch. @rezasafi in the future, always send PRs against the master branch first.

@asfgit asfgit closed this in b61a401 Jun 6, 2017
@rezasafi
Copy link
Contributor Author

rezasafi commented Jun 6, 2017

Sorry about this @vanzin. I didn't know that.

@yhuai
Copy link
Contributor

yhuai commented Jun 8, 2017

@vanzin Seems merging to branch-2.2 was an accident? Since it is not really a bug fix, should we revert it from branch-2.2 and just keep it in the master?

@vanzin
Copy link
Contributor

vanzin commented Jun 8, 2017

It was an accident but it shouldn't cause any harm either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants