feat(data-catalog): Adding Hue #227

@@ -0,0 +1,53 @@
# Cloudera Hue

Deploys the Cloudera Hue server, enabling data exploration of Hive tables and S3 buckets.

Cloudera Hue is expected to be deployed alongside a HiveServer2-type service. In Open Data Hub a [Spark SQL Thrift Server](../thriftserver) is used. Without a Thrift Server deployment, Hue won't be able to fulfil any SQL queries; however, it can still serve as an S3 browser.

### Folders

There is one main folder in the Hue component, `hue`, which contains the kustomize manifests.

### Installation

To install Hue, add the following to the `kfctl` YAML file:

```yaml
- kustomizeConfig:
    repoRef:
      name: manifests
      path: hue/hue
  name: hue
```
### Overlays

The Hue component provides a single overlay.

#### storage-class

Customizes Hue's database to use a specific `StorageClass` for its PVC; see the `storage_class` parameter.
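For illustration, enabling the overlay together with its parameter in the `kfctl` YAML file could look roughly like this (the `storage_class` value `gp2` is an assumption; substitute a class from your own cluster):

```yaml
- kustomizeConfig:
    overlays:
      - storage-class
    parameters:
      - name: storage_class
        value: gp2  # assumed StorageClass name; list yours with `oc get storageclass`
    repoRef:
      name: manifests
      path: hue/hue
  name: hue
```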
### Parameters

There are five parameters exposed via KFDef.

#### storage_class

Name of the storage class to be used for the PVC created by Hue's database. The `storage-class` **overlay** must be enabled as well for this to take effect.

#### hue_secret_key

Sets the session store secret key for the Hue web server.
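The value should be a long random string. One way to generate such a key (using `openssl` is merely one option, not something the component requires):

```shell
# Generate a random 64-character hex string suitable as the Hue session secret key.
# Any sufficiently long, cryptographically random string works equally well.
key="$(openssl rand -hex 32)"
printf '%s\n' "$key"
```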
#### s3_endpoint_url

HTTP endpoint exposed by your S3 object storage solution, which will be made available to Hue as the default S3 filesystem location.

#### s3_is_secure

Specifies whether HTTPS should be used as the transport protocol. Set to `true` for HTTPS and `false` for HTTP. Defaults to `true`.

#### s3_credentials_secret

Along with `s3_endpoint_url`, this parameter configures Hue's access to S3 object storage. Setting this parameter to the name of a local OpenShift/Kubernetes `Secret` resource allows Hue to consume S3 credentials from it. The chosen secret must contain the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` keys. Keep in mind that for this value to be respected properly by the Spark cluster, it must use the same credentials. If not set, credentials from [`hue-sample-s3-secret`](hue/base/hue-sample-s3-secret.yaml) are used instead.
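For illustration, a compatible `Secret` could look like this (the name `my-s3-credentials` and both values are assumptions, not part of this PR):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-credentials  # pass this name via the s3_credentials_secret parameter
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: AKIAEXAMPLE          # assumed placeholder value
  AWS_SECRET_ACCESS_KEY: changeme-secret  # assumed placeholder value
```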
@@ -0,0 +1,68 @@
```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: hue-hive-site-xml
type: Opaque
stringData:
  hive-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hive.server2.transport.mode</name>
        <value>binary</value>
        <description>Server transport mode. binary or http.</description>
      </property>

      <property>
        <name>hive.server2.thrift.http.port</name>
        <value>10000</value>
        <description>Port number when in HTTP mode.</description>
      </property>

      <property>
        <name>fs.s3a.aws.credentials.provider</name>
        <value>com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
        <description>
          Comma-separated class names of credential provider classes which implement
          com.amazonaws.auth.AWSCredentialsProvider.

          These are loaded and queried in sequence for a valid set of credentials.
          Each listed class must implement one of the following means of
          construction, which are attempted in order:
          1. a public constructor accepting java.net.URI and
             org.apache.hadoop.conf.Configuration,
          2. a public static method named getInstance that accepts no
             arguments and returns an instance of
             com.amazonaws.auth.AWSCredentialsProvider, or
          3. a public default constructor.

          Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
          anonymous access to a publicly accessible S3 bucket without any credentials.
          Please note that allowing anonymous access to an S3 bucket compromises
          security and therefore is unsuitable for most use cases. It can be useful
          for accessing public data sets without requiring AWS credentials.

          If unspecified, then the default list of credential provider classes,
          queried in sequence, is:
          1. org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
             Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
          2. com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
             configuration of AWS access key ID and secret access key in
             environment variables named AWS_ACCESS_KEY_ID and
             AWS_SECRET_ACCESS_KEY, as documented in the AWS SDK.
          3. com.amazonaws.auth.InstanceProfileCredentialsProvider: supports use
             of instance profile credentials if running in an EC2 VM.
        </description>
      </property>

      <property>
        <name>fs.s3a.endpoint</name>
        <value>$(s3_endpoint_url)</value>
        <description>AWS S3 endpoint to connect to. An up-to-date list is
          provided in the AWS Documentation: regions and endpoints. Without this
          property, the standard region (s3.amazonaws.com) is assumed.
        </description>
      </property>
    </configuration>
```
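The `$(s3_endpoint_url)` placeholder above is filled in by kustomize variable substitution. For illustration, the wiring in a `kustomization.yaml` typically looks roughly like this (the `objref` source object and field path are assumptions, not copied from this PR):

```yaml
vars:
  - name: s3_endpoint_url
    objref:  # assumed source object; the real manifest in this PR may differ
      kind: ConfigMap
      name: hue-config
      apiVersion: v1
    fieldref:
      fieldpath: data.s3_endpoint_url
```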
@@ -0,0 +1,210 @@
```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: hue-ini
type: Opaque
stringData:
  hue.ini: |
    # Full configuration:
    # https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini

    [desktop]
    # Hide unused apps
    app_blacklist=impala,security,jobbrowser,jobsub,pig,hbase,sqoop,zookeeper,spark,oozie,search
    secret_key=$(hue_secret_key)
    http_host=0.0.0.0
    http_port=8000
    time_zone=America/Los_Angeles
    django_debug_mode=false
    dev=false
    database_logging=false
    send_dbug_messages=false
    http_500_debug_mode=false
    enable_prometheus=true

      [[django_admins]]

      [[custom]]

      [[auth]]

      [[ldap]]
        [[[users]]]

        [[[groups]]]

        [[[ldap_servers]]]

      [[vcs]]

      [[database]]
        engine=mysql
        host=hue-mysql.$(namespace).svc
        port=3306
        user=$(database_user)
        password=$(database_password)
        name=$(database_name)

      [[session]]

      [[smtp]]
        host=localhost
        port=25
        user=
        password=
        tls=no

      [[knox]]

      [[kerberos]]

      [[oauth]]

      [[oidc]]

      [[metrics]]

      [[tracing]]

      [[task_server]]

      [[gc_accounts]]
        [[[default]]]

    [notebook]
      [[interpreters]]
        [[[hive]]]
          name=Hive
          interface=hiveserver2

        [[[impala]]]
          name=Impala
          interface=hiveserver2

        [[[sparksql]]]
          name=SparkSql
          interface=hiveserver2

        [[[text]]]
          name=Text
          interface=text

        [[[markdown]]]
          name=Markdown
          interface=text

    [dashboard]
    is_enabled=true
    has_sql_enabled=true
    has_report_enabled=true
    use_gridster=true
    has_widget_filter=false
    has_tree_widget=false

      [[engines]]
        [[[solr]]]
          analytics=true
          nesting=true

        [[[sql]]]
          analytics=true
          nesting=false

    [hadoop]
      [[hdfs_clusters]]
        [[[default]]]

      [[yarn_clusters]]
        [[[default]]]

    [beeswax]
    hive_server_host=thriftserver.$(namespace).svc
    hive_server_port=10000
    hive_conf_dir=/etc/hive/conf
    thrift_version=7

      [[ssl]]

    [metastore]
    enable_new_create_table=true
    force_hs2_metadata=false

    [impala]
      [[ssl]]

    [spark]

    [oozie]

    [filebrowser]

    [pig]

    [sqoop]

    [proxy]

    [hbase]

    [search]

    [libsolr]

    [indexer]

    [jobsub]

    [jobbrowser]

    [security]

    [zookeeper]
      [[clusters]]
        [[[default]]]

    [useradmin]
      [[password_policy]]

    [liboozie]
    oozie_url=

    [aws]
      [[aws_accounts]]
        [[[default]]]
          host=$(s3_endpoint_url)
          is_secure=$(s3_is_secure)
          calling_format=boto.s3.connection.OrdinaryCallingFormat
          access_key_id_script=/opt/hue/aws_access_key_id.sh
          secret_access_key_script=/opt/hue/aws_secret_access_key.sh

    [azure]
      [[azure_accounts]]
        [[[default]]]

      [[adls_clusters]]
        [[[default]]]

      [[abfs_clusters]]
        [[[default]]]

    [libsentry]

    [libzookeeper]

    [librdbms]
      [[databases]]

    [libsaml]

    [liboauth]

    [kafka]
      [[kafka]]

    [metadata]
      [[manager]]
      [[optimizer]]
      [[catalog]]
      [[navigator]]
      [[prometheus]]
```
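In the `[aws]` section above, `access_key_id_script` and `secret_access_key_script` point Hue at executables whose stdout is used as the credential value. A minimal sketch of such a script follows; the `/var/run/secrets/s3` mount path and the demo fallback are assumptions for illustration, not taken from this PR:

```shell
#!/bin/sh
# Hue runs this script and uses its stdout as the AWS access key ID.
# /var/run/secrets/s3 is an assumed mountPath for the s3_credentials_secret Secret.
secret_dir="/var/run/secrets/s3"
if [ ! -f "$secret_dir/AWS_ACCESS_KEY_ID" ]; then
  # Demo fallback so the sketch is runnable outside a cluster (not part of the PR).
  secret_dir="/tmp/hue-s3-demo"
  mkdir -p "$secret_dir"
  printf 'AKIAEXAMPLEKEY' > "$secret_dir/AWS_ACCESS_KEY_ID"
fi
cat "$secret_dir/AWS_ACCESS_KEY_ID"
```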
@@ -0,0 +1,11 @@
```yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hue-mysql
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "1Gi"
```
> **Review comment:** I have very little understanding of what the DB is actually used for. Do you know if 1Gi is a reasonable size?
>
> **Reply:** Same answer as at the Thrift server PR. I've used values based on the internal data hub team's expertise. 🙂 Hue stores mainly its "user" data and preferences (it's an old-school Django app), query history, sample data cache for individual tables, data autocomplete and Hive metadata cache, etc. In our experience this value is enough, though I don't really know how much lower we can go.
@@ -0,0 +1,10 @@
```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: hue-mysql
stringData:
  database-user: datacatalog
  database-password: datacatalog
  database-name: datacatalog
  database-root-password: root
```
@@ -0,0 +1,24 @@
```yaml
---
kind: Service
apiVersion: v1
metadata:
  name: hue-mysql
  annotations:
    template.openshift.io/expose-uri: |
      'mysql://{.spec.clusterIP}:{.spec.ports[?(.name=="mysql")].port}'
spec:
  ports:
    - name: mysql
      protocol: TCP
      port: 3306
      targetPort: 3306
    - name: exporter
      protocol: TCP
      port: 9104
      targetPort: 9104
  selector:
    deployment: hue-mysql
  type: ClusterIP
  sessionAffinity: None
status:
  loadBalancer: {}
```
> **Review comment:** It seems like this does not work without thriftserver, right? If so, can you mention it in the readme and maybe even as a comment in KFDef?
>
> **Reply:** It can work without thriftserver, but you'd need to connect it to something else (YARN, Hive directly, etc.) by modifying the `hue.ini`. Or you can use it as an S3 explorer, which is a standalone functionality. In our scenario and setup it requires the Thrift server to be available. I'll explain it in the readme. 👍
>
> **Reply:** I've added a paragraph in there. WDYT? 🙂
>
> **Reply:** This clarification looks good to me.