Deployment
- JRE 1.8+
- Apache Hadoop v2.X
- Apache Hive v0.14+
- Apache Drill v1.8+. You can visit http://drill.apache.org/docs/ to learn how to deploy and use it.
- Apache Zookeeper. We use v3.4.8; consider using a compatible version.
- Apache Kafka v0.8.0+. Optional, used for real-time ingestion.
Note - Please make sure all of those requirements are installed correctly before deploying IndexR.
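If you want to sanity-check the main prerequisites from a shell before going further, the version commands below are a minimal sketch (Drill, Zookeeper and Kafka checks depend on where you installed them, so they are left out here):
# Expect JRE 1.8+, Hadoop 2.x and Hive 0.14+ in the output
java -version
hadoop version
hive --version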
- Copy the correct lib file in indexr-<version>/lib to /usr/local/lib/ on all cluster nodes, including those nodes where you may run Hive or indexr-tool scripts. e.g. on Linux you should use the libbhcompress.so file.
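A hedged sketch of distributing the native library, assuming a hypothetical nodes.txt file that lists the hostnames of all cluster nodes (adapt it to your own deployment tooling):
# Push the Linux native library to /usr/local/lib/ on every node listed in nodes.txt
for host in $(cat nodes.txt); do
  scp indexr-<version>/lib/libbhcompress.so ${host}:/usr/local/lib/
done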
- Edit ${HADOOP_HOME}/etc/hadoop/mapred-site.xml, add /usr/local/lib to LD_LIBRARY_PATH in the mapred.child.env parameter. e.g.
<property>
<name>mapred.child.env</name>
<value>LD_LIBRARY_PATH=/usr/local/lib</value>
</property>
- Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
- Copy the IndexR Hive aux jars indexr-<version>/indexr-hive/aux/* to Hive's HIVE_AUX_JARS_PATH. HIVE_AUX_JARS_PATH can be set in ${HIVE_HOME}/conf/hive-env.sh. e.g.
cp -r indexr-<version>/indexr-hive/aux/* /usr/local/hive/aux/
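If HIVE_AUX_JARS_PATH is not set yet, a minimal sketch for ${HIVE_HOME}/conf/hive-env.sh, assuming /usr/local/hive/aux as above:
# Make Hive pick up the IndexR aux jars
export HIVE_AUX_JARS_PATH=/usr/local/hive/aux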
- [Optional] Sometimes you will need to upload those Hive aux jars to the same path on HDFS. e.g.
hdfs dfs -put /usr/local/hive/aux/* /usr/local/hive/aux/
- Restart HiveServer2 if you are running one.
- Now you should be able to create an IndexR Hive table via the Hive console. e.g.
hive (default)> CREATE EXTERNAL TABLE IF NOT EXISTS test (
`date` int,
`d1` string,
`m1` int,
`m2` bigint,
`m3` float,
`m4` double
)
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'io.indexr.hive.IndexRSerde'
STORED AS INPUTFORMAT 'io.indexr.hive.IndexRInputFormat'
OUTPUTFORMAT 'io.indexr.hive.IndexROutputFormat'
LOCATION '/indexr/segment/test'
;
hive (default)> insert into table test partition (dt=20160701) values(20160701,'mac',100,192444,1.55,-331.43555);
hive (default)> select * from test limit 10;
indexr-tool is a toolbox for managing IndexR. It only needs to be deployed on one node, usually your management node.
- Copy indexr-<version>/indexr-tool to a path, like /usr/local/indexr-tool.
- Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the conf folder.
- Modify the configurations in the conf folder, especially the indexr.fs.connection setting in indexr.config.properties; make sure it is set to the same value as fs.defaultFS in core-site.xml (see the sketch below). The conf folder contains:
env.sh indexr.config.properties log4j.xml
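A minimal sketch of that check, assuming fs.defaultFS in core-site.xml is hdfs://namenode:8020 (your NameNode address will differ):
# conf/indexr.config.properties (excerpt)
# must match <name>fs.defaultFS</name> in conf/core-site.xml
indexr.fs.connection=hdfs://namenode:8020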
- Copy all files in indexr-<version>/indexr-drill/* to the Drill installation home dir ${DRILL_HOME}/, for example /usr/local/drill. Do it on all Drill nodes in the cluster.
- Copy drill-indexr-storage-<version>.jar to ${DRILL_HOME}/jars/.
- Modify the configuration ${DRILL_HOME}/conf/indexr.config.properties. It should be kept in sync between indexr-tool and all Drillbit nodes.
- Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the ${DRILL_HOME}/conf folder if they do not exist there yet.
- Modify ${DRILL_HOME}/conf/drill-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
- Synchronize ${DRILL_HOME}/conf to all Drillbit nodes and restart them. e.g.
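A hedged sketch of this step, assuming a hypothetical drill-nodes.txt file listing the Drillbit hostnames and passwordless SSH between nodes:
# Push the updated conf directory to each Drillbit and restart it
for host in $(cat drill-nodes.txt); do
  scp -r ${DRILL_HOME}/conf ${host}:${DRILL_HOME}/
  ssh ${host} "${DRILL_HOME}/bin/drillbit.sh restart"
done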
- Go to the Drill Web Console, create a new storage plugin called indexr, input the following text, and click Create. You only need to do this once, on any one Drill Web Console in the cluster.
{
"type": "indexr",
"enabled": true
}
Note that the IndexR plugin can only create one storage plugin in a Drill cluster, so you should always use the name indexr.
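If you prefer to script this step instead of using the Web Console, Drill also accepts storage plugin configurations over its REST API; a hedged sketch, assuming a Drillbit web server reachable at localhost:8047 without authentication:
curl -X POST -H "Content-Type: application/json" \
  -d '{"name": "indexr", "config": {"type": "indexr", "enabled": true}}' \
  http://localhost:8047/storage/indexr.json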
- Now you can create an IndexR table and enjoy.
Create an IndexR table with indexr-tool
cd ${INDEXR-TOOL_HOME}
bin/tools.sh -cmd settb -t test -c test_schema.json
test_schema.json:
{
"schema":{
"columns":
[
{"name": "date", "dataType": "int"},
{"name": "d1", "dataType": "string"},
{"name": "m1", "dataType": "int"},
{"name": "m2", "dataType": "bigint"},
{"name": "m3", "dataType": "float"},
{"name": "m4", "dataType": "double"}
]
}
}
Run some queries via the Drill console
cd ${DRILL_HOME}
bin/drill-conf
0: jdbc:drill:> select * from indexr.test limit 10;
Note: We only support Spark 2.1.0+.
Simply copy all jars in indexr-<version>/indexr-spark/jars/* to ${SPARK_HOME}/jars/ and you are good to go.
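For example, a hedged sketch assuming Spark is installed at /usr/local/spark; depending on how you submit jobs you may need to repeat this on every node that runs Spark:
# Put the IndexR Spark jars on Spark's classpath
cp indexr-<version>/indexr-spark/jars/* /usr/local/spark/jars/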