Replacing Mulgara with Blazegraph
The following notes are for a Fedora 3.8.1 install. If Fedora gives errors at startup with versions other than 3.8.1, the likely cause is conflicting jars on the classpath.
Halt all ingests and retrieve an accurate triple count from Mulgara before starting the migration, so you can compare the counts before and after.
E.g.
http://fedoraserver:8080/fedora/risearch
Language - itql
Response - CSV
Limit - Unlimited
Query:
select count(select $subject $predicate $object from <#ri> where $subject $predicate $object) from <#ri> where $subject $predicate $object;
Note: this query may take a long time to run, depending on your repository size. The output should give you the total number of triples.
Blazegraph needs to run in a separate container from Fedora, or ideally on a separate server, with port 8080 (or whatever port you run Tomcat on) exposed to the Fedora server. Blazegraph must be running before Fedora can start properly; keep this in mind when scheduling the start/stop of services.
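The same count can also be fetched from the command line instead of the risearch web form. The following is a minimal sketch, not part of the original procedure: the function name and the user/password arguments are hypothetical, and it assumes the count is the last non-empty line of the CSV response.

```shell
# Sketch: fetch the Mulgara triple count through Fedora's risearch endpoint.
# Assumptions: helper name and credential arguments are illustrative, and the
# count is taken to be the last non-empty line of the CSV response.

ITQL_COUNT='select count(select $subject $predicate $object from <#ri> where $subject $predicate $object) from <#ri> where $subject $predicate $object;'

mulgara_triple_count() {
  risearch_url="$1"   # e.g. http://fedoraserver:8080/fedora/risearch
  fedora_user="$2"    # only needed if risearch requires authentication
  fedora_pass="$3"
  curl -s -u "${fedora_user}:${fedora_pass}" \
    --data-urlencode "type=tuples" \
    --data-urlencode "lang=itql" \
    --data-urlencode "format=CSV" \
    --data-urlencode "query=${ITQL_COUNT}" \
    "$risearch_url" \
    | tr -d '\r"' \
    | awk 'NF { last = $0 } END { print last }'
}
```

For example, `mulgara_triple_count http://fedoraserver:8080/fedora/risearch fedoraAdmin secret > mulgara_count.txt` would save the count for the later comparison (the credentials shown are placeholders).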
Install Java if it is not already installed
e.g.
apt-get install oracle-java8-installer
Install Tomcat 7 with latest binaries
e.g.
wget http://mirror.its.dal.ca/apache/tomcat/tomcat-7/v7.0.69/bin/apache-tomcat-7.0.69.tar.gz
tar xf apache-tomcat-7.0.69.tar.gz
mv apache-tomcat-7.0.69 /usr/share/tomcat-blzg
useradd -m -d /var/bigdata -s /bin/false blazegraph
Add variables to /var/bigdata/.bash_profile for the blazegraph user to override other Tomcat variables:
export BLZG_CONF=/etc/bigdata
export CATALINA_HOME=/usr/share/tomcat-blzg
export CATALINA_PID="/usr/share/tomcat-blzg/catalina.pid"
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre
export BLZG_USER=blazegraph
export PATH=/usr/lib/jvm/java-8-oracle/jre/bin:/usr/share/tomcat-blzg/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export JAVA_OPTS="-server -Xmx2000m -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/etc/bigdata/RWStore.properties -Dlog4j.configuration=/etc/bigdata/log4j.properties -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -XX:+UseParallelOldGC"
NOTE: use -XX:+UseParallelOldGC only if the server has multiple CPUs; otherwise leave it out. You might have to tweak -Xmx depending on heap usage, which can vary; 2000m is just a starting point. Leave at least 50% of total system memory free for disk caching.
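The GC rule in the note can be expressed as a small helper. This is only a sketch: the function name is hypothetical, and `nproc` (GNU coreutils) is assumed to be available for reading the CPU count.

```shell
# Sketch: emit the parallel GC flag only when more than one CPU is present.
# The function name is illustrative and not part of the original notes.
gc_flag_for_cpus() {
  cpus="$1"
  if [ "$cpus" -gt 1 ]; then
    echo "-XX:+UseParallelOldGC"
  fi
}
```

In .bash_profile, the output of `gc_flag_for_cpus "$(nproc)"` could be appended to JAVA_OPTS in place of the hard-coded flag.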
If Blazegraph runs on the same server as the Fedora Tomcat, edit /usr/share/tomcat-blzg/conf/server.xml so that it uses different ports from the other Tomcat container.
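As an illustration, the port change could be scripted with sed. This sketch assumes an unmodified Tomcat 7 server.xml using the default shutdown, HTTP and AJP ports (8005, 8080, 8009) and shifts each up by one. The function name and the new port numbers are assumptions; pick whatever ports are free on your server.

```shell
# Sketch: shift the default Tomcat ports by one so two Tomcats can coexist.
# Assumes stock defaults 8005 (shutdown), 8080 (HTTP) and 8009 (AJP).
# redirectPort values are left untouched. A .bak copy of the file is kept.
shift_tomcat_ports() {
  server_xml="$1"
  sed -i.bak \
    -e 's/port="8005"/port="8006"/' \
    -e 's/port="8080"/port="8081"/' \
    -e 's/port="8009"/port="8010"/' \
    "$server_xml"
}
```

It would be called as `shift_tomcat_ports /usr/share/tomcat-blzg/conf/server.xml` before the first start of the Blazegraph Tomcat.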
mkdir -p /var/bigdata/logs
mkdir -p /etc/bigdata/
cd ~
git clone https://github.com/discoverygarden/blazegraph_conf
cp blazegraph_conf/RWStore.properties /etc/bigdata
# Note: log4j.properties can be tweaked to log at INFO/DEBUG, but this could impact throughput.
cp blazegraph_conf/log4j.properties /etc/bigdata
cp blazegraph_conf/blazegraph_init /etc/init.d/blazegraph
cd /usr/share/tomcat-blzg/webapps
wget https://sourceforge.net/projects/bigdata/files/bigdata/2.1.0/blazegraph.war/download -O blazegraph.war
chown -R blazegraph:blazegraph /usr/share/tomcat-blzg
chown -R blazegraph:blazegraph /var/bigdata
chown -R blazegraph:blazegraph /etc/bigdata
chmod +x /etc/init.d/blazegraph
# Note: these values will vary; just make sure Blazegraph starts before Fedora.
update-rc.d blazegraph start 64 2 3 4 5 . stop 36 0 1 6 .
service blazegraph start
Visit http://server:8080/blazegraph (or server:8081/blazegraph assuming you incremented ports if running two tomcats locally)
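Because Blazegraph must be up before Fedora starts, a start script may want to wait for the web application to answer rather than rely on a manual visit. A minimal sketch follows, assuming curl is installed; the function name, retry count and delay are hypothetical.

```shell
# Sketch: poll the Blazegraph URL until it responds or the retries run out.
# Returns 0 once the URL answers without an HTTP error, 1 otherwise.
wait_for_blazegraph() {
  url="$1"            # e.g. http://server:8081/blazegraph
  tries="${2:-30}"    # illustrative default
  delay="${3:-2}"     # seconds between attempts, illustrative default
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_blazegraph http://server:8081/blazegraph && service tomcat start` would only start Fedora once Blazegraph responds.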
Ensure the Maven binaries are installed and set up
apt-get install maven
cd ~
git clone https://github.com/discoverygarden/trippi-sail.git
cd trippi-sail
#-Dfedora.version should be whatever version you are using
mvn package -Dfedora.version=3.8.1
cd trippi-sail-blazegraph-remote/target
tar xf trippi-sail-blazegraph-remote-0.0.1-SNAPSHOT-bin.tar.gz
mv trippi-sail-blazegraph-remote-0.0.1-SNAPSHOT /opt/trippi-sail
chown -R fedora:fedora /opt/trippi-sail
Stop fedora
service tomcat stop
update /usr/local/fedora/tomcat/conf/Catalina/localhost/fedora.xml
e.g.
<?xml version="1.0" encoding="UTF-8"?>
<Context>
<Loader
className="org.apache.catalina.loader.VirtualWebappLoader"
virtualClasspath="/opt/trippi-sail/*.jar"
searchVirtualFirst="true"/>
<Parameter name="fedora.home" value="/usr/local/fedora" />
</Context>
cp ~/trippi-sail/trippi-sail-blazegraph-remote/src/main/resources/sample-bean-config-xml/remote-blazegraph.xml /usr/local/fedora/server/config/spring/
update remote-blazegraph.xml with blazegraph server location
<constructor-arg type="java.lang.String" value="http://blazegraphserver:8081/blazegraph"/>
<constructor-arg type="boolean" value="false"/>
</bean>
Should change to something like
E.g.
<bean class="org.trippi.impl.sesame.SesameSession" scope="prototype" >
<constructor-arg ref="trippiSailRepository"/>
<constructor-arg ref="org.trippi.AliasManager"/>
<constructor-arg value="fedora://model#"/>
<constructor-arg value="ri"/>
</bean>
chown -R fedora:fedora /usr/local/fedora
Comment out this section in $FEDORA_HOME/server/config/fedora.fcfg:
<!--
<param name="datastore" value="localMulgaraTriplestore">
<comment>(required)
Name of the triplestore to use. WARNING: changing the
triplestore running the Resource Index Rebuilder.</comment>
</param>
-->
Update the -cp section of $FEDORA_HOME/server/bin/env-server.sh so that it includes the trippi-sail jars:
-cp \"$webinf\"/classes:/opt/trippi-sail/*:\"$FEDORA_HOME\"/server/bin:\"$webinf\"/lib/* \
Rebuild resource index
Note: ensure that you got a triple count from your existing Fedora before proceeding, and ensure you have database dumps and backups.
Run the following in “screen” or a similar program and ensure it is logging somewhere, since it can run for days depending on the triple count. Make sure you can access your Blazegraph server beforehand.
Check the pidGen table in the Fedora database before proceeding; you need to ensure that it matches when done. Note: you can likely get away with not rebuilding the database in an existing install; the step is included here for completeness in case it is a net new install.
E.g.
use fedora3;
select * from pidGen;
su - fedora -s /bin/bash
cd /usr/local/fedora/server/bin/
./fedora-rebuild.sh
First, choose 1) Rebuild the Resource Index.
After it completes, re-run ./fedora-rebuild.sh and choose 2) Rebuild SQL database.
Note: when rebuilding against Blazegraph, you might have to press Ctrl+C after it says it has finished if fedora-rebuild doesn’t exit.
Note: you might see warnings similar to https://jira.blazegraph.com/browse/BLZG-1152. We have seen this in other installs and it doesn’t seem to break anything. Just make sure your triple counts match.
Re-check the pidGen table in the Fedora database before proceeding; you need to ensure that it matches what you had before. If it is empty or doesn’t look right, you might have to rebuild the database again. This is typically an issue with older Fedora versions.
E.g.
use fedora3;
select * from pidGen;
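The before/after pidGen comparison can be scripted rather than checked by eye. The sketch below assumes a MySQL-backed Fedora database and a mysql client that can connect without prompting (for example via ~/.my.cnf). The function names and file paths are hypothetical.

```shell
# Sketch: snapshot the pidGen table to a file and compare two snapshots.
# Assumes MySQL; -N drops the column header and -B gives tab-separated rows.
pidgen_snapshot() {
  db="$1"     # e.g. fedora3
  out="$2"    # file to write the sorted rows to
  mysql -N -B -e 'SELECT * FROM pidGen' "$db" | sort > "$out"
}

# Succeeds only when both snapshot files are byte-for-byte identical.
pidgen_matches() {
  cmp -s "$1" "$2"
}
```

Run `pidgen_snapshot fedora3 /tmp/pidgen.before` before the rebuild and `pidgen_snapshot fedora3 /tmp/pidgen.after` afterwards, then check `pidgen_matches /tmp/pidgen.before /tmp/pidgen.after`.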
Once it is all completed connect to blazegraph and check triple count by running the following query:
http://blazegraphserver:8081/blazegraph/#query
SELECT (COUNT(*) AS ?triples) WHERE {?s ?p ?o}
It should match what you had in Mulgara. If it matches, you should be good to start up Fedora, which will now be using Blazegraph. You then need to uncheck “Use iTQL for particular queries” under http://drupalserver/admin/islandora/configure.
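The Blazegraph count can also be fetched with curl and compared against the saved Mulgara figure. This is a sketch under assumptions: the function names are hypothetical, and the SPARQL endpoint path (/blazegraph/sparql for the default namespace) is assumed. If your RWStore.properties uses a different namespace, the path will differ.

```shell
# Sketch: run the count query against Blazegraph and compare two counts.
# Assumptions: endpoint path, helper names, and that the count is the last
# non-empty line of the CSV result.

SPARQL_COUNT='SELECT (COUNT(*) AS ?triples) WHERE {?s ?p ?o}'

blazegraph_triple_count() {
  sparql_url="$1"   # assumed, e.g. http://blazegraphserver:8081/blazegraph/sparql
  curl -s -H 'Accept: text/csv' \
    --data-urlencode "query=${SPARQL_COUNT}" \
    "$sparql_url" \
    | tr -d '\r"' \
    | awk 'NF { last = $0 } END { print last }'
}

# Succeeds only when both counts are non-empty and identical.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

For example, `counts_match "$(cat mulgara_count.txt)" "$(blazegraph_triple_count http://blazegraphserver:8081/blazegraph/sparql)"` succeeds only if the two figures agree (mulgara_count.txt being wherever you recorded the earlier count).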
Start fedora
service tomcat start