Replacing Mulgara with Blazegraph

lutaylor edited this page Jul 27, 2016 · 2 revisions

How to replace Mulgara triplestore in Fedora 3.x

The following notes are for a Fedora 3.8.1 install. If Fedora reports errors at startup with a version other than 3.8.1, the likely cause is conflicting jars on the classpath.

Halt all ingests and retrieve an accurate triple count from Mulgara before starting, so you can compare the counts before and after the migration.

E.g.

http://fedoraserver:8080/fedora/risearch

Language - itql
Response - CSV
Limit - Unlimited

Query:

select count(select $subject $predicate $object from <#ri> where $subject $predicate $object) from <#ri> where $subject $predicate $object;

Note: this query may take a long time to run, depending on your repository size. The output should give you the total number of triples.
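The same count can be requested from the command line, which makes it easier to keep the result for the later comparison. This is a minimal sketch, assuming the risearch endpoint accepts the form parameters `type`, `lang`, `format`, and `query` over POST and needs no authentication; the function names are hypothetical, and you may need to add `--user name:password` if your install protects risearch.

```shell
# Hypothetical helpers; host, port, and open access are assumptions.

# Print the iTQL count query used on this page.
itql_count_query() {
    printf '%s\n' 'select count(select $subject $predicate $object from <#ri> where $subject $predicate $object) from <#ri> where $subject $predicate $object;'
}

# Send the query to risearch and print the raw CSV response.
# $1 is the Fedora base URL, e.g. http://fedoraserver:8080/fedora
# No limit parameter is sent, which corresponds to "Unlimited" in the form.
risearch_count() {
    curl --silent --show-error --fail \
        --data 'type=tuples' \
        --data 'lang=itql' \
        --data 'format=CSV' \
        --data-urlencode "query=$(itql_count_query)" \
        "$1/risearch"
}

# Usage (not run here):
#   risearch_count http://fedoraserver:8080/fedora | tee mulgara_count.csv
```

Saving the output to a file gives you a record of the Mulgara count to compare against once the rebuild has finished.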

Blazegraph needs to run in a separate container from Fedora, or ideally on a separate server with port 8080 (or whatever port you want to run Tomcat on) exposed to the Fedora server. Blazegraph must be running before Fedora can start properly; keep this in mind when scheduling the start/stop of services.
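One way to respect this ordering in a script is to poll Blazegraph and only start Fedora once it answers. The sketch below is an illustration, not part of the original procedure: `wait_until` is a hypothetical helper, and the URL in the usage comment assumes the port 8081 setup used later on this page.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = probe command
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Only sleep if another attempt will follow.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage (not run here): start Fedora only after Blazegraph responds.
#   wait_until 30 2 curl --silent --fail --output /dev/null \
#       http://blazegraphserver:8081/blazegraph/ && service tomcat start
```

The probe is passed in as a command so the same loop works with any check you prefer.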

Install Java if it is not already installed

e.g.

apt-get install oracle-java8-installer

Install Tomcat 7 with latest binaries

e.g.

wget http://mirror.its.dal.ca/apache/tomcat/tomcat-7/v7.0.69/bin/apache-tomcat-7.0.69.tar.gz
tar xf apache-tomcat-7.0.69.tar.gz
mv apache-tomcat-7.0.69 /usr/share/tomcat-blzg
useradd -m -d /var/bigdata -s /bin/false blazegraph

Add these variables to /var/bigdata/.bash_profile for the blazegraph user, to override any other Tomcat variables

export BLZG_CONF=/etc/bigdata
export CATALINA_HOME=/usr/share/tomcat-blzg
export CATALINA_PID="/usr/share/tomcat-blzg/catalina.pid"
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre
export BLZG_USER=blazegraph
export PATH=/usr/lib/jvm/java-8-oracle/jre/bin:/usr/share/tomcat-blzg/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export JAVA_OPTS="-server -Xmx2000m -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/etc/bigdata/RWStore.properties -Dlog4j.configuration=/etc/bigdata/log4j.properties -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -XX:+UseParallelOldGC"

NOTE: Use -XX:+UseParallelOldGC only if the server has multiple CPUs; if it does not, remove that flag. You might have to tweak -Xmx depending on heap usage; 2000m is just a starting point. Leave at least 50% of total system memory free for disk caching.
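The two rules in this note can be turned into a small helper that prints the heap and garbage collector flags for a given machine. This is only a sketch: `suggest_heap_and_gc` is a hypothetical name, and capping the heap at exactly half of total memory is one reading of the "at least 50% free" guidance.

```shell
# $1 = desired heap in MB, $2 = total system memory in MB, $3 = number of CPUs
suggest_heap_and_gc() {
    heap_mb=$1
    # Cap the heap at half of total memory so the rest stays free for disk caching.
    max_heap_mb=$(( $2 / 2 ))
    if [ "$heap_mb" -gt "$max_heap_mb" ]; then
        heap_mb=$max_heap_mb
    fi
    opts="-Xmx${heap_mb}m"
    # The parallel old-generation collector only helps with more than one CPU.
    if [ "$3" -gt 1 ]; then
        opts="$opts -XX:+UseParallelOldGC"
    fi
    printf '%s\n' "$opts"
}

# Usage on Linux (not run here):
#   suggest_heap_and_gc 2000 "$(free -m | awk '/^Mem:/ {print $2}')" "$(nproc)"
```

The printed flags replace the `-Xmx2000m` and `-XX:+UseParallelOldGC` parts of JAVA_OPTS; the other options stay as they are.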

Edit /usr/share/tomcat-blzg/conf/server.xml to use different ports from the other Tomcat container if Blazegraph runs on the same server as the Fedora Tomcat.
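If the second Tomcat still has the stock port numbers, the edit can be done with sed. The sketch below assumes the default Tomcat values (8005 shutdown, 8080 HTTP, 8009 AJP, 8443 redirect) and bumps each by one; `shift_tomcat_ports` is a hypothetical helper, so review the result before replacing the real file.

```shell
# Read a server.xml on stdin and write a copy with the stock Tomcat ports
# incremented by one (assumes the file still uses the default port numbers).
shift_tomcat_ports() {
    sed -e 's/port="8005"/port="8006"/' \
        -e 's/port="8080"/port="8081"/' \
        -e 's/port="8009"/port="8010"/' \
        -e 's/port="8443"/port="8444"/' \
        -e 's/redirectPort="8443"/redirectPort="8444"/'
}

# Usage (not run here):
#   cd /usr/share/tomcat-blzg/conf
#   shift_tomcat_ports < server.xml > server.xml.new
#   diff server.xml server.xml.new    # check the changes, then move the new file into place
```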

mkdir -p /var/bigdata/logs
mkdir -p /etc/bigdata/
cd ~
git clone https://github.com/discoverygarden/blazegraph_conf
cp blazegraph_conf/RWStore.properties /etc/bigdata
# Note: log4j.properties can be tweaked to log at INFO/DEBUG, but this could impact throughput.
cp blazegraph_conf/log4j.properties /etc/bigdata
cp blazegraph_conf/blazegraph_init /etc/init.d/blazegraph
cd /usr/share/tomcat-blzg/webapps
wget https://sourceforge.net/projects/bigdata/files/bigdata/2.1.0/blazegraph.war/download -O blazegraph.war
chown -R blazegraph:blazegraph /usr/share/tomcat-blzg
chown -R blazegraph:blazegraph /var/bigdata
chown -R blazegraph:blazegraph /etc/bigdata
chmod +x /etc/init.d/blazegraph
# Note: the run levels and priorities will vary; just make sure Blazegraph starts before Fedora.
update-rc.d blazegraph start 64 2 3 4 5 . stop 36 0 1 6 .
service blazegraph start

Visit http://server:8080/blazegraph (or http://server:8081/blazegraph if you incremented the ports because two Tomcats run on the same server) to confirm that Blazegraph is up.

On Fedora server

Ensure Maven is installed and set up, e.g.

apt-get install maven

cd ~
git clone https://github.com/discoverygarden/trippi-sail.git

cd trippi-sail
# -Dfedora.version should match the Fedora version you are using
mvn package -Dfedora.version=3.8.1
cd trippi-sail-blazegraph-remote/target
tar xf trippi-sail-blazegraph-remote-0.0.1-SNAPSHOT-bin.tar.gz
mv trippi-sail-blazegraph-remote-0.0.1-SNAPSHOT /opt/trippi-sail
chown -R fedora:fedora /opt/trippi-sail

Stop fedora

service tomcat stop

Update /usr/local/fedora/tomcat/conf/Catalina/localhost/fedora.xml

e.g.

<?xml version="1.0" encoding="UTF-8"?>
<Context>
 <Loader
       className="org.apache.catalina.loader.VirtualWebappLoader"
       virtualClasspath="/opt/trippi-sail/*.jar"
       searchVirtualFirst="true"/>
 <Parameter name="fedora.home" value="/usr/local/fedora" />
</Context>

cp ~/trippi-sail/trippi-sail-blazegraph-remote/src/main/resources/sample-bean-config-xml/remote-blazegraph.xml /usr/local/fedora/server/config/spring/

Update remote-blazegraph.xml with the Blazegraph server location

               <constructor-arg type="java.lang.String" value="http://blazegraphserver:8081/blazegraph"/>
               <constructor-arg type="boolean" value="false"/>
       </bean>

Should change to something like

E.g.

<bean class="org.trippi.impl.sesame.SesameSession" scope="prototype" >
                                <constructor-arg ref="trippiSailRepository"/>
                                <constructor-arg ref="org.trippi.AliasManager"/>
                                <constructor-arg value="fedora://model#"/>
                                <constructor-arg value="ri"/>
            	</bean>

chown -R fedora:fedora /usr/local/fedora

Comment out this section in $FEDORA_HOME/server/config/fedora.fcfg:

   <!--
   <param name="datastore" value="localMulgaraTriplestore">
     <comment>(required)
           Name of the triplestore to use. WARNING: changing the
           triplestore running the Resource Index Rebuilder.</comment>
   </param>
   -->

Update the -cp section of $FEDORA_HOME/server/bin/env-server.sh so that it includes the trippi-sail jars

-cp \"$webinf\"/classes:/opt/trippi-sail/*:\"$FEDORA_HOME\"/server/bin:\"$webinf\"/lib/* \

Rebuild resource index

Note: before proceeding, ensure that you have a triple count from your existing Fedora and that you have database dumps and backups.

Run the following in “screen” or a similar program and ensure it is logging somewhere, since it can run for days depending on the triple count. Make sure you can access your Blazegraph server beforehand.

Check the pidGen table in the Fedora database before proceeding; you need to ensure that it matches when done. Note: you can likely get away with not rebuilding the database in an existing install; the step is included here for completeness in case this is a new install.

E.g.

use fedora3;

select * from pidGen;

su - fedora -s /bin/bash

cd /usr/local/fedora/server/bin/
./fedora-rebuild.sh

First choose 1) Rebuild the Resource Index.

After it completes, re-run ./fedora-rebuild.sh and choose 2) Rebuild SQL database.

Note: when rebuilding against Blazegraph, you might have to press Ctrl+C after it says it has finished if fedora-rebuild does not exit on its own.

Note: You might see warnings similar to https://jira.blazegraph.com/browse/BLZG-1152. We have seen this in other installs and it doesn’t seem to break anything. Just make sure your triple counts match.

Re-check the pidGen table in the Fedora database before proceeding; you need to ensure that it matches what you had before. If it is empty or doesn’t look right, you might have to rebuild the database again. This is typically an issue with older Fedora versions.

E.g.

use fedora3;

select * from pidGen;
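The before and after check of pidGen can be done by saving a snapshot of the table at each point and comparing the two files. This is a rough sketch, assuming a MySQL database named fedora3 with credentials supplied through your usual client configuration; both function names are hypothetical.

```shell
# Dump pidGen as sorted, tab-separated rows without a header line.
# The database name and the way credentials are supplied are assumptions.
dump_pidgen() {
    mysql --batch --skip-column-names --execute 'select * from pidGen' fedora3 | sort
}

# Succeed only if the two snapshot files have identical content.
snapshots_match() {
    cmp -s "$1" "$2"
}

# Usage (not run here):
#   dump_pidgen > pidgen_before.tsv      # before the rebuild
#   dump_pidgen > pidgen_after.tsv       # after the rebuild
#   snapshots_match pidgen_before.tsv pidgen_after.tsv || echo "pidGen changed"
```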

Once it has all completed, connect to Blazegraph and check the triple count by running the following query:

http://blazegraphserver:8081/blazegraph/#query

SELECT (COUNT(*) AS ?triples) WHERE {?s ?p ?o}
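The query can also be sent from the command line instead of the workbench. This is a minimal sketch, assuming the SPARQL endpoint for the default namespace is at /blazegraph/sparql and that it returns CSV when asked for `text/csv`; the function names are hypothetical, and your namespace path may differ depending on RWStore.properties.

```shell
# POST the count query to a Blazegraph SPARQL endpoint and print the raw CSV.
# $1 is the endpoint URL (assumed: http://blazegraphserver:8081/blazegraph/sparql)
blazegraph_count_csv() {
    curl --silent --show-error --fail \
        --header 'Accept: text/csv' \
        --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) WHERE {?s ?p ?o}' \
        "$1"
}

# Read a one-column SPARQL CSV result on stdin and print the value on the
# second line (the first line is the header; CSV results use CRLF endings).
csv_first_value() {
    tr -d '\r' | sed -n '2p'
}

# Usage (not run here):
#   blazegraph_count_csv http://blazegraphserver:8081/blazegraph/sparql | csv_first_value
```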

It should match the count you had in Mulgara. If it matches, you should be good to start up Fedora, which will then be using Blazegraph. You now need to uncheck “Use iTQL for particular queries” under http://drupalserver/admin/islandora/configure.
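The comparison of the two counts is simple enough to do by eye, but it can be scripted if the migration is automated. In this sketch, `compare_triple_counts` is a hypothetical helper that takes the two numbers you recorded and returns a non-zero status on a mismatch.

```shell
# $1 = count taken from Mulgara, $2 = count taken from Blazegraph
compare_triple_counts() {
    if [ "$1" -eq "$2" ]; then
        printf 'OK: both stores report %s triples\n' "$1"
        return 0
    fi
    printf 'MISMATCH: Mulgara=%s Blazegraph=%s (difference %s)\n' \
        "$1" "$2" "$(( $2 - $1 ))"
    return 1
}

# Usage (not run here): only start Fedora when the counts agree.
#   compare_triple_counts "$mulgara_count" "$blazegraph_count" && service tomcat start
```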

Start fedora

service tomcat start