Never Ending Security

Solr: Search Engine Platform for Linux


Solr is a search engine platform based on Apache Lucene. It is written in Java and uses the Lucene library to implement indexing. It can be accessed through a variety of REST-like HTTP APIs, with responses in formats such as XML and JSON. This is the feature list from their website:

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces – XML, JSON and HTTP
  • Comprehensive HTML Administration Interfaces
  • Server statistics exposed over JMX for monitoring
  • Linearly scalable, auto index replication, auto failover and recovery
  • Near Real-time indexing
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture
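
Because these interfaces are plain HTTP, any HTTP client, such as curl, can talk to Solr. The helper below is a minimal sketch that builds a select URL for a core and picks JSON or XML output with the wt parameter. The function name solr_select_url is hypothetical, the query must already be URL-encoded, and the core name collection1 is an assumption matching the example core used later in this article.

```shell
# Build a Solr "select" URL for one core (hypothetical helper).
# $1 = base URL, e.g. http://localhost:8983/solr (a trailing slash is stripped)
# $2 = core name, $3 = query (already URL-encoded), $4 = json or xml (default json)
solr_select_url() {
  local base="$1" core="$2" query="$3" format="${4:-json}"
  printf '%s/%s/select?q=%s&wt=%s\n' "${base%/}" "$core" "$query" "$format"
}

# Example (needs a running Solr instance):
#   curl "$(solr_select_url http://localhost:8983/solr collection1 '*:*' json)"
```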

Installing Solr using apt-get (easy way)

Solr doesn’t work alone; it needs a Java servlet container such as Tomcat or Jetty. In this section, we’ll use Tomcat (the manual installation below uses Jetty, which is just as easy). First, install the Java JDK. For a simple installation, execute the following commands:

sudo apt-get -y install openjdk-7-jdk
sudo mkdir -p /usr/java
sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64 /usr/java/default
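
A quick way to confirm the symlink points at a real JDK is to check for an executable bin/java under it. This is a small sketch with a hypothetical helper name; the java-7-openjdk-amd64 path above assumes a 64-bit system, and the same check applies to the manual installation below.

```shell
# Succeed if the given Java home (default /usr/java/default) contains an
# executable bin/java. [ -x ] follows symlinks, so it tests the real JDK.
java_home_ok() {
  local home="${1:-/usr/java/default}"
  [ -x "$home/bin/java" ]
}

# Example:
#   java_home_ok && /usr/java/default/bin/java -version
```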

Ubuntu provides three Solr packages by default: solr-common, which contains the actual Solr code; solr-tomcat, Solr integrated with Tomcat; and solr-jetty, which is just like solr-tomcat but uses the Jetty web server. In this section, we will install solr-tomcat, so execute the following command:

sudo apt-get -y install solr-tomcat

Your Solr instance should now be available at http://YOUR_IP:8080/solr. If you want to go straight to configuring Solr, skip the next section on manual installation.
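
Tomcat may take a few seconds to deploy Solr after installation, so a scripted check may need to retry. The following sketch (hypothetical function name, curl assumed to be installed) polls a URL until it returns an HTTP success status or the attempts run out; it assumes the /solr/ admin page answers with 200 once Solr is ready.

```shell
# Poll a URL until it answers with an HTTP success status.
# $1 = URL, $2 = number of attempts (default 30), one second apart.
# Returns 0 as soon as Solr answers, 1 if it never does.
wait_for_solr() {
  local url="$1" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: fail on HTTP errors (404, 503...), -m 5: cap each request
    if curl -sf -m 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example for the Tomcat package (port 8080):
#   wait_for_solr "http://localhost:8080/solr/" 60 && echo "Solr is up"
```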

Installing Solr Manually

Installing Solr manually takes a little more time. For this section, we will use Jetty instead of Tomcat. First, install the Java JDK; for a simple installation, execute the following commands:

sudo apt-get -y install openjdk-7-jdk
sudo mkdir -p /usr/java
sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64 /usr/java/default

We can now start the actual Solr installation. First, download the release archive, unpack it, copy the bundled example (which includes Jetty) to /opt/solr and start it:

cd /opt
sudo wget http://archive.apache.org/dist/lucene/solr/4.7.2/solr-4.7.2.tgz
sudo tar -xvf solr-4.7.2.tgz
sudo cp -R solr-4.7.2/example /opt/solr
cd /opt/solr
sudo java -jar start.jar
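
If you want to install a different release, it can help to keep the version in one place so the download, unpack and copy commands don’t drift apart. This sketch only builds the download URL with a hypothetical helper; it assumes other versions follow the same dist/lucene/solr/VERSION/solr-VERSION.tgz layout as the 4.7.2 URL above, which you should check on the archive before relying on it.

```shell
# Print the archive.apache.org download URL for a Solr version.
# Layout copied from the 4.7.2 URL above; other versions are assumed to match.
solr_archive_url() {
  local version="$1"
  printf 'http://archive.apache.org/dist/lucene/solr/%s/solr-%s.tgz\n' "$version" "$version"
}

# Example:
#   cd /opt && sudo wget "$(solr_archive_url 4.7.2)"
```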

Check that it works by visiting http://YOUR_IP:8983/solr. Once it does, go back to your SSH session and stop the server with Ctrl+C. Then open the /etc/default/jetty file (sudo nano /etc/default/jetty) and paste this into it:

NO_START=0 # Start on boot
JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr/solr $JAVA_OPTIONS"
JAVA_HOME=/usr/java/default
JETTY_HOME=/opt/solr
JETTY_USER=solr
JETTY_LOGS=/opt/solr/logs
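
Instead of pasting into nano, the same file can be written non-interactively, which is handy when scripting the setup. This is a sketch using a hypothetical function name; the quoted heredoc delimiter keeps $JAVA_OPTIONS from being expanded when the file is written, so the init script expands it later as intended.

```shell
# Write the Jetty defaults shown above to a file (default /etc/default/jetty).
# <<'EOF' (quoted) writes $JAVA_OPTIONS literally instead of expanding it now.
write_jetty_defaults() {
  local target="${1:-/etc/default/jetty}"
  cat > "$target" <<'EOF'
NO_START=0 # Start on boot
JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr/solr $JAVA_OPTIONS"
JAVA_HOME=/usr/java/default
JETTY_HOME=/opt/solr
JETTY_USER=solr
JETTY_LOGS=/opt/solr/logs
EOF
}

# Example: try it on a scratch file first, then define and call it in a
# root shell (sudo -i) to write /etc/default/jetty:
#   write_jetty_defaults /tmp/jetty-defaults
```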

Save it, then open the file /opt/solr/etc/jetty-logging.xml (sudo nano /opt/solr/etc/jetty-logging.xml) and paste this into it:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure.dtd">
<!-- =============================================================== -->
<!-- Configure stderr and stdout to a Jetty rollover log file -->
<!-- this configuration file should be used in combination with -->
<!-- other configuration files.  e.g. -->
<!--    java -jar start.jar etc/jetty-logging.xml etc/jetty.xml -->
<!-- Class names are for Jetty 8 (org.eclipse.jetty), which Solr 4.x bundles; -->
<!-- the older org.mortbay.* classes only exist in Jetty 6 and earlier. -->
<!-- =============================================================== -->
<Configure id="Server" class="org.eclipse.jetty.server.Server">

  <New id="ServerLog" class="java.io.PrintStream">
    <Arg>
      <New class="org.eclipse.jetty.util.RolloverFileOutputStream">
        <Arg><SystemProperty name="jetty.logs" default="."/>/yyyy_mm_dd.stderrout.log</Arg>
        <Arg type="boolean">false</Arg>
        <Arg type="int">90</Arg>
        <Arg><Call class="java.util.TimeZone" name="getTimeZone"><Arg>GMT</Arg></Call></Arg>
        <Get id="ServerLogName" name="datedFilename"/>
      </New>
    </Arg>
  </New>

  <Call class="org.eclipse.jetty.util.log.Log" name="info"><Arg>Redirecting stderr/stdout to <Ref id="ServerLogName"/></Arg></Call>
  <Call class="java.lang.System" name="setErr"><Arg><Ref id="ServerLog"/></Arg></Call>
  <Call class="java.lang.System" name="setOut"><Arg><Ref id="ServerLog"/></Arg></Call>
</Configure>
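
With this configuration, RolloverFileOutputStream replaces yyyy_mm_dd in the file name with the current date in GMT and keeps 90 days of logs. The sketch below (hypothetical helper, GNU date assumed) prints the path of the file to look at for a given day, assuming JETTY_LOGS is /opt/solr/logs as in /etc/default/jetty and that this logging file is actually loaded at startup.

```shell
# Print the rollover log path for a day.
# $1 = log directory (default /opt/solr/logs), $2 = optional YYYY-MM-DD date
# (default: today). date -u matches the GMT time zone used in the config.
solr_stderrout_log() {
  local dir="${1:-/opt/solr/logs}" day
  if [ -n "$2" ]; then
    day=$(date -u -d "$2" +%Y_%m_%d)   # GNU date: parse the given date in UTC
  else
    day=$(date -u +%Y_%m_%d)
  fi
  printf '%s/%s.stderrout.log\n' "${dir%/}" "$day"
}

# Example:
#   sudo tail -f "$(solr_stderrout_log /opt/solr/logs)"
```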

Then, create the solr user and give it ownership of the installation:

sudo useradd -d /opt/solr -s /bin/false solr
sudo chown solr:solr -R /opt/solr
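
To confirm the chown covered everything, you can look for files that are not owned by the solr user; no output means the ownership is correct. A minimal sketch with a hypothetical helper name:

```shell
# List every path under a directory that is NOT owned by the given user.
# $1 = user name, $2 = directory. Empty output means ownership is correct.
files_not_owned_by() {
  local user="$1" dir="$2"
  find "$dir" ! -user "$user"
}

# Example (sudo so that find can read every subdirectory):
#   sudo find /opt/solr ! -user solr
```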

After that, download the Jetty init script, make it executable and register it to start at boot (if that hasn’t been done already):

sudo wget -O /etc/init.d/jetty http://dev.eclipse.org/svnroot/rt/org.eclipse.jetty/jetty/trunk/jetty-distribution/src/main/resources/bin/jetty.sh
sudo chmod a+x /etc/init.d/jetty
sudo update-rc.d jetty defaults
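
Since the init script is downloaded from a URL that may have moved, it’s worth making sure the file is really a shell script and not an HTML error page before relying on it. The check below is a rough heuristic, not a full validation: it only looks at the first line for a #! interpreter containing sh. The function name is hypothetical.

```shell
# Succeed if the file's first line is a #! line naming a shell (sh, bash...).
# $1 = path, e.g. /etc/init.d/jetty
looks_like_shell_script() {
  local first=""
  [ -r "$1" ] || return 1
  # read fails on a last line without a newline, so also accept any text read
  IFS= read -r first < "$1" || [ -n "$first" ] || return 1
  case "$first" in
    '#!'*sh*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   looks_like_shell_script /etc/init.d/jetty || echo "download failed, check the URL"
```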

Finally, start Jetty (and with it, Solr):

sudo /etc/init.d/jetty start

You can now access your installation just as before at http://YOUR_IP:8983/solr.

Configuring a schema.xml for Solr

First, rename the /opt/solr/solr/collection1 directory to an understandable name such as apples (use whatever name you’d like). If you installed Solr using apt-get, skip this step and run cd /usr/share/solr instead of the commands below:

cd /opt/solr/solr
sudo mv collection1 apples
cd apples

Also, if you installed Solr manually, open the file core.properties (sudo nano core.properties) and change the name line to the new name (for example, name=apples).
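
The rename and the core.properties edit can be combined so the directory and the name= entry stay in sync. This sketch assumes the Solr 4.x layout used above (a core.properties file containing a name=collection1 line), GNU sed, and a plain alphanumeric new name; the function name is hypothetical.

```shell
# Rename a core directory and update name= in its core.properties.
# Run from the directory holding the cores (/opt/solr/solr above).
# $1 = old core name, $2 = new core name (letters, digits, _ or -).
rename_core() {
  local old="$1" new="$2"
  # Refuse to continue if the target exists: mv would move old *into* it.
  if [ -e "$new" ]; then
    echo "rename_core: $new already exists" >&2
    return 1
  fi
  mv "$old" "$new" || return 1
  # GNU sed -i edits core.properties in place.
  sed -i "s/^name=.*/name=$new/" "$new/core.properties"
}

# Example, from a root shell in /opt/solr/solr:
#   rename_core collection1 apples
```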

Then, remove the data directory and edit schema.xml:

sudo rm -R data
sudo nano conf/schema.xml

Replace its contents with your own schema. There is a very comprehensive example schema.xml in the Solr repository, and you can find many more online, but I won’t go into depth about schema design here. Then restart Jetty or Tomcat:

For the apt-get installation (Tomcat):

sudo service tomcat6 restart

For the manual installation (Jetty):

sudo /etc/init.d/jetty restart

When you visit your Solr instance now, you should see the Dashboard, with your collection listed in the sidebar.
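
As a final smoke test, you can add a document and search for it. This sketch assumes Solr 4.x (the manual installation), where the /update handler accepts JSON when it is sent with a JSON content type; the id and title fields are assumptions that must exist in your schema.xml, and the apples core name matches the example above. Older packaged versions may expect JSON at /update/json instead. The function name is hypothetical.

```shell
# Add one JSON document to a core and commit it immediately.
# $1 = core base URL, e.g. http://localhost:8983/solr/apples
# Assumes the schema defines "id" and "title"; adjust to your own fields.
solr_add_test_doc() {
  local core_url="${1%/}"
  curl -sf "$core_url/update?commit=true" \
    -H 'Content-Type: application/json' \
    -d '[{"id": "test-1", "title": "Hello Solr"}]'
}

# Example, then search for the new document:
#   solr_add_test_doc http://localhost:8983/solr/apples
#   curl "http://localhost:8983/solr/apples/select?q=id:test-1&wt=json"
```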