Thursday 31 August 2017

Solr In Production

This blog walks you through setting up SolrCloud with a Zookeeper ensemble, assuming you have fair knowledge of Apache Solr and Zookeeper.
It includes detailed steps for configuring SolrCloud with a Zookeeper ensemble in a production environment using AWS EC2 machines.

Using SolrCloud instead of a single standalone Solr server.

Why SolrCloud ?

Setting up Solr as a cluster of Solr servers (i.e. SolrCloud) gives you fault tolerance and high availability. Other features of SolrCloud:

  • Central configuration for the entire cluster
  • Automatic load balancing and fail-over for queries
  • Zookeeper integration for cluster co-ordination and configuration

Zookeeper 

Coordinating and managing a service in a distributed environment is a complicated process, and Zookeeper makes it easy. Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Why External Zookeeper ?

By default Solr comes bundled with Apache Zookeeper, but using this internal Zookeeper in production is discouraged, because shutting down a Solr instance also shuts down its Zookeeper server.
To handle this we should have an external Zookeeper ensemble (a cluster of Zookeeper servers) with more than half of its servers running at any given time.
Use the formula 2n+1 = number of Zookeeper servers in the cluster, where n is the number of failed servers to tolerate. For example, with n=2 we should have a total of 5 Zookeeper servers in the quorum (Zookeeper cluster); if any 2 out of the 5 servers go down, Zookeeper will still be up and running, because a majority of the servers in the quorum are active.
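The quorum arithmetic can be sanity-checked in the shell:

```shell
# To survive n simultaneous server failures, a Zookeeper ensemble
# needs 2n+1 servers in total (a strict majority must stay up).
n=2
echo $((2 * n + 1))   # prints 5
```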

So, let's start with the configuration steps.

I am using 6 AWS EC2 machines for this setup: 3 for SolrCloud and the remaining 3 for the Zookeeper ensemble. It is not mandatory to use 6 machines; it depends on your requirements and availability.

3 nodes for Zookeeper
private IP : 9.*.*.11  public IP : 5*.**2.1**.*1*
private IP : 9.*.*.12  public IP : 5*.**2.2**.*2*
private IP : 9.*.*.13  public IP : 5*.**2.3**.*3*

3 nodes for SolrCloud
private IP : 9.*.*.14  public IP : 5*.**2.4**.*4*
private IP : 9.*.*.15  public IP : 5*.**2.5**.*5*
private IP : 9.*.*.16  public IP : 5*.**2.6**.*6*

Update Java to the latest version on all 6 machines, because recent versions of Solr need Java 1.8.
  • sudo yum install java-1.8.0
Remove any older versions (change the command accordingly)
  • sudo yum remove java-1.7.0-openjdk
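A quick check that Java 8 is now the active version (a sketch; the exact version string varies by JDK build):

```shell
# Print the active Java version and confirm it is 1.8.x
if java -version 2>&1 | grep -q '"1\.8'; then
  echo "Java 8 is active"
else
  echo "Java 8 not found - check the installation"
fi
```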
Zookeeper setup :
Just to make sure all Zookeeper-related files go into the same folder, create one folder.
  • mkdir zookeeper
  • cd zookeeper
Download Zookeeper from web (change the version accordingly)
  • wget "http://www-eu.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz"
Or you can copy it from your local machine (change the user and IP accordingly)
  • sudo scp /home/vinod/zk/zookeeper-3.4.8.tar.gz user@AWS-Public-IP:/home/user/zookeeper
Extract file
  • tar -xvf zookeeper-3.4.8.tar.gz
Create a directory where you would like Zookeeper to store its data
  • mkdir data
Copy the sample Zookeeper configuration file zoo_sample.cfg to zoo.cfg, because by default zk (Zookeeper) reads the zoo.cfg file for its configuration settings at startup.
  • cp zookeeper-3.4.8/conf/zoo_sample.cfg zookeeper-3.4.8/conf/zoo.cfg
Edit the default zoo.cfg file
  • vim zookeeper-3.4.8/conf/zoo.cfg
Update the zoo.cfg file as below and save it. (Change the username and IPs accordingly.)

tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/username/zookeeper/data
# the port at which the clients will connect
clientPort=2181

server.1=9.*.*.11:2888:3888
server.2=9.*.*.12:2888:3888
server.3=9.*.*.13:2888:3888


Inside the data folder, create a file named myid for the server-number mapping: the server.1 machine, i.e. 9.*.*.11, should have a myid file with the value 1 inside it.
  • cd data 
  • cat > myid 
Type 1, press Enter, then press Ctrl-D to save.
Follow the same steps on the other two machines. Note that the server.2 machine should have a myid file containing 2, and the server.3 machine a myid file containing 3.
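The myid step can also be sketched as a short script, run once per Zookeeper machine (the data path is the one used in zoo.cfg above; MYID must match that machine's server.N line):

```shell
# Write this machine's server number into the myid file
MYID=1                                  # server.1 machine -> 1, server.2 -> 2, server.3 -> 3
DATA_DIR="$HOME/zookeeper/data"         # must match dataDir in zoo.cfg
mkdir -p "$DATA_DIR"
echo "$MYID" > "$DATA_DIR/myid"
cat "$DATA_DIR/myid"                    # prints 1
```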

Once all this configuration is done on all 3 machines, we are all set to start Zookeeper.
  • cd /home/username/zookeeper/zookeeper-3.4.8
  • bin/zkServer.sh start
Once Zookeeper has started on all 3 machines, check that all 3 machines are connected.
Use zkCli.sh from any of the 3 connected machines to perform simple, file-like operations. Enter the command below; you should see something like [zk: ip:2181(CONNECTED)] in the console.
  • bin/zkCli.sh -server 9.*.*.11:2181,9.*.*.12:2181,9.*.*.13:2181
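Each node's role can also be checked with Zookeeper's four-letter stat command (a sketch assuming nc is available; a healthy 3-node ensemble reports exactly one leader and two followers):

```shell
# Ask every ensemble member whether it is the leader or a follower
for host in 9.*.*.11 9.*.*.12 9.*.*.13; do
  echo "$host: $(echo stat | nc "$host" 2181 | grep '^Mode:')"
done
```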
Check the Apache Zookeeper documentation for the available Zookeeper operations and more details.

SolrCloud setup :
Make sure the Java version is up to date. If not, follow the steps mentioned before the Zookeeper setup.
  • mkdir SolrCloud
  • cd SolrCloud
Download Solr from web (change the version accordingly)
  • wget "http://www.us.apache.org/dist/lucene/solr/6.1.0/solr-6.1.0.tgz"
Or you can copy it from your local machine (change the user and IP accordingly)
  • sudo scp /home/user/solr/solr-6.1.0.tgz user@AWS-Public-IP:/home/user/SolrCloud
Create a Solr home directory for the cloud setup. You can choose any location; here I am creating solr-6.1.0/server/solr/node1/solr
  • cd  solr-6.1.0/server/solr
  • mkdir node1
  • cd node1
  • mkdir solr
Copy zoo.cfg & solr.xml from solr-6.1.0/server/solr to solr-6.1.0/server/solr/node1/solr/ (run these from the SolrCloud directory):
  • cp solr-6.1.0/server/solr/zoo.cfg solr-6.1.0/server/solr/node1/solr/
  • cp solr-6.1.0/server/solr/solr.xml solr-6.1.0/server/solr/node1/solr/
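Together, the layout steps above amount to the following (demonstrated here in a scratch directory with stand-in files, since the real commands run inside the solr-6.1.0 tree):

```shell
# Create the per-node Solr home and seed it with zoo.cfg and solr.xml
cd "$(mktemp -d)"                                # scratch stand-in for the solr-6.1.0 directory
mkdir -p server/solr
touch server/solr/zoo.cfg server/solr/solr.xml   # stand-ins for the stock files
mkdir -p server/solr/node1/solr
cp server/solr/zoo.cfg server/solr/solr.xml server/solr/node1/solr/
ls server/solr/node1/solr                        # lists solr.xml and zoo.cfg
```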
Now we can start Solr in cloud mode. Follow the same steps on the other two machines, then start Solr on all 3 machines in cloud mode, pointing to the external Zookeeper ensemble running on the other 3 machines via its IPs.
  • bin/solr start -cloud -s server/solr/node1/solr -p 8983 -z 9.*.*.11:2181,9.*.*.12:2181,9.*.*.13:2181
Now we have 3 node SolrCloud and 3 node Zookeeper running.
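The cluster can be sanity-checked from any Solr node with the Collections API's CLUSTERSTATUS action (a sketch assuming curl; localhost stands in for a Solr node):

```shell
# "live_nodes" in the JSON response should list all three Solr nodes
curl -s "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" \
  || echo "Solr is not reachable from this machine"
```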
Let's create a Solr collection. Before that, we need to upload the Solr configuration files (schema.xml, solrconfig.xml, etc.) to Zookeeper.
By default Solr comes with sample config files (found inside server/solr/configsets) and scripts (under server/scripts) which help upload these to the Zookeeper nodes. From any Solr machine:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost 9.*.*.11:2181 -cmd upconfig -confname solr_configs -confdir server/solr/configsets/sample_techproducts_configs/conf
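To confirm the upload, the files can be listed straight from Zookeeper with zkCli.sh (from the zookeeper-3.4.8 directory on any Zookeeper machine):

```shell
# The uploaded config set should appear under /configs/solr_configs
bin/zkCli.sh -server 9.*.*.11:2181 ls /configs/solr_configs \
  || echo "run this from the zookeeper-3.4.8 directory on a Zookeeper machine"
```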
Create a collection using the configs uploaded to Zookeeper.
Example:
  • http://5*.**2.4**.*4*:8983/solr/admin/collections?
    action=CREATE&name=collection1&numShards=2&replicationFactor=2&collection.configName=solr_configs 
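The same CREATE call can be made with curl from any Solr node (a sketch; localhost stands in for a Solr node). Note that numShards=2 with replicationFactor=2 means 4 cores, so on 3 nodes the default maxShardsPerNode of 1 may need to be raised:

```shell
# Create collection1 with 2 shards, each with 2 replicas (4 cores on 3 nodes)
curl -s "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=2&collection.configName=solr_configs&maxShardsPerNode=2" \
  || echo "Solr is not reachable from this machine"
```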


Hope this helps,
Vinod

4 comments:

  1. Hi, nice one. I need your kind response: I want to store date-wise data in each shard, e.g. 1st Oct in shard1, 2nd Oct in shard2, 3rd Oct in shard3, and so on. Please help me with how I can do this explicitly, with minimum human intervention.
    kind regards
    Muhammad Rehman KAHLOON
    yoursrehman@gmail.com

    1. Hi, thanks.

      There is something called Document Routing, where you can indicate to which
      shard your document should go. There are 2 types of document routing.
      1. compositeId (default)
      Use the id of the document and prefix it with some name, say windows! or a
      shard name (it need not be the same as a shard name), but keep the prefix
      consistent.
      ex: "id" : "windows!568". Here Solr indexes into one of the available shards,
      and when a doc with a similar id prefix comes in later for indexing
      ("id" : "windows!998"), it routes it to the shard where it indexed the earlier
      doc(s) with the same prefix, like windows.

      > we can even send queries to specific shards using the _route_ param
      ex : q=solr&_route_=windows!
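      For example, a rough sketch with curl (collection1 and localhost are placeholders):

```shell
# Index two docs sharing the "windows" route prefix, then query only that shard
curl -s 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"windows!568"},{"id":"windows!998"}]' \
  || echo "Solr is not reachable from this machine"
curl -s 'http://localhost:8983/solr/collection1/select?q=solr&_route_=windows!' \
  || echo "Solr is not reachable from this machine"
```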

      2. Implicit
      In this case we additionally have to define a router.field parameter in every
      document, so that Solr knows which shard the doc should go to. If the required
      router.field is missing in any doc, it will be rejected when we try to index it.

      Hope this helps. Sorry for the late reply.

      Best,
      Vinod
