This blog helps you in setting up SolrCloud with zookeeper ensemble. assuming you have fair knowledge about Apache Solr and Zookeeper.
Includes detailed steps for configuring SolrCloud with zookeeper ensemble in production environment using AWS EC2 machines.
Using SolrCloud instead of single standalone Solr server.
So, lets start with configuration steps
Am using 6 AWS EC2 machines for this setup, 3 for SolrCloud and remaining 3 for Zookeeper Ensemble. Not mandatory that it should be 6 machines, depends on requirements and availability.
3 nodes for zookeeper
private IP : 9.*.*.11 public IP : 5*.**2.1**.*1*
private IP : 9.*.*.12 public IP : 5*.**2.2**.*2*
private IP : 9.*.*.13 public IP : 5*.**2.3**.*3*
3 nodes for solrcloud
private IP : 9.*.*.14 public IP : 5*.**2.4**.*4*
private IP : 9.*.*.15 public IP : 5*.**2.5**.*5*
private IP : 9.*.*.16 public IP : 5*.**2.6**.*6*
Update the Java to latest version in all 6 machines. because recent version of Solr needs Java-1.8
Includes detailed steps for configuring SolrCloud with zookeeper ensemble in production environment using AWS EC2 machines.
Using SolrCloud instead of single standalone Solr server.
Why SolrCloud ?
Setting up Solr as a cluster of Solr servers(i.e. solrCloud) gives ability to handle fault tolerance and high availability. other features of SolrCloud- Central configuration for the entire cluster
- Automatic load balancing and fail-over for queries
- Zookeeper integration for cluster co-ordination and configuration
Zookeeper
Setting Co-ordination and managing a service in a distributed environment is a complicated process. Zookeeper makes this process easy. Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.Why External Zookeeper ?
By default Solr comes bundled with Apache Zookeeper, but it is discouraged from using this internal Zookeeper in production. Because shutting down Solr instance will also shut down its Zookeeper server.
To
handle this we should have External Zookeeper Ensemble(cluster of
zookeeper servers) with more than half its servers running at any given
time.
Use
formula 2n+1 = number of Zookeeper servers in cluster, where n is
number of failure servers to handle. ex: say n=2, we should have total
of 5 Zookeeper servers in quorum(zookeeper cluster). if any 2 out of 5
servers goes down, Zookeeper will be still up and running as majority of
servers in quorum are active/running.
So, lets start with configuration steps
Am using 6 AWS EC2 machines for this setup, 3 for SolrCloud and remaining 3 for Zookeeper Ensemble. Not mandatory that it should be 6 machines, depends on requirements and availability.
3 nodes for zookeeper
private IP : 9.*.*.11 public IP : 5*.**2.1**.*1*
private IP : 9.*.*.12 public IP : 5*.**2.2**.*2*
private IP : 9.*.*.13 public IP : 5*.**2.3**.*3*
3 nodes for solrcloud
private IP : 9.*.*.14 public IP : 5*.**2.4**.*4*
private IP : 9.*.*.15 public IP : 5*.**2.5**.*5*
private IP : 9.*.*.16 public IP : 5*.**2.6**.*6*
Update the Java to latest version in all 6 machines. because recent version of Solr needs Java-1.8
- sudo yum install java-1.8.0
- sudo yum remove java-1.7.0-openjdk
Zookeeper setup :
Just to make sure all zookeeper related files goes into same folder, create one folder.
- mkdir zookeeper
- cd zookeeper
Download Zookeeper from web (change the version accordingly)
- wget "http://www-eu.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz"
OR you can copy from local machine(change user and IP accordingly)
- sudo scp /home/vinod/zk/zookeeper-3.4.8.tar.gz user@AWS-Public-IP:/home/user/zookeeper
Extract file
- tar -xvf zookeeper-3.4.8.tar.gz
Create a directory where you would like Zookeeper to store its data
- mkdir data
Copy sample zookeeper configuration file
zoo_sample.cfg to zoo.cfg. because by default zk(zookeeper) refer
zoo.cfg file for configuration settings when started.
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/username/zookeeper/data
# the port at which the clients will connect
clientPort=2181
server.1=9.*.*.11:2888:3888
server.2=9.*.*.12:2888:3888
server.3=9.*.*.13:2888:3888
Inside data folder create file with name myid for server number mapping. server.1 i.e 9.*.*.11 should have myid file with value 1 inside it.
Follow the same steps in other two machines. Note that server.2 machine should have myid file with value 2 and server.3 machine myid file with value 3.
Once all this configuration is done in all 3 machines, we are all set to start Zookeeper.
Use zkCli.sh from any of 3 connected machines to perform simple, file-like operations. Enter below command, you should see something like [zk: ip:2181(connected) ] in the console.
SolrCloud setup :
Make sure Java version is updated to latest. if not, follow the steps mentioned before Zookeeper setup.
Let's create solr collection, before that we need to upload solr configuration files(schema.xml, solrconfig.xml etc) to Zookeeper.
By default solr comes with sample config files(found inside server/solr/configsets) and scripts(server/scripts) which helps to upload these to zookeeper node. from any solr machine
Example:
Hope this helps,
Vinod
- cp zookeeper-3.4.8/conf/zoo_sample.cfg zookeeper-3.4.8/conf/zoo.cfg
- vim zookeeper-3.4.8/conf/zoo.cfg
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/username/zookeeper/data
# the port at which the clients will connect
clientPort=2181
server.1=9.*.*.11:2888:3888
server.2=9.*.*.12:2888:3888
server.3=9.*.*.13:2888:3888
Inside data folder create file with name myid for server number mapping. server.1 i.e 9.*.*.11 should have myid file with value 1 inside it.
- cd data
- cat > myid
Follow the same steps in other two machines. Note that server.2 machine should have myid file with value 2 and server.3 machine myid file with value 3.
Once all this configuration is done in all 3 machines, we are all set to start Zookeeper.
- cd home/username/zookeeper/zookeeper-3.4.8
- bin/zkServer.sh start
Use zkCli.sh from any of 3 connected machines to perform simple, file-like operations. Enter below command, you should see something like [zk: ip:2181(connected) ] in the console.
- bin/zkCli.sh -server 9.*.*.11:2181,9.*.*.12:2181,9.*.*.13:2181
SolrCloud setup :
Make sure Java version is updated to latest. if not, follow the steps mentioned before Zookeeper setup.
- mkdir SolrCloud
- cd SolrCloud
Download Solr from web (change the version accordingly)
- wget "http://www.us.apache.org/dist/lucene/solr/6.1.0/solr-6.1.0.tgz"
OR you can copy from local machine(change user and IP accordingly)
- sudo scp /home/user/solr/solr-6.1.0.tgz user@AWS-Public-IP:/home/user/SolrCloud
- cd solr-6.1.0/server/solr
- mkdir node1
- cd node1
- mkdir solr
- cp solr-6.1.0/server/solr/zoo.cfg solr-6.1.0/server/solr/node1/solr/
- cp solr-6.1.0/server/solr/solr.xml solr-6.1.0/server/solr/node1/solr/
- bin/solr start -cloud -s server/solr/node1/solr -p 8983 -z 9.*.*.11:2181,9.*.*.12:2181,9.*.*.13:2181
Let's create solr collection, before that we need to upload solr configuration files(schema.xml, solrconfig.xml etc) to Zookeeper.
By default solr comes with sample config files(found inside server/solr/configsets) and scripts(server/scripts) which helps to upload these to zookeeper node. from any solr machine
- server/cloud-scripts/zkcli.sh -zkhost 9.*.*.11:2181 -cmd upconfig -confname solr_configs -confdir /server/solr/configsets/
sample_techproducts_configs/conf
Example:
- http://5*.**2.4**.*4*:8983/solr/admin/collections?
action=CREATE&name=collection1&numShards=2&replicationFactor=2&collection.configName=solr_configs
Hope this helps,
Vinod