Elasticsearch is a widely used search and analytics engine that handles large volumes of data efficiently, which makes it a common choice for many organizations. This article walks through setting up a multi-node Elasticsearch cluster on Ubuntu servers and is aimed at systems administrators, DevOps engineers, and data professionals.


This guide focuses on establishing a three-node Elasticsearch cluster on Ubuntu, specifically configured for the following nodes:


NODE_1: 192.168.10.101
NODE_2: 192.168.10.102
NODE_3: 192.168.10.103

What is Elasticsearch?

Elasticsearch is an open-source, distributed search and analytics engine designed for horizontal scalability, reliability, and real-time search. It’s often used for log analytics, full-text search, and operational intelligence use cases.

Prerequisites

Before embarking on setting up a multi-node Elasticsearch cluster, ensure you have the following:

  • Three Ubuntu servers (physical or virtual), one per node, that can reach each other over the network.
  • Basic knowledge of Linux command line and networking.
  • Sudo or root access on the servers.

Step 1: Installing Elasticsearch (On Each Node)

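  1. Update and Install Java (optional): The Elasticsearch 8.x packages bundle their own JDK, so a system-wide Java is not required. Install OpenJDK only if other tools on the server need it: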
    sudo apt update 
    sudo apt install openjdk-11-jdk 
    
  2. Import Elasticsearch PGP Key: Download the official Elastic signing key and store it in a keyring so that apt can verify the packages:
    wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg 
    
  3. Install apt-transport-https (if needed): On some Debian-based systems, including older Ubuntu releases, you may need this package before apt can use HTTPS repositories:
    sudo apt-get install apt-transport-https 
    
  4. Add Elasticsearch Repository: Save the repository definition to /etc/apt/sources.list.d/elastic-8.x.list:
    echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list 
    
  5. Install Elasticsearch: Update the Apt cache and install Elasticsearch.
    sudo apt update  
    sudo apt install elasticsearch 
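
Every node should run the same Elasticsearch version, so it can help to generate the repository definition from step 4 instead of retyping it on each server. The following is a minimal sketch rather than part of the official install procedure: the function names elastic_repo_line and elastic_repo_file are hypothetical helpers, and the keyring path is the one used in step 2.

```shell
#!/bin/sh
# Sketch: build the apt source line for a given Elasticsearch major version.
# KEYRING matches the path used in step 2; the function names are hypothetical.
KEYRING=/usr/share/keyrings/elasticsearch-keyring.gpg

elastic_repo_line() {
  # $1 = major version, e.g. 8
  printf 'deb [signed-by=%s] https://artifacts.elastic.co/packages/%s.x/apt stable main\n' "$KEYRING" "$1"
}

elastic_repo_file() {
  # $1 = major version; prints the matching sources.list.d path
  printf '/etc/apt/sources.list.d/elastic-%s.x.list\n' "$1"
}

# Dry run: show what would be written, without touching the system.
echo "$(elastic_repo_line 8) -> $(elastic_repo_file 8)"
```

To apply it on a node, the line can be piped into the file, for example with elastic_repo_line 8 | sudo tee "$(elastic_repo_file 8)".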
    

Step 2: Configuring Elasticsearch

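  1. Edit the Elasticsearch Configuration: Modify /etc/elasticsearch/elasticsearch.yml on each node. cluster.name must be identical on every node (es-cluster here is only an example), and discovery.seed_hosts lists the addresses each node contacts to find the others. cluster.initial_master_nodes is only used the first time the cluster forms. If the package installer already wrote any of these keys into the file, edit the existing line instead of adding a duplicate.
    • On NODE_1:
      
      cluster.name: "es-cluster"
      node.name: "NODE_1"
      network.host: 192.168.10.101
      discovery.seed_hosts: ["192.168.10.101", "192.168.10.102", "192.168.10.103"]
      cluster.initial_master_nodes: ["NODE_1", "NODE_2", "NODE_3"]
      
      
    • On NODE_2:
      
      cluster.name: "es-cluster"
      node.name: "NODE_2"
      network.host: 192.168.10.102
      discovery.seed_hosts: ["192.168.10.101", "192.168.10.102", "192.168.10.103"]
      cluster.initial_master_nodes: ["NODE_1", "NODE_2", "NODE_3"]
      
      
    • On NODE_3:
      
      cluster.name: "es-cluster"
      node.name: "NODE_3"
      network.host: 192.168.10.103
      discovery.seed_hosts: ["192.168.10.101", "192.168.10.102", "192.168.10.103"]
      cluster.initial_master_nodes: ["NODE_1", "NODE_2", "NODE_3"]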
      
      
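  2. Configure JVM Options (if needed): Elasticsearch sizes its heap automatically by default. To override a JVM setting, add a file ending in .options under /etc/elasticsearch/jvm.options.d/ rather than editing /etc/elasticsearch/jvm.options directly.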
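
If the heap does need to be set by hand, a common rule of thumb is to give it about half of the server's RAM while staying below the threshold for compressed object pointers. The sketch below assumes that rule with a 26 GB cap, which is a conservative assumption rather than a fixed limit. The function names and the 16 GB example node are hypothetical.

```shell
#!/bin/sh
# Sketch: suggest a heap size in MB from total RAM in MB.
# Assumed rule of thumb: half of RAM, capped at 26 GB.
HEAP_CAP_MB=$((26 * 1024))

suggest_heap_mb() {
  # $1 = total RAM in MB
  half=$(( $1 / 2 ))
  if [ "$half" -gt "$HEAP_CAP_MB" ]; then
    echo "$HEAP_CAP_MB"
  else
    echo "$half"
  fi
}

# Print the two JVM flags for a heap of $1 MB (minimum and maximum set equal).
heap_options() {
  printf '%s\n' "-Xms${1}m" "-Xmx${1}m"
}

# Example for a hypothetical node with 16 GB of RAM.
heap_options "$(suggest_heap_mb 16384)"
```

The two flags would typically go into a file such as /etc/elasticsearch/jvm.options.d/heap.options (the file name is an assumption), followed by a restart of the service.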

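The per-node configuration files differ only in the node name and address, so they can be generated from one template to avoid copy-and-paste mistakes. This is a rough sketch under stated assumptions: render_node_config is a hypothetical helper, the cluster name es-cluster is an example value, and cluster.name and discovery.seed_hosts are included on the assumption that the nodes must share a cluster name and discover each other by address.

```shell
#!/bin/sh
# Sketch: print the elasticsearch.yml settings for one node.
# CLUSTER_NAME is an example value; node names and IPs come from this guide.
CLUSTER_NAME=es-cluster
SEED_HOSTS='"192.168.10.101", "192.168.10.102", "192.168.10.103"'
MASTER_NODES='"NODE_1", "NODE_2", "NODE_3"'

render_node_config() {
  # $1 = node name, $2 = node IP address
  cat <<EOF
cluster.name: "$CLUSTER_NAME"
node.name: "$1"
network.host: $2
discovery.seed_hosts: [$SEED_HOSTS]
cluster.initial_master_nodes: [$MASTER_NODES]
EOF
}

# Example: print the settings for the second node.
render_node_config NODE_2 192.168.10.102
```

The output is meant to be reviewed and merged into /etc/elasticsearch/elasticsearch.yml by hand, since the packaged file may already contain some of these keys.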
Step 3: Setting Up Cluster Networking

  • Open Necessary Ports: Ensure that the default Elasticsearch ports are open on each node: 9200 for the HTTP API and 9300 for node-to-node transport traffic.
  • Configure Firewall (if applicable): Adjust firewall settings so the other nodes can reach both ports and clients can reach port 9200.
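
On Ubuntu the firewall is often ufw. The sketch below only prints the rules that would open both ports to the three nodes from this guide, so they can be reviewed before anything changes. It assumes ufw is the firewall in use; ufw_rules is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print ufw rules that open the Elasticsearch ports to cluster peers only.
# The commands are echoed (dry run), not executed.
PEERS="192.168.10.101 192.168.10.102 192.168.10.103"
PORTS="9200 9300"   # 9200 = HTTP API, 9300 = node-to-node transport

ufw_rules() {
  for peer in $PEERS; do
    for port in $PORTS; do
      echo "ufw allow from $peer to any port $port proto tcp"
    done
  done
}

ufw_rules
```

After review, the printed commands can be run with sudo on each node. Client machines that query the cluster would need a similar rule for port 9200.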

Step 4: Starting and Verifying the Cluster

  1. Start Elasticsearch Service:
    sudo systemctl start elasticsearch 
    
  2. Enable Elasticsearch on Boot:
    sudo systemctl enable elasticsearch 
    
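  3. Check Cluster Health: Use curl to verify the cluster’s health. Because network.host is set to the node's own address, query that address rather than localhost. A healthy three-node cluster reports "status" : "green" and "number_of_nodes" : 3. If security is enabled (the default for new 8.x installs), use https and pass credentials with curl -u.
    curl -X GET "http://192.168.10.101:9200/_cluster/health?pretty"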
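
The health response can also be checked by a script instead of by eye. The following is a small sketch, assuming the pretty-printed output format (one field per line) and a three-node cluster; cluster_ok is a hypothetical helper, and it reads the JSON from standard input so it makes no network calls itself.

```shell
#!/bin/sh
# Sketch: read pretty-printed _cluster/health JSON on stdin and succeed only
# if the status is green and the node count equals the expected value ($1).
cluster_ok() (
  expected_nodes=$1
  json=$(cat)
  # Extract the two fields with sed; this relies on one field per line.
  status=$(printf '%s\n' "$json" | sed -n 's/^ *"status" *: *"\([a-z]*\)".*/\1/p')
  nodes=$(printf '%s\n' "$json" | sed -n 's/^ *"number_of_nodes" *: *\([0-9][0-9]*\).*/\1/p')
  echo "status=${status:-unknown} nodes=${nodes:-unknown}"
  [ "$status" = "green" ] && [ "$nodes" = "$expected_nodes" ]
)
```

A possible use is to pipe the curl output from the health check into cluster_ok 3 and act on the exit status. If security is enabled on the cluster, the curl call would also need https and credentials.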
     
    

Step 5: Cluster Maintenance and Management

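  • Monitoring: Check cluster health and node membership regularly, for example with the _cluster/health and _cat/nodes APIs.
  • Scaling: Add more nodes as required by installing Elasticsearch on the new server and pointing it at the existing nodes.
  • Backup and Recovery: Implement a backup and recovery plan, such as regular snapshots, and test restoring from it.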
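
One simple monitoring check is to compare the nodes currently in the cluster against the nodes that should be there. The sketch below assumes the node names from this guide and a list of current names, one per line, on standard input; missing_nodes is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: report which expected nodes are absent from a list of node names.
# Reads one node name per line on stdin; expected names are the arguments.
missing_nodes() (
  # Strip trailing whitespace so padded output still matches exactly.
  present=$(sed 's/[[:space:]]*$//')
  for want in "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$want" || echo "$want"
  done
)

# Demo with a hard-coded list in which NODE_2 is absent; prints NODE_2.
printf 'NODE_1\nNODE_3\n' | missing_nodes NODE_1 NODE_2 NODE_3
```

Against a live cluster, the input could come from curl -s "http://192.168.10.101:9200/_cat/nodes?h=name", assuming the node is reachable without authentication.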

Conclusion

Setting up a multi-node Elasticsearch cluster on Ubuntu comes down to installing the same package on every node, giving each node a consistent configuration, opening the cluster ports, and verifying cluster health. Regular monitoring and maintenance are key to the long-term health and performance of your Elasticsearch cluster.


8 Comments

  1. Akshay Hegde on

    Hi Rahul sir,
    I had a doubt about the first step, assigning the IP addresses to the nodes.
    Should we add these IPs to the /etc/hosts file?
    Can you please help me with this?
    Thank you in advance.

  2. techmechanik on

    Great write-up! Thank you very much! Going to build this; I shall provide an update once completed.
    Do you also have a Logstash and Kibana write-up to go with the above article?

  3. Hi Rahul,

    One quick question: how can we set up an ES cluster so that it is accessible through a single URL instead of a node-specific URL?
    Note: if the current master fails, this URL should automatically re-point to the newly elected master.

    Regards,
    Atul

  4. Hi Rahul,
    I have one doubt about this tutorial.
    How are these three nodes related to each other?
    The .yml files are different. ES runs on a different host:port on each node, so how does data inserted on one host become available to the others? Please specify the setting that ensures the connection between these 3 nodes. I think it is missing.

    br,
    Sunil

    • Hi Sunil,

      ES works in clustered environments automatically. You just need to keep cluster.name the same on all nodes you want to replicate. When ES starts, it searches the local network for all nodes with the same cluster.name and starts replication automatically.

      • Sunil Chaudhari on

        Hi Rahul,
        Thanks for quick reply.
        I have a single host ready for environment setup.
        1) Can I create 2 data nodes and 1 master node on a single host?
        2) How many ES instances do I need to run? If more than 1 instance is required,
        then how can I run those?
        I use the command [user@xxx bin] $ sudo service elasticsearch start
        3) Do I need to create more elasticsearch.yml files, for example elasticsearch.0.yml, elasticsearch.1.yml, and so on?

        Also please share steps to secure ES behind nginx.

        thanks,
        Sunil

      • Hi Rahul,
        I have started a cluster with one master node and 2 data nodes.
        However, when I do curl http://localhost:9200/_nodes/process?pretty, it shows only the master node and 1 data node. It doesn’t show the second data node, as below. Why so?

        {
          "cluster_name" : "presit-elasticsearch",
          "nodes" : {
            "bC83hVbgRaiHuPrOn133dA" : {
              "name" : "presit-data-node-1",
              "transport_address" : "inet[/xx.xxx.xx.xx:9300]",
              "host" : "hostname1",
              "ip" : "xx.xxx.xx.xx",
              "version" : "1.5.2",
              "build" : "62ff986",
              "http_address" : "inet[/xx.xxx.xx.xx:9200]",
              "attributes" : {
                "master" : "false"
              },
              "process" : {
                "refresh_interval_in_millis" : 1000,
                "id" : 10278,
                "max_file_descriptors" : 65535,
                "mlockall" : false
              }
            },
            "zjPvIyRpSYGUrRg1-ZX5nw" : {
              "name" : "presit-master-Node",
              "transport_address" : "inet[/xx.xxx.xx.xx:9300]",
              "host" : "hostname1",
              "ip" : "xx.xxx.xx.xx",
              "version" : "1.5.2",
              "build" : "62ff986",
              "http_address" : "inet[/xx.xxx.xx.xx:9200]",
              "attributes" : {
                "data" : "false",
                "master" : "true"
              },
              "process" : {
                "refresh_interval_in_millis" : 1000,
                "id" : 22738,
                "max_file_descriptors" : 65535,
                "mlockall" : false
              }
            }
          }
        }
