STEP 1 Setting Up An Azure Blob Storage Account
Follow the steps below to create an Azure Blob Storage account, if you don't already have one for this purpose.

1. On the Azure portal, click the "Storage accounts" link on the sidebar, or use the resource search option to search for "storage account" if the link isn't present on your sidebar.

STEP 2 Installing The Azure Repository Plugin For Elasticsearch
To start taking snapshots, we first need to register a snapshot repository within our Elasticsearch cluster. This repository defines where Elasticsearch should store the snapshots it takes (see the official Elasticsearch snapshot documentation for more). Remember, a repository can be an HDFS or a cloud storage service; in this case we are using the Azure Blob Storage service.
To register the repository for Azure, SSH into your ES cluster node and run the following commands:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-azure
sudo systemctl restart elasticsearch
Next, add your storage account name and account key to the Elasticsearch keystore:
bin/elasticsearch-keystore add azure.client.default.account
bin/elasticsearch-keystore add azure.client.default.key
Enter your account name and key when prompted. If you have issues running these commands, confirm the location of your Elasticsearch bin folder; depending on your installation configuration, the keystore tool may be at
/usr/share/elasticsearch/bin/elasticsearch-keystore
After setting the values in the keystore, restart your Elasticsearch cluster:
sudo systemctl restart elasticsearch
Next, we set up a snapshot repository. You can do this by sending a POST request to the following endpoint
//eshost:port/_snapshot/name-of-your-repo
and passing this JSON payload:
{
"type": "azure",
"settings": {
"container": "backup-container",
"base_path": "backups",
"chunk_size": "32MB",
"compress": true
}
}
You can leave the type as is. In the settings, the container name should be the name of the container created in your storage account. The
base_path
defines a folder where snapshot data should be stored; this is useful if you are taking snapshots of different indices, or even data from different clusters, and storing them all in one container. The
chunk_size
defines how large files are broken down into smaller chunks before being transferred. You can find more details about these settings in the Azure repository plugin's documentation. Below is a screenshot of my settings.

You should get an "acknowledged": true response. You can view all your registered repositories by making a GET request to the following endpoint: //eshost:port/_snapshot/
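If you prefer to script the registration step, the payload above can be built programmatically. Below is a minimal sketch in Node.js; the helper name is mine, and the values mirror the example payload (send the resulting JSON to the registration endpoint above).

```javascript
// Build the Azure repository-registration payload shown above.
// The container and base_path values are the same example placeholders.
function azureRepoPayload(container, basePath) {
  return {
    type: "azure",
    settings: {
      container: container,
      base_path: basePath,
      chunk_size: "32MB", // break big files into chunks of at most this size
      compress: true      // compress index metadata files
    }
  };
}

console.log(JSON.stringify(azureRepoPayload("backup-container", "backups")));
```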
STEP 3 Taking Actual Snapshots
Now let's move on to taking actual snapshots. For this, I've created a sample index, "sample_records". We could back this up alone, or better still, back up all indices in the cluster along with the cluster's settings. To do this, make a POST request to the following endpoint
//eshost:port/_snapshot/azureblob_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
with the following payload:
{
"indices": "index_1,index_2",
"ignore_unavailable": true,
"include_global_state": true,
"partial" : true
}
By default, when the ignore_unavailable option is not set and an index is missing, the snapshot request will fail. By setting include_global_state to false it's possible to prevent the cluster's global state from being stored as part of the snapshot. By default, the entire snapshot will fail if one or more indices participating in the snapshot don't have all primary shards available. This behaviour can be changed by setting partial to true. (This is from Elasticsearch's official docs.) Below is a screenshot of my backup request.
Notice that I didn't add the indices parameter here. When the indices parameter isn't included, all indices present in the cluster are included in the snapshot. Take note of the snapshot name,
%3Csnapshot-%7Bnow%2Fd%7D%3E,
which is the URL-encoded version of <snapshot-{now/d}>. This is translated to the date the snapshot was taken, e.g. snapshot-2020.04.09. Also note that it isn't a prerequisite to name your backups this way; you can give a snapshot any name you want. This convention just makes sense if, for example, you are doing daily or weekly backups via cron, so that you can reference snapshots easily. Next, we are going to monitor the status of an ongoing snapshot. To do this, send a GET request to the following endpoint:
//eshost:port/_snapshot/azureblob_backup/<snapshot-name>
Note that the URL-encoded now format won't work here even if you used it to save your snapshot; use the literal string instead, e.g.
//eshost:port/_snapshot/azureblob_backup/snapshot-2020.04.09
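If you're constructing these URLs from a script, JavaScript's built-in encodeURIComponent produces exactly the encoded form used above:

```javascript
// URL-encode the date-math snapshot name for use in a request path.
const name = "<snapshot-{now/d}>";
const encoded = encodeURIComponent(name);
console.log(encoded); // → %3Csnapshot-%7Bnow%2Fd%7D%3E
```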
You should get the following response:
{
"snapshots": [
{
"snapshot": "snapshot-2020.04.09",
"uuid": "OjLZEfXDS-mKVqsSi7VteQ",
"version_id": 6080399,
"version": "6.8.3",
"indices": [
"sample_records"
],
"include_global_state": true,
"state": "SUCCESS",
"start_time": "2020-04-08T09:53:47.926Z",
"start_time_in_millis": 1586339627926,
"end_time": "2020-04-08T10:03:15.361Z",
"end_time_in_millis": 1586340195361,
"duration_in_millis": 567435,
"failures": [],
"shards": {
"total": 15,
"failed": 0,
"successful": 15
}
}
]
}
Take note of the "indices" array: this shows the names of the backed-up indices. The state field shows the current status of the snapshot; it can be IN_PROGRESS, FAILED, SUCCESS, or PARTIAL. If the snapshot is in a PARTIAL state, it means some indices could not be backed up; details of those failures are saved in the "failures" array.
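When polling this endpoint from a script, state and failures are the two fields you typically act on. A small sketch (the helper names are mine; the state values are the ones listed above, plus ABORTED, which can appear if a snapshot is cancelled):

```javascript
// Decide whether a snapshot has reached a terminal state.
function isTerminal(state) {
  return ["SUCCESS", "PARTIAL", "FAILED", "ABORTED"].includes(state);
}

// Pull the index names out of a status response's "failures" array;
// each failure entry carries the index it relates to.
function failedIndices(snapshot) {
  return (snapshot.failures || []).map(f => f.index);
}

console.log(isTerminal("IN_PROGRESS")); // false
console.log(failedIndices({ failures: [] })); // []
```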
Now, let's check our Azure storage to see if the snapshot was saved in our container.

Restoring A Snapshot
This assumes you have a new, empty cluster you wish to copy your snapshot data to. First, on your new cluster, configure the Azure repository plugin by following the same steps above. Use the same storage account as the one that was used to take the snapshot, keys and all. Also ensure that the base_path matches the one used when creating the snapshot.
Next, send a POST request to the following endpoint to restore your snapshot to the new cluster:
//eshost:port/_snapshot/<repo-name>/<snapshot-name>/_restore
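The restore request can optionally take a body. Below is a hedged sketch of some commonly used restore options; every field is optional, and an empty body restores all indices in the snapshot. The rename_pattern and rename_replacement options let you restore indices under new names, which is handy if an index with the original name already exists.

```javascript
// Optional restore-request body; all fields shown here can be omitted.
const restoreBody = {
  indices: "sample_records",         // restore only this index
  include_global_state: false,       // don't overwrite the new cluster's settings
  rename_pattern: "(.+)",            // restore indices under new names,
  rename_replacement: "restored_$1"  // e.g. sample_records -> restored_sample_records
};

console.log(JSON.stringify(restoreBody));
```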
Note that there are compatibility requirements for backups; below are the compatibility ranges. If you plan to export data from one ES cluster to another, you need to be aware that not all versions may be compatible with your exported data.

A snapshot of an index created in 6.x can be restored to 7.x.
A snapshot of an index created in 5.x can be restored to 6.x.
A snapshot of an index created in 2.x can be restored to 5.x.
A snapshot of an index created in 1.x can be restored to 2.x.
Conversely, snapshots of indices created in 1.x cannot be restored to 5.x or 6.x, snapshots of indices created in 2.x cannot be restored to 6.x or 7.x, and snapshots of indices created in 5.x cannot be restored to 7.x or 8.x.
This is from the official elasticsearch docs.
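The ranges above reduce to a simple rule: a snapshot taken on major version N can be restored to a cluster on major version N or N+1. Sketched as a check (my helper, not an official API):

```javascript
// One-major-version compatibility rule for snapshot restores.
function canRestore(snapshotMajor, clusterMajor) {
  return clusterMajor === snapshotMajor || clusterMajor === snapshotMajor + 1;
}

console.log(canRestore(6, 7)); // true: 6.x snapshots restore to 7.x
console.log(canRestore(2, 6)); // false: 2.x snapshots cannot restore to 6.x
```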
Bonus! Automating Things
Below is a Node.js script that starts a snapshot, polls its status every few seconds, and emails a success or failure notification via nodemailer.
const axios = require("axios")
const nodemailer = require('nodemailer');
/**
* Configure email
*/
let transporter = nodemailer.createTransport({
service: 'emailservice',
auth: {
user: '[email protected]',
pass: '*****************'
}
});
const SNAPSHOT_URL = 'http://localhost:9200/_snapshot/azureblob_backup/'; // axios needs a full URL with a scheme
const CLUSTER_NAME = 'tutorial_cluster';
let dateObj = new Date();
let month = dateObj.getUTCMonth() + 1; //months from 1-12
let day = dateObj.getUTCDate();
let year = dateObj.getUTCFullYear();
let hour = dateObj.getUTCHours();
let minute = dateObj.getUTCMinutes();
let seconds = dateObj.getUTCSeconds();
let backuptime = `${year}-${month}-${day}-${hour}-${minute}-${seconds}`;
axios.post(`${SNAPSHOT_URL}snapshot-${backuptime}`,{
"ignore_unavailable": true,
"include_global_state": true
}).then((response)=>{
console.log(response.data.accepted)
if(response.data.accepted === true){
console.log("start checking for status")
checker();
}else{
console.log("send failure notification")
notify(`Could not start backup for ${CLUSTER_NAME}`)
}
},(error)=>{
console.log("Backup Not Started Error ===>", error)
notify(`Could not start backup for ${CLUSTER_NAME}`)
});
let checker = function(){
let intervalId = setInterval(() =>{
console.log("checking.....")
axios.get(`${SNAPSHOT_URL}snapshot-${backuptime}`)
.then(function (response) {
// handle success
let status = response.data.snapshots[0].state;
console.log(status);
if(status === 'SUCCESS'){
//send a success mail & clear interval
clearInterval(intervalId);
notify(` ${CLUSTER_NAME} Has Been Backed Up Successfully \n completed in ${milisecConvert(response.data.snapshots[0].duration_in_millis)} minute(s) \n please check ${SNAPSHOT_URL}snapshot-${backuptime} for details`)
}else if(status === 'ABORTED' || status === 'FAILED' ){
//send failure message & clear interval
clearInterval(intervalId);
notify(` ${CLUSTER_NAME} Backup Failed please check ${SNAPSHOT_URL}snapshot-${backuptime} for details `)
}else if(status === 'PARTIAL'){
//send failure message & clear interval
clearInterval(intervalId);
notify(` ${CLUSTER_NAME} Backed up with a few issues please check ${SNAPSHOT_URL}snapshot-${backuptime} for details `)
}
else{
//continue
}
})
.catch(function (error) {
console.log("request status error >>>>>", error)
clearInterval(intervalId);
})
},5000);
}
let notify = (message)=>{
//set mail options
let mailOptions = {
from: '[email protected]',
to: '[email protected]',
subject: ` ${CLUSTER_NAME} Elasticsearch Backup Notification`,
text: message
};
transporter.sendMail(mailOptions, (error, info)=>{
if (error) {
console.log(error);
} else {
console.log('Email sent: ' + info.response);
}
});
}
let milisecConvert = (milisec)=>{
// convert a duration in milliseconds to whole minutes for the email
let minutes = Math.floor(milisec/1000/60);
return minutes >= 1 ? minutes : 'less than 1';
}
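One caveat with the backuptime string above: the date parts aren't zero-padded, so snapshot names won't sort lexicographically (e.g. "2020-12-1" sorts before "2020-4-9"). A padded variant (the helper name is mine):

```javascript
// Zero-padded, sortable snapshot name built from a UTC date.
function snapshotName(date = new Date()) {
  const p = (n) => String(n).padStart(2, "0");
  return `snapshot-${date.getUTCFullYear()}-${p(date.getUTCMonth() + 1)}-` +
         `${p(date.getUTCDate())}-${p(date.getUTCHours())}-` +
         `${p(date.getUTCMinutes())}-${p(date.getUTCSeconds())}`;
}

console.log(snapshotName(new Date(Date.UTC(2020, 3, 9, 9, 5, 3))));
// → snapshot-2020-04-09-09-05-03
```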
You need to install the dependencies (axios and nodemailer) for this script to work, and you should already have your snapshot repository set up. You can, and should, set up a cron job to run this script at specified intervals.
NOTES
Did this help? let me know.
O dabọ (goodbye) ✌