HBase Backup:
Even though Hadoop and HBase provide replication and redundancy, we still
need a backup of our HBase tables as of some point in time. HBase offers
several backup options for this, which fall into two categories:
Online Backup:
Online backup is further divided into three methods:
Replication: In this method you need a second cluster, which keeps a
replicated copy of the data from the first cluster.
Hadoop/HBase Export command: This runs a MapReduce job that copies a table to
the same cluster or to another Hadoop cluster. It does not require any
downtime while the data is being backed up or exported.
With this method, we export the data to the cluster, and if we need to
restore it, we bring it back by importing it.
CopyTable: This is also an online backup method. It copies a table from one
cluster to another cluster, or to the same cluster.
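The Export/Import and CopyTable methods above are normally run from the command line through the `hbase` launcher. As a rough sketch, the Python helpers below only assemble those command lines. The helper names are made up for this example, and the table names, HDFS paths, and ZooKeeper address you pass in are your own. The options available may differ between HBase versions, so check the usage text of each tool on your cluster.

```python
import subprocess

# MapReduce tool classes shipped with HBase.
EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Import"
COPYTABLE_CLASS = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def export_cmd(table, output_dir):
    """Command that exports `table` into `output_dir` on HDFS."""
    return ["hbase", EXPORT_CLASS, table, output_dir]


def import_cmd(table, input_dir):
    """Command that restores an export from `input_dir` into `table`.

    Assumption: the target table already exists with matching column families.
    """
    return ["hbase", IMPORT_CLASS, table, input_dir]


def copytable_cmd(table, new_name=None, peer_adr=None):
    """Command that copies `table`, optionally renamed or to another cluster.

    `peer_adr` is the destination cluster's ZooKeeper address in the form
    quorum:client_port:znode_parent.
    """
    cmd = ["hbase", COPYTABLE_CLASS]
    if new_name:
        cmd.append(f"--new.name={new_name}")
    if peer_adr:
        cmd.append(f"--peer.adr={peer_adr}")
    cmd.append(table)
    return cmd


def run(cmd):
    """Run one assembled command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, `run(export_cmd("orders", "/backup/orders"))` would start the export job for a hypothetical table named `orders`, and `run(import_cmd("orders", "/backup/orders"))` would load it back.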
Offline Backup:
DistCp: This is a file-system-level backup. It copies a directory from HDFS
to the same cluster or to another cluster.
copyToLocal: This is a less reliable way of copying directories from HDFS to
a local backup drive. If there is a large amount of data, a lot of Hadoop
tuning is needed for the copy to succeed.
Offline backup methods are full-shutdown backup methods. To copy HBase this
way, you need to stop your HBase cluster for the backup to succeed: while the
cluster is online, its files are continuously being moved and modified, so a
copy taken in that state may fail.
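The shutdown-copy-restart sequence described above can be sketched as a short Python helper. This is only an illustration: the function names are made up for this example, the source and destination paths are yours to supply (the HBase root directory is whatever `hbase.rootdir` points to on your cluster), and it assumes that `stop-hbase.sh`, `start-hbase.sh`, and `hadoop` are on the PATH.

```python
import subprocess


def offline_backup_plan(hbase_root, backup_dest, use_local=False):
    """Return the ordered commands for a full-shutdown backup.

    With use_local=False the copy step uses DistCp (HDFS to HDFS);
    with use_local=True it uses copyToLocal (HDFS to a local drive).
    """
    if use_local:
        copy = ["hadoop", "fs", "-copyToLocal", hbase_root, backup_dest]
    else:
        copy = ["hadoop", "distcp", hbase_root, backup_dest]
    return [["stop-hbase.sh"], copy, ["start-hbase.sh"]]


def run_offline_backup(hbase_root, backup_dest, use_local=False):
    """Stop HBase, copy its files, and restart HBase even if the copy fails."""
    stop, copy, start = offline_backup_plan(hbase_root, backup_dest, use_local)
    subprocess.run(stop, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        subprocess.run(start, check=True)
```

Restarting in a `finally` block is a design choice for this sketch, so that a failed copy does not leave the cluster down.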