DistCp Between HA Clusters
To copy data between HA clusters, use the dfs.internal.nameservices property in the hdfs-site.xml file to explicitly specify the name services belonging to the local cluster, while continuing to use the dfs.nameservices property to specify all of the name services in the local and remote clusters.
Use the following steps to copy data between HA clusters:
Modify the following properties in the hdfs-site.xml file for both cluster A and cluster B:
- Add both name services to dfs.nameservices: dfs.nameservices = HAA,HAB
- Add the dfs.internal.nameservices property:
  - In cluster A: dfs.internal.nameservices = HAA
  - In cluster B: dfs.internal.nameservices = HAB
 
- Add dfs.ha.namenodes.<nameservice> for the remote name service to both clusters:
  - In cluster A: dfs.ha.namenodes.HAB = nn1,nn2
  - In cluster B: dfs.ha.namenodes.HAA = nn1,nn2
 
- Add the dfs.namenode.rpc-address.<nameservice>.<nn> property for each NameNode of the remote name service:
  - In cluster A:
    - dfs.namenode.rpc-address.HAB.nn1 = <NN1_fqdn>:8020
    - dfs.namenode.rpc-address.HAB.nn2 = <NN2_fqdn>:8020
  - In cluster B:
    - dfs.namenode.rpc-address.HAA.nn1 = <NN1_fqdn>:8020
    - dfs.namenode.rpc-address.HAA.nn2 = <NN2_fqdn>:8020
 
- Add the following properties to enable distcp over WebHDFS and secure WebHDFS:
  - In cluster A:
    - dfs.namenode.http-address.HAB.nn1 = <NN1_fqdn>:50070
    - dfs.namenode.http-address.HAB.nn2 = <NN2_fqdn>:50070
    - dfs.namenode.https-address.HAB.nn1 = <NN1_fqdn>:50470
    - dfs.namenode.https-address.HAB.nn2 = <NN2_fqdn>:50470
  - In cluster B:
    - dfs.namenode.http-address.HAA.nn1 = <NN1_fqdn>:50070
    - dfs.namenode.http-address.HAA.nn2 = <NN2_fqdn>:50070
    - dfs.namenode.https-address.HAA.nn1 = <NN1_fqdn>:50470
    - dfs.namenode.https-address.HAA.nn2 = <NN2_fqdn>:50470
 
- Add the dfs.client.failover.proxy.provider.<nameservice> property for the remote name service:
  - In cluster A: dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
  - In cluster B: dfs.client.failover.proxy.provider.HAA = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
 
- Restart the HDFS service, then run the distcp command, addressing each cluster by its name service rather than by an individual NameNode host. For example:
  hadoop distcp hdfs://HAA/tmp/testDistcp hdfs://HAB/tmp/
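As an illustration, the property changes listed above might look like the following hdfs-site.xml fragment on cluster A. This is a sketch, not a complete file: it assumes the entries sit inside the existing configuration element, that cluster A's own HAA properties are already defined, and that the hostnames shown (nn1.clusterb.example.com and nn2.clusterb.example.com) are hypothetical placeholders for the FQDNs of cluster B's NameNodes.

```xml
<!-- hdfs-site.xml on cluster A: additions for the remote name service HAB. -->
<!-- Assumes cluster A's existing HAA properties are already present. -->
<!-- Hostnames are hypothetical placeholders for cluster B's NameNode FQDNs. -->

<!-- All name services the client can resolve, local and remote. -->
<property>
  <name>dfs.nameservices</name>
  <value>HAA,HAB</value>
</property>

<!-- Only the name services that belong to this (local) cluster. -->
<property>
  <name>dfs.internal.nameservices</name>
  <value>HAA</value>
</property>

<!-- NameNode IDs of the remote name service. -->
<property>
  <name>dfs.ha.namenodes.HAB</name>
  <value>nn1,nn2</value>
</property>

<!-- RPC addresses of the remote NameNodes. -->
<property>
  <name>dfs.namenode.rpc-address.HAB.nn1</name>
  <value>nn1.clusterb.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HAB.nn2</name>
  <value>nn2.clusterb.example.com:8020</value>
</property>

<!-- HTTP and HTTPS addresses, used for distcp over WebHDFS and secure WebHDFS. -->
<property>
  <name>dfs.namenode.http-address.HAB.nn1</name>
  <value>nn1.clusterb.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.HAB.nn2</name>
  <value>nn2.clusterb.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.HAB.nn1</name>
  <value>nn1.clusterb.example.com:50470</value>
</property>
<property>
  <name>dfs.namenode.https-address.HAB.nn2</name>
  <value>nn2.clusterb.example.com:50470</value>
</property>

<!-- Client-side failover between the remote NameNodes. -->
<property>
  <name>dfs.client.failover.proxy.provider.HAB</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

The fragment for cluster B would mirror this one, with dfs.internal.nameservices set to HAB and the HAB-suffixed properties replaced by their HAA counterparts pointing at cluster A's NameNodes.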

