org.apache.hadoop.hive.shims
Interface HadoopShims


public interface HadoopShims

To remain compatible with multiple versions of Hadoop, all parts of the Hadoop interface that are not cross-version compatible are encapsulated in an implementation of this interface. Users should use the ShimLoader class as a factory to obtain the implementation of HadoopShims that corresponds to the version of Hadoop currently on the classpath.
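
The factory idea described above can be illustrated with a minimal, self-contained sketch. It does not use the real ShimLoader or HadoopShims types: `ShimFactorySketch`, `VersionShims`, `Shims20`, `Shims23`, and the version prefixes are hypothetical stand-ins chosen only to show how a version string might select an implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

final class ShimFactorySketch {
    // Hypothetical stand-in for HadoopShims, reduced to one method.
    interface VersionShims {
        String describe();
    }

    // Hypothetical per-version implementations.
    static final class Shims20 implements VersionShims {
        public String describe() { return "shims for Hadoop 0.20"; }
    }

    static final class Shims23 implements VersionShims {
        public String describe() { return "shims for Hadoop 0.23"; }
    }

    // Maps a version prefix to the implementation that handles it (assumed prefixes).
    private static final Map<String, Supplier<VersionShims>> BY_VERSION = new LinkedHashMap<>();
    static {
        BY_VERSION.put("0.20", Shims20::new);
        BY_VERSION.put("0.23", Shims23::new);
    }

    // Picks the implementation whose prefix matches the detected version string.
    static VersionShims forVersion(String hadoopVersion) {
        for (Map.Entry<String, Supplier<VersionShims>> e : BY_VERSION.entrySet()) {
            if (hadoopVersion.startsWith(e.getKey())) {
                return e.getValue().get();
            }
        }
        throw new IllegalArgumentException("Unrecognized Hadoop version: " + hadoopVersion);
    }

    public static void main(String[] args) {
        // A real loader would detect the version from the classpath instead.
        System.out.println(forVersion("0.20.2").describe());
    }
}
```

Callers depend only on the interface, so code written against it does not need to change when a different Hadoop version is on the classpath.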


Nested Class Summary
static interface HadoopShims.CombineFileInputFormatShim<K,V>
          CombineFileInputFormatShim.
static interface HadoopShims.InputSplitShim
          InputSplitShim.
static class HadoopShims.JobTrackerState
           
static interface HadoopShims.MiniDFSShim
          Shim around the functions in MiniDFSCluster that Hive uses.
 
Field Summary
static org.apache.commons.logging.Log LOG
           
 
Method Summary
 void closeAllForUGI(org.apache.hadoop.security.UserGroupInformation ugi)
          Closes the cached FileSystem instances held for the given UGI, where the underlying Hadoop version supports it.
 int compareText(org.apache.hadoop.io.Text a, org.apache.hadoop.io.Text b)
          Defined here to keep the code compatible between Hadoop 0.17 and Hadoop 0.20.
 int createHadoopArchive(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path parentDir, org.apache.hadoop.fs.Path destDir, String archiveName)
           
 org.apache.hadoop.security.UserGroupInformation createRemoteUser(String userName, List<String> groupNames)
          Used by the metastore server to create a UGI object for a remote user.
<T> T
doAs(org.apache.hadoop.security.UserGroupInformation ugi, PrivilegedExceptionAction<T> pvea)
          Used by the metastore server to perform the requested RPC in the client's context.
 boolean fileSystemDeleteOnExit(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Calls fs.deleteOnExit(path) if such a function exists.
 long getAccessTime(org.apache.hadoop.fs.FileStatus file)
          Returns the last access time of the given file.
 HadoopShims.CombineFileInputFormatShim getCombineFileInputFormat()
           
 URI getHarUri(URI original, URI base, URI originalBase)
           
 String getInputFormatClassName()
           
 String getJobLauncherHttpAddress(org.apache.hadoop.conf.Configuration conf)
          All references to the jobtracker/resource manager HTTP address in the configuration should go through this shim.
 String getJobLauncherRpcAddress(org.apache.hadoop.conf.Configuration conf)
          All retrieval of the jobtracker/resource manager RPC address from the configuration should go through this shim.
 HadoopShims.JobTrackerState getJobTrackerState(org.apache.hadoop.mapred.ClusterStatus clusterStatus)
          Convert the ClusterStatus to its Thrift equivalent: JobTrackerState.
 HadoopShims.MiniDFSShim getMiniDfs(org.apache.hadoop.conf.Configuration conf, int numDataNodes, boolean format, String[] racks)
          Returns a shim to wrap MiniDFSCluster.
 String getShortUserName(org.apache.hadoop.security.UserGroupInformation ugi)
          Get the short name corresponding to the subject in the passed UGI. In secure versions of Hadoop, this returns the short name (after undergoing the translation in the Kerberos name rule mapping).
 String getTaskAttemptLogUrl(org.apache.hadoop.mapred.JobConf conf, String taskTrackerHttpAddress, String taskAttemptId)
          Constructs and returns the TaskAttempt log URL, or null if the TaskLogServlet is not available.
 String[] getTaskJobIDs(org.apache.hadoop.mapred.TaskCompletionEvent t)
          getTaskJobIDs returns an array of String with two elements.
 String getTokenStrForm(String tokenSignature)
          Get the string form of the token given a token signature.
 org.apache.hadoop.security.UserGroupInformation getUGIForConf(org.apache.hadoop.conf.Configuration conf)
          Get the UGI that the given job configuration will run as.
 void inputFormatValidateInput(org.apache.hadoop.mapred.InputFormat fmt, org.apache.hadoop.mapred.JobConf conf)
          Calls fmt.validateInput(conf) if such a function exists.
 boolean isJobPreparing(org.apache.hadoop.mapred.RunningJob job)
          Return true if the job has not yet switched to the RUNNING state and is still in the PREP state.
 boolean isLocalMode(org.apache.hadoop.conf.Configuration conf)
          Check whether MR is configured to run in local mode.
 boolean isSecureShimImpl()
          Return true if the Shim is based on Hadoop Security APIs.
 org.apache.hadoop.mapreduce.JobContext newJobContext(org.apache.hadoop.mapreduce.Job job)
           
 org.apache.hadoop.mapreduce.TaskAttemptContext newTaskAttemptContext(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.util.Progressable progressable)
           
 void prepareJobOutput(org.apache.hadoop.mapred.JobConf conf)
          Hive uses side effect files exclusively for its output.
 void setFloatConf(org.apache.hadoop.conf.Configuration conf, String varName, float val)
          Wrapper for Configuration.setFloat, which was not introduced until 0.20.
 void setJobLauncherRpcAddress(org.apache.hadoop.conf.Configuration conf, String val)
          All updates to the jobtracker/resource manager RPC address in the configuration should go through this shim.
 void setTmpFiles(String prop, String files)
          If JobClient.getCommandLineConfig exists, sets the given property/value pair in that Configuration object.
 String unquoteHtmlChars(String item)
          Used by TaskLogProcessor to remove HTML quoting from a string.
 boolean usesJobShell()
          Return true if the current version of Hadoop uses the JobShell for command line interpretation.
 

Field Detail

LOG

static final org.apache.commons.logging.Log LOG
Method Detail

usesJobShell

boolean usesJobShell()
Return true if the current version of Hadoop uses the JobShell for command line interpretation.


getTaskAttemptLogUrl

String getTaskAttemptLogUrl(org.apache.hadoop.mapred.JobConf conf,
                            String taskTrackerHttpAddress,
                            String taskAttemptId)
                            throws MalformedURLException
Constructs and returns the TaskAttempt log URL, or null if the TaskLogServlet is not available.

Returns:
the TaskAttempt log URL, or null if the TaskLogServlet is not available
Throws:
MalformedURLException

isJobPreparing

boolean isJobPreparing(org.apache.hadoop.mapred.RunningJob job)
                       throws IOException
Return true if the job has not yet switched to the RUNNING state and is still in the PREP state.

Throws:
IOException

fileSystemDeleteOnExit

boolean fileSystemDeleteOnExit(org.apache.hadoop.fs.FileSystem fs,
                               org.apache.hadoop.fs.Path path)
                               throws IOException
Calls fs.deleteOnExit(path) if such a function exists.

Returns:
true if the call was successful
Throws:
IOException
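
One way to get "call it if such a function exists" behaviour is reflection. The sketch below is only an illustration of that technique: `NewerFs` and `OlderFs` are hypothetical stand-ins for FileSystem versions with and without `deleteOnExit`, and the shipped shims may instead compile a separate implementation per Hadoop version.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class CallIfPresentSketch {
    // Hypothetical stand-in for a newer FileSystem that has deleteOnExit.
    public static final class NewerFs {
        public boolean deleteOnExit(String path) { return true; }
    }

    // Hypothetical stand-in for an older FileSystem that lacks it.
    public static final class OlderFs { }

    // Invokes target.deleteOnExit(path) reflectively; returns false when the method is absent.
    static boolean deleteOnExitIfSupported(Object target, String path) {
        try {
            Method m = target.getClass().getMethod("deleteOnExit", String.class);
            return (Boolean) m.invoke(target, path);
        } catch (NoSuchMethodException e) {
            return false; // this version has no such method
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("deleteOnExit failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(deleteOnExitIfSupported(new NewerFs(), "/tmp/scratch"));
        System.out.println(deleteOnExitIfSupported(new OlderFs(), "/tmp/scratch"));
    }
}
```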

inputFormatValidateInput

void inputFormatValidateInput(org.apache.hadoop.mapred.InputFormat fmt,
                              org.apache.hadoop.mapred.JobConf conf)
                              throws IOException
Calls fmt.validateInput(conf) if such a function exists.

Throws:
IOException

setTmpFiles

void setTmpFiles(String prop,
                 String files)
If JobClient.getCommandLineConfig exists, sets the given property/value pair in that Configuration object. This applies to Hadoop 0.17 through 0.19.


getAccessTime

long getAccessTime(org.apache.hadoop.fs.FileStatus file)
Returns the last access time of the given file.

Parameters:
file - the status of the file to inspect
Returns:
the last access time, or -1 if not supported

getMiniDfs

HadoopShims.MiniDFSShim getMiniDfs(org.apache.hadoop.conf.Configuration conf,
                                   int numDataNodes,
                                   boolean format,
                                   String[] racks)
                                   throws IOException
Returns a shim to wrap MiniDFSCluster. This is necessary because this class was moved from org.apache.hadoop.dfs to org.apache.hadoop.hdfs.

Throws:
IOException

compareText

int compareText(org.apache.hadoop.io.Text a,
                org.apache.hadoop.io.Text b)
This function is defined here to keep the code compatible between Hadoop 0.17 and Hadoop 0.20. A Hive binary that compiled Text.compareTo(Text) against Hadoop 0.20 will not work with Hadoop 0.17: in Hadoop 0.20, Text.compareTo(Text) is implemented in org.apache.hadoop.io.BinaryComparable, so the Java compiler emits a reference to that class, which is not available in Hadoop 0.17.


getCombineFileInputFormat

HadoopShims.CombineFileInputFormatShim getCombineFileInputFormat()

getInputFormatClassName

String getInputFormatClassName()

setFloatConf

void setFloatConf(org.apache.hadoop.conf.Configuration conf,
                  String varName,
                  float val)
Wrapper for Configuration.setFloat, which was not introduced until 0.20.


getTaskJobIDs

String[] getTaskJobIDs(org.apache.hadoop.mapred.TaskCompletionEvent t)
getTaskJobIDs returns an array of String with two elements. The first element is a string representing the task id and the second is a string representing the job id. This is necessary because TaskID and TaskAttemptID are not supported in Hadoop 0.17.
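
The shape of the returned array can be illustrated with a string-based sketch. It assumes the usual Hadoop attempt id layout, `attempt_<cluster>_<job>_<m|r>_<task>_<attempt>`, and works on a plain string rather than a TaskCompletionEvent; `TaskJobIdsSketch` and `taskAndJobIds` are hypothetical names, not part of the shim.

```java
final class TaskJobIdsSketch {
    // Derives { taskId, jobId } from an attempt id string with six '_'-separated parts.
    static String[] taskAndJobIds(String attemptId) {
        String[] p = attemptId.split("_");
        if (p.length != 6 || !p[0].equals("attempt")) {
            throw new IllegalArgumentException("Unexpected attempt id: " + attemptId);
        }
        String jobId = "job_" + p[1] + "_" + p[2];
        String taskId = "task_" + p[1] + "_" + p[2] + "_" + p[3] + "_" + p[4];
        // First element is the task id, second is the job id.
        return new String[] { taskId, jobId };
    }

    public static void main(String[] args) {
        String[] ids = taskAndJobIds("attempt_200707121733_0003_m_000005_0");
        System.out.println(ids[0]);
        System.out.println(ids[1]);
    }
}
```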


createHadoopArchive

int createHadoopArchive(org.apache.hadoop.conf.Configuration conf,
                        org.apache.hadoop.fs.Path parentDir,
                        org.apache.hadoop.fs.Path destDir,
                        String archiveName)
                        throws Exception
Throws:
Exception

getHarUri

URI getHarUri(URI original,
              URI base,
              URI originalBase)
              throws URISyntaxException
Throws:
URISyntaxException

prepareJobOutput

void prepareJobOutput(org.apache.hadoop.mapred.JobConf conf)
Hive uses side effect files exclusively for its output. It also manages the setup/cleanup/commit of output from the Hive client. As a result, it does not need support for the same inside the MR framework. This routine sets the appropriate options to bypass setup/cleanup/commit support in the MR framework, but does not set the OutputFormat class.


unquoteHtmlChars

String unquoteHtmlChars(String item)
Used by TaskLogProcessor to remove HTML quoting from a string.

Parameters:
item - the string to unquote
Returns:
the unquoted string
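
As a rough illustration of what removing HTML quoting involves, the sketch below reverses five common entities. The entity set and the class name `UnquoteHtmlSketch` are assumptions made for the example; the shim delegates to whatever the underlying Hadoop version provides.

```java
final class UnquoteHtmlSketch {
    // Replaces a small, assumed set of HTML entities with the characters they stand for.
    static String unquote(String item) {
        if (item == null) {
            return null;
        }
        return item.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&apos;", "'")
                   // Handled last so that "&amp;lt;" yields "&lt;" rather than "<".
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unquote("a &lt; b &amp;&amp; c &gt; d"));
    }
}
```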

closeAllForUGI

void closeAllForUGI(org.apache.hadoop.security.UserGroupInformation ugi)
Closes the cached FileSystem instances held for the given UGI, where the underlying Hadoop version supports it.


getUGIForConf

org.apache.hadoop.security.UserGroupInformation getUGIForConf(org.apache.hadoop.conf.Configuration conf)
                                                              throws LoginException,
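                                                                     IOException
Get the UGI that the given job configuration will run as. In secure versions of Hadoop, this simply returns the current access control context's user, ignoring the configuration.
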
Throws:
LoginException
IOException

doAs

<T> T doAs(org.apache.hadoop.security.UserGroupInformation ugi,
           PrivilegedExceptionAction<T> pvea)
       throws IOException,
              InterruptedException
Used by the metastore server to perform the requested RPC in the client's context.

Parameters:
ugi - the UGI of the client on whose behalf the action runs
pvea - the action to run
Throws:
IOException
InterruptedException
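
The "run this action as that user" pattern can be sketched without Hadoop on the classpath. In the example below a thread-local user name stands in for the UGI; `DoAsSketch`, `currentUser`, and the "server" default are hypothetical and only illustrate how the caller's identity is swapped in for the duration of the action and restored afterwards.

```java
import java.security.PrivilegedExceptionAction;

final class DoAsSketch {
    // Hypothetical stand-in for the identity normally carried by a UGI.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "server");

    static String currentUser() {
        return CURRENT_USER.get();
    }

    // Runs the action with the given user installed, then restores the previous user.
    static <T> T doAs(String user, PrivilegedExceptionAction<T> action) throws Exception {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.run();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        String seenInside = doAs("alice", () -> currentUser());
        System.out.println("inside: " + seenInside + ", after: " + currentUser());
    }
}
```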

createRemoteUser

org.apache.hadoop.security.UserGroupInformation createRemoteUser(String userName,
                                                                 List<String> groupNames)
Used by the metastore server to create a UGI object for a remote user.

Parameters:
userName - remote User Name
groupNames - group names associated with remote user name
Returns:
UGI created for the remote user.

getShortUserName

String getShortUserName(org.apache.hadoop.security.UserGroupInformation ugi)
Get the short name corresponding to the subject in the passed UGI. In secure versions of Hadoop, this returns the short name (after undergoing the translation in the Kerberos name rule mapping). In unsecure versions of Hadoop, this returns the name of the subject.


isSecureShimImpl

boolean isSecureShimImpl()
Return true if the Shim is based on Hadoop Security APIs.


getTokenStrForm

String getTokenStrForm(String tokenSignature)
                       throws IOException
Get the string form of the token given a token signature. The signature is used as the value of the "service" field in the token for lookup (see AbstractDelegationTokenSelector in Hadoop). If such a token exists in the token cache (credential store) of the job, the lookup returns it. This is relevant only when running against a "secure" Hadoop release. The method retrieves the tokens if they have been set up by Hadoop, which should happen on the map/reduce tasks if the client added the tokens to Hadoop's credential store in the front end during job submission. The method selects the Hive delegation token from the set of tokens and returns its string form.

Parameters:
tokenSignature - the signature matched against the token's "service" field
Returns:
the string form of the token found
Throws:
IOException

getJobTrackerState

HadoopShims.JobTrackerState getJobTrackerState(org.apache.hadoop.mapred.ClusterStatus clusterStatus)
                                               throws Exception
Convert the ClusterStatus to its Thrift equivalent: JobTrackerState. See MAPREDUCE-2455 for why this is a part of the shim.

Parameters:
clusterStatus - the ClusterStatus to convert
Returns:
the matching JobTrackerState
Throws:
Exception - if no equivalent JobTrackerState exists
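
The conversion contract (map each known state, throw when there is no equivalent) can be sketched with stand-in enums. `ClusterState` and its UNKNOWN value are hypothetical, and the two values of the `JobTrackerState` stand-in are an assumption about the real nested type.

```java
final class TrackerStateSketch {
    // Hypothetical stand-in for the state reported by the cluster.
    enum ClusterState { INITIALIZING, RUNNING, UNKNOWN }

    // Stand-in for HadoopShims.JobTrackerState (values assumed).
    enum JobTrackerState { INITIALIZING, RUNNING }

    // Maps each known state to its equivalent; throws when none exists.
    static JobTrackerState convert(ClusterState state) throws Exception {
        switch (state) {
            case INITIALIZING:
                return JobTrackerState.INITIALIZING;
            case RUNNING:
                return JobTrackerState.RUNNING;
            default:
                throw new Exception("Unrecognized JobTracker state: " + state);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert(ClusterState.RUNNING));
    }
}
```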

newTaskAttemptContext

org.apache.hadoop.mapreduce.TaskAttemptContext newTaskAttemptContext(org.apache.hadoop.conf.Configuration conf,
                                                                     org.apache.hadoop.util.Progressable progressable)

newJobContext

org.apache.hadoop.mapreduce.JobContext newJobContext(org.apache.hadoop.mapreduce.Job job)

isLocalMode

boolean isLocalMode(org.apache.hadoop.conf.Configuration conf)
Check whether MR is configured to run in local mode.

Parameters:
conf - the configuration to inspect
Returns:
true if MR is configured to run in local mode

getJobLauncherRpcAddress

String getJobLauncherRpcAddress(org.apache.hadoop.conf.Configuration conf)
All retrieval of the jobtracker/resource manager RPC address from the configuration should go through this shim.

Parameters:
conf - the configuration to read from
Returns:
the jobtracker/resource manager RPC address

setJobLauncherRpcAddress

void setJobLauncherRpcAddress(org.apache.hadoop.conf.Configuration conf,
                              String val)
All updates to the jobtracker/resource manager RPC address in the configuration should go through this shim.

Parameters:
conf - the configuration to update
val - the new jobtracker/resource manager RPC address
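
The reason for routing reads and writes of the address through the shim is that the configuration key differs between jobtracker-based and resource-manager-based deployments. The sketch below uses a plain map in place of Configuration; the two key names are the usual Hadoop property names, but which key each shim actually uses is an assumption here, and `LauncherAddressSketch`, `AddressShim`, and `KeyedShim` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class LauncherAddressSketch {
    // Hypothetical slice of the shim: one getter and one setter.
    interface AddressShim {
        String getJobLauncherRpcAddress(Map<String, String> conf);
        void setJobLauncherRpcAddress(Map<String, String> conf, String val);
    }

    // Implementation parameterized by the version-specific configuration key.
    static final class KeyedShim implements AddressShim {
        private final String key;

        KeyedShim(String key) {
            this.key = key;
        }

        public String getJobLauncherRpcAddress(Map<String, String> conf) {
            return conf.get(key);
        }

        public void setJobLauncherRpcAddress(Map<String, String> conf, String val) {
            conf.put(key, val);
        }
    }

    // Assumed key choices for the two kinds of deployment.
    static final AddressShim JOBTRACKER = new KeyedShim("mapred.job.tracker");
    static final AddressShim RESOURCE_MANAGER = new KeyedShim("yarn.resourcemanager.address");

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        RESOURCE_MANAGER.setJobLauncherRpcAddress(conf, "rm.example.com:8032");
        System.out.println(RESOURCE_MANAGER.getJobLauncherRpcAddress(conf));
    }
}
```

Code that calls only the shim methods never names either key, so it keeps working when the launcher changes.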

getJobLauncherHttpAddress

String getJobLauncherHttpAddress(org.apache.hadoop.conf.Configuration conf)
All references to the jobtracker/resource manager HTTP address in the configuration should go through this shim.

Parameters:
conf - the configuration to read from
Returns:
the jobtracker/resource manager HTTP address


Copyright © 2013 The Apache Software Foundation