org.apache.hadoop.hive.ql.exec
Class SkewJoinHandler
java.lang.Object
org.apache.hadoop.hive.ql.exec.SkewJoinHandler
public class SkewJoinHandler
- extends Object
At runtime, the join writes the big (skewed) keys of one table into a directory
for that table, and writes the rows with the same keys from every other table
into separate directories (one per table). The directories look like:
- dir-T1-bigkeys (containing the big keys in T1), dir-T2-keys (containing the
keys that are big in T1), dir-T3-keys (containing the keys that are big in T1), ...
- dir-T1-keys (containing the keys that are big in T2), dir-T2-bigkeys (containing
the big keys in T2), dir-T3-keys (containing the keys that are big in T2), ...
- dir-T1-keys (containing the keys that are big in T3), dir-T2-keys (containing
the keys that are big in T3), dir-T3-bigkeys (containing the big keys in T3), ...
- ...
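The layout above can be illustrated with a small sketch. This is not Hive's code: the class `SkewDirLayout` and its methods are hypothetical, and the directory names simply mirror the labels used in the description rather than the real paths Hive generates.

```java
import java.util.ArrayList;
import java.util.List;

public class SkewDirLayout {
    // Hypothetical helper: the table whose keys are skewed gets a "bigkeys"
    // directory, every other table gets a plain "keys" directory.
    public static String dirName(int table, int bigTable) {
        return table == bigTable
                ? "dir-T" + table + "-bigkeys"
                : "dir-T" + table + "-keys";
    }

    // One group of directories per table that holds big keys.
    public static List<String> group(int bigTable, int numTables) {
        List<String> dirs = new ArrayList<>();
        for (int t = 1; t <= numTables; t++) {
            dirs.add(dirName(t, bigTable));
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Print the three groups for a three-way join.
        for (int big = 1; big <= 3; big++) {
            System.out.println(group(big, 3));
        }
    }
}
```

Each group holds exactly one `bigkeys` directory, and the other directories in the same group hold the matching rows from the remaining tables.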
For each skew key, all values are first written to a local temporary file. When
the current group ends, the local temporary file is uploaded to HDFS.
Currently, one file is used per skew key.
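The spill-then-upload pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the handler's implementation: the class `SkewKeySpill` and its methods are hypothetical, values are plain strings, and a local directory stands in for the HDFS target so that the example runs without a Hadoop cluster.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SkewKeySpill {
    private final Path targetDir;   // assumption: stands in for the HDFS directory
    private Path localTmp;          // one local temporary file per skew key
    private BufferedWriter writer;

    public SkewKeySpill(Path targetDir) {
        this.targetDir = targetDir;
    }

    // Called when a new skew key is detected: open a fresh local temp file.
    public void startKey() throws IOException {
        localTmp = Files.createTempFile("skewkey-", ".tmp");
        writer = Files.newBufferedWriter(localTmp, StandardCharsets.UTF_8);
    }

    // Append one value of the current skew key to the local temp file.
    public void addValue(String value) throws IOException {
        writer.write(value);
        writer.newLine();
    }

    // Called when the current group ends: close the temp file and move it
    // to the target directory (the "upload" step in this sketch).
    public Path endGroup(String key) throws IOException {
        writer.close();
        Files.createDirectories(targetDir);
        Path dest = targetDir.resolve("key-" + key);
        Files.move(localTmp, dest, StandardCopyOption.REPLACE_EXISTING);
        return dest;
    }

    public static void main(String[] args) throws IOException {
        SkewKeySpill spill = new SkewKeySpill(Files.createTempDirectory("skew-target"));
        spill.startKey();
        spill.addValue("row-1");
        spill.addValue("row-2");
        System.out.println("uploaded to " + spill.endGroup("42"));
    }
}
```

Writing locally first keeps the per-row cost low, and the single move at the end of the group matches the one-file-per-skew-key behavior noted above.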
For more info, please see https://issues.apache.org/jira/browse/HIVE-964.
Field Summary
Modifier and Type | Field
int | currBigKeyTag
protected static org.apache.commons.logging.Log | LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
protected static final org.apache.commons.logging.Log LOG
currBigKeyTag
public int currBigKeyTag
SkewJoinHandler
public SkewJoinHandler(CommonJoinOperator<? extends OperatorDesc> joinOp)
initiliaze
public void initiliaze(org.apache.hadoop.conf.Configuration hconf)
handleSkew
public void handleSkew(int tag)
throws HiveException
- Throws:
HiveException
close
public void close(boolean abort)
throws HiveException
- Throws:
HiveException
setSkewJoinJobCounter
public void setSkewJoinJobCounter(org.apache.hadoop.io.LongWritable skewjoinFollowupJobs)
updateSkewJoinJobCounter
public void updateSkewJoinJobCounter(int tag)
Copyright © 2013 The Apache Software Foundation