SkewJoinHandler (Hive 0.10.0-SNAPSHOT API)

org.apache.hadoop.hive.ql.exec
Class SkewJoinHandler

java.lang.Object
  org.apache.hadoop.hive.ql.exec.SkewJoinHandler

public class SkewJoinHandler
extends Object

At runtime, during a join, the big (skewed) keys of one table are written to a directory for that table, and the matching keys in each of the other tables are written to separate directories (one per table). The directories look like:

dir-T1-bigkeys (containing big keys in T1), dir-T2-keys (containing keys that are big in T1), dir-T3-keys (containing keys that are big in T1), ...
dir-T1-keys (containing keys that are big in T2), dir-T2-bigkeys (containing big keys in T2), dir-T3-keys (containing keys that are big in T2), ...
dir-T1-keys (containing keys that are big in T3), dir-T2-keys (containing keys that are big in T3), dir-T3-bigkeys (containing big keys in T3), ...
...
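The layout above can be sketched in a few lines of Java. This is only an illustration of the naming pattern shown in this description, not Hive's actual path-building code: the class `SkewDirLayout` and its methods `dirName` and `row` are hypothetical names, and real paths would also have to distinguish the `dir-T1-keys` directories that belong to different big tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the directory naming pattern; not part of the Hive API.
class SkewDirLayout {

    // Directory name for table `table` in the row for keys that are big in `bigTable`.
    // Both indices are 0-based; the printed table numbers are 1-based (T1, T2, ...).
    static String dirName(int bigTable, int table) {
        String suffix = (bigTable == table) ? "bigkeys" : "keys";
        return "dir-T" + (table + 1) + "-" + suffix;
    }

    // One row of the layout: one directory per table for a given big table.
    static List<String> row(int bigTable, int numTables) {
        List<String> dirs = new ArrayList<>();
        for (int t = 0; t < numTables; t++) {
            dirs.add(dirName(bigTable, t));
        }
        return dirs;
    }

    public static void main(String[] args) {
        int numTables = 3; // assumed three-way join, as in the description above
        for (int big = 0; big < numTables; big++) {
            System.out.println(String.join(", ", row(big, numTables)));
        }
    }
}
```

Each row contains exactly one `bigkeys` directory, on the diagonal, which is the property the description relies on.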

For each skew key, all values are first written to a local temporary file. When the current group ends, the local temporary file is uploaded to HDFS. Currently, one file is used per skew key.
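The write-locally-then-upload pattern described here might look roughly like the following sketch. It is an assumption-laden stand-in, not the handler's real code: `SkewKeySpill`, `write`, and `endGroup` are hypothetical names, and a plain `java.nio` file move into a target directory takes the place of the HDFS upload.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: buffers the values of one skew key in a local temp file
// and moves that file to a target directory when the group ends.
class SkewKeySpill {
    private final Path localTmp;
    private final BufferedWriter out;
    private long rows = 0;

    SkewKeySpill() throws IOException {
        // One local temp file per skew key, as stated in the description.
        localTmp = Files.createTempFile("skewkey-", ".tmp");
        out = Files.newBufferedWriter(localTmp, StandardCharsets.UTF_8);
    }

    // Append one value for the current skew key.
    void write(String value) throws IOException {
        out.write(value);
        out.newLine();
        rows++;
    }

    long rowCount() {
        return rows;
    }

    // End of group: flush the local file and move it to the target directory.
    // The move is a local stand-in for the upload to HDFS.
    Path endGroup(Path targetDir, String fileName) throws IOException {
        out.close();
        Files.createDirectories(targetDir);
        return Files.move(localTmp, targetDir.resolve(fileName),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("dir-T1-bigkeys");
        SkewKeySpill spill = new SkewKeySpill();
        spill.write("row-1");
        spill.write("row-2");
        Path uploaded = spill.endGroup(target, "key-0");
        System.out.println(uploaded + " holds " + Files.readAllLines(uploaded).size() + " rows");
    }
}
```

Writing to local disk first keeps the per-row cost low and defers the remote write to a single operation per key at the group boundary.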

For more info, please see https://issues.apache.org/jira/browse/HIVE-964.

Field Summary
`int`	`currBigKeyTag`
`protected static org.apache.commons.logging.Log`	`LOG`

Constructor Summary
`SkewJoinHandler(CommonJoinOperator<? extends OperatorDesc> joinOp)`

Method Summary
`void`	`close(boolean abort)`
`void`	`handleSkew(int tag)`
`void`	`initiliaze(org.apache.hadoop.conf.Configuration hconf)`
`void`	`setSkewJoinJobCounter(org.apache.hadoop.io.LongWritable skewjoinFollowupJobs)`
`void`	`updateSkewJoinJobCounter(int tag)`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

LOG

protected static final org.apache.commons.logging.Log LOG

currBigKeyTag

public int currBigKeyTag

Constructor Detail

SkewJoinHandler

public SkewJoinHandler(CommonJoinOperator<? extends OperatorDesc> joinOp)

Method Detail

initiliaze

public void initiliaze(org.apache.hadoop.conf.Configuration hconf)

handleSkew

public void handleSkew(int tag)
                throws HiveException

Throws: HiveException

close

public void close(boolean abort)
           throws HiveException

Throws: HiveException

setSkewJoinJobCounter

public void setSkewJoinJobCounter(org.apache.hadoop.io.LongWritable skewjoinFollowupJobs)

updateSkewJoinJobCounter

public void updateSkewJoinJobCounter(int tag)