org.apache.hadoop.hive.ql.io
Class RCFile

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.RCFile

public class RCFile
extends Object

RCFile, short for Record Columnar File, is a flat file format consisting of binary key/value pairs, and it shares much similarity with SequenceFile. RCFile stores the columns of a table in a record columnar way: it first partitions rows horizontally into row splits, and then partitions each row split vertically, column by column. For each row split, RCFile stores the metadata as the key part of a record and all of the split's data as the value part. When writing, RCFile.Writer first holds records' value bytes in memory and determines a row split boundary once the raw byte size of the buffered records exceeds the parameter Writer.columnsBufferSize, which can be set like: conf.setInt(COLUMNS_BUFFER_SIZE_CONF_STR, 4 * 1024 * 1024).
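The two-step partitioning described above can be illustrated with a small, self-contained sketch. It is not Hive's RCFile.Writer: the names RowSplitSketch, RowSplit and partition are hypothetical, values are plain strings, and the byte threshold only stands in for the writer's columns buffer size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical model of row splits, not Hive's RCFile.Writer. */
final class RowSplitSketch {

    /** One row split: columns.get(c) holds the values of column c for every row in the split. */
    static final class RowSplit {
        final List<List<String>> columns = new ArrayList<>();
        int rowCount;

        RowSplit(int columnCount) {
            for (int c = 0; c < columnCount; c++) {
                columns.add(new ArrayList<>());
            }
        }
    }

    /**
     * Buffers rows column by column and ends the current split once the buffered
     * raw bytes exceed bufferSizeBytes (an assumed stand-in for the columns buffer size).
     */
    static List<RowSplit> partition(List<String[]> rows, int columnCount, int bufferSizeBytes) {
        List<RowSplit> splits = new ArrayList<>();
        RowSplit current = new RowSplit(columnCount);
        int bufferedBytes = 0;
        for (String[] row : rows) {
            if (row.length != columnCount) {
                throw new IllegalArgumentException("row has " + row.length + " columns, expected " + columnCount);
            }
            // Vertical step: each value goes into the buffer of its own column.
            for (int c = 0; c < columnCount; c++) {
                current.columns.get(c).add(row[c]);
                bufferedBytes += row[c].getBytes(StandardCharsets.UTF_8).length;
            }
            current.rowCount++;
            // Horizontal step: the buffer overflowed, so this row ends the split.
            if (bufferedBytes > bufferSizeBytes) {
                splits.add(current);
                current = new RowSplit(columnCount);
                bufferedBytes = 0;
            }
        }
        if (current.rowCount > 0) {
            splits.add(current); // flush the last, partially filled split
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "alice"},
                new String[] {"2", "bob"},
                new String[] {"3", "carol"});
        for (RowSplit split : partition(rows, 2, 8)) {
            System.out.println(split.rowCount + " rows, columns = " + split.columns);
        }
    }
}
```

In this sketch the values of one column within a split end up next to each other, which is the layout the description above refers to as record columnar.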

RCFile provides the RCFile.Writer and RCFile.Reader classes for writing and reading, respectively.

RCFile compresses values in a more fine-grained manner than record-level compression. However, it does not currently support compressing the key part. The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate CompressionCodec.
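One way to picture compression that is finer than record level is to compress each column of a row split on its own. The sketch below assumes that reading of "fine-grained" and uses java.util.zip's DEFLATE streams in place of a Hadoop CompressionCodec; ColumnCompressionSketch and its methods are hypothetical names, not part of the RCFile API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Illustrative only: per-column compression with java.util.zip, not a Hadoop CompressionCodec. */
final class ColumnCompressionSketch {

    /** Compresses one column's raw bytes with DEFLATE. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Restores one column's raw bytes. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream inflater =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = inflater.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Compresses every column independently, so any single column can be decompressed alone. */
    static byte[][] compressColumns(byte[][] columns) {
        byte[][] result = new byte[columns.length][];
        for (int c = 0; c < columns.length; c++) {
            result[c] = compress(columns[c]);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[][] columns = {
                "1,2,3".getBytes(StandardCharsets.UTF_8),
                "alice,bob,carol".getBytes(StandardCharsets.UTF_8)};
        byte[][] packed = compressColumns(columns);
        // Only the second column is decompressed here; the first stays packed.
        System.out.println(new String(decompress(packed[1]), StandardCharsets.UTF_8));
    }
}
```

Under this assumption, a reader interested in a single column only pays the decompression cost for that column's bytes.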

RCFile.Reader is used to read and interpret the bytes of an RCFile.

RCFile Formats

RCFile Format


Nested Class Summary
static class RCFile.KeyBuffer
          KeyBuffer is the key of each record in RCFile.
static class RCFile.Reader
          Read KeyBuffer/ValueBuffer pairs from a RCFile.
static class RCFile.ValueBuffer
          ValueBuffer is the value of each record in RCFile.
static class RCFile.Writer
          Write KeyBuffer/ValueBuffer pairs to a RCFile.
 
Field Summary
static String BLOCK_MISSING_MESSAGE
           
static String COLUMN_NUMBER_CONF_STR
           
static String COLUMN_NUMBER_METADATA_STR
           
static String RECORD_INTERVAL_CONF_STR
           
static int SYNC_INTERVAL
          The number of bytes between sync points.
static String TOLERATE_CORRUPTIONS_CONF_STR
           
 
Constructor Summary
RCFile()
           
 
Method Summary
static org.apache.hadoop.io.SequenceFile.Metadata createMetadata(org.apache.hadoop.io.Text... values)
          Create a metadata object with alternating key-value pairs.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RECORD_INTERVAL_CONF_STR

public static final String RECORD_INTERVAL_CONF_STR
See Also:
Constant Field Values

COLUMN_NUMBER_METADATA_STR

public static final String COLUMN_NUMBER_METADATA_STR
See Also:
Constant Field Values

COLUMN_NUMBER_CONF_STR

public static final String COLUMN_NUMBER_CONF_STR
See Also:
Constant Field Values

TOLERATE_CORRUPTIONS_CONF_STR

public static final String TOLERATE_CORRUPTIONS_CONF_STR
See Also:
Constant Field Values

BLOCK_MISSING_MESSAGE

public static final String BLOCK_MISSING_MESSAGE
See Also:
Constant Field Values

SYNC_INTERVAL

public static final int SYNC_INTERVAL
The number of bytes between sync points.

See Also:
Constant Field Values
Constructor Detail

RCFile

public RCFile()
Method Detail

createMetadata

public static org.apache.hadoop.io.SequenceFile.Metadata createMetadata(org.apache.hadoop.io.Text... values)
Create a metadata object from alternating keys and values, e.g. createMetadata(key1, value1, key2, value2).
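The alternating key/value convention can be sketched without any Hadoop dependency. In the example below, a SortedMap of strings stands in for SequenceFile.Metadata and Text, pairUp is a hypothetical helper, and rejecting an odd number of arguments is an assumption of the sketch rather than documented behavior of createMetadata.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustrative only: models alternating key/value arguments with plain strings. */
final class MetadataPairsSketch {

    /** Treats values as key1, value1, key2, value2, ... and collects them into a map. */
    static SortedMap<String, String> pairUp(String... values) {
        // Assumption of this sketch: an unmatched trailing key is an error.
        if (values.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "expected key/value pairs, got " + values.length + " strings");
        }
        SortedMap<String, String> result = new TreeMap<>();
        for (int i = 0; i < values.length; i += 2) {
            result.put(values[i], values[i + 1]); // even index is the key, odd index the value
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairUp("key1", "value1", "key2", "value2"));
    }
}
```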



Copyright © 2011 The Apache Software Foundation