@Private
public interface ShufflePartitionWriter
This writer stores bytes for one (mapper, reducer) pair, corresponding to one shuffle block.
Modifier and Type | Method and Description |
---|---|
long |
getNumBytesWritten()
Returns the number of bytes written either by this writer's output stream opened by
openStream() or the byte channel opened by openChannelWrapper() . |
default java.util.Optional<WritableByteChannelWrapper> |
openChannelWrapper()
Opens and returns a
WritableByteChannelWrapper for transferring bytes from
input byte channels to the underlying shuffle data store. |
java.io.OutputStream |
openStream()
Open and return an
OutputStream that can write bytes to the underlying
data store. |
java.io.OutputStream openStream() throws java.io.IOException
OutputStream
that can write bytes to the underlying
data store.
This method will only be called once on this partition writer in the map task, to write the bytes to the partition. The output stream will only be used to write the bytes for this partition. The map task closes this output stream upon writing all the bytes for this block, or if the write fails for any reason.
Implementations that intend on combining the bytes for all the partitions written by this
map task should reuse the same OutputStream instance across all the partition writers provided
by the parent ShuffleMapOutputWriter
. If one does so, ensure that
OutputStream.close()
does not close the resource, since it will be reused across
partition writes. The underlying resources should be cleaned up in
ShuffleMapOutputWriter.commitAllPartitions(long[])
and
ShuffleMapOutputWriter.abort(Throwable)
.
java.io.IOException
default java.util.Optional<WritableByteChannelWrapper> openChannelWrapper() throws java.io.IOException
WritableByteChannelWrapper
for transferring bytes from
input byte channels to the underlying shuffle data store.
This method will only be called once on this partition writer in the map task, to write the bytes to the partition. The channel will only be used to write the bytes for this partition. The map task closes this channel upon writing all the bytes for this block, or if the write fails for any reason.
Implementations that intend on combining the bytes for all the partitions written by this
map task should reuse the same channel instance across all the partition writers provided
by the parent ShuffleMapOutputWriter
. If one does so, ensure that
Closeable.close()
does not close the resource, since the channel
will be reused across partition writes. The underlying resources should be cleaned up in
ShuffleMapOutputWriter.commitAllPartitions(long[])
and
ShuffleMapOutputWriter.abort(Throwable)
.
This method is primarily for advanced optimizations where bytes can be copied from the input
spill files to the output channel without copying data into memory. If such optimizations are
not supported, the implementation should return Optional.empty()
. By default, the
implementation returns Optional.empty()
.
Note that the returned WritableByteChannelWrapper
itself is closed, but not the
underlying channel that is returned by WritableByteChannelWrapper.channel()
. Ensure
that the underlying channel is cleaned up in Closeable.close()
,
ShuffleMapOutputWriter.commitAllPartitions(long[])
, or
ShuffleMapOutputWriter.abort(Throwable)
.
java.io.IOException
long getNumBytesWritten()
openStream()
or the byte channel opened by openChannelWrapper()
.
This can be different from the number of bytes given by the caller. For example, the stream might compress or encrypt the bytes before persisting the data to the backing data store.