Package org.apache.uima.cas.impl
Class BinaryCasSerDes4
java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes4
User-callable serialization and deserialization of the CAS in a compressed binary format.
This serializes/deserializes the state of the CAS, assuming that the type information remains constant.
The header tells the reader the format and the compression level.
How to Serialize:
1) Create an instance of this class, specifying some options that rarely change.
2) Call serialize(cas, out) to serialize the CAS.
You can reuse the instance for a different CAS (as long as the type system is the same); this saves setup time.
This class lazily constructs a customized TypeInfo instance for each type encountered during serialization.
These instances are preserved across multiple serialization calls, so their setup / initialization is needed only the first time.
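The lazy, per-type setup described above can be pictured with a small cache keyed by type code. This is only a sketch under stated assumptions: the class name LazyPerTypeCache, the factory callback, and the build counter are hypothetical and are not part of BinaryCasSerDes4; the real TypeInfo contents are not modeled here.

```java
import java.util.function.IntFunction;

// Hypothetical sketch; none of these names come from BinaryCasSerDes4.
class LazyPerTypeCache<T> {
  private final Object[] slots;          // one slot per type code, filled on demand
  private final IntFunction<T> factory;  // builds the per-type info the first time it is needed
  private int builds = 0;                // counts how often the factory actually ran

  LazyPerTypeCache(int typeCount, IntFunction<T> factory) {
    this.slots = new Object[typeCount];
    this.factory = factory;
  }

  /** Returns the cached value for typeCode, creating it on first use. */
  @SuppressWarnings("unchecked")
  T get(int typeCode) {
    Object cached = slots[typeCode];
    if (cached == null) {
      cached = factory.apply(typeCode);
      slots[typeCode] = cached;
      builds++;
    }
    return (T) cached;
  }

  int builds() {
    return builds;
  }
}
```

Because the cache lives in the instance, a second serialization with the same instance would find the slots already filled. The sketch is not thread-safe.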
The form of the binary CAS is written at the beginning of the output so that receivers can do the proper deserialization.
The binary format requires that the exact same type system be used when deserializing.
How to Deserialize:
1) Get an appropriate CAS to deserialize into. For a delta CAS, it does not have to be empty.
2) Call cas.reinit(inputStream) on the CASImpl. This is the existing method for binary deserialization, and it now handles this compressed version, too.
Delta CAS is also supported.
Compression/Decompression
Works in two stages:
  application of Zip/Unzip to particular sub-collections of CAS data, grouped according to similar data distribution
  collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as control info, float-exponents, and string chars.
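To make the two stages concrete, here is a minimal sketch that first collects like kinds of data (float exponents in one collection, mantissa plus sign in another) and then deflates one collection on its own. The class name, method names, and the exact bit layout are assumptions made for illustration; the actual on-the-wire layout used by this class is not described here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch; class name, method names, and bit layout are illustrative assumptions.
class LikeDataGroupingSketch {

  /** Collect like data: exponents go to one byte collection, mantissa+sign to another. */
  static byte[][] splitFloats(float[] values) {
    ByteArrayOutputStream exponents = new ByteArrayOutputStream();
    ByteArrayOutputStream mantissas = new ByteArrayOutputStream();
    for (float v : values) {
      int bits = Float.floatToRawIntBits(v);
      exponents.write((bits >>> 23) & 0xFF);                    // 8-bit exponent
      int mantSign = ((bits & 0x7FFFFF) << 1) | (bits >>> 31);  // 23-bit mantissa, sign in low bit
      mantissas.write(mantSign >>> 16);                         // write(int) keeps the low 8 bits
      mantissas.write(mantSign >>> 8);
      mantissas.write(mantSign);
    }
    return new byte[][] { exponents.toByteArray(), mantissas.toByteArray() };
  }

  /** Deflate one collection independently of the others. */
  static byte[] zip(byte[] raw, int level) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    Deflater deflater = new Deflater(level);
    try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
      out.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      deflater.end(); // release native resources held by the Deflater
    }
    return sink.toByteArray();
  }
}
```

When values have similar magnitudes, the exponent collection is highly repetitive and shrinks far more than it would if exponent and mantissa bytes were interleaved.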
Deserialization:
  Read all bytes.
  Create a separate ByteArrayInputStream for each segment, all sharing one byte buffer.
  Create the appropriate unzip data input streams for these.
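The deserialization steps above can be sketched with standard java.io and java.util.zip classes. This is an illustration only: the class SegmentReaderSketch, its methods, and the way segment lengths are passed in are assumptions; how the real format records segment boundaries is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch; names and the source of segment lengths are assumptions.
class SegmentReaderSketch {

  /** One unzipping data stream per segment; every stream reads from the same backing array. */
  static DataInputStream[] openSegments(byte[] buf, int[] zippedLengths) {
    DataInputStream[] streams = new DataInputStream[zippedLengths.length];
    int offset = 0;
    for (int i = 0; i < zippedLengths.length; i++) {
      // The (buf, offset, length) constructor does not copy; it only bounds the view.
      ByteArrayInputStream slice = new ByteArrayInputStream(buf, offset, zippedLengths[i]);
      streams[i] = new DataInputStream(new InflaterInputStream(slice));
      offset += zippedLengths[i];
    }
    return streams;
  }

  /** Helper for the demo only: deflate a byte array. */
  static byte[] zip(byte[] raw) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
      out.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  /** Demo: zip two segments, concatenate them, then read each back through its own stream. */
  static int[] demo() {
    try {
      byte[] a = zip(new byte[] {0, 0, 0, 7});
      byte[] b = zip(new byte[] {0, 0, 0, 42});
      byte[] all = new byte[a.length + b.length];
      System.arraycopy(a, 0, all, 0, a.length);
      System.arraycopy(b, 0, all, a.length, b.length);
      DataInputStream[] in = openSegments(all, new int[] {a.length, b.length});
      int second = in[1].readInt(); // segments can be consumed in any order
      int first = in[0].readInt();
      return new int[] {first, second};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Giving each segment its own bounded stream means a reader of one kind of data cannot run past the end of its segment into the next one.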
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
class
static enum
static enum
    Compression alternatives
static enum
private static class
    Manage the conversion of Items (FSrefs or String offsets) to relative index number.
    Map from int to int:
      Fs: key = index into heap, value = fs index (a search)
          key = fs index, value = index into heap (just an array ref)
      StrOffset: key = string offset, value = str index (a search)
          key = str index, value = string offset (index into strings) (just an array ref)
    Take advantage: both keys / indexes monotonically increasing; most refs nearby; spacing fairly uniform.
    Do modified binary search: estimate first probe as avg of % & current loc.
    Lifecycle: 1) create an instance 2) fill 3) finish 4) do gets, then gc
private class
    Class instantiated once per deserialization. Multiple deserializations in parallel are supported, with multiple instances of this.
private class
    Class instantiated once per serialization. Multiple serializations in parallel are supported, with multiple instances of this.
static enum
    Define all the slot kinds.
private static class
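The modified binary search mentioned for the item-reference map (keys monotonically increasing, most references nearby, spacing fairly uniform) might look roughly like the sketch below. It is one reading of "estimate first probe: avg of % & current loc": the first probe is the average of an interpolated position and the last hit, after which an ordinary binary search takes over. The class name and fields are hypothetical, not the library's.

```java
// Hypothetical sketch of a binary search with an estimated first probe.
class MonotonicIndexSketch {
  private final int[] keys;   // strictly increasing keys (e.g. heap or string offsets)
  private int lastFound = 0;  // position of the most recent hit ("current loc")

  MonotonicIndexSketch(int[] sortedKeys) {
    this.keys = sortedKeys;
  }

  /** Returns the index of key in the array, or -1 if it is absent. */
  int indexOf(int key) {
    int lo = 0;
    int hi = keys.length - 1;
    if (hi < 0 || key < keys[0] || key > keys[hi]) {
      return -1;
    }
    // Interpolated guess: how far through the key range does this key sit?
    long span = (long) keys[hi] - keys[0];
    int byPercent = span == 0 ? 0 : (int) (((long) key - keys[0]) * hi / span);
    // First probe: average of the interpolated guess and the last hit.
    int probe = (byPercent + lastFound) >>> 1;
    while (lo <= hi) {
      int k = keys[probe];
      if (k == key) {
        lastFound = probe;
        return probe;
      }
      if (k < key) {
        lo = probe + 1;
      } else {
        hi = probe - 1;
      }
      probe = (lo + hi) >>> 1; // fall back to plain bisection after the first probe
    }
    return -1;
  }
}
```

The reverse direction (index to key) needs no search at all, since it is a plain array reference, as the description notes.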
-
Field Summary
Fields
Modifier and Type    Field    Description
private static final int arrayLength_i
private static final int byte_i
static final boolean CAN_BE_NEGATIVE
static final boolean CHANGE_FS_REFS_TO_SEQUENTIAL
private static final int control_i
private static final long DBL_1
private final boolean doMeasurements
private static final int double_Exponent_i
private static final int double_Mantissa_Sign_i
private static final int float_Exponent_i
private static final int float_Mantissa_Sign_i
private static final int fsIndexes_i
private static final int heapRef_i
static final boolean IGNORED
static final boolean IN_MAIN_HEAP
private static final int int_i
static final boolean IS_DIFF_ENCODE
private static final int long_High_i
private static final int long_Low_i
private static final int short_i
private static final int strChars_i
private static final int strLength_i
private static final int strOffset_i
private static final int strSeg_i
private final TypeSystemImpl ts
static final int TYPECODE_COMPR
private static final int typeCode_i
private final BinaryCasSerDes4.TypeInfo[] typeInfoArray
    Things set up for one instance of this class, and reusable
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type    Method    Description
void deserialize(CASImpl cas, InputStream deserIn, boolean isDelta)
private BinaryCasSerDes4.TypeInfo getTypeInfo(int typeCode)
private int incrToNextFs(int[] heap, int iHeap, BinaryCasSerDes4.TypeInfo typeInfo)
    methods common to serialization / deserialization etc.
private void initFsStartIndexes(BinaryCasSerDes4.ComprItemRefs fsStartIndexes, int[] heap, int heapStart, int heapEnd, int[] histo)
private void initTypeInfoArray(int typeCode)
private static DataOutputStream makeDataOutputStream
printCasInfo(CASImpl cas)
private void resetIprevious()
serialize(AbstractCas cas, Object out)
SerializationMeasures serialize(AbstractCas cas, Object out, Marker trackingMark)
SerializationMeasures serialize(AbstractCas cas, Object out, Marker trackingMark, BinaryCasSerDes4.CompressLevel compressLevel)
SerializationMeasures serialize(AbstractCas cas, Object out, Marker trackingMark, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy)
-
Field Details
-
TYPECODE_COMPR
public static final int TYPECODE_COMPR
-
CHANGE_FS_REFS_TO_SEQUENTIAL
public static final boolean CHANGE_FS_REFS_TO_SEQUENTIAL
-
IS_DIFF_ENCODE
public static final boolean IS_DIFF_ENCODE
-
CAN_BE_NEGATIVE
public static final boolean CAN_BE_NEGATIVE
-
IGNORED
public static final boolean IGNORED
-
IN_MAIN_HEAP
public static final boolean IN_MAIN_HEAP
-
DBL_1
private static final long DBL_1
-
typeInfoArray
private final BinaryCasSerDes4.TypeInfo[] typeInfoArray
Things set up for one instance of this class, and reusable
-
ts
private final TypeSystemImpl ts
-
doMeasurements
private final boolean doMeasurements
-
arrayLength_i
private static final int arrayLength_i
-
heapRef_i
private static final int heapRef_i
-
int_i
private static final int int_i
-
byte_i
private static final int byte_i
-
short_i
private static final int short_i
-
typeCode_i
private static final int typeCode_i
-
strOffset_i
private static final int strOffset_i
-
strLength_i
private static final int strLength_i
-
long_High_i
private static final int long_High_i
-
long_Low_i
private static final int long_Low_i
-
float_Mantissa_Sign_i
private static final int float_Mantissa_Sign_i
-
float_Exponent_i
private static final int float_Exponent_i
-
double_Mantissa_Sign_i
private static final int double_Mantissa_Sign_i
-
double_Exponent_i
private static final int double_Exponent_i
-
fsIndexes_i
private static final int fsIndexes_i
-
strChars_i
private static final int strChars_i
-
control_i
private static final int control_i
-
strSeg_i
private static final int strSeg_i
-
-
Constructor Details
-
BinaryCasSerDes4
Parameters:
ts - the type system
doMeasurements - normally set this to false.
-
-
Method Details
-
serialize
public SerializationMeasures serialize(AbstractCas cas, Object out, Marker trackingMark, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy) throws IOException
Parameters:
cas - CAS to serialize
out - output object
trackingMark - tracking mark (for delta serialization)
compressLevel - the compression level
compressStrategy - the compression strategy
Returns:
null or serialization measurements (depending on the setting of doMeasurements)
Throws:
IOException - if the marker is invalid
-
serialize
public SerializationMeasures serialize(AbstractCas cas, Object out, Marker trackingMark, BinaryCasSerDes4.CompressLevel compressLevel) throws IOException
Throws:
IOException
-
serialize
public SerializationMeasures serialize(AbstractCas cas, Object out, Marker trackingMark) throws IOException
Throws:
IOException
-
serialize
serialize(AbstractCas cas, Object out) throws IOException
Throws:
IOException
-
deserialize
public void deserialize(CASImpl cas, InputStream deserIn, boolean isDelta) throws IOException
Throws:
IOException
-
incrToNextFs
private int incrToNextFs(int[] heap, int iHeap, BinaryCasSerDes4.TypeInfo typeInfo)
Methods common to serialization / deserialization etc.
-
initFsStartIndexes
private void initFsStartIndexes(BinaryCasSerDes4.ComprItemRefs fsStartIndexes, int[] heap, int heapStart, int heapEnd, int[] histo)
-
resetIprevious
private void resetIprevious()
-
getCasCompare
-
makeDataOutputStream
Parameters:
f - can be a DataOutputStream, an OutputStream, or a File
Returns:
a data output stream
Throws:
FileNotFoundException - passthru
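As an illustration of the dispatch this description implies, the sketch below turns a DataOutputStream, an OutputStream, or a File into a DataOutputStream. It is a hypothetical stand-in, not the library's code: the names OutputTargetSketch and toDataOutputStream are invented, and throwing IllegalArgumentException for any other argument type is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the documented behavior; not the library's implementation.
class OutputTargetSketch {

  static DataOutputStream toDataOutputStream(Object f) throws FileNotFoundException {
    if (f instanceof DataOutputStream) {
      return (DataOutputStream) f;                                  // already the right type
    }
    if (f instanceof OutputStream) {
      return new DataOutputStream((OutputStream) f);                // wrap a plain stream
    }
    if (f instanceof File) {
      return new DataOutputStream(new FileOutputStream((File) f));  // may throw FileNotFoundException
    }
    // Assumption: the documentation does not say what happens for other types.
    throw new IllegalArgumentException("unsupported output target: " + f);
  }

  /** Demo: write an int through a wrapped in-memory stream and report the byte count. */
  static int demo() {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      DataOutputStream out = toDataOutputStream(sink);
      out.writeInt(42);
      out.flush();
      return sink.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Accepting Object keeps the serialize(...) overloads flexible about where the bytes go, at the cost of moving the type check to run time.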
-
printCasInfo
-
getTypeInfo
private BinaryCasSerDes4.TypeInfo getTypeInfo(int typeCode)
-
initTypeInfoArray
private void initTypeInfoArray(int typeCode)
-