Class ArrowReaderProperties
- Namespace
- ParquetSharp.Arrow
- Assembly
- ParquetSharp.dll
Configures Arrow-specific options for reading Parquet files.
public sealed class ArrowReaderProperties : IDisposable
- Inheritance
- object
ArrowReaderProperties
- Implements
- IDisposable
Properties
ArrowExtensionEnabled
Whether to enable Parquet-supported Arrow extension types. Default is false.
public bool ArrowExtensionEnabled { get; set; }
Property Value
- bool
BatchSize
The maximum number of rows to read into a chunk or record batch. Batches may contain fewer rows when there are no more rows in the file.
public long BatchSize { get; set; }
Property Value
- long
BinaryType
The Arrow binary type to read BYTE_ARRAY columns as.
Allowed values are ArrowTypeId.Binary, ArrowTypeId.LargeBinary and ArrowTypeId.BinaryView. Default is ArrowTypeId.Binary.
If a BYTE_ARRAY column has the STRING logical type, it is read as the Arrow string type corresponding to the configured binary type (for example ArrowTypeId.LargeString if the configured binary type is ArrowTypeId.LargeBinary).
However, if a serialized Arrow schema is found in the Parquet metadata, this setting is ignored and the Arrow schema takes precedence.
public ArrowTypeId BinaryType { get; set; }
Property Value
- ArrowTypeId
CoerceInt96TimestampUnit
The timestamp unit to use for deprecated INT96-encoded timestamps (default is nanoseconds).
public TimeUnit CoerceInt96TimestampUnit { get; set; }
Property Value
- TimeUnit
ListType
The Arrow list type to read Parquet list columns as.
Allowed values are ArrowTypeId.List, ArrowTypeId.LargeList and ArrowTypeId.ListView. Default is ArrowTypeId.List.
If a serialized Arrow schema is found in the Parquet metadata, this setting is ignored and the Arrow schema takes precedence.
public ArrowTypeId ListType { get; set; }
Property Value
- ArrowTypeId
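The BinaryType and ListType settings described above can be sketched together. This is a minimal illustration rather than a canonical recipe; it assumes that ArrowTypeId is the enum from the Apache.Arrow.Types namespace and that the installed Apache.Arrow version defines the large variants.

```csharp
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Read BYTE_ARRAY columns as large binary, which uses 64-bit offsets.
// Per the BinaryType description, STRING columns then map to the
// corresponding large string type.
properties.BinaryType = ArrowTypeId.LargeBinary;

// Read Parquet list columns as large lists.
properties.ListType = ArrowTypeId.LargeList;
```

Neither setting has any effect on a file whose Parquet metadata carries a serialized Arrow schema, since that schema takes precedence.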
PreBuffer
When enabled, the Arrow reader pre-buffers the necessary regions of the file in memory. This is intended to improve performance on high-latency filesystems (e.g. Amazon S3). Enabled by default.
public bool PreBuffer { get; set; }
Property Value
- bool
UseThreads
Whether to use the IO thread pool to parse columns in parallel.
public bool UseThreads { get; set; }
Property Value
- bool
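The following sketch shows how BatchSize, PreBuffer and UseThreads might be combined when streaming record batches. The FileReader class, its arrowProperties parameter and GetRecordBatchReader are not documented on this page; they are assumed here from the ParquetSharp.Arrow and Apache.Arrow APIs, and the helper names and the chosen batch size are purely illustrative.

```csharp
using System.Threading.Tasks;
using Apache.Arrow;
using ParquetSharp.Arrow;

// Hypothetical helper: builds properties for a high-latency source.
static ArrowReaderProperties CreateProperties()
{
    var properties = ArrowReaderProperties.GetDefault();
    properties.BatchSize = 64 * 1024; // at most 65,536 rows per record batch
    properties.PreBuffer = true;      // already the default, stated for clarity
    properties.UseThreads = true;     // parse columns in parallel
    return properties;
}

// Hypothetical helper: counts rows by streaming batches from the file.
static async Task<long> CountRowsAsync(string path)
{
    using var properties = CreateProperties();
    // Assumption: FileReader accepts the properties via "arrowProperties".
    using var fileReader = new FileReader(path, arrowProperties: properties);
    using var batchReader = fileReader.GetRecordBatchReader();

    long rows = 0;
    RecordBatch batch;
    while ((batch = await batchReader.ReadNextRecordBatchAsync()) != null)
    {
        using (batch)
        {
            // The final batch may hold fewer rows than BatchSize.
            rows += batch.Length;
        }
    }
    return rows;
}
```

A smaller BatchSize lowers peak memory per batch at the cost of more batches to process, so a suitable value depends on row width and the consumer.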
Methods
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
GetDefault()
Create a new ArrowReaderProperties with default values.
public static ArrowReaderProperties GetDefault()
Returns
- ArrowReaderProperties
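A minimal sketch of obtaining and releasing an instance, assuming the ParquetSharp package is referenced. Because the class implements IDisposable, a using declaration ensures Dispose() is called when the variable goes out of scope.

```csharp
using System;
using ParquetSharp.Arrow;

// GetDefault() creates a new instance populated with default values.
using var properties = ArrowReaderProperties.GetDefault();

// Inspect a few defaults before changing anything.
Console.WriteLine($"BatchSize: {properties.BatchSize}");
Console.WriteLine($"PreBuffer: {properties.PreBuffer}");
Console.WriteLine($"ArrowExtensionEnabled: {properties.ArrowExtensionEnabled}");
```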
GetReadDictionary(int)
Get whether to read a particular column as dictionary encoded.
public bool GetReadDictionary(int columnIndex)
Parameters
- columnIndex int
The index of the column
Returns
- bool
Whether this column will be read as dictionary encoded
SetReadDictionary(int, bool)
Set whether to read a particular column as dictionary encoded. This is only supported for columns with a Parquet physical type of BYTE_ARRAY, such as string or binary types.
public void SetReadDictionary(int columnIndex, bool readDictionary)
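As a short sketch of how SetReadDictionary and GetReadDictionary fit together: the column index below is hypothetical and assumes that column 0 of the target file has the BYTE_ARRAY physical type (for example, a string column).

```csharp
using System;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Hypothetical layout: column 0 is a BYTE_ARRAY (string) column.
const int columnIndex = 0;

// Request that this column be read as dictionary encoded.
properties.SetReadDictionary(columnIndex, true);

// Query the setting back for the same column.
bool readsAsDictionary = properties.GetReadDictionary(columnIndex);
Console.WriteLine($"Column {columnIndex} read as dictionary: {readsAsDictionary}");
```

Reading as dictionary encoded tends to pay off for columns with many repeated values, since each distinct value is stored once and rows refer to it by index.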