
Class ArrowReaderProperties

Namespace
ParquetSharp.Arrow
Assembly
ParquetSharp.dll

Configures Arrow-specific options for reading Parquet files.

public sealed class ArrowReaderProperties : IDisposable
Inheritance
object
ArrowReaderProperties
Implements
IDisposable

Properties

ArrowExtensionEnabled

Whether to enable Parquet-supported Arrow extension types. Default is false.

public bool ArrowExtensionEnabled { get; set; }

Property Value

bool

BatchSize

The maximum number of rows to read into a chunk or record batch. Batches may contain fewer rows when there are no more rows in the file.

public long BatchSize { get; set; }

Property Value

long
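
As a minimal sketch of how BatchSize might be used to cap the size of record batches: the file name data.parquet and the value 4096 are placeholders, and the FileReader constructor and GetRecordBatchReader method come from the wider ParquetSharp.Arrow API rather than from this page. The read loop only runs if the placeholder file exists.

```csharp
using System;
using System.IO;
using Apache.Arrow;
using ParquetSharp.Arrow;

// Hypothetical input file; replace with a real path.
const string path = "data.parquet";

using var properties = ArrowReaderProperties.GetDefault();
properties.BatchSize = 4096; // request at most 4096 rows per record batch

if (File.Exists(path))
{
    using var fileReader = new FileReader(path, arrowProperties: properties);
    using var batchReader = fileReader.GetRecordBatchReader();

    RecordBatch batch;
    while ((batch = await batchReader.ReadNextRecordBatchAsync()) != null)
    {
        using (batch)
        {
            // Batches may be smaller than BatchSize, for example at the end of the file.
            Console.WriteLine($"Read {batch.Length} rows");
        }
    }
}
```

A smaller batch size generally lowers peak memory use at the cost of more batches to process; the right value depends on the data and workload.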

BinaryType

The Arrow binary type to read BYTE_ARRAY columns as.

Allowed values are ArrowTypeId.Binary, ArrowTypeId.LargeBinary and ArrowTypeId.BinaryView. Default is ArrowTypeId.Binary.

If a BYTE_ARRAY column has the STRING logical type, it is read as the Arrow string type corresponding to the configured binary type (for example ArrowTypeId.LargeString if the configured binary type is ArrowTypeId.LargeBinary).

However, if a serialized Arrow schema is found in the Parquet metadata, this setting is ignored and the Arrow schema takes precedence.

public ArrowTypeId BinaryType { get; set; }

Property Value

ArrowTypeId
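
A short sketch of selecting a non-default binary type, assuming ArrowTypeId is the enum from the Apache.Arrow.Types namespace. The choice of LargeBinary here is only illustrative.

```csharp
using System;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Read BYTE_ARRAY columns as large binary. Per the description above,
// STRING columns are then read as the corresponding large string type,
// unless the file carries a serialized Arrow schema.
properties.BinaryType = ArrowTypeId.LargeBinary;

Console.WriteLine(properties.BinaryType);
```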

CoerceInt96TimestampUnit

The timestamp unit to use for deprecated INT96-encoded timestamps (default is nanoseconds).

public TimeUnit CoerceInt96TimestampUnit { get; set; }

Property Value

TimeUnit
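
A sketch of changing the INT96 unit, assuming TimeUnit is the enum from Apache.Arrow.Types. A coarser unit such as microseconds may be useful when a file holds timestamps outside the range that 64-bit nanosecond values can represent (roughly the years 1677 to 2262).

```csharp
using System;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Read INT96 timestamps as microsecond-precision Arrow timestamps
// instead of the default nanosecond precision.
properties.CoerceInt96TimestampUnit = TimeUnit.Microsecond;

Console.WriteLine(properties.CoerceInt96TimestampUnit);
```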

ListType

The Arrow list type to read Parquet list columns as.

Allowed values are ArrowTypeId.List, ArrowTypeId.LargeList and ArrowTypeId.ListView. Default is ArrowTypeId.List.

If a serialized Arrow schema is found in the Parquet metadata, this setting is ignored and the Arrow schema takes precedence.

public ArrowTypeId ListType { get; set; }

Property Value

ArrowTypeId
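
The list type is set in the same way as the binary type. This sketch assumes ArrowTypeId comes from Apache.Arrow.Types and picks LargeList purely as an example.

```csharp
using System;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Read Parquet list columns as Arrow large lists (64-bit offsets).
// This has no effect if the file carries a serialized Arrow schema.
properties.ListType = ArrowTypeId.LargeList;

Console.WriteLine(properties.ListType);
```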

PreBuffer

When enabled, the Arrow reader pre-buffers the necessary regions of the file in memory. This is intended to improve performance on high-latency filesystems (e.g. Amazon S3). Enabled by default.

public bool PreBuffer { get; set; }

Property Value

bool

UseThreads

Whether to use the IO thread pool to parse columns in parallel.

public bool UseThreads { get; set; }

Property Value

bool
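
PreBuffer and UseThreads are both simple switches. The sketch below shows one possible combination for a local file, where pre-buffering may matter less and parallel column decoding may help; whether either setting improves performance is an assumption that should be measured for a given workload.

```csharp
using System;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Skip pre-buffering, which mainly targets high-latency filesystems.
properties.PreBuffer = false;

// Parse columns in parallel using the thread pool.
properties.UseThreads = true;

Console.WriteLine($"PreBuffer={properties.PreBuffer}, UseThreads={properties.UseThreads}");
```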

Methods

Dispose()

Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.

public void Dispose()

GetDefault()

Creates a new ArrowReaderProperties with default values.

public static ArrowReaderProperties GetDefault()

Returns

ArrowReaderProperties
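
A minimal sketch of creating a default instance and inspecting it. Because the class implements IDisposable, the instance is wrapped in a using declaration so that it is disposed when it goes out of scope.

```csharp
using System;
using ParquetSharp.Arrow;

// Create a properties object populated with default values.
using var properties = ArrowReaderProperties.GetDefault();

// Inspect some of the defaults before changing anything.
Console.WriteLine($"BatchSize:  {properties.BatchSize}");
Console.WriteLine($"PreBuffer:  {properties.PreBuffer}");
Console.WriteLine($"UseThreads: {properties.UseThreads}");
Console.WriteLine($"BinaryType: {properties.BinaryType}");
Console.WriteLine($"ListType:   {properties.ListType}");
```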

GetReadDictionary(int)

Gets whether to read a particular column as dictionary encoded.

public bool GetReadDictionary(int columnIndex)

Parameters

columnIndex int

The index of the column

Returns

bool

Whether this column will be read as dictionary encoded

SetReadDictionary(int, bool)

Sets whether to read a particular column as dictionary encoded. This is only supported for columns with a Parquet physical type of BYTE_ARRAY, such as string or binary types.

public void SetReadDictionary(int columnIndex, bool readDictionary)

Parameters

columnIndex int

The index of the column

readDictionary bool

Whether to read this column as dictionary encoded
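
A sketch of enabling dictionary reads for one column and querying the setting back. The column index 0 is a placeholder and is assumed to refer to a BYTE_ARRAY column in the file that will eventually be read.

```csharp
using System;
using ParquetSharp.Arrow;

using var properties = ArrowReaderProperties.GetDefault();

// Hypothetical index of a string or binary (BYTE_ARRAY) column.
const int columnIndex = 0;

// Request that this column is read as an Arrow dictionary array.
properties.SetReadDictionary(columnIndex, true);

// Query the setting back for the same column.
bool readAsDictionary = properties.GetReadDictionary(columnIndex);
Console.WriteLine($"Column {columnIndex} read as dictionary: {readAsDictionary}");
```

Reading as dictionary encoded can reduce memory use for columns with many repeated values, since each distinct value is stored once.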