Class ArrowReaderProperties
- Namespace
- ParquetSharp.Arrow
- Assembly
- ParquetSharp.dll
Configures Arrow-specific options for reading Parquet files.
public sealed class ArrowReaderProperties : IDisposable
- Inheritance
- object
ArrowReaderProperties
- Implements
- IDisposable
- Inherited Members
Properties
BatchSize
The maximum number of rows to read into a chunk or record batch. Batches may contain fewer rows when there are no more rows in the file.
public long BatchSize { get; set; }
Property Value
- long
CoerceInt96TimestampUnit
The timestamp unit to use for deprecated INT96-encoded timestamps (default is nanoseconds).
public TimeUnit CoerceInt96TimestampUnit { get; set; }
Property Value
- TimeUnit
PreBuffer
When enabled, the Arrow reader will pre-buffer necessary regions of the file in-memory. This is intended to improve performance on high-latency filesystems (e.g. Amazon S3).
public bool PreBuffer { get; set; }
Property Value
- bool
UseThreads
Whether to use the IO thread pool to parse columns in parallel.
public bool UseThreads { get; set; }
Property Value
- bool
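The properties above can be combined on a single instance. The following is a minimal sketch rather than a recommended configuration: the specific values (a batch size of 65,536 rows, pre-buffering and threading both enabled) are illustrative assumptions, and the best settings depend on the file and the filesystem.

```csharp
using System;
using ParquetSharp.Arrow;

// Start from the defaults and override only what is needed.
using var arrowProperties = ArrowReaderProperties.GetDefault();

// Upper bound on rows per record batch; the final batch may be smaller.
// The value is an illustrative assumption, not a tuned recommendation.
arrowProperties.BatchSize = 64 * 1024;

// Pre-buffer file regions in memory, intended for high-latency filesystems.
arrowProperties.PreBuffer = true;

// Parse columns in parallel.
arrowProperties.UseThreads = true;

Console.WriteLine($"BatchSize={arrowProperties.BatchSize}, " +
                  $"PreBuffer={arrowProperties.PreBuffer}, " +
                  $"UseThreads={arrowProperties.UseThreads}");
```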
Methods
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
GetDefault()
Create a new ArrowReaderProperties with default values.
public static ArrowReaderProperties GetDefault()
Returns
- ArrowReaderProperties
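As a rough illustration of how an instance returned by GetDefault() might be used, the sketch below passes it to an Arrow file reader and counts rows batch by batch. The `FileReader` constructor with an `arrowProperties` argument, `GetRecordBatchReader()`, and `ReadNextRecordBatchAsync()` are not described on this page and are assumed here from the wider ParquetSharp.Arrow and Apache.Arrow APIs; `ReadExample` and `CountRowsAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Apache.Arrow;
using ParquetSharp.Arrow;

public static class ReadExample
{
    // Hypothetical helper: counts the rows in a Parquet file by reading Arrow record batches.
    public static async Task<long> CountRowsAsync(string path)
    {
        // The properties wrap unmanaged resources, so dispose them when done.
        using var arrowProperties = ArrowReaderProperties.GetDefault();
        arrowProperties.BatchSize = 1024; // illustrative value

        // Assumption: FileReader accepts the properties via a named argument.
        using var fileReader = new FileReader(path, arrowProperties: arrowProperties);
        using var batchReader = fileReader.GetRecordBatchReader();

        long rows = 0;
        RecordBatch batch;
        while ((batch = await batchReader.ReadNextRecordBatchAsync()) != null)
        {
            using (batch)
            {
                // Each batch holds at most BatchSize rows.
                rows += batch.Length;
            }
        }
        return rows;
    }
}
```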
GetReadDictionary(int)
Get whether to read a particular column as dictionary encoded.
public bool GetReadDictionary(int columnIndex)
Parameters
columnIndex
- int
The index of the column
Returns
- bool
Whether this column will be read as dictionary encoded
SetReadDictionary(int, bool)
Set whether to read a particular column as dictionary encoded. This is only supported for columns with a Parquet physical type of BYTE_ARRAY, such as string or binary types.
public void SetReadDictionary(int columnIndex, bool readDictionary)
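A minimal sketch of toggling dictionary reads for one column and checking the setting afterwards. The column index `0` is a hypothetical placeholder; in practice it should refer to a column whose Parquet physical type is BYTE_ARRAY.

```csharp
using System;
using ParquetSharp.Arrow;

using var arrowProperties = ArrowReaderProperties.GetDefault();

// Hypothetical index of a string (BYTE_ARRAY) column in the file.
const int stringColumnIndex = 0;

// Request that this column be read as dictionary encoded.
arrowProperties.SetReadDictionary(stringColumnIndex, true);

// Read the setting back for the same column.
bool readAsDictionary = arrowProperties.GetReadDictionary(stringColumnIndex);
Console.WriteLine($"Column {stringColumnIndex} read as dictionary: {readAsDictionary}");
```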