
Class FileWriter

Namespace
ParquetSharp.Arrow
Assembly
ParquetSharp.dll

Writes Parquet files from Arrow-format data.

This may be used to write whole tables or record batches at once, using the WriteTable or WriteRecordBatch methods.

You can also buffer writes with WriteBufferedRecordBatch, which allows multiple record batches to be written to a single Parquet row group. Call NewBufferedRowGroup to start a new row group.

For more control over writing, you can create a new row group with NewRowGroup, then write every column for that row group with the WriteColumnChunk method. All required columns must be written before starting the next row group or closing the file.
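The whole-batch path can be sketched as follows. This is a minimal example, assuming the Apache.Arrow C# package is used to build the schema and arrays (`Field`, `Int32Array.Builder`, `RecordBatch`); the class `FileWriterBasics`, the column names and the values are hypothetical. It also calls Close() explicitly before the writer is disposed, as recommended under Close() below.

```csharp
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

public static class FileWriterBasics
{
    // Hypothetical helper: writes one three-row record batch to `path`.
    public static void Write(string path)
    {
        // Two required (non-nullable) columns.
        var schema = new Apache.Arrow.Schema(new[]
        {
            new Field("id", Int32Type.Default, false),
            new Field("value", DoubleType.Default, false),
        }, null);

        var batch = new RecordBatch(schema, new IArrowArray[]
        {
            new Int32Array.Builder().AppendRange(new[] { 1, 2, 3 }).Build(),
            new DoubleArray.Builder().AppendRange(new[] { 0.5, 1.5, 2.5 }).Build(),
        }, 3);

        using var writer = new FileWriter(path, schema);
        writer.WriteRecordBatch(batch);
        // Close explicitly so errors raised while writing the footer surface here
        // instead of being swallowed by Dispose() at the end of the using scope.
        writer.Close();
    }
}
```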

public class FileWriter : IDisposable
Inheritance
object → FileWriter
Implements
IDisposable

Constructors

FileWriter(OutputStream, Schema, WriterProperties?, ArrowWriterProperties?)

Create a new Arrow FileWriter that writes to the specified output stream

public FileWriter(OutputStream outputStream, Schema schema, WriterProperties? properties = null, ArrowWriterProperties? arrowProperties = null)

Parameters

outputStream OutputStream

Stream to write to

schema Schema

Arrow schema for the data to be written

properties WriterProperties

Parquet writer properties

arrowProperties ArrowWriterProperties

Arrow-specific writer properties

FileWriter(Stream, Schema, WriterProperties?, ArrowWriterProperties?, bool)

Create a new Arrow FileWriter that writes to a .NET stream

public FileWriter(Stream stream, Schema schema, WriterProperties? properties = null, ArrowWriterProperties? arrowProperties = null, bool leaveOpen = false)

Parameters

stream Stream

Stream to write to

schema Schema

Arrow schema for the data to be written

properties WriterProperties

Parquet writer properties

arrowProperties ArrowWriterProperties

Arrow-specific writer properties

leaveOpen bool

Whether to keep the stream open after closing the writer
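A sketch of writing to an in-memory .NET stream with this constructor, assuming the Apache.Arrow C# package for the data; the class `StreamWriteExample` and its values are hypothetical. With leaveOpen: true, disposing the writer leaves the MemoryStream open, so the caller stays responsible for it.

```csharp
using System.IO;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

public static class StreamWriteExample
{
    // Hypothetical helper: writes a small Parquet file into memory and returns its bytes.
    public static byte[] WriteToMemory()
    {
        var schema = new Apache.Arrow.Schema(new[] { new Field("id", Int32Type.Default, false) }, null);
        var batch = new RecordBatch(schema, new IArrowArray[]
        {
            new Int32Array.Builder().AppendRange(new[] { 10, 20, 30, 40 }).Build(),
        }, 4);

        using var stream = new MemoryStream();
        // leaveOpen: true means disposing the writer does not dispose `stream`;
        // the outer using statement disposes it when the method returns.
        using (var writer = new FileWriter(stream, schema, leaveOpen: true))
        {
            writer.WriteRecordBatch(batch);
            writer.Close();
        }

        return stream.ToArray();
    }
}
```

Every Parquet file begins and ends with the 4-byte magic "PAR1", which gives a quick sanity check on the returned bytes.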

FileWriter(string, Schema, WriterProperties?, ArrowWriterProperties?)

Create a new Arrow FileWriter that writes to the specified path

public FileWriter(string path, Schema schema, WriterProperties? properties = null, ArrowWriterProperties? arrowProperties = null)

Parameters

path string

Path to the Parquet file to write

schema Schema

Arrow schema for the data to be written

properties WriterProperties

Parquet writer properties

arrowProperties ArrowWriterProperties

Arrow-specific writer properties
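A sketch of passing Parquet writer properties to the path constructor, here to enable Snappy compression via ParquetSharp's WriterPropertiesBuilder. The Apache.Arrow C# types and the class `CompressedWriteExample` are assumptions for illustration; ArrowWriterProperties is left at its default.

```csharp
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp;
using ParquetSharp.Arrow;

public static class CompressedWriteExample
{
    // Hypothetical helper: writes a four-row file with Snappy-compressed columns.
    public static void Write(string path)
    {
        var schema = new Apache.Arrow.Schema(new[] { new Field("value", DoubleType.Default, false) }, null);
        var batch = new RecordBatch(schema, new IArrowArray[]
        {
            new DoubleArray.Builder().AppendRange(new[] { 1.0, 2.0, 3.0, 4.0 }).Build(),
        }, 4);

        // Parquet-level settings; WriterProperties is IDisposable, so dispose it.
        using var properties = new WriterPropertiesBuilder()
            .Compression(Compression.Snappy)
            .Build();

        using var writer = new FileWriter(path, schema, properties);
        writer.WriteRecordBatch(batch);
        writer.Close();
    }
}
```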

Properties

Schema

The Arrow schema of the file being written

public Schema Schema { get; }

Property Value

Schema

Methods

Close()

Close the file writer, writing the Parquet footer. This is the recommended way to close Parquet files, rather than relying on the Dispose() method, because Dispose() swallows any exceptions thrown while closing.

public void Close()

Dispose()

Releases the resources held by the writer. If the file has not already been closed, any exception raised while closing it is swallowed; call Close() first to observe such errors.

public void Dispose()

NewBufferedRowGroup()

Flush buffered data and start a new row group. This can be used to force creation of a new row group when writing data with WriteBufferedRecordBatch.

public void NewBufferedRowGroup()

NewRowGroup(long)

Start writing a new row group to the file. After calling this method, each column required by the schema must be written using WriteColumnChunk before creating a new row group or closing the file.

public void NewRowGroup(long chunkSize)

Parameters

chunkSize long

The number of rows to be written in this row group
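A sketch of column-wise writing with NewRowGroup and WriteColumnChunk. It assumes the Apache.Arrow C# package for the arrays and, as in the underlying Arrow C++ writer, that each WriteColumnChunk call fills the next column in schema order. The class `ColumnChunkExample` and its values are hypothetical.

```csharp
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

public static class ColumnChunkExample
{
    // Hypothetical helper: writes two row groups (3 rows, then 2 rows) column by column.
    public static void Write(string path)
    {
        var schema = new Apache.Arrow.Schema(new[]
        {
            new Field("id", Int32Type.Default, false),
            new Field("value", DoubleType.Default, false),
        }, null);

        using var writer = new FileWriter(path, schema);

        // First row group: declare its length, then write every column in schema order.
        writer.NewRowGroup(3);
        writer.WriteColumnChunk(new Int32Array.Builder().AppendRange(new[] { 1, 2, 3 }).Build());
        writer.WriteColumnChunk(new DoubleArray.Builder().AppendRange(new[] { 0.1, 0.2, 0.3 }).Build());

        // All columns of the first row group are complete, so a second one may start.
        writer.NewRowGroup(2);
        writer.WriteColumnChunk(new Int32Array.Builder().AppendRange(new[] { 4, 5 }).Build());
        writer.WriteColumnChunk(new DoubleArray.Builder().AppendRange(new[] { 0.4, 0.5 }).Build());

        writer.Close();
    }
}
```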

WriteBufferedRecordBatch(RecordBatch)

Write a record batch to Parquet in buffered mode, allowing multiple record batches to be written to the same row group.

A new row group is started automatically when the buffered row group reaches the MaxRowGroupLength configured in the WriterProperties.

public void WriteBufferedRecordBatch(RecordBatch recordBatch)

Parameters

recordBatch RecordBatch

The record batch to write
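A sketch of buffered writing: several small record batches share one row group until NewBufferedRowGroup forces a new one. It assumes the Apache.Arrow C# package for the data and a MaxRowGroupLength (left at its default) well above the 50 rows written; `BufferedWriteExample` and `MakeBatch` are hypothetical names.

```csharp
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

public static class BufferedWriteExample
{
    // Hypothetical helper: a one-column batch holding `count` ids starting at `start`.
    static RecordBatch MakeBatch(Apache.Arrow.Schema schema, int start, int count)
    {
        var ids = new Int32Array.Builder();
        for (var i = start; i < start + count; i++)
        {
            ids.Append(i);
        }
        return new RecordBatch(schema, new IArrowArray[] { ids.Build() }, count);
    }

    // Writes five 10-row batches: three into the first row group, two into the second.
    public static void Write(string path)
    {
        var schema = new Apache.Arrow.Schema(new[] { new Field("id", Int32Type.Default, false) }, null);
        using var writer = new FileWriter(path, schema);

        for (var b = 0; b < 3; b++)
        {
            writer.WriteBufferedRecordBatch(MakeBatch(schema, b * 10, 10));
        }

        // Flush the buffered rows and force the following batches into a new row group.
        writer.NewBufferedRowGroup();

        for (var b = 3; b < 5; b++)
        {
            writer.WriteBufferedRecordBatch(MakeBatch(schema, b * 10, 10));
        }

        writer.Close();
    }
}
```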

WriteColumnChunk(ChunkedArray)

Write a column of data to the current row group using an Arrow ChunkedArray

public void WriteColumnChunk(ChunkedArray array)

Parameters

array ChunkedArray

The array of data for the column

WriteColumnChunk(IArrowArray)

Write a column of data to the current row group using an Arrow Array

public void WriteColumnChunk(IArrowArray array)

Parameters

array IArrowArray

The array of data for the column

WriteRecordBatch(RecordBatch, long)

Write a record batch to Parquet

A new row group is started, and if the record batch has more rows than chunkSize, its data is split across several row groups of at most chunkSize rows each.

public void WriteRecordBatch(RecordBatch recordBatch, long chunkSize = 1048576)

Parameters

recordBatch RecordBatch

The record batch to write

chunkSize long

The maximum number of rows in each row group written
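A sketch of how chunkSize splits a single record batch: 10 rows written with chunkSize: 4 should produce row groups of 4, 4 and 2 rows. The Apache.Arrow C# types and the class `ChunkedRecordBatchExample` are assumptions for illustration.

```csharp
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

public static class ChunkedRecordBatchExample
{
    // Hypothetical helper: writes one 10-row batch with at most 4 rows per row group.
    public static void Write(string path)
    {
        var schema = new Apache.Arrow.Schema(new[] { new Field("id", Int32Type.Default, false) }, null);
        var batch = new RecordBatch(schema, new IArrowArray[]
        {
            new Int32Array.Builder().AppendRange(Enumerable.Range(0, 10)).Build(),
        }, 10);

        using var writer = new FileWriter(path, schema);
        // 10 rows are split into row groups of 4, 4 and 2 rows.
        writer.WriteRecordBatch(batch, chunkSize: 4);
        writer.Close();
    }
}
```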

WriteTable(Table, long)

Write an Arrow table to Parquet

A new row group is started, and if the table has more rows than chunkSize, its data is split across several row groups of at most chunkSize rows each. This method requires that the columns in the table use equal chunking.

public void WriteTable(Table table, long chunkSize = 1048576)

Parameters

table Table

The table to write

chunkSize long

The maximum number of rows in each row group written
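A sketch of writing a table whose columns share the same chunking. It assumes the Apache.Arrow C# `Table.TableFromRecordBatches` helper, which turns each record batch into one chunk of every column, so all columns get identical chunk boundaries. `WriteTableExample` and `MakeBatch` are hypothetical names.

```csharp
using System.Collections.Generic;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

public static class WriteTableExample
{
    // Hypothetical helper: a one-column batch holding the given ids.
    static RecordBatch MakeBatch(Apache.Arrow.Schema schema, int[] ids)
    {
        var column = new Int32Array.Builder().AppendRange(ids).Build();
        return new RecordBatch(schema, new IArrowArray[] { column }, ids.Length);
    }

    // Writes a 5-row table made of two chunks (3 rows + 2 rows).
    public static void Write(string path)
    {
        var schema = new Apache.Arrow.Schema(new[] { new Field("id", Int32Type.Default, false) }, null);
        var table = Table.TableFromRecordBatches(schema, new List<RecordBatch>
        {
            MakeBatch(schema, new[] { 1, 2, 3 }),
            MakeBatch(schema, new[] { 4, 5 }),
        });

        using var writer = new FileWriter(path, schema);
        // Default chunkSize (1048576) exceeds the 5 rows, so one row group is written.
        writer.WriteTable(table);
        writer.Close();
    }
}
```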