Class FileWriter
- Namespace
- ParquetSharp.Arrow
- Assembly
- ParquetSharp.dll
Writes Parquet files using Arrow format data
This may be used to write whole tables or record batches at once, using the WriteTable or WriteRecordBatch methods.
You can also buffer writes so that multiple record batches are written within a single Parquet row group: write each batch with WriteBufferedRecordBatch, and call NewBufferedRowGroup to start a new row group.
For more control over writing, you can create a new row group with NewRowGroup, then write all columns for the row group with the WriteColumnChunk method. All required columns must be written before starting the next row group or closing the file.
public class FileWriter : IDisposable
- Inheritance
- object
- FileWriter
- Implements
- IDisposable
Constructors
FileWriter(OutputStream, Schema, WriterProperties?, ArrowWriterProperties?)
Create a new Arrow FileWriter that writes to the specified output stream
public FileWriter(OutputStream outputStream, Schema schema, WriterProperties? properties = null, ArrowWriterProperties? arrowProperties = null)
Parameters
outputStream
OutputStream: Stream to write to
schema
Schema: Arrow schema for the data to be written
properties
WriterProperties: Parquet writer properties
arrowProperties
ArrowWriterProperties: Arrow specific writer properties
FileWriter(Stream, Schema, WriterProperties?, ArrowWriterProperties?, bool)
Create a new Arrow FileWriter that writes to a .NET stream
public FileWriter(Stream stream, Schema schema, WriterProperties? properties = null, ArrowWriterProperties? arrowProperties = null, bool leaveOpen = false)
Parameters
stream
Stream: Stream to write to
schema
Schema: Arrow schema for the data to be written
properties
WriterProperties: Parquet writer properties
arrowProperties
ArrowWriterProperties: Arrow specific writer properties
leaveOpen
bool: Whether to keep the stream open after closing the writer
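As a minimal sketch of the Stream overload, the following writes one record batch to a MemoryStream and passes leaveOpen: true so the stream can still be read after the writer is closed. The two-column schema, the field names, and the row count are illustrative assumptions, and the sketch assumes the ParquetSharp and Apache.Arrow NuGet packages are referenced.

```csharp
using System;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

// Hypothetical two-column schema used only for illustration.
var schema = new Apache.Arrow.Schema(new[]
{
    new Field("id", new Int32Type(), false),
    new Field("value", new DoubleType(), false),
}, null);

const int numRows = 100;
var batch = new RecordBatch(schema, new IArrowArray[]
{
    new Int32Array.Builder().AppendRange(Enumerable.Range(0, numRows)).Build(),
    new DoubleArray.Builder().AppendRange(Enumerable.Range(0, numRows).Select(i => i / 10.0)).Build(),
}, numRows);

using var stream = new MemoryStream();

// leaveOpen: true keeps the MemoryStream usable after the writer is closed.
using (var writer = new FileWriter(stream, schema, leaveOpen: true))
{
    writer.WriteRecordBatch(batch);
    writer.Close(); // writes the Parquet footer
}

// The stream now holds a complete Parquet file and can be rewound and read.
stream.Seek(0, SeekOrigin.Begin);
Console.WriteLine($"Wrote {stream.Length} bytes");
```

With the default leaveOpen: false, the writer takes ownership of the stream and closes it when the writer is closed.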
FileWriter(string, Schema, WriterProperties?, ArrowWriterProperties?)
Create a new Arrow FileWriter that writes to the specified path
public FileWriter(string path, Schema schema, WriterProperties? properties = null, ArrowWriterProperties? arrowProperties = null)
Parameters
path
string: Path to the Parquet file to write
schema
Schema: Arrow schema for the data to be written
properties
WriterProperties: Parquet writer properties
arrowProperties
ArrowWriterProperties: Arrow specific writer properties
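The path overload might be used as sketched below, passing Parquet writer properties built with ParquetSharp's WriterPropertiesBuilder. The file name, schema, and choice of Snappy compression are assumptions made for the example rather than recommendations.

```csharp
using System;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

// Hypothetical output location and schema, for illustration only.
var path = Path.Combine(Path.GetTempPath(), "filewriter_path_example.parquet");
var schema = new Apache.Arrow.Schema(new[]
{
    new Field("id", new Int32Type(), false),
    new Field("value", new DoubleType(), false),
}, null);

const int numRows = 1000;
var batch = new RecordBatch(schema, new IArrowArray[]
{
    new Int32Array.Builder().AppendRange(Enumerable.Range(0, numRows)).Build(),
    new DoubleArray.Builder().AppendRange(Enumerable.Range(0, numRows).Select(i => i * 0.5)).Build(),
}, numRows);

// Parquet-level options are supplied through WriterProperties.
using var properties = new ParquetSharp.WriterPropertiesBuilder()
    .Compression(ParquetSharp.Compression.Snappy)
    .Build();

using var writer = new FileWriter(path, schema, properties);
writer.WriteRecordBatch(batch);

// Close explicitly so that any error while writing the footer is raised here.
writer.Close();

Console.WriteLine($"Wrote {new FileInfo(path).Length} bytes to {path}");
```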
Properties
Schema
The Arrow schema of the file being written
public Schema Schema { get; }
Property Value
- Schema
Methods
Close()
Close the file writer, writing the Parquet footer. This is the recommended way of closing Parquet files, rather than relying on the Dispose() method, as the latter will swallow any exceptions raised while closing the file.
public void Close()
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
NewBufferedRowGroup()
Flush buffered data and start a new row group. This can be used to force creation of a new row group when writing data with WriteBufferedRecordBatch.
public void NewBufferedRowGroup()
NewRowGroup(long)
Start writing a new row group to the file. After calling this method, each column required in the schema must be written using WriteColumnChunk before creating a new row group or closing the file.
public void NewRowGroup(long chunkSize)
Parameters
chunkSize
long: The number of rows to be written in this row group
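The column-by-column workflow could look like the sketch below: NewRowGroup declares the row count, then one WriteColumnChunk call is made per schema column, in schema order. The schema, file name, and row counts are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

// Hypothetical output location and schema, for illustration only.
var path = Path.Combine(Path.GetTempPath(), "filewriter_columns_example.parquet");
var schema = new Apache.Arrow.Schema(new[]
{
    new Field("id", new Int32Type(), false),
    new Field("value", new DoubleType(), false),
}, null);

const int rowsPerGroup = 50;
using var writer = new FileWriter(path, schema);

for (var group = 0; group < 2; ++group)
{
    var start = group * rowsPerGroup;
    var ids = new Int32Array.Builder()
        .AppendRange(Enumerable.Range(start, rowsPerGroup)).Build();
    var values = new DoubleArray.Builder()
        .AppendRange(Enumerable.Range(start, rowsPerGroup).Select(i => i / 4.0)).Build();

    // Declare the row count, then write every column before the next row group.
    writer.NewRowGroup(rowsPerGroup);
    writer.WriteColumnChunk(ids);    // column "id"
    writer.WriteColumnChunk(values); // column "value"
}

writer.Close();
Console.WriteLine($"Wrote {path}");
```

Each array passed to WriteColumnChunk here has exactly the number of rows given to NewRowGroup, which keeps the columns of a row group the same length.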
WriteBufferedRecordBatch(RecordBatch)
Write a record batch to Parquet in buffered mode, allowing multiple record batches to be written to the same row group.
New row groups are started if the data reaches the MaxRowGroupLength configured in the WriterProperties.
public void WriteBufferedRecordBatch(RecordBatch recordBatch)
Parameters
recordBatch
RecordBatch: The record batch to write
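Buffered writing might be used as in this sketch, which writes several small batches into a shared row group and then forces a new row group with NewBufferedRowGroup. The batch sizes, the MaxRowGroupLength value, and the file name are assumptions chosen for the example.

```csharp
using System;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

// Hypothetical output location and schema, for illustration only.
var path = Path.Combine(Path.GetTempPath(), "filewriter_buffered_example.parquet");
var schema = new Apache.Arrow.Schema(new[]
{
    new Field("id", new Int32Type(), false),
}, null);

const int rowsPerBatch = 100;

// Local helper (not part of ParquetSharp) that builds one small batch.
RecordBatch MakeBatch(int batchNumber) =>
    new RecordBatch(schema, new IArrowArray[]
    {
        new Int32Array.Builder()
            .AppendRange(Enumerable.Range(batchNumber * rowsPerBatch, rowsPerBatch)).Build(),
    }, rowsPerBatch);

// Row groups are also started automatically once this length is reached.
using var properties = new ParquetSharp.WriterPropertiesBuilder()
    .MaxRowGroupLength(10_000)
    .Build();

using var writer = new FileWriter(path, schema, properties);

// These three batches are buffered into the same row group.
for (var i = 0; i < 3; ++i)
{
    writer.WriteBufferedRecordBatch(MakeBatch(i));
}

// Flush what has been buffered and begin a new row group explicitly.
writer.NewBufferedRowGroup();
writer.WriteBufferedRecordBatch(MakeBatch(3));

writer.Close();
Console.WriteLine($"Wrote {path}");
```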
WriteColumnChunk(ChunkedArray)
Write a column of data to a row group using an Arrow ChunkedArray
public void WriteColumnChunk(ChunkedArray array)
Parameters
array
ChunkedArray: The array of data for the column
WriteColumnChunk(IArrowArray)
Write a column of data to a row group using an Arrow Array
public void WriteColumnChunk(IArrowArray array)
Parameters
array
IArrowArray: The array of data for the column
WriteRecordBatch(RecordBatch, long)
Write a record batch to Parquet
A new row group will be started. If the record batch has more rows than the specified maximum chunk size, its data will be split across multiple row groups, each no longer than that size.
public void WriteRecordBatch(RecordBatch recordBatch, long chunkSize = 1048576)
Parameters
recordBatch
RecordBatch: The record batch to write
chunkSize
long: The maximum length of row groups to write
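To illustrate the chunkSize parameter, this sketch writes a single record batch with a deliberately small chunk size and then reads the row group count back from the file metadata using ParquetSharp's ParquetFileReader. The schema, file name, and sizes are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

// Hypothetical output location and schema, for illustration only.
var path = Path.Combine(Path.GetTempPath(), "filewriter_chunksize_example.parquet");
var schema = new Apache.Arrow.Schema(new[]
{
    new Field("id", new Int32Type(), false),
}, null);

const int numRows = 10_000;
var batch = new RecordBatch(schema, new IArrowArray[]
{
    new Int32Array.Builder().AppendRange(Enumerable.Range(0, numRows)).Build(),
}, numRows);

using (var writer = new FileWriter(path, schema))
{
    // Row groups written for this batch hold at most 4,000 rows each.
    writer.WriteRecordBatch(batch, chunkSize: 4_000);
    writer.Close();
}

// Inspect the resulting layout through the file metadata.
using var reader = new ParquetSharp.ParquetFileReader(path);
var numRowGroups = reader.FileMetaData.NumRowGroups;
Console.WriteLine($"Row groups written: {numRowGroups}");
```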
WriteTable(Table, long)
Write an Arrow table to Parquet
A new row group will be started. If the table has more rows than the specified maximum chunk size, its data will be split across multiple row groups, each no longer than that size. This method requires that the columns in the table use equal chunking.
public void WriteTable(Table table, long chunkSize = 1048576)
Parameters
table
Table: The table to write
chunkSize
long: The maximum length of row groups to write
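A table can be assembled from record batches before being written, as in the sketch below. It assumes the Table.TableFromRecordBatches helper from the Apache.Arrow package; because every column is built from the same batches, the columns share the same chunking. The schema, file name, and sizes are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Apache.Arrow;
using Apache.Arrow.Types;
using ParquetSharp.Arrow;

// Hypothetical output location and schema, for illustration only.
var path = Path.Combine(Path.GetTempPath(), "filewriter_table_example.parquet");
var schema = new Apache.Arrow.Schema(new[]
{
    new Field("id", new Int32Type(), false),
    new Field("value", new DoubleType(), false),
}, null);

const int rowsPerBatch = 500;

// Local helper (not part of ParquetSharp) that builds one batch.
RecordBatch MakeBatch(int batchNumber)
{
    var start = batchNumber * rowsPerBatch;
    return new RecordBatch(schema, new IArrowArray[]
    {
        new Int32Array.Builder().AppendRange(Enumerable.Range(start, rowsPerBatch)).Build(),
        new DoubleArray.Builder()
            .AppendRange(Enumerable.Range(start, rowsPerBatch).Select(i => i * 1.5)).Build(),
    }, rowsPerBatch);
}

// Assumed Apache.Arrow helper: one chunk per batch in every column.
var table = Table.TableFromRecordBatches(
    schema, new List<RecordBatch> { MakeBatch(0), MakeBatch(1) });

using var writer = new FileWriter(path, schema);
writer.WriteTable(table, chunkSize: 1024);
writer.Close();

Console.WriteLine($"Wrote {table.RowCount} rows to {path}");
```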