feat(readwriter): Implement File Format Reader/Writer (#72)
* feat: Implement CSV Options Configuration for DataFrameReader (#53)

- Added CsvOptions struct to support CSV read options like `header`, `delimiter`, and `nullValue`.
- Implemented ConfigOpts trait for CsvOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `csv` method that accepts CsvOptions.
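
As a rough illustration of the pattern described above, the sketch below shows how a typed options struct might be flattened into key-value pairs. Only the names `CsvOptions` and `ConfigOpts` and the option keys `header`, `delimiter`, and `nullValue` come from this commit message; the trait method `to_options`, the builder methods, and the field types are assumptions made for illustration, not the crate's confirmed API.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the `ConfigOpts` trait: convert a typed options
/// struct into string key-value pairs (the method name is an assumption).
pub trait ConfigOpts {
    fn to_options(&self) -> HashMap<String, String>;
}

/// Illustrative subset of CSV read options. Unset fields are not emitted,
/// so the reader's own defaults would apply.
#[derive(Debug, Clone, Default)]
pub struct CsvOptions {
    pub header: Option<bool>,
    pub delimiter: Option<char>,
    pub null_value: Option<String>,
}

impl CsvOptions {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn header(mut self, value: bool) -> Self {
        self.header = Some(value);
        self
    }

    pub fn delimiter(mut self, value: char) -> Self {
        self.delimiter = Some(value);
        self
    }

    pub fn null_value(mut self, value: &str) -> Self {
        self.null_value = Some(value.to_string());
        self
    }
}

impl ConfigOpts for CsvOptions {
    fn to_options(&self) -> HashMap<String, String> {
        let mut options = HashMap::new();
        if let Some(header) = self.header {
            options.insert("header".to_string(), header.to_string());
        }
        if let Some(delimiter) = self.delimiter {
            options.insert("delimiter".to_string(), delimiter.to_string());
        }
        if let Some(null_value) = &self.null_value {
            // The snake_case field maps to the camelCase option key.
            options.insert("nullValue".to_string(), null_value.clone());
        }
        options
    }
}
```

With this shape, a call such as `CsvOptions::new().header(true).delimiter(';').to_options()` would yield only the two keys that were set.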

* feat: Implement CSV Options Configuration for DataFrameReader (#54)

- Added documentation for the CsvOptions struct.

* test(readwriter): Implement test_dataframe_read_csv_with_options (#54)

* refactor: Improve CSV method to handle multiple paths (#54)

    - Updated the csv method in DataFrameReader to support both single string slices and arrays of string slices as input paths.
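
One way to accept either a single string slice or an array of string slices is a small conversion type, sketched below. The `Paths` type and the free function `csv` are hypothetical stand-ins chosen for illustration; the commit message does not say which mechanism the actual `csv` method uses.

```rust
/// Hypothetical helper: a normalized list of input paths.
#[derive(Debug, Clone, PartialEq)]
pub struct Paths(pub Vec<String>);

/// A single path, e.g. `"data/a.csv"`.
impl From<&str> for Paths {
    fn from(path: &str) -> Self {
        Paths(vec![path.to_string()])
    }
}

/// A fixed-size array of paths, e.g. `["a.csv", "b.csv"]`.
impl<const N: usize> From<[&str; N]> for Paths {
    fn from(paths: [&str; N]) -> Self {
        Paths(paths.iter().map(|p| p.to_string()).collect())
    }
}

/// A borrowed slice of paths.
impl From<&[&str]> for Paths {
    fn from(paths: &[&str]) -> Self {
        Paths(paths.iter().map(|p| p.to_string()).collect())
    }
}

/// Stand-in for a reader method: it only returns the paths it would load,
/// so the conversion can be observed without a Spark session.
pub fn csv<P: Into<Paths>>(paths: P) -> Vec<String> {
    paths.into().0
}
```

Both `csv("data/a.csv")` and `csv(["data/a.csv", "data/b.csv"])` compile against the same signature, which keeps the call site free of explicit wrapping.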

* feat: Added implementations for JSON Options struct (#54)

* feat: Implement JSON Options Configuration for DataFrameReader (#54)

- Added JsonOptions struct to support JSON read options like `schema`, `multi_line`, `encoding`, and more.
- Implemented ConfigOpts trait for JsonOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `json` method that accepts JsonOptions.
- Documented all available JSON options, including example usage for setting options when reading JSON files. [TO DO]
- Write tests to validate JSON options functionality.

* feat: Implement ORC Options Configuration for DataFrameReader (#54)

- Example usage provided for setting ORC options when reading files.
- Write tests to validate ORC options functionality.

* feat: Implement Parquet Options Configuration for DataFrameReader (#54)

- Added ParquetOptions struct to support Parquet read options like `mergeSchema`, `pathGlobFilter`, and `recursiveFileLookup`.
- Implemented ConfigOpts trait for ParquetOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `parquet` method that accepts ParquetOptions.
- Example usage provided for setting Parquet options when reading files.
- Write tests to validate Parquet options functionality.

* feat: Implement Text Options Configuration for DataFrameReader (#54)

- Added TextOptions struct to support text read options like `wholetext`, `lineSep`, and `pathGlobFilter`.
- Implemented ConfigOpts trait for TextOptions to convert options into key-value pairs.
- Updated DataFrameReader to include `text` method that accepts TextOptions.
- Example usage provided for setting text options when reading files.
- Write tests to validate text options functionality.

* feat: Implement Text and Parquet Options Configuration for DataFrameWriter (#54)

- Added TextOptions struct to support text write options such as `whole_text` and `line_sep`.
- Added ParquetOptions struct to support Parquet write options like `merge_schema`, `path_glob_filter`, and `datetime_rebase_mode`.
- Implemented `write` method in DataFrameWriter to handle configuration for text and Parquet file formats.
- Example usage provided for setting text and Parquet options when writing DataFrames.
- Write tests to validate text and Parquet file writing functionality.

* Added rustdocs to method implementations.

* feat: Implement initial methods for file format reader and writer (#54)

- Added support for reading and writing .csv, .json, .orc, .parquet, and .text file formats.
- Created `ConfigOpts` trait for each file type to manage options in a structured way.
- Added example method signatures for file reading using a configurable options object passed into methods.

* Add missing csv options to CsvOptions.

* feat: Implement Configuration Options for DataFrameReader and Writer (#54)

    - Implemented additional fields in ParquetOptions compression.
    - Updated test_dataframe_read_parquet_with_options to ensure valid compression codec usage.
    - Enhanced test_dataframe_read_text_with_options to properly read lines by setting line_sep and disabling whole_text.
    - Derived `Debug` and `Clone` via `#[derive(Debug, Clone)]` for all options structs.
    - Updated expected path_glob_filter type to string.
    - Added the compression field to ParquetOptions, OrcOptions, and JsonOptions.
    - Updated documentation for all Options structs to include descriptions for new and existing fields.

* feat: Refactor file format options with shared CommonFileOptions (#54)

    - Introduced CommonFileOptions to handle common configuration fields such as:
        - path_glob_filter
        - recursive_file_lookup
        - ignore_corrupt_files
        - ignore_missing_files
        - modified_before
        - modified_after

    - Updated CsvOptions, JsonOptions, OrcOptions, ParquetOptions, and TextOptions
    to use CommonFileOptions for the shared fields.

    - Updated the new() constructors for each file format options struct to initialize
    CommonFileOptions.

    - Refactored tests for each file format (e.g., ORC, CSV) to utilize the new
    CommonFileOptions, ensuring that both format-specific and shared options
    are properly tested.

    - Updated and verified tests for DataFrame reading and writing operations with updated options.
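
The refactor described above amounts to composition: each format-specific struct embeds one shared struct and merges both sets of options when serializing. The sketch below is a minimal illustration under stated assumptions: the field names of `CommonFileOptions` follow the list above, but the field types, the `common` field name, the `put` helper, the `to_options` methods, and the `merge_schema` field on `OrcOptions` are hypothetical. The camelCase keys are Spark's documented file source option names.

```rust
use std::collections::HashMap;

/// Insert a key only when a value was set (hypothetical helper).
pub fn put(map: &mut HashMap<String, String>, key: &str, value: Option<String>) {
    if let Some(v) = value {
        map.insert(key.to_string(), v);
    }
}

/// Options shared by every file format; field types are assumptions.
#[derive(Debug, Clone, Default)]
pub struct CommonFileOptions {
    pub path_glob_filter: Option<String>,
    pub recursive_file_lookup: Option<bool>,
    pub ignore_corrupt_files: Option<bool>,
    pub ignore_missing_files: Option<bool>,
    pub modified_before: Option<String>,
    pub modified_after: Option<String>,
}

impl CommonFileOptions {
    pub fn to_options(&self) -> HashMap<String, String> {
        let mut m = HashMap::new();
        put(&mut m, "pathGlobFilter", self.path_glob_filter.clone());
        put(&mut m, "recursiveFileLookup", self.recursive_file_lookup.map(|b| b.to_string()));
        put(&mut m, "ignoreCorruptFiles", self.ignore_corrupt_files.map(|b| b.to_string()));
        put(&mut m, "ignoreMissingFiles", self.ignore_missing_files.map(|b| b.to_string()));
        put(&mut m, "modifiedBefore", self.modified_before.clone());
        put(&mut m, "modifiedAfter", self.modified_after.clone());
        m
    }
}

/// A format-specific struct that embeds the shared options.
#[derive(Debug, Clone, Default)]
pub struct OrcOptions {
    pub merge_schema: Option<bool>,
    pub common: CommonFileOptions,
}

impl OrcOptions {
    /// Start from the shared options, then add the format-specific ones.
    pub fn to_options(&self) -> HashMap<String, String> {
        let mut m = self.common.to_options();
        put(&mut m, "mergeSchema", self.merge_schema.map(|b| b.to_string()));
        m
    }
}
```

Keeping the shared fields in one struct means a new common option is added in a single place rather than in each of the five format structs.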

* Updated rustdocs.

* Fixed typo in rustdocs:  /// -  - Common file options...

* Updated README - DataFrameReader/Writer section.

---------

Co-authored-by: lexara-prime-ai <irfanghta@gmail.com>
lexara-prime-ai and lexara-prime-ai authored Oct 11, 2024
1 parent 2b138c6 commit 84f170a
Showing 2 changed files with 1,588 additions and 11 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -163,17 +163,17 @@ The following section outlines some of the larger functionality that are not yet

 |DataFrameReader |API |Comment |
 |------------------|----------|---------------------------------------|
-|csv |![open] | |
+|csv |![done] | |
 |format |![done] | |
-|json |![open] | |
+|json |![done] | |
 |load |![done] | |
 |option |![done] | |
 |options |![done] | |
-|orc |![open] | |
-|parquet |![open] | |
+|orc |![done] | |
+|parquet |![done] | |
 |schema |![done] | |
 |table |![done] | |
-|text |![open] | |
+|text |![done] | |

### DataStreamWriter

@@ -373,21 +373,21 @@ required jars
 |DataFrameWriter |API |Comment |
 |------------------|----------|---------------------------------------|
 |bucketBy |![done] | |
-|csv | | |
+|csv |![done] | |
 |format |![done] | |
 |insertInto |![done] | |
 |jdbc | | |
-|json | | |
+|json |![done] | |
 |mode |![done] | |
 |option |![done] | |
 |options |![done] | |
-|orc | | |
-|parquet | | |
+|orc |![done] | |
+|parquet |![done] | |
 |partitionBy | | |
 |save |![done] | |
 |saveAsTable |![done] | |
 |sortBy |![done] | |
-|text | | |
+|text |![done] | |

### DataFrameWriterV2
