  1. What are the pros and cons of the Apache Parquet format compared …

    Apr 24, 2016 · Parquet has gained significant traction outside of the Hadoop ecosystem. For example, the Delta Lake project is being built on Parquet files. Arrow is an important project that makes it easy …

  2. Reading / Fixing a corrupt parquet file - Stack Overflow

    Sep 3, 2024 · Thank you. I got pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file. when I tried to construct a ParquetFile instance. I …
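The error in this snippet points at Parquet's file framing: every valid Parquet file both begins and ends with the four ASCII bytes `PAR1`, and pyarrow raises `ArrowInvalid` when the footer magic is missing. A minimal stdlib sketch of that check (function name is illustrative, not a pyarrow API):

```python
import os

PARQUET_MAGIC = b"PAR1"

def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    A passing check does not guarantee a readable file, but a failing one
    reproduces the condition behind pyarrow's "magic bytes not found" error.
    """
    size = os.path.getsize(path)
    if size < 8:  # too small to hold header magic + footer magic
        return False
    with open(path, "rb") as f:
        header = f.read(4)
        f.seek(-4, os.SEEK_END)
        footer = f.read(4)
    return header == PARQUET_MAGIC and footer == PARQUET_MAGIC
```

A file that fails this check was typically truncated mid-write or is not Parquet at all; the footer (which holds the schema and row-group index) cannot be recovered from the data pages alone.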

  3. Inspect Parquet from command line - Stack Overflow

    How do I inspect the content of a Parquet file from the command line? The only option I see now is $ hadoop fs -get my-path local-file $ parquet-tools head local-file | less I would like to avoid

  4. Querying multiple parquet files in a range using duckdb

Apr 28, 2024 · I have parquet files arranged in this format /db/{year}/table{date}.parquet In each year folder, there are up to 365 files. If I want to query data from a time range, say the week 2024-04-28 …
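With one file per day, the date range maps directly to a list of paths that can be handed to a query engine. A stdlib sketch, assuming the hypothetical layout `/db/{year}/table{YYYY-MM-DD}.parquet` from the question (the exact date format in the real filenames is unknown):

```python
from datetime import date, timedelta

def paths_for_range(start, end, root="/db"):
    """Enumerate one per-day Parquet path for each date in [start, end].

    Handles ranges that cross a year boundary, since the year folder is
    derived from each date individually.
    """
    out = []
    d = start
    while d <= end:
        out.append(f"{root}/{d.year}/table{d.isoformat()}.parquet")
        d += timedelta(days=1)
    return out
```

DuckDB's `read_parquet` accepts a list of paths, so the resulting list can be passed straight into a single query over the whole range.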

  5. How to read partitioned parquet files from S3 using pyarrow in python

    Jul 13, 2017 · By setting dataset=True awswrangler expects partitioned parquet files. It will read all the individual parquet files from your partitions below the s3 key you specify in the path.
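The partition discovery that `dataset=True` enables relies on hive-style `key=value` segments in the S3 key. A minimal stdlib sketch of that parsing step (function name is illustrative, not part of awswrangler or pyarrow):

```python
def parse_hive_partitions(key):
    """Extract hive-style partition columns from an object-store key.

    e.g. 'data/year=2024/month=07/part-0.parquet' -> {'year': '2024', 'month': '07'}
    Values are returned as strings; readers like pyarrow cast them later.
    """
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            k, _, v = segment.partition("=")
            parts[k] = v
    return parts
```

This is why the partition columns appear in the resulting DataFrame even though they are absent from the individual Parquet files: they are reconstructed from the directory names.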

  6. Spark parquet partitioning : Large number of files

    Jun 28, 2017 · Spark 2.2+ From Spark 2.2 on, you can also play with the new option maxRecordsPerFile to limit the number of records per file if you have too large files. You will still get at least N files if you …
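The "at least N files" behaviour follows from simple arithmetic: `maxRecordsPerFile` only splits partitions that exceed the cap, so each of the N write partitions still emits at least one file. A sketch of the resulting file count per partition:

```python
import math

def files_per_partition(records_in_partition, max_records_per_file):
    """Files a single Spark write partition produces under maxRecordsPerFile.

    A partition always yields at least one file; an oversized partition is
    split into ceil(records / cap) files.
    """
    return max(1, math.ceil(records_in_partition / max_records_per_file))
```

So a 1,000,000-row partition with the option set to 100,000 yields 10 files, while a 50-row partition still yields one, which is why the option caps file size but cannot reduce the file count below the partition count.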

  7. Methods for writing Parquet files using Python? - Stack Overflow

    Oct 5, 2015 · I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction with it. Thus far …

  8. Reading partitioned multi-schema parquet files from S3 using Polars

    Jan 23, 2024 · Unfortunately scan_parquet doesn't have that option. The pl.concat method does allow for a vertical relaxed combination of frames so you can use that. There are two steps to this …

  9. Read multiple parquet files in a folder and write to single csv file ...

    Aug 5, 2018 · I need to read these parquet files starting from file1 in order and write it to a singe csv file. After writing contents of file1, file2 contents should be appended to same csv without header.
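The requirement (file1's contents first with a header, later files appended headerless) reduces to writing the header exactly once and opening the CSV in append mode. A stdlib sketch of that pattern; in practice the `rows` would come from reading each Parquet file in order, e.g. via `pandas.read_parquet`:

```python
import csv
import os

def append_rows_to_csv(csv_path, rows, header):
    """Append rows to csv_path, writing the header only on first creation.

    Calling this once per source file, in file order, yields a single CSV
    with one header line followed by every file's rows.
    """
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)
```

With pandas the same idea is `df.to_csv(path, mode="a", index=False, header=(i == 0))` inside the loop over files.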

  10. How to read a Parquet file into Pandas DataFrame?

    How to read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of dat...