Would it be fair to say that C++ knowledge is mandatory to create language bindings for Parquet?

There are lots of different questions in this thread :slight_smile: If you want to use the Parquet C++ library to write Parquet files, you must talk to it in a language it understands. Which means you probably need to give it Arrow data.

@antoine thanks for the pointer! I looked through that now, and here is how I interpreted how this would work. Am I on the right track with that? I’m still not sure whether I properly understand how this is all supposed to work :slight_smile:

We would in Julia allocate arrays and data structures that conform to the C Data interface layout. Right now I think that should be fairly straightforward. So then we have a pointer to one of these C Data interface structures.

Do I then call one of the functions from bridge.h and pass this pointer to the structures we allocated in Julia to it? And then I get something back from those functions that I can for example pass to the parquet C++ writer? And all of that wouldn’t require that a copy of the original array data structures has to be made?

That’s the idea… But the point of the C Data Interface is to be able to expose or ingest Arrow data without taking a dependency on the Arrow C++ library. If you already plan to take a dependency on the Arrow C++ library (for example because you want to use the C++ Parquet implementation), then I’m not sure taking a detour through the C Data Interface is useful. You can just as well construct a regular C++ Array instance around your data (unless you’re extremely uncomfortable with C++, but proficient in C, in which case the C Data Interface may help).

The example given in the spec may point you to the kind of scenarios where the C Data Interface is really useful. Say database engine FooDB wants to expose a C client API that gives out Arrow-compatible data, but without burdening itself with a dependency on Arrow C++ (because other client APIs are available). Then it can expose a C client API that basically gives out a C struct ArrowArray.
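To make the FooDB scenario concrete, here is a minimal sketch of what such a producer could look like without linking against Arrow C++. The `struct ArrowArray` layout is copied from the C Data Interface spec (which explicitly allows copying the definition); `foodb_export_int32`, `foodb_release` and `sum_int32` are hypothetical names made up for illustration. The companion `struct ArrowSchema` that describes the type (format string "i" for int32) is omitted for brevity, so treat this as an illustration of the ownership and zero-copy idea, not a complete exporter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Layout copied from the Arrow C Data Interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Release callback (hypothetical name): frees what the producer allocated,
// then marks the struct as released by clearing `release`, as the spec requires.
static void foodb_release(struct ArrowArray* array) {
  std::free(const_cast<void*>(array->buffers[1]));  // data buffer
  std::free(array->buffers);                        // table of buffer pointers
  array->release = nullptr;
}

// Hypothetical FooDB client call: exports n int32 values (0, 10, 20, ...)
// with no nulls. The caller owns `out` and must call out->release(out).
extern "C" void foodb_export_int32(int64_t n, struct ArrowArray* out) {
  std::size_t count = static_cast<std::size_t>(n);
  int32_t* values = static_cast<int32_t*>(std::malloc(sizeof(int32_t) * count));
  for (std::size_t i = 0; i < count; ++i) {
    values[i] = static_cast<int32_t>(i * 10);
  }

  // A primitive array has two buffers: validity bitmap, then data.
  const void** buffers =
      static_cast<const void**>(std::malloc(sizeof(void*) * 2));
  buffers[0] = nullptr;  // bitmap may be omitted when null_count == 0
  buffers[1] = values;

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = foodb_release;
  out->private_data = nullptr;
}

// Consumer side: reads the values through the exported pointer, no copy made.
int64_t sum_int32(const struct ArrowArray* array) {
  const int32_t* data = static_cast<const int32_t*>(array->buffers[1]);
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) {
    total += data[array->offset + i];
  }
  return total;
}
```

A consumer would declare a `struct ArrowArray arr`, call `foodb_export_int32(4, &arr)`, read the data (for example with `sum_int32(&arr)`), and finish with `arr.release(&arr)`. The same struct, filled in from Julia instead of C++, is what would be handed to an importer on the Arrow C++ side.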

OK, I am figuring things out as I go along. It seems the first step is to follow the steps in Building Arrow C++:

git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir release
cd release
cmake .. -DARROW_PARQUET=ON
make parquet

and this will build the parquet library used in here

Now I just need to come up with the table in

#include "parquet/arrow/writer.h"

std::shared_ptr<arrow::io::FileOutputStream> outfile;

parquet::arrow::WriteTable(table, arrow::default_memory_pool(), outfile, 3);

and turn that into a function so I can use CxxWrap.jl.

Now, it seems like a few things still need to be done.

  1. Write a Julia DataFrame into an Arrow blob structure (potentially leveraging Arrow.jl)
  2. Write the C++ function, wrapped with CxxWrap.jl, that calls the Parquet write function (yet to be written) to write the Arrow blob into a Parquet file

These are notes for me in case I forget. They are also for others to let me know whether I am on the right track.

Hey @evalparse, I found your “Arrow C++ for the completely clueless” post here, and it helped me get started with reading and writing Parquet files in C++. Exploring further, I wanted to encrypt the Parquet files, so I looked into the built-in encryption-reader-writer code in the examples. I am getting an error at the CMake build step, which is in cpp/examples/parquet/.

Error:
-- Building using CMake version: 3.10.2
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sandesh/apachearrow/arrow/cpp/examples/parquet/encryptlib
[ 12%] Linking CXX executable parquet-stream-api-example
/usr/bin/ld: cannot find -lparquet_static
collect2: error: ld returned 1 exit status
CMakeFiles/parquet-stream-api-example.dir/build.make:94: recipe for target ‘parquet-stream-api-example’ failed
make[2]: *** [parquet-stream-api-example] Error 1
CMakeFiles/Makefile2:67: recipe for target ‘CMakeFiles/parquet-stream-api-example.dir/all’ failed
make[1]: *** [CMakeFiles/parquet-stream-api-example.dir/all] Error 2
Makefile:83: recipe for target ‘all’ failed
make: *** [all] Error 2

Also, when I compile the encryption code directly and then run it, it fails at runtime with an error saying it was built without OpenSSL, so I think that is directly linked to the CMake build.

Have you run into this? If so, your guidance would be appreciated.

No idea. Sorry. I ended up writing a Parquet writer in pure Julia.