Querying Apache Parquet Data Files (AWS)

Justin.Finley · April 28, 2021, 3:19pm

I did a quick search and didn’t see anything pop up, so I wanted to ask whether anyone else has been able to query Parquet-formatted data in Ignition. I am looking at querying historical measurement data that has been exported from AWS SiteWise to S3 as a set of .parquet files.

I am able to pull a file down to my PC via an Ignition HTTP request, and I can manually run a SQL query against it with Apache Drill. However, I have not figured out whether I can query it from within Ignition, or whether I would need to convert it to CSV or another format first.
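One possible route that avoids converting the file: Drill exposes a REST endpoint that accepts SQL as JSON, so the same query that works manually could in principle be sent over HTTP. The snippet below is only a sketch in standalone Python 3, not Ignition's Jython. It assumes Drill is running locally on its default web port (8047), and the helper names and the file path are made up for illustration.

```python
import json
import urllib.request

# Assumption: Drill runs locally on its default web port.
DRILL_URL = "http://localhost:8047/query.json"


def parquet_query(path, limit=10):
    """Build a Drill SQL statement; backticks quote the file path."""
    return "SELECT * FROM dfs.`{}` LIMIT {}".format(path, int(limit))


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, url=DRILL_URL, timeout=30):
    """Send the query and return (columns, rows) from Drill's JSON response."""
    req = build_drill_request(sql, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body.get("columns", []), body.get("rows", [])


if __name__ == "__main__":
    # Hypothetical path to a downloaded SiteWise export.
    columns, rows = run_drill_query(parquet_query("/tmp/measurements.parquet"))
    print(columns)
    for row in rows:
        print(row)
```

Inside Ignition the same JSON body could presumably be posted with the built-in HTTP scripting functions instead of `urllib`, but that part is untested here.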

Appreciate any suggestions, thanks.
Justin

Justin.Finley · April 28, 2021, 5:35pm

I should add that I have also had some success reading the file directly in a Python script after installing Apache Spark. So I’m also trying to see how feasible it would be to import Spark and its dependencies into Ignition. I’m open to feedback if there is a more efficient way to do it.
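Before bringing Spark into the picture, it can help to confirm that a file pulled down over HTTP is a complete Parquet file. Below is a minimal sketch using only the Python 3 standard library. It relies on the Parquet file layout: a 4-byte `PAR1` magic at both ends, with a 4-byte little-endian footer length just before the trailing magic. The function name `parquet_footer_length` is made up for illustration, and the check only validates the framing; it does not read any data.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"
# Leading magic (4) + footer length (4) + trailing magic (4).
MIN_SIZE = 12


def parquet_footer_length(path):
    """Return the footer metadata length, or raise ValueError if the
    file does not have valid Parquet framing."""
    size = os.path.getsize(path)
    if size < MIN_SIZE:
        raise ValueError("file is too small to be a Parquet file")

    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)

    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes (truncated or not Parquet)")

    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - MIN_SIZE:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

A truncated download or an HTML error page saved under a .parquet name would fail this check, which is cheaper to find out here than from a Spark stack trace.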
Thanks

paul-griffith · April 28, 2021, 5:42pm

You will probably have the best results, and certainly the best performance, by implementing a custom module that loads the first-party Java library for Parquet. You might be able to get by with the hacky workaround mentioned on the forums (dropping a built third-party .jar in the right filesystem location so the gateway automatically recognizes it), but a module is the recommended approach.

Justin.Finley · April 28, 2021, 5:55pm

Gotcha, thanks, I’ll check that out.