An alternative approach to reading JSON in Pentaho Data Integration

Reading JSON via JavaScript in Java is slow

Probably no surprise there, right? But if you’ve ever worked with the JSONInput step in Kettle, you know that it’s anything but a time saver.
After a lot of research and extensive benchmarking, we identified Kettle’s JSONInput step as our largest bottleneck. Its performance troubles are well documented: JIRA issues dating back as far as 2012 (PDI-8809: Investigate JSON parsing performance improvement) sit alongside requests for a streaming solution (PDI-9785), for the ability to handle large datasets (PDI-10858), and for a rewrite of the step to use a native library (PDI-10344).

Reading JSON via Java in Java is faster

I just wanted to point that out one last time. But seriously, that is how much faster a native Java library can make things. We couldn’t be happier with the performance improvements in this FastJSON implementation. By reducing both the runtime and the memory needed to parse JSON and process it through Kettle, we have made sure that our ETL processes stay performant and reliable while keeping our Product Managers’ development time low and (relatively) pain-free.