Use JSON Input step to process uneven data

I’m trying to process the following with an JSON Input step:

<span class="pun">{</span><span class="str">"address"</span><span class="pun">:[</span>
  <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_1"</span><span class="pun">,</span><span class="str">"Street"</span><span class="pun">:</span><span class="str">"A Street"</span><span class="pun">},</span>
  <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_101"</span><span class="pun">,</span><span class="str">"Street"</span><span class="pun">:</span><span class="str">"Another Street"</span><span class="pun">},</span>
  <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_102"</span><span class="pun">,</span><span class="str">"Street"</span><span class="pun">:</span><span class="str">"One more street"</span><span class="pun">,</span> <span class="str">"Locality"</span><span class="pun">:</span><span class="str">"Buenos Aires"</span><span class="pun">},</span>
  <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_102"</span><span class="pun">,</span><span class="str">"Locality"</span><span class="pun">:</span><span class="str">"New York"</span><span class="pun">}</span>
<span class="pun">]}</span>

However this seems not to be possible:

<span class="typ">Json</span> <span class="typ">Input</span><span class="pun">.</span><span class="lit">0</span> <span class="pun">-</span><span class="pln"> ERROR </span><span class="pun">(</span><span class="pln">version </span><span class="lit">4.2</span><span class="pun">.</span><span class="lit">1</span><span class="pun">-</span><span class="pln">stable</span><span class="pun">,</span><span class="pln"> build </span><span class="lit">15952</span> <span class="kwd">from</span> <span class="lit">2011</span><span class="pun">-</span><span class="lit">10</span><span class="pun">-</span><span class="lit">25</span> <span class="lit">15.27</span><span class="pun">.</span><span class="lit">10</span> <span class="kwd">by</span><span class="pln"> buildguy</span><span class="pun">)</span> <span class="pun">:</span> 
<span class="typ">The</span><span class="pln"> data structure </span><span class="kwd">is</span> <span class="kwd">not</span><span class="pln"> the same inside the resource</span><span class="pun">!</span> 
<span class="typ">We</span><span class="pln"> found </span><span class="lit">1</span><span class="pln"> values </span><span class="kwd">for</span><span class="pln"> json path </span><span class="pun">[</span><span class="pln">$</span><span class="pun">..</span><span class="typ">Locality</span><span class="pun">],</span><span class="pln"> which </span><span class="kwd">is</span><span class="pln"> different that the number retourned </span><span class="kwd">for</span><span class="pln"> path </span><span class="pun">[</span><span class="pln">$</span><span class="pun">..</span><span class="typ">Street</span><span class="pun">]</span> <span class="pun">(</span><span class="lit">3509</span><span class="pln"> values</span><span class="pun">).</span> 
<span class="typ">We</span><span class="pln"> MUST have the same number of values </span><span class="kwd">for</span><span class="pln"> all paths</span><span class="pun">.</span>

The step provides Ignore Missing Path flag but it only works if all the rows misses the same path. In that case that step acts as as expected an fills the missing values with null.

This limits the power of this step to read uneven data, which was really one of my priorities.

My step Fields are defined as follows:

JSON Input Fields definition

Am I missing something? Is this the correct behavior?

 

What I have done is use JSON Input using $.address[*] to read to a jsonRow field the full map of each element p.e:

<span class="pun">{</span><span class="str">"address"</span><span class="pun">:[</span>
    <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_1"</span><span class="pun">,</span><span class="str">"Street"</span><span class="pun">:</span><span class="str">"A Street"</span><span class="pun">},</span>  
    <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_101"</span><span class="pun">,</span><span class="str">"Street"</span><span class="pun">:</span><span class="str">"Another Street"</span><span class="pun">},</span>  
    <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_102"</span><span class="pun">,</span><span class="str">"Street"</span><span class="pun">:</span><span class="str">"One more street"</span><span class="pun">,</span> <span class="str">"Locality"</span><span class="pun">:</span><span class="str">"Buenos Aires"</span><span class="pun">},</span>   
    <span class="pun">{</span><span class="str">"AddressId"</span><span class="pun">:</span><span class="str">"1_102"</span><span class="pun">,</span><span class="str">"Locality"</span><span class="pun">:</span><span class="str">"New York"</span><span class="pun">}</span> 
<span class="pun">]}</span>

This results in 4 jsonRows one for each element, p.e. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then using a Javascript step I map my values using this:

<span class="kwd">var</span> <span class="typ">AddressId</span> <span class="pun">=</span><span class="pln"> getFromMap</span><span class="pun">(</span><span class="str">'AddressId'</span><span class="pun">,</span><span class="pln"> jsonRow</span><span class="pun">);</span>
<span class="kwd">var</span> <span class="typ">Street</span> <span class="pun">=</span><span class="pln"> getFromMap</span><span class="pun">(</span><span class="str">'Street'</span><span class="pun">,</span><span class="pln"> jsonRow</span><span class="pun">);</span>
<span class="kwd">var</span> <span class="typ">Locality</span> <span class="pun">=</span><span class="pln"> getFromMap</span><span class="pun">(</span><span class="str">'Locality'</span><span class="pun">,</span><span class="pln"> jsonRow</span><span class="pun">);</span>

In a second script tab I inserted minified JSON parse code from https://github.com/douglascrockford/JSON-js and the getFromMap function:

<span class="kwd">function</span><span class="pln"> getFromMap</span><span class="pun">(</span><span class="pln">key</span><span class="pun">,</span><span class="pln">jsonRow</span><span class="pun">){</span>
  <span class="kwd">try</span><span class="pun">{</span>
   <span class="kwd">var</span><span class="pln"> map </span><span class="pun">=</span><span class="pln"> JSON</span><span class="pun">.</span><span class="pln">parse</span><span class="pun">(</span><span class="pln">jsonRow</span><span class="pun">);</span>
  <span class="pun">}</span>
  <span class="kwd">catch</span><span class="pun">(</span><span class="pln">e</span><span class="pun">){</span>
   <span class="kwd">var</span><span class="pln"> message </span><span class="pun">=</span> <span class="str">"Unparsable JSON: "</span><span class="pun">+</span><span class="pln">jsonRow</span><span class="pun">+</span><span class="str">" Desc: "</span><span class="pun">+</span><span class="pln">e</span><span class="pun">.</span><span class="pln">message</span><span class="pun">;</span>
   <span class="kwd">var</span><span class="pln"> nr_errors </span><span class="pun">=</span> <span class="lit">1</span><span class="pun">;</span>
   <span class="kwd">var</span><span class="pln"> field </span><span class="pun">=</span> <span class="str">"jsonRow"</span><span class="pun">;</span>
   <span class="kwd">var</span><span class="pln"> errcode </span><span class="pun">=</span> <span class="str">"JSON_PARSE"</span><span class="pun">;</span><span class="pln">
   _step_</span><span class="pun">.</span><span class="pln">putError</span><span class="pun">(</span><span class="pln">getInputRowMeta</span><span class="pun">(),</span><span class="pln"> row</span><span class="pun">,</span><span class="pln"> nr_errors</span><span class="pun">,</span><span class="pln"> message</span><span class="pun">,</span><span class="pln"> field</span><span class="pun">,</span><span class="pln"> errcode</span><span class="pun">);</span><span class="pln">
   trans_Status </span><span class="pun">=</span><span class="pln"> SKIP_TRANSFORMATION</span><span class="pun">;</span>
   <span class="kwd">return</span> <span class="kwd">null</span><span class="pun">;</span>
  <span class="pun">}</span>

  <span class="kwd">if</span><span class="pun">(</span><span class="pln">map</span><span class="pun">[</span><span class="pln">key</span><span class="pun">]</span> <span class="pun">==</span> <span class="kwd">undefined</span><span class="pun">){</span>
   <span class="kwd">return</span> <span class="kwd">null</span><span class="pun">;</span>
  <span class="pun">}</span><span class="pln">
  trans_Status </span><span class="pun">=</span><span class="pln"> CONTINUE_TRANSFORMATION</span><span class="pun">;</span>
  <span class="kwd">return</span><span class="pln"> map</span><span class="pun">[</span><span class="pln">key</span><span class="pun">]</span>
<span class="pun">}</span>

merging many json files into one

jq solution:

jq -s '{ attributes: map(.attributes[0]) }' file*.json
  • -s (--slurp) – instead of running the filter for each JSON object in the input, read the entire input stream into a large array and run the filter just once.

Sample output:

{
  "attributes": [
    {
      "name": "Node",
      "value": "test"
    },
    {
      "name": "version",
      "value": "11.1"
    }
  ]
}