Extract Nested Data From Complex JSON

Never manually walk through complex JSON objects again by using this function

# recursivejson.py

def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

Google SheetAsJSON + Filtering

This is an extension of DJ Adams’ excellent SheetAsJSON Google Apps Script, which provides a way to GET a published Google Spreadsheet as a JSON feed. This version allows generic filtering for terms, more specific control over which rows to parse, and correct MIME type for JSONP output.

Minimal Usage

The following parameters are required for the script to work.

https://script.google.com/macros/s/AKfycbzGvKKUIaqsMuCj7-A2YRhR-f7GZjl4kSxSN1YyLkS01_CfiyE/exec?
+ id=<spreadsheet key>
+ sheet=<sheet name on spreadsheet>

Per the original, the above script serves a representation of all the sheet’s data as JSON, using the first row as the set of keys:

{ records : [
    { (row1, column1): (row2, column1), (row1, column2): (row2, column2), ..},
    { (row1, column1): (row3, column1), (row1, column2): (row3, column2), ..},
    ...
  ]
}

Try it: https://script.google.com/macros/s/AKfycbzGvKKUIaqsMuCj7-A2YRhR-f7GZjl4kSxSN1YyLkS01_CfiyE/exec?id=0AgviZ9NWh5fvdDdNMlI2aXRCR2lCX1B1alZ6ZjZxSkE&sheet=Summary&header=2&startRow=3

Replacing EAV with JSONB in PostgreSQL

Enter Entity-Attribute-Value. I’ve seen this pattern in almost every database I’ve worked with. One table contains the entities, another table contains the names of the properties (attributes) and a third table links the entities with their attributes and holds the value. This gives you the flexibility for having different sets of properties (attributes) for different entities, and also for adding properties on the flywithout locking your table for 3 days.

Nonetheless, I wouldn’t be writing this post if there were no downsides to this approach. Selecting one or more entities based on 1 attribute value requires 2 joins: one with the attribute table and one with the value table. Need entities bases on 2 attributes? That’s 4 joins! Also, the properties usually are all stored as strings, which results in type casting, both for the result as for the WHERE clause. If you write a lot of ad-hoc queries, this is very tedious.

Despite these obvious shortcomings, EAV has been used for a long time to solve this kind of problem. It was a necessary evil, and there just was no better alternative. But then PostgreSQL came along with a new feature…

Starting from PostgreSQL 9.4, a JSONB datatype was added for storing binary JSON data.

Trying JSON in Django and PostgreSQL (and compare with MongoDB)

class Product(models.Model):
    name = models.CharField(max_length=100)
    category = models.ForeignKey(Category)
    price = models.IntegerField()
    attributes = JSONField()

    def __str__(self):
        return self.name

 

Product.objects.create(name='Bamboo tshirt', category=tshirt, price=120, attributes={
    'colors': ['white', 'yellow'],
    'sizes': ['M', 'L', 'XL'],
    'model': 'poet',
    'material': 'bamboo',
})