SELECT * FROM histogram($table_name_or_subquery, $column_name);
. . . to give sweet results like this, in a check of the distribution of 2016 political contributions in Vermont:
fec=# SELECT * FROM histogram('(SELECT * FROM small_donors_vt LIMIT 50000)', 'transaction_amt'); bucket | range | freq | bar --------+-----------+------+----------------- 1 | [0,9] | 2744 | ****** 2 | [10,19] | 5630 | ************* 3 | [20,29] | 6383 | *************** 4 | [30,39] | 1290 | *** 5 | [40,49] | 369 | * 6 | [50,59] | 3541 | ******** 7 | [60,69] | 174 | 8 | [70,79] | 313 | * 9 | [80,89] | 171 | 10 | [90,99] | 65 | 11 | [100,109] | 2363 | ****** 12 | [110,119] | 51 | 13 | [120,129] | 115 | 14 | [130,139] | 32 | 15 | [140,146] | 11 | 16 | [150,159] | 187 | 17 | [160,169] | 24 | 18 | [170,177] | 33 | 19 | [180,189] | 19 | 20 | [191,199] | 24 | 21 | [200,200] | 795 | **
PostgreSQL, Aggregates and Histograms
Let’s have a look at our dataset from the NBA games and statistics, and get back to counting rebounds in the
drb
field. A preliminary query informs us that we have stats ranging from 10 to 54 rebounds per team in a single game, a good information we can use in the following query:<span class="keyword">select</span> width_bucket(drb, <span class="number">10</span>, <span class="number">54</span>, <span class="number">9</span>) <span class="keyword">as</span> buckets, <span class="keyword">count</span>(*) <span class="keyword">from</span> team_stats <span class="keyword">group</span> <span class="keyword">by</span> buckets <span class="keyword">order</span> <span class="keyword">by</span> buckets;
We asked for 9 separations so we have 10 groups as a result:
width_bucket | count --------------+------- 1 | 52 2 | 1363 3 | 8832 4 | 20917 5 | 20681 6 | 9166 7 | 2093 8 | 247 9 | 20 10 | 1 (10 rows)