Introduction to K-means Clustering

This introduction to the K-means clustering algorithm covers:

  • Common business cases where K-means is used
  • The steps involved in running the algorithm
  • A Python example using delivery fleet data

 

import numpy as np
from sklearn.cluster import KMeans

# For the purposes of this example, we store feature data from our
# dataframe `df` in the `f1` and `f2` arrays. We combine these into
# a feature matrix `X` before passing it to the algorithm.
# Note: `.values` is an attribute of a pandas Series, not a method.
f1 = df['Speeding Feature'].values
f2 = df['Distance Feature'].values

# Stack the two 1-D arrays as columns: one row per driver, one column per feature.
X = np.column_stack((f1, f2))
kmeans = KMeans(n_clusters=2).fit(X)
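
To make the steps behind `fit` more concrete, here is a minimal NumPy sketch of the classic assignment/update loop (Lloyd's algorithm). It is an illustration only, not scikit-learn's implementation, which by default uses k-means++ initialisation and other refinements. The function name `kmeans_sketch` and the `X_demo` values are made up for this example and are not taken from the delivery fleet data.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct rows of X at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up data: two well-separated groups of (speeding, distance) values.
X_demo = np.array([[5.0, 50.0], [6.0, 55.0], [4.0, 45.0],
                   [30.0, 180.0], [32.0, 175.0], [28.0, 185.0]])
labels, centroids = kmeans_sketch(X_demo, k=2)
print(labels)
print(centroids)
```

Which group ends up with label 0 and which with label 1 depends on the random initialisation, so only the grouping itself is meaningful.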

Tracking Down a Freaky Python Memory Leak

To track the memory allocated by my Python process, I turned to Performance Monitor.

Here’s what I did:

  1. I launched Performance Monitor (perfmon.msc).
  2. I created a new Data Collector Set and added the Process > Private Bytes counter for all Python instances.
  3. I started the Data Collector Set.
  4. I let Performance Monitor collect data for several minutes.
  5. I exported the data contained in the generated report to a CSV file.
  6. I opened the CSV file in Excel and created a chart for the process that I suspected had the memory leak.
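
As an alternative to Excel for the last step, the exported CSV could also be inspected with a few lines of Python. The sketch below assumes the export has a header row, one timestamp column, and one column per process whose header contains the text "Private Bytes". The sample rows, the host name `HOST`, and the function names are hypothetical and only stand in for a real export.

```python
import csv
import io

def private_bytes_series(csv_file, column_hint="Private Bytes"):
    """Return {column name: [values]} for each column whose header contains column_hint."""
    reader = csv.reader(csv_file)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if column_hint in name]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            # Skip blank or non-numeric cells instead of failing on them.
            try:
                series[header[i]].append(float(row[i]))
            except (ValueError, IndexError):
                continue
    return series

def growth(values):
    """Difference between the last and first sample, in bytes."""
    return values[-1] - values[0] if values else 0.0

# Hypothetical sample standing in for the exported report.
sample_lines = [
    r'"Time","\\HOST\Process(python)\Private Bytes","\\HOST\Process(python#1)\Private Bytes"',
    '"00:00:00","1000000","2000000"',
    '"00:00:15","1000000","2500000"',
    '"00:00:30","1000000","3000000"',
]
series = private_bytes_series(io.StringIO("\n".join(sample_lines)))
report = {name: growth(values) for name, values in series.items()}
suspect = max(report, key=report.get)  # the column that grew the most
print(suspect, report[suspect])
```

For a real export, the `io.StringIO(...)` argument would be replaced by a file opened with `open(path, newline="")`. A process whose Private Bytes value keeps climbing across samples is the one worth charting.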

AsyncIO for the Working Python Developer

By using yield from on another coroutine, we declare that the coroutine may give control back to the event loop. In this case, sleep yields and the event loop switches context to the next task scheduled for execution: bar. Similarly, bar yields from sleep, which allows the event loop to pass control back to foo at the point where it yielded, as happens with all generators.
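
The hand-off between foo and bar can be sketched as follows. Note that this sketch uses the newer `async`/`await` syntax rather than `yield from`, because generator-based coroutines (`@asyncio.coroutine`) were removed in Python 3.11. The suspension points behave the same way. The sleep durations and the `events` list are assumptions added purely to make the ordering visible.

```python
import asyncio

events = []  # records the order in which each coroutine runs

async def foo():
    events.append("foo: start")
    # foo suspends here; the event loop switches to the next ready task (bar).
    await asyncio.sleep(0.05)
    events.append("foo: resumed")

async def bar():
    events.append("bar: start")
    # bar suspends too, so the loop can resume foo once its sleep has elapsed.
    await asyncio.sleep(0.1)
    events.append("bar: resumed")

async def main():
    # Schedule both coroutines on the same event loop and wait for both.
    await asyncio.gather(foo(), bar())

asyncio.run(main())
print(events)
```

Each coroutine resumes at the exact line where it awaited, which is the same behaviour the generator-based version gets from `yield from`.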