BIC and AIC in Python using scipy.cluster.vq kmeans

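def aic(data, distortion, clusterNumber):
    # distortion: the k-means error; clusterNumber: k; len(data[0]): vector length d
    # AIC-style score: the error plus 2 * (number of fitted values, k*d)
    return distortion + 2 * clusterNumber * len(data[0])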

Quote from people using R:

To compute BIC, add 0.5*k*d*log(n) (where k is the number of means, d is the length of a vector in your dataset, and n is the number of data points) to the standard k-means error function.

The standard k-means penalty is \sum_n (m_{k(n)} - x_n)^2, where m_{k(n)} is the mean associated with the nth data point. This penalty can be interpreted as a log probability, so BIC is perfectly valid.

BIC just adds an additional penalty term to the k-means error proportional to k.
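To make the quoted formula concrete, here is a small worked sketch in plain Python. The helper names (`kmeans_sse`, `bic_penalty`) and the toy numbers are made up for illustration; they are not from R or SciPy.

```python
import math

def kmeans_sse(points, means, labels):
    """Sum of squared distances from each point to its assigned mean."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, means[lab]))
        for p, lab in zip(points, labels)
    )

def bic_penalty(k, d, n):
    """The 0.5*k*d*log(n) term from the quote (hypothetical helper)."""
    return 0.5 * k * d * math.log(n)

# Toy data (made-up numbers): four 2-D points assigned to two means.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
means = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

sse = kmeans_sse(points, means, labels)                   # every point is 1 away from its mean
penalty = bic_penalty(k=len(means), d=2, n=len(points))   # 0.5 * 2 * 2 * log(4)
bic_value = sse + penalty
```

Here k = 2, d = 2 and n = 4, so the penalty only depends on the shape of the problem, not on how well the means fit.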


def bic(data, distortion, clusterNumber):
    import numpy as np
    import math
    if not isinstance(data, np.ndarray):
        print('invalid data type in bic')
        return 0
    # n is the number of data points (rows); data.size would count all n*d elements
    n = data.shape[0]
    return distortion + 0.5 * math.log(n) * clusterNumber * len(data[0])
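Here is a rough sketch of how the two scores might be computed over a range of k with scipy.cluster.vq. One thing to watch, as far as I can tell from the SciPy docs: the second value returned by `kmeans` is the mean (non-squared) distance to the centroids, not the sum of squared errors that the quote talks about, so this sketch recomputes the squared error with `vq`. The helper name `sse_for_k` and the random blobs are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def sse_for_k(data, k):
    """Run k-means, then return (sum of squared errors, number of centroids kept)."""
    codebook, _ = kmeans(data, k)   # second value is a mean distance, not the SSE
    _, dists = vq(data, codebook)   # distance from each row to its nearest centroid
    return float(np.sum(dists ** 2)), len(codebook)

# Made-up data: three 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
n, d = data.shape

scores = {}
for k in range(1, 7):
    sse, k_found = sse_for_k(data, k)   # k_found can be smaller than k if a cluster ends up empty
    scores[k] = {
        "sse": sse,
        "aic": sse + 2 * k_found * d,
        "bic": sse + 0.5 * k_found * d * np.log(n),
    }

for k, s in scores.items():
    print(k, round(s["sse"], 1), round(s["aic"], 1), round(s["bic"], 1))
```

The usual reading is that a lower score is better, but with the raw squared error the penalty terms can be small next to the error itself, so it is worth looking at the whole table rather than trusting the minimum blindly.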

