# BIC and AIC in Python using scipy.cluster.vq kmeans

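    def aic(data, distortion, clusterNumber):
        # distortion: the k-means error for this clustering
        # penalty: 2 * k * d, where d = len(data[0]) is the vector length
        return distortion + 2 * clusterNumber * len(data[0])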

A quote from people using R:

To compute BIC, add 0.5*k*d*log(n) (where k is the number of means, d is the length of a vector in your dataset, and n is the number of data points) to the standard k-means error function.

The standard k-means penalty is \sum_n (m_{k(n)} - x_n)^2, where m_{k(n)} is the mean associated with the nth data point. This penalty can be interpreted as a log probability, so BIC is perfectly valid.

BIC just adds an additional penalty term to the k-means error proportional to k.
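Written out, the quoted recipe and the `aic` function above amount to the following. This is only a transcription: the labels J, BIC and AIC are introduced here, and the sum index is renamed to i so that it does not clash with n, the number of data points.

```latex
J = \sum_{i=1}^{n} \lVert x_i - m_{k(i)} \rVert^2,
\qquad
\mathrm{BIC} = J + \tfrac{1}{2}\, k\, d\, \log n,
\qquad
\mathrm{AIC} = J + 2\, k\, d
```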

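    import math
    import numpy as np

    def bic(data, distortion, clusterNumber):
        # data: 2-D numpy array of shape (n, d); distortion: the k-means error
        if not isinstance(data, np.ndarray):
            raise TypeError('invalid data type in bic: expected a numpy array')
        n, d = data.shape  # n data points, each a vector of length d
        # use n here, not data.size, which counts all n*d elements
        return distortion + 0.5 * math.log(n) * clusterNumber * d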
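As a rough usage sketch, here is one way the scores could be used to pick the number of clusters with `scipy.cluster.vq`. It assumes the error term should be the sum of squared distances described in the quote. The `distortion` value returned by `kmeans` is documented as a mean of non-squared distances, so the sketch recomputes the error with `vq` instead. The helper names `sum_squared_error` and `bic_from_centroids` and the synthetic data are made up for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def sum_squared_error(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    _, dists = vq(data, centroids)  # vq returns non-squared Euclidean distances
    return float(np.sum(dists ** 2))


def bic_from_centroids(data, centroids):
    """Hypothetical helper: k-means error plus the 0.5*k*d*log(n) penalty."""
    n, d = data.shape
    k = len(centroids)
    return sum_squared_error(data, centroids) + 0.5 * k * d * np.log(n)


# Synthetic data (an assumption for this demo): three tight 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.1, size=(100, 2)) for c in centers])

scores = {}
for k in range(1, 7):
    # iter=50 restarts k-means 50 times and keeps the best codebook
    centroids, _ = kmeans(data, k, iter=50)
    # kmeans may return fewer than k centroids, so key by the actual count
    k_found = len(centroids)
    score = bic_from_centroids(data, centroids)
    scores[k_found] = min(score, scores.get(k_found, np.inf))

best_k = min(scores, key=scores.get)
print(best_k, scores)
```

Note that the raw squared error is not divided by any variance estimate, so the selected k depends on the scale of the data. Rescaling the features changes the balance between the error and the penalty.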
