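    def aic(data, distortion, clusterNumber):
        # AIC-style score: k-means distortion plus 2 * k * d, where
        # k = clusterNumber and d = len(data[0]) (length of one data vector).
        return distortion + 2 * clusterNumber * len(data[0])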

Quote from people using R:

To compute BIC, add `.5*k*d*log(n)` (where `k` is the number of means, `d` is the length of a vector in your dataset, and `n` is the number of data points) to the standard k-means error function.

The standard k-means penalty is `\sum_n (m_k(n)-x_n)^2`, where `m_k(n)` is the mean associated with the nth data point. This penalty *can* be interpreted as a log probability, so BIC is perfectly valid.

BIC just adds an additional penalty term to the k-means error proportional to `k`.
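
As a rough illustration of the quoted recipe, the sketch below computes the k-means error and adds the `.5*k*d*log(n)` term in plain Python. The function names (`kmeans_error`, `bic_score`), the toy points, and the hand-picked means are assumptions made for this example and are not part of the quoted answer; each point is simply assigned to its nearest mean.

```python
import math

def kmeans_error(points, means):
    """Sum over points of the squared distance to the nearest mean."""
    total = 0.0
    for x in points:
        total += min(
            sum((xi - mi) ** 2 for xi, mi in zip(x, m))
            for m in means
        )
    return total

def bic_score(points, means):
    """k-means error plus 0.5 * k * d * log(n), as in the quote."""
    k = len(means)        # number of means
    d = len(points[0])    # length of one data vector
    n = len(points)       # number of data points
    return kmeans_error(points, means) + 0.5 * k * d * math.log(n)

# Toy data (made up for illustration): two tight groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hand-picked candidate means; a real run would take these from k-means.
one_mean = [(16 / 3, 16 / 3)]
two_means = [(1 / 3, 1 / 3), (31 / 3, 31 / 3)]

for means in (one_mean, two_means):
    print(len(means), round(bic_score(points, means), 3))
```

On this toy data the two-mean candidate gets the lower score, since the drop in error far outweighs the extra penalty for the second mean.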

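    def bic(data, distortion, clusterNumber):
        import numpy as np
        import math
        # Only NumPy arrays are supported.
        if not isinstance(data, np.ndarray):
            print('invalid data type in bic')
            return 0
        # BIC-style score: distortion + 0.5 * k * d * log(n), where n is the
        # number of data points (rows), not data.size (which is n * d).
        return distortion + 0.5 * math.log(len(data)) * clusterNumber * len(data[0])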
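
To see how the two scores differ, it can help to compare only their penalty terms. The sketch below uses hypothetical helper names (`aic_penalty`, `bic_penalty`) that mirror the `2*k*d` term in `aic` and the quoted `.5*k*d*log(n)` recipe, with `n` assumed to be the number of data points. Under those assumptions the BIC penalty exceeds the AIC one only once `0.5*log(n) > 2`, that is, for `n` of 55 or more.

```python
import math

def aic_penalty(k, d):
    # Penalty term used by aic(): 2 * k * d.
    return 2 * k * d

def bic_penalty(k, d, n):
    # Penalty term from the quoted recipe: 0.5 * k * d * log(n),
    # with n assumed to be the number of data points.
    return 0.5 * k * d * math.log(n)

k, d = 3, 2  # illustrative values: 3 clusters, 2-D vectors
for n in (10, 54, 55, 1000):
    heavier = "BIC" if bic_penalty(k, d, n) > aic_penalty(k, d) else "AIC"
    print(n, aic_penalty(k, d), round(bic_penalty(k, d, n), 3), heavier)
```

So on small datasets this form of BIC may actually favor more clusters than the AIC variant, which is worth keeping in mind when the two disagree.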