content/snippets/js/s/k-means.md
The K-means clustering algorithm is a popular unsupervised machine learning algorithm used to group a set of data into clusters. It works by iteratively assigning data points to the nearest cluster centroid and then recalculating the centroids based on the new assignments. This process is repeated until the centroids no longer change significantly or a maximum number of iterations is reached.
This implementation of the K-means clustering algorithm groups the given data into k clusters. No maximum number of iterations is set, so the algorithm will run until convergence is reached.
k data points as the initial centroids, using Array.prototype.slice().distances array to store the distances between each data point and each centroid, as well as the classes array to store the cluster assignments for each data point.while loop to repeat the assignment and update steps as long as there are changes in the previous iteration, as indicated by the itr variable.Math.hypot(), Object.keys(), and Array.prototype.map().Array.prototype.indexOf() and Math.min() to find the closest centroid for each data point.classes array and check if any changes were made.Array.from(), Array.prototype.reduce(), Number.parseFloat(), and Number.prototype.toFixed().const kMeans = (data, k = 1) => {
const centroids = data.slice(0, k);
const distances = Array.from({ length: data.length }, () =>
Array.from({ length: k }, () => 0)
);
const classes = Array.from({ length: data.length }, () => -1);
let itr = true;
while (itr) {
itr = false;
for (let d in data) {
for (let c = 0; c < k; c++) {
distances[d][c] = Math.hypot(
...Object.keys(data[0]).map(key => data[d][key] - centroids[c][key])
);
}
const m = distances[d].indexOf(Math.min(...distances[d]));
if (classes[d] !== m) itr = true;
classes[d] = m;
}
for (let c = 0; c < k; c++) {
centroids[c] = Array.from({ length: data[0].length }, () => 0);
const size = data.reduce((acc, _, d) => {
if (classes[d] === c) {
acc++;
for (let i in data[0]) centroids[c][i] += data[d][i];
}
return acc;
}, 0);
for (let i in data[0]) {
centroids[c][i] = Number.parseFloat(
Number(centroids[c][i] / size).toFixed(2)
);
}
}
}
return classes;
};
kMeans([[0, 0], [0, 1], [1, 3], [2, 0]], 2); // [0, 1, 1, 0]
The time complexity of the K-means clustering algorithm is O(n * k * d * i), where n is the number of data points, k is the number of clusters, d is the number of dimensions in the data, and i is the number of iterations until convergence. The space complexity is O(n * k + n * d), where the first term represents the space used by the centroids and classes arrays, and the second term represents the space used by the distances array.