KNN vs K-means

KNN stands for K-nearest neighbors. It is a classification (or regression) algorithm that determines the label of a point by combining the labels of its K nearest points, typically by majority vote for classification or by averaging for regression. It is supervised because you classify a point based on the known labels of other points.
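The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the Euclidean distance metric, and the toy data are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank every training point by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy labeled data, made up for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
print(prediction)
```

Note that there is no training step: every prediction scans all stored points, which is why KNN slows down as the dataset grows.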

K-means is a clustering algorithm that tries to partition a set of points into K sets (clusters) such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.
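A rough sketch of the usual iterative procedure (Lloyd's algorithm) is shown here in plain Python. The function name `kmeans`, the fixed iteration count, and the hand-picked initial centroids are assumptions for the example; real implementations usually choose initial centroids randomly or with a seeding scheme and stop when assignments no longer change.

```python
from math import dist


def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


# Toy unlabeled data and starting centroids, made up for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, assignments = kmeans(points, initial_centroids=[(0.0, 0.0), (6.0, 6.0)])
print(assignments)
```

No labels are passed in anywhere; the cluster indices come entirely from the geometry of the points. Once the centroids are fitted, assigning a new point only requires comparing it against the K centroids rather than the whole dataset.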

The table below compares KNN and K-means by algorithm type, purpose, data usage, scalability, and example use case.

| Feature | K-Nearest Neighbors (KNN) | K-Means Clustering |
| --- | --- | --- |
| Algorithm Type | Supervised Learning (Classification/Regression) | Unsupervised Learning (Clustering) |
| Purpose | Classifies new data points based on labeled training data | Groups unlabeled data points into clusters |
| Data Usage | Uses the entire dataset for predictions | Splits the data into clusters iteratively |
| Scalability | Slow for large datasets because every prediction compares the new point against all stored data points | Faster at prediction time for large datasets because a new point is compared only against the K centroids |
| Example Use Case | Image Classification, Recommendation Systems | Customer Grouping |