Visualize Thread by @mdancho84 | Thread Navigator

Matt Dancho (Business Science)

@mdancho84

K-means is an essential algorithm for Data Science.

But it's confusing for beginners.

Let me demolish your confusion:

1. K-Means:

K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.

2. Unsupervised:

K-means is an unsupervised algorithm used on data with no labels or predefined outcomes. The goal is not to predict a target output, but to explore the structure of the data by identifying patterns, clusters, or relationships within the dataset.

3. Objective Function:

The objective of K-means is to minimize the within-cluster sum of squares (WCSS). It does this through a series of iterative steps: an Assignment step and an Update step.
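
To make the objective concrete, here is a minimal sketch that computes WCSS in plain Python. The toy points, labels, and centroids are made up for illustration, and the `wcss` helper is a hypothetical name, not something from the thread.

```python
# Toy 2-D data and a hypothetical clustering, chosen only for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
labels = [0, 0, 1, 1]                  # cluster index of each point
centroids = [(1.0, 1.5), (5.5, 5.0)]   # mean of each cluster's points


def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


print(wcss(points, labels, centroids))  # 1.0
```

Each of the four points sits 0.5 away from its centroid, so each contributes 0.25 and the total is 1.0. A lower WCSS means tighter clusters.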

4. Assignment Step:

In this step, each data point is assigned to the nearest cluster centroid. The "nearest" is typically determined using the Euclidean distance.
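
A small sketch of the assignment step, assuming Euclidean distance and toy data invented for illustration (the `assign` helper is a hypothetical name):

```python
import math

# Toy data and starting centroids, chosen only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (8.0, 8.0)]
centroids = [(1.0, 1.0), (8.0, 8.0)]


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for point in points:
        distances = [math.dist(point, c) for c in centroids]
        # Ties go to the lowest centroid index.
        labels.append(distances.index(min(distances)))
    return labels


labels = assign(points, centroids)
print(labels)  # [0, 0, 1, 1]
```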

5. Update Step:

Recalculate the centroids as the mean of all points in the cluster. Each centroid is the average of the points in its cluster.
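
The update step can be sketched the same way. The data and labels below are illustrative, `update` is a hypothetical helper, and the sketch assumes every cluster has at least one point.

```python
# Toy data with an existing assignment, chosen only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (8.0, 8.0)]
labels = [0, 0, 1, 1]


def update(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters."""
    centroids = []
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        # Assumption: no cluster is empty; an empty cluster would need
        # special handling that the thread does not cover.
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids


print(update(points, labels, k=2))  # [(1.25, 1.5), (6.5, 7.5)]
```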

6. Iterate Step(s):

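The assignment and update steps are repeated until the centroids no longer change significantly, indicating that the clusters have stabilized. Each pass lowers (or at least never raises) the within-cluster variance, so the process settles at a local minimum, which is not guaranteed to be the global one.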
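
Putting the two steps in a loop gives the whole algorithm. This is a minimal sketch, not a production implementation: the `tol` and `max_iter` parameters, the toy data, and the choice of starting centroids are all assumptions made for illustration.

```python
import math


def kmeans(points, centroids, tol=1e-6, max_iter=100):
    """Alternate assignment and update until centroids move less than tol."""
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster's points (keep old centroid if empty).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(a) / len(members) for a in zip(*members)))
            else:
                new_centroids.append(old)
        # Stop once no centroid moved more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids


# Toy data: two visibly separate groups. Starting centroids are picked by hand.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

On this data the loop stops on the second pass, because the centroids computed in the first pass no longer move.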

7. Silhouette Score (Evaluation):

This metric measures how similar a data point is to its own cluster compared to other clusters. The silhouette score ranges from -1 to 1, where a high value indicates that the data point is well-matched to its own cluster and poorly matched to neighboring clusters.
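
For a single point, the silhouette is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is the lowest mean distance to the points of any other cluster. Here is a rough plain-Python sketch; `silhouette_values` is a hypothetical helper, the data is made up, and the code assumes at least two clusters and that the points are not all identical.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b)."""
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):
        # Mean distance from p to each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            mean_dist[c] = sum(dists) / len(dists) if dists else None
        a = mean_dist[labels[i]]
        if a is None:
            values.append(0.0)  # common convention for a single-point cluster
            continue
        b = min(mean_dist[c] for c in clusters if c != labels[i])
        values.append((b - a) / max(a, b))
    return values


# Toy data: two tight pairs far apart, scored under a good and a bad labeling.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])
bad = silhouette_values(points, [0, 1, 0, 1])
print(sum(good) / len(good))  # close to 0.9
print(sum(bad) / len(bad))    # negative
```

The labeling that matches the visible groups scores near 1, while the labeling that splits each pair scores below 0.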

8. Elbow Method (Evaluation):

This method involves plotting the inertia (another name for the WCSS) as a function of the number of clusters and looking for an 'elbow' in the graph. The elbow point, where the rate of decrease sharply changes, can be a good choice for the number of clusters.
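
A sketch of the elbow method that prints the inertia for each k instead of plotting it. The toy data has three obvious groups. The deterministic farthest-point seeding is an assumption added only to keep the example reproducible (the thread does not cover initialization), and `kmeans_inertia` is a hypothetical helper.

```python
import math


def kmeans_inertia(points, k, max_iter=100):
    """Run a basic K-means and return the final inertia (WCSS)."""
    # Assumption: farthest-point seeding, starting from the first point.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        # Assignment step.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step (keep the old centroid if a cluster is empty).
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(
                tuple(sum(a) / len(members) for a in zip(*members))
                if members else centroids[j]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: squared distance from each point to its nearest centroid.
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


# Toy data with three visible groups, chosen only for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
          (1.0, 8.0), (2.0, 9.0), (1.0, 9.0)]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this data the inertia falls steeply up to k = 3 and barely improves at k = 4, so the elbow is at 3.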

9. There's a new problem that has surfaced --

Companies NOW want AI.

AI is the single biggest force of our decade. Yet 99% of data scientists are ignoring it.

That's a huge advantage to you. I'd like to help.

On Wednesday, June 25th, I'm sharing one of my best AI Projects for FREE:

How I built an AI Customer Segmentation Agent with Python:

Register here (limit 500 seats): learn.business-science.io/ai-register

That's a wrap! Over the next 24 days, I'm sharing the 24 concepts that helped me become an AI data scientist.

If you enjoyed this thread:

1. Follow me @mdancho84 for more of these
2. RT the tweet below to share this thread with your audience

View Tweet
