pyCluster – Python Clustering

pyCluster is a Python implementation of clustering algorithms, including PAM and Clara. Enjoy!

1. PAM

kMedoids – PAM implementation
See more: http://en.wikipedia.org/wiki/K-medoids
The most common realisation of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm, which proceeds as follows:
1. Initialize: randomly select k of the n data points as the medoids
2. Associate each data point to the closest medoid. (“closest” here is defined using any valid distance metric, most commonly Euclidean distance, Manhattan distance or Minkowski distance)
3. For each medoid m
   For each non-medoid data point o
      Swap m and o and compute the total cost of the configuration
4. Select the configuration with the lowest cost.
5. Repeat steps 2 to 4 until there is no change in the medoids.
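
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the pyCluster repository: the function names (manhattan, total_cost, pam), the choice of Manhattan distance, and the seed parameter are all assumptions made for the example.

```python
import random


def manhattan(a, b):
    """Manhattan distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist=manhattan):
    """Step 2: sum of distances from every point to its closest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, dist=manhattan, seed=None):
    """Hypothetical helper: return (medoids, cost) for k-medoid clustering."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)            # step 1: random initial medoids
    best_cost = total_cost(points, medoids, dist)
    improved = True
    while improved:                            # step 5: loop until no change
        improved = False
        best_swap = None
        for i in range(k):                     # step 3: every medoid m ...
            for o in points:                   # ... against every non-medoid o
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best_cost:
                    best_cost, best_swap = cost, candidate
        if best_swap is not None:              # step 4: keep the cheapest swap
            medoids, improved = best_swap, True
    return medoids, best_cost


if __name__ == "__main__":
    pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    print(pam(pts, 2, seed=0))
```

Each pass evaluates every (medoid, non-medoid) swap and recomputes the full cost, so the sketch is only practical for small data sets; that cost is the motivation for sampling-based variants such as CLARA.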

2. Clara

CLARA implementation
1. For i = 1 to 5, repeat the following steps:
2. Draw a sample of 40 + 2k objects randomly from the
entire data set, and call Algorithm PAM to find
k medoids of the sample.
3. For each object Oj in the entire data set, determine
which of the k medoids is the most similar to Oj.
4. Calculate the average dissimilarity of the clustering
obtained in the previous step. If this value is less
than the current minimum, use this value as the
current minimum, and retain the k medoids found in
Step 2 as the best set of medoids obtained so far.
5. Return to Step 1 to start the next iteration.
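
A rough Python sketch of this loop follows. It is not taken from the pyCluster repository: the names clara, pam and manhattan, the rounds and seed parameters, and the compact greedy-swap PAM included to keep the example self-contained are all assumptions. Capping the sample size at the size of the data set is also an addition for small inputs.

```python
import random


def manhattan(a, b):
    """Manhattan distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pam(points, k, dist, rng):
    """Compact PAM (assumed helper): greedy best-swap from random medoids."""
    medoids = rng.sample(points, k)
    cost = sum(min(dist(p, m) for m in medoids) for p in points)
    while True:
        best = None
        for i in range(k):
            for o in points:
                if o in medoids:
                    continue
                cand = medoids[:i] + [o] + medoids[i + 1:]
                c = sum(min(dist(p, m) for m in cand) for p in points)
                if c < cost:
                    cost, best = c, cand
        if best is None:                       # no swap lowers the cost
            return medoids
        medoids = best


def clara(data, k, dist=manhattan, rounds=5, seed=None):
    """Return (medoids, average dissimilarity) of the best sampled run."""
    rng = random.Random(seed)
    # Step 2 sample size; the min() guard for small data sets is an assumption.
    sample_size = min(len(data), 40 + 2 * k)
    best_medoids, best_avg = None, float("inf")
    for _ in range(rounds):                    # step 1: five rounds by default
        sample = rng.sample(data, sample_size)
        medoids = pam(sample, k, dist, rng)    # step 2: PAM on the sample only
        # Steps 3-4: assign every object in the full data set to its closest
        # medoid and average the resulting dissimilarities.
        avg = sum(min(dist(p, m) for m in medoids) for p in data) / len(data)
        if avg < best_avg:                     # step 4: keep the best medoids
            best_medoids, best_avg = medoids, avg
    return best_medoids, best_avg


if __name__ == "__main__":
    blob_a = [(x, y) for x in range(10) for y in range(10)]
    blob_b = [(x + 100, y + 100) for x in range(10) for y in range(10)]
    print(clara(blob_a + blob_b, 2, seed=0))
```

PAM only ever sees the 40 + 2k sampled objects, while the quality of each candidate set of medoids is still judged against the entire data set.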

Project Name: pyCluster
Destination: Python Clustering
Language: Python
IDE: Vim
Library:
Project Web: https://github.com/daveti/pycluster
Git Read Only: https://github.com/daveti/pycluster.git

About daveti

Interested in kernel hacking, compilers, machine learning and guitars.
