pyCluster is a Python implementation of clustering algorithms, including PAM and CLARA. Enjoy!
1. PAM
kMedoids – PAM implementation
See more: http://en.wikipedia.org/wiki/K-medoids
The most common realisation of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm, which proceeds as follows:
1. Initialize: randomly select k of the n data points as the medoids.
2. Associate each data point with the closest medoid. ("Closest" here is defined using any valid distance metric, most commonly Euclidean distance, Manhattan distance or Minkowski distance.)
3. For each medoid m:
     For each non-medoid data point o:
       Swap m and o and compute the total cost of the configuration.
4. Select the configuration with the lowest cost.
5. Repeat steps 2 to 4 until there is no change in the medoids.
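
The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not pyCluster's actual code: the names `euclidean`, `total_cost`, `assign` and `pam` are hypothetical, points are assumed to be tuples of numbers, and the cost of a configuration is assumed to be the sum of distances from each point to its closest medoid.

```python
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids, dist):
    """Sum of distances from every point to its closest medoid (by index)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def assign(points, medoids, dist):
    """Step 2: label each point with the index of its closest medoid."""
    return [min(medoids, key=lambda m: dist(p, points[m])) for p in points]

def pam(points, k, dist=euclidean, seed=None):
    """Hypothetical PAM sketch; returns (medoid indices, labels, cost)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points (by index) as initial medoids.
    medoids = rng.sample(range(len(points)), k)
    best_cost = total_cost(points, medoids, dist)
    while True:
        best_swap = None
        # Step 3: try swapping every medoid with every non-medoid point.
        for i in range(k):
            for o in range(len(points)):
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best_cost:
                    best_cost, best_swap = cost, candidate
        # Step 5: stop once no swap lowers the cost.
        if best_swap is None:
            break
        # Step 4: keep the lowest-cost configuration found in this pass.
        medoids = best_swap
    medoids = sorted(medoids)
    return medoids, assign(points, medoids, dist), best_cost
```

A call such as `pam(points, 2, seed=0)` returns the medoid indices, one label per point (the index of its medoid), and the total cost. Any other distance function with the same two-argument signature can be passed as `dist`.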
2. CLARA
CLARA implementation
1. For i = 1 to 5, repeat the following steps:
2. Draw a sample of 40 + 2k objects randomly from the entire data set, and call Algorithm PAM to find k medoids of the sample.
3. For each object Oj in the entire data set, determine which of the k medoids is the most similar to Oj.
4. Calculate the average dissimilarity of the clustering obtained in the previous step. If this value is less than the current minimum, use this value as the current minimum, and retain the k medoids found in Step 2 as the best set of medoids obtained so far.
5. Return to Step 1 to start the next iteration.
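
The CLARA loop above can be sketched as follows. Again, this is a hedged illustration and not pyCluster's real interface: `clara`, `pam`, `cost` and `euclidean` are hypothetical names, dissimilarity is assumed to be Euclidean distance, and the embedded `pam` helper is a compact first-improvement stand-in so that the block runs on its own. The sample size is capped at the data set size, which is an assumption for small inputs.

```python
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cost(points, medoids, dist):
    """Total distance from every point to its closest medoid (medoids are points)."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(sample, k, dist, rng):
    """Compact PAM stand-in: accept any improving swap until none is left."""
    idx = rng.sample(range(len(sample)), k)
    best = cost(sample, [sample[i] for i in idx], dist)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for o in range(len(sample)):
                if o in idx:
                    continue
                trial = idx[:pos] + [o] + idx[pos + 1:]
                c = cost(sample, [sample[i] for i in trial], dist)
                if c < best:
                    idx, best, improved = trial, c, True
    return [sample[i] for i in idx]

def clara(points, k, dist=euclidean, n_samples=5, seed=None):
    """Hypothetical CLARA sketch; returns (medoid points, labels, average dissimilarity)."""
    rng = random.Random(seed)
    n = len(points)
    sample_size = min(n, 40 + 2 * k)  # assumption: cap at n for small data sets
    best_medoids, best_avg = None, float("inf")
    for _ in range(n_samples):                    # Step 1: five rounds by default
        sample = rng.sample(points, sample_size)  # Step 2: draw a sample, run PAM on it
        medoids = pam(sample, k, dist, rng)
        avg = cost(points, medoids, dist) / n     # Steps 3-4: score on the full data set
        if avg < best_avg:
            best_avg, best_medoids = avg, medoids
    # Label every object with the position of its most similar medoid.
    labels = [min(range(k), key=lambda j: dist(p, best_medoids[j])) for p in points]
    return best_medoids, labels, best_avg
```

Because PAM only ever sees 40 + 2k objects per round, the expensive swap search stays small while the quality of each candidate set of medoids is still judged against the entire data set.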
Project Name: pyCluster
Destination: Python Clustering
Language: Python
IDE: Vim
Library:
Project Web: https://github.com/daveti/pycluster
Git Read Only: https://github.com/daveti/pycluster.git