Looking over the source code for Bisecting K-means, it seems to build an internal tree representation of the cluster assignments at each level as it progresses. Is it possible to get access to that tree? The built-in methods only give the cluster assignments at the leaves, not at the internal nodes.
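To make concrete what kind of structure is meant, here is a toy sketch in plain Python (no Spark) of bisecting k-means that keeps every split as a node of a binary tree. It is not Spark's code: the names `Node`, `two_means` and `bisecting_tree` are made up for illustration, and the rule of always splitting the leaf with the largest squared error is an assumption of this sketch that may differ from how Spark chooses clusters to divide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    """One node of the cluster tree; a leaf has no children."""
    points: List[Point]
    center: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None

def mean(points: List[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(node: Node) -> float:
    # Within-cluster sum of squared distances to the node's center.
    return sum(sq_dist(p, node.center) for p in node.points)

def two_means(points: List[Point], iters: int = 10):
    """Plain 2-means, seeded deterministically with the min and max point."""
    c0, c1 = min(points), max(points)
    a, b = [], []
    for _ in range(iters):
        new_a = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        new_b = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not new_a or not new_b:
            break  # keep the last valid split
        a, b = new_a, new_b
        c0, c1 = mean(a), mean(b)
    return a, b

def bisecting_tree(points: List[Point], k: int) -> Node:
    """Split leaves until there are k of them; return the root of the tree."""
    root = Node(points, mean(points))
    leaves = [root]
    while len(leaves) < k:
        # Only leaves with at least two distinct points can be split.
        splittable = [n for n in leaves if len(set(n.points)) > 1]
        if not splittable:
            break
        target = max(splittable, key=sse)  # assumption: split the worst leaf
        a, b = two_means(target.points)
        target.left, target.right = Node(a, mean(a)), Node(b, mean(b))
        leaves.remove(target)
        leaves += [target.left, target.right]
    return root

def collect_leaves(node: Node) -> List[Node]:
    if node.is_leaf:
        return [node]
    return collect_leaves(node.left) + collect_leaves(node.right)

points = [(0.0,), (1.0,), (10.0,), (11.0,), (50.0,), (51.0,)]
root = bisecting_tree(points, k=3)
```

Here `collect_leaves(root)` corresponds to what the built-in methods return, while `root`, `root.left` and so on are the internal nodes this question is asking about.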
Following up on this: has anyone modified the Spark ML source code to store and return the hierarchical clustering tree structure?
I found a GitHub repo with an introduction to MLlib 1.6's implementation of bisecting k-means clustering: https://github.com/yu-iskw/bisecting-kmeans-blog/blob/master/blog-article.md
In its "What's Next?" section, the first JIRA ticket, [SPARK-11664] "Add methods to get bisecting k-means cluster structure" (https://issues.apache.org/jira/browse/SPARK-11664), appears to be the request to expose the hierarchical cluster tree structure as a built-in feature. As of today, the ticket is marked "Resolved".
However, in the latest Spark MLlib release (2.4.4), documented at the links below, we could not find this tree structure, or a dendrogram, as a built-in output:
PySpark MLlib 2.4.4 official documentation: https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeans https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeansModel
Scala MLlib 2.4.4 official documentation: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeans https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeansModel
We also looked through the source code, and the hierarchical tree structure does not seem to be available as a built-in output.
If the hierarchical clustering tree structure is not available from BisectingKMeans in Spark MLlib 2.4.4, does anyone know whether someone has modified the source code to make it available?