CLAMS: Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering

Hyeon Jeon, Ghulam Jilani Quadri, Hyunwook Lee, Paul Rosen, Danielle Albers Szafir, and Jinwook Seo
IEEE Transactions on Visualization and Computer Graphics (IEEE VIS), 2024

Abstract

Visual clustering is a common perceptual task in scatterplots that supports diverse analytics tasks (e.g., cluster identification). However, even with the same scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can differ due to the differences among individuals. Although such perceptual variability casts doubt on the reliability of data analysis based on visual clustering, we lack a systematic way to efficiently assess this variability. In this research, we study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity. To this end, we introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots. We first conduct a qualitative study to identify key factors that affect the visual separation of clusters (e.g., proximity or size difference between clusters). Based on the study findings, we deploy a regression module that estimates the human-judged separability of two clusters. Then, CLAMS predicts cluster ambiguity by analyzing the aggregated results of all pairwise separability between clusters that are generated by the module. CLAMS outperforms widely used clustering techniques in predicting ground truth cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human annotators. We conclude our work by presenting two applications for optimizing and benchmarking data mining techniques using CLAMS.

Citation

Hyeon Jeon, Ghulam Jilani Quadri, Hyunwook Lee, Paul Rosen, Danielle Albers Szafir, and Jinwook Seo. CLAMS: Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering. IEEE Transactions on Visualization and Computer Graphics (IEEE VIS), 2024.

BibTeX


@article{jeon2023clams,
  title = {{CLAMS}: Cluster Ambiguity Measure for Estimating Perceptual Variability in
    Visual Clustering},
  author = {Jeon, Hyeon and Quadri, Ghulam Jilani and Lee, Hyunwook and Rosen, Paul and
    Szafir, Danielle Albers and Seo, Jinwook},
  journal = {IEEE Transactions on Visualization and Computer Graphics (IEEE VIS)},
  year = {2024},
  note = {\textit{Presented at IEEE VIS 2023. Honorable Mention for Best Paper.}},
  abstract = {Visual clustering is a common perceptual task in scatterplots that supports
    diverse analytics tasks (e.g., cluster identification). However, even with the same
    scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can
    differ due to the differences among individuals. Although such perceptual variability
    casts doubt on the reliability of data analysis based on visual clustering, we lack a
    systematic way to efficiently assess this variability. In this research, we study
    perceptual variability in conducting visual clustering, which we call \textit{Cluster
    Ambiguity}. To this end, we introduce \textit{CLAMS}, a data-driven visual quality
    measure for automatically predicting cluster ambiguity in monochrome scatterplots. We
    first conduct a qualitative study to identify key factors that affect the visual
    separation of clusters (e.g., proximity or size difference between clusters). Based on
    the study findings, we deploy a regression module that estimates the human-judged
    separability of two clusters. Then, CLAMS predicts cluster ambiguity by analyzing the
    aggregated results of all pairwise separability between clusters that are generated by
    the module. CLAMS outperforms widely-used clustering techniques in predicting ground
    truth cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human
    annotators. We conclude our work by presenting two applications for optimizing and
    benchmarking data mining techniques using CLAMS.}
}