Buckaroo: A Direct Manipulation Visual Data Wrangler
Annabelle Warner, Andrew McNutt, Paul Rosen, and El Kindi Rezig
Proceedings of the VLDB Demo Track, 2025

Abstract

Preparing datasets, a process known as data wrangling, is the dominant phase of data science development, consuming upwards of 80% of total project time. This phase encompasses a myriad of tasks: parsing data, restructuring it for analysis, repairing inaccuracies, merging sources, eliminating duplicates, and ensuring overall data integrity. Traditional approaches, such as coding by hand in languages like Python or editing in spreadsheets, are not only laborious but also error-prone. Common data issues range from missing entries and formatting inconsistencies to incorrect data types, all of which can degrade the quality of downstream tasks if not properly corrected. To address these challenges, we present Buckaroo, a visualization system that highlights discrepancies in data and enables on-the-spot corrections through direct manipulation of visual objects. Buckaroo (1) automatically finds “interesting” data groups that exhibit anomalies compared to the rest of the groups and recommends them for inspection; (2) suggests wrangling actions that the user can choose to repair the anomalies; and (3) lets users manipulate their data visually, displaying the effects of their wrangling actions and offering undo and redo, which supports the iterative nature of data wrangling. A video companion is available at https://www.youtube.com/watch?v=iXdCYbvpQVE
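The abstract describes Buckaroo's three mechanisms only at a high level. As a rough illustration of the kind of logic involved, here is a minimal Python sketch; it is not Buckaroo's actual algorithm or API. It assumes a table is a list of dicts, treats `None` and empty strings as missing, flags a group when its missing-value rate exceeds the mean rate of the other groups by a fixed `margin`, proposes the column median as a repair value, and keeps undo/redo history as stacks of table snapshots. Every function, class, and parameter name here is hypothetical.

```python
import copy
from dataclasses import dataclass, field
from statistics import mean, median
from typing import Any, Callable

Row = dict[str, Any]
Action = Callable[[list[Row]], list[Row]]


def is_missing(value: Any) -> bool:
    # Assumption: None and empty strings count as missing entries.
    return value is None or value == ""


def missing_rate(rows: list[Row], column: str) -> float:
    """Fraction of rows whose `column` value is missing."""
    if not rows:
        return 0.0
    return sum(is_missing(r.get(column)) for r in rows) / len(rows)


def anomalous_groups(rows: list[Row], group_by: str, column: str,
                     margin: float = 0.2) -> list[Any]:
    """Hypothetical criterion: flag groups whose missing rate exceeds the
    mean missing rate of all *other* groups by more than `margin`."""
    groups: dict[Any, list[Row]] = {}
    for r in rows:
        groups.setdefault(r[group_by], []).append(r)
    rates = {g: missing_rate(members, column) for g, members in groups.items()}
    flagged = []
    for g, rate in rates.items():
        others = [v for k, v in rates.items() if k != g]
        if others and rate - mean(others) > margin:
            flagged.append(g)
    return flagged


def suggest_fill_value(rows: list[Row], column: str) -> Any:
    # Hypothetical suggestion: median of the non-missing (numeric) values.
    return median(r[column] for r in rows if not is_missing(r.get(column)))


def fill_missing(column: str, value: Any) -> Action:
    """Build a repair action that replaces missing entries in `column`."""
    def action(rows: list[Row]) -> list[Row]:
        return [{**r, column: value} if is_missing(r.get(column)) else r
                for r in rows]
    return action


@dataclass
class History:
    """Undo/redo via full-table snapshots (simple, but memory-heavy)."""
    rows: list[Row]
    undo_stack: list[list[Row]] = field(default_factory=list)
    redo_stack: list[list[Row]] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        self.undo_stack.append(copy.deepcopy(self.rows))
        self.rows = action(copy.deepcopy(self.rows))
        self.redo_stack.clear()  # a new action discards the redo branch

    def undo(self) -> bool:
        if not self.undo_stack:
            return False
        self.redo_stack.append(self.rows)
        self.rows = self.undo_stack.pop()
        return True

    def redo(self) -> bool:
        if not self.redo_stack:
            return False
        self.undo_stack.append(self.rows)
        self.rows = self.redo_stack.pop()
        return True


# Usage on a toy table (illustrative data only).
rows = [
    {"city": "Boston", "temp": 20}, {"city": "Boston", "temp": 22},
    {"city": "Denver", "temp": 18}, {"city": "Denver", "temp": 19},
    {"city": "Austin", "temp": None}, {"city": "Austin", "temp": ""},
]
print(anomalous_groups(rows, "city", "temp"))  # ['Austin']
history = History(rows)
history.apply(fill_missing("temp", suggest_fill_value(rows, "temp")))
print(missing_rate(history.rows, "temp"))  # 0.0
history.undo()
print(missing_rate(history.rows, "temp"))  # 0.3333333333333333
```

Storing full snapshots keeps the sketch short but copies the whole table on every action; recording each action together with its inverse would be a lighter-weight way to support the same undo/redo behavior.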

Citation

Annabelle Warner, Andrew McNutt, Paul Rosen, and El Kindi Rezig. Buckaroo: A Direct Manipulation Visual Data Wrangler. Proceedings of the VLDB Demo Track, 2025.

BibTeX


@article{warner2025buckaroo,
  title = {Buckaroo: A Direct Manipulation Visual Data Wrangler},
  author = {Warner, Annabelle and McNutt, Andrew and Rosen, Paul and Rezig, El Kindi},
  journal = {Proceedings of the VLDB Demo Track},
  year = {2025},
  abstract = {Preparing datasets -- a critical phase known as data wrangling --
    constitutes the dominant phase of data science development, consuming upwards of 80\% of
    the total project time. This phase encompasses a myriad of tasks: parsing data,
    restructuring it for analysis, repairing inaccuracies, merging sources, eliminating
    duplicates, and ensuring overall data integrity. Traditional approaches, typically
    through manual coding in languages such as Python or using spreadsheets, are not only
    laborious but also error-prone. These issues range from missing entries and formatting
    inconsistencies to data type inaccuracies, all of which can affect the quality of
    downstream tasks if not properly corrected. To address these challenges, we present
    Buckaroo, a visualization system to highlight discrepancies in data and enable
    on-the-spot corrections through direct manipulations of visual objects. Buckaroo (1)
    automatically finds ``interesting'' data groups that exhibit anomalies compared to the
    rest of the groups and recommends them for inspection; (2) suggests wrangling actions
    that the user can choose to repair the anomalies; and (3) allows users to visually
    manipulate their data by displaying the effects of their wrangling actions and offering
    the ability to undo or redo these actions, which supports the iterative nature of data
    wrangling. A video companion is available at https://www.youtube.com/watch?v=iXdCYbvpQVE}
}