title: "Proactive Wrangling: Mixed-Initiative End-User Programming of Data Transformation Scripts"
authors: Philip J. Guo, Sean Kandel, Joseph M. Hellerstein, Jeffrey Heer
venue: ACM Symposium on User Interface Software and Technology (UIST)
year: 2011
tweet: A data wrangling UI uses proactive suggestions to help users transform data into relational formats
abstract: >
  Analysts regularly wrangle data into a form suitable for computational tools through a tedious process that delays more substantive analysis.
  While interactive tools can assist data transformation, analysts must still conceptualize the desired output state, formulate a transformation strategy, and specify complex transforms.
  We present a model to proactively suggest data transforms which map input data to a relational format expected by analysis tools.
  To guide search through the space of transforms, we propose a metric that scores tables according to type homogeneity, sparsity and the presence of delimiters.
  When compared to "ideal" hand-crafted transformations, our model suggests over half of the needed steps; in these cases the top-ranked suggestion is preferred 77% of the time.
  User study results indicate that suggestions produced by our model can assist analysts' transformation tasks, but that users do not always value proactive assistance, instead preferring to maintain the initiative.
  We discuss some implications of these results for mixed-initiative interfaces.
bibtex: >
  @inproceedings{GuoProWrangler2011,
    author = {Guo, Philip J. and Kandel, Sean and Hellerstein, Joseph M. and Heer, Jeffrey},
    title = {Proactive Wrangling: Mixed-initiative End-user Programming of Data Transformation Scripts},
    booktitle = {Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology},
    series = {UIST '11},
    year = {2011},
    isbn = {978-1-4503-0716-1},
    location = {Santa Barbara, California, USA},
    pages = {65--74},
    numpages = {10},
    url = {http://doi.acm.org/10.1145/2047196.2047205},
    doi = {10.1145/2047196.2047205},
    acmid = {2047205},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {data analysis, data cleaning, data transformation, end-user programming, mixed-initiative interfaces},
  }