title: "Software Tools to Facilitate Research Programming"
authors: Philip J. Guo
venue: Ph.D. dissertation, Department of Computer Science, Stanford University
year: 2012
links:
- Blog post
tweet: Data scientists code in different ways than software engineers, so they need new kinds of tools
abstract: >
Research programming is a type of programming activity where the goal
is to write computer programs to obtain insights from data. Millions of
professionals in fields ranging from science and engineering to business, finance,
public policy, and journalism, as well as numerous students and computer
hobbyists, all perform research programming on a daily basis.
My thesis is that by understanding the unique challenges faced during research
programming, it becomes possible to apply techniques from dynamic program
analysis, mixed-initiative recommendation systems, and OS-level tracing to
make research programmers more productive.
This dissertation characterizes the research programming process,
describes typical challenges faced by research programmers, and presents
five software tools that I have developed to address some key challenges.
1.) Proactive Wrangler is an interactive graphical tool that helps
research programmers reformat and clean data prior to analysis.
2.) IncPy is a Python interpreter that speeds up the data
analysis scripting cycle and helps programmers manage code and data
dependencies.
3.) SlopPy is a Python interpreter that automatically makes
existing scripts error-tolerant, thereby also speeding up the data analysis
scripting cycle.
4.) Burrito is a Linux-based system that helps programmers organize,
annotate, and recall past insights about their experiments.
5.) CDE is a software packaging tool that makes it easy to deploy,
archive, and share research code.
Taken together, these five tools enable research programmers to iterate and
potentially discover insights faster by offloading the burdens of data
management and provenance to the computer.
bibtex: >
@PhdThesis{GuoPhD2012,
author = {Guo, Philip J.},
title = {Software Tools to Facilitate Research Programming},
school = {Stanford University},
month = May,
year = 2012,
}