August 8–10. Iowa State University, Ames, Iowa

Sponsored by:


ASA Sections on Statistical Graphics and Computing


Programming competition

The prizes

  • $500 cash prize
  • useR books from Springer
  • Presentation at useR

The challenge

Develop a package useful for the analysis of large data sets.

Your package could augment an existing package like biglm, for example, by providing graphical or numerical diagnostic tools, or by adding support methods for transparently handling data from a database. Alternatively, you could develop a package for fitting linear or generalised linear models to large data sets that takes an entirely different approach.
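As a rough illustration of the kind of problem involved, the sketch below fits a linear model one chunk at a time, so that only a single chunk needs to be in memory at once. It uses base R only and accumulates the cross-products X'X and X'y. This is not how biglm works internally (biglm uses an incremental QR decomposition, which is numerically more stable); the function name chunked_lm and the simulated data are assumptions made up for this example.

```r
# Hypothetical helper (not part of any package): ordinary least squares
# computed from a list of data-frame chunks by accumulating X'X and X'y.
# Assumes every chunk yields the same model-matrix columns (for example,
# no factor whose levels differ between chunks).
chunked_lm <- function(chunks, formula) {
  xtx <- NULL
  xty <- NULL
  for (chunk in chunks) {
    mf <- model.frame(formula, chunk)
    X <- model.matrix(attr(mf, "terms"), mf)
    y <- model.response(mf)
    if (is.null(xtx)) {
      xtx <- crossprod(X)      # X'X for the first chunk
      xty <- crossprod(X, y)   # X'y for the first chunk
    } else {
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
  }
  # Solve the normal equations (X'X) beta = X'y
  drop(solve(xtx, xty))
}

# Simulated data, split into 10 chunks of 100 rows each
set.seed(1)
full <- data.frame(x = rnorm(1000))
full$y <- 2 + 3 * full$x + rnorm(1000)
chunks <- split(full, rep(1:10, each = 100))

beta_chunked <- chunked_lm(chunks, y ~ x)
beta_full <- coef(lm(y ~ x, data = full))
```

In a real entry the chunks would be read from a file or database rather than split from an in-memory data frame, and a more stable updating scheme would usually be preferable to the normal equations.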

Your entry

Deadline: June 30.

You will be expected to submit a complete R package, suitable for upload to CRAN (i.e. it should pass R CMD check). Your package should include as a vignette a paper describing your approach, illustrating its use, and explaining how it will scale to handling data sets larger than memory (or possibly larger than your disk). You should reference the relevant statistical literature.

Don't forget to include the names, affiliations and email addresses of everyone who contributed to the entry in the DESCRIPTION file.

Please submit your entry, via email, to [email protected]. In your email you should include a statement affirming that the submission is all your work, and has been completed specifically for the useR 2007 programming competition.

If you have any questions, please contact Hadley Wickham, [email protected].


The judging committee is made up of:

  • Michael Lawrence
  • Thomas Lumley
  • Luke Tierney
  • Simon Urbanek
  • Hadley Wickham

and we will be judging the entries based on the following criteria:

  • Quality of code and code design. Is the code easy to read and appropriately documented internally? Is it easy to follow the flow of logic in the code?
  • Contribution, including efficiency. How much faster is it than the current best available solution in R? Does it implement new ideas?
  • Usability. How well is it documented? Is it easy for the new user to get started? Does it provide appropriate references to the statistical literature?
  • Scalability. How well does it scale to large problems?

The cash prize will be shared equally among all participants in the winning entry. In the event of a tie for first place, the winnings will first be divided equally among the winning projects and then shared equally among the participants in each project.