A segment on 60 Minutes on February 12, 2012 concerns the cancer clinical trials at Duke University that were recently found to be based on manipulated and/or badly analyzed data. This case is frequently invoked as evidence for the need for higher standards for data sharing and reproducibility in scientific work.
For more details of the statistical work uncovering the problems, see this talk by Keith Baggerly: The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics
For another video on the importance of reproducibility and the Duke trials, see the talk Freedom (to reproduce) by John Wilbanks, who runs the Science Commons project at Creative Commons. This was the lead-off talk at the workshop Reproducible Research: Tools and Strategies for Scientific Computing that I organized with Victoria Stodden and Ian Mitchell last summer; see this link for all the talks, abstracts, and slides. There are several good ones here!
Update 29 Feb 2012: There are some interesting discussions of the Duke case and the 60 Minutes segment at http://groups.google.com/group/reproducible-research.
The 2012 [HPC]^3 workshop at KAUST just ended. It’s been a busy week with lots of exciting developments — this was a workshop with an emphasis on work. Each morning there were a few talks, but the afternoons were devoted to working in groups developing new software capabilities, much of it related to Clawpack and in particular the PyClaw suite of software. This was the second such workshop at KAUST and some of the projects were continuations of things started at the 2011 workshop.
The title [HPC]^3 refers to 3 different interpretations of the acronym HPC. The full title is High Performance Computing and Hybrid Programming Concepts for Hyperbolic PDE Codes.
The workshop program is on the web and eventually videos and slides from the talks will also appear, so I won’t say too much about these here. Instead I’ll mention a few of the highlights and accomplishments of the working groups. Eventually a final report from each group will be posted. For now you can find out more about what went on in some groups on the workshop wiki.
- The AMR group (Donna Calhoun, Tobias Weinzierl, Carsten Burstedde, Kristof Unterweger, Amal Alghamdi, Qi Tang, Kyle Mandli, and Marsha Berger via Skype) made progress on two different approaches to incorporating tree-based adaptive mesh refinement into PyClaw, one based on p4est and the other on Peano.
- The Manycore group (George Turkiyyah, Nathan Bell, Andy Terell, Garune Ohannessian, Rio Yokota) made it possible to call CUDAClaw from Python and started re-engineering it to simplify other manycore implementations, including OpenCL, TBB, and ISPC.
- The implicit group (Jed Brown, Matteo Parsani, Lulu Liu) did some interesting work on using preconditioners for hyperbolic systems that equalize the wave speeds and on an implementation of downwind WENO for Runge-Kutta stages with negative coefficients.
- The Discontinuous Galerkin group (James Rossmanith and Scott Moe) wrapped DoGPack in Python and got it working with VisClaw. They started planning how to more fully incorporate DG into a DoGClaw code.
- Matt Emmett started working on applying WENO on mapped grids, and has a working code in 1d.
- The geosciences group (Chris Kees, Marc Hesse, Robert Weiss) worked on Clawpack implementations of two problems: two-phase flow in porous media with an interesting nonconvex and multimodal flux function, and sediment transport with an eye towards tsunami deposits. The final presentation also had a nice demo of the new IPython Notebook.
- The visualization and steering group (Madhu Srinivasan, Chris Knox, Atanas Atanasov, Bruce D’Amora) made progress on several fronts: improvement in the HDF5 PyClaw output routines and conversion of this output to xdmf form, starting to add VisIt and Paraview capabilities into VisClaw using this form, and coupling on-the-fly visualization into PyClaw so that the solution can be plotted during computation rather than as postprocessing. The ultimate goal is to use this to help steer the computation.
- A documentation sprint one evening got us started working on using Sphinx to better document each example and produce a better set of galleries of sample results. Yiannis Hadjimichael and Amal Alghamdi fixed many documentation pages to reflect recent changes to the code, and got the doctests working.
Thanks to everyone who participated and worked so hard, often late into the night. Thanks to my co-organizers and the other plenary speakers who moved between groups and helped out with many issues. And a special thanks to [David K]^2 (Ketcheson and Keyes) for helping create the wonderful environment at KAUST for computational science and multi-cultural interaction (including elliptic/hyperbolic and FEM/FV!), and for the financial support of the workshop and participants.
Recently Tim Gowers brought together a group of mathematicians to discuss the Elsevier boycott he started. The group prepared a statement that explains some of the background to, and reasons for, the boycott, along with a more general discussion of mathematical publishing.
This statement has just been published in a new post on Gowers’s Weblog.
We plan to advertise this statement widely among science writers and journalists and hope that the story will be picked up to a greater extent in the media. We also hope that word will spread among researchers, referees, and editors in other fields. Please help spread the word!
Update on 13 February 2012:
This statement and the boycott movement have attracted quite a bit of media attention, for example an article in the New York Times today.
A long list of media articles and editorials has been compiled by Terry Tao.
I’m signing the pledge at http://thecostofknowledge.com/ and thought I’d say a bit here about why I’m doing so.
In case you haven’t heard, this was set up in response to a blog post of Tim Gowers proposing such a site, and has gained several thousand signatories in the first few weeks. See also his follow-up post and the PolyMath journal publishing reform page, where a large number of links are posted, many to articles, editorials and blog posts that have rapidly appeared in the two weeks since Gowers’s original post.
I have been mostly boycotting Elsevier for several years. In particular I have not submitted a paper to the Journal of Computational Physics since it was acquired by Elsevier, and rarely referee for this journal any more, in spite of the fact that I was once on the editorial board and was a big fan of the journal. But Elsevier has had a reputation for decades for being the worst sort of commercial publisher from the viewpoint of scientists. Of course some point out they are a business and they can run it as they see fit, but we are equally free to take our business elsewhere. Except that we’re not quite, which is part of the problem. They now bundle many journals in such a way that libraries cannot buy the ones essential to research without buying a bunch of other low-quality and/or un-needed journals at inflated prices. Elsevier made profits of more than a billion dollars last year. Meanwhile, the University of Washington libraries (a major academic library system) had to cancel several thousand journal subscriptions in the past few years due to severe budget cuts. The requirement to spend huge sums on journals that nobody on campus needs or wants is impairing our ability to access the journals we do need, or to use the funds for other worthy research or educational purposes.
Here are some of my thoughts when deciding to sign. For many more discussions (and opposing viewpoints) you should see the links above.
- Elsevier has a reputation for publishing more bogus papers (and entire journals) than any other publisher I know. For a discussion of some specific cases, see this post by Doug Arnold explaining why he is signing on to the boycott. Doug looked into these issues extensively as part of a joint working group on rankings of mathematical journals for the IMU and ICIAM, and while investigating plagiarism as President of SIAM (see also his article on this topic in SIAM News).
- Commercial publishers add relatively little value to articles these days, beyond giving a stamp of approval via the prestige of the journal and the hard work of the editorial board and referees (who are generally unpaid volunteers). Once upon a time publishers were necessary for typesetting and copy editing, but these days every mathematician I know does all their own typesetting using LaTeX. This is hardly new; I started using TeX at Stanford in 1977 and within a few years it was available everywhere. Some commercial publishers (particularly Elsevier, I’ve heard) don’t even provide much copy editing these days.
- Rather than volunteering our time to write papers, typeset them, referee them, edit them, etc., only to be forced to buy them back at high prices from companies that add little value, we should be smart enough to figure out better ways. Until we do, commercial publishers have a stranglehold on libraries. People have been bemoaning this for years and I’m happy to see some action being taken that might get their attention.
- Elsevier is probably not the only commercial publisher who should be targeted. My last blog post was a draft of an article I’m writing for the Encyclopedia of Applied and Computational Mathematics to be published by Springer. When I searched for it to add a link I was rather shocked to see the list price will be “approximately $5000”. As far as I know authors are being paid nothing (beyond a 33% discount on Springer books and online access to the encyclopedia, which I may have anyway if our library feels compelled to buy a copy). However, as Gowers says in his original blog post, we have to start somewhere and Elsevier is the obvious first choice for many people I know. For the time being some researchers must rely on journals from commercial publishers and we can’t expect people to boycott all publishers. But I do try to favor non-profit journals in general, particularly when deciding which of the many invitations to referee papers I should accept. I often point editors to my webpage on refereeing.
- I’m not opposed to selling journals at reasonable prices. Many professional societies publish journals and make money on them, including my main society SIAM (full disclosure: I’m chair of the SIAM Journals Subcommittee). But there’s an order of magnitude difference in the prices being charged, and SIAM does copy editing and still makes enough of a profit on the journals that they help support many other aspects of the society. This is another advantage of nonprofit society journals: whatever profits there are go back to the members one way or another.
- I would not necessarily encourage young (un-tenured) people to sign this pledge, particularly in fields where Elsevier journals are prevalent. And I have some qualms about doing so myself since I often write papers with students and younger colleagues whose careers depend on publishing in the recognized top journals. But for the most part I’ve managed to avoid publishing in Elsevier journals for several years now and haven’t missed it. One exception was a recent paper for a special issue of Advances in Water Resources on software, but in the future I can find plenty of alternative venues for publishing in my field. (On the flip side, some comments on Gowers’s blog suggest that soon it may be seen as a negative for young people to publish with Elsevier. Probably not soon, but perhaps some day.)
- I’m often surprised that so many mathematicians and scientists are oblivious to the issue of journal pricing and I hope this campaign will raise awareness if nothing else. Personally I was introduced to the issue very early in my career. My father was Executive Director of the American Mathematical Society in the 1980s when the Society was threatened with a lawsuit by Gordon & Breach for publishing a survey of journal pricing and value. Other societies (and authors) were sued. The suits went nowhere and I understand that the name of Gordon & Breach was so tarnished in academic circles that it eventually disappeared (acquired by Taylor & Francis). Elsevier take note!
I have prepared a draft of an article on reproducible research methods that I was invited to contribute to the Encyclopedia of Applied and Computational Mathematics to be published by Springer. [Update: I just checked with Springer and the $5000 price quoted is not a typo. Needless to say I'm not happy about this! I'm looking into it further.] [Further update on 29 Feb 2012: I've been told that the price on that webpage was a mistake and it's a mystery how it got in the database. That was last week and it's still there...]
I welcome comments on this draft, either sent by email or posted below. Note that this article was greatly constrained in size (it may be too long already) and so I was very selective in what to include, but if you think I’ve neglected something important I will consider it.
I have avoided mentioning specific tools (other than some version control systems) since there are so many out there it would be hard to choose which to point out, and hard to know which will stand the test of time.
I have included only two references, selected primarily as sources of other links.