The SE Research Reuse Manifesto

Tim Menzies
Thomas Zimmermann
Emerson Murphy-Hill
Andrian Marcus

Summary

  • If we take a more realistic view of software engineering research and replace the goal of repeatability with one of partial reuse of research artifacts, then we can streamline the research process and open it up to more participants.
  • In this approach, researchers can get earlier reviews and feedback on research artifacts (some of which might be very small) before they are combined into a research paper.
  • For the kinds of research artifacts we propose, see our list of artifacts.

Details

A belief is like a guillotine, just as heavy, just as light. -- Franz Kafka

The traditional research paper is over-engineered, overly elaborate and arcane, and too labor-intensive to produce. As a result:

  • Our industrial partners are locked out of research debates since they do not have time to write these ridiculously elaborate papers.
  • Researchers spend far too much time writing papers when they should be creating new results.

We should send research papers to the guillotine. Chop them up, liberate their ideas, spread them around, thus granting broader access to their various parts. Vive la révolution.

If we divide research papers into smaller, more manageable parts, we will revolutionize the computer science concept of a conference. Normally, researchers write papers and present them at conferences, where most of their ideas are seen once and then never used again. This is wrong.

Conferences should be about more than academics strutting about on stage presenting arcane research results. Rather, they should be places to visit to find “things” that can be reused for a wide range of tasks. Our list of potentially reusable research artifacts is broad and includes:

  • Executables: e.g. standalone executable tools;
  • Non-executables, e.g.:
    ◦ a tutorial that explains complex research results to a generalist industrial audience;
    ◦ the data associated with a challenge problem that represents the state of the art in some area.

So our idea is as follows:

  • The international research community should change its review practices to allow separate peer review of any or all research artifacts.
  • Researchers should write their papers in on-line, freely accessible "research repositories" (one per paper) where each repository contains the full text of their paper, as well as the scripts, tools, and other supporting materials that enable others to quickly use some or all of that work for other tasks.
  • We hasten to add that many researchers are already doing this. E.g. many SE researchers develop papers using on-line tools that integrate with, say, GitHub. Those researchers are already dividing their work into one repo per paper.
  • We ask only that they add to those repos the other artifacts that made the paper doable in the first place and that make it (possibly) partially reusable by others.

Theory

For years, software engineering researchers have lamented the lack of repeated results. Yet the art of software engineering continues to evolve. Today we can build bigger software systems, used by more people, running on more computers, than ever before. Why? How? We must be doing something right, even if we are not achieving the goal of repeatable results.

Perhaps we have misunderstood repeatable, at least in the context of software.

  • In the hard science world, things hardly change. So when one scientist studies a blade of grass, it is likely that previous researchers have seen the same kind of grass.
  • In the software world, things are softer and easier to change. So tomorrow's software applications differ from today's. Tomorrow, when we use software, we may be using different tools, serving different users, performing different tasks, and running on different platforms. So when one software scientist studies a database, it may well be that that kind of database has never been studied before.

Perhaps instead of repeatability, we need to talk about partial reusability. That is, we want to read each other's code and check out each other's systems, since some part of what others did may be relevant to what we want to accomplish today.

In this reusability approach, we take a hammer to prior products and break them into many pieces. These become menu items from which we build new stuff by mixing and matching (or ignoring) parts of old stuff.

To support that kind of reuse, we need to take a second look at research artifacts. Such artifacts may be smaller or bigger than a single research paper:

  • Smaller: When an SE researcher delivers a paper, she is not reporting some complete diamond, perfect in every way, that is destroyed if any little part is removed. Rather, she is reporting on a complex combination of artifacts, any one of which might be useful in some other context.
  • Bigger: Papers are written using other artifacts: scripts, data sets, and tutorials downloaded from the Internet that helped the researcher understand some complex issue.

Also, some artifacts are not about the paper per se but about managing the tools associated with that paper. For example:

  • Configuration management tools that let others build the scripts (or even compile the paper and its figures and tables) and/or update those scripts.

For a full list of the kinds of artifacts we are considering, see our list of artifacts.

Implications for Publishing

Regardless of the nature of the artifact, we assert that researchers should be rewarded for building these artifacts and maintaining them in freely available on-line locations. Specifically, in an ideal world each of these artifacts should be separately accessible and reviewable by the research community. It should be possible to submit for peer review:

  • Just the motivation statement that sparked the research.
  • Just the data associated with some challenge problem.
  • Just the statistical tests used to analyze some experimental data.
  • Or any other artifact from our list of artifacts.

Current practice is to review all of these only when they are collected together in a research paper, i.e. only after teams of researchers have wasted months on:

  • Problems that the rest of the community now considers passé (e.g. the dreaded Siemens suite);
  • Data sets that are now out of date (e.g. the old NASA programs from the 1980s);
  • Analyses that use statistical tests now considered uninformative (e.g. hypothesis testing without an effect size test).
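To make the last point concrete: a significance test says only whether two samples differ, not by how much. The sketch below computes one common effect size measure, Cliff's delta, in Python. It is an illustration, not a method this manifesto prescribes; the function names `cliffs_delta` and `magnitude`, the sample scores, and the magnitude thresholds (0.147, 0.33, 0.474) are assumptions, the thresholds being one widely used convention rather than a fixed standard.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys)).

    Ranges from -1 (every x below every y) to +1 (every x above every y);
    0 means neither sample tends to dominate the other.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))


def magnitude(delta):
    """Label |delta| using one commonly used set of thresholds (an assumption)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical scores for an old and a new tool on the same task.
    old = [0.61, 0.63, 0.64, 0.66, 0.70]
    new = [0.62, 0.64, 0.65, 0.67, 0.71]
    d = cliffs_delta(new, old)
    print(f"delta = {d:.2f} ({magnitude(d)})")  # prints: delta = 0.24 (small)
```

Reporting a number like this alongside any p-value lets readers judge whether a statistically detectable difference is also large enough to matter.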

Worse, for industrial practitioners, current practice means that they cannot join the research discussion unless they spend far too much time writing complex papers that require multiple research artifacts:

  • To better integrate industry into research debates, we need to reduce the effort associated with the next paragraph in that debate (e.g. by allowing peer review of smaller sets of research artifacts).
  • Also, we need some way for industry to suggest promising directions for that research (e.g. by offering the community motivation statements and data for challenge problems).