Write the Paper First

by Jason Eisner (2010)

If you're planning to submit a conference paper, I'd like to strongly suggest that you spend the next few days just writing the paper (even if you haven't yet planned or finished the experiments).

The deadline is a few weeks or months away. You can spare a few days now just to focus hard on the writing. Here are several reasons why ...

Writing is the best use of limited time

If you have to choose between clear writing and extensive experiments, pick writing:

Clear writing increases your odds of acceptance

Clear motivation and exposition are more important than results for getting your paper accepted. If you run out of time, it is better to have a great story with incomplete experiments than a sloppy draft with complete experiments. A good paper builds its case with the accumulated weight of several experiments, so missing a few is not fatal (and you can finish them for the camera-ready version). But a confusing, unconvincing, or incomplete writeup is fatal.

A well-known senior academic told me that they review papers by reading as far as they can understand, then assigning a score based on how far they got. I don't recommend that, but it is exactly how editors review manuscripts in the publishing world. A friend in my wife's book club has trouble finishing novels, because all her training as an editor says to make a decision by page 40 and move on.

But let's assume you are aiming for both clear writing and complete experiments. Suppose you have to finish the paper on no sleep. Which would you rather have left for the last minute, writing or hacking?

Hint #1: You can hack on little sleep, if you know what needs to be hacked. But you can't write effectively. Writing involves many big and small decisions, which will seem insurmountable when you're exhausted and panicked.
Hint #2: Five students can be hacking in parallel at the last minute, but they can't all be co-writing with me at the last minute. You have your own computer but you have to share your advisor.

Clear writing increases the influence of your published paper

There are two reasons a paper will be cited.

If you have a great implementation that people can just use as a black box, then they'll download it, use it, and cite you. (At least until someone builds a better black box.)
Otherwise, your paper is only useful for the ideas that it provides (e.g., well-explained reusable techniques).

The latter case is more frequent—and there, the writeup is the research (just as in History or Literature). The ideas in your paper will live on if they get picked up and used in other systems. They have to be clear and convincing.

Aren't the experiments important too? Yes, but mainly to persuade people that the ideas are worth picking up, and to illustrate how they work out in practice. They merely demonstrate whether your method worked as you claim. Certainly experiments take a lot of work—and you should be careful, painstakingly honest, thoughtful in your analysis of results, and willing to give out your code. But other people will often prefer to write their own code based on your writeup. Your code usually dies; your writeup needs to live.

Of course, the experiments are important in their own right if the point of the paper is to disprove the existence of Proto-Indo-European or demonstrate that MySpace users are functionally illiterate. That's actual science where the experiments tell you something about the world.

But for most of our papers, the experiments only tell us whether a particular method achieved high performance for a particular problem on a particular dataset. Your followers will actually be applying variants and extensions of your method to different problems and data. So they will care mostly about the method, as long as the results look okay.

The document is a focus for discussion with others

A draft is concrete, visible, and understandable by others, so it is something that you and others can discuss, debate, and improve. If you have a paper draft early, then you can give it to other people (including me) for feedback. And we can improve the paper together via a git or svn repository and its issue tracker.

Your draft can describe your motivation, formal problem, model, algorithms, and experiments before you actually build anything. (Ideally, it will also explain why you did it this way rather than some other way, and point out gaps that remain for future work.) By showing others the draft at this stage, you'll get important feedback before you invest time in the "wrong" work.

If you run into trouble while doing the work, then I may have difficulty diagnosing or even understanding the problem you are facing. We may waste a whole meeting just getting aligned. But if you can show me a precise writeup of what you're doing, then I can be much more helpful. In general, meetings are very productive when we have a concrete document in front of us.

Of course, there are many kinds of concrete documents. You could instead write up the notes from our (usually extensive) discussions as private documents for further discussion. Still, writing up in the form of a final paper makes you (1) integrate everything in one place, (2) decide which ideas will be made central for this paper, and (3) focus on the coherence and impact of the end product. (Additional discussion and brainstorming can still go into the document—those subsections can be cut or moved to an appendix for the conference paper, but retained for a longer version as a tech report, journal article, or dissertation.)

I look forward to seeing what you write!

Writing is a mechanism for planning what to work on

Is it a good topic?

Most readers decide whether they like a paper before they get to the experimental results. Show people a draft of the first 2-3 pages before you do any experiments. If they tell you that your basic idea will evoke skepticism or yawns from reviewers, it's better to know that before you waste your time.

(You might drop the paper, fix the basic idea, or in some cases pick another conference. If you drop it, the writeup will still come in handy in a year or two when you realize that the idea could be applied elsewhere.)

What needs to be done?

Assuming the idea is indeed a good one, then writing a draft makes you sharpen the message of the paper. Then you can figure out what work needs to be done to support exactly that message.

Your introduction will make some claims. Often you will realize that some interesting additional experiment would really test those claims.
Writing the literature review will help you design your experiments. It may influence what datasets you use, what comparisons you do, and what you are trying to prove you can do that other people can't.

However, don't write the lit review first. Write out your own ideas before comparing them with the literature. This increases the chance that you'll find new angles on the problem.
Writing the experimental section is possible even before you've done the experiments. Explain the full experimental design. Make empty result tables with row and column headers and explanatory captions. Make empty graphs with axes and captions. The actual results will be missing, but it will be clear what work is necessary.

If you are having trouble switching your brain from experiments to writing, just write up your current experiments first. Writing an intro will also help, though, because it lets you explain why you are doing the experiments.

Honest writing may lead you to realize that proving your point requires more work than you'd thought (which is why you'd better write early). Remember Richard Feynman's classic speech on how to do honest science: "The first principle is that you must not fool yourself—and you are the easiest person to fool."
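
As a rough illustration of the empty tables and graphs described above, the following LaTeX sketch could go in the body of a draft. The method names, dataset names, axis labels, and captions are all hypothetical placeholders, and it uses only standard LaTeX (no extra packages).

```latex
% Placeholder table: rows, columns, and caption are decided now; numbers come later.
% All method and dataset names are hypothetical.
\begin{table}[t]
  \centering
  \begin{tabular}{lccc}
    \hline
    Method         & Dataset A & Dataset B & Dataset C \\
    \hline
    Baseline       & --        & --        & --        \\
    Our method     & --        & --        & --        \\
    Our method + X & --        & --        & --        \\
    \hline
  \end{tabular}
  \caption{Accuracy (\%) of each method on each dataset. Results pending.}
  \label{tab:main-results}
\end{table}

% Placeholder figure: a framed empty box stands in for the future plot.
\begin{figure}[t]
  \centering
  \fbox{\parbox[c][4cm][c]{0.8\linewidth}{\centering
    x-axis: training set size \quad y-axis: accuracy (\%)}}
  \caption{Learning curves for each method. Plot pending.}
  \label{fig:learning-curve}
\end{figure}
```

Even with every cell empty, a skeleton like this makes it visible which experiments have to be run to fill in the paper.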

What will fit?

Writing first also helps you gauge the scope of the paper. You may discover that you can't fit your intended presentation into the page limits. That may cause you to divide up the work into two or more papers, each of which has to make a separate clear point. So it will greatly affect your planning.

Alas, I often try to present my whole story at once, so my papers try to force a journal paper's worth of work into a conference-size spoon. Too dense and indigestible. Better to publish a few amuse-bouches. Or maybe the conferences will start allowing us to give talks about work that we've published in journals.

The document is an organizing scheme for your work on the paper

Your first step is to outline the paper. Download the paper template from the conference website. Come up with a good title (and an abstract if you like), and write the section/subsection headers.

Simon Peyton Jones, who also believes you should write the paper first, has a terrific presentation (video) about writing great papers and how to organize them. Rachel Howard talks about starting by sketching the key thing that you want to convey: "gesture writing," by analogy with "gesture drawing."

This file will turn into your final document. Everything you do from now on should be focused on improving it!

Probably you should flesh out important sections early on, for all the reasons above. (Add the introduction/motivation, a good example, the method description, and an experimental design.)
But your first draft won't be perfect. As your thinking evolves, you'll go back and change the document. You may even change basic aspects of the approach. That's okay. The document is a record of your current thinking, ready to show to colleagues at any time, and nearly ready to submit.

Any new ideas need to end up in the document. So whenever you have a thought about the project, add it immediately into the appropriate section of the paper. Sometimes you'll add it by gracefully editing the current writeup; but more often, you'll just insert a note to worry about later. Placing these notes in the outline will keep your ideas organized and keep them alive until they are dealt with. (Email or face-to-face discussions will get lost unless you add notes about them!)

I usually define macros in the preamble to support notes:

\usepackage[usenames,dvipsnames,svgnames,table]{xcolor}  % allows better color names
\usepackage[]{todonotes}                                 % use option "disable" to suppress notes
\makeatletter
\newcommand*\iftodonotes{\if@todonotes@disabled\expandafter\@secondoftwo\else\expandafter\@firstoftwo\fi}  
\makeatother
\newcommand{\noindentaftertodo}{\iftodonotes{\noindent}{}\ignorespaces}

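% Marginal notes color-coded by author.  They accept optional args like size=\small, bordercolor=red, etc.
% Base macro assumed by the author macros below: \note[options]{author}{color}{text}.
% The default settings here (small text, fancy connecting line, empty list-of-todos caption) are one reasonable choice.
\newcommand{\note}[4][]{\todo[author=#2,color=#3,size=\scriptsize,fancyline,caption={},#1]{#4}}
\newcommand{\jason}[2][]{\note[#1]{jason}{green!40}{#2}}
\newcommand{\noam}[2][]{\note[#1]{noam}{orange!40}{#2}}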

% Capitalized versions of the macros put the note in the text instead of in the margin.  
% That's useful for long notes or notes in floating environments.
\newcommand{\Jason}[2][]{\jason[inline,#1]{#2}}
\newcommand{\Noam}[2][]{\noam[inline,#1]{#2}}

% When responding to a note, append "\response{myname} My thoughts" inside the note.
\newcommand{\response}[1]{\vspace{3pt}\hrule\vspace{3pt}\textbf{#1:}}
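
As a sketch of how these macros might be used in the body of a draft, consider the fragment below. The sentences and note contents are invented for illustration, and it assumes that the \note helper called by \jason and \noam is defined in your preamble.

```latex
% Marginal note attached to a claim in the running text.
Our parser runs in cubic time.\jason{Is this still true with the new pruning step?}

% Inline note with a threaded reply appended via \response.
\Noam{Need to rerun this on the full test set.
  \response{jason} Agreed; I'll add a Makefile target for it.}

% Extra todonotes options are passed through the optional argument.
\jason[bordercolor=red]{Urgent: compare against prior work here.}

% \noindentaftertodo is presumably meant to follow an inline note, so that the
% text after it is not indented like a new paragraph while notes are enabled.
\Jason{Tighten this paragraph.}\noindentaftertodo
The paragraph then continues here.
```

Loading todonotes with the "disable" option, as mentioned in the preamble comment, should hide all of these notes without any change to the body text.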

The main goal of your experiments is to produce tables and graphs for the document. These should be produced and included automatically, with minimal fuss and minimal opportunity for human error.

For example, write a Makefile or zymake file that will run the experiments. Among other results, this should produce files containing your final tables (as LaTeX source) and graphs (e.g., as PNG images). Your LaTeX document will \input the tables and \includegraphics the graphs.

To print R tables as LaTeX, try the xtable package. Also see reporttools, which uses xtable to print summary statistics.
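
On the LaTeX side, pulling in generated results might look like the sketch below. The file names under results/ are hypothetical, and the build (for example, a Makefile target) is assumed to regenerate them whenever the experiments are rerun.

```latex
% In the preamble:
\usepackage{graphicx}   % provides \includegraphics

% In the body:
\begin{table}[t]
  \centering
  % Reads results/main-table.tex, assumed to be a generated file
  % that contains a complete tabular environment.
  \input{results/main-table}
  \caption{Main results, regenerated on every build.}
  \label{tab:main}
\end{table}

\begin{figure}[t]
  \centering
  % Assumed to be a generated PNG plot.
  \includegraphics[width=\linewidth]{results/learning-curve.png}
  \caption{Learning curves, regenerated on every build.}
  \label{fig:curve}
\end{figure}
```

Because no numbers are typed into the document by hand, rerunning the experiments and recompiling is enough to bring the paper up to date.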

This approach allows you to view and share the current results at any time. For your own understanding, you may want to run many more experiments than can be included in the paper. In this case, make a separate "experimental logbook" document that includes and discusses the results of all the experiments. This longer document can also be viewed and shared at any time.
Every paper should have at least one nice picture. For now, don't waste time on making it pretty. It's okay if your diagrams start out as scanned-in scribbles, or as verbal descriptions of what you will draw. Make them pretty later on, sometime when you need a break, but only once you're sure they're in final form.

Some reasonable options for final figures are PGF/TikZ (for beautiful TeX graphics: use directly or from R or gnuplot), graphviz (for automatic graph layout), and manual drawing programs such as PowerPoint, OmniGraffle, MyPaint, or Inkscape.

The document is like a code specification

Writing is a form of thinking and planning. Writing is therefore part of the research process—just as it is part of the software engineering process. When you write a research paper, or when you document code, you are not just explaining the work to other people: you are thinking it through for yourself.

Feynman was a truly great teacher. He prided himself on being able to devise ways to explain even the most profound ideas to beginning students. Once, I said to him, "Dick, explain to me, so that I can understand it, why spin one-half particles obey Fermi-Dirac statistics." Sizing up his audience perfectly, Feynman said, "I'll prepare a freshman lecture on it." But he came back a few days later to say, "I couldn't do it. I couldn't reduce it to the freshman level. That means we don't really understand it." -- David Goodstein

As you know, it's bad practice to document code after you write it. The recommended sequence for a coding project is something like this:

1. Sell the project to management: you need a big picture.

   [Analogy: This is the abstract and introduction to your paper. You could even start by mocking up some talk slides that you could use to present your paper.]

2. Write a spec that sketches the major ideas behind the code design, and describes how the different components will fit together. The spec will also develop some terminology and notation that you will use throughout the spec and also in the code.

   [Analogy: The earlier sections of the paper, where you give intuitions, terminology, and notation. Again, you will use the terminology and notation as you code up your experiments. If you code first and only later on figure out how to present things clearly, then your code will be harder to follow and won't match the paper.]

3. Keep improving and refactoring your code design.

   [Analogy: You refine your thinking by editing your prose until it is clear and convincing.]

4. For each class, method, etc., write header comments that precisely describe its required behavior.

   [Analogy: Later sections of the paper, where you flesh out details of the method and experiments.]

5. Once you have precisely described what a class or method should do, you can write commented code to implement exactly that description.

   [Analogy: This is where you carry out the claimed experiments. Notice that it comes late in the process.]

6. Test the code!

   [Analogy: This is where other people read the paper and you see what goes wrong with your carefully crafted exposition.]

Of course, neither coding nor research is purely top-down—in practice, there's feedback. Just as coding will make you rethink parts of the spec, certainly experimentation will make you revise parts of the paper. But crucially, you'll keep the code and the paper in sync.

Writing now is a favor to yourself

You'll feel so much better once you have a draft! The looming deadline will not be nearly so stressful. Pull your all-nighters now (on a self-imposed draft deadline), not in the days leading up to the submission deadline.

So get to it, feel virtuous, and have fun! I'll be happy to comment, correspond, or collaborate along the way—just point me to your git repository.

This page online: http://cs.jhu.edu/~jason/advice/write-the-paper-first.html

Jason Eisner - jason@cs.jhu.edu (suggestions welcome)

Last modified: 2023/07/18 22:16:23