Written by Jason Eisner in 2000, for new Computer Science Ph.D. students at the University of Rochester.
Here are some suggestions about how to organize your work during your research career. Workstyles vary, and you will develop your own organizing system over time. But these are techniques that I personally have found to be helpful.
Be careful, though - what really matters is the work itself, so don't spend all your time getting more organized than necessary! Hacking your environment sometimes pays off, but it's sometimes a form of procrastination.
Although paper has its uses, electronic files have at least four advantages over paper: easy to edit, readable from anywhere with a computer, harder to lose, and searchable. (If I can't find an old message, I just use grep to search all the email I've written and received for the past 10+ years. It's fast!)
Partly for this reason, you may spend much of your workday editing files in Emacs. Obviously, you should organize your files into some sensible directory structure (see below). But where do you put a miscellaneous idea or piece of information so you don't forget about it? Some tricks for filing miscellaneous information:
Put it in a special file. I have a whole directory (called jot) of special files where I record particular kinds of information. I even set up special keys in my editor to pull up these files quickly. Here are some that you might consider for yourself:
bib: A list of almost every paper I've read, with the citation and URL, my summary of the technical ideas that make the paper interesting and/or flawed, and any ideas I have for future work. I can't overemphasize how helpful this is! Keeping such a file will help you find and remember papers instantly, and writing the comments will make you read them better in the first place. (And if you write some of the citations in BibTeX form, then you can automatically turn them into bibliography entries in a variety of formats.)
hacks: short snippets of code (usually shell pipes or 3-line Perl scripts) that might come in handy again. Copying them into this file with a brief comment is easier than turning them into full-fledged programs. This is also a good place to jot down the names of new commands as you learn them.
ideas: ideas for new research projects. Very important!
done: it's good to keep a list of your accomplishments (especially the ones that aren't on your resume yet). You might be asked for this by your advisor, PAS, your tenure committee ...
answer: a list of email that I have to send (mostly replies). Most people seem to keep unanswered email in their inbox to remind them to answer it, but in practice these messages get buried under new mail and forgotten. Better to file them right away and make a note to reply.
learn: Things I need to learn about when I get some free time.
log: when I'm falling behind schedule, I use this file for a few days to jot down exactly how I'm spending every 15-minute block of the day. This discourages me from wasting time and helps me figure out how to get back on track.
todo: to-do list. I try to keep all "to do" lists to one line per item. Then they are easy to take in at a glance and easy to reprioritize. (Mail folders also have this property but allow longer items; see below.) You may want to make a habit of skimming your to-do list at a fixed time every day, for example when you arrive or leave.
Other idiosyncratic files, for phone numbers, linguistically interesting sentences that I might use as examples someday, movies I've seen, recommended reading, stuff I want to fix in my computing environment someday, etc.
Mail it to yourself. This can be a wonderful way to record miscellany. The message will automatically be stamped with a time and date, and you probably already have a system for filing it. (For example, mail to yourself about project X can be filed with your other project X mail.) Mail folders are also easy to take in at a glance and easy to reprioritize, like a one-line-per-item to-do list, but they allow longer items.
Try to configure your mailer so that right at the time you send the message, you can specify which folder to file a copy in. For example, Rmail lets you put a special Fcc: line in the header, and mail lets you specify files (not just addresses) as "recipients" of the mail.
Note: If your mailer does let you keep a copy of your outgoing mail, then you don't really want to send to yourself - that would give you two copies. Instead, send to nobody at all, by using mail /dev/null or by deleting the To: line in the mail header.
Put it where you'll notice it at the right time. Every paper or program I write has its own directory. As I work, I tend to add and edit special files whose names are in all-caps, so that they will jump out from a directory listing. For example:
TO-DO reminds me of changes I should make. (Even after the work is published, this file is the first thing I'll look at if I start a related project two years later.)
HOW-TO lists any tricks necessary to compile or use the code (very useful if the project has been on the back burner for a bit).
LOGBOOK describes results I've obtained and how I got them.
ACKNOWLEDGE lists people whose help should be acknowledged in the writeup.
SEND-TO might list people who have asked for a copy of the work "when it's done."
Put it on your electronic calendar so that you'll get a reminder at the right time. You can manage your calendar with GUI programs like dtcm and cm, or text-based programs like calendar and the Emacs diary package. (Use man for information on the first three, and info emacs diary for the last.)
Embed it into another document as a comment. My programs (and documents) tend to be full of comments about changes I'd like to make, notes on how to explain a particular algorithm in the writeup, etc. These are placed in the code where I'll notice them. I also mark these comments with special symbols like !!! so that I can search for them.
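Here is one way such a search might look, as a minimal Python sketch rather than a description of my actual setup (a plain grep does the same job). The function name find_markers and the demo file parser.py are made up for illustration; the only thing taken from the text above is the !!! marker itself.

```python
import os
import tempfile

MARKER = "!!!"  # the marker mentioned above; any rare string would work


def find_markers(root, marker=MARKER):
    """Return (path, line_number, text) for each line under root containing marker."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if marker in line:
                            hits.append((path, number, line.strip()))
            except OSError:
                continue  # skip files that cannot be read
    return hits


if __name__ == "__main__":
    # Demo on a throwaway directory holding one hypothetical source file.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "parser.py"), "w", encoding="utf-8") as f:
            f.write("def parse(s):\n    # !!! handle empty input\n    return s.split()\n")
        for path, number, text in find_markers(root):
            print(f"{os.path.basename(path)}:{number}: {text}")
```

Because the marker is an ordinary string, the same search works on code, LaTeX sources, and plain-text notes alike.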
Use version control. As you revise a program or document, you may want to keep track of your thought process - the changes you made and why. Looking at this record later can help you explain the work, find bugs, and understand your own work style. The best place for such comments is the log maintained by a version-control program like rcs or cvs, as discussed below.
As for general directory organization, do it any way you like. However, for consistency, you may want to name some subdirectories of your home directory (whose name can be abbreviated as ~) according to general Unix conventions that are followed in other parts of the Unix file system:
You will probably organize your more personal files into subdirectories under ~/proj, ~/teach, ~/mail, etc.
This section is a bit out of date. Nowadays, you might consider using Subversion instead. I should add that Subversion (or CVS) is essential for collaboration -- working together on code or on papers. You and your collaborators need the freedom to work at the same time, to comment on each other's work, and to drop in ideas and suggestions.
Version-control software manages multiple versions of a file and information about those versions, without wasting space or cluttering up your directory.
How it works: Basically, it maintains a special write-protected file that contains both the latest version and a list of changes needed to automatically reconstruct the previous version, the version before that, etc. So it doesn't have to store the full text of every version. The special file also contains version numbers/dates/times and your log notes that describe why you made the changes.
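To make the idea concrete, here is a toy Python sketch of "latest version plus backward changes." It is only an illustration of the principle, not the real file format used by RCS or any other tool. The names Archive, check_in, check_out, reverse_delta, and apply_delta are all invented for this example, and the sketch ignores write protection, dates, and branching.

```python
from difflib import SequenceMatcher


def reverse_delta(new, old):
    """Return the edits that turn the list of lines `new` back into `old`.

    Each edit is (start, end, replacement), meaning: replace new[start:end]
    with `replacement`. Unchanged regions are not stored at all.
    """
    matcher = SequenceMatcher(a=new, b=old, autojunk=False)
    return [(i1, i2, old[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(lines, delta):
    """Apply the edits from reverse_delta to a copy of `lines`."""
    result = list(lines)
    # Work from the last edit to the first so earlier indices stay valid.
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


class Archive:
    """Toy archive: full text of the latest version plus reverse deltas."""

    def __init__(self, lines, note):
        self.latest = list(lines)
        self.deltas = []   # deltas[k] turns version k+2 back into version k+1
        self.log = [note]  # log[k] is the note for version k+1

    def check_in(self, lines, note):
        """Store a new latest version, keeping only a delta for the old one."""
        self.deltas.append(reverse_delta(lines, self.latest))
        self.latest = list(lines)
        self.log.append(note)

    def check_out(self, version):
        """Reconstruct a numbered version (1 is the oldest)."""
        lines = self.latest
        for delta in reversed(self.deltas[version - 1:]):
            lines = apply_delta(lines, delta)
        return list(lines)


if __name__ == "__main__":
    archive = Archive(["Intro", "Method"], "first draft")
    archive.check_in(["Intro", "Method", "Results"], "added results")
    archive.check_in(["Introduction", "Method", "Results"], "renamed heading")
    print(archive.check_out(1))  # walks back through both deltas
    print(archive.log)
```

Storing the deltas backward means the latest version, which is the one you usually want, is available immediately, while older versions cost a little reconstruction work.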
When you're working on a program, using version control will help you do the following:
I actually use version control for many documents and web pages as well as programs. Almost all of the same reasons apply, plus there may be several "published" versions that you want to keep without wasting space - the term paper, the conference submission, the conference publication, the reformatted web version of the conference publication, and the corrected web version you put up 2 months later.
The simplest version-control tool is RCS. Its most important commands are ci, co, and rcs (and some further utilities are listed on the man pages for these). CVS (command cvs) is the tool for more intensive collaborative development - e.g., open-source projects. CVS is built on top of RCS, but has additional support for multi-file projects. It expects multiple users to work on the same file at the same time, and can usually reconcile their changes.
There are web tutorials on RCS and CVS that give a much clearer and quicker conceptual introduction than the man pages. Try to find one with hands-on examples.
I almost never use RCS and CVS commands directly - instead I use Emacs commands that invoke them. Emacs has great support for version control! So see if you can get away with just reading and using the Emacs stuff. Use info emacs version for documentation, or search the web.
Even though you're a computer scientist, you will have to deal with at least four kinds of paper:
You may also write some of your technical notes online, of course. But complicated formulas are easier to write by hand. Also, some people find that they can think more freely when they write by hand. They are not as tense about making a mistake because the handwritten version is certainly not the final version. (If you prefer to type, using a "temporary file" can have the same relaxing effect.)
Some people like to write notes by hand, but then scan them in using a scanner (or scanning photocopier). The scanner may be able to mail you the document. If not, you'll get a file that you can either store or mail to yourself.
Libraries have the same problems. Their solution: a single organization for hardcopy, together with a catalog that lets you search by multiple criteria. In the same way, I just alphabetize my hardcopy papers by first author, but keep an online list of all the papers I've read, with easy-to-search keywords and comments.
With this hybrid approach, it can also work to organize hardcopy by when or why you got it, rather than alphabetically.
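The online half of this catalog can be very simple. As a rough Python sketch, assume (and this is only an assumption, not how my own file is laid out) that the list is a plain text file with one entry per paper and a blank line between entries. The function names load_entries and search are hypothetical, and the entries in the demo are placeholders, not real papers.

```python
def load_entries(text):
    """Split the catalog text into entries separated by blank lines."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            entries.append("\n".join(current))
            current = []
    if current:
        entries.append("\n".join(current))
    return entries


def search(entries, *keywords):
    """Return the entries that contain every keyword, ignoring case."""
    wanted = [k.lower() for k in keywords]
    return [e for e in entries if all(k in e.lower() for k in wanted)]


if __name__ == "__main__":
    # Placeholder catalog text; a real one would be read from a file.
    catalog = (
        "Author A. Title of paper one.\n"
        "keywords: parsing, grammar\n"
        "\n"
        "Author B. Title of paper two.\n"
        "keywords: tagging, parsing\n"
    )
    for entry in search(load_entries(catalog), "parsing", "tagging"):
        print(entry)
        print()
```

Since the search looks at the whole entry, it matches on author, title, keywords, or comments, which is exactly the "multiple criteria" that a single physical filing order cannot give you.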
http://cs.jhu.edu/~jason/advice/how-to-organize-your-files.html
Jason Eisner - jason@cs.jhu.edu (suggestions welcome) | Last Mod $Date: 2006/10/17 01:49:56 $