Written by Jason Eisner in 2000, for new Computer Science Ph.D. students at the University of Rochester.

How to Organize Your Files

by Jason Eisner

Keeping Track of Information Online
Version Control
Keeping Track of Paper

Here are some suggestions about how to organize your work during your research career. Workstyles vary, and you will develop your own organizing system over time. But these are techniques that I personally have found to be helpful.

Be careful, though: what really matters is the work itself, so don't spend all your time getting more organized than necessary! Hacking your environment sometimes pays off, but it can also be a form of procrastination.

Keeping Track of Information Online

Although paper has its uses, electronic files have at least four advantages over paper: they are easy to edit, readable from anywhere with a computer, harder to lose, and searchable. (If I can't find an old message, I just use grep to search all the email I've written and received over the past 10+ years. It's fast!)
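grep on the raw mail files is usually all you need. If your mail is stored in mbox format, though, a few lines of Python can also search message bodies that grep can miss because they were sent base64- or quoted-printable-encoded. Here is a minimal sketch using the standard `mailbox` module. The function names, and the choice to search only the subject and the plain-text parts, are assumptions of this sketch rather than anything your mailer requires.

```python
import mailbox
import re


def message_text(msg):
    """Return the subject plus all decoded text/plain parts of a message."""
    chunks = [msg.get("Subject", "")]
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # the header named a charset Python doesn't know
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def search_mbox(path, pattern):
    """Return the subjects of messages in the mbox file `path` matching `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    box = mailbox.mbox(path, create=False)
    try:
        return [msg.get("Subject", "") for msg in box if regex.search(message_text(msg))]
    finally:
        box.close()


# Usage (the folder path is hypothetical):
# for subject in search_mbox("/home/me/mail/projectX", r"inside-outside"):
#     print(subject)
```

To cover years of mail, you would loop this over every folder file, just as grep takes many file names at once.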

Partly because of these advantages, you may spend much of your workday editing files in Emacs. Obviously, you should organize your files into some sensible directory structure (see below). But where do you put a miscellaneous idea or piece of information so you don't forget about it? Some tricks for filing miscellaneous information:

Put it in a special file. I have a whole directory (called jot) of special files where I record particular kinds of information. I even set up special keys in my editor to pull up these files quickly. Here are some that you might consider for yourself:
- bib: A list of almost every paper I've read, with the citation and URL, my summary of the technical ideas that make the paper interesting and/or flawed, and any ideas I have for future work. I can't overemphasize how helpful this is! Keeping such a file will help you find and remember papers instantly, and writing the comments will make you read them better in the first place. (And if you write some of the citations in BibTeX form, then you can automatically turn them into bibliography entries in a variety of formats.)
- hacks: short snippets of code (usually shell pipes or three-line Perl scripts) that might come in handy again. Copying them into this file with a brief comment is easier than turning them into full-fledged programs. This is also a good place to jot down the names of new commands as you learn them.
- ideas: ideas for new research projects. Very important!
- done: it's good to keep a list of your accomplishments (especially the ones that aren't on your resume yet). You might be asked for this by your advisor, PAS, your tenure committee ...
- answer: a list of email that I have to send (mostly replies). Most people seem to keep unanswered email in their inbox to remind them to answer it, but in practice these messages get buried under new mail and forgotten. Better to file them right away and make a note to reply.
- learn: Things I need to learn about when I get some free time.
- log: when I'm falling behind schedule, I use this file for a few days to jot down exactly how I'm spending every 15-minute block of the day. This discourages me from wasting time and helps me figure out how to get back on track.
- todo: to-do list. I try to keep all "to do" lists to one line per item. Then they are easy to take in at a glance and easy to reprioritize. (Mail folders also have this property but allow longer items; see below.) You may want to make a habit of skimming your to-do list at a fixed time every day, for example when you arrive or leave.
- Other idiosyncratic files, for phone numbers, linguistically interesting sentences that I might use as examples someday, movies I've seen, recommended reading, stuff I want to fix in my computing environment someday, etc.
Mail it to yourself. This can be a wonderful way to record miscellany. The message will automatically be stamped with a time and date, and you probably already have a system for filing it. (For example, mail to yourself about project X can be filed with your other project X mail.) Mail folders are
- easy to sort,
- easy to search, and
- easy to browse by subject line.
Try to configure your mailer so that right at the time you send the message, you can specify which folder to file a copy in. For example, Rmail lets you put a special Fcc: line in the header, and mail lets you specify files (not just addresses) as "recipients" of the mail.

Note: If your mailer does let you keep a copy of your outgoing mail, then you don't really want to send to yourself; that would give you two copies. Instead, send to nobody at all, by using mail /dev/null or by deleting the To: line in the mail header.
Put it where you'll notice it at the right time. Every paper or program I write has its own directory. As I work, I tend to add and edit special files whose names are in all-caps, so that they will jump out from a directory listing. For example:
- TO-DO reminds me of changes I should make. (Even after the work is published, this file is the first thing I'll look at if I start a related project two years later.)
- HOW-TO lists any tricks necessary to compile or use the code (very useful if the project has been on the back burner for a bit).
- LOGBOOK describes results I've obtained and how I got them.
- ACKNOWLEDGE lists people whose help should be acknowledged in the writeup.
- SEND-TO might list people who have asked for a copy of the work "when it's done."
Put it on your electronic calendar so that you'll get a reminder at the right time. You can manage your calendar with GUI programs like dtcm and cm, or text-based programs like calendar and the Emacs diary package. (Use man for information on the first three, and info emacs diary for the last.)
Embed it into another document as a comment. My programs (and documents) tend to be full of comments about changes I'd like to make, notes on how to explain a particular algorithm in the writeup, etc. These are placed in the code where I'll notice them. I also mark these comments with special symbols like !!! so that I can search for them.
Use version control. As you revise a program or document, you may want to keep track of your thought process - the changes you made and why. Looking at this record later can help you explain the work, find bugs, and understand your own work style. The best place for such comments is the log maintained by a version-control program like rcs or cvs, as discussed below.
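For the !!! markers mentioned above, running `grep -rn '!!!' .` from a project directory is usually enough. If you'd rather have a small script you can extend (say, to skip version-control directories or to group the reminders by file), here is a minimal Python sketch. The marker string, the list of file suffixes to scan, and the decision to skip hidden directories are assumptions you would adjust to your own habits.

```python
import os

# Assumed defaults: change these to match your own conventions.
MARKER = "!!!"
SUFFIXES = (".py", ".c", ".h", ".pl", ".tex", ".txt")


def find_markers(root, marker=MARKER, suffixes=SUFFIXES):
    """Return (path, line_number, text) for each line under `root` containing `marker`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (e.g. .svn) in place so os.walk skips them,
        # and sort so the report comes out in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if marker in line:
                        hits.append((path, lineno, line.strip()))
    return hits


# Usage: print reminders in the same "file:line: text" form that grep -n uses.
# for path, lineno, text in find_markers("."):
#     print(f"{path}:{lineno}: {text}")
```

Because the suffix filter only looks at source and text files, RCS archive files (which end in ",v") are skipped automatically.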

As for general directory organization, do it any way you like. However, for consistency, you may want to name some subdirectories of your home directory (whose name can be abbreviated as ~) according to general Unix conventions that are followed in other parts of the Unix file system:

~/bin to hold general utility programs (as opposed to programs associated with a particular project); this directory usually goes on your search path.
~/lib for resource files needed by software packages; for example, ~/lib/tex for LaTeX .sty style files, ~/lib/emacs for Emacs .el libraries.
~/pkg for third-party software packages you've installed under your own account.
~/tmp for temporary files that you might have to create; putting them here will remind you that you can and should delete them. (There's also a global directory /tmp, but on many systems, such as Solaris, it actually lives in virtual memory, so filling it up with files can make the system run out of swap space and crash!)

You will probably organize your more personal files into subdirectories under ~/proj, ~/teach, ~/mail, etc.
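If you're setting up a new account, a small script can create this skeleton. Since the whole point of ~/tmp is that its contents should eventually be deleted, the script can also list files there that haven't been touched in a while. This is a sketch under stated assumptions: the directory list simply mirrors the names above, the 30-day threshold is arbitrary, and the script only reports stale files rather than deleting them.

```python
import os
import time

# Directory names taken from the conventions described above.
STANDARD_DIRS = ["bin", "lib", "pkg", "tmp", "proj", "teach", "mail"]


def make_skeleton(home):
    """Create the standard subdirectories of `home` (existing ones are left alone)."""
    for name in STANDARD_DIRS:
        os.makedirs(os.path.join(home, name), exist_ok=True)


def stale_files(directory, days=30, now=None):
    """Return sorted paths of files under `directory` not modified for `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)


# Usage: review the list before deleting anything by hand.
# home = os.path.expanduser("~")
# make_skeleton(home)
# for path in stale_files(os.path.join(home, "tmp")):
#     print(path)
```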

Version Control

This section is a bit out of date. Nowadays, you might consider using Subversion instead. I should add that Subversion (or CVS) is essential for collaboration, whether on code or on papers. You and your collaborators need the freedom to work at the same time, to comment on each other's work, and to drop in ideas and suggestions.

Version-control software manages multiple versions of a file and information about those versions, without wasting space or cluttering up your directory.

How it works: the tool maintains a special write-protected file that contains both the latest version and a list of the changes needed to reconstruct the previous version, the version before that, and so on. So it doesn't have to store the full text of every version. The special file also contains version numbers, dates, and times, along with your log notes describing why you made the changes.
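To make this concrete, here is a toy model of that storage scheme: the newest text is kept in full, and each older version is stored only as a "reverse delta" that rebuilds it from the version after it. It uses Python's standard `difflib` module to compute the deltas. This is an illustration under simplifying assumptions (everything lives in memory, and the class and method names are invented, although `check_in` and `check_out` echo RCS's ci and co); it is not RCS's actual file format.

```python
import difflib


def make_delta(newer, older):
    """Compute a delta that rebuilds the line list `older` from `newer`."""
    matcher = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse newer[i1:i2]
        elif j2 > j1:
            delta.append(("insert", older[j1:j2]))  # lines that exist only in older
    return delta


def apply_delta(newer, delta):
    """Rebuild the older version from `newer` using a delta from make_delta."""
    older = []
    for op in delta:
        if op[0] == "copy":
            older.extend(newer[op[1]:op[2]])
        else:
            older.extend(op[1])
    return older


class ToyArchive:
    """Keeps the latest version in full, plus reverse deltas and log messages."""

    def __init__(self, text, log):
        self.head = text.splitlines(keepends=True)
        self.deltas = []   # deltas[0] rebuilds the revision just before head
        self.logs = [log]  # logs[k] describes revision k + 1

    def check_in(self, text, log):
        new = text.splitlines(keepends=True)
        self.deltas.insert(0, make_delta(new, self.head))
        self.head = new
        self.logs.append(log)

    def check_out(self, revision):
        """Return revision 1, 2, ..., where the latest is len(self.logs)."""
        if not 1 <= revision <= len(self.logs):
            raise ValueError(f"no revision {revision}")
        lines = self.head
        # Walk backwards from the newest version, one delta per older revision.
        for delta in self.deltas[: len(self.logs) - revision]:
            lines = apply_delta(lines, delta)
        return "".join(lines)
```

For example, after `archive = ToyArchive("a\nb\nc\n", "first draft")` and `archive.check_in("a\nB\nc\nd\n", "capitalize b, add d")`, calling `archive.check_out(1)` returns `"a\nb\nc\n"`. Keeping the newest version in full means the version you ask for most often is the cheapest to retrieve; only trips into the past pay for applying deltas.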

When you're working on a program, using version control will help you do the following:

- figure out how you broke a program that was running 10 minutes ago
- compare the output of different versions
- keep a log of changes (you enter a message every time you complete a new version)
- remember to make changes in logical groups
- create new versions very frequently, which makes it easy to undo changes
- repair accidental damage to your file that you didn't notice until several versions later
- feel safe about modifying working code
- feel safe about deleting or replacing stuff you might need again someday
- collaborate with other people on a project where you are all modifying the same files
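For the first two items on that list, the key operation is showing exactly what changed between two versions. The version-control tools print such comparisons for you (rcsdiff and cvs diff, for example), but the idea is easy to reproduce with Python's standard `difflib` module, which can also compare two saved program outputs. A minimal sketch; the function name and labels are illustrative.

```python
import difflib


def show_changes(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff (the format of `diff -u`) between two versions."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )


# Usage: compare yesterday's program output with today's (file names are hypothetical).
# with open("out.monday") as f1, open("out.tuesday") as f2:
#     print(show_changes(f1.read(), f2.read(), "out.monday", "out.tuesday"), end="")
```

If the two versions are identical, the result is the empty string.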

I actually use version control for many documents and web pages as well as programs. Almost all of the same reasons apply; in addition, there may be several "published" versions that you want to keep without wasting space: the term paper, the conference submission, the conference publication, the reformatted web version of the conference publication, and the corrected web version you put up 2 months later.

The simplest version-control tool is RCS. Its most important commands are ci, co, and rcs (the man pages for these list some further utilities). CVS (command cvs) is the tool for more intensive collaborative development, e.g., open-source projects. CVS grew out of RCS and still stores each file's history in RCS format, but it adds support for multi-file projects. It expects multiple users to work on the same file at the same time, and can usually reconcile their changes.

There are web tutorials on RCS and CVS that give a much clearer and quicker conceptual introduction than the man pages. Try to find one with hands-on examples.

I almost never use RCS and CVS commands directly - instead I use Emacs commands that invoke them. Emacs has great support for version control! So see if you can get away with just reading and using the Emacs stuff. Use info emacs version for documentation, or search the web.

Keeping Track of Paper

Even though you're a computer scientist, you will have to deal with at least four kinds of paper:

Administrivia

You probably know how to organize this into file folders already.

Your own papers and presentations

Keep a folder with a couple of copies of all your papers. When you go to a conference, bring this folder so you can give your work to interested parties. Of course, you will also want to put your papers on the web!

Your own technical notes

Keep these organized enough that you can find and follow them 2 years later. Date the pages. If they're loose, also number the pages and staple them together, then file them in a folder associated with the appropriate project.

You may also write some of your technical notes online, of course. But complicated formulas are easier to write by hand. Also, some people find that they can think more freely when they write by hand. They are not as tense about making a mistake because the handwritten version is certainly not the final version. (If you prefer to type, using a "temporary file" can have the same relaxing effect.)

Some people like to write notes by hand and then scan them (with a scanner or a scanning photocopier). The scanner may be able to mail you the document; if not, you'll get a file that you can store or mail to yourself.

Research articles

When you think about how to organize your collection of hardcopy articles, keep in mind:

You will read a lot of research papers in grad school.
As you accumulate lots of papers in your specialty, organizing by sub-subtopic may not work well. A given paper might fall into three categories because it combines two techniques and applies them to a particular problem.
You might not have hardcopy for all papers. (You might read on the screen, or you might recycle the printout and keep only the URL.)

Libraries have the same problems. Their solution: a single organization for hardcopy, together with a catalog that lets you search by multiple criteria. In the same way, I just alphabetize my hardcopy papers by first author, but keep an online list of all the papers I've read, with easy-to-search keywords and comments.

With this hybrid approach, it can also work to organize hardcopy by when or why you got it, rather than alphabetically.
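The online catalog doesn't need to be fancy. Assuming your bib file is plain text with one blank-line-separated entry per paper, each entry starting with the first author's surname (a hypothetical layout; adapt the splitting to whatever you actually use), a keyword search that returns entries in the same alphabetical order as your shelf could look like this:

```python
import re


def search_bib(text, *keywords):
    """Return entries containing every keyword (case-insensitive), sorted by first line."""
    # Entries are separated by one or more blank (or whitespace-only) lines.
    entries = [e.strip() for e in re.split(r"\n\s*\n", text) if e.strip()]
    wanted = [k.lower() for k in keywords]
    hits = [e for e in entries if all(k in e.lower() for k in wanted)]
    # Sorting on the first line mirrors a shelf alphabetized by first author.
    return sorted(hits, key=lambda e: e.splitlines()[0].lower())


# Usage (the path is hypothetical):
# with open("/home/me/jot/bib") as f:
#     for entry in search_bib(f.read(), "parsing", "dynamic programming"):
#         print(entry, end="\n\n")
```

Requiring every keyword to match lets you narrow a search by piling on terms, which is what you want once a single topic word returns dozens of papers.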

This page online: http://cs.jhu.edu/~jason/advice/how-to-organize-your-files.html

Jason Eisner - jason@cs.jhu.edu (suggestions welcome)

Last Mod $Date: 2006/10/17 01:49:56 $