Written by Jason Eisner in 2000, for new Computer Science Ph.D. students at the University of Rochester.

How to Organize Your Files

by Jason Eisner

Here are some suggestions about how to organize your work during your research career. Workstyles vary, and you will develop your own organizing system over time. But these are techniques that I personally have found to be helpful.

Be careful, though: what really matters is the work itself, so don't spend more time getting organized than you need to! Hacking your environment sometimes pays off, but sometimes it's a form of procrastination.

Keeping Track of Information Online

Although paper has its uses, electronic files have at least four advantages over paper: they are easy to edit, readable from anywhere with a computer, harder to lose, and searchable. (If I can't find an old message, I just use grep to search all the email I've written and received for the past 10+ years. It's fast!)
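If you would rather script this kind of search than type a grep command, here is a minimal sketch in Python. It assumes your mail (or notes) is stored as plain-text files under a single directory; the function name search_files and that layout are illustrative assumptions, not something this page prescribes. It behaves roughly like a recursive, case-insensitive grep that reports line numbers.

```python
import re
from pathlib import Path


def search_files(root, pattern, ignore_case=True):
    """Yield (path, line_number, line) for each line under root matching pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the search going on files with odd encodings.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, number, line.rstrip("\n")
```

To use it, loop over the results and print each path, line number, and line. For a large archive, grep itself will probably be faster; the sketch is only meant to show how little machinery a full-text search over your own files needs.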

Partly for this reason, you may spend much of your workday editing files in Emacs. Obviously, you should organize your files into some sensible directory structure (see below). But where do you put a miscellaneous idea or piece of information so you don't forget about it? Some tricks for filing miscellaneous information:

As for general directory organization, do it any way you like. However, for consistency, you may want to name some subdirectories of your home directory (whose name can be abbreviated as ~) according to general Unix conventions that are followed in other parts of the Unix file system:

You will probably organize your more personal files into subdirectories under ~/proj, ~/teach, ~/mail, etc.

Version Control

This section is a bit out of date. Nowadays, you might consider using Subversion instead of the RCS and CVS tools described below. I should add that Subversion (or CVS) is essential for collaboration, whether you are working together on code or on papers. You and your collaborators need the freedom to work at the same time, to comment on each other's work, and to drop in ideas and suggestions.

Version-control software manages multiple versions of a file and information about those versions, without wasting space or cluttering up your directory.

How it works: Basically, it maintains a special write-protected file that contains both the latest version and a list of changes needed to automatically reconstruct the previous version, the version before that, etc. So it doesn't have to store the full text of every version. The special file also contains version numbers/dates/times and your log notes that describe why you made the changes.
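To make the idea concrete, here is a minimal sketch in Python of a store that keeps the latest version in full plus a chain of "reverse deltas" for the older versions. It is only an illustration of the principle described above, not the actual file format or algorithm of any particular tool; the names Archive, make_reverse_delta, and apply_delta are made up for this example.

```python
import difflib


def make_reverse_delta(new_lines, old_lines):
    """Return edits that turn new_lines back into old_lines.

    Unchanged runs are stored as index ranges into new_lines, so only
    the old text that actually differs takes up space.
    """
    matcher = difflib.SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))              # reuse new_lines[i1:i2]
        else:
            delta.append(("insert", old_lines[j1:j2]))  # old text to restore
    return delta


def apply_delta(new_lines, delta):
    """Rebuild the older version from the newer one plus its reverse delta."""
    old_lines = []
    for op in delta:
        if op[0] == "copy":
            old_lines.extend(new_lines[op[1]:op[2]])
        else:
            old_lines.extend(op[1])
    return old_lines


class Archive:
    """Latest version stored in full; older versions stored as reverse deltas."""

    def __init__(self, lines, log="initial version"):
        self.latest = list(lines)
        self.logs = [log]   # one log note per version, oldest first
        self.deltas = []    # deltas[k] turns version k+2 into version k+1

    def check_in(self, lines, log):
        """Record a new latest version together with a log note."""
        lines = list(lines)
        self.deltas.append(make_reverse_delta(lines, self.latest))
        self.latest = lines
        self.logs.append(log)

    def check_out(self, version):
        """Return the given version (1 = oldest) by walking back from the latest."""
        if not 1 <= version <= len(self.logs):
            raise ValueError(f"no such version: {version}")
        lines = self.latest
        for delta in reversed(self.deltas[version - 1:]):
            lines = apply_delta(lines, delta)
        return list(lines)
```

Checking out the latest version costs nothing, while older versions are rebuilt by applying one delta per step back. That trade-off suits the common case, where you mostly want the current version and only occasionally an old one.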

When you're working on a program, using version control will help you do the following:

I actually use version control for many documents and web pages as well as programs. Almost all of the same reasons apply, plus there may be several "published" versions that you want to keep without wasting space - the term paper, the conference submission, the conference publication, the reformatted web version of the conference publication, and the corrected web version you put up two months later.

The simplest version-control tool is RCS. Its most important commands are ci, co, and rcs (and some further utilities are listed on the man pages for these). CVS (command cvs) is the tool for more intensive collaborative development - e.g., open-source projects. CVS is built on top of RCS, but has additional support for multi-file projects. It expects multiple users to work on the same file at the same time, and can usually reconcile their changes.

There are web tutorials on RCS and CVS that give a much clearer and quicker conceptual introduction than the man pages. Try to find one with hands-on examples.

I almost never use RCS and CVS commands directly - instead I use Emacs commands that invoke them. Emacs has great support for version control! So see if you can get away with just reading and using the Emacs stuff. Use info emacs version for documentation, or search the web.

Keeping Track of Paper

Even though you're a computer scientist, you will have to deal with at least four kinds of paper:

Administrivia
You probably know how to organize this into file folders already.
Your own papers and presentations
Keep a folder with a couple of copies of all your papers. When you go to a conference, bring this folder so you can give your work to interested parties. Of course, you will also want to put your papers on the web!
Your own technical notes
Keep these organized enough that you can find and follow them 2 years later. Date the pages. If they're loose, also number the pages and staple them together, then file them in a folder associated with the appropriate project.

You may also write some of your technical notes online, of course. But complicated formulas are easier to write by hand. Also, some people find that they can think more freely when they write by hand. They are not as tense about making a mistake because the handwritten version is certainly not the final version. (If you prefer to type, using a "temporary file" can have the same relaxing effect.)

Some people like to write notes by hand and then scan them (with a scanner or a scanning photocopier). The scanner may be able to email you the document directly. If not, you'll get a file that you can either store or email to yourself.

Research articles
When you think about how to organize your collection of hardcopy articles, keep in mind:

Libraries have the same problems. Their solution: a single organization for hardcopy, together with a catalog that lets you search by multiple criteria. In the same way, I just alphabetize my hardcopy papers by first author, but keep an online list of all the papers I've read, with easy-to-search keywords and comments.

With this hybrid approach, it can also work to organize hardcopy by when or why you got it, rather than alphabetically.
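The online list can be very simple. Here is a minimal sketch in Python of such a catalog: one record per paper, a search over every text field, and a sort that mirrors the alphabetical shelf order. The Paper fields, the function names, and the sample entries are all made-up placeholders for illustration, not real references; a plain text file that you search with grep would do the same job.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    first_author: str
    title: str
    year: int
    keywords: set = field(default_factory=set)
    comments: str = ""


def search(catalog, term):
    """Return papers whose author, title, keywords, or comments mention term."""
    needle = term.lower()
    hits = []
    for paper in catalog:
        haystack = " ".join(
            [paper.first_author, paper.title, paper.comments, *sorted(paper.keywords)]
        ).lower()
        if needle in haystack:
            hits.append(paper)
    return hits


def shelf_order(catalog):
    """Sort the way the hardcopy is filed: alphabetically by first author."""
    return sorted(catalog, key=lambda p: (p.first_author.lower(), p.year))


# Placeholder entries (hypothetical, not real papers).
catalog = [
    Paper("Zeta", "A placeholder title about parsing", 1999,
          {"parsing", "grammar"}, "Read for the qualifying exam."),
    Paper("Alpha", "Another placeholder title", 1998,
          {"learning"}, "Section 3 is relevant to my parsing project."),
]
```

With these entries, search(catalog, "parsing") finds both papers, one through its keywords and one through the comment, which is exactly what a single physical filing order cannot do.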


This page online: http://cs.jhu.edu/~jason/advice/how-to-organize-your-files.html
Jason Eisner - jason@cs.jhu.edu (suggestions welcome) Last Mod $Date: 2006/10/17 01:49:56 $