Let me explain a bit.

To exclude all robots that respect the robots.txt file:

User-agent: *
Disallow: /

To exclude just one directory and its subdirectories, say, the /aggregator/ directory:

User-agent: *
Disallow: /aggregator/

To disallow a specific robot you need to know what it calls itself; ia_archiver, for example, is the Wayback Machine's crawler.

To allow the Internet Archive bot you'd add an entry like this:

User-agent: ia_archiver
Disallow:

To block ia_archiver from visiting:

User-agent: ia_archiver
Disallow: /

You can have as many entries like this as you want. So you can disallow all robots from everywhere and then allow only the ones you want (there's a combined example after the one below). You can block certain robots from certain parts of the site, and you can block directories, subdirectories, or individual files. If you have numerous "aggregator" directories in various subdirectories that you want to block, you need to list them all.

Like this:

User-agent: *
Disallow: /aggregator/
Disallow: /foo/aggregator/
...
Disallow: /hidden/aggregator/
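
So the combined version I mentioned above would, as far as I know, look like this: everybody is shut out of the whole site except ia_archiver, which is allowed everywhere (a robot is supposed to use the most specific record that matches its name and only fall back to the * record otherwise):

User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow: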

Your syntax looks wonky to me; it's missing the final "/".
User-agent says which robot a record applies to, and Disallow says what that robot may not fetch. This all assumes well-behaved robots; the file does nothing against robots that ignore it. It is not a security device, just a polite sticky note.
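
To make the who/what part concrete, here's a made-up record (ExampleBot, /private/, and /secret.html are placeholders I invented, not real names) that keeps one particular robot out of one directory and one individual file while leaving everything else open to it:

User-agent: ExampleBot
Disallow: /private/
Disallow: /secret.html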

You might go here for more detailed info. I'm no expert for sure.

http://www.robotstxt.org/orig.html

Jack

--- On Sun, 1/16/11, Jonathan Hutchins <hutchins@tarcanfel.org> wrote:

From: Jonathan Hutchins <hutchins@tarcanfel.org>
Subject: robots.txt question
To: "KCLUG (E-mail)" <kclug@kclug.org>
Date: Sunday, January 16, 2011, 12:53 PM

I'm wondering about the syntax.  The example file from drupal uses the format

Disallow: /aggregator

However, it says in the comments that only the root /robots.txt file is valid. 

From my understanding of the syntax, /aggregator does not
block /foo/aggregator, so I need to either prepend "/foo" to everything, or
use wildcards per the new google/webcrawler extensions to the protocol.

If anybody can cite an on-line example that explains this, I'd be grateful.