.htaccess Files: Stop the Spam Bots

If you have an Apache server (you cannot do this with an IIS server) and you host websites that display e-mail addresses, you probably prefer that those addresses not be harvested...
The definition of e-mail harvesting is as follows:

E-mail harvesting is the process of obtaining lists of e-mail addresses using various methods for use in bulk e-mail or other purposes usually grouped as spam.

So how do you stop this attack on your personal information? Well, you could do many wonderful things, but here is the simple solution I came across a couple of years ago: create a .htaccess file. (Windows users will need to do this from the command prompt, as Windows Explorer will not let you create a file whose name starts with a dot.)

SetEnvIfNoCase User-Agent Mozilla/3.0 getout
SetEnvIfNoCase User-Agent Mozilla/2.0 getout
SetEnvIfNoCase User-Agent Mozilla/1.0 getout
SetEnvIfNoCase User-Agent Notes getout
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=getout
</Limit>

How does it work?
This .htaccess file works from the User-Agent header that your browser (or the spam software) sends out when connecting to a web server. In this example I have used the SetEnvIfNoCase directive, which is essentially "Set Environment If" (ignoring case). I tell it which request header to inspect, in this case "User-Agent", then specify which user agent I would like blocked, and finally mark the rule with an environment variable called "getout".
So if you move to the next section you will see I Allow, then Deny: I allow from all sources, but if the request has been marked "getout" the server denies the user access to the website.
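One note if you are running Apache 2.4 or later: the Order/Allow/Deny directives now come from the compatibility module mod_access_compat, and the modern way to express the same rule uses Require from mod_authz_core. A sketch of the equivalent, assuming mod_authz_core is loaded:

```apache
<Limit GET POST>
# Grant everyone access, except requests flagged "getout"
<RequireAll>
Require all granted
Require not env getout
</RequireAll>
</Limit>
```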
Still confused? Can you tell me more?
Well, yes I can. If you take the first rule, I will break it down for you:

SetEnvIfNoCase - This directive comes from the Apache module mod_setenvif

User-Agent - This tells mod_setenvif that it will inspect the User-Agent request header

Mozilla/3.0 - This tells mod_setenvif which user agent this rule applies to

getout - If your user agent matches our target user agent, the request is flagged with a "getout" environment variable

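A more realistic rule set targets user agents actually used by harvesting software rather than old Mozilla versions. As an illustrative sketch (the agent names below are examples of signatures commonly found in bad-bot block lists, not a verified or exhaustive list), you might write:

```apache
# Flag some commonly-blocked harvester user agents
SetEnvIfNoCase User-Agent "EmailSiphon" getout
SetEnvIfNoCase User-Agent "EmailCollector" getout
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=getout
</Limit>
```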
Unlike RRAS and ISA Server, which process rules from the top and apply the first rule that satisfies the condition, Apache processes all the rules and applies the most restrictive result: with Order Allow,Deny, a request that matches both an Allow and a Deny is denied.
I am well aware that you can change your User-Agent in most browsers, which makes the rule useless against anyone who does so, but not everyone knows how to change the User-Agent, and some spamming software does not let you change this option!
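One last tidy-up: the match string in SetEnvIfNoCase is a regular expression, so the three separate Mozilla rules above can be collapsed into one. A sketch, assuming (as the anchored ^ implies) that the user agent string begins with "Mozilla/":

```apache
# Matches Mozilla/1.0, Mozilla/2.0 and Mozilla/3.0 in a single rule
SetEnvIfNoCase User-Agent "^Mozilla/[123]\.0" getout
```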