The Email, Link and Text Protector encodes any text into entity syntax. The output is meant to be used in a web page, in order to make it more difficult for spam-harvesting spiders to extract the original content. The encoder was originally designed to be used on email addresses. It is not impossible to harvest emails encoded with this method - just slightly more difficult.
The principle is simple. Characters are encoded using the entity syntax &#xxx; where xxx is the decimal ASCII value of the character. For example, abc would encode to &#97;&#98;&#99;. You can find more information about character encoding at w3.org.
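The principle can be sketched in a few lines of Perl. This is only a minimal illustration: the helper name encode_all is made up for the example and is not part of the original script.

```perl
use strict;
use warnings;

# Encode every character of a string as a decimal numeric entity.
# encode_all is a hypothetical helper name used only for this sketch.
sub encode_all {
    my $text = shift;
    return join '', map { sprintf('&#%d;', ord($_)) } split //, $text;
}

print encode_all('abc'), "\n";   # prints &#97;&#98;&#99;
```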
This script does the same thing as the email encoder (which I discovered through the web site of Heidi, Jim Kent's wife). Another such service is provided by PCNet. The email protector here adds the following enhancements.
By using a hide ratio less than 1, some characters are left unencoded. The resulting partially encoded text may foil harvesting spiders that look only for fully encoded text. Switching the encoding between hexadecimal and decimal for each token makes parsing more difficult still.
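A rough sketch of how these two enhancements might fit together is shown here. The names protect and $hide_ratio are assumptions made for the example, since the script's real interface is not shown, and the sketch assumes each character is encoded with probability equal to the hide ratio.

```perl
use strict;
use warnings;

# Sketch of partial encoding with a random choice of base.
# protect() and $hide_ratio are illustrative names, not the script's own.
sub protect {
    my ($text, $hide_ratio) = @_;
    my $out = '';
    for my $char (split //, $text) {
        if (rand() < $hide_ratio) {
            # Encode this character, picking hex or decimal at random.
            my $format = rand() < 0.5 ? '&#x%x;' : '&#%d;';
            $out .= sprintf($format, ord($char));
        }
        else {
            # Leave this character unencoded.
            $out .= $char;
        }
    }
    return $out;
}

# Output differs from run to run because the choices are random.
print protect('user@example.com', 0.7), "\n";
```

With a hide ratio of 1 every character is encoded, and with 0 the text passes through unchanged.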
If you are interested in fighting spam and have your own web site, check out my Spider Catcher, the generalized email and web page faker designed to trap spiders and pollute their email databases. The Spider Catcher uses Markov chains and a babelizer to display realistic text while producing bogus, but authentic-looking, email addresses.
Perl subroutine to encode a single character.
sub encode {
    my $token      = shift;   # the single character to encode
    my $randompad  = shift;   # true: randomly zero-pad the number
    my $randombase = shift;   # true: randomly switch between decimal and hex
    my $MAXPAD = 3;
    my $ord = ord($token);
    my $format;
    my $format_prefix = "&#";
    my $format_digit  = "%d";
    my $format_suffix = ";";
    # random padding: zero-pad the number to a width of 4 or 5, or leave it alone
    if ($randompad) {
        my $thispad = int(rand($MAXPAD));
        $format_digit = sprintf("%%0%dd", 3 + $thispad) if $thispad;
    }
    # switch base
    if ($randombase && rand() < 0.5) {
        # we want hex encoding now
        $format_digit =~ s/d/x/;
        $format_prefix = "&#x";
    }
    # assemble the final format, e.g. "&#%d;" or "&#x%04x;"
    $format = sprintf("%s%s%s", $format_prefix, $format_digit, $format_suffix);
    return sprintf($format, $ord);
}
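One way to sanity-check output from a subroutine like this is to decode it again, which is roughly what a browser does when rendering the page. It also shows why the method only slows harvesters down. The helper name decode_numeric_entities is hypothetical, and the sketch handles only numeric entities of the form shown in this article.

```perl
use strict;
use warnings;

# Hypothetical helper: turn decimal (&#97; or &#0097;) and hexadecimal
# (&#x61;) numeric entities back into the characters they stand for.
sub decode_numeric_entities {
    my $html = shift;
    $html =~ s/&#(?:x([0-9a-fA-F]+)|([0-9]+));/defined $1 ? chr(hex($1)) : chr($2)/ge;
    return $html;
}

print decode_numeric_entities('&#97;&#x62;&#0099;'), "\n";   # prints abc
```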