The early implementations of Wpoison raised a couple of very valid safety concerns. These were expressed to the author by early users of Wpoison, and they have now been largely eliminated. Two problems, in particular, were obvious from the beginning.

The first problem was the potentially bad effect that Wpoison might have on legitimate web crawlers, such as those used by the major web search engine companies, and the knock-on harm this could do to any site running Wpoison that had high hopes of being properly (and prominently) cataloged by those search engine companies.

This problem was trivially eliminated by including code in Wpoison which causes each Wpoison-generated randomized web page to carry a clear indication (for the benefit of legitimate web crawlers) that the page in question should not be cataloged in any way. Basically, Wpoison now merely makes proper use of the (pre-existing) Robot Exclusion Protocol. Use of this protocol, and its associated ``off limits'' markers, within all web pages generated by Wpoison serves to ensure both that (a) legitimate[1] web crawlers will not get caught up in repeatedly reading thousands (or millions) of randomized garbage pages generated by Wpoison and that (b) the legitimate search engine companies will still be able to successfully add your web site to their databases.
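As a rough illustration of the idea, here is a minimal Python sketch of a randomized decoy page that carries an ``off limits'' marker. This is not Wpoison's actual code. It assumes the marker is the widely used robots meta tag (the text above does not say exactly which marker Wpoison emits), and the names ROBOTS_META, random_word and build_decoy_page are hypothetical, as is the reserved .example domain suffix.

```python
import random
import string

# Assumption: the "off limits" marker is the common robots meta tag.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'


def random_word(rng, min_len=3, max_len=10):
    """Return a random lowercase word (hypothetical helper)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_decoy_page(rng, n_addresses=5):
    """Build one randomized page of bogus addresses, marked off limits."""
    addresses = [
        f"{random_word(rng)}@{random_word(rng)}.example"
        for _ in range(n_addresses)
    ]
    items = "\n".join(
        f'<li><a href="mailto:{a}">{a}</a></li>' for a in addresses
    )
    return (
        "<html>\n<head>\n"
        + ROBOTS_META + "\n"
        + "<title>" + random_word(rng) + "</title>\n"
        + "</head>\n<body>\n<ul>\n"
        + items
        + "\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(build_decoy_page(random.Random()))
```

A legitimate crawler that reads the head of such a page sees the marker and moves on, while a crawler that ignores it goes on to collect the bogus addresses.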

The second problem was the potentially bad effect that a locally installed copy of Wpoison might have on one's own CPU and bandwidth usage. Given the way Wpoison works, it is easy to see that (unless something is done to prevent it) the evil spammer address harvesting web crawlers may get trapped by Wpoison (as intended) and then begin to access your installed copy of Wpoison over and over again (also as intended), perhaps to such an extent that they end up using most or all of your available CPU cycles and/or network bandwidth.

This problem also was solved in a fairly trivial and straightforward way. In a nutshell, just prior to the time it generates the very tail end of any one of its randomly-generated pseudo web pages, Wpoison pauses for several seconds. It just does nothing (other than wasting time) during those several seconds.
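The pause can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function name serve_slow_page, its parameters, and the 5-second default are hypothetical (the text says only ``several seconds''), and the sleep function is passed in so that the behavior can be checked without actually waiting.

```python
import sys
import time


def serve_slow_page(body, out=sys.stdout, pause_seconds=5.0, sleep=time.sleep):
    """Write a page, pausing just before its very tail end.

    pause_seconds is an assumed value; the source says only
    "several seconds".
    """
    out.write("<html>\n<body>\n")
    out.write(body)
    out.flush()           # the crawler receives everything so far

    sleep(pause_seconds)  # do nothing but waste the crawler's time

    out.write("</body>\n</html>\n")  # the tail end of the page
    out.flush()


if __name__ == "__main__":
    serve_slow_page("<p>randomized content goes here</p>\n", pause_seconds=2.0)
```

Because the process is asleep during the pause, each request costs very little CPU time, and a crawler that fetches pages one after another cannot pull them out any faster than one per pause.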

The effect of these calculated pauses is to ensure that any address harvesting web crawlers that may be diligently attempting to suck as many Wpoison-generated web pages out of your site as fast as possible will in fact only be able to suck pages out at a moderate pace, one which will not have any sustained dramatic effect upon your CPU usage or network bandwidth. That pace is still fast enough, however, that if one of these spammer address harvesting web crawlers is left to try to digest your entire web site, say, overnight, then within a few hours (and certainly by morning) its database of e-mail addresses will have been well and thoroughly polluted by millions of utterly bogus e-mail addresses, just as we would like.

The bottom line is that sites can now safely install and run Wpoison without any fear that doing so may cause sudden large drains of CPU cycles or network bandwidth. It won't. Period. End of story.


[1] It is important to understand the distinction between legitimate web crawlers and the rather different ones that the spammers use. Legitimate web crawlers, such as those used by the major search engine companies, do always obey the standardized and widely accepted Robot Exclusion Protocol, and they take its use, on any given web page, as a clear and unambiguous ``keep out'' sign. Spammers who are trawling for e-mail addresses, on the other hand, have no incentive whatsoever to skip any web pages that might contain valuable fresh e-mail addresses, so the address harvesting web crawlers that they use tend to totally ignore the established standards of good practice on the net, basically ignoring all posted ``keep out'' signs and blundering recklessly ahead even when they have been warned that there is no data of any permanence or interest on the page or pages ahead. In fact it is this reckless behavior that Wpoison relies upon. By being stupid, brutish, and careless, spammers play right into our hands!

It should be noted, however, that since the development of the first publicly-released version of Wpoison, spammers have been starting to catch on to the fact that their own stupidity and greediness in reading all web pages, even when they have been warned off, was in fact causing them more harm than good. Because of this the author of Wpoison now believes that many (and perhaps even a majority) of the spammers' address harvesting web crawlers have been reprogrammed so that they do obey the standard Robot Exclusion Protocol. This actually represents a sort of victory for those of us who do not want to have our e-mail addresses harvested by the spammers, because now we can gain a measure of protection from the spammer address harvesting web crawlers simply by arranging to have all e-mail addresses that are displayed on our web sites appear only on pages that we have marked as un-scrapable (for robots) via the standard Robot Exclusion Protocol.

The author of Wpoison nowadays strongly advises (to all who will listen) that all web pages containing real e-mail addresses should in fact be marked as being ``off limits'' via the standard Robot Exclusion Protocol, both now and into the foreseeable future. Doing that now provides a measure of protection from having your address harvested by spammers, all by itself (and without even having Wpoison anywhere in the picture).
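One common way to post such ``off limits'' signs is a robots.txt file at the top of the site. The following Python sketch, which uses the standard library's urllib.robotparser module, shows how one might check that the pages holding real e-mail addresses are in fact covered. The /contact/ and /staff/ paths, the ExampleBot name, and the is_off_limits helper are all hypothetical and would need to be replaced with whatever applies to your own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directories listed here stand in for
# wherever your real e-mail addresses are displayed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact/
Disallow: /staff/
"""


def is_off_limits(path, robots_txt=ROBOTS_TXT, user_agent="ExampleBot"):
    """Return True if a protocol-obeying robot must skip this path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/contact/index.html", "/staff/", "/index.html"):
        status = "off limits" if is_off_limits(path) else "open to robots"
        print(path, "is", status)
```

This only tells you what a well-behaved robot will do. A crawler that ignores the protocol will read the pages anyway.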