Hi! My name is Konstantin Goudkov, and I would like to ask you something.
Don't you just hate it when you check your web server logs and see that some schmucks have copied all your pages with wget, curl, or some lame-ass win32 downloader app?
What about the times when you see thousands of legitimate-looking user agents from a single IP address grabbing your valuable content?
Do you constantly have to update your firewall reject list or .htaccess file and add more lines with user agents and/or IP ranges to block?
With CopierBuster, not only will you protect your content -- you will also teach those thieves a lesson they won't soon forget.
Here is how it works:
When CopierBuster determines that your pages are being requested by the same entity too fast or that too many pages have been requested in a short period of time, the buster silently starts to serve fake content just to that entity while still allowing regular access for all other visitors.
With proper use, it does not produce any external redirects, so the URLs of the pages remain the same for everybody.
Here is the best part:
You can specify your own fake file to serve.
You can also use a dynamic page as a fake file.
That dynamic page can display a bunch of different random words on each request, having the resulting size comparable to the average size of your regular pages.
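To give a rough idea of what such a dynamic fake page could do, here is a minimal sketch of a random-word generator. It is not part of CopierBuster: the class name FakePageSketch, the method randomWords, and the tiny word list are all made up for illustration, and the target length is whatever you estimate your average page size to be.

```java
import java.util.Random;

// Hypothetical helper, NOT part of CopierBuster: builds a decoy page body
// out of random words, at least targetLength characters long.
final class FakePageSketch {
    // Illustrative word list; a real decoy would draw on a much larger vocabulary.
    private static final String[] WORDS = {
        "lorem", "river", "window", "garden", "signal", "paper", "orange", "market"
    };

    static String randomWords(int targetLength, Random random) {
        StringBuilder body = new StringBuilder(targetLength + 16);
        // Keep appending words until the body reaches the requested size.
        while (body.length() < targetLength) {
            if (body.length() > 0) {
                body.append(' ');
            }
            body.append(WORDS[random.nextInt(WORDS.length)]);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // 8000 is a made-up figure standing in for the site's average page size.
        String body = randomWords(8000, new Random());
        System.out.println("<html><body><p>" + body + "</p></body></html>");
    }
}
```

Because the words change on every request, no two garbage files look the same, which makes them harder to filter out in bulk.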
So when the offending party is done downloading thousands of your files, they won't be able to salvage even the first few real files that were served before the buster kicked in!
Think about it. Would you want to go through each of thousands of files and check whether it's the real one or a fake? I doubt it.
Once they realize they have a directory full of garbage, they will just delete everything and move on to the next site.
That's the best way to defend yourself!
And can you imagine the looks on their faces when they realize what happened?
Well, enough theory. Here is how you can start protecting your website:
For a limited time, you can get it at an incredibly low introductory price of only $0.00. And the best part is that CopierBuster is distributed under the GPL, so you can feel all warm and fuzzy whenever you use it to protect your pages.
Absolutely no programming skills are required! **
Download CopierBuster today before any more of your precious content gets stolen!
copierbuster-20050521.jar
** -- assuming you've got cash to hire someone on Elance or Rentacoder.
How to use the buster:
Well, there are three basic ways to use it.
1) Script within a JSP page
2) With a wrapper custom tag
3) In a filter
The best way is to use it in a filter. With this setup, you won't have to make any changes to your content pages.
You can intercept the requests and run them through the buster. If the buster says it's OK then the filter passes the request down the chain. If not, then the filter creates a wrapper for the request and sends down a request for the fake page.
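The decision flow described above can be modeled without the servlet API, just to show the shape of it. Everything in this sketch is a stand-in and an assumption: Request is a toy class, busterAllows represents whatever call you make to the buster, chain represents the rest of the filter chain, and fakePath is wherever your fake page lives. A real filter would do the same thing with a servlet request wrapper.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Servlet-free model of the filter logic. All names here are hypothetical
// stand-ins, not CopierBuster or servlet API classes.
final class FilterFlowSketch {
    // Minimal stand-in for a request: who is asking, and for which path.
    static final class Request {
        final String clientIp;
        final String path;

        Request(String clientIp, String path) {
            this.clientIp = clientIp;
            this.path = path;
        }
    }

    static String filter(Request request,
                         Predicate<String> busterAllows,
                         String fakePath,
                         Function<Request, String> chain) {
        if (busterAllows.test(request.clientIp)) {
            // The buster says it's OK: pass the request down the chain unchanged.
            return chain.apply(request);
        }
        // Otherwise wrap the request so the rest of the chain sees the fake
        // page's path. No redirect is sent, so the client's URL never changes.
        return chain.apply(new Request(request.clientIp, fakePath));
    }
}
```

The point of the wrapper is that the substitution happens entirely on the server side; the offending client gets a normal response for the URL it asked for.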
The second best solution is to use a tag.
If you decide to use a custom tag, you have to code your tag with an attribute such as "displayFor" and match it against the response of the buster.
In the JSP page you would have something like:
In the example above, you can protect just parts of your page.
The third way is to make calls to the buster directly from within the JSP page.
It's the worst way simply because scripting inside JSP is for morons that are not much higher on the evolutionary plane than people who use PHP (or Perl for web development).
But if you need a quickie and you don't care about the World Peace, then it's the best way to do it.
Here is an example:
You can construct the resulting URL based on the request parameters and any other information; it does not have to be hard-coded.
The main point here is to see that runHit returns true if the request is good and false if not. You can make additional decisions based on other factors.
One such factor is determining if it's a legitimate search engine crawler. I'm not going to discuss it here, but just want to mention that you need to take care of it or you'll suffer big time.
The runHit method takes care of the tracking by using an internal data structure that holds information about past requests.
There is another constructor for CopierBuster that takes (int, int, long): the size of the queue (the number of most recent distinct IPs to remember; earlier IPs are thrown away), the maximum number of hits to allow for a single IP, and the maximum amount of time in milliseconds until an IP's hits are reset to 0.
The default values are (1000, 100, 60*60*1000), that is, 1000 IPs, 100 hits per IP, and a one-hour reset period.
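To make the three parameters concrete, here is a minimal sketch of a tracker with the same knobs. This is not the actual CopierBuster code and makes no claim about how the jar works internally: the class HitTrackerSketch and its allow method are hypothetical, and two behaviors are pure assumptions, namely that the least recently seen IP is the one thrown away, and that the reset period is counted from an IP's first hit in the current window.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a bounded per-IP hit tracker.
// NOT the CopierBuster implementation; eviction and reset rules are assumptions.
final class HitTrackerSketch {
    private static final class Counter {
        int hits;
        long windowStart;

        Counter(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int maxHits;
    private final long resetMillis;
    private final Map<String, Counter> recent;

    HitTrackerSketch(final int queueSize, int maxHits, long resetMillis) {
        this.maxHits = maxHits;
        this.resetMillis = resetMillis;
        // Access-ordered map that drops the least recently seen IP once it
        // holds more than queueSize entries.
        this.recent = new LinkedHashMap<String, Counter>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Counter> eldest) {
                return size() > queueSize;
            }
        };
    }

    // Returns true while the IP is within its allowance, false once it exceeds it.
    synchronized boolean allow(String ip, long nowMillis) {
        Counter counter = recent.get(ip);
        if (counter == null) {
            counter = new Counter(nowMillis);
            recent.put(ip, counter);
        } else if (nowMillis - counter.windowStart >= resetMillis) {
            // The reset period has elapsed: start counting from zero again.
            counter.hits = 0;
            counter.windowStart = nowMillis;
        }
        counter.hits++;
        return counter.hits <= maxHits;
    }
}
```

With the defaults quoted above, the equivalent call would be new HitTrackerSketch(1000, 100, 60L * 60 * 1000), followed by allow(ip, System.currentTimeMillis()) on each request. A small queue size keeps memory bounded, at the cost of forgetting slow offenders on a busy site.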
I'll probably release a tag, a filter, and some JSP scriptlet examples at a later time, but for now have fun playing with the core.
If you have any questions then drop me a line at k AT goudkov DOT com
Konstantin Goudkov,
www.goudkov.com