[update] I've accepted an answer, as lc deserves the bounty due to the well thought-out answer, but sadly, I believe we're stuck with our original worst case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible for us to actually track hits, and any workaround (sending a non-cached web-beacon, writing to a unified table, etc.) slows the site down worse than the bots would. There is likely some pricey bit of hardware from Cisco or the like that can help at a high level, but it's hard to justify the cost if CAPTCHAing everyone is an alternative. I'll attempt to do a more full explanation in here later, as well as cleaning this up for future searchers (though others are welcome to try, as it's community wiki).
I've added bounty to this question and attempted to explain why the current answers don't fit our needs. First, though, thanks to all of you who have thought about this, it's amazing to have this collective intelligence to help work through seemingly impossible problems.
I'll be a little more clear than I was before: This is about the bag o' crap sales on woot.com. I'm the president of Woot Workshop, the subsidiary of Woot that does the design, writes the product descriptions, podcasts, blog posts, and moderates the forums. I work in the css/html world and am only barely familiar with the rest of the developer world. I work closely with the developers and have talked through all of the answers here (and many other ideas we've had).
Usability of the site is a massive part of my job, and making the site exciting and fun is most of the rest of it. That's where the three goals below derive. CAPTCHA harms usability, and bots steal the fun and excitement out of our crap sales.
To set up the scenario a little more, bots are slamming our front page tens of times a second screenscraping (and/or scanning our rss) for the Random Crap sale. The moment they see that, it triggers a second stage of the program that logs in, clicks I want One, fills out the form, and buys the crap.
In current (2/6/2009) order of votes:
lc: On stackoverflow and other sites that use this method, they're almost always dealing with authenticated (logged in) users, because the task being attempted requires that.
On Woot, anonymous (non-logged) users can view our home page. In other words, the slamming bots can be non-authenticated (and essentially non-trackable except by IP address). So we're back to scanning for IPs, which a) is fairly useless in this age of cloud networking and spambot zombies and b) catches too many innocents given the number of businesses that come from one IP address (not to mention the issues with non-static IP ISPs and potential performance hits to trying to track this).
Oh, and having people call us would be the worst possible scenario. Can we have them call you?
BradC: Ned Batchelder's methods look pretty cool, but they're pretty firmly designed to defeat bots built for a network of sites. Our problem is bots are built specifically to defeat our site. Some of these methods could likely work for a short time until the scripters evolved their bots to ignore the honeypot, screenscrape for nearby label names instead of form ids, and use a javascript-capable browser control.
lc again "Unless, of course, the hype is part of you
-
You could try to make the price harder for scripts to read. This is achieved most simply by converting it to an image, but a text recognition algorithm could still get around this. If enough scripters get around it, you could try applying captcha-like things to this image, but obviously at the cost of user experience. Instead of an image, the price could go in a flash app.
Alternately, you could try to devise a way to "shuffle" the HTML of a page in some way that doesn't affect the rendering. I can't think of a good example off the top of my head, but I'm sure it's somehow doable.
-
How about implementing something like SO does with the CAPTCHAs?
If you're using the site normally, you'll probably never see one. If you happen to reload the same page too often, post successive comments too quickly, or something else that triggers an alarm, make them prove they're human. In your case, this would probably be constant reloads of the same page, following every link on a page quickly, or filling in an order form too fast to be human.
If they fail the check x times in a row (say, 2 or 3), give that IP a timeout or other such measure. Then at the end of the timeout, dump them back to the check again.
Since you have unregistered users accessing the site, you do have only IPs to go on. You can issue sessions to each browser and track that way if you wish. And, of course, throw up a human-check if too many sessions are being (re-)created in succession (in case a bot keeps deleting the cookie).
As far as catching too many innocents, you can put up a disclaimer on the human-check page: "This page may also appear if too many anonymous users are viewing our site from the same location. We encourage you to register or login to avoid this." (Adjust the wording appropriately.)
Besides, what are the odds that X people are loading the same page(s) at the same time from one IP? If they're high, maybe you need a different trigger mechanism for your bot alarm.
Edit: Another option is if they fail too many times, and you're confident about the product's demand, to block them and make them personally CALL you to remove the block.
Having people call does seem like an asinine measure, but it makes sure there's a human somewhere behind the computer. The key is to have the block only be in place for a condition which should almost never happen unless it's a bot (e.g. fail the check multiple times in a row). Then it FORCES human interaction - to pick up the phone.
In response to the comment of having them call me, there's obviously that tradeoff here. Are you worried enough about ensuring your users are human to accept a couple phone calls when they go on sale? If I were so concerned about a product getting to human users, I'd have to make this decision, perhaps sacrificing a (small) bit of my time in the process.
Since it seems like you're determined to not let bots get the upper hand/slam your site, I believe the phone may be a good option. Since I don't make a profit off your product, I have no interest in receiving these calls. Were you to share some of that profit, however, I may become interested. As this is your product, you have to decide how much you care and implement accordingly.
The other ways of releasing the block just aren't as effective: a timeout (but they'd get to slam your site again after, rinse-repeat), a long timeout (if it was really a human trying to buy your product, they'd be SOL and punished for failing the check), email (easily done by bots), fax (same), or snail mail (takes too long).
You could, of course, instead have the timeout period increase per IP for each time they get a timeout. Just make sure you're not punishing true humans inadvertently.
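A minimal Python sketch of the flow described above: show a human check after bot-like behavior, hand out a timeout after repeated failures, drop the client back at the check when the timeout ends, and lengthen the timeout each time. The class name `HumanCheckGate`, the thresholds, and keying on an IP or session id are all assumptions made for illustration, not something Woot or Stack Overflow is known to run.

```python
import time
from collections import defaultdict, deque


class HumanCheckGate:
    """Decides per client key (IP or session id) what to serve next."""

    def __init__(self, max_hits=10, window=10.0, max_failures=3, base_timeout=60.0):
        self.max_hits = max_hits          # requests allowed per window
        self.window = window              # window length in seconds
        self.max_failures = max_failures  # failed checks before a timeout
        self.base_timeout = base_timeout  # first timeout, doubled each time
        self.hits = defaultdict(deque)
        self.needs_check = set()
        self.failures = defaultdict(int)
        self.timeouts = defaultdict(int)
        self.blocked_until = {}

    def on_request(self, key, now=None):
        now = time.time() if now is None else now
        if self.blocked_until.get(key, 0.0) > now:
            return "blocked"
        if key in self.needs_check:
            # Still owes us a passed check (also the state after a timeout ends).
            return "human_check"
        recent = self.hits[key]
        recent.append(now)
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.needs_check.add(key)
            return "human_check"
        return "ok"

    def on_check_result(self, key, passed, now=None):
        now = time.time() if now is None else now
        if passed:
            self.needs_check.discard(key)
            self.failures[key] = 0
            self.hits[key].clear()
            return
        self.failures[key] += 1
        if self.failures[key] >= self.max_failures:
            self.failures[key] = 0
            self.timeouts[key] += 1
            # Each successive timeout for the same key is twice as long.
            duration = self.base_timeout * 2 ** (self.timeouts[key] - 1)
            self.blocked_until[key] = now + duration
```

Note that this keeps its state in one process's memory. Behind a web farm it would need a shared store, which is exactly the cost the question is worried about.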
brad : I use StackOverflow regularly and didn't even know it used this technique. Obviously it's not interfering with my user experience.
Ross : Google uses this same approach, and they only have IP addresses to go on. Frequently at work I'll get a CAPTCHA before I can search on Google because they see bot-like behavior from the same IP address. I think this approach (CAPTCHA after bot-like behavior) is the best you're going to get.
Marcus Downing : I've had Google ask me for a CAPTCHA before, but it was my own fault - I was using them as a calculator, doing dozens of nearly-identical sums.
Simucal : Good answer but those bold lines are used a little gratuitously. It is kind of distracting.
lc : True and understood. Since it was long, I was trying to highlight the general points. (I actually wanted to make the less important text smaller instead of the important text bold, but we don't have that option.) It was a bit distracting and I've changed the format a bit. Thanks for the comment.
xan : The CAPTCHA option sounds like a winner to me. You hurt the bots hard and if well balanced you should never get in your legitimate users' way.
Sam : Instead of locking people out and using a phone call, could you generate a temporary email address like cur92Siva@site.com, but generate the front part with an image?
lc : That might work too, unless the bots just get used to the system and can screen-scrape the email address. My point with the phone call is it actually forces human interaction and requires the user to explain themselves directly with their voice. Bot owners probably don't want to do that.
Chris Lloyd : Demonoid does this as well, if you download more than `x` torrents in `y` time they get you to authenticate with a CAPTCHA.
firebird84 : Google does this too.
RichardOD : I see the CAPTCHAs a lot on Stack Overflow. Sometimes I make too many edits too quickly!
Philip Schlump : I see the CAPTCHAs almost every time I answer a question.
-
I say expose the price information using an API. This is the unintuitive solution but it does work to give you control over the situation. Add some limitations to the API to make it slightly less functional than the website.
You could do the same for ordering. You could experiment with small changes to the API functionality/performance until you get the desired effect.
There are proxies and botnets to defeat IP checks. There are captcha reading scripts that are extremely good. There are even teams of workers in India who defeat captchas for a small price. Any solution you can come up with can be reasonably defeated. Even Ned Batchelder's solutions can be stepped past by using a WebBrowser control or other simulated browser combined with a botnet or proxy list.
-
Take a look at this article by Ned Batchelder. His article is about stopping spambots, but the same techniques could easily apply to your site.
Rather than stopping bots by having people identify themselves, we can stop the bots by making it difficult for them to make a successful post, or by having them inadvertently identify themselves as bots. This removes the burden from people, and leaves the comment form free of visible anti-spam measures.
This technique is how I prevent spambots on this site. It works. The method described here doesn't look at the content at all.
Some other ideas:
- Create an official auto-notify mechanism (RSS feed? Twitter?) that people can subscribe to when your product goes on sale. This reduces the need for people to make scripts.
- Change your obfuscation technique right before a new item goes on sale. So even if the scripters can escalate the arms race, they are always a day behind.
EDIT: To be totally clear, Ned's article above describes methods to prevent the automated PURCHASE of items by preventing a BOT from going through the forms to submit an order. His techniques wouldn't be useful for preventing bots from screen-scraping the home page to determine when a Bandoleer of Carrots comes up for sale. I'm not sure preventing THAT is really possible.
With regard to your comments about the effectiveness of Ned's strategies: Yes, he discusses honeypots, but I don't think that's his strongest strategy. His discussion of the SPINNER is the original reason I mentioned his article. Sorry I didn't make that clearer in my original post:
The spinner is a hidden field used for a few things: it hashes together a number of values that prevent tampering and replays, and is used to obscure field names. The spinner is an MD5 hash of:
- The timestamp,
- The client's IP address,
- The entry id of the blog entry being commented on, and
- A secret.
Here is how you could implement that at WOOT.com:
Change the "secret" value that is used as part of the hash each time a new item goes on sale. This means that if someone is going to design a BOT to auto-purchase items, it would only work until the next item comes on sale!!
Even if someone is able to quickly re-build their bot, all the other actual users will have already purchased a BOC, and your problem is solved!
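A rough Python sketch of the spinner idea with a secret that is rotated per sale. The function names, the `item_id` parameter (standing in for the blog entry id in Ned's description), the `|` separator, and the 15-minute `max_age` are assumptions for illustration. MD5 is used only because the quoted description names it; a keyed hash such as HMAC-SHA-256 would be the more usual choice today.

```python
import hashlib
import hmac
import time


def make_spinner(timestamp, client_ip, item_id, secret):
    """Hash the four ingredients listed above into one hidden-field value."""
    raw = f"{timestamp}|{client_ip}|{item_id}|{secret}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def field_name(real_name, spinner, secret):
    """Obscured form-field name; changes whenever the spinner or secret does."""
    raw = f"{real_name}|{spinner}|{secret}".encode("utf-8")
    return "f" + hashlib.md5(raw).hexdigest()


def verify_spinner(spinner, timestamp, client_ip, item_id, secret,
                   max_age=900, now=None):
    """Reject stale, replayed-from-elsewhere, or old-secret submissions."""
    now = time.time() if now is None else now
    if not 0 <= now - timestamp <= max_age:
        return False
    expected = make_spinner(timestamp, client_ip, item_id, secret)
    return hmac.compare_digest(spinner, expected)
```

Swapping `secret` when a new item goes on sale invalidates every spinner and every obscured field name a bot has recorded from the previous sale.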
The other strategy he discusses is to change the honeypot technique from time to time (again, change it when a new item goes on sale):
- Use CSS classes (randomized of course) to set the fields or a containing element to display:none.
- Color the fields the same (or very similar to) the background of the page.
- Use positioning to move a field off of the visible area of the page.
- Make an element too small to show the contained honeypot field.
- Leave the fields visible, but use positioning to cover them with an obscuring element.
- Use Javascript to effect any of these changes, requiring a bot to have a full Javascript engine.
- Leave the honeypots displayed like the other fields, but tell people not to enter anything into them.
I guess my overall idea is to CHANGE THE FORM DESIGN when each new item goes on sale. Or at LEAST, change it when a new BOC goes on sale.
Which is what, a couple times/month?
If you accept this answer, will you give me a heads-up on when the next one is due? :)
Marcus Downing : +1 for the RSS. Make it so that legitimate users are rewarded.
TM : RSS seems like a good solution, but might that hurt the ad revenue that I am guessing this site depends on?
VirtuosiMedia : You can put ads in RSS.
epochwolf : This site already uses twitter. :)
Michael Haren : They already use RSS, too
-
Disclaimer: This answer is completely non-programming-related. It does, however, try to attack the reason for scripts in the first place.
Another idea is if you truly have a limited quantity to sell, why don't you change it from a first-come-first-served methodology? Unless, of course, the hype is part of your marketing scheme.
There are many other options, and I'm sure others can think of some different ones:
- an ordering queue (pre-order system) - Some scripts might still end up at the front of the queue, but it's probably faster to just manually enter the info.
- a raffle system (everyone who tries to order one is entered into the system) - This way the people with the scripts have just the same chances as those without.
- a rush priority queue - If there is truly a high perceived value, people may be willing to pay more. Implement an ordering queue, but allow people to pay more to be placed higher in the queue.
- auction (credit goes to David Schmitt for this one, comments are my own) - People can still use scripts to snipe in at the last minute, but not only does it change the pricing structure, people are expecting to be fighting it out with others. You can also do things to restrict the number of bids in a given time period, make people phone in ahead of time for an authorization code, etc.
David Schmitt : You forgot auctioning
lc : Thank you. See, I knew there were others.
Andy Dent : any raffle system will just be overloaded to increase the chances in the bot's favour
lc : Not if you limit it to one per person/household/(physical) address it won't
-
Time-block user agents that make so-many requests per minute. Eg if you've got somebody requesting a page exactly every 5 seconds for 10 minutes, they're probably not a user... But it could be tricky to get this right.
If they trigger an alert, redirect every request to a static page with as little DB-IO as possible with a message letting them know they'll be allowed back on in X minutes.
It's important to add that you should probably only apply this on requests for pages and ignore all the requests for media (js, images, etc).
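One way the "exactly every 5 seconds" test could look, as a hedged Python sketch: skip media requests, then flag a client whose page requests arrive at suspiciously regular intervals. The suffix list, `min_requests`, and `max_jitter` are made-up values that would need tuning against real traffic, which is the tricky part mentioned above.

```python
import statistics
from pathlib import PurePosixPath

# Assumed list of static-asset suffixes that should not count as page hits.
MEDIA_SUFFIXES = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".ico"}


def is_page_request(path):
    """True for pages, False for media such as scripts and images."""
    suffix = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
    return suffix not in MEDIA_SUFFIXES


def looks_scripted(timestamps, min_requests=20, max_jitter=0.25):
    """Flag a client whose request spacing barely varies.

    timestamps: ascending request times (seconds) for one client.
    max_jitter: largest standard deviation of the gaps, in seconds,
                that is still treated as machine-like.
    """
    if len(timestamps) < max(min_requests, 2):
        return False
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.pstdev(gaps) <= max_jitter
```

A bot that adds random jitter to its timing slips past this check, so at best it catches the lazy scripts.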
Karl : I've done this on a personal project, it seems like a good method. You just need to remember all of the IPs as they hit your page, and have rules set up for what it means to be hitting your page too often. The problem is that the OP said checking IPs is way too expensive, which I don't understand.
rmeador : If you implement the IP checking yourself (i.e. in your database, from your PHP script or whatever), then it will be quite expensive. Get the firewall to do it for you and it becomes much more feasible.
Oli : rmeador: It also seems like it would be a lot harder to determine if the request was for HTML or other media. If you've got 20 external things on your page, you're looking at a minimum of 21 requests for a new user in 1-2 seconds.
-
How about introducing a delay which requires human interaction, like a sort of "CAPTCHA game". For example, it could be a little Flash game where during 30 seconds they have to burst checkered balls and avoid bursting solid balls (avoiding colour blindness issues!). The game would be given a random number seed and what the game transmits back to the server would be the coordinates and timestamps of the clicked points, along with the seed used.
On the server you simulate the game mechanics using that seed to see if the clicks would indeed have burst the balls. If they did, not only were they human, but they took 30 seconds to validate themselves. Give them a session id.
You let that session id do what it likes, but if it makes too many requests, they can't continue without playing again.
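To make the server-side replay concrete, here is a toy Python sketch. It assumes a much simpler game than the one described (stationary balls, misses cost nothing) and assumes the client can reproduce the same layout from the seed; `Ball`, `balls_for_seed`, and `verify_clicks` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Ball:
    x: float
    y: float
    radius: float
    checkered: bool  # checkered balls must be burst, solid ones avoided


def balls_for_seed(seed, count=12, width=400, height=300):
    """Rebuild the layout from the seed; the same seed gives the same balls."""
    rng = random.Random(seed)
    return [
        Ball(rng.uniform(0, width), rng.uniform(0, height), 15.0, rng.random() < 0.5)
        for _ in range(count)
    ]


def verify_clicks(seed, clicks, min_duration=30.0):
    """clicks: list of (x, y, t) tuples, t in seconds since the game started."""
    times = [t for _, _, t in clicks]
    if not times or times != sorted(times) or times[-1] < min_duration:
        return False  # no play at all, shuffled timestamps, or finished too fast
    remaining = balls_for_seed(seed)
    for x, y, _ in clicks:
        in_range = [b for b in remaining if math.hypot(b.x - x, b.y - y) <= b.radius]
        if not in_range:
            continue  # a miss is ignored in this sketch
        hit = min(in_range, key=lambda b: math.hypot(b.x - x, b.y - y))
        if not hit.checkered:
            return False  # burst a solid ball
        remaining.remove(hit)
    # Pass only if every checkered ball was burst.
    return not any(b.checkered for b in remaining)
```

The timestamps here come from the client, so a real version would also compare against the server's own record of when the seed was issued.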
Arve Systad : Fun idea, but totally and completely ruining the user experience. Normal people visiting the site will think of it as 30 seconds of useless waiting. 30 seconds of useless waiting when browsing the internet or using web-apps is not in any way acceptable.
Paul Dixon : normal people visiting wouldn't trigger the delay, only someone making an unreasonable number of requests. The idea *is* a little tongue in cheek, but I can see it working if the target audience are used to little flash games :)
Groxx : Entertaining (and nigh-foolproof) idea, but I'd be irritated (especially during a Bag Of Canaries frenzy), and that would require massively more processing on their servers to perform checking (which is a big part of the problem). Also, bots can burst bubbles. You'd have to frequently change rules.
Paul Dixon : Assuming each game is issued a token, and you know the time you issued the tokens, you need only attempt to process a token once, and only between 30 and say 300 seconds after it was issued. The beauty of it is that even if a bot does burst the bubble, they've still waited 30 seconds to do so.
Paul Dixon : Plus, let's not forget the idea is to limit traffic. The page could say "we're very busy, if you're in a hurry, play this game for 30 seconds, or try again in a few minutes..."
-
There are a few other / better solutions already posted, but for completeness, I figured I'd mention this:
If your main concern is performance degradation, and you're looking at true hammering, then you're actually dealing with a DoS attack, and you should probably try to handle it accordingly. One common approach is to simply drop packets from an IP in the firewall after a number of connections per second/minute/etc. For example, the standard Linux firewall, iptables, has a standard operation matching function 'hashlimit', which could be used to correlate connection requests per time unit to an IP-address.
Although this question would probably be more apt for the next SO derivative mentioned on the last SO podcast, it hasn't launched yet, so I guess it's ok to answer :)
EDIT:
As pointed out by novatrust, there are still ISPs that do NOT assign individual IPs to their customers, so a single scripting customer of such an ISP would effectively get all customers of that ISP blocked.
Robert Venables : Unfortunately some ISPs have shared exit IP addresses. For example, AOL has a limited collection of IPs that the members appear under: http://webmaster.info.aol.com/proxyinfo.html Your solution would impose a hard limit on the number of users for many ISPs.
roe : Wow, I am awestruck. Stuff like this is still going on?
Karl : Holy cow. I guess AOL won't be accessing my site then.
StingyJack : ...and well it shouldn't =)
-
- Provide an RSS feed so they don't eat up your bandwidth.
- When buying, make everyone wait a random amount of time of up to 45 seconds or something, depending on what you're looking for exactly. Exactly what are your timing constraints?
- Give everyone 1 minute to put their name in for the drawing and then randomly select people. I think this is the fairest way.
- Monitor the accounts (include some times in the session and store it?) and add delays to accounts that seem like they're below the human speed threshold. That will at least make the bots be programmed to slow down and compete with humans.
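The one-minute drawing from the list above could be as small as this Python sketch. It assumes entries are account ids collected during the window and that each account counts once no matter how often it entered; `draw_winners` and its parameters are hypothetical.

```python
import random


def draw_winners(entries, stock, rng=None):
    """Pick `stock` distinct winners; duplicate entries give no extra chance."""
    rng = rng or random.SystemRandom()
    unique = sorted(set(entries))  # one ticket per account
    if len(unique) <= stock:
        return unique              # enough stock for everyone who entered
    return rng.sample(unique, stock)
```

Because duplicates collapse to one ticket, a script that submits the form a thousand times gets the same odds as a human who clicked once, as long as it cannot cheaply create extra accounts.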
TM : These are interesting concepts but the "random selection" and the waiting period removes much of the "frenzy" that I am guessing woot is depending on. Taking away the timing urgency kind of ruins the site.
jmucchiello : If it looks like a drawing, then he has to deal with gambling laws. Not worth it.
-
I'm not seeing the great burden that you claim from checking incoming IPs. On the contrary, I've done a project for one of my clients which analyzes the HTTP access logs every five minutes (it could have been real-time, but he didn't want that for some reason that I never fully understood) and creates firewall rules to block connections from any IP addresses that generate an excessive number of requests unless the address can be confirmed as belonging to a legitimate search engine (google, yahoo, etc.).
This client runs a web hosting service and is running this application on three servers which handle a total of 800-900 domains. Peak activity is in the thousand-hits-per-second range and there has never been a performance issue - firewalls are very efficient at dropping packets from blacklisted addresses.
And, yes, DDOS technology definitely does exist which would defeat this scheme, but he's not seeing that happen in the real world. On the contrary, he says it's vastly reduced the load on his servers.
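A hedged Python sketch of the counting step in a log-driven blocker like the one described. It assumes access-log lines that start with the client IP (as in the common log format) and a precomputed `allowlist` of verified search-engine addresses; it only returns the addresses and leaves the actual firewall rule creation out.

```python
from collections import Counter


def ips_to_block(log_lines, max_requests, allowlist=frozenset()):
    """Return IPs with more than `max_requests` hits in this batch of lines."""
    counts = Counter(
        line.split(" ", 1)[0]          # first field of each line is the IP
        for line in log_lines
        if line.strip()
    )
    return sorted(
        ip for ip, hits in counts.items()
        if hits > max_requests and ip not in allowlist
    )
```

Run against the last five minutes of log lines, the result would be fed to whatever tool manages the firewall's blacklist.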
-
Preventing DoS would defeat #2 of @davebug's goals he outlined above, "Keep the site at a speed not slowed by bots" but wouldn't necessarily solve #1, "Sell the item to non-scripting humans".
I'm sure a scripter could write something to skate just under the excessive limit that would still be faster than a human could go through the ordering forms.
-
Q: How would you stop scripters from slamming your site hundreds of times a second?
A: You don't. There is no way to prevent this behavior by external agents. You could employ a vast array of technology to analyze incoming requests and heuristically attempt to determine who is and isn't human...but it would fail. Eventually, if not immediately.
The only viable long-term solution is to change the game so that the site is not bot-friendly, or is less attractive to scripters.
How do you do that? Well, that's a different question! ;-)
...
OK, some options have been given (and rejected) above. I am not intimately familiar with your site, having looked at it only once, but since people can read text in images and bots cannot easily do this, change the announcement to be an image. Not a CAPTCHA, just an image -
- generate the image (cached of course) when the page is requested
- keep the image source name the same, so that doesn't give the game away
- most of the time the image will have ordinary text in it, and be aligned to appear to be part of the inline HTML page
- when the game is 'on', the image changes to the announcement text
- the announcement text reveals a url and/or code that must be manually entered to acquire the prize. CAPTCHA the code if you like, but that's probably not necessary.
- for additional security, the code can be a one-time token generated specifically for the request/IP/agent, so that repeated requests generate different codes. Or you can pre-generate a bunch of random codes (a one-time pad) if on-demand generation is too taxing.
Run time-trials of real people responding to this, and ignore ('oops, an error occurred, sorry! please try again') responses faster than (say) half of this time. This event should also trigger an alert to the developers that at least one bot has figured out the code/game, so it's time to change the code/game.
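The one-time code plus the "faster than a human" cutoff might be sketched like this in Python. `CodeIssuer`, the 4-second minimum, and the 5-minute expiry are placeholders; the real minimum would come from the time trials described above.

```python
import secrets
import time


class CodeIssuer:
    """Issues one-time codes bound to the requesting IP and user agent."""

    def __init__(self, min_seconds=4.0, max_seconds=300.0):
        self.min_seconds = min_seconds  # assumed "half the human time" cutoff
        self.max_seconds = max_seconds
        self._issued = {}               # code -> (ip, agent, issued_at)

    def issue(self, ip, agent, now=None):
        now = time.time() if now is None else now
        code = secrets.token_hex(4)     # the text rendered into the image
        self._issued[code] = (ip, agent, now)
        return code

    def redeem(self, code, ip, agent, now=None):
        now = time.time() if now is None else now
        entry = self._issued.pop(code, None)  # pop makes every code single-use
        if entry is None:
            return "invalid"
        issued_ip, issued_agent, issued_at = entry
        if (issued_ip, issued_agent) != (ip, agent):
            return "invalid"
        elapsed = now - issued_at
        if elapsed < self.min_seconds:
            return "too_fast"           # show the 'oops' page and alert the devs
        if elapsed > self.max_seconds:
            return "expired"
        return "ok"
```

A `"too_fast"` result is the signal that a bot has learned to read the announcement and that it is time to change the game.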
Continue to change the game periodically anyway, even if no bots trigger it, just to waste the scripters' time. Eventually the scripters should tire of the game and go elsewhere...we hope ;-)
One final suggestion: when a request for your main page comes in, put it in a queue and respond to the requests in order in a separate process (you may have to hack/extend the web server to do this, but it will likely be worthwhile). If another request from the same IP/agent comes in while the first request is in the queue, ignore it. This should automatically shed the load from the bots.
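The queue in that final suggestion could be sketched in Python as below, assuming requests are keyed by an (IP, agent) pair and served by a separate worker. `DedupQueue` is a hypothetical name, and a real web server would need hooks this sketch does not show.

```python
from collections import OrderedDict


class DedupQueue:
    """FIFO queue holding at most one pending request per client key."""

    def __init__(self):
        self._pending = OrderedDict()

    def enqueue(self, key, request):
        """Queue the request; return False (ignored) if one is already pending."""
        if key in self._pending:
            return False
        self._pending[key] = request
        return True

    def next(self):
        """Pop the oldest pending (key, request) pair, or None when empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)
```

A client hammering the page ten times a second occupies one slot in the queue, the same as a human who loaded it once.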
EDIT: another option, aside from use of images, is to use javascript to fill in the buy/no-buy text; bots rarely interpret javascript, so they wouldn't see it
Frank Krueger : I would make sure that the "default text" changes also. This would prevent the scraping app from just comparing the image to a previous image and waiting for a significant change. +1. Great idea.
Dave Sherohman : Amendment to the "final suggestion": If a second request comes in from an address while a previous request from the same address is pending, discard the first request and put the second one in the queue. This will act as a penalty for hammering the site instead of letting the page load.
Steven A. Lowe : @[Frank Krueger]: i thought i implied this, but upon re-reading i guess i didn't - thanks for pointing it out! It might also be useful to have the default-text image change just a few pixels to mess with comparisons, and/or generate nearly invisible watermark-style text to further mess with bots
Steven A. Lowe : @[Dave Sherohman]: you could, but that might cause the queue to churn; it may be better to just discard the new requests to shed the load immediately - testing/profiling would tell for certain which is better, but thanks for a good suggestion!
-
Instead of blocking suspected IPs it may be effective to reduce the amount of data you give to an address as its hits/min goes up. So if the bot hits you up more than a secret randomly changing threshold it will not see the data. Logged in users would always see the data. Logged in users that hit the server too often would be forced to re-authenticate, or be given a captcha.
-
I don't know how feasible this is: ... go on the offensive.
Figure out what data the bots are scanning for. Feed them the data that they're looking for when you're NOT selling the crap. Do this in a way that won't bother or confuse human users. When the bots trigger phase two, they'll log in and fill out the form to buy $100 roombas instead of BOC. Of course, this assumes that the bots are not particularly robust.
Another idea is to implement random price drops over the course of the bag o crap sale period. Who would buy a random bag o crap for $150 when you CLEARLY STATE that it's only worth $20? Nobody but overzealous bots. But then 9 minutes later it's $35 ... then 17 minutes later it's $9. Or whatever.
Sure, the zombie kings would be able to react. The point is to make their mistakes become very costly for them (and to make them pay you to fight them).
All of this assumes you want to piss off some bot lords, which may not be 100% advisable.
Shawn Miller : Don't think pissing off bot lords is desirable, but you have an interesting idea here.
Stephan Eggermont : Pissing off the script kiddies is always good
Nicholas Flynt : I agree, and I'm liking this repeating idea of fooling the bots into making bogus purchases. It's payback, and since they're breaking the ToS already, they can hardly complain.
-
All right, so the spammers are outcompeting regular people to win the "bag of crap" auction? Why not make the next auction be a literal "bag of crap"? The spammers get to pay good money for a bag full of doggy do, and we all laugh at them.
j_random_hacker : Very original concept!
-
The solution to this may be to attach a little bit of client side processing to actions of logging in and buying. The processing can be a negligible amount so that individuals are not affected but bots attempting to do the tasks many times will be hampered by the extra work load.
The processing can be a simple equation to solve done in javascript, unless you don't want to have to require javascript on your site.
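One possible reading of "a little bit of client side processing" is a hashcash-style puzzle, sketched here in Python. This swaps in a hash search for the simple equation the answer mentions, and in practice `solve` would run as JavaScript in the browser; the function names and the `bits` difficulty knob are assumptions.

```python
import hashlib
import itertools
import secrets


def new_challenge():
    """Random challenge the server embeds in the login or order form."""
    return secrets.token_hex(16)


def verify(challenge, nonce, bits):
    """Cheap server-side check: one hash, top `bits` bits must be zero."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge, bits):
    """Client-side work: try nonces until one verifies (about 2**bits hashes)."""
    for nonce in itertools.count():
        if verify(challenge, nonce, bits):
            return nonce
```

The asymmetry is the point: one order costs a human's browser a fraction of a second, while hundreds of attempts per second cost the bot real CPU time. As the first comment below notes, it does nothing against a bot that is happy to do the work.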
Shawn Miller : This wouldn't stop people from scripting a WebBrowser control (that contains a full JavaScript implementation) to walk through the order process. There are a few of these approaches we've seen already out in the wild.
DaveC : The idea is that it takes an incredibly large amount of processing power to spam hundreds of times a second but is negligible for real users using the site as normal. The number of times the bots are hitting the site is the only real difference between the bots and real people.
-
Hm, I remember having read "Linux Firewalls" Attack Detection and Response with ... The situations there seem to be very comparable. And someone else has suggested that also. Just block a client temporarily or in progressive steps to throttle them down. If it's really from a few sites this must be quite efficient.
Regards
-
You need to figure a way to make the bots buy stuff that is massively overpriced: 12mm wingnut: $20. See how many the bots snap up before the script-writers decide you're gaming them.
Use the profits to buy more servers and pay for bandwidth.
Tai Squared : What if they then return the items or issue a chargeback? This could end up costing you and chargebacks can hurt your business with credit card processors. The bots are also likely using stolen cards, but that may exacerbate the level of chargebacks as higher amounts will be challenged more often.
Darryl Hein : There are no returns on the site...
Christopher Mahan : on the chargeback: That would be perfect. Chargebacks can't be scripted. The script kiddies would not be able to make that scale. They would have to give up.
Kibbee : Don't charge them, but mark them as bots, specifically for trying to buy the item. If anybody buys a phoney item, then just mark them as a bot, and disallow them. You could probably just lock them out for a few hours.
Kibbee : Try to see what the bots are looking to buy, and then make a fake item that looks exactly like the item, to a computer, but isn't the real item. It would be easy to make it clear to the user that the item wasn't real, but make it hard for a computer to figure it out.
Christopher Mahan : Kibbee: if you don't charge them, there is no negative incentive to the script writer. You've got to make it so that the more they try to pound the site, the more money they lose. They will stop eventually. Also, on making a fake item: that's deceptive, and their chargeback would be warranted.
Nicholas Flynt : I like it. Only a real human is going to be able to properly appraise an item based on an image/description, and so "random crap" could include a small amount of actual overpriced "crap" that you wouldn't expect real humans to buy. ^_^
Organiccat : Indeed, a batch of a hundred pens and pencils could keep your spammers out of the loop, or monetarily inconvenience them enough to stop. You could also charge them a hefty sum for shipping & handling. One pen.
WolfmanDragon : This is a wonderful solution. One word of warning, run this by a lawyer before implementing the plan. There are some odd federal statutes about mail fraud. I am NOT a lawyer and am NOT giving legal advice.
mabwi : This has serious comedy value, until you anger a script-kiddie that happens to have more skills than just scraping woot, and causes you real problems because you ripped him off.
Jacco : If the script kiddie gets angry they might just expose themselves enough for you to tag them and hand them over to law-enforcement.
Nik Reiman : This is some good BOFH stuff, but I'd hate to see this answer win the bounty over some of the more realistic solutions suggested. This is the type of solution that someone taking over your code will curse your name for when the boss reports "unusual pricing patterns" on the website.
Christopher Mahan : mabwi: Sure, but that's a different problem. Besides, at that point the list of suspects is narrowed to the list of people who have bought the $20 wingnut. Much easier for law enforcement to hunt down.
Christopher Mahan : sqook: this is not a technology solution, but a real world solution. Putting security guards with guns in banks is the same thing. It may seem hard-nosed, but so are the crooks, so be hard-nosed. Hurt them where it hurts until they stop.
-
Your entire business model is based on "first come, first served." You can't do what the radio stations did (they no longer make the first caller the winner, they make the 5th or 20th or 13th caller the winner) - it doesn't match your primary feature.
No, there is no way to do this without changing the ordering experience for the real users.
Let's say you implement all these tactics. If I decide that this is important, I'll simply get 100 people to work with me, we'll build software to work on our 100 separate computers, and hit your site 20 times a second (5 seconds between accesses for each user/cookie/account/IP address).
You have two stages:
- Watching front page
- Ordering
You can't put a captcha blocking #1 - that's going to lose real customers ("What? I have to solve a captcha each time I want to see the latest woot?!?").
So my little group watches, timed together so we get about 20 checks per second, and whoever sees the change first alerts all the others (automatically), who will load the front page once again, follow the order link, and perform the transaction (which may also happen automatically, unless you implement captcha and change it for every wootoff/boc).
You can put a captcha in front of #2, and while you're loath to do it, that may be the only way to make sure that even if bots watch the front page, real users are getting the products.
But even with captcha my little band of 100 would still have a significant first mover advantage - and there's no way you can tell that we aren't humans. If you start timing our accesses, we'd just add some jitter. We could randomly select which computer was to refresh so the order of accesses changes constantly - but still looks enough like a human.
First, get rid of the simple bots
You need to have an adaptive firewall that will watch requests, and if someone is doing the obvious stupid thing (refreshing more than once a second from the same IP), employ tactics to slow them down (drop packets, send back refused or 500 errors, etc).
This should significantly drop your traffic and alter the tactics the bot users employ.
Second, make the server blazingly fast.
You really don't want to hear this... but...
I think what you need is a fully custom solution from the bottom up.
You don't need to mess with the TCP/IP stack, but you may need to develop a very, very, very fast custom server that is purpose built to correlate user connections and react appropriately to various attacks.
Apache, lighttpd, etc. are all great for being flexible, but you run a single-purpose website, and you really need to be able to do more than the current servers are capable of doing (both in handling traffic, and in appropriately combating bots).
By serving a largely static webpage (updates every 30 seconds or so) on a custom server you should not only be able to handle 10x the number of requests and traffic (because the server isn't doing anything other than getting the request, and reading the page from memory into the TCP/IP buffer) but it will also give you access to metrics that might help you slow down bots. For instance, by correlating IP addresses you can simply block more than one connection per second per IP. Humans can't go faster than that, and even people using the same NATed IP address will only infrequently be blocked. You'd want to do a slow block - leave the connection alone for a full second before officially terminating the session. This can feed into a firewall to give longer term blocks to especially egregious offenders.
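The one-connection-per-second rule and the hand-off to a firewall could look roughly like the following. This is only a sketch under assumptions: the class name, the strike counter, and the three return values are made up for illustration, and a real server would implement the actual "hold the connection for a second, then drop it" part itself.

```python
import time
from collections import defaultdict


class PerIpThrottle:
    """Allow at most one request per `min_interval` seconds per IP (illustrative)."""

    def __init__(self, min_interval=1.0, strikes_for_firewall=10, clock=time.monotonic):
        self.min_interval = min_interval
        self.strikes_for_firewall = strikes_for_firewall  # hypothetical threshold
        self.clock = clock
        self.last_seen = {}               # ip -> time of the previous request
        self.strikes = defaultdict(int)   # ip -> number of too-fast requests

    def check(self, ip):
        """Return 'serve', 'slow_block', or 'firewall' for this request."""
        now = self.clock()
        previous = self.last_seen.get(ip)
        self.last_seen[ip] = now
        if previous is None or now - previous >= self.min_interval:
            return "serve"
        # Too fast: the caller should hold the connection ~1s, then close it.
        self.strikes[ip] += 1
        if self.strikes[ip] >= self.strikes_for_firewall:
            return "firewall"  # egregious offender, candidate for a longer block
        return "slow_block"
```

Users behind the same NATed address share one entry here, which is the occasional false block mentioned above.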
But the reality is that no matter what you do, there's no way to tell a human apart from a bot when the bot is custom built by a human for a single purpose. The bot is merely a proxy for the human.
Conclusion
At the end of the day, you can't tell a human and a computer apart for watching the front page. You can stop bots at the ordering step, but the bot users still have a first mover advantage, and you still have a huge load to manage.
You can add blocks for the simple bots, which will raise the bar so that fewer people will bother with it. That may be enough.
But without changing your basic model, you're out of luck. The best you can do is take care of the simple cases, make the server so fast regular users don't notice, and sell so many items that even if you have a few million bots, as many regular users as want them will get them.
You might consider setting up a honeypot and marking user accounts as bot users, but that will have a huge negative community backlash.
Every time I think of a "well, what about doing this..." I can always counter it with a suitable bot strategy.
Even if you make the front page a captcha to get to the ordering page ("This item's ordering button is blue with pink sparkles, somewhere on this page") the bots will simply open all the links on the page, and use whichever one comes back with an ordering page. There's just no way to win this.
Make the servers fast, put in a reCaptcha (the only one I've found that can't be easily fooled, but it's probably way too slow for your application) on the ordering page, and think about ways to change the model slightly so regular users have as good a chance as the bot users.
Jens Roland : "Every time I think of a "well, what about doing this..." I can always counter it with a suitable bot strategy" I came to the same conclusion when designing my authentication system, BUT -- there is one difference here that makes me doubt that logic: False positives aren't a big problem
Jens Roland : (continued) E.g. if a few real users here and there are unable to get the special offers, that's actually not a big dealbreaker (as long as they don't know what they're missing). In an auth system, it *is* a dealbreaker - you don't want users being prevented from logging in
Jens Roland : (continued) What this means is, you can design the Woot system to be more restrictive than 'traditional' spambot countermeasures, and because of this, you may actually be able to thwart the bots effectively.
Jens Roland : (however, now that I've given it some more thought, I can't think of a way that works, that will also thwart distributed / botnet 'attacks')
-
My solution would be to make screen-scraping worthless by putting in a roughly 10 minute delay for 'bots and scripts.
Here's how I'd do it:
- Log and identify any repeat hitters.
You don't need to log every IP address on every hit. Only track one out of every 20 hits or so. A repeat offender will still show up in randomized, occasional tracking.
Keep a cache of your page from about 10-minutes earlier.
When a repeat-hitter/bot hits your site, give them the 10-minute old cached page.
They won't immediately know they're getting an old site. They'll be able to scrape it, and everything, but they won't win any races anymore, because "real people" will have a 10 minute head-start.
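Here's a rough sketch of the idea, assuming a sampled hit counter and an in-memory list of page snapshots. The class name, the repeat threshold, and the client key (an IP, or IP plus cookie) are placeholders I made up, not anything Woot actually runs.

```python
import random
from collections import Counter, deque


class StalePageServer:
    """Serve repeat hitters a copy of the page from `delay` seconds ago."""

    def __init__(self, delay=600.0, sample_rate=0.05, repeat_threshold=5, rng=random.random):
        self.delay = delay                        # 10 minutes
        self.sample_rate = sample_rate            # track roughly 1 hit in 20
        self.repeat_threshold = repeat_threshold  # hypothetical cut-off
        self.rng = rng
        self.sampled_hits = Counter()
        self.snapshots = deque()                  # (timestamp, html), oldest first

    def publish(self, now, html):
        """Record a new version of the front page (call at least once before serving)."""
        self.snapshots.append((now, html))
        # Drop snapshots that are no longer needed to answer "what was live `delay` ago?"
        while len(self.snapshots) >= 2 and self.snapshots[1][0] <= now - self.delay:
            self.snapshots.popleft()

    def page_as_of(self, when):
        """Latest snapshot published at or before `when`, else the oldest one kept."""
        chosen = self.snapshots[0][1]
        for stamp, html in self.snapshots:
            if stamp <= when:
                chosen = html
        return chosen

    def serve(self, now, client_key):
        # Only a random sample of hits is counted, so tracking stays cheap.
        if self.rng() < self.sample_rate:
            self.sampled_hits[client_key] += 1
        if self.sampled_hits[client_key] >= self.repeat_threshold:
            return self.page_as_of(now - self.delay)  # legitimate but old data
        return self.page_as_of(now)
```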
Benefits:
- No hassle or problems for users (like CAPTCHAs).
- Implemented fully on server-side. (no reliance on Javascript/Flash)
- Serving up an older, cached page should be less performance intensive than a live page. You may actually decrease the load on your servers this way!
Drawbacks
- Requires tracking some IP addresses
- Requires keeping and maintaining a cache of older pages.
What do you think?
Jens Roland : Damn it. I just spent an hour and a half writing up my own five-vector scheme for woot, and after thinking long and hard over my fifth countermeasure (a botnet throttle), I had to admit defeat. It doesn't work. And the rest of my hour-long solution is -- well, this one. abelenky, I tip my hat to you
Jens Roland : ....and now I have a redundant three-page article explaining why your solution is the right one. Well, +1 to you, and I hope you get the bounty.
SquareCog : To build on top of this: Put the IPs into an in-memory LRU counting hash (increment and push to top every time an IP comes back). Add heuristics based on reverse IP info, activity, image/js/cookie downloads. Scale your response by how bad the attack is, minimizing consequences of false negatives.
Soraz : Only downside is that a lot of people come from behind a firewall. So if you block the IP to an entire office building because someone in IT is running a bot, you are at the same time shutting out the rest of the office. Maybe not a real problem.
abelenky : @Soraz: this is easily overcome with session cookies: If you are getting multiple hits from the same IP, but with distinct, valid cookies, then it is likely an office-behind-NAT. If the hits have the same cookie, or no cookie, they are a candidate for my technique.
abelenky : (continued:) And my technique doesn't shut-out / ban anyone. It just gives them delayed information. No one in the office may win a prize, but that isn't much of a problem from a customer-service / accessibility viewpoint.
bruceatk : I think this is the best solution, but I wouldn't give them an old page, I would give them a page designed just for bots.
abelenky : @bruceatk: If you give them a special bots-only page, they will eventually learn to detect it, and learn to spoof a regular client more accurately. By giving them an old page, they will have NO IDEA that they are receiving old data. The old data is legitimate! It's just useless for contest/race purposes.
Erik Forbes : +1 from me as well
abelenky : Big thanks to those who upvoted my idea. Even though the bounty is over, I think this idea has lots of merit in terms of being easier to implement than a captcha, less likely to harass humans, and more likely to foil bots. I hope someone gives this a try on some website.
-
We are currently using the latest generation of BigIP load balancers from F5 to do this. The BigIP has advanced traffic management features that can identify scrapers and bots based on frequency and patterns of use, even from amongst a set of sources behind a single IP. It can then throttle these, serve them alternative content or simply tag them with headers or cookies so you can identify them in your application code.
wds : This is the exact solution I was going to suggest, especially the automatic throttling. You could roll your own; it just relies on some regular to advanced signal analysis.
-
My approach would be to focus on non-technological solutions (otherwise you're entering an arms race you'll lose, or at least spend a great deal of time and money on). I'd focus on the billing/shipment parts - you can find bots by either finding multiple deliveries to same address or by multiple charges to a single payment method. You can even do this across items over several weeks, so if a user got a previous item (by responding really really fast) he may be assigned some sort of "handicap" this time around.
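As a sketch of the billing/shipment check, assuming each order is a dict with `order_id`, `address`, and `payment_id` keys (field names invented for illustration, and the address normalization is deliberately naive):

```python
from collections import defaultdict


def find_repeat_buyers(orders, max_orders=1):
    """Return shipping addresses and payment methods used by more than `max_orders` orders."""
    by_key = defaultdict(list)
    for order in orders:
        # Collapse case and whitespace so trivially different spellings match.
        address = " ".join(order["address"].lower().split())
        by_key[("address", address)].append(order["order_id"])
        by_key[("payment", order["payment_id"])].append(order["order_id"])
    return {key: ids for key, ids in by_key.items() if len(ids) > max_orders}
```

Run over several weeks of sales instead of one, the same grouping gives the list of accounts that might get a "handicap" next time.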
This would also have a side effect (beneficial, I would think, but I could be wrong marketing-wise for your case) of perhaps widening the circle of people who get lucky and get to purchase woot.
-
First of all, by definition, it is impossible to support stateless, i.e. truly anonymous, transactions while also being able to separate the bots from legitimate users.
If we can accept a premise that we can impose some cost on a brand-spanking-new woot visitor on his first page hit(s), I think I have a possible solution. For lack of a better name, I'm going to loosely call this solution "A visit to the DMV."
Let's say that there's a car dealership that offers a different new car each day, and that on some days, you can buy an exotic sports car for $5 each (limit 3), plus a $5 destination charge.
The catch is, the dealership requires you to visit the dealership and show a valid driver's license before you're allowed in through the door to see what car is on sale. Moreover, you must have said valid driver's license in order to make the purchase.
So, the first-time visitor (let's call him Bob) to this car dealer is refused entry, and is referred to the DMV office (which is conveniently located right next door) to obtain a driver's license.
Other visitors with a valid driver's license are allowed in after showing it. A person who makes a nuisance of himself by loitering around all day, pestering the salesmen, grabbing brochures, and emptying the complimentary coffee and cookies will eventually be turned away.
Now, back to Bob without the license -- all he has to do is endure the visit to the DMV once. After that, he can visit the dealership and buy cars anytime he likes, unless he accidentally left his wallet at home, or his license is otherwise destroyed or revoked.
The driver's license in this world is nearly impossible to forge.
The visit to the DMV involves first getting the application form at the "Start Here" queue. Bob has to take the completed application to window #1, where the first of many surly civil servants will take his application, process it, and if everything is in order, stamp the application for the window and send him to the next window. And so, Bob goes from window to window, waiting for each step of his application to go through, until he finally gets to the end and receives his driver's license.
There's no point in trying to "short circuit" the DMV. If the forms are not filled out correctly in triplicate, or any wrong answers given at any window, the application is torn up, and the hapless customer is sent back to the start.
Interestingly, no matter how full or empty the office is, it takes about the same amount of time to get serviced at each successive window. Even when you're the only person in line, it seems that the personnel likes to make you wait a minute behind the yellow line before uttering, "Next!"
Things aren't quite so terrible at the DMV, however. While all the waiting and processing to get the license is going on, you can watch a very entertaining and informative infomercial for the car dealership while you're in the DMV lobby. In fact, the infomercial runs just long enough to cover the amount of time you spend getting your license.
The slightly more technical explanation:
As I said at the very top, it becomes necessary to have some statefulness on the client-server relationship which allows you to separate humans from bots. You want to do it in a way that doesn't overly penalize the anonymous (non-authenticated) human visitor.
This approach probably requires an AJAX-y client-side processing. A brand-spanking-new visitor to woot is given the "Welcome New User!" page full of text and graphics which (by appropriate server-side throttling) takes a few seconds to load completely. While this is happening (and the visitor is presumably busy reading the welcome page(s)), his identifying token is slowly being assembled.
Let's say, for discussion, the token (aka "driver's license") consists of 20 chunks. In order to get each successive chunk, the client-side code must submit a valid request to the server. The server incorporates a deliberate delay (let's say 200 milliseconds) before sending the next chunk along with the 'stamp' needed to make the next chunk request (i.e., the stamps needed to go from one DMV window to the next). All told, about 4 seconds must elapse to finish the chunk-challenge-response-chunk-challenge-response-...-chunk-challenge-response-completion process.
At the end of this process, the visitor has a token which allows him to go to the product description page and, in turn, go to the purchasing page. The token is a unique ID to each visitor, and can be used to throttle his activities.
On the server side, you only accept page views from clients that have a valid token. Or, if it's important that everyone can ultimately see the page, put a time penalty on requests that are missing a valid token.
Now, for this to be relatively benign to the legitimate human visitor, make the token issuing process happen relatively non-intrusively in the background. Hence the need for the welcome page with entertaining copy and graphics that is deliberately slowed down slightly.
This approach forces a throttle-down of bots to either use an existing token, or take the minimum setup time to get a new token. Of course, this doesn't help as much against sophisticated attacks using a distributed network of faux visitors.
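A minimal sketch of the window-to-window stamping, with one deliberate variation: instead of the server sleeping 200 ms per request, each stamp carries a signed "not before" time and the server rejects stamps presented too early, which has the same throttling effect without holding connections open. The secret, the function names, and the stamp layout are all assumptions for illustration.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
CHUNKS = 20                             # windows to visit
STEP_DELAY = 0.2                        # seconds between windows


def _sign(visitor_id, step, not_before):
    message = f"{visitor_id}|{step}|{not_before:.3f}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def start(visitor_id, now):
    """Hand out the application form: the stamp for step 0."""
    not_before = now + STEP_DELAY
    return {"step": 0, "not_before": not_before, "sig": _sign(visitor_id, 0, not_before)}


def next_window(visitor_id, stamp, now):
    """Validate a stamp and issue the next one, or the license after the last step.

    Returns None when the stamp is forged or presented too early, which
    sends the visitor back to the start.
    """
    expected = _sign(visitor_id, stamp["step"], stamp["not_before"])
    if not hmac.compare_digest(expected, stamp["sig"]) or now < stamp["not_before"]:
        return None
    step = stamp["step"] + 1
    if step == CHUNKS:
        return {"license": _sign(visitor_id, "license", 0.0)}
    not_before = now + STEP_DELAY
    return {"step": step, "not_before": not_before, "sig": _sign(visitor_id, step, not_before)}
```

With 20 steps at 200 ms each, nobody gets a license in under about 4 seconds, however fast their client is.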
-
No matter how secure the Nazis thought their communications were, the Allies would often break their messages. No matter how you try to stop bots from using your site, the bot owners will work out a way around it. I'm sorry if that makes you the Nazi :-)
I think a different mindset is required
- Do not try to stop bots from using your site
- Do not go for a fix that works immediately, play the long game
Get into the mindset that it doesn't matter whether the client of your site is a human or a bot, both are just paying customers; but one has an unfair advantage over the other. Some users without much of a social life (hermits) can be just as annoying for your site's other users as bots.
Record the time you publish an offer and the time an account opts to buy it.
This gives you a record of how quickly the client is buying stuff.
Vary the time of day you publish offers.
For example, have a 3 hour window starting at some obscure time of the day (midnight?) Only bots and hermits will constantly refresh a page for 3 hours just to get an order in within seconds. Never vary the base time, only the size of the window.
Over time a picture will emerge.
01: You can see which accounts are regularly buying products within seconds of them going live. Suggesting they might be bots.
02: You can also look at the window of time used for the offers, if the window is 1 hour then some early buyers will be humans. A human will rarely refresh for 4 hours though. If the elapsed time is quite consistent between publish/purchase regardless of the window duration then that's a bot. If the publish/purchase time is short for small windows and gets longer for large windows, that's a hermit!
Now instead of stopping bots from using your site you have enough information to tell you which accounts are certainly used by bots, and which accounts are likely to be used by hermits. What you do with that information is up to you, but you can certainly use it to make your site fairer to people who have a life.
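To make the bot/hermit distinction concrete, here's a rough sketch. It assumes that for each account you have a list of `(window_hours, delay_seconds)` pairs, one per purchase, where the delay is the time from publish to purchase. The 5-second and 1-hour thresholds are arbitrary placeholders.

```python
from statistics import mean


def classify(purchases, fast=5.0, short_window=1.0):
    """Label an account 'bot', 'hermit', 'human', or 'unknown' from its purchase history."""
    short = [delay for window, delay in purchases if window <= short_window]
    long_ = [delay for window, delay in purchases if window > short_window]
    if not short or not long_:
        return "unknown"  # need both kinds of window to compare
    if mean(short) <= fast and mean(long_) <= fast:
        return "bot"      # consistently fast regardless of window size
    if mean(short) <= fast < mean(long_):
        return "hermit"   # fast on short windows, slower on long ones
    return "human"
```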
I think banning the bot accounts would be pointless, it would be akin to phoning Hitler and saying "Thanks for the positions of your U-boats!" Somehow you need to use the information in a way that the account owners won't realise. Let's see if I can dream anything up.....
Process orders in a queue:
When the customer places an order they immediately get a confirmation email telling them their order is placed in a queue and will be notified when it has been processed. I experience this kind of thing with order/dispatch on Amazon and it doesn't bother me at all, I don't mind getting an email days later telling me my order has been dispatched as long as I immediately get an email telling me that Amazon knows I want the book. In your case it would be an email for
- Your order has been placed and is in a queue.
- Your order has been processed.
- Your order has been dispatched.
Users think they are in a fair queue. Process your queue every 1 hour so that normal users also experience a queue, so as not to arouse suspicion. Only process orders from bot and hermit accounts once they have been in the queue for the "average human ordering time + x hours". Effectively reducing bots to humans.
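The hourly queue run could be sketched like this, assuming orders are dicts with `order_id`, `account`, and `placed_at` (seconds since the offer was published), that `flagged` is the set of bot/hermit accounts, and that `hold` stands for "average human ordering time + x hours". All of those names are made up for the example.

```python
def effective_time(order, flagged, hold):
    """Time at which an order counts as placed; flagged accounts are pushed back by `hold`."""
    return order["placed_at"] + (hold if order["account"] in flagged else 0.0)


def process_batch(queue, now, flagged, hold, stock):
    """Return (fulfilled, still_waiting) for one run of the queue."""
    eligible = sorted(
        (o for o in queue if effective_time(o, flagged, hold) <= now),
        key=lambda o: effective_time(o, flagged, hold),
    )
    fulfilled = eligible[:stock]
    fulfilled_ids = {o["order_id"] for o in fulfilled}
    waiting = [o for o in queue if o["order_id"] not in fulfilled_ids]
    return fulfilled, waiting
```

Everyone gets the same "your order is in a queue" email right away, so nothing in the response tells a flagged account it is being handled differently.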
Peter Morris : What does that mean? :-)
wds : http://en.wikipedia.org/wiki/Godwin%27s_law
Peter Morris : Ah thanks :-) I mention Nazis because I am very interested in WWII stories about Bletchley Park :-) Some of the stories on how messages were broken used a different mental approach to the problem, such as assuming operators were too lazy to change the codes from the night before :-)
-
The important thing here is to change the system to remove load from your server, prevent bots from winning the bag of crap WITHOUT letting the botlords know you are gaming them or they will revise their strategy. I don't think there is any way to do this without some processing at your end.
So you record hits on your home page. Whenever someone hits the page that connection is compared to its last hit, and if it was too quick then it is sent a version of the page without the offer. This can be done by some sort of load balancing mechanism that sends bots (the hits that are too fast) to a server that simply serves cached versions of your home page; real people get sent to the good server. This takes the load off the main server and makes the bots think that they are still being served the pages correctly.
Even better if the offer can be declined in some way. Then you can still make the offers on the faux server but when the bot fills out the form say "Sorry, you weren't quick enough" :) Then they will definitely think they are still in the game.
-
Use JavaScript to dynamically write the info into the page. Without a JS rendering engine, surely the screen-scrapers & bots won't be able to read the information.
Shawn Miller : Sure they could. It would be almost as easy to parse JS as it would be to parse HTML.
Meff : Depends how far they take obfuscation. Dynamically render a differently obfuscated JS for each visitor if need be. Could be a bit of an arms-race I admit.
FryGuy : Couldn't they at that point use mozilla/mshtml and render the page, then scrape the DOM (a la greasemonkey scripts)? Doesn't seem like it would be that difficult to get around.
Meff : If you have any links as to how you'd approach this as regards creating a bot I'd appreciate them. Seems you'd need a JS rendering engine, are we talking browser automation here? I'm thinking my approach would stop a standard screen-scraper, I'd be grateful for links proving otherwise :)
-
The method I will describe has two requirements. 1) Javascript is enforced 2) a web browser with a valid http://msdn.microsoft.com/en-us/library/bb894287.aspx browser session.
Without either of these you are "by design" out of luck. The internet is built by design to allow anonymous clients to view content. There is no way around this with simple HTML. Oh and I just wanted to say that simple, image-based CAPTCHA can be defeated easily, even the authors admit to this.
Moving along to the problem and the solution. The problem is in two parts. The first is that you cannot block out an individual for "doing bad things". To fix this you set up a method that takes in the browser's valid session and generates a md5sum + salt + hash (of your own private device) and sends it back to the browser. The browser then is REQUIRED to return that hashed key back during every post / get. If you do not ever get a valid browser session, then you reply back with "Please use a valid web browser blah blah blah". All popular browsers have valid browser session IDs.
Now that we have an identity at least for that browser session (I know it does not lock out permanently, but it is quite difficult to "renew" a browser session through simple scripting) we can effectively lock out a session (ie; make it annoyingly hard for scripters to actually visit your site with no penalty to valid users).
Now this next part is why it requires javascript. On the client you build a simple hash for each character that comes from the keyboard versus the value of the text in the textarea. That valid key comes over to the server as a simple hash and has to be validated. While this method could easily be reverse engineered, it does make it one extra hoop that individuals have to go through before they can submit data. Mind you this only prevents auto posting of data, not DOS with constant visits to the web site. If you even have access to ajax there is a way to send a salt and hash key across the wire and use javascript with it to build the onkeypress characters "valid token" that gets sent across the wire. Yes, like I said, it could easily be reverse engineered, but you see where I am going with this hopefully.
Now to prevent constant abuse via traffic. There are ways to establish patterns once given a valid session id. These patterns (even if Random is used to offset request times), have a lower epsilon than if say a human was attempting to reproduce that same margin of error. Since you have a session ID, and you have a pattern that "appears to be a bot", then you can block out that session with a simple lightweight response that is 20 bytes instead of 200000 bytes.
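One simple way to read "lower epsilon" is as a low spread in the gaps between requests from the same session. The sketch below uses the coefficient of variation of those gaps. That particular statistic and the 0.25 cut-off are my own assumptions for illustration, not the algorithm from the video game work mentioned here.

```python
from statistics import mean, pstdev


def looks_scripted(timestamps, min_hits=10, max_cv=0.25):
    """True when a session's request times are suspiciously regular.

    `timestamps` are request times in seconds for one session, in order.
    """
    if len(timestamps) < min_hits:
        return False  # not enough data to judge
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    average = mean(gaps)
    if average <= 0:
        return True   # many requests at the same instant
    # Coefficient of variation: spread of the gaps relative to their mean.
    return pstdev(gaps) / average <= max_cv
```

A session that trips this check would get the lightweight 20-byte response instead of the full page.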
You see here, the goal is to 1) make the anonymous non-anonymous (even if it's only per session) and 2) develop a method to identify bots vs. normal people by establishing patterns in the way they use your system. You can't say that the latter is impossible, because I have done it before. While my implementations were for tracking video game bots, I would think that those algorithms for identifying a bot vs. a user can be generalized to web site visits. If you reduce the traffic that the bots consume you reduce the load on your system. Mind you this still does not prevent DOS attacks, but it does reduce the amount of strain a bot produces on the system.
-
Sell the item to non-scripting humans.
Keep the site running at a speed not slowed by bots.
Don't hassle the 'normal' users with any tasks to complete to prove they're human.
You probably don't want to hear this, but #1 and #3 are mutually exclusive.
Well, nobody knows you're a bot either. There's no programmatic way to tell whether or not there's a human on the other end of the connection without requiring the person to do something. Preventing scripts/bots from doing stuff on the web is the whole reason CAPTCHAs were invented. It's not like this is some new problem that hasn't seen a lot of effort expended on it. If there were a better way to do it, one that didn't involve the hassle to real users that a CAPTCHA does, everyone would be using it already.
I think you need to face the fact that if you want to keep bots off your ordering page, a good CAPTCHA is the only way to do it. If demand for your random crap is high enough that people are willing to go to these lengths to get it, legitimate users aren't going to be put off by a CAPTCHA.
Shawn Miller : And I, for one, welcome our new bot overlords
Martin : +1 for if they want it, a captcha ain't going to stop them ... and for the cartoon.
-
I think that sandboxing certain IPs is worth looking into. Once an IP has gone over a threshold, when they hit your site, redirect them to a webserver that has a multi-second delay before serving out a file. I've written Linux servers that can handle 50K open connections with hardly any CPU, so it wouldn't be too hard to slow down a very large number of bots. All the server would need to do is hold the connection open for N seconds before acting as a proxy to your regular site. This would still let regular users use the site even if they were really aggressive, just at a slightly degraded experience.
You can use memcached as described here to cheaply track the number of hits per IP.
-
To solve the first problem of the bots slamming your front page, try making the honeypot exactly the same as a real bag of crap. Make the html markup for the front page include the same markup as if it were for a bag of crap, but make it hidden. This would force the bots to include CSS engines to determine if the bag of crap code is displayed or hidden. Alternatively, you could only output this 'fake' bag of crap html a random amount of time (hours?) before a real bag of crap goes up. This would cause the bots to sound the alarm too soon (but not know how soon).
To cover the second step of actually purchasing the bag of crap, add simple questions. I prefer common sense questions to the math questions suggested above. Things like, "Is ice hot or cold?" "Are ants big or small"? Of course, these would need to be randomized and pulled from a never-ending supply of questions, else the bots could be programmed to answer them. These questions, though, are still much less of an annoyance than CAPTCHAs.
-
What about using Flash?
Yes, I know the overhead of using Flash, plus the fact that some users will be locked out of buying the bag-o-crap (i.e.: iPhone users) might make this detrimental, but it seems to me that Flash would prevent screenscraping or at least make it difficult.
Am I wrong?
Edited to add
What about including a couple of "hidden" fields on your submissions form like what I found below:
Actually, best practice seems to be to use two hidden fields, one with an initial value, and one without. It's the rare bot which can ignore both fields. Check for one field to be blank, and the other to have the initial value. And hide them using CSS, not by making them "hidden" fields:
.important { display : none ; }
Please don't change the next two fields.
Bots tend to like fields with names like 'address'. The text in the paragraph is for those few rare human beings who have a non-CSS capable browser. If you're not worried about them, you can leave it out.
In the logic for processing the form, you'd do something like:
if (address2 == "xyzzy" and address3 == "") { /* OK to send */ } else { /* probably have a bot */ }
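On the server, that check might look something like this in Python, assuming the submitted form arrives as a dict and the two decoy fields are named `address2` and `address3` as above (the function name is made up):

```python
def is_probably_bot(form, expected_initial="xyzzy"):
    """True unless both CSS-hidden decoy fields came back untouched.

    A real browser submits the hidden-by-CSS fields unchanged: `address2`
    keeps its initial value and `address3` stays empty.
    """
    untouched = form.get("address2") == expected_initial and form.get("address3") == ""
    return not untouched
```

A client that leaves the decoy fields out entirely is also treated as a bot here, since a browser would still send them.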
GregD : Could the person(people) who are modding me down, please take a moment to explain to me why this wouldn't work?
Stephan Eggermont : Didn't mod down, but locking out potential buyers is a big no-no. You have to take a very careful look at your users demographics before doing that. You might be surprised by the number of iPhone users.
GregD : Yes but it seems to me that they could augment that by building a specific iPhone app??
-
- Go after the money stream. It is much easier than tracking the IP side. Making bots pay too much a few times (announcement with white text on white background and all variants of it) kills their business case quickly. You should prepare this carefully, and make good use of the strong point of bots: their speed. Did you try a few thousand fake announcements a few seconds apart? If they are hitting ten times/second you can go even faster. You want to keep this up as long as they keep buying, so think carefully about the moment of the day/week you want to start this. Ideally, they will stop paying, so you can hand over your case to a bank.
- Make sure your site is fully generated, and each page access returns different page content (html, javascript and css). Parsing is more difficult than generating, and it is easy to build in more variation than bot developers can handle. Keep on changing the content and how you generate it.
- You need to know how fast bots can adapt to changes you make, and preferably the timezone they are in. Is it one botnet or more, are they in the same timezone, a different one, or is it a worldwide developer network? You want your counterattack to be timed right.
- Current state-of-the-art bots have humans enter CAPTCHAs (offered against porn/games).
- Make it unattractive to react very fast.
- Use hashes and honeypots, as Ned Batchelder explains.
[edit] It is simply not true that you cannot defend against botnets. Especially my second suggestion provides for adequate defense against automated buyers. It requires a complete rethinking of the technology you're using, though. You might want to do some experiments with Seaside, or alternatively directly in C.
-
You can't totally prevent bots, even with a captcha. However you can make it a pain to write and maintain a bot and therefore reduce the number. Particularly by forcing them to update their bots daily you'll be causing most to lose interest.
Here are some ideas to make it harder to write bots:
Require running a javascript function. Javascript makes it much more of a pain to write a bot. Maybe require a captcha if they aren't running javascript to still allow actual non-javascript users (minimal).
Time the keystrokes when typing into the form (again via javascript). If it's not human-like then reject it. It's a pain to mimic human typing in a bot.
Write your code to update your field ID's daily with a new random value. This will force them to update their bot daily which is a pain.
Write your code to re-order your fields on a daily basis (obviously in some way that's not random to your users). If they're relying on the field order, this will trip them up and again force daily maintenance to their bot code.
You could go even further and use Flash content. Flash is totally a pain to write a bot against.
Generally if you start taking a mindset of not preventing them, but making it more work for them, you can probably achieve the goal you're looking for.
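The daily field ID idea above could be sketched roughly like this. It is only a minimal sketch: `SECRET_KEY`, `field_id` and `decode_form` are made-up names, and it assumes the server keeps a secret and derives each day's public field ID from it with an HMAC, so no lookup table has to be stored.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical server-side secret; an assumption for this sketch.
SECRET_KEY = b"replace-with-a-real-secret"


def field_id(logical_name: str, day: date) -> str:
    """Derive the public form-field ID for a logical field on a given day."""
    message = f"{day.isoformat()}:{logical_name}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return "f" + digest[:12]  # leading letter keeps the ID valid in HTML/CSS


def decode_form(posted: dict, logical_names: list, day: date) -> dict:
    """Map the day's randomized field IDs in a submission back to logical names."""
    return {name: posted.get(field_id(name, day)) for name in logical_names}
```

As the comments below point out, a bot that parses the form can still follow the renamed fields, so this only raises the maintenance cost for bots with hardcoded IDs.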
Loren Pechtel : Humans sometimes engage in non-human typing, though--form fillers.
porneL : You need to allow for very different typing styles/speeds - everything from hunt'n'peck to touchtyping. It's not hard to write a bot that falls somewhere between. Things like variable field IDs and order can be circumvented by reading and parsing the form, which is not very hard.
-
Most purely technical solutions have already been offered. I'll therefore suggest another view of the problem.
As I understand it, the bots are set up by people genuinely trying to buy the bags you're selling. The problem is -
- Other people, who don't operate bots, deserve a chance to buy, and you're offering a limited amount of bags.
- You want to attract humans to your site and just sell the bags.
Instead of trying to avoid the bots, you can enable potential bag-buyers to subscribe to an email, or even SMS update, to get notified when a sale will take place. You can even give them a minute or two head start (a special URL where the sale starts, randomly generated, and sent with the mail/SMS).
When these buyers go to buy, they're on your site, so you can show them whatever you want in side banners or elsewhere. Those running the bots will prefer to simply register for your notification service.
The bot runners might still run bots on your notifications to finish the buy faster. One solution to that is offering a one-click buy.
By the way, you mentioned your users are not registered, but it sounds like those buying these bags are not random buyers, but people who look forward to these sales. As such, they might be willing to register to get an advantage in trying to "win" a bag.
In essence what I'm suggesting is try and look at the problem as a social one, rather than a technical one.
Asaf
-
Assumed non-negotiables:
The first screen needs to be dead simple, low-overhead HTML, with a single easily identifiable (bot-wise or people-wise) button to click or equivalent to indicate unambiguously "I want my Crap". Because we assume worst-case - you have the equivalent of a DoS attack from a combination of bots and nonbots, all first click on the site (as far as identifiability). So let's hand these out as quickly as we can from caches, benign echobots, etc.
(Note: As far as wooters are concerned, this is what happens anyway; it's just as painful for users as for Woot, so anything that helps absorb or mitigate the first screen acquisition is in the interests of all of the 3 parties involved.)
Then, the process needs to be no more aggravating for non-bots than it currently is, with no additional steps (or pain) for legits. (Background note on current design: Current wooters usually will be already signed on, or can sign on during the purchase process. New buyers need to register during purchase. So it's practically quicker to be already registered, and quicker yet to already be logged on.)
To complete the crap sale, a progression of transaction screens needs to be navigated (say 5, plus or minus, depending on circumstances). The winners are the first who complete the full navigation. The current process rewards bots (or anyone else) who complete the entire sequence of 5 screens the most quickly; but the entire progression is biased toward fast responses (i.e. bots).
No question the bots will have the advantage for the first screen; and whatever edge they have achieved from that point, they keep through the rest of the screens, plus whatever advantage botness provides at other stages as well.
What if Woot were to intentionally decouple the queuing process after the first screen, and feed every session from that point into a sequence of fixed-minimum-time steps? The second screen wouldn't even be presented until 30 seconds had passed; after it was submitted, same for the following screens. I bet wooters would have no problem if they were told that, after the first screen, they would wait in a queue (which is already true) that would spread the load over time in a way that should take no longer than before, be more robust, and help weed out the bots. At this point you can throw in some of the bot speedbumps listed above (subtle variations in DOM objects, etc.) Just the benefit from the perception that Woot is a little more in control of things would help.
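A rough sketch of the fixed-minimum-time idea, to make it concrete. `PacedCheckout` and its methods are invented for illustration, and it assumes the clock for each step starts when the previous screen was served, which is only one possible reading of the above.

```python
class PacedCheckout:
    """Releases each checkout screen no sooner than a fixed time after the previous one."""

    def __init__(self, min_step_seconds=30.0, total_steps=5):
        self.min_step_seconds = min_step_seconds
        self.total_steps = total_steps
        self.sessions = {}  # session id -> (current step, time that step was served)

    def start(self, session_id, now):
        """Record that the session has been handed screen 1."""
        self.sessions[session_id] = (1, now)

    def wait_remaining(self, session_id, now):
        """Seconds the session still has to wait before the next screen."""
        _, served_at = self.sessions[session_id]
        return max(0.0, self.min_step_seconds - (now - served_at))

    def advance(self, session_id, now):
        """Move to the next screen if the minimum time has passed; return the step shown."""
        step, _ = self.sessions[session_id]
        if step >= self.total_steps or self.wait_remaining(session_id, now) > 0:
            return step  # too early (or already done): stay on the same screen
        self.sessions[session_id] = (step + 1, now)
        return step + 1
```

With this, a bot that submits in 2 seconds and a human who takes 25 seconds both reach the second screen at the 30 second mark, so raw speed past the first click buys nothing.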
If a much higher proportion of the BOC initial hits could segue into a bot-unfriendlier non-time-critical process on their first hit (or close to it), rather than retrying, then real people who get past that point would have more confidence. For sure it would be less hostile than the current situation. It might cut down on the background-noise-ambient-bot-rate that's going on all the time even under normal Woot-Off circumstances. And the bots would lay off the main page and sit in the queue with each other (and everyone else) where they have no advantage.
Hmmm... The concept "apartment-threaded" comes to mind. I wonder if the pattern is approximately useful?
A useful core concept here is being able, after the first screen, to track accumulated total time in queue and be able to adjust to standard. As a bot-mitigation strategy, you would have a little bit of flexibility to maybe fudge the very earliest sessions by maybe 5-10 seconds; doing so would probably be undetectable, but would result in a richer non-bot purchase mix. I'm sure you have statistics to help evaluate stuff like this after the fact.
Just for fun, you could (at least for one wootoff) put together your own bot that combines the best features you've seen, and then hand it out to everyone the day before. Then at least everyone would be equally armed. (Then duck ... incoming ...) -
I wrote 3 blog posts about this recently, they may be of use.
The posts cover dealing with bots and automated voting scripts. It may however not be heavy enough to deal with what you are having which seems pretty serious.
-
I like BradC's answer (using the suggestions in Ned Batchelder's article), but I want to add another level to it. You may be able to randomize not only the field names, but also the field positions and the code that makes them invisible.
Now, this last bit is the hard part and I don't know exactly how to do it, but someone with more JavaScript and CSS experience might be able to figure it out. Of course, you can't just keep the same positions all the time, because the scripters will just figure out that the element with position (x,y) is the real one. You would have to have some code that changes the positioning of form elements relative to other elements in order to move them off the page, overlay them on each other, etc. Then obfuscate the code that does this with some randomness introduced into it. Automatically change the obfuscation daily, before a new item is made available. The idea is that without a proper CSS and JavaScript implementation (and code to read the layout of the page as a human would) a bot won't be able to figure out which elements are being shown to the user. Your server-side code, of course, knows which fields are real and which are fake.
In summary:
- The field names are random
- The field order is random
- The field hiding code is complex
- The field hiding code is obfuscated - randomly
- The random factors are automatically changed every day by server-side code
With the constraints you've given I don't think there is a way to avoid an "arms race" of some kind, but that doesn't mean all is lost. If you can automate your side of the arms race and the scripters cannot then you would win it every time.
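For the server-side half (knowing which fields are real and which are fake), something like the following might work. It is a sketch under assumptions: `REAL_FIELDS`, `build_form` and `looks_human` are hypothetical names, the seed would be whatever the server rotates daily, and the hard part (the obfuscated CSS/JavaScript that hides the decoys) is not shown at all.

```python
import random

REAL_FIELDS = ["name", "address", "card"]  # hypothetical logical fields


def build_form(seed, decoys=3):
    """Return (layout, mapping): shuffled public field names, and which ones are real."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees distinct public names.
    ids = rng.sample(range(2**32), len(REAL_FIELDS) + decoys)
    names = ["x%08x" % i for i in ids]
    # Public name -> logical name, or None for a decoy the user never sees.
    mapping = dict(zip(names, REAL_FIELDS + [None] * decoys))
    layout = names[:]
    rng.shuffle(layout)  # field order on the page changes with the seed
    return layout, mapping


def looks_human(posted, mapping):
    """Accept only if every decoy is empty and every real field is filled in."""
    for public, logical in mapping.items():
        value = posted.get(public, "")
        if logical is None and value:
            return False  # something typed into a field no human could see
        if logical is not None and not value:
            return False
    return True
```

Because the layout is a pure function of the seed, the server can regenerate the mapping at submit time instead of storing it per visitor.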
-
Write a reverse-proxy on an Apache server in front of your application which implements a Tarpit (Wikipedia Article) to punish bots. It would simply manage a list of IP addresses that connected in the last few seconds. You detect a burst of requests from a single IP address and then exponentially delay those requests before responding.
Of course, multiple humans can come from the same IP address if they're on a NAT'd network connection, but it's unlikely that a human would mind your response time going from 2 ms to 4 ms (or even 400 ms), whereas a bot will be hampered by the increasing delay pretty quickly.
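The delay logic itself is small. Here is a minimal in-memory sketch, not an Apache module: the `Tarpit` class, the 5 second window, the 3 free hits and the 2 ms base delay are all assumptions picked to match the numbers above.

```python
import time
from collections import defaultdict, deque


class Tarpit:
    """Tracks recent hits per IP and returns an exponentially growing delay for bursts."""

    def __init__(self, window=5.0, free_hits=3, base_delay=0.002, max_delay=30.0):
        self.window = window          # seconds of history kept per IP
        self.free_hits = free_hits    # hits inside the window that cost nothing
        self.base_delay = base_delay  # seconds; doubled for every extra hit
        self.max_delay = max_delay
        self.hits = defaultdict(deque)

    def delay_for(self, ip, now=None):
        """Record one request from `ip` and return how long to stall it, in seconds."""
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        while recent and now - recent[0] > self.window:
            recent.popleft()  # forget hits that fell out of the window
        recent.append(now)
        excess = len(recent) - self.free_hits
        if excess <= 0:
            return 0.0
        return min(self.base_delay * 2 ** excess, self.max_delay)
```

A proxy would sleep for `delay_for(ip)` before passing the request on. A few people behind one NAT see a few milliseconds; something hitting ten times a second reaches the cap within a couple of seconds.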
-
Make it unprofitable for the bot users and they'll go away pretty quickly - that is, occasionally sell something that no human being could possibly ever want (a bag of literal crap maybe).
-
I guess the only thing to do is make the effort exceed the benefits for spammers. So here is a "brainstorm" idea and I don't know all the technical details of how this would be implemented. I would have to do some research, but from my current knowledge it is worth investigating if the other suggested approaches are rejected.
You already use Flash on your site, so why not use a Flash control to assist with or do the form submit? The control could do some encrypted comms with the web server with a key pair or some other algorithm to hash values?
I suppose the whole form could be in Flash? Personally I would use Java applets because that's my favourite language.
-
A possible solution to the goals, not necessarily the question title:
Instead of serving up the special deal to everyone, serve it to random sets of ip addresses at a time. For instance, partition the IP space into 256 unique blocks, and at time=0, only allow people with ip addresses in the first block, and at time=5 seconds, allow people from the first block and the second block... until the last time slot arrives, and allow everyone to see the deal. One idea to randomize it would be to take the least significant bits of the md5/sha of their IP plus some salt based on the deal.
This would allow the scripters to still have an advantage in the fact that they have near-zero response time, and the strength by having multiple ip addresses, but it would mean that a given bot wouldn't have any advantage over another customer that was 'luckier' than them because of their IP address.
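To illustrate the partitioning, here is a small sketch. `block_for` and `may_see_deal` are made-up names, the salt is assumed to be any per-deal string the server keeps private until the sale, and SHA-256 stands in for the "md5/sha" mentioned above.

```python
import hashlib

SLOT_SECONDS = 5   # a new block of IPs is let in every 5 seconds
NUM_BLOCKS = 256


def block_for(ip, deal_salt):
    """Assign an IP to one of 256 blocks using the low byte of a salted hash."""
    digest = hashlib.sha256(f"{deal_salt}:{ip}".encode()).digest()
    return digest[-1] % NUM_BLOCKS  # least significant byte: 0..255


def may_see_deal(ip, deal_salt, seconds_since_start):
    """True once the IP's block has been opened for this deal."""
    if seconds_since_start < 0:
        return False
    open_blocks = int(seconds_since_start // SLOT_SECONDS) + 1
    return block_for(ip, deal_salt) < open_blocks
```

With these numbers the last block opens 255 * 5 = 1275 seconds (about 21 minutes) after the start, so the slot length would need tuning to how fast the deal normally sells out.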
Combining this with some of the other ideas seems like a good idea.
-
First, let me recap what we need to do here. I realize I'm just paraphrasing the original question, but it's important that we get this 100% straight, because there are a lot of great suggestions that get 2 or 3 out of 4 right, but as I will demonstrate, you will need a multifaceted approach to cover all of the requirements.
Requirement 1: Getting rid of the 'bot slamming':
The rapid-fire 'slamming' of your front page is hurting your site's performance and is at the core of the problem. The 'slamming' comes from both single-IP bots and - supposedly - from botnets as well. We want to get rid of both.
Requirement 2: Don't mess with the user experience:
We could fix the bot situation pretty effectively by implementing a nasty verification procedure like phoning a human operator, solving a bunch of CAPTCHAs, or similar, but that would be like forcing every innocent airplane passenger to jump through crazy security hoops just for the slim chance of catching the very stupidest of terrorists. Oh wait - we actually do that. But let's see if we can not do that on woot.com.
Requirement 3: Avoiding the 'arms race':
As you mention, you don't want to get caught up in the spambot arms race. So you can't use simple tweaks like hidden or jumbled form fields, math questions, etc., since they are essentially obscurity measures that can be trivially autodetected and circumvented.
Requirement 4: Thwarting 'alarm' bots:
This may be the most difficult of your requirements. Even if we can make an effective human-verification challenge, bots could still poll your front page and alert the scripter when there is a new offer. We want to make those bots infeasible as well. This is a stronger version of the first requirement, since not only can't the bots issue performance-damaging rapid-fire requests -- they can't even issue enough repeated requests to send an 'alarm' to the scripter in time to win the offer.
Okay, so let's see if we can meet all four requirements. First, as I mentioned, no one measure is going to do the trick. You will have to combine a couple of tricks to achieve it, and you will have to swallow two annoyances:
- A small number of users will be required to jump through hoops
- A small number of users will be unable to get the special offers
I realize these are annoying, but if we can make the 'small' number small enough, I hope you will agree the positives outweigh the negatives.
First measure: User-based throttling:
This one is a no-brainer, and I'm sure you do it already. If a user is logged in, and keeps refreshing 600 times a second (or something), you stop responding and tell him to cool it. In fact, you probably throttle his requests significantly sooner than that, but you get the idea. This way, a logged-in bot will get banned/throttled as soon as it starts polling your site. This is the easy part. The unauthenticated bots are our real problem, so on to them:
Second measure: Some form of IP throttling, as suggested by nearly everyone:
No matter what, you will have to do some IP based throttling to thwart the 'bot slamming'. Since it seems important to you to allow unauthenticated (non-logged-in) visitors to get the special offers, you only have IPs to go by initially, and although they're not perfect, they do work against single-IP bots. Botnets are a different beast, but I'll come back to those. For now, we will do some simple throttling to beat rapid-fire single-IP bots.
The performance hit is negligible if you run the IP check before all other processing, use a proxy server for the throttling logic, and store the IPs in a memcached lookup-optimized tree structure.
Third measure: Cloaking the throttle with cached responses:
With rapid-fire single-IP bots throttled, we still have to address slow single-IP bots, i.e. bots that are specifically tweaked to 'fly under the radar' by spacing requests slightly further apart than the throttle would catch.
To instantly render slow single-IP bots useless, simply use the strategy suggested by abelenky: serve 10-minute-old cached pages to all IPs that have been spotted in the last 24 hours (or so). That way, every IP gets one 'chance' per day/hour/week (depending on the period you choose), and there will be no visible annoyance to real users who are just hitting 'reload', except that they don't win the offer.
The beauty of this measure is that it also thwarts 'alarm bots', as long as they don't originate from a botnet.
(I know you would probably prefer it if real users were allowed to refresh over and over, but there is no way to tell a refresh-spamming human from a request-spamming bot without a CAPTCHA or similar)
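The decision behind the third measure could look roughly like this. It is only a sketch: `StaleCacheGate` is a made-up name, the in-memory dict stands in for whatever shared store the proxy uses, and it assumes that every hit refreshes the IP's "last seen" time, so a constant poller never earns a fresh page.

```python
import time


class StaleCacheGate:
    """Decides whether an IP gets the live page or the 10-minute-old cached copy."""

    def __init__(self, remember_for=24 * 3600, stale_age=600):
        self.remember_for = remember_for  # how long an IP counts as "already seen"
        self.stale_age = stale_age        # age in seconds of the cached copy to serve
        self.last_seen = {}               # ip -> time of the most recent request

    def choose_page(self, ip, now=None):
        """Return "fresh" for the first sighting in the period, otherwise "stale"."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(ip)
        self.last_seen[ip] = now
        if seen is not None and now - seen < self.remember_for:
            return "stale"
        return "fresh"
```

The "stale" branch is also where the optional CAPTCHA-verified refresher from the fourth measure would hook in.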
Fourth measure: reCAPTCHA:
You are right that CAPTCHAs hurt the user experience and should be avoided. However, in *one* situation they can be your best friend: If you've designed a very restrictive system to thwart bots, that - because of its restrictiveness - also catches a number of false positives; then a CAPTCHA served as a last resort will allow those real users who get caught to slip by your throttling (thus avoiding annoying DoS situations).
The sweet spot, of course, is when ALL the bots get caught in your net, while extremely few real users get bothered by the CAPTCHA.
If you, when serving up the 10-minute-old cached pages, also offer an alternative, optional, CAPTCHA-verified 'front page refresher', then humans who really want to keep refreshing, can still do so without getting the old cached page, but at the cost of having to solve a CAPTCHA for each refresh. That is an annoyance, but an optional one just for the die-hard users, who tend to be more forgiving because they know they're gaming the system to improve their chances, and that improved chances don't come free.
Fifth measure: Decoy crap:
Christopher Mahan had an idea that I rather liked, but I would put a different spin on it. Every time you are preparing a new offer, prepare two other 'offers' as well, that no human would pick, like a 12mm wingnut for $20. When the offer appears on the front page, put all three 'offers' in the same picture, with numbers corresponding to each offer. When the user/bot actually goes on to order the item, they will have to pick (a radio button) which offer they want, and since most bots would merely be guessing, in two out of three cases, the bots would be buying worthless junk.
Naturally, this doesn't address 'alarm bots', and there is a (slim) chance that someone could build a bot that was able to pick the correct item. However, the risk of accidentally buying junk should make scripters turn entirely from the fully automated bots.
Sixth measure: Botnet Throttling:
[deleted]
Okay............ I've now spent most of my evening thinking about this, trying different approaches.... global delays.... cookie-based tokens.. queued serving... 'stranger throttling'.... And it just doesn't work. It doesn't. I realized the main reason why you hadn't accepted any answer yet was that no one had proposed a way to thwart a distributed/zombie net/botnet attack.... so I really wanted to crack it. I believe I cracked the botnet problem for authentication in a different thread, so I had high hopes for your problem as well. But my approach doesn't translate to this. You only have IPs to go by, and a large enough botnet doesn't reveal itself in any analysis based on IP addresses.
So there you have it: My sixth measure is naught. Nothing. Zip. Unless the botnet is small and/or fast enough to get caught in the usual IP throttle, I don't see any effective measure against botnets that doesn't involve explicit human-verification such as CAPTCHAs. I'm sorry, but I think combining the above five measures is your best bet. And you could probably do just fine with just abelenky's 10-minute-caching trick alone.
Shawn Miller : Very well stated. Thanks for your input.
Andy Dent : doesn't 3. mean you're serving up old pages to all of AOL, assuming a few bots come from AOL's IP pool?
Jens Roland : @Andy: Only if *all* AOL users share the same IP addresses that the bots used while spamming.
-
How about a delay page where the user must wait for a delay that is shown in an image?
You only accept the order from the page they land on if they click within a short enough window of the delay specified in the image. The image could show the countdown as an animated GIF, or you could use a very small JavaScript or Flash timer.
If they jump to the details page outside the time limit, they see an expensive item as discussed in previous answers.
-
Stick a 5 minute delay on all product announcements for unregistered users. Casual users won't really notice this and noncasual users will be registered anyhow.
-
I am not 100% sure this would work, at least not without trying.
But it seems as if it should be possible, although technically challenging, to write a server-side HTML/CSS scrambler that takes as its input a normal html page + associated files, and outputs a more or less blank html page, along with an obfuscated javascript file that is capable of reconstructing the page. The javascript couldn't just print out straightforward DOM nodes, of course... but it could spit out a complex set of overlapping, absolute-positioned divs and paragraphs, each containing one letter, so it comes out perfectly readable.
Bots won't be able to read it unless they employ a complete rendering engine and enough AI to reconstruct what a human would be seeing.
Then, because it's an automated process, you can re-scramble the site as often as you have the computational power for - every minute, or every ten minutes, or every hour, or even every page load.
Granted, writing such an obfuscater would be difficult, and probably not worth it. But it's a thought.
-
Not a complete fix, but I didn't see it here yet.
Track the "slamming" addresses, and put up a disclaimer saying that BOC/ items will not be shipped to any address that is not following your TOS.
This will have psych impact on some, and others who want to take advantage of your site will have to switch up methods, but you will have negated one avenue for them.
-
How about this: Create a form to receive an email when a new item is on sale, and add a caching system that will serve the same content to anyone refreshing in less than X seconds.
This way you win in all the scenarios: you get rid of the scrapers (they can scrape their email account) and you give a chance to the people who won't code something just to buy on your site! I'm sure I would get the email on my mobile and log in to buy something if I really wanted to.
-
There's a lot of suggestions here so pardon me if this has already been posted.
The first thing I would do is make the ordering a two step process. The first step would pass back a GUID while logging the IP Address. The second step would receive the GUID and compare it against IP Addresses that have been logged. In conjunction with blocking IP Addresses which are spamming the site (i.e. faster than a human can click refresh) this technique could stop spammers from successfully making purchases thereby solving 1 & 3.
The second item is problematic but I would keep a running list of your regular users' IP addresses and throttle traffic for any newcomers. This could leave first time visitors and dial up users (due to changing IP addresses) out in the cold, but I think it's just making the best out of a bad situation by giving preference to repeat business... and dialup users, well it's questionable whether they'd "win" even if there weren't any spammers anyway.
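In code, the two-step GUID check might look something like the sketch below. `PurchaseTokens` is a made-up name, the in-memory dict stands in for a shared store, and the 5 minute expiry and single-use rule are my own assumptions rather than part of the idea as stated.

```python
import time
import uuid


class PurchaseTokens:
    """Step 1 hands out a GUID tied to the caller's IP; step 2 only accepts a matching pair."""

    def __init__(self, ttl=300.0):
        self.ttl = ttl    # seconds a token stays valid (assumed value)
        self.issued = {}  # token -> (ip, time issued)

    def issue(self, ip, now=None):
        """First step: log the IP and pass back a fresh GUID."""
        now = time.time() if now is None else now
        token = str(uuid.uuid4())
        self.issued[token] = (ip, now)
        return token

    def redeem(self, token, ip, blocked_ips=frozenset(), now=None):
        """Second step: accept only a known, unexpired token from the same, unblocked IP."""
        now = time.time() if now is None else now
        record = self.issued.pop(token, None)  # each token can be tried once
        if record is None:
            return False
        issued_ip, issued_at = record
        return issued_ip == ip and ip not in blocked_ips and now - issued_at <= self.ttl
```

Here `blocked_ips` would be fed by whatever throttle flags the addresses that refresh faster than a human can.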
-
Why don't you block the credit cards of users you identify as bots?
- Publish that using bots is illegal on your website
- Find certain heuristics that identify bots (this can be done for example by short-term IP tracking or by the time it takes them to fill out the form)
- If someone you tagged as a bot purchased the item, block his credit card for future use
- Next time he tries to make a purchase, disallow it and return the item to stock
I guess even the professionals will run out of credit cards eventually.
Your server load should decrease with time once the botters give up on you. Another idea is to separate your pages between servers - e.g., RSS feed on one server, homepage on another, checkout on another one.
Good luck.
Shawn Miller : Will the professionals run out of PayPal accounts?
idophir : Well... good point... I guess they will get tired of having to open a new PP account every day at some point. As opposed to all other suggestions, my approach is trying to hit the scripters in a place they cannot overcome by some code improvement. It's not perfect but I think it's cost/effective.
-
As suggested above, I did some work on non-captcha forms by using a pre-calculated hash of the expected value of a result stored in the form. The idea works for two Wordpress anti-spam plugins: WP-Morph and WP-HashCash. The only drawback is the client browser having to be able to interpret JavaScript.
-
How do you know there are scripters placing orders?
The crux of your problem is that you can't separate the scripters from the legitimate users and therefore can't block them, so how is it that you know there are scripters at all?
If you have a way to answer this question, then you have a set of characteristics you can use to filter the scripters.
-
So your problem is too much business? People are sniping your sales? This is assuming that these scripters are generating qualified sales? And the issue is they are snapping up all your product before everyone else does?
How about you make a full webservice API for 'scripters' to interface with. Then offer a slight discount or some kind of perk to make them play by your rules. Double your business and have your web sales and API sales.
Either that or just get WAY more inventory - you can't fight it - embrace and adapt to it.
gbarry : He's referring to a promotional item. So it's not really about revenue.
Pat : @BPAndrew -- I agree. My thought was to build a better bot! http://stackoverflow.com/questions/450835/how-do-you-stop-scripters-from-slamming-your-website-hundreds-of-times-a-second/547190#547190
-
I'm pretty sure your server already logs all the IPs of incoming requests (most do) - so the data is already there.
Maybe you could:
Just validate the "winner" by verifying that its IP shows up less than a certain threshold value in the logs (I use "grep | wc -l" to get the count). If it's over your threshold, temporarily block that IP (hour or so?).
Disqualify any "winner" with the same shipping address or payment info as the "last" winner, or that has won within a certain time frame to spread the "winning" around.
The bots won't get 'em all that way.
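If you'd rather do the log count in code than with grep, a rough equivalent is below. It assumes the client IP is the first whitespace-separated token on each log line (as in the common access log format); `hits_per_ip` and `is_valid_winner` are just illustrative names.

```python
from collections import Counter


def hits_per_ip(log_lines):
    """Count requests per client IP, taking the IP as the first token on each line."""
    return Counter(line.split()[0] for line in log_lines if line.strip())


def is_valid_winner(ip, counts, threshold):
    """A winner is valid only if its IP shows up fewer than `threshold` times."""
    return counts[ip] < threshold
```

Counting once per sale and checking each would-be winner against the result avoids re-scanning the log for every order.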
To annoy the crap out of the scrapers: When the "random crap" item goes up, run the HTML output for that page through a "code obfuscator" ... which doesn't change the "display" of the page ... just scrambles the code with randomly generated Ids etc.
More insidious:
Increase the price charged for the "won" item based on how many times the winning IP shows up in the logs. Then even if the bots win, so do you. :-)
Ron
-
Trying to target the bots themselves will never solve the problem - whoever is writing them will figure out a new way around whatever you've put in place. However forcing the user to think before buying would be a much more effective solution. The best way of doing this that I can think of is to run a Dutch auction. Start the price high (2x what you buy it for in the shop) and decrease it over time. The first person to hit buy gets it. I don't think any bot is intelligent enough to work out what the best price is for the item.
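The price schedule for such an auction is simple to sketch. The linear drop, the floor price and the use of whole cents are assumptions here, and `dutch_price` is a made-up name.

```python
def dutch_price(start_cents, floor_cents, drop_cents_per_second, elapsed_seconds):
    """Current asking price: starts high, falls linearly, never goes below the floor."""
    elapsed = max(elapsed_seconds, 0)
    price = start_cents - drop_cents_per_second * elapsed
    return max(price, floor_cents)
```

For example, starting at 600 cents and dropping 5 cents per second with a floor of 100 cents, the price is 550 cents after 10 seconds and reaches the floor after 100 seconds.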
-
I just wanted to say I'm a woot user and I would not mind jumping through the hoop of a math problem or what have you before I can finalize my order.
That is all.
-
Restrict the times at which you release offers: For example: only from 7 minutes to 8 minutes past the start of an hour. Do not deviate from this, and give penalties on the order of a couple seconds to IPs which check a lot in the half hour before the release time. It then becomes advantageous for bot owners to only screen scrape for a couple minutes every hour instead of all. the. time. Also, because a normal person can check a site once every hour but not every second, you put normal people on a much more even footing with the bots.
Cookies: Use a tracking cookie composed of only a unique ID (a key for a database table). Give "release delays" to clients with no cookie, invalid cookies, clients which use the same cookie from a new IP, or cookies used with high frequency.
Identify likely bots: Cookies will cause the bots to request multiple cookies for each IP they control, which is behavior which can be tracked. IPs with only a single issued cookie are most likely normal clients. IPs with many issued cookies are either large NAT-ed networks, or a bot. I'm not sure how you would distinguish those, but companies are probably more likely to have things like DNS servers, a web page, and things of that nature.
-
Perhaps you need a solution that makes it totally impossible for a bot to distinguish between the bag-o-crap sales and all other content.
This is sort of a variation on the captcha theme, but instead of the user authenticating themselves by solving the captcha, the captcha is instead the description of the sale, rendered in a visually pleasing (but perhaps somewhat obscured by the background) manner.
-
- Sell the item to non-scripting humans.
- Don't hassle the 'normal' users with any tasks to complete to prove they're human.
So basically you want to find out if a particular user is a person without making them prove it. As far as I know that's impossible over the Internet, sorry.
I suggest changing the mechanism to an auction.
-
Here's my take. Attack the ROI of the bot owners, so that they'll instead do the legitimate thing you want them to do instead of cheating. Let's look at it from their point of view. What are their assets? Apparently, an unlimited number of disposable machines, IP addresses, and perhaps even a large number of unskilled people willing to do inane tasks. What do they want? To always get the special deal you are offering before other legitimate people get it.
The good news is that they only have a limited window of time in which to win the race. And what I don't think they have is an unlimited number of smart people who are on call to reverse engineer your site at the moment you unleash a deal. So if you can make them jump through a specific hoop that is hard for them to figure out, but automatic for your legitimate customers (they won't even know it's there), you can delay their efforts just enough that they get beat by the massive number of real people who are just dying to get your hot deal.
The first step is to make your notion of authentication non-binary, by which I mean that, for any given user, you have a probability assigned to them that they are a real person or a bot. You can use a number of hints to build up this probability, many of which have been discussed already on this thread: suspicious rate activity, IP addresses, foreign country geolocation, cookies, etc. My favorite is to just pay attention to the exact version of windows they are using. More importantly, you can give your long-term customers a clear way to authenticate with strong hints: by engaging with the site, making purchases, contributing to forums, etc. It's not required that you do those things, but if you do then you'll have a slight advantage when it comes time to see special deals.
Whenever you are called upon to make an authentication decision, use this probability to make the computer you're talking to do more-or-less work before you will give them what they want. For example, perhaps some javascript on your site requires the client to perform a computationally expensive task in the background, and only when that task completes will you let them know about the special deal. For a regular customer, this can be pretty quick and painless, but for a scammer it means they need a lot more computers to maintain constant coverage (since each computer has to do more work). Then you can use your probability score from above to increase the amount of work they have to do.
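One way to picture the "more-or-less work" idea is a hashcash-style puzzle whose difficulty grows with the bot probability. This is a sketch under assumptions, not a spec: the function names, the 8 to 20 bit range and the use of SHA-256 are all mine, and on the real site the solving half would run as JavaScript in the browser.

```python
import hashlib
import itertools


def difficulty_bits(bot_probability, min_bits=8, max_bits=20):
    """Scale the required work with how bot-like the client looks (0.0 to 1.0)."""
    p = min(max(bot_probability, 0.0), 1.0)
    return min_bits + round(p * (max_bits - min_bits))


def leading_zero_bits(digest):
    """Number of zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge, nonce, bits):
    """Cheap server-side check: one hash, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def solve(challenge, bits):
    """Client-side work: try nonces until the hash has enough leading zero bits."""
    for nonce in itertools.count():
        if verify(challenge, nonce, bits):
            return nonce
```

Each extra bit doubles the expected number of hashes, so a trusted regular at 8 bits does about 256 attempts while a suspicious newcomer at 20 bits does about a million.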
To make sure this delay doesn't cause any fairness problems, I'd recommend making it be some kind of encryption task that includes the current time of day from the person's computer. Since the scammer doesn't know what time the deal will start, he can't just make something up, he has to use something close to the real time of day (you can ignore any requests that claim to come in before the deal started). Then you can use these times to adjust the first-come-first-served rule, without the real people ever having to know anything about it.
The last idea is to change the algorithm required to generate the work whenever you post a new deal (and at random other times). Every time you do that, normal humans will be unaffected, but bots will stop working. They'll have to get a human to get to work on the reverse-engineering, which hopefully will take longer than your deal window. Even better is if you never tell them if they submitted the right result, so that they don't get any kind of alert that they are doing things wrong. To defeat this solution, they will have to actually automate a real browser (or at least a real javascript interpreter) and then you are really jacking up the cost of scamming. Plus, with a real browser, you can do tricks like those suggested elsewhere in this thread like timing the keystrokes of each entry and looking for other suspicious behaviors.
So for anyone who you know you've seen before (a common IP, session, cookie, etc) you have a way to make each request a little more expensive. That means the scammers will want to always present you with your hardest case - a brand-new computer/browser/IP combo that you've never seen before. But by putting some extra work into being able to even know if they have the bot working right, you force them to waste a lot of these precious resources. Although they may really have an infinite number, generating them is not without cost, and again you are driving up the cost part of their ROI equation. Eventually, it'll be more profitable for them to just do what you want :)
Hope that's helpful,
Eric
-
Use hashcash.
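For anyone wondering what that looks like in practice, here is a minimal hashcash-style sketch, tied to the "more suspicious clients do more work" idea above. It is only an illustration: it uses SHA-256 rather than hashcash's original SHA-1, and the names `difficulty_for`, `solve` and `verify`, the challenge string, and the bit counts are all made up for the example.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def difficulty_for(suspicion: float, base_bits: int = 8, max_extra_bits: int = 12) -> int:
    """Map a suspicion score in [0, 1] to a required number of zero bits (assumed scale)."""
    suspicion = min(max(suspicion, 0.0), 1.0)
    return base_bits + round(suspicion * max_extra_bits)


def solve(challenge: str, bits: int) -> int:
    """Client side: brute-force a nonce whose hash has at least `bits` leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def verify(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: checking the work costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

Each extra bit roughly doubles the client's expected work while the server's check stays at one hash, which is what makes the cost so lopsided for someone polling many times a second.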
-
I think your best bet is to watch the IPs coming in, but to mitigate the issues you mention in a couple of ways. First, use a probabilistic hash (e.g., a Bloom filter) to mark IPs which have been seen before. This class of algorithm is very fast, and scales well to absolutely massive set sizes. Second, use a graduated response, whereby a server delay is added to each request, scaled by how often you've seen the IP 'recently'.
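A rough sketch of how those two pieces could fit together. Note that a plain Bloom filter only answers "seen or not", so this uses a counting variant (close to a count-min sketch) to get "how often"; the class name, table size, and delay parameters are assumptions for illustration only.

```python
import hashlib


class CountingBloom:
    """Fixed-size table of counters; may overestimate a count, never underestimate."""

    def __init__(self, size: int = 1 << 16, hashes: int = 4) -> None:
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _slots(self, key: str):
        # Derive several slot indexes per key by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for slot in self._slots(key):
            self.counters[slot] += 1

    def estimate(self, key: str) -> int:
        return min(self.counters[slot] for slot in self._slots(key))


def delay_seconds(hits: int, free_hits: int = 5, step: float = 0.25, cap: float = 5.0) -> float:
    """Graduated response: no delay for the first few hits, then a growing, capped delay."""
    return min(cap, max(0, hits - free_hits) * step)
```

To approximate 'recently', one option would be to swap in a fresh table every few minutes; that part is not shown here.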
-
At the expense of usability for those with screen readers, you could just, on 90% of the pages, use unlabelled, undenotable picture buttons. Rotate the pictures regularly and use a random generator and random sorting to lay out two buttons that say "I want this" and "I am a bot". Place them side by side in a different order each time. At each stage a user can make progress towards their target, but a bot is more likely to make a mistake (a 50% chance of guessing wrong at each step, compounding over the number of steps). It's like a CAPTCHA at every stage, only easier for the user and slower for bots, which need to prompt their master at EVERY single step. Put the price, the confirm button, and the item description in pictures. It sucks, but it's likely more successful.
-
Why not make the content the CAPTCHA?
On the page where you display the prize, always have an image file in the same location with the same name. When a bag o' crap sale is on, dynamically generate and load an image with the text etc. advertising the prize; when no sale is on, just have some default image that integrates well with the site. It seems like the same concept as CAPTCHA: if the bot cannot figure out the meaning of the image, it will not be able to "win" the prize, and if it can, it would have been able to figure out your CAPTCHA images anyway.
-
Just make the bots compete on even ground. Encrypt a timestamp and stick it in a hidden form field. When you get a submission, decrypt it and see how much time has passed. If the form came back faster than any human could have typed it, reject it. Now bots and humans can only try to buy the bag of crap at the same speed.
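A minimal sketch of the hidden-field idea. Instead of encrypting the timestamp it signs it with an HMAC, which is a swapped-in technique that serves the same purpose here (the client can read the time but cannot forge it). `SECRET_KEY`, `MIN_HUMAN_SECONDS` and the function names are placeholders, not anything Woot actually uses.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; must stay server-side
MIN_HUMAN_SECONDS = 3.0  # assumed lower bound on how fast a person can fill the form


def issue_token(now: Optional[float] = None) -> str:
    """Value for the hidden form field: 'timestamp:signature'."""
    issued_at = time.time() if now is None else now
    issued = f"{issued_at:.3f}"
    sig = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{sig}"


def is_humanly_slow(token: str, now: Optional[float] = None) -> bool:
    """True only if the token is genuine and enough time has passed since it was issued."""
    try:
        issued, sig = token.split(":")
        issued_at = float(issued)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # timestamp was tampered with
    current = time.time() if now is None else now
    return current - issued_at >= MIN_HUMAN_SECONDS
```

A bot can still wait out the threshold before submitting, so this only removes the speed advantage; it does not prove a human is present.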
-
If you can't beat them... Change the rules!
Why not provide a better system than the scripters have made for themselves?
Modify your site to be fairer for people not using bot scripts. People register (CAPTCHA or email verification) and effectively enter a lottery competition to win! 'Winning' makes it more fun, and each person pays a small entry fee, so the winner gets the product for EVEN less.
-
I'm not a web developer, so take this with a pinch of salt, but here's my suggestion -
Each user has a cookie (containing a random string of data) that determines whether they see the current crap sale.
(If you don't have a cookie, you don't see them. So users who don't enable cookies never see crap sales; and a new user will never see them the first time they view the page, but will thereafter).
Each time the user refreshes the website, he passes his current cookie to the server, and the server uses that to decide whether to give him a new cookie or leave the current one unchanged; and based on that, decides whether to show the page with or without the crap sale.
To keep things simple on the server side, you could say at any given time, there's only ever one cookie that will let you see crap sales; and there are a couple of other cookies that are labelled "generated in the last 2 seconds", which will always be kept unchanged. So if you refresh the page faster than that, you can't get a new one.
(...ah, well, I guess that doesn't stop a bot from restoring an older cookie and passing it back to you. Still, maybe there's a solution here somewhere.)
-
Stopping all bots would be quite difficult, especially without using a CAPTCHA. I think you should approach this from the standpoint of implementing a wide variety of measures to make life harder for the scripters.
I believe this is one measure that would weed out some of them:
You could try randomizing the IDs and class names of your tags with each response. This would force bots to rely on the position and context of important tags, which requires a more sophisticated bot. Furthermore, you could randomize the position of the tags if you want to use relative or absolute positioning in your CSS.
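A small sketch of what per-response randomization might look like, with the CSS emitted inline so it always matches the markup. The role names, the styles, and both function names are invented for the example.

```python
import html
import secrets
from typing import Dict

LOGICAL_NAMES = ("title", "price", "buy-button")  # hypothetical element roles


def random_names() -> Dict[str, str]:
    """Map each logical role to a fresh random class name for one response."""
    return {role: "c" + secrets.token_hex(4) for role in LOGICAL_NAMES}


def render_page(names: Dict[str, str], title: str, price: str) -> str:
    """Render markup and inline CSS that share the same randomized class names."""
    css = (
        f".{names['title']} {{ font-weight: bold; }}\n"
        f".{names['price']} {{ color: green; }}\n"
        f".{names['buy-button']} {{ padding: 4px; }}"
    )
    body = (
        f'<h1 class="{names["title"]}">{html.escape(title)}</h1>\n'
        f'<span class="{names["price"]}">{html.escape(price)}</span>\n'
        f'<button class="{names["buy-button"]}">I want one!</button>'
    )
    return f"<html><head><style>\n{css}\n</style></head>\n<body>\n{body}\n</body></html>"
```

Calling `render_page(random_names(), "Random Crap", "$1.00")` on every request gives a page whose selectors a scraper cannot hard-code, though the tag order and text are still there to match on.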
The biggest drawback with this approach is that you would have to take steps to ensure your CSS file is not cached client-side, because it would of course need to contain the randomized IDs & class names. One way to overcome this is to not use external CSS files and instead put the CSS with the randomized selectors in the <head></head> section of the page. This would allow the randomized CSS to be client-side cached along with the rest of the page.
-
Steps:
(combining ideas from another poster and gif spammers)
- Display the entire offer page as an image, ad-copy and all.
- Encrypt the price in the URL.
Attacks:
- Bots going to the URL to view the price on the checkout page: turn the checkout price tag into an image, or apply a captcha before users can go to the order page.
- Chewing up bandwidth: serve special offers using images, normal offers using HTML.
- Reckless bot ordering: some of the special "image" offers are actually at normal prices.
- RSS scraping: RSS feeds must be paid for by hashcash or captchas. This has to be on a per-request basis. It can be pre-paid; for instance, a user can enter 20 captchas for 200 RSS lookups.
Once the threat of DDoS has been mitigated, you can implement e-mail notification of offers.
-
How about coming up with a way to identify bots, probably IP based, but not block them from accessing the site, just don't allow them to actually buy anything. That is, if they buy, they don't actually get it, since bots are against the terms of use.
-
The problem with CAPTCHA is that when you see a crap sale on Woot, you have to act VERY fast as a consumer if you hope to receive your bag of crap. So, if you are going to use a form of CAPTCHA, it must be very quick for the customer.
What if you had a large image, say 600 x 600, that was just a white background and dots of different colors or patterns randomly placed on the image? The image would have an image map on it. This map would have a link mapped to small chunks of the image, say, 10 x 10 blocks. The user would simply have to click on the specific type of dot. It would be quick for the end user and it would be somewhat difficult for a bot developer to code. But this alone may not be that difficult for a good bot creator to get past. I would add ciphered URLs.
I was developing a system some time back that would cipher URLs. If every URL on these pages is ciphered with a random IV, then they all appear to be unique to the bot. I was designing this to confuse probing bots. I have not completed the technique yet, but I did have a small site coded that functioned in this manner.
While these suggestions are not a full solution, they would make it way harder to build a working bot while still being easy for a human to use.
-
There's probably no good solution as long as the surprise distribution of the bag o' crap is tied only to a point in time - since bots have plenty of time, and the resources to keep slamming the site at short time intervals.
I think you'd have to add an extra criterion that bots can't screen-scrape or manipulate from their end. For instance, say at any time there's 5000 humans hitting the page a few times a minute looking for the bag of crap, and 50 bots slamming it every second. In the first few seconds after it appears, the 50 bots are going to snap it all up.
So, you could add a condition that the crap appears first to any users where the modulus 30 of their integer IP is a random number, say 17. Maybe another random number is added every second, so the crap is revealed incrementally to all clients over 30 seconds.
Now imagine what happens in the first several seconds: currently, all 50 bots are able to snap up all the crap immediately, and the humans get 0. Under this scheme, after 6 seconds only 10 bots have made it through, while 1000 humans have gotten through, and most of the crap goes to the humans. You could play with the timings and the random modulus to try and optimize that interval, depending on user counts and units available.
Not a perfect solution, but an improvement. The upside is many more humans than bots will benefit. There are several downsides, mainly that not every human gets an equal shot at the crap on any particular day - though they don't have much of a shot now, and I'd guess even without bots, most of them get shut out at random unless they happen to refresh at just the right second. And, it wouldn't work on a botnet with lots of distributed IPs. Dunno if anyone's really using a botnet just for woot crap though.
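To make the incremental reveal concrete, here is a small sketch under the assumptions in the example above (modulus 30, one new residue unlocked per second). The function names and the per-sale seed are hypothetical.

```python
import ipaddress
import random
from typing import List


def reveal_order(sale_seed: int, modulus: int = 30) -> List[int]:
    """Secret, per-sale random order in which IP residues are unlocked."""
    order = list(range(modulus))
    random.Random(sale_seed).shuffle(order)
    return order


def can_see_sale(ip: str, seconds_since_start: float, order: List[int]) -> bool:
    """A client sees the crap once the residue of its integer IP has been unlocked."""
    if seconds_since_start < 0:
        return False
    residue = int(ipaddress.ip_address(ip)) % len(order)
    unlocked = order[: int(seconds_since_start) + 1]  # one more residue each second
    return residue in unlocked
```

During the sixth second, 6 of the 30 residues are unlocked, so about 20% of clients (bots and humans alike) can see the sale, which matches the 10-of-50 bots and 1000-of-5000 humans figures above.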
-
Your end goal is to spread out to a larger user base who gets to buy stuff.
What if you did something like releasing your bags of w00t over a period of an hour or two, and over a range of IP addresses, instead of releasing them all at the same time and to any IP address.
Let's say you have 255 bags of w00t. 1.0.0.0 can buy in the first minute, 2.0.0.0 can buy in the second minute (potentially 2 bags of w00t available), etc, etc.
Then, after 255 minutes, you have made bags of w00t available to everybody, although it is highly likely that not all 255 bags of w00t are left.
This limits a true attack to users who have >255 computers, although a bot user might be able to "own" the bag of w00t assigned to their IP range.
There is no requirement that you match up bags to IP's fairly (and you definitely should use some type of MD5 / random seed thing)... if you distribute 10 bags of w00t incrementally, you just have to make sure that it gets distributed ~evenly~ across your population.
If IP's are bad then you can use cookies and exclude the use case where a non-cookied user gets offered a bag of w00t.
If you notice that a particular IP, cookie, or address range has an extreme amount of traffic, make the bag of w00t available to them proportionally later / last, so that occasional / steady / slow visitors are given opportunities before heavy / rapid / probable bot users.
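One way this could be sketched: hash the client's IP or cookie together with a per-sale seed to pick its release minute, then push heavy hitters toward the end of the window. This uses SHA-256 where MD5 was suggested above, and the names `release_minute` and `penalized_minute`, along with the traffic thresholds, are purely illustrative.

```python
import hashlib


def release_minute(client_id: str, sale_seed: str, window_minutes: int = 255) -> int:
    """Minute (0-based) at which this IP or cookie is first offered a bag."""
    digest = hashlib.sha256(f"{sale_seed}:{client_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % window_minutes


def penalized_minute(
    base_minute: int,
    recent_requests: int,
    typical_requests: int = 60,
    window_minutes: int = 255,
) -> int:
    """Move clients with unusually heavy traffic proportionally later in the window."""
    if recent_requests <= typical_requests:
        return base_minute
    # Fraction of the remaining window to forfeit; approaches 1 as traffic grows.
    penalty = 1 - typical_requests / recent_requests
    return int(base_minute + (window_minutes - 1 - base_minute) * penalty)
```

Because the seed changes per sale, a given IP lands in a different minute each time, so nobody can "own" an early slot.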
--Robert
-
Let's turn the problem on its head - you have bots buying stuff that you want real people to buy, how about making a real chance that the bots will buy stuff that you don't want the real people to buy.
Have a random chance for some non displayed html that the scraping bots will think is the real situation, but real people won't see (and don't forget that real people includes the blind, so consider screen readers etc as well), and this travels through to purchase something exorbitantly expensive (or doesn't make the actual purchase, but gets payment details for you to put on a banlist).
Even if the bots switch to 'alert the user' rather than 'make the purchase', if you can get enough false alarms, you may be able to make it sufficiently worthless for people (maybe not everyone, but some reduction in the scamming is better than none at all) not to bother.
-
I don't know if this has been suggested yet, but rather than keeping a list of IP's of the bots, which you would need to scan through on every single page request, why not set a cookie or a session var to keep track of the bots? Here's an example in PHP:
<?php
// bot check
session_start(); // required so that $_SESSION persists between requests
$now = microtime(true);

// bot counter
$botCounter = 0;
if (array_key_exists('botCheck_panicCounter', $_SESSION)) {
    $botCounter = $_SESSION['botCheck_panicCounter'];
}

// if this seems to be a bot
if ($botCounter > 5) {
    die('Die()!!');
}

// if this user visited before
if (array_key_exists('botCheck_lastVisit', $_SESSION)) {
    $lastVisit = $_SESSION['botCheck_lastVisit'];
    $diff = $now - $lastVisit;
    // if it's less than a second
    if ($diff < 1) {
        // increase the bot counter
        $botCounter += 1;
        // and save it (in the session; values written to $_REQUEST are lost after the request)
        $_SESSION['botCheck_panicCounter'] = $botCounter;
    }
}

// set the var for future use
$_SESSION['botCheck_lastVisit'] = $now;

// ---------------
// rest of the content goes here
?>
I didn't check for syntax errors, but you get the idea.
Cebjyre : Having it as a single standalone cookie would not really be effective, since the bot would just make up what it wants, but if it were cryptographically embedded in a required cookie, that could work.
-
I would recommend a firewall-based solution. Netfilter/iptables, as most firewalls, allows you to set a limit to the maximum number of new page requests per unit time.
For example, to limit the number of page views dispensed to something human -- say, 6 requests every 30 seconds -- you could issue the following rules:
iptables -N BADGUY
iptables -t filter -I BADGUY -m recent --set --name badguys
iptables -A INPUT -p tcp --dport http -m state --state NEW -m recent --name http --set
iptables -A INPUT -p tcp --dport http -m state --state NEW -m recent --name http --rcheck --seconds 30 --hitcount 6 -j BADGUY
iptables -A INPUT -p tcp --dport http -m state --state NEW -m recent --name http --rcheck --seconds 3 --hitcount 2 -j DROP
Note that this limit would apply to each visitor independently, so one user's misuse of the site wouldn't affect any other visitor.
Hope this helps!
-
You could reduce the load on your server by having the RSS and HTML update at the same time, so there's no incentive for the bots to screenscrape your site. Of course this gives the bots an advantage in buying your gear.
If you only accept payments via credit card (might be the case, might not be, but it shows my line of thinking) only allow a user to buy a BOC once every 10 sales with the same account and/or credit card. It's easy for a script kiddie to get a swarm of IPs, less easy for them to get a whole heap of credit cards together. And as you've said IPs are really hard to ban, while temporary bans on credit cards should be a walk in the park.
You could let everyone know what the limit is, or you could just tell them that because of the high demand and/or bot interest there's throttling implemented on the purchasing while being unspecific about the mechanism.
Each attempt to purchase during the throttling period could trigger an exponential backoff - you buy a BOC, you have to wait for 10 sales to pass before you try again. You try again anyway on the next sale, and now you have to wait 20 sales, then 40, then 80...
This is only really useful if it's really unlikely that a human user would manage to get a BOC twice in less than 10 sales. Tune the number as appropriate.
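A minimal sketch of that backoff, keyed on some stable identifier such as a card fingerprint. The class name, the in-memory dictionary, and the assumption that every allowed attempt ends in a purchase are simplifications for illustration.

```python
from typing import Dict, Tuple


class PurchaseThrottle:
    """Cooldown measured in sales: 10 after a purchase, doubling on each early attempt."""

    def __init__(self, base_sales: int = 10) -> None:
        self.base_sales = base_sales
        # card fingerprint -> (sale number of last purchase, early attempts since then)
        self.state: Dict[str, Tuple[int, int]] = {}

    def attempt(self, card: str, sale_number: int) -> bool:
        if card not in self.state:
            self.state[card] = (sale_number, 0)
            return True
        last_sale, violations = self.state[card]
        required_gap = self.base_sales * 2 ** violations  # 10, 20, 40, 80...
        if sale_number - last_sale >= required_gap:
            self.state[card] = (sale_number, 0)
            return True
        # Too early: refuse, and double the wait for next time.
        self.state[card] = (last_sale, violations + 1)
        return False
```

For example, a card that buys at sale 100 and tries again at sales 101, 115 and 139 is refused each time, and is not allowed through again until sale 180.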
-
Run your servers on Linux with a light-weight web server like NGINX or YAWS, and you'll solve that niggling problem of the servers crashing every time I try to get my woot.
Shawn Miller : Install lunix, problem solved.
-
There are a few solutions you could take, based on the level of complexity you want to get into.
These are all based on IP tracking, which falls apart somewhat under botnets and cloud computing, but should thwart the vast majority of botters. The chances that Joe Random has a cloud of bots at his disposal is far lower than the chance that he's just running a Woot bot he downloaded somewhere so he can get his bag of crap.
Plain Old Throttling
At a very basic, crude level, you could throttle requests per IP per time period. Do some analysis and determine that a legitimate user will access the site no more than X times per hour. Cap requests per IP per hour at that number, and bots will have to drastically reduce their polling frequency, or they'll lock themselves out for the next 58 minutes and be completely blind. That doesn't address the bot problem by itself, but it does reduce load, and increases the chance that legitimate users will have a shot at the item.
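As a rough illustration, a per-IP limiter could look like the sketch below. It uses a sliding window rather than a fixed hourly bucket, which is a design choice of this example, and it keeps its state in one process's memory, which a web farm would have to replace with some shared store. The class and parameter names are made up.

```python
import collections
import time
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per IP within the last `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float = 3600.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits: Dict[str, Deque[float]] = collections.defaultdict(collections.deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and current - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over quota: this IP stays blind until old hits expire
        recent.append(current)
        return True
```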
Adaptive Throttling
A variant on that solution might be to implement a load balancing queue, where the number of requests that you have made recently counts against your position in the queue. That is, if you keep slamming the site, your requests become lower priority. In a high-traffic situation like the bag of crap sales, this would give legitimate users an advantage over the bots in that they would have a higher connection priority, and would be getting pages back more quickly, while the bots continue to wait and wait until traffic dies down enough that their number comes up.
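A toy version of such a queue, assuming requests can be held and reordered before being served (which a real load balancer may or may not allow). `FairQueue` and its fields are hypothetical names.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class FairQueue:
    """Serve requests from quiet clients first; heavy hitters sink in the queue."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._order = itertools.count()  # tie-breaker: first come, first served
        self.recent_hits: Dict[str, int] = {}

    def enqueue(self, ip: str, request: str) -> None:
        hits = self.recent_hits.get(ip, 0)
        self.recent_hits[ip] = hits + 1
        # Lower tuples pop first, so fewer recent hits means higher priority.
        heapq.heappush(self._heap, (hits, next(self._order), ip, request))

    def dequeue(self) -> Tuple[str, str]:
        _, _, ip, request = heapq.heappop(self._heap)
        return ip, request
```

If a bot queues three requests and a human then queues one, the human's request is served second rather than fourth. The hit counts would need to decay over time for "recently" to mean anything; that is left out here.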
End-of-the-line captcha
Third, while you don't want to bother with captchas, a captcha at the very end of the process, right before the transaction is completed, may not be a bad idea. At that point, people have committed to the sale, and are likely to go through with it even with the mild added annoyance. It prevents bots from completing the sale, which means that at a minimum all they can do is hammer your site to try to alert a human about the sale as quickly as possible. That doesn't solve the problem, but it does mean that the humans have a far, far better chance of obtaining sales than the bots do currently. It's not a solution, but it's an improvement.
A combination of the above
Implement basic, generous throttling to stop the most abusive of bots, while taking into account the potential for multiple legitimate users behind a single corporate IP. The cutoff number would be very high - you cited bots hitting your site 10x/sec, which is 36,000 requests/hour (10 x 3,600 seconds), which is obviously far above any legitimate usage, even for the largest corporate networks or shared IPs.
Implement the load balancing queue so that you're penalized for taking up more than your share of server connections and bandwidth. This penalizes people in the shared corporate pools, but it doesn't prevent them from using the site, and their violation should be far less terrible than your botters, so their penalization should be less severe.
Finally, if you have exceeded some threshold for requests-per-hour (which may be far, far, far lower than the "automatically drop the connection" cutoff), then require that the user validate with a captcha.
That way, the users who are legitimately using the site and only have 84 requests per hour, even when they're mega-excited, don't notice a change in the site's speed at all. However, Joe Botter finds himself stuck with a dilemma. He can either:
- Blow out his request quota with his current behavior and not be able to access the site at all, or
- Request just enough to not blow the request quota, which gives him realtime information at lower traffic levels, but causes him to have massive delays between requests during high-traffic times, which severely compromises his ability to complete a sale before inventory is exhausted, or
- Request more than the average user and end up getting stuck behind a captcha, or
- Request no more than the average user, and thus have no advantage over the average user.
Only the abusive users suffer degradation of service, or an increase in complexity. Legitimate users won't notice a single change, except that they have an easier time buying their bags of crap.
Addendum
Throttle requests for unregistered users at rates far below registered users. That way, a bot owner would have to be running a bot via an authenticated account to get past what should be a relatively restrictive throttling rate.
The inventive botters will then register multiple user IDs and use those to achieve their desired query rate; you can combat that by considering any IDs that show from the same IP in a given period to be the same ID, and subject to shared throttling.
That leaves the botter with no recourse but to run a network of bots, with one bot per IP, and a registered Woot account per bot. This is, unfortunately, effectively indistinguishable from a large number of unassociated legitimate users.
You could use this strategy in conjunction with one or more of the above strategies with the goal to produce the overall effect of providing the best service to registered users who do not engage in abusive usage patterns, while progressively penalizing other users, both registered and unregistered, according to their status (anon or registered) and the level of abuse as determined by your traffic metrics.
-
This is always tough, I applaud your desire to avoid using a CAPTCHA. I would suggest first blocking them based on their behavior which you can ascertain via the HTTP requests. Look at the tool known as bad behavior, in the year that I've been using it on several sites it has yet to block a real human being. Most bots don't do a very good job of pretending to be a web browser. I also recommend using the project honey pot API.
Secondly, alter your forms on a random basis, including the labels. This is not designed to fool the bots, this is designed to let you discover their IP addresses / proxies. Something that screws up entries xx times should go on that list.
Finally, if you find yourself in a position where you simply HAVE to use some kind of human verification process, try something like this:
[ image of a pig ] The image above is a: [ ] dog [ ] house [ ] pig
That would not be very annoying to human beings.
In short, there is not 'one' solution to your problem, don't expect to be 100% successful. Set your goal to reduce the annoyance to a very dull roar, you should be able to do it rather quickly.
Peter Morris : This would just result in the bot slamming the site 3 times more often, until its 1 in 3 chance of getting the right answer comes up.
-
Mollom.com
-
My first thought was that you say the bots are scraping your webpage, which would suggest they are only picking up the HTML content. So you could have your order screen verify (from the HTTP logs) that an offer-related graphic was actually loaded by the client, and treat an order with no such request behind it as coming from a bot.
-
Develop a front page component and shopping cart that do not run natively in the browser. If you use something like Flex/Flash or Silverlight, it is much more difficult to scrape, and you have full control over the server communication, and thus can shield the content completely from scripters.
-
This only needs to be a problem if the bot users are paying with invalid credit cards or something. So how about a non-technical solution?
Treat the bot users as normal users as long as their payments are valid and make sure you have enough in stock to satisfy the total demand.
Result: more sales. You're in business to make money, right?
-
To guarantee selling items only to non-scripted humans, could you detect inhumanly quick responses between the item being displayed on the front page and an order being made? This turns the delay tactic on its head, instead of handicapping everyone artificially through a .5 second delay, allow requests as fast as possible and smack bots that are clearly superhuman:)
There is some physical limit to how fast a user can click and make a decision, and by detecting after all the requests have gone through (as opposed to purposely slowing down all interactions), you don't affect the performance of non-scripted humans.
If only using CAPTCHAs some of the time is acceptable, you could increase the delay time to fast-human (as opposed to superhuman) and require a post confirmation CAPTCHA if someone clicks really fast. Akin to how some sites require CAPTCHA confirmation if someone posts multiple posts quickly.
Sadly I don't know of any good ways to stop screen scrapers of your product listings :(
-
I'm just wondering if there might be a simple solution to this.
I assume that the message indicating the crap sale is posted in text and this is the bit of information the scrapers look for.
What if you made the announcement using an image instead? Doing so might pose some design problems but they could be overcome and possibly serve as the impetus for some ingenious creativity.
Issue #1
There would have to be some design space dedicated to an image. (Want to be really tricky? Rotate a local ad through this slot. Of course the image's name would need to be static to avoid giving scrapers a scent. That's one slot that would never have to worry about ad-blindness...)
Issue #2
RSS. I'm not sure if everyone can view images in their feed readers. If enough of your users can, then you could start sending a daily feed update consisting of an image. You could send whatever miscellaneous stuff you wanted on most days and then switch it for your crap sale alert as desired. I don't know... would they just program their bots to hit your site every time a feed item went out?
Other issues? Probably a lot. Maybe this will help with some brainstorming, though.
Take care,
Brian
-
Here are some valid assumptions for you to make:
- Any automated solution can and will be broken.
- Making the site completely require human input (eg CAPTCHA) will greatly increase the difficulty of logging in/checking out/etc.
- You have a limited number of Bandoliers of Cabbage to sell.
- You can track users by session via a client-side cookie.
- You aren't dealing with extremely hardcore criminals here; these are simply technical people who are bending, but not breaking, the law. Successful orders via bots will go to the person's home, and likely not some third-party mail drop.
The solution isn't a technical one. It's a policy one.
- Log all client session ids on your webserver.
- Enact a "limited bots" policy; say, one screen scrape every X seconds, to give people with regular browsers the ability to hit refresh. Any user found to be going over this limit doesn't win the woot.
- Follow this up by sending known bot owners a bunch of Leakfrogs.
-
Here is what I'd do:
- Require all bidders for bag of crap sales to register with the site.
- When you want to start a sale, post "BOC sale starting soon, check your email to see if you are eligible" on your main page.
- Send out invitations to a random selection of the registered players, with a url unique to that particular sale when sale starts.
- Ensure the URL used is different for each sales event.
- Tweak the random selection invitation algorithm to pull down eligibility for frequent winners, based upon credit card used for purchase, PayPal account, or shipping address.
This thwarts the bots, as your main page only shows the pending BOC event. The bots will not have access to the URL without receiving it in email, and have no guarantee they will receive it at all.
If you are concerned about sales impact, you could also incentivize participation by giving away one or two BOC's for each sale. If you don't see enough uptake on an offer in a given time interval, you automatically mail additional registered users, increasing the participant pool in each offer.
Voilà. Level playing field, without tons of heuristics and web traffic analysis. The system can still be gamed by people setting up huge numbers of email accounts, but tweaking participant selection criteria by CC#, PayPal account, or shipping address mitigates this.
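The invitation draw could be sketched roughly like this: a weighted sample without replacement in which each recent win lowers a user's weight. The weighting formula `1 / (1 + wins)` and the function name are assumptions; the sampling trick (sorting by `u ** (1 / weight)`) is the standard Efraimidis-Spirakis method.

```python
import random
from typing import Dict, List, Optional


def pick_invitees(
    recent_wins: Dict[str, int],
    invitations: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick distinct users to invite, favoring those with fewer recent wins."""
    if rng is None:
        rng = random.Random()
    keyed = []
    for user, wins in recent_wins.items():
        weight = 1.0 / (1 + wins)  # assumed penalty for frequent winners
        # Larger weights tend to produce larger keys, so they sort nearer the top.
        keyed.append((rng.random() ** (1.0 / weight), user))
    keyed.sort(reverse=True)
    return [user for _, user in keyed[:invitations]]
```

In practice `recent_wins` would be keyed on whatever identity is hardest to fake (card, PayPal account, shipping address) rather than on the email address alone.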
-
First of all don't try to use technology to defeat technology.
Your issues:
- Usability of the site
- Making the site exciting and fun
- Load on server caused by scripters.
Your Goals:
- Keep the site running at a speed not slowed by bots.
- Sell the item to non-scripting humans.
- Don't hassle the 'normal' users with any tasks to complete to prove they're human.
Goal #1: Keep the site running at a speed not slowed by bots.
This is actually pretty simple. Have someone else host the page. Instead of the front page being hosted on your servers, have Amazon S3 / Akamai host the page. Most of the page is 'static' anyhow. Regenerate the page every 5 minutes or so, so that the more dynamic items get refreshed. (Hell, regenerate it every 1 minute if you want.) But now the bots are not hitting your server - they are hitting Akamai's CDN, which can certainly take the load.
Of course do this for RSS feeds as well. There is no reason why some other service can't take the bandwidth / load hit for you. On a related note, have all images served by Akamai, etc. Why take the hit?
Goal #2: Sell the item to non-scripting humans
I am in agreement with others that say make it so that scripting gives no real advantage. However, scripting is also a sign of a passionate woot customer, so you don't want to be an a*hole either.
So I would say let them buy, but make them pay an inflated amount or (preferably) just slow them down so that others have a chance.
So each time a user hits the site offer the bag of crap at $29.99 and have a timer at a random speed drop or raise the price. Have an image or some other indicator that tells humans if the price will go lower if they are patient.
The user has a "Buy now!" button that they click when they see price/# items being what they want.
Example:
User:
- 0 sec $29.99 (1 item) Image says:"Wait for a lower price!"
- 7 sec $31.99 (1 item) Image says:"Wait for a lower price!"
- 13 sec $27.99 (1 item) Image says:"Bet you can do better!"
- 16 sec $1.99 (0 item) Image says:"You would be nuts to pay us something for nothing!"
- 21 sec $4.99 (two items) Image says:"Thats getting better!"
- 24 sec $4.99 (tres itemos) Image says:"It doesn't get any better than that!"
- 26 sec $8.99 (2 items) Image says:"Bet you can do better!"
repeat....
on a gradually tightening cycle that will lengthen the time the correct "$4.99 (tres itemos)" is displayed
If the bot hits refresh then the cycle restarts. If the user misses and selects the wrong # of items / price, decide if you want to let them buy at that price.
If they "overspend" for example, they pay $24.99 for 3 items and woot was only going to charge them $4.99 for 3 items then include a coupon for $20 off their next woot purchase.
Goal #3: Don't hassle the 'normal' users with any tasks to complete to prove they're human.
You are making a logical fallacy here. You are assuming that any Turing test (http://en.wikipedia.org/wiki/Turing_test ) has to be irritating. This is not true!
Here are some ideas:
- Create a game. The reward for playing the game is a $5 off coupon on the next order.
- Pair up 2 random users and have them chat with each other. Each user is told to ask the other user 2 questions: "What color is your hair?" and "What are you going to do next weekend?" Some users get paired with a woot random sentence generator. Each user is then asked if the other user is a human. If a user says the woot random sentence generator is human then reply "No I am not, and maybe you are from Mars as well. Do you want to try again?"
- Simple flash game that requires the user to maneuver through an obstacle course to get a discount coupon.
- Ask what city they are in. Then reverse geo-code the IP address to see if they are close to being correct.
- Ask silly questions - "Do you think John McCain is a great president?" "Whose picture is on your driver's license?"
Only ask 3 times since all you really want to do is slow down the script kiddies.
-
What about the NoBot Control from the ASP.net AJAX control toolkit?
It does some automated javascript request and timing tricks to prevent bots from accessing the site with NO user interaction.
Sorry if this doesn't meet some requirement, I'll just have to call
tl;dr >D
-
Turn certain parts of the page into images so the bots can't understand them.
For example create small images of the integers 0-9, the dollar sign, and the decimal point. Cache the images on the client's computer when the page loads... then display the price using images chosen via code running server-side. Most human users won't notice the difference and the bots won't know the prices of any items.
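A rough sketch of the server-side part, in Python. The glyph file names, the `/img/` path, and both function names are made up for illustration; nothing here is woot's actual code. The file names are random so that the name itself doesn't give the digit away.

```python
import secrets

PRICE_CHARS = "0123456789$."

def make_glyph_map():
    """Assign each price character a random, meaningless image file name (hypothetical naming)."""
    return {ch: f"{secrets.token_hex(6)}.png" for ch in PRICE_CHARS}

def price_to_img_tags(price, glyph_map, base_url="/img/"):
    """Render a price such as "$4.99" as a run of <img> tags, one per character."""
    tags = []
    for ch in price:
        if ch not in glyph_map:
            raise ValueError(f"no glyph image for character {ch!r}")
        # Empty alt text on purpose: a readable alt would hand the price to a scraper.
        tags.append(f'<img src="{base_url}{glyph_map[ch]}" alt="">')
    return "".join(tags)

glyphs = make_glyph_map()
html = price_to_img_tags("$4.99", glyphs)
```

A bot author could still learn the name-to-digit mapping once, so the names would probably need to change per sale. The empty alt text also hides the price from screen readers, which is a real usability cost.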
-
My Opinion as a longtime WOOTer
I would be happy to have a CAPTCHA on ordering, turned on only for the BOC. I think most wooters would agree. Plus, 99.9% of the time you don't even get to the order screen because it sells out so fast, so hardly anybody would even know!!
If you make the CAPTCHA a really hard math problem, I'll be able to finally explain to my mom the practical benefit of so many years of studying math.
-
I don't see why IP address filtering HAS to be prohibitively expensive. With IIS you can build an ISAPI filter to do this in native code. I am sure apache has similar interfaces. Using the IP address of the client, you can write a simple rate-limiter for HTTP requests that does not depend on a banned list or other such nonsense.
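To show what I mean by a simple rate-limiter, here is a minimal sliding-window sketch in Python rather than an ISAPI filter. The class name and the limits are assumptions for illustration. It keeps its state in one process's memory, so a web farm would need some shared store for the counters.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: reject or delay this request
        recent.append(now)
        return True
```

No banned list is involved: each request is judged only against that IP's own recent history.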
-
So the problem really seems to be: the bots want their "bag 'o crap" because it has a high perceived value at a low perceived price. You sometimes offer this item and the bots lurk, waiting to see if it's available and then they buy the item.
Since it seems like the bot owners are making a profit (or potentially making a profit), the trick is to make this unprofitable for them by encouraging them to buy the crap.
First, always offer the "bag 'o crap".
Second, make sure that crap is usually crap.
Third, rotate the crap frequently.
Simple, no?
You'll need a permanent "why is our crap sometimes crap?" link next to the offer to explain to humans what's going on.
When the bot sees that there's crap and the crap is automatically purchased, the recipient is going to be awfully upset that they've paid $10 for a broken toothpick. And then an empty trash bag. And then some dirt from the bottom of your shoe.
If they buy enough of this crap in a relatively short period of time (and you have large disclaimers all over the place explaining why you're doing this), they're going to lose a fair "bag 'o cash" on your "bag 'o crap". Even human intervention on their part (checking to ensure that the crap isn't crap) can fail if you rotate the crap often enough. Heck, maybe the bots will notice and not buy anything that's been in the rotation for too short a time, but that means the humans will buy the non-crap.
Heck, your regular customers might be so amused that you can turn this into a huge marketing win. Start posting how much of the "crap" crap is being sold. People will come back just to see how hard the bots have been bitten.
Update: I expect that you might get a few calls up front with people complaining. I don't think you can stop that entirely. However, if this kills the bots, you can always stop it and restart it later.
-
- Tarpit. Limiting page views to 1 per second won't bother human users.
- Links via JavaScript. Simple bots don't dig that. As for usability, statistics show that less than 1% of users don't use JS. A hard-core version of the above: links in Flash.
- Parameters stored in the session rather than in the query string. Most bots are stateless.
-
Never thought I'd recommend flash for anything, but what about flash? Let your server send down asynchronous, encrypted content to the flash file signaling if it's deal time or not. As long as the response is the same size deal or no deal, the bot can't tell which it is.
At a more general level, you need to focus on the resources a human plus a browser have that a scripted bot doesn't and take advantage of things that are easy for humans/browsers and hard for bots. Captcha is obviously a simplistic attempt at doing this, but doesn't suit your site as you say. Flash would weed out a ton of bots, leaving only the (slower) ones that drive a real browser. The solution could be much simpler than captcha if it just requires the user to click in the right spot.
Take advantage of humans' massively parallel image processing power!
-
Make scanning the site expensive.
There is no way I know that can keep a bot out of your site. I even know a service, where there are humans that scan sites for you. How would you handle that?
The worst thing for bots is when a site changes. After a while it gets too expensive or too boring to keep the bot running. There might be updates on your site that look like a new product but actually are not. If you update irregularly and unpredictably, things get really hard for the bot.
Banning IPs might be a countermeasure, as long as it is a known IP. The offender needs to use a proxy. The proxies I know work well, but slow you down a lot.
-
My thoughts (I haven't checked all the others, so I don't know if it's novel)
Dealing with swarming:
Convert the front-page matter for each day's stuff to be a flash/flex object.
- Yes, some people will complain, but we're looking for the common case here, not the ideal.
- You should also randomize the name of your flash objects, so they aren't in any predictable pattern of names.
Using Akamai or another CDN, deploy this flash object in advance to the outside world. Akamai produces what appears to be random URLs, so it makes it hard to predict.
- When it is time for a new sale, you just have to change your URL locally to refer to the appropriate object at Akamai, and people will go fetch the flash object from them to discover if the deal is a BoC or not.
End-of-the-day - you now have Akamai handling your swarms of midnight traffic
Dealing with auto-buy
- Each of the flash objects you create can have lots and lots of content hidden inside - images, links, arbitrary ids, including 'bag of crap' in a thousand places. you should be able to obfuscate the flash as well.
- When the flash object "goes live", people will start to attack it. But there are so many false positives that a simple string scan is useless - they'll have to simulate running the flash locally.
- But the flash doesn't write text. It draws lines and shapes. Shapes in different colors, all connected to timers that make them appear and disappear at different times.
- If you've seen the Colbert Report, you know how the intro has hundreds of words describing Colbert. Imagine something like that for your intro, which will always include Bag O Crap.
- Now, imagine that the intro takes an arbitrary amount of time - sometimes a few seconds, sometimes as long as a minute or more (make it funny)
- Meanwhile, "Bag O Crap" is constantly showing up, but again, clearly as part of the intro.
- Finally, the actual deal of the day is revealed, with an active 'shimmer' effect that makes it difficult for any single snapshot of the canvas to reveal the actual product name. This is floating above an animated background that still says 'bag O crap' and is constantly in motion
- again, all of this is handled with lines and shapes, not with text strings
End result - your hacker is forced to take lots of image snapshots of the deal, figure out how to separate all the false positives and identify the actual deal. Meanwhile, humans just look at it, and between eye fatigue and our ability to fill in gaps in the text, we can read the deal as is.
This won't work forever, but it would work for a while.
Another idea is to simply restrict people from buying BoCs unless they've bought something before with that account, and to never let them buy a BoC again.
-
I agree with the poster above who said about sometimes selling really 'crap' bags of crap.
You appear to have come up with a business model which is severely limited by the technology through which you are trying to deliver it. Yet like most tech-minded individuals (not a criticism; after all, that is what this site is for) you are trying to come up with a technical solution. BUT THIS IS A BUSINESS PROBLEM. This is being caused by a failure in the technology, but that does not mean that technology is the answer. And almost all solutions that anyone comes up with (and there will be many options) will in the end be bypassed by those determined to 'auto-buy' (for want of a better short description) your 'bags of crap'.
IMHO you are asking the wrong people the wrong question and you are going to waste a lot of time and resource on the wrong solution.
-
Identify bots via IP or a suite of other mechanisms.
Always serve those identified as bots the normal front page.
Real people falsely identified as bots will not get the specials, but they won't notice anyway.
Bot owners won't realize you've identified them, so they will stop adapting their scripts.
-
My solution is a combination of marketing changes and technology changes.
Currently the technical side of the selling portion of bag of crap promotions is handled as a normal woot sale. The sale starts, people race to buy, all items are sold. The same statistical charts used for daily sales are used for bag of crap sales.
There are several market goals involved:
- Get customers to visit the site once every day (impulse purchasing). The possibility of seeing a bag of crap sale is the reason/reward.
- Network/viral/gossipy effect: a customer who sees that a bag of crap sale is on will IM/email/telephone their friends.
- There is also what I'd call general "good will". Woot is a really cool place because it occasionally rewards its customers with amazing sales (a bag of crap that included a flat panel TV)... AND it's done in a fair "first come, first served" manner.
The first 2 seem to be the most important. The sheer number of visitors has an effect on how fast normal deals sell (or sell out). New customers have traditionally been attracted pretty much by word of mouth, and having customers sending their friends to woot.com is a win.
So... my solution is to change the promotion delivery into more of a lottery.
Occasionally users can do something fun to see if they are eligible for a bag of crap. The something fun could be a silly flash game along the lines of "punch the monkey" or Orbitz mini-putt, baseball, hockey. The goal here is a game that a bot can't script, so considerable care will be needed. The goal is also not to award a bag of crap only to game winners... but to all game players.
The technical core of the game is that at the end of the game a request is made to a server that does an "instant lottery" to determine if the user has won a bag of crap sale opportunity. The server request will need to include something calculated by the game itself (roughly speaking "hash cash"... a complex, CPU-cycle-consuming calculation, and hopefully one that is difficult to reproduce). This is to prevent a bot from repeatedly entering the lottery just by querying the lottery server/service.
The game itself can change over time. You can do special event games for Halloween, Christmas, Valentine's Day, Easter, etc. There's lots of room for fun marketing ideas that can match woot's "wootiness".
If the user wins they can purchase N bags of crap (in a time-limited window)... but they can also send N friends a time-limited invitation to purchase a bag of crap (good for 24 hours). This provides a super strong network effect... customers will definitely tell their friends. Or you could also do it as "buy 1 give 1"... let customers buy up to a total of N but force every second one to be shipped to a friend. The key here is to make the network/gossip effect a full-fledged part... help the customer tell the world about the wonderfulness of woot.
The promotional material around the bag of crap sales concept will also need to be revamped. The graphs of how quickly a bag of crap sold out are no longer relevant. Use something along the lines of how frequently through the month people had the opportunity to purchase, or how many people told their friends. The materials should subtly emphasize the point that a daily woot visit is a good idea.
You can also promote the heck out of why bag of crap sales are changing. Especially that you hired the best bag of crap consultants available for free.
-
Honestly, I think your best solution is to make items during a Woot-Off only be visible to logged in users, and limit each logged-in user to one home page refresh every 500ms or so. (Or possibly make only a picture of the item be visible to unauthenticated users during a Woot-Off, and make sure you don't always use the same picture for Random Crap.) I think Woot users would be willing to accept this if you sell it as a measure to help them get their Bowls of Creaminess, and you can also point out that it'll help them check out quicker. Anything else--even using captchas--is subject to your typical arms race.
-
Build a better bot
The market place is telling you something. They want to get that bag o crap. So rather than fight the scripts (RIAA v file-sharing anyone?) Build a better bot.
Offer everyone an installed app that is just as good or better than anything a script kiddie could put together. The user installs your branded app, and every time the bag of crap is offered, the app will automatically try to buy it. If the current b-o-c is missed, the app has a "ticket" to give it a better chance for the next b-o-c sale. So if a user rolls their own script, they don't get the "ticket" in line for the next b-o-c sale, while users of the official app do.
Between b-o-c sales the app can show the current item for sale. Hell, make it so that the user can tell the woot app to look for "memory sticks"
Who will build their own script when the official woot b-o-c+ script app is just as good, if not better?
Additionally, woot gets another way of connecting to the customer.
Your customers are telling you what they want.
-
Give the user a choice between the original price and a much higher price. You will have to find some way to associate the buttons with their respective prices - colour, position, perhaps "emotional connotation" of the button - something difficult to programmatically determine but which only needs the user to connect a button to a price. Easy, intuitive and hassle free for the user, difficult and, more importantly, risky for the scripter - especially if you vary the method of association.
-
I'm in agreement with OP here - no captchas please - it's not a very woot way of doing things.
Firstly, set a few bot traps. I'd mention BOC more often on the home page to bait the bots into looking, since bots aren't intelligent. Word it differently each time, e.g. "BOC complaints up!", so that bots just scanning for keywords get trapped.
However, I think the real issue here is twofold, firstly the performance issues that you have need to be addressed, today it's bots causing a problem, but it indicates to me that there is a performance issue to be addressed.
Secondly it's a business opportunity to shift some real crap at a profit. So I'd keep with the overall woot style and state "we check for bots. If we think you are a bot you will get a box of botcrap."
The bot checking would be done offline sometime after the sale has been made, using bot traps, IP numbers, cookies, sessions, browser strings etc. Do some serious analysis with the data that you've got of purchasers to decide who gets botcrap. If you decide to ship botcrap - then you can free up some normal crap to sell to someone else.
-
Some ideas:
Simple: don't name it "Random Crap." Change the name of the item every time so that the bots will have a harder time identifying it. They may still look for the $1.00 items, in which case I suggest occasionally selling $1 sticks of gum for a few minutes. The $5 shipping should make it worth your while.
Harder: don't make the users do anything extra - make the users' computers do something extra. Write a JavaScript function that performs an intensive calculation taking a good amount of processing power - say, the ten-millionth prime number - and have the user's computer calculate that value and pass it back before you accept the order (perhaps even to create the "place order" URL). Change the function for every BoC so that bots can't pre-calculate and cache results (but so that you can). The calculation overhead might just slow down the bots enough to keep them off your backs - if nothing else, it would slow the hits on your servers so that they could breathe. You could also vary the depth of the calculation - ten-millionth prime versus hundred-millionth - at random so that the ordering process is no longer strictly first-come, first served, and to avoid penalizing customers with slower computers.
- E
-
If you are willing to make javascript mandatory, you can use a hashcash scheme to require, for example, ~30 seconds worth of client-side computation for each request. (Of course that might be 5 min on an iPhone or 1 second on a botnet of 30 computers: a significant drawback.)
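For anyone who hasn't seen hashcash, here is a minimal hashcash-style sketch in Python. The real client side would be JavaScript; Python is used here only to show the idea. The use of SHA-256, the `challenge:nonce` format, and the function names are my assumptions, not the original Hashcash stamp format. The server hands out a fresh challenge, the client burns CPU finding a nonce, and the server verifies it with a single hash.

```python
import hashlib
import secrets

def leading_zero_bits(digest):
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def verify(challenge, nonce, difficulty_bits):
    """Cheap server-side check: one hash per submitted answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits

def solve(challenge, difficulty_bits):
    """Expensive client-side search: about 2**difficulty_bits hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce

challenge = secrets.token_hex(16)  # the server would issue a new one per request
nonce = solve(challenge, 12)       # 12 bits keeps the demo fast; real use would go higher
```

Each extra bit of difficulty doubles the expected client work while the server's cost stays at one hash. That asymmetry is the whole point, and it is also why the iPhone-versus-botnet gap above is hard to avoid.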
You can also make scraping more difficult by generating the page with (obfuscated) javascript or (gag) flash.
You can also troll for bots with invisible (via CSS and javascript) random crap links.
You can detect 'bot-like' IP addresses (by rate and by visits to honeypot links) and redirect them to a special server (e.g. one with extra CC verification such as 'verified by visa' -- or merely one with a captcha.)
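A small sketch of the honeypot half of that, in Python. The class name, the trap path, and the "captcha"/"normal" backend labels are hypothetical. Rate-based flagging would feed the same set of flagged IPs.

```python
class BotFlagger:
    """Flag any IP that requests a link no human can see, then route it elsewhere."""

    def __init__(self, honeypot_paths):
        self.honeypot_paths = set(honeypot_paths)
        self.flagged = set()

    def observe(self, ip, path):
        # Honeypot links are hidden via CSS/JS, so only a scraper follows them.
        if path in self.honeypot_paths:
            self.flagged.add(ip)

    def backend_for(self, ip):
        """Pick which server pool should handle this client."""
        return "captcha" if ip in self.flagged else "normal"

flagger = BotFlagger({"/random-crap-special"})
flagger.observe("203.0.113.7", "/random-crap-special")
```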
But really, it's an arms race. :) And one you may very well have to eventually escalate even beyond captchas.
Which brings me to: Why not change from a first-come, first-serve model to a lottery model where bots don't have such a large advantage over real shoppers?
-
Okay, I have a couple of questions more than an answer because I have no experience with the technology to know if it could/would work or would help.
With the following goals:
1. Sell the item to non-scripting humans.
2. Keep the site running at a speed not slowed by bots.
3. Don't hassle the 'normal' users with any tasks to complete to prove they're human.
My questions are:
- Would a Flash application, or Java applet, or Silverlight or anything similar reduce the ease of screen scraping enough to decrease the impact of the bots?
I'm curious if these are as wide open to external manipulation as typical javascript/html. While it is not standard for web development and may not be 'good' from an SEO point of view, it sounds like search visibility isn't your problem if you have millions of users. I believe that any of these could still offer a very good looking interface so your humans wouldn't be put off by the design.
- Could you put all of your information in an image? I've never seen the part of woot you are referring to, but what I'm suggesting is to place any text that a human needs to know in a human-friendly image instead of a bot-friendly textbox.
Oh, and to second something alluded to in some of the other responses. Don't miss the big opportunity you have: You have LOTS of Demand from Bots, and those people with Bots really buy right? Do you still want their money? (Cause if not, I'll take it.)
Do these people with the Bots have any alternative to buy from you? Separate out your bags of crap.
Have a woot subsite built for bots, geared towards bots and let the scripters have lots of fun AND pay you money for it. Sell them crap and let them challenge themselves against other scripters. It's a whole separate market available to you.
If they have an alternative where they can win something AND get bragging rights about it, they might be a little less inclined to beat up on the little old human.
-
Forgive me if this answer was already submitted. There are a lot of answers to try to read & understand all of them.
Why couldn't you just change your purchasing API every once in a while? Wouldn't that be completely transparent to the human users and pretty much kill most of the bot purchasers?
One implementation would be to change the names of the fields that the user has to fill in and submit on the page after hitting the "I Want One" button. How many times a year do you actually sell BOC? Not that often. So this would not be a huge programming burden to have a different purchasing API programmed, tested and ready for use every time a BOC goes on sale.
Just make sure the bots that are using the old and incorrect API don't bring your server down. Maybe host the BOC purchase API on a different server each time too. That way the bots can bring down a server that is not actually being used by us human BOC purchasers.
-
If I understand right, your biggest problem is with the screen scraping, not the automated purchase itself.
If so, your most effective step would be to defeat screen scraping by randomly encoding the page so that it looks the same (kind of) but is always different at code level. (Use hex codes, java encoding, pictures, change surrounding code structure...)
That would force them to constantly rewrite their scraping code and therefore make it that much more expensive for them to buy your "crap" automatically. If they can manage. They would probably continue to hit your website for a while until they realize they can't gain anything from it and drop it.
The downside of confusing the hell out of bots is that it will also confuse the hell out of search engine crawlers.
-
use concurrent connection limiting per IP address via either iptables on the server (if it is Linux based) or use a dedicated "router"
-
Just a side-remark: it seems to me that the problem is that your users' expected behaviour is very similar to a bot's (they come in big waves, unauthenticated, and click every button :)), so the CAPTCHA might be the only Turing test able to discern it :)).
-
You should have some record of the users who have purchased BOC most often, so why not just ban those accounts or something? Sure, legit users will be banned in the process, but you are a business providing a product, and if you are being abused by a group of users, you have the right to refuse service to them. You have a lot of info on your users, including PayPal and bank accounts; you could ban those accounts, forcing the bot users to get new ones. Certainly I could come up with a script to buy BOC all the time, or just download one from the net, but I have better morals than that. Never actually having successfully purchased a BOC, I know the frustration of legit users who would like to receive a BOC in the hopes of getting a great deal.
Perhaps instead of offering a BOC as an individual item every once in a while, you could just give it to random users every day. When they receive an item, they get a little note and an additional item saying they also received a BOC. Then the only way someone could get a BOC is if they legitimately purchased something that only an actual human would have wanted. There would be nothing better than purchasing a coffee maker or something and also receiving a 42" TV in addition to your legitimate purchase. I think the majority of script kiddies would no longer be interested in your site if, in order to get a BOC, they also had to commit to a purchase of more than 10 dollars.
-
Upfront caveats:
I'm not script-literate; I haven't read many of the other comments here.
I stumbled on this from the Woot description this morning. I thought a few comments from a moderate user of the woot sites (and two-time manual purchaser of BOCs) might be helpful.
Woot is in a unique position where it is both a commerce site and a destination with loyal users, and I understand the perceived delicacy of that balance. But personally I feel your concern about the "negative user impact" of a Crap-CAPTCHA ("CRAPCHA" - somehow I doubt I'm the first to make that gag) is way overstated. As a user I'd be happy to prove I'm human. And I trust Woot to make the process fun and interesting, integrating it into the overall experience.
Will this lead to the "arms race" posited? I dunno, but it can only help. If, say, key information to purchase is included in the product image or implied in the product description (in a different way each time), about the best a script could do would be to open a purchase page on detection of the C-word. Actually, I think this is fine: you are still required to be on-line and first-come-first-served still applies -- Wootalyzer and similar tools just increase awareness rather than automating purchase while I sleep or work.
Good luck figuring this out, and keep up the good work.
JGM
-
How about selling RSA keys to each user :) Hey, if they can do it for WoW, you guys should be able to do it.
I expect a BoC for my answer ;)
-
Why not make the front page just an image-mapped graphic (all one picture with no labels, tags, etc)? Easy for a human to read and understand on pretty much any device, but impossible for a bot to interrogate. In essence make the whole front page a captcha.
-
You will make enough on the lights today to pay for the CAPTCHA program from Cisco!! We are all used to them from buying concert tickets and other things.. It only seems fair. The way it is being done today is upsetting some and raising questions about a lottery or sweeps. I am sure you checked into that before you tried but it is not really a fun way to buy BOCs... It takes all the excitement out!
Getting the BOC first, or a great product, just by being on the site draws people to Woot. If there is no reason to hang around and buy tons of stuff you don't need while waiting for the random BOC to come up, sales will drop off. The CAPTCHA may be the only way to defeat these people and still keep the excitement of Woot.
I was one of the first to get it to order a BOC last time and my first order was taken dumped with the million shipping and the second went through but was taken out of my account later. I was upset. I left Woot and have not purchased items like I did in the past on other days. I was willing to try it again, this way, today. I doubt I will in the future without a CAPTCHA for the fun stuff.
There are many sites trying to be like Woot. Of course they are not up to your level. I find myself reading a product description, not because I want the product, but I check in even for a laugh. I would hate to see someone come in with a fairer program and take away most of your business.
Just my opinion. I know almost nothing about bots and computers since I am a nurse.. But my vote is to upgrade to the higher level... The guys with the bots would just have to get in line with the rest of us and that is the way it should be:) Lori
-
As for CAPTCHAing everyone, why not use the Google solution of only requiring CAPTCHAs from IPs you suspect of being bots, or even just users that hammer the site? I'm sure asking someone for a CAPTCHA when they purchase isn't so bad if they've been hammering the site anyway; it's just about the same as staying up and hitting F5 repeatedly. That, or maybe require a periodic CAPTCHA when hammering, say every hundred (maybe fewer?) or so refreshes, to stop alarm-bots from working. You need some sort of CAPTCHA to prevent botting, but you also need to account for the fact that your real users will act like bots.
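The periodic version could be as small as a per-client counter. A sketch in Python, where the class name and the default threshold of 100 are assumptions for illustration:

```python
from collections import Counter

class RefreshCaptchaGate:
    """Challenge a client with a CAPTCHA on every Nth refresh it makes."""

    def __init__(self, every_n=100):
        if every_n < 1:
            raise ValueError("every_n must be at least 1")
        self.every_n = every_n
        self.refreshes = Counter()  # client id (IP or account) -> refresh count

    def needs_captcha(self, client_id):
        """Record one refresh and report whether this one should be challenged."""
        self.refreshes[client_id] += 1
        return self.refreshes[client_id] % self.every_n == 0

gate = RefreshCaptchaGate(every_n=3)
demo = [gate.needs_captcha("198.51.100.9") for _ in range(6)]
```

A casual visitor never reaches the threshold and so never sees a CAPTCHA; only the F5-hammerers and the bots do.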
-
As a long time (4 year) user of Woot.com and purchaser of a few bags of crap, amongst the many other items now taking up space in my garage, it seems that the solution should be part of the overall Woot theme.
Use captcha, but in a humorous vein. Much like the $1,000,000 promotion, make a game out of identifying yourself as a person. This has, in the past delayed the "sell out" of the BOC for a reasonable amount of time, while people, like myself, scramble to figure out the fairly simple but humorous puzzle to enter a coupon code.
Also, while people complain endlessly about the server errors, they don't stop coming back. Part of the thrill of a BOC in my opinion is the fact there are a gazillion other people trying to get one. If the servers throw an error, or a funky page, it's a sign that I'm somewhere in a group of way too many people trying to get one of 1500 products.
If you put as much creativity into building the puzzle, and it is original enough, it will delay the bots long enough to give everyone else a chance. Incorporating a random word that's captured as a code, putting an interim page between the "I Want One" and the purchase page, that requires some uniquely human interaction, you've stopped the bots there, until they figure out what needs to happen.
- You haven't implemented a boring, and sometimes painfully difficult to read, captcha.
- You've made the process more fun.
- You've reduced the load on the actual secure purchase server.
- You'll train the users that they will need to "DO" something to get a BOC.
- You'll have stopped the bots at the interim page, delaying their purchases until most people have at least had a chance to try and figure out the funny, but not terribly difficult, puzzle.
- Since being random is what a BOC is all about, a random and changing puzzle/task would fit in simply with the whole pitch of a BOC.
As you experiment, the technology behind the interim page can become more advanced, with random information that can be captured for use in the purchase page. Since
I have purchased, without the aid of bots, or any scripts other than wootalyzer, which I feel is an acceptable aid, 7 BOC's since 5/31/05. The best one, which I didn't get, was the Please Please Me BOC. The B&D batteries were also fun, but I'm guessing didn't stump the bots, only frustrated the regular users.
Sometimes the best solution for a technology issue, isn't more technology.
-
Two solutions, one high-tech, one low-tech.
First the high-tech: The BOC offerings sell out in seconds because bots get many of them in the first few milliseconds. So instead of trying to defeat the bots, sell them what they are scanning for: a bag of crap. Worthless crap, of course: bent paper clips and defiled photos of Rosie O'Donnell. Then have built-in random delays on the server for a few seconds at a time. As the sale continues, the actual value of the product sold will increase while the sell price does not. That way the first buyers (bots in the first few milliseconds) will get something worth much less than what they paid (brown onion cakes?), the next buyers (slower bots or faster humans) will get something unspectacular but worth the purchase price (bought on consignment?), and the last buyers (almost all humans) will get something worth more than the purchase price (break out champagne?). That flat-screen TV might be in the very last BOC purchased.
Anyone that waits too long will miss out, but at the same time anyone who buys too quickly will get hosed. The trick is to wait for some amount of time...but not too much. There's some luck involved, which is as it should be.
The low-tech solution would be to change up the name of the BOC to something humans can interpret but bots can't. Wineskin of excrement? Sack containing smelliness? Topologically flat surface adjacent to assorted goods? Never use the same name twice, use marginally different pictures, and explain in the product description what is actually being sold.
-
A potential solution to your particular problem (and not the general one) would be to require users to be signed in if they want to see the 'crap'. Only display the crap prizes to users that happen to be logged in. All other items can remain viewable to non-logged in users as they always have been. Then your loyal users are given first priority to the crap.
You'd obviously have to notify your users of this, perhaps with a notification that this is being done to increase the chances of real users finding the crap.
If your specific problem is bots harvesting for one particular type of item, then take the least restrictive alternative and only defend against that particular attack. This option would then avoid captchas and the usability hit that you're concerned about.
If the bots log in and start spamming, you could force their log out and lock the account.
If they're only there to get the bag o' crap, they will leave fairly quickly and your page won't be taking the massive hits. Forget the highly technical solutions.
-
2 things:
server layer solution: mod_evasive (if you use apache)
http://www.zdziarski.com/projects/mod_evasive/
front layer solution: reverse captcha, or other non intrusive captcha
-
What if you randomized or encrypted the form names and IDs, randomized the order of the form fields, and made the form labels a random captcha image? That'd make a script attack a lot harder :-D
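A minimal sketch of the field-name part in Python, assuming the server can keep a per-session mapping between the random names it rendered and the real fields. The field names and both helper functions are hypothetical:

```python
import random
import secrets

LOGICAL_FIELDS = ["name", "address", "card_number"]  # hypothetical field names


def build_form(logical_fields):
    """Return (rendered_order, mapping); mapping is random name -> logical name."""
    mapping = {secrets.token_hex(8): field for field in logical_fields}
    rendered_order = list(mapping)
    random.SystemRandom().shuffle(rendered_order)  # new field order per render
    return rendered_order, mapping


def decode_submission(posted: dict, mapping: dict) -> dict:
    """Translate posted random names back to logical names; drop unknown keys."""
    return {mapping[key]: value for key, value in posted.items() if key in mapping}
```

The mapping would live in the user's session, so a script that posts hard-coded field names gets nothing through.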
-
Make the whole bloody page a CAPTCHA!
Sorta like Sesame Street... eight of these things don't belong here... Put 9 items, 9 HTML forms, 9 I WANT ONE buttons on the screen.
(9's just the number for the day... pick whatever number you want to make the layout still look good. 12 perhaps. Maybe customize it some for the resolution of the loading browser...) And scramble them for each person.
Make sure the BOC has to be "seen" to know which one it is... of course this means the other 8 have to be "seen only" also, to know they are NOT the item to buy.
Make sure you only use crazy-ass numbers to reference everything behind the scenes in the page's source. Fine, so the bot sees it's BOC time... but it'll be a wild guess to pick the right HTML form to submit back for processing.
-
There is probably not a magic silver bullet that will take care of bots, but a combination of these suggestions may help deter them, and reduce them to a more manageable number.
Please let me know if you need any clarification on any of these suggestions:
- Any images that depict the item should either always have the same image name (such as "current_item.jpg") or have a random name that changes for each request. The server should know what the current item is and will deliver the appropriate image. This image should also have a random amount of padding to keep bots from comparing image sizes. (Possibly changing a watermark of some sort to deter more sophisticated bots).
- Remove the ALT text from these images, or replace it with generic alt text (such as "Current item image would be here"). This text is usually redundant information that can be found elsewhere on the page.
- The description could change each time a Bag of Crap comes up. It could rotate (randomly) between a number of different names: "Random Crap", "BoC", "Crappy Crap", etc...
- Woot could also offer more items at the "Random Crap" price, or have the price be a random amount between $0.95 and $1.05 (only change price once for each time the Crap comes up, not for each user, for fairness)
- The Price, Description, and other areas that differentiate a BoC from other Woots could be images instead of text.
- These fields could also be Java (not JavaScript) or Flash. While dependent on a third-party plug-in, it would make it more difficult for the bots to scrape your site in a useful manner.
- Using a combination of Images, Java, Flash, and maybe other technologies would be another way to make it more difficult for the bots. This would be a little more difficult to manage, as administrators would have to know many different platforms.
- There are other ways to obfuscate this information. Using a combination of client-side scripting (JavaScript, etc.) and server-side obfuscation (random image names) would be the most likely way to do it without affecting the user experience. Adding some obfuscating Java and/or Flash, or similar, would make it more difficult for bots, though it might have a minimal impact on some users.
- Combine some of these tactics with some that were mentioned above: if a page is reloaded more than x times per minute, then change the image name (if you had a static image name suggested above), or give them a two minute old cached page.
- There are some very sophisticated things you could do on the back end with user behavior tracking that might not take too much processing. You could off-load that work to a dedicated server to minimize the performance impact. Take some data from the request and send it to a dedicated server that can process that data. If it finds a suspected bot, based on its behavior, it can send a hook to another server (front-end routing firewall, server, router, etc. OR back-end web or content server) to add some additional security for these users. Maybe add Java applets for these users, or require additional information from the user (do not pre-fill all fields in the order page, making a different field empty each time randomly, etc.).
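The "reloaded more than x times per minute" suggestion above could be sketched like this in Python. The class name, the default limit of 30, and the choice of a client key (IP, session, etc.) are all assumptions. The counters are held in one process's memory, so on a web farm they would have to live in some shared store instead:

```python
from collections import defaultdict, deque


class ReloadThrottle:
    """Sliding-window counter of page loads per client (hypothetical helper)."""

    def __init__(self, max_per_minute: int = 30, window_seconds: float = 60.0):
        self.max_per_minute = max_per_minute
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of recent loads

    def should_serve_stale(self, client_id: str, now: float) -> bool:
        """Record a load at time `now`; True means serve the old cached page."""
        hits = self._hits[client_id]
        hits.append(now)
        # Forget loads that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_per_minute
```

A caller would pass `time.time()` as `now` and, when the result is True, hand back the two-minute-old page (or the renamed image) instead of the live one.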
-
Why don't you just change the name and picture of the BOC every time you offer it? It would become part of the fun of wooting to see the latest iteration of the BOC.
-
Thanks this is really helpful
-
Create a simple ip firewall rule that blacklists the IP-address if you detect more than a max. number of requests coming in per second.
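In practice this would be a firewall rule, but the logic can be sketched in Python. The class name and the default of 10 requests per second are assumptions, and the block is permanent here only to keep the example short:

```python
class IpBlacklist:
    """Blocks an IP once it exceeds a per-second request limit (hypothetical)."""

    def __init__(self, max_per_second: int = 10):
        self.max_per_second = max_per_second
        self._counts = {}     # ip -> (whole-second bucket, requests in that bucket)
        self.blocked = set()  # IPs that have tripped the limit

    def allow(self, ip: str, now: float) -> bool:
        """Record a request from `ip` at time `now`; False means drop it."""
        if ip in self.blocked:
            return False
        bucket = int(now)
        last_bucket, count = self._counts.get(ip, (bucket, 0))
        if last_bucket != bucket:
            count = 0  # a new second has started, so reset the counter
        count += 1
        self._counts[ip] = (bucket, count)
        if count > self.max_per_second:
            self.blocked.add(ip)
            return False
        return True
```

Many real users can share one IP address (offices, schools, mobile carriers), so the limit would need to be generous and the block should probably expire.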
-
You are making this way too hard. I will probably kick myself since I just won a BOC from the site today with a bot site, but just put the RANDOM CRAP text in captchas on the site's main page. The bots all look for the text "RANDOM CRAP". So you basically just avoid triggering them in the first place. Anyone looking with their eyes will see that it says "Random Crap".
-
A rather simple solution is to track the time difference between rendering the form and getting the response: bots usually have either extremely short response times of a few milliseconds, which no human could manage, or extremely long response times of several hours.
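As an independent illustration (plain Python, standard library only), the render time can be put into the form as a signed hidden field and checked on submit. The secret key, the 2-second minimum, and the 1-hour maximum are placeholder assumptions:

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MIN_SECONDS = 2.0     # assumed: faster than this is treated as a bot
MAX_SECONDS = 3600.0  # assumed: slower than this is treated as stale


def _sign(stamp: str) -> str:
    return hmac.new(SECRET_KEY, stamp.encode(), hashlib.sha256).hexdigest()


def issue_token(rendered_at: float) -> str:
    """Value for a hidden form field: render timestamp plus its signature."""
    stamp = f"{rendered_at:.3f}"
    return f"{stamp}:{_sign(stamp)}"


def is_plausibly_human(token: str, submitted_at: float) -> bool:
    """True if the token is untampered and the elapsed time is in range."""
    stamp, sep, sig = token.partition(":")
    if not sep:
        return False
    if not hmac.compare_digest(sig.encode(), _sign(stamp).encode()):
        return False
    elapsed = submitted_at - float(stamp)
    return MIN_SECONDS <= elapsed <= MAX_SECONDS
```

The signature stops a bot from simply back-dating the timestamp. A bot can still wait out the minimum delay, so this only slows the fastest ones.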
There's a django snippet doing it, along with a more detailed description:
-
You know, if you published your RSS feed using pubsubhubbub, people wouldn't have to hit your web page over and over again to see the next thing in the Woot-off, they'd just wait for it to show up on their Google Reader.