[ACCEPTED] Stopping scripters from slamming your website
How about implementing something like SO does with the CAPTCHAs?
If you're using the site normally, you'll probably never see one. If you happen to reload the same page too often, post successive comments too quickly, or something else that triggers an alarm, make them prove they're human. In your case, this would probably be constant reloads of the same page, following every link on a page quickly, or filling in an order form too fast to be human.
If they fail the check x times in a row (say, 2 or 3), give that IP a timeout or other such measure. Then at the end of the timeout, dump them back to the check again.
Since you have unregistered users accessing the site, you do have only IPs to go on. You can issue sessions to each browser and track that way if you wish. And, of course, throw up a human-check if too many sessions are being (re-)created in succession (in case a bot keeps deleting the cookie).
As far as catching too many innocents goes, you can put up a disclaimer on the human-check page: "This page may also appear if too many anonymous users are viewing our site from the same location. We encourage you to register or log in to avoid this." (Adjust the wording appropriately.)
Besides, what are the odds that X people are loading the same page(s) at the same time from one IP? If they're high, maybe you need a different trigger mechanism for your bot alarm.
Edit: Another option is, if they fail too many times and you're confident about the product's demand, to block them and make them personally CALL you to remove the block.
Having people call does seem like an asinine measure, but it makes sure there's a human somewhere behind the computer. The key is to have the block only be in place for a condition which should almost never happen unless it's a bot (e.g. failing the check multiple times in a row). Then it FORCES human interaction - picking up the phone.
In response to the comment about having them call me, there's obviously a tradeoff here. Are you worried enough about ensuring your users are human to accept a couple of phone calls when they go on sale? If I were so concerned about a product getting to human users, I'd have to make this decision, perhaps sacrificing a (small) bit of my time in the process.
Since it seems like you're determined not to let bots get the upper hand/slam your site, I believe the phone may be a good option. Since I don't make a profit off your product, I have no interest in receiving these calls. Were you to share some of that profit, however, I might become interested. As this is your product, you have to decide how much you care and implement accordingly.
The other ways of releasing the block just aren't as effective: a timeout (they'd get to slam your site again afterward, rinse-repeat), a long timeout (if it was really a human trying to buy your product, they'd be SOL and punished for failing the check), email (easily done by bots), fax (same), or snail mail (takes too long).
You could, of course, instead have the timeout period increase per IP each time they get a timeout. Just make sure you're not punishing true humans inadvertently.
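In code, the escalating timeout might look something like this minimal in-memory sketch (all names and thresholds are made up, and a real site would keep this state in a shared store rather than module-level dicts):

    import time

    BASE_TIMEOUT = 60    # first timeout: 1 minute (tune to taste)
    MAX_FAILURES = 3     # the "2 or 3" failed checks in a row

    failures = {}        # ip -> consecutive failed human-checks
    timeouts = {}        # ip -> (blocked_until, prior_timeout_count)

    def record_check(ip, passed):
        """Call after each human-check; returns the timeout in seconds, or 0."""
        if passed:
            failures[ip] = 0
            return 0
        failures[ip] = failures.get(ip, 0) + 1
        if failures[ip] >= MAX_FAILURES:
            _, strikes = timeouts.get(ip, (0, 0))
            strikes += 1
            duration = BASE_TIMEOUT * 2 ** (strikes - 1)  # escalates per prior timeout
            timeouts[ip] = (time.time() + duration, strikes)
            failures[ip] = 0    # after the timeout they go back to the check
            return duration
        return 0

    def is_blocked(ip):
        blocked_until, _ = timeouts.get(ip, (0, 0))
        return time.time() < blocked_until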
You need to figure out a way to make the bots buy stuff that is massively overpriced: 12mm wingnut: $20. See how many bots snap it up before the script-writers decide you're gaming them.
Use the profits to buy more servers and pay for bandwidth.
My solution would be to make screen-scraping worthless by putting in a roughly 10-minute delay for 'bots and scripts.
Here's how I'd do it:
- Log and identify any repeat hitters.
You don't need to log every IP address on every hit. Only track one out of every 20 hits or so. A repeat offender will still show up in randomized occasional tracking.
Keep a cache of your page from about 10 minutes earlier.
When a repeat-hitter/bot hits your site, give them the 10-minute old cached page.
They won't immediately know they're getting an old site. They'll be able to scrape it, and everything, but they won't win any races anymore, because "real people" will have a 10-minute head start.
Benefits:
- No hassle or problems for users (like CAPTCHAs).
- Implemented fully on server-side. (no reliance on Javascript/Flash)
- Serving up an older, cached page should be less performance intensive than a live page. You may actually decrease the load on your servers this way!
Drawbacks:
- Requires tracking some IP addresses
- Requires keeping and maintaining a cache of older pages.
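If it helps, here's a rough sketch of the sampled tracking plus stale-cache idea (function names and the repeat-hitter threshold are hypothetical; the real thing would live in your front-end proxy or framework):

    import random
    import time

    CACHE_AGE = 600       # serve pages roughly this many seconds stale to bots
    SAMPLE_RATE = 20      # log roughly 1 in 20 hits
    REPEAT_THRESHOLD = 5  # sampled hits before an IP counts as a repeat hitter

    seen_ips = {}         # ip -> sampled hit count
    page_cache = []       # list of (rendered_at, html), oldest first

    def render_live_page():
        return "<html>...live page...</html>"   # placeholder for the real render

    def get_stale_page():
        """Return the newest cached page that is at least CACHE_AGE seconds old."""
        now = time.time()
        while len(page_cache) > 1 and now - page_cache[1][0] >= CACHE_AGE:
            page_cache.pop(0)
        return page_cache[0][1] if page_cache else render_live_page()

    def handle_hit(ip):
        if random.randrange(SAMPLE_RATE) == 0:    # randomized occasional tracking
            seen_ips[ip] = seen_ips.get(ip, 0) + 1
        html = render_live_page()
        page_cache.append((time.time(), html))
        if seen_ips.get(ip, 0) >= REPEAT_THRESHOLD:
            return get_stale_page()               # 10-minute head start for humans
        return html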
What do you think?
Take a look at this article by Ned Batchelder here. His article is about stopping spambots, but the same techniques could easily apply to your site.
Rather than stopping bots by having people identify themselves, we can stop the bots by making it difficult for them to make a successful post, or by having them inadvertently identify themselves as bots. This removes the burden from people, and leaves the comment form free of visible anti-spam measures.
This technique is how I prevent spambots on this site. It works. The method described here doesn't look at the content at all.
Some other ideas:
- Create an official auto-notify mechanism (RSS feed? Twitter?) that people can subscribe to when your product goes on sale. This reduces the need for people to make scripts.
- Change your obfuscation technique right before a new item goes on sale. So even if the scripters can escalate the arms race, they are always a day behind.
EDIT: To be totally clear, Ned's article above describes methods to prevent the automated PURCHASE of items by preventing a BOT from going through the forms to submit an order. His techniques wouldn't be useful for preventing bots from screen-scraping the home page to determine when a Bandoleer of Carrots comes up for sale. I'm not sure preventing THAT is really possible.
With regard to your comments about the effectiveness of Ned's strategies: yes, he discusses honeypots, but I don't think that's his strongest strategy. His discussion of the SPINNER is the original reason I mentioned his article. Sorry I didn't make that clearer in my original post:
The spinner is a hidden field used for a few things: it hashes together a number of values that prevent tampering and replays, and is used to obscure field names. The spinner is an MD5 hash of:
- The timestamp,
- The client's IP address,
- The entry id of the blog entry being commented on, and
- A secret.
Here is how you could implement that at WOOT.com:
Change the "secret" value that is used as part of the hash each time a new item goes on sale. This means that if someone is going to design a BOT to auto-purchase items, it would only work until the next item comes on sale!
Even if someone is able to quickly re-build their bot, all the other actual users will have already purchased a BOC, and your problem is solved!
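For illustration, a minimal sketch of the spinner hash with the rotating per-sale secret (variable and field names are mine, not Ned's):

    import hashlib

    def make_spinner(timestamp, client_ip, entry_id, secret):
        """MD5 of timestamp + IP + entry id + secret, per the article."""
        raw = f"{timestamp}:{client_ip}:{entry_id}:{secret}"
        return hashlib.md5(raw.encode()).hexdigest()

    # Rotate the secret each time a new item goes on sale, so a bot built
    # against today's form dies the moment the next item appears:
    current_secret = "rotate-me-on-every-sale"    # e.g. regenerate from os.urandom

    spinner = make_spinner(1234567890, "203.0.113.7", "boc-42", current_secret)
    # The spinner goes in a hidden field and can also obscure the real field
    # names: name each field md5(field_name + spinner) so they change per request.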
The other strategy he discusses is to change the honeypot technique from time to time (again, change it when a new item goes on sale):
- Use CSS classes (randomized of course) to set the fields or a containing element to display:none.
- Color the fields the same (or very similar to) the background of the page.
- Use positioning to move a field off of the visible area of the page.
- Make an element too small to show the contained honeypot field.
- Leave the fields visible, but use positioning to cover them with an obscuring element.
- Use Javascript to effect any of these changes, requiring a bot to have a full Javascript engine.
- Leave the honeypots displayed like the other fields, but tell people not to enter anything into them.
I guess my overall idea is to CHANGE THE FORM DESIGN when each new item goes on sale. Or at LEAST, change it when a new BOC goes on sale.
Which is what, a couple times/month?
If you accept this answer, will you give me a heads-up on when the next one is due? :)
Q: How would you stop scripters from slamming your site hundreds of times a second?
A: You don't. There is no way to prevent this behavior by external agents.
You could employ a vast array of technology to analyze incoming requests and heuristically attempt to determine who is and isn't human... but it would fail. Eventually, if not immediately.
The only viable long-term solution is to change the game so that the site is not bot-friendly, or is less attractive to scripters.
How do you do that? Well, that's a different question! ;-)
...
OK, some options have been given (and rejected) above. I am not intimately familiar with your site, having looked at it only once, but since people can read text in images and bots cannot easily do this, change the announcement to be an image. Not a CAPTCHA, just an image -
- generate the image (cached of course) when the page is requested
- keep the image source name the same, so that doesn't give the game away
- most of the time the image will have ordinary text in it, and be aligned to appear to be part of the inline HTML page
- when the game is 'on', the image changes to the announcement text
- the announcement text reveals a url and/or code that must be manually entered to acquire the prize. CAPTCHA the code if you like, but that's probably not necessary.
- for additional security, the code can be a one-time token generated specifically for the request/IP/agent, so that repeated requests generate different codes. Or you can pre-generate a bunch of random codes (a one-time pad) if on-demand generation is too taxing.
Run time-trials of real people responding to this, and ignore ('oops, an error occurred, sorry! please try again') responses faster than (say) half of this time. This event should also trigger an alert to the developers that at least one bot has figured out the code/game, so it's time to change the code/game.
Continue to change the game periodically anyway, even if no bots trigger it, just to waste the scripters' time. Eventually the scripters should tire of the game and go elsewhere... we hope ;-)
One final suggestion: when a request for your main page comes in, put it in a queue and respond to the requests in order in a separate process (you may have to hack/extend the web server to do this, but it will likely be worthwhile). If another request from the same IP/agent comes in while the first request is in the queue, ignore it. This should automatically shed the load from the bots.
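A minimal sketch of such a queue with the "ignore duplicates while queued" rule (assuming a single worker thread; in reality this would likely sit in the web server itself):

    import queue
    import threading

    pending = set()                  # (ip, agent) keys currently in the queue
    pending_lock = threading.Lock()
    requests_q = queue.Queue()

    def serve(request):
        pass   # placeholder for the real page-rendering logic

    def enqueue_request(ip, agent, request):
        key = (ip, agent)
        with pending_lock:
            if key in pending:       # same IP/agent already queued: ignore it
                return False
            pending.add(key)
        requests_q.put((key, request))
        return True

    def worker():
        while True:
            key, request = requests_q.get()   # serve strictly in arrival order
            try:
                serve(request)
            finally:
                with pending_lock:
                    pending.discard(key)

    threading.Thread(target=worker, daemon=True).start()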
EDIT: another option, aside from the use of images, is to use javascript to fill in the buy/no-buy text; bots rarely interpret javascript, so they wouldn't see it.
I don't know how feasible this is: ...go on the offensive.
Figure out what data the bots are scanning for. Feed them the data that they're looking for when you're NOT selling the crap. Do this in a way that won't bother or confuse human users. When the bots trigger phase two, they'll log in and fill out the form to buy $100 Roombas instead of BOC. Of course, this assumes that the bots are not particularly robust.
Another idea is to implement random price drops over the course of the bag o' crap sale period. Who would buy a random bag o' crap for $150 when you CLEARLY STATE that it's only worth $20? Nobody but overzealous bots. But then 9 minutes later it's $35... then 17 minutes later it's $9. Or whatever.
Sure, the zombie kings would be able to react. The point is to make their mistakes become very costly for them (and to make them pay you to fight them).
All of this assumes you want to piss off some bot lords, which may not be 100% advisable.
So the problem really seems to be: the bots want their "bag 'o crap" because it has a high perceived value at a low perceived price. You sometimes offer this item and the bots lurk, waiting to see if it's available, and then they buy the item.
Since it seems like the bot owners are making a profit (or potentially making a profit), the trick is to make this unprofitable for them by encouraging them to buy the crap.
First, always offer the "bag 'o crap".
Second, make sure that crap is usually crap.
Third, rotate the crap frequently.
Simple, no?
You'll need a permanent "why is our crap sometimes crap?" link next to the offer to explain to humans what's going on.
When the bot sees that there's crap and the crap is automatically purchased, the recipient is going to be awfully upset that they've paid $10 for a broken toothpick. And then an empty trash bag. And then some dirt from the bottom of your shoe.
If they buy enough of this crap in a relatively short period of time (and you have large disclaimers all over the place explaining why you're doing this), they're going to lose a fair "bag 'o cash" on your "bag 'o crap". Even human intervention on their part (checking to ensure that the crap isn't crap) can fail if you rotate the crap often enough. Heck, maybe the bots will notice and not buy anything that's been in the rotation for too short a time, but that means the humans will buy the non-crap.
Heck, your regular customers might be so amused that you can turn this into a huge marketing win. Start posting how much of the "crap" is being sold. People will come back just to see how hard the bots have been bitten.
Update: I expect that you might get a few calls up front with people complaining. I don't think you can stop that entirely. However, if this kills the bots, you can always stop it and restart it later.
Sell the item to non-scripting humans.
Keep the site running at a speed not slowed by bots.
Don't hassle the 'normal' users with any tasks to complete to prove they're human.
You probably don't want to hear this, but #1 and #3 are mutually exclusive.
Well, nobody knows you're a bot either. There's no programmatic way to tell whether or not there's a human on the other end of the connection without requiring the person to do something. Preventing scripts/bots from doing stuff on the web is the whole reason CAPTCHAs were invented. It's not like this is some new problem that hasn't seen a lot of effort expended on it. If there were a better way to do it, one that didn't involve the hassle to real users that a CAPTCHA does, everyone would be using it already.
I think you need to face the fact that if you want to keep bots off your ordering page, a good CAPTCHA is the only way to do it. If demand for your random crap is high enough that people are willing to go to these lengths to get it, legitimate users aren't going to be put off by a CAPTCHA.
The method Woot uses to combat this issue is changing the game - literally. When they present an extraordinarily desirable item for sale, they make users play a video game in order to order it.
Not only does that successfully combat bots (they can easily make minor changes to the game to avoid automatic players, or even provide a new game for each sale), but it also gives users the impression of "winning" the desired item while slowing down the ordering process.
It still sells out very quickly, but I think the solution is good - re-evaluating the problem and changing the parameters led to a successful strategy where strictly technical solutions simply didn't exist.
Your entire business model is based on "first come, first served." You can't do what the radio stations did (they no longer make the first caller the winner, they make the 5th or 20th or 13th caller the winner) - it doesn't match your primary feature.
No, there is no way to do this without changing the ordering experience for the real users.
Let's say you implement all these tactics. If I decide that this is important, I'll simply get 100 people to work with me, we'll build software to work on our 100 separate computers, and hit your site 20 times a second (5 seconds between accesses for each user/cookie/account/IP address).
You have two stages:
- Watching front page
- Ordering
You can't put a captcha blocking #1 - that's going to lose real customers ("What? I have to solve a captcha each time I want to see the latest woot?!?").
So my little group watches, timed together so we get about 20 checks per second, and whoever sees the change first alerts all the others (automatically), who will load the front page once again, follow the order link, and perform the transaction (which may also happen automatically, unless you implement captcha and change it for every wootoff/boc).
You can put a captcha in front of #2, and while you're loath to do it, that may be the only way to make sure that even if bots watch the front page, real users are getting the products.
But even with captcha my little band of 100 would still have a significant first-mover advantage - and there's no way you can tell that we aren't humans. If you start timing our accesses, we'd just add some jitter. We could randomly select which computer was to refresh so the order of accesses changes constantly - but still looks enough like a human.
First, get rid of the simple bots
You need to have an adaptive firewall that will watch requests, and if someone is doing the obvious stupid thing - refreshing more than once a second from the same IP - then employ tactics to slow them down (drop packets, send back refused or 500 errors, etc.).
This should significantly drop your traffic and alter the tactics the bot users employ.
Second, make the server blazingly fast.
You really don't want to hear this... but...
I think what you need is a fully custom solution from the bottom up.
You don't need to mess with the TCP/IP stack, but you may need to develop a very, very, very fast custom server that is purpose-built to correlate user connections and react appropriately to various attacks.
Apache, lighttpd, etc. are all great for being flexible, but you run a single-purpose website, and you really need to be able to do more than the current servers are capable of doing (both in handling traffic, and in appropriately combating bots).
By serving a largely static webpage (updated every 30 seconds or so) on a custom server, you should not only be able to handle 10x the number of requests and traffic (because the server isn't doing anything other than getting the request and reading the page from memory into the TCP/IP buffer), but it will also give you access to metrics that might help you slow down bots. For instance, by correlating IP addresses you can simply block more than one connection per second per IP. Humans can't go faster than that, and even people using the same NATed IP address will only infrequently be blocked. You'd want to do a slow block - leave the connection alone for a full second before officially terminating the session. This can feed into a firewall to give longer-term blocks to especially egregious offenders.
But the reality is that no matter what you do, there's no way to tell a human apart from a bot when the bot is custom-built by a human for a single purpose. The bot is merely a proxy for the human.
Conclusion
At the end of the day, you can't tell a human and a computer apart by watching the front page. You can stop bots at the ordering step, but the bot users still have a first-mover advantage, and you still have a huge load to manage.
You can add blocks for the simple bots, which will raise the bar so that fewer people bother with it. That may be enough.
But without changing your basic model, you're out of luck. The best you can do is take care of the simple cases, make the server so fast regular users don't notice, and sell so many items that even if you have a few million bots, as many regular users as want them will get them.
You might consider setting up a honeypot and marking user accounts as bot users, but that will have a huge negative community backlash.
Every time I think of a "well, what about doing this..." I can always counter it with a suitable bot strategy.
Even if you make the front page a captcha to get to the ordering page ("This item's ordering button is blue with pink sparkles, somewhere on this page"), the bots will simply open all the links on the page and use whichever one comes back with an ordering page. There's just no way to win this.
Make the servers fast, put in a reCaptcha (the only one I've found that can't be easily fooled, but it's probably way too slow for your application) on the ordering page, and think about ways to change the model slightly so regular users have as good a chance as the bot users.
-Adam
I say expose the price information using an API. This is the unintuitive solution, but it does work to give you control over the situation. Add some limitations to the API to make it slightly less functional than the website.
You could do the same for ordering. You could experiment with small changes to the API functionality/performance until you get the desired effect.
There are proxies and botnets to defeat IP checks. There are captcha-reading scripts that are extremely good. There are even teams of workers in India who defeat captchas for a small price. Any solution you can come up with can be reasonably defeated. Even Ned Batchelder's solutions can be stepped past by using a WebBrowser control or other simulated browser combined with a botnet or proxy list.
Disclaimer: This answer is completely non-programming-related. It does, however, try to attack the reason for scripts in the first place.
Another idea: if you truly have a limited quantity to sell, why don't you change it from a first-come-first-served methodology? Unless, of course, the hype is part of your marketing scheme.
There are many other options, and I'm sure others can think of some different ones:
an ordering queue (pre-order system) - Some scripts might still end up at the front of the queue, but it's probably faster to just manually enter the info.
a raffle system (everyone who tries to order one is entered into the system) - This way the people with the scripts have just the same chances as those without.
a rush priority queue - If there is truly a high perceived value, people may be willing to pay more. Implement an ordering queue, but allow people to pay more to be placed higher in the queue.
an auction (credit goes to David Schmitt for this one, comments are my own) - People can still use scripts to snipe in at the last minute, but not only does it change the pricing structure, people are expecting to be fighting it out with others. You can also do things to restrict the number of bids in a given time period, make people phone in ahead of time for an authorization code, etc.
No matter how secure the Nazis thought their communications were, the Allies would often break their messages. No matter how you try to stop bots from using your site, the bot owners will work out a way around it. I'm sorry if that makes you the Nazi :-)
I think a different mindset is required
- Do not try to stop bots from using your site
- Do not go for a fix that works immediately, play the long game
Get into the mindset that it doesn't matter whether the client of your site is a human or a bot; both are just paying customers, but one has an unfair advantage over the other. Some users without much of a social life (hermits) can be just as annoying for your site's other users as bots.
Record the time you publish an offer and the time an account opts to buy it.
This gives you a record of how quickly the client is buying stuff.
Vary the time of day you publish offers.
For example, have a 3-hour window starting at some obscure time of the day (midnight?). Only bots and hermits will constantly refresh a page for 3 hours just to get an order in within seconds. Never vary the base time, only the size of the window.
Over time a picture will emerge.
01: You can see which accounts are regularly buying products within seconds of them going live, suggesting they might be bots.
02: You can also look at the window of time used for the offers. If the window is 1 hour, then some early buyers will be humans. A human will rarely refresh for 4 hours, though. If the elapsed time between publish and purchase is quite consistent regardless of the window duration, then that's a bot. If the publish/purchase time is short for small windows and gets longer for large windows, that's a hermit!
Now instead of stopping bots from using your site, you have enough information to tell you which accounts are certainly used by bots, and which accounts are likely to be used by hermits. What you do with that information is up to you, but you can certainly use it to make your site fairer to people who have a life.
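A sketch of that classification, assuming you've logged (window size, publish-to-purchase time) pairs per account; every threshold here is an arbitrary placeholder you'd tune from your own data:

    from statistics import mean, pstdev

    def classify(samples):
        """samples: list of (window_hours, seconds_from_publish_to_purchase)."""
        if not samples:
            return "unknown"
        elapsed = [e for _, e in samples]
        if mean(elapsed) < 30:
            return "bot"            # regularly buys within seconds of going live
        # consistent elapsed time regardless of window size -> bot
        if len(samples) > 3 and pstdev(elapsed) < 0.1 * mean(elapsed):
            return "bot"
        # elapsed time grows with the window size -> someone camping the page
        small = [e for w, e in samples if w <= 1]
        large = [e for w, e in samples if w >= 3]
        if small and large and mean(large) > 2 * mean(small):
            return "hermit"
        return "human"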
I think banning the bot accounts would be pointless; it would be akin to phoning Hitler and saying "Thanks for the positions of your U-boats!" Somehow you need to use the information in a way that the account owners won't realise. Let's see if I can dream anything up.....
Process orders in a queue:
When the customer places an order, they immediately get a confirmation email telling them their order is placed in a queue and they will be notified when it has been processed. I experience this kind of thing with order/dispatch on Amazon and it doesn't bother me at all. I don't mind getting an email days later telling me my order has been dispatched, as long as I immediately get an email telling me that Amazon knows I want the book. In your case it would be an email for
- Your order has been placed and is in a queue.
- Your order has been processed.
- Your order has been dispatched.
Users think they are in a fair queue. Process your queue every 1 hour so that normal users also experience a queue, so as not to arouse suspicion. Only process orders from bot and hermit accounts once they have been in the queue for the "average human ordering time + x hours". Effectively reducing bots to humans.
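A sketch of that delayed queue processing (the average-human-ordering-time figure is a placeholder; measure your own, and the account class would come from the classifier above):

    import time

    AVG_HUMAN_ORDER_TIME = 15 * 60   # seconds; measure this from your own data
    EXTRA_HOURS = 2                  # the "x hours" penalty for bots/hermits

    def ready_to_process(order, account_class, now=None):
        """order: {'placed_at': unix_timestamp}; account_class from the classifier."""
        now = now if now is not None else time.time()
        age = now - order["placed_at"]
        if account_class in ("bot", "hermit"):
            return age >= AVG_HUMAN_ORDER_TIME + EXTRA_HOURS * 3600
        return True   # normal users just go out with the next hourly batch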
We are currently using the latest generation of BigIP load balancers from F5 to do this. The BigIP has advanced traffic management features that can identify scrapers and bots based on frequency and patterns of use, even from amongst a set of sources behind a single IP. It can then throttle these, serve them alternative content, or simply tag them with headers or cookies so you can identify them in your application code.
How about introducing a delay which requires human interaction, like a sort of "CAPTCHA game"? For example, it could be a little Flash game where during 30 seconds they have to burst checkered balls and avoid bursting solid balls (avoiding colour blindness issues!). The game would be given a random number seed, and what the game transmits back to the server would be the coordinates and timestamps of the clicked points, along with the seed used.
On the server you simulate the game mechanics using that seed to see if the clicks would indeed have burst the balls. If they did, not only were they human, but they took 30 seconds to validate themselves. Give them a session id.
You let that session id do what it likes, but if it makes too many requests, they can't continue without playing again.
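The server-side replay might look roughly like this; note this simplified sketch assumes the balls are laid out statically from the seed and ignores any motion the real game would have:

    import random

    def layout_balls(seed, count=20):
        """Recreate the same board the client generated from the seed:
        (x, y, is_checkered) for each ball."""
        rng = random.Random(seed)
        return [(rng.uniform(0, 640), rng.uniform(0, 480), rng.random() < 0.5)
                for _ in range(count)]

    def validate_game(seed, clicks, radius=15, min_duration=25.0):
        """clicks: list of (x, y, timestamp). Pass only if every click bursts
        a checkered ball, no solid ball is hit, and it took human-scale time."""
        balls = layout_balls(seed)
        if not clicks or clicks[-1][2] - clicks[0][2] < min_duration:
            return False   # finished too fast to be the real 30-second game
        for cx, cy, _ in clicks:
            hit = next(((x, y, chk) for x, y, chk in balls
                        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2), None)
            if hit is None or not hit[2]:
                return False   # missed everything, or burst a solid ball
        return True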
First, let me recap what we need to do here. I realize I'm just paraphrasing the original question, but it's important that we get this 100% straight, because there are a lot of great suggestions that get 2 or 3 out of 4 right, but as I will demonstrate, you will need a multifaceted approach to cover all of the requirements.
Requirement 1: Getting rid of the 'bot slamming':
The rapid-fire 'slamming' of your front page is hurting your site's performance and is at the core of the problem. The 'slamming' comes from both single-IP bots and - supposedly - from botnets as well. We want to get rid of both.
Requirement 2: Don't mess with the user experience:
We could fix the bot situation pretty effectively by implementing a nasty verification procedure like phoning a human operator, solving a bunch of CAPTCHAs, or similar, but that would be like forcing every innocent airplane passenger to jump through crazy security hoops just for the slim chance of catching the very stupidest of terrorists. Oh wait - we actually do that. But let's see if we can not do that on woot.com.
Requirement 3: Avoiding the 'arms race':
As you mention, you don't want to get caught up in the spambot arms race. So you can't use simple tweaks like hidden or jumbled form fields, math questions, etc., since they are essentially obscurity measures that can be trivially autodetected and circumvented.
Requirement 4: Thwarting 'alarm' bots:
This may be the most difficult of your requirements. Even if we can make an effective human-verification challenge, bots could still poll your front page and alert the scripter when there is a new offer. We want to make those bots infeasible as well. This is a stronger version of the first requirement, since not only can't the bots issue performance-damaging rapid-fire requests -- they can't even issue enough repeated requests to send an 'alarm' to the scripter in time to win the offer.
Okay, so let's see if we can meet all four requirements. First, as I mentioned, no one measure is going to do the trick. You will have to combine a couple of tricks to achieve it, and you will have to swallow two annoyances:
- A small number of users will be required to jump through hoops
- A small number of users will be unable to get the special offers
I realize these are annoying, but if we can make the 'small' number small enough, I hope you will agree the positives outweigh the negatives.
First measure: User-based throttling:
This one is a no-brainer, and I'm sure you do it already. If a user is logged in, and keeps refreshing 600 times a second (or something), you stop responding and tell him to cool it. In fact, you probably throttle his requests significantly sooner than that, but you get the idea. This way, a logged-in bot will get banned/throttled as soon as it starts polling your site. This is the easy part. The unauthenticated bots are our real problem, so on to them:
Second measure: Some form of IP throttling, as suggested by nearly everyone:
No matter what, you will have to do some IP-based throttling to thwart the 'bot slamming'. Since it seems important to you to allow unauthenticated (non-logged-in) visitors to get the special offers, you only have IPs to go by initially, and although they're not perfect, they do work against single-IP bots. Botnets are a different beast, but I'll come back to those. For now, we will do some simple throttling to beat rapid-fire single-IP bots.
The performance hit is negligible if you run the IP check before all other processing, use a proxy server for the throttling logic, and store the IPs in a memcached lookup-optimized tree structure.
Third measure: Cloaking the throttle with cached responses:
With rapid-fire single-IP bots throttled, we still have to address slow single-IP bots, i.e. bots that are specifically tweaked to 'fly under the radar' by spacing requests slightly further apart than the throttling prevents.
To instantly render slow single-IP bots useless, simply use the strategy suggested by abelenky: serve 10-minute-old cached pages to all IPs that have been spotted in the last 24 hours (or so). That way, every IP gets one 'chance' per day/hour/week (depending on the period you choose), and there will be no visible annoyance to real users who are just hitting 'reload', except that they don't win the offer.
The beauty of this measure is that it also thwarts 'alarm bots', as long as they don't originate from a botnet.
(I know you would probably prefer it if real users were allowed to refresh over and over, but there is no way to tell a refresh-spamming human and a request-spamming bot apart without a CAPTCHA or similar.)
Fourth measure: reCAPTCHA:
You are right that CAPTCHAs hurt the user experience and should be avoided. However, in _one_ situation they can be your best friend: if you've designed a very restrictive system to thwart bots that - because of its restrictiveness - also catches a number of false positives, then a CAPTCHA served as a last resort will allow those real users who get caught to slip by your throttling (thus avoiding annoying DoS situations).
The sweet spot, of course, is when ALL the bots get caught in your net, while extremely few real users get bothered by the CAPTCHA.
If you, when serving up the 10-minute-old cached pages, also offer an alternative, optional, CAPTCHA-verified 'front page refresher', then humans who really want to keep refreshing can still do so without getting the old cached page, but at the cost of having to solve a CAPTCHA for each refresh. That is an annoyance, but an optional one just for the die-hard users, who tend to be more forgiving because they know they're gaming the system to improve their chances, and that improved chances don't come free.
Fifth measure: Decoy crap:
Christopher Mahan had an idea that I rather liked, but I would put a different spin on it. Every time you are preparing a new offer, prepare two other 'offers' as well that no human would pick, like a 12mm wingnut for $20. When the offer appears on the front page, put all three 'offers' in the same picture, with numbers corresponding to each offer. When the user/bot actually goes on to order the item, they will have to pick (a radio button) which offer they want, and since most bots would merely be guessing, in two out of three cases, the bots would be buying worthless junk.
Naturally, this doesn't address 'alarm bots', and there is a (slim) chance that someone could build a bot that was able to pick the correct item. However, the risk of accidentally buying junk should make scripters turn entirely from the fully automated bots.
Sixth measure: Botnet Throttling:
[deleted]
Okay............ I've now spent most of my evening thinking about this, trying different approaches.... global delays.... cookie-based tokens.. queued serving... 'stranger throttling'.... And it just doesn't work. It doesn't. I realized the main reason why you hadn't accepted any answer yet was that no one had proposed a way to thwart a distributed/zombie net/botnet attack.... so I really wanted to crack it. I believe I cracked the botnet problem for authentication in a different thread, so I had high hopes for your problem as well. But my approach doesn't translate to this. You only have IPs to go by, and a large enough botnet doesn't reveal itself in any analysis based on IP addresses.
So there you have it: my sixth measure is naught. Nothing. Zip. Unless the botnet is small and/or fast enough to get caught in the usual IP throttle, I don't see any effective measure against botnets that doesn't involve explicit human verification such as CAPTCHAs. I'm sorry, but I think combining the above five measures is your best bet. And you could probably do just fine with just abelenky's 10-minute-caching trick alone.
There are a few other / better solutions already posted, but for completeness, I figured I'd mention this:
If your main concern is performance degradation, and you're looking at true hammering, then you're actually dealing with a DoS attack, and you should probably try to handle it accordingly. One common approach is to simply drop packets from an IP in the firewall after a number of connections per second/minute/etc. For example, the standard Linux firewall, iptables, has a standard operation matching function, 'hashlimit', which can be used to correlate connection requests per time unit to an IP address.
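For example, a rule along these lines (wrapped in Python here just to keep one language throughout; the hashlimit options are from the iptables extensions, so double-check them against the manual for your version, and note the command needs root):

    import subprocess

    # Drop new HTTP connections from any single source IP above 20/sec
    # (burst 40).
    rule = [
        "iptables", "-A", "INPUT", "-p", "tcp", "--dport", "80",
        "-m", "conntrack", "--ctstate", "NEW",
        "-m", "hashlimit",
        "--hashlimit-name", "http-slam",
        "--hashlimit-mode", "srcip",
        "--hashlimit-above", "20/sec",
        "--hashlimit-burst", "40",
        "-j", "DROP",
    ]
    subprocess.run(rule, check=True)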
Although this question would probably be more apt for the next SO-derivative mentioned on the last SO podcast, it hasn't launched yet, so I guess it's ok to answer :)
EDIT:
As pointed out by novatrust, there are still ISPs that actually do NOT assign IPs to their customers, so effectively, a script-customer of such an ISP would disable all customers from that ISP.
- Provide an RSS feed so they don't eat up your bandwidth.
- When buying, make everyone wait a random amount of time of up to 45 seconds or something, depending on what you're looking for exactly. Exactly what are your timing constraints?
- Give everyone 1 minute to put their name in for the drawing and then randomly select people. I think this is the fairest way.
- Monitor the accounts (include some times in the session and store it?) and add delays to accounts that seem like they're below the human speed threshold. That will at least make the bots be programmed to slow down and compete with humans.
First of all, by definition, it is impossible to support stateless, i.e. truly anonymous, transactions while also being able to separate the bots from legitimate users.
If we can accept the premise that we can impose some cost on a brand-spanking-new woot visitor on his first page hit(s), I think I have a possible solution. For lack of a better name, I'm going to loosely call this solution "A visit to the DMV."
Let's say that there's a car dealership that offers a different new car each day, and that on some days, you can buy an exotic sports car for $5 each (limit 3), plus a $5 destination charge.
The catch is, the dealership requires you to visit the dealership and show a valid driver's license before you're allowed in through the door to see what car is on sale. Moreover, you must have said valid driver's license in order to make the purchase.
So, the first-time visitor (let's call him Bob) to this car dealer is refused entry, and is referred to the DMV office (which is conveniently located right next door) to obtain a driver's license.
Other visitors with a valid driver's license are allowed in after showing it. A person who makes a nuisance of himself by loitering around all day, pestering the salesmen, grabbing brochures, and emptying the complimentary coffee and cookies will eventually be turned away.
Now, back to Bob without the license -- all he has to do is endure the visit to the DMV once. After that, he can visit the dealership and buy cars anytime he likes, unless he accidentally left his wallet at home, or his license is otherwise destroyed or revoked.
The driver's license in this world is nearly impossible to forge.
The visit to the DMV involves first getting the application form at the "Start Here" queue. Bob has to take the completed application to window #1, where the first of many surly civil servants will take his application, process it, and if everything is in order, stamp the application for the window and send him to the next window. And so, Bob goes from window to window, waiting for each step of his application to go through, until he finally gets to the end and receives his driver's license.
There's no point in trying to "short circuit" the DMV. If the forms are not filled out correctly in triplicate, or any wrong answers are given at any window, the application is torn up, and the hapless customer is sent back to the start.
Interestingly, no matter how full or empty the office is, it takes about the same amount of time to get serviced at each successive window. Even when you're the only person in line, it seems that the personnel like to make you wait a minute behind the yellow line before uttering, "Next!"
Things aren't quite so terrible at the DMV, however. While all the waiting and processing to get the license is going on, you can watch a very entertaining and informative infomercial for the car dealership while you're in the DMV lobby. In fact, the infomercial runs just long enough to cover the amount of time you spend getting your license.
The slightly more technical explanation:
As I said at the very top, it becomes necessary to have some statefulness in the client-server relationship which allows you to separate humans from bots. You want to do it in a way that doesn't overly penalize the anonymous (non-authenticated) human visitor.
This approach probably requires some AJAX-y client-side processing. A brand-spanking-new visitor to woot is given the "Welcome New User!" page full of text and graphics which (by appropriate server-side throttling) takes a few seconds to load completely. While this is happening (and the visitor is presumably busy reading the welcome page(s)), his identifying token is slowly being assembled.
Let's say, for discussion, the token (aka "driver's license") consists of 20 chunks. In order to get each successive chunk, the client-side code must submit a valid request to the server. The server incorporates a deliberate delay (let's say 200 milliseconds) before sending the next chunk along with the 'stamp' needed to make the next chunk request (i.e., the stamps needed to go from one DMV window to the next). All told, about 4 seconds must elapse to finish the chunk-challenge-response-chunk-challenge-response-...-chunk-challenge-response-completion process.
At the end of this process, the visitor has a token which allows him to go to the product description page and, in turn, go to the purchasing page. The token is a unique ID for each visitor, and can be used to throttle his activities.
On the server side, you only accept page views from clients that have a valid token. Or, if it's important that everyone can ultimately see the page, put a time penalty on requests that are missing a valid token.
Now, for this to be relatively benign to the legitimate human visitor, make the token-issuing process happen relatively non-intrusively in the background. Hence the need for the welcome page with entertaining copy and graphics that is deliberately slowed down slightly.
This approach forces a throttle-down of bots to either use an existing token, or take the minimum setup time to get a new token. Of course, this doesn't help as much against sophisticated attacks using a distributed network of faux visitors.
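A minimal sketch of the chunked token issuance; the HMAC "stamps" are my own improvisation for the window-to-window pass:

    import hashlib
    import hmac
    import os
    import time

    SERVER_KEY = os.urandom(32)   # regenerate on deploy; keep server-side only
    CHUNKS = 20
    DELAY = 0.2                   # ~200 ms per "window"

    def stamp_for(visitor_id, index):
        """The 'stamp' that proves the client completed window `index`."""
        msg = f"{visitor_id}:{index}".encode()
        return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()

    def issue_chunk(visitor_id, index, presented_stamp=""):
        """Client calls this CHUNKS times; each call needs the previous stamp."""
        if index > 0 and not hmac.compare_digest(
                presented_stamp, stamp_for(visitor_id, index - 1)):
            return None                   # wrong answer: torn up, back to the start
        time.sleep(DELAY)                 # the surly civil servant's pause
        chunk = os.urandom(4).hex()       # one piece of the eventual token
        return chunk, stamp_for(visitor_id, index)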
Write a reverse proxy on an Apache server in front of your application which implements a tarpit (see the Wikipedia article) to punish bots. It would simply manage a list of IP addresses that connected in the last few seconds. You detect a burst of requests from a single IP address and then exponentially delay those requests before responding.
Of course, multiple humans can come from the same IP address if they're on a NAT'd network connection, but it's unlikely that a human would mind your response time going from 2 ms to 4 ms (or even 400 ms), whereas a bot will be hampered by the increasing delay pretty quickly.
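A sketch of the tarpit logic (in-process and in Python for illustration; the real thing would live in the proxy):

    import time
    from collections import defaultdict, deque

    WINDOW = 5.0     # seconds of per-IP history to keep
    BASE = 0.002     # 2 ms baseline delay

    recent = defaultdict(deque)   # ip -> timestamps of hits within WINDOW

    def tarpit_delay(ip):
        """Call before responding: bursty IPs get exponentially delayed."""
        now = time.time()
        hits = recent[ip]
        while hits and now - hits[0] > WINDOW:
            hits.popleft()
        hits.append(now)
        # 1 hit -> 2 ms, 2 -> 4 ms, ... 9 hits in 5 s already costs ~0.5 s
        delay = BASE * (2 ** (len(hits) - 1))
        time.sleep(min(delay, 30.0))   # cap it so we don't hold sockets forever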
I'm not seeing the great burden that you claim from checking incoming IPs. On the contrary, I've done a project for one of my clients which analyzes the HTTP access logs every five minutes (it could have been real-time, but he didn't want that for some reason that I never fully understood) and creates firewall rules to block connections from any IP addresses that generate an excessive number of requests, unless the address can be confirmed as belonging to a legitimate search engine (Google, Yahoo, etc.).
This client runs a web hosting service and is running this application on three servers which handle a total of 800-900 domains. Peak activity is in the thousand-hits-per-second range and there has never been a performance issue - firewalls are very efficient at dropping packets from blacklisted addresses.
And, yes, DDoS technology definitely does exist which would defeat this scheme, but he's not seeing that happen in the real world. On the contrary, he says it has vastly reduced the load on his servers.
My approach would be to focus on non-technological solutions (otherwise you're entering an arms race you'll lose, or at least spend a great deal of time and money on). I'd focus on the billing/shipment parts - you can find bots by either finding multiple deliveries to the same address or by multiple charges to a single payment method. You can even do this across items over several weeks, so if a user got a previous item (by responding really really fast) he may be assigned some sort of "handicap" this time around.
This would also have a side effect (beneficial, I would think, but I could be wrong marketing-wise for your case) of perhaps widening the circle of people who get lucky and get to purchase woot.
You can't totally prevent bots, even with a captcha. However, you can make it a pain to write and maintain a bot and therefore reduce the number. Particularly by forcing them to update their bots daily, you'll cause most to lose interest.
Here are some ideas to make it harder to write bots:
Require running a javascript function. Javascript makes it much more of a pain to write a bot. Maybe require a captcha if they aren't running javascript, to still allow actual non-javascript users (minimal).
Time the keystrokes when typing into the form (again via javascript). If it's not human-like, then reject it. It's a pain to mimic human typing in a bot.
Write your code to update your field IDs daily with a new random value. This will force them to update their bot daily, which is a pain.
Write your code to re-order your fields on a daily basis (obviously in some way that's not random to your users). If they're relying on the field order, this will trip them up and again force daily maintenance of their bot code.
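A sketch of daily-rotating field names that your own server can still decode (the secret and hashing scheme here are just one way to do it):

    import datetime
    import hashlib

    SECRET = "some-long-private-string"        # hypothetical; keep server-side
    FIELDS = ["email", "quantity", "card_number"]

    def field_id(logical_name, day=None):
        day = day or datetime.date.today().isoformat()
        digest = hashlib.sha256(f"{SECRET}:{day}:{logical_name}".encode()).hexdigest()
        return "f_" + digest[:12]              # opaque name that changes every day

    def decode_form(posted):
        """Map today's opaque names back to the logical fields."""
        lookup = {field_id(name): name for name in FIELDS}
        return {lookup[k]: v for k, v in posted.items() if k in lookup}

    def field_order():
        """Field ORDER can rotate the same way: sort by the hashed id."""
        return sorted(FIELDS, key=field_id)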
You could go even further and use Flash content. Flash is a total pain to write a bot against.
Generally, if you start taking the mindset of not preventing them, but making it more work for them, you can probably achieve the goal you're looking for.
Stick a 5-minute delay on all product announcements for unregistered users. Casual users won't really notice this, and non-casual users will be registered anyhow.
Time-block user agents that make so many requests per minute. E.g., if you've got somebody requesting a page exactly every 5 seconds for 10 minutes, they're probably not a user... But it could be tricky to get this right.
If they trigger an alert, redirect every request to a static page with as little DB IO as possible, with a message letting them know they'll be allowed back on in X minutes.
It's important to add that you should probably only apply this on requests for pages and ignore all the requests for media (js, images, etc.).
Preventing DoS would defeat #2 of @davebug's goals he outlined above, "Keep the site running at a speed not slowed by bots", but wouldn't necessarily solve #1, "Sell the item to non-scripting humans".
I'm sure a scripter could write something to skate just under the excessive limit that would still be faster than a human could go through the ordering forms.
All right, so the spammers are out-competing regular people to win the "bag of crap" auction? Why not make the next auction be a literal "bag of crap"? The spammers get to pay good money for a bag full of doggy doo, and we all laugh at them.
The important thing here is to change the system to remove load from your server, and prevent bots from winning the bag of crap WITHOUT letting the botlords know you are gaming them, or they will revise their strategy. I don't think there is any way to do this without some processing at your end.
So you record hits on your home page. Whenever someone hits the page, that connection is compared against its last hit, and if it was too quick then it is sent a version of the page without the offer. This can be done by some sort of load-balancing mechanism that sends bots (the hits that are too fast) to a server that simply serves cached versions of your home page; real people get sent to the good server. This takes the load off the main server and makes the bots think that they are still being served the pages correctly.
Even better if the offer can be declined in some way. Then you can still make the offers on the faux server, but when the bot fills out the form, say "Sorry, you weren't quick enough" :) Then they will definitely think they are still in the game.
Most purely technical solutions have already been offered. I'll therefore suggest another view of the problem.
As I understand it, the bots are set up by people genuinely trying to buy the bags you're selling. The problem is -
- Other people, who don't operate bots, deserve a chance to buy, and you're offering a limited amount of bags.
- You want to attract humans to your site and just sell the bags.
Instead of trying to avoid the bots, you can enable potential bag-buyers to subscribe to an email, or even SMS update, to get notified when a sale will take place. You can even give them a minute or two head start (a special URL where the sale starts, randomly generated, and sent with the mail/SMS).
When these buyers go to buy, they're on your site, and you can show them whatever you want in side banners or whatever. Those running the bots will prefer to simply register to your notification service.
The bot runners might still run bots on your notification to finish the buy faster. Some solutions to that can be offering a one-click buy.
By the way, you mentioned your users are not registered, but it sounds like those buying these bags are not random buyers, but people who look forward to these sales. As such, they might be willing to register to get an advantage in trying to "win" a bag.
In essence, what I'm suggesting is to try and look at the problem as a social one, rather than a technical one.
Asaf
You could try to make the price harder for scripts to read. This is achieved most simply by converting it to an image, but a text recognition algorithm could still get around this. If enough scripters get around it, you could try applying captcha-like things to this image, but obviously at the cost of user experience. Instead of an image, the price could go in a Flash app.
Alternately, you could try to devise a way to "shuffle" the HTML of a page in some way that doesn't affect the rendering. I can't think of a good example off the top of my head, but I'm sure it's somehow doable.
How about this: create a form to receive an email if a new item is on sale, and add a caching system that will serve the same content to anyone refreshing in less than X seconds.
This way you win in all the scenarios: you get rid of the scrapers (they can scrape their email account), and you give a chance to the people who won't code something just to buy on your site! I'm sure I would get the email on my mobile and log in to buy something if I really wanted to.
How do you know there are scripters placing orders?
The crux of your problem is that you can't separate the scripters from the legitimate users and therefore can't block them, so how is it that you know there are scripters at all?
If you have a way to answer this question, then you have a set of characteristics you can use to filter the scripters.
Let's turn the problem on its head - you have bots buying stuff that you want real people to buy. How about making a real chance that the bots will buy stuff that you don't want the real people to buy?
Have a random chance for some non-displayed HTML that the scraping bots will think is the real situation, but real people won't see (and don't forget that real people include the blind, so consider screen readers etc. as well), and this travels through to purchasing something exorbitantly expensive (or doesn't make the actual purchase, but gets payment details for you to put on a banlist).
Even if the bots switch to 'alert the user' rather than 'make the purchase', if you can get enough false alarms, you may be able to make it sufficiently worthless for people (maybe not everyone, but some reduction in the scamming is better than none at all) not to bother.
Just a side remark: it seems to me that your expected user behaviour is very similar to a bot's (come in big waves, unauthenticated, click every button :)), so the CAPTCHA might be the only Turing test able to discern them :)).
Not a complete fix, but I didn't see it here yet.
Track the "slamming" addresses, and put up a disclaimer saying that BOC items will not be shipped to any address that is not following your TOS.
This will have a psychological impact on some, and others who want to take advantage of your site will have to switch up methods, but you will have negated one avenue for them.
As suggested above, I did some work on non-captcha forms by using a pre-calculated hash of the expected value of a result stored in the form. The idea works in two Wordpress anti-spam plugins: WP-Morph and WP-HashCash. The only drawback is the client browser having to be able to interpret JavaScript.
So your problem is too much business? People are sniping your sales? This is assuming that these scripters are generating qualified sales? And the issue is they are snapping up all your product before everyone else does?
How about you make a full webservice API for 'scripters' to interface with? Then offer a slight discount or some kind of perk to make them play by your rules. Double your business and have both your web sales and API sales.
Either that or just get WAY more inventory - you can't fight it - embrace and adapt to it.
Here's my take. Attack the ROI of the bot 103 owners, so that they'll instead do the legitimate 102 thing you want them to do instead of cheating. Let's 101 look at it from their point of view. What 100 are their assets? Apparently, an unlimited 99 number of disposable machines, IP addresses, and 98 perhaps even a large number of unskilled 97 people willing to do inane tasks. What do 96 they want? To always get the special deal 95 you are offering before other legitimate 94 people get it.
The good news is that they 93 only have a limited window of time in which 92 to win the race. And what I don't think 91 they have is an unlimited number of smart 90 people who are on call to reverse engineer 89 your site at the moment you unleash a deal. So 88 if you can make them jump through a specific 87 hoop that is hard for them to figure out, but 86 automatic for your legitimate customers 85 (they won't even know it's there), you can 84 delay their efforts just enough that they 83 get beat by the massive number of real people 82 who are just dying to get your hot deal.
The 81 first step is to make your notion of authentication 80 non-binary, by which I mean that, for any 79 given user, you have a probability assigned 78 to them that they are a real person or a 77 bot. You can use a number of hints to build 76 up this probability, many of which have 75 been discussed already on this thread: suspicious 74 rate activity, IP addresses, foreign country 73 geolocation, cookies, etc. My favorite is 72 to just pay attention to the exact version 71 of windows they are using. More importantly, you 70 can give your long-term customers a clear 69 way to authenticate with strong hints: by 68 engaging with the site, making purchases, contributing 67 to forums, etc. It's not required that you 66 do those things, but if you do then you'll 65 have a slight advantage when it comes time 64 to see special deals.
Whenever you are called upon to make an authentication decision, use this probability to make the computer you're talking to do more or less work before you will give them what they want. For example, perhaps some JavaScript on your site requires the client to perform a computationally expensive task in the background, and only when that task completes will you let them know about the special deal. For a regular customer, this can be pretty quick and painless, but for a scammer it means they need a lot more computers to maintain constant coverage (since each computer has to do more work). Then you can use your probability score from above to increase the amount of work they have to do.
To make sure this delay doesn't cause any fairness problems, I'd recommend making it be some kind of encryption task that includes the current time of day from the person's computer. Since the scammer doesn't know what time the deal will start, he can't just make something up; he has to use something close to the real time of day (you can ignore any requests that claim to come in before the deal started). Then you can use these times to adjust the first-come-first-served rule, without the real people ever having to know anything about it.
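To make the mechanics concrete, here is a minimal server-side sketch of that idea in PHP. It assumes the client-side JavaScript does the actual brute-force search for a nonce; the salt handling, field names, and difficulty scale are all invented for illustration, not a definitive implementation:

<?php
session_start();

// Issue a challenge: the client's JavaScript must find a nonce such that
// sha1(salt . clientTime . nonce) starts with $difficulty hex zeros.
// Scale $difficulty up with the bot-probability score discussed above.
function issue_challenge(int $difficulty): array
{
    $salt = bin2hex(random_bytes(8));
    $_SESSION['pow_salt'] = $salt;
    $_SESSION['pow_difficulty'] = $difficulty;
    return array('salt' => $salt, 'difficulty' => $difficulty);
}

// Verify the answer; $dealStart is the unix time the deal went live.
function verify_pow(string $clientTime, string $nonce, int $dealStart): bool
{
    $salt = $_SESSION['pow_salt'] ?? '';
    $difficulty = $_SESSION['pow_difficulty'] ?? 4;
    // Ignore any answer that claims to predate the deal.
    if ((int)$clientTime < $dealStart) {
        return false;
    }
    $hash = sha1($salt . $clientTime . $nonce);
    return substr($hash, 0, $difficulty) === str_repeat('0', $difficulty);
}
?>

The submitted $clientTime can then feed the adjusted first-come-first-served ordering described above.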
The last idea is to change the algorithm required to generate the work whenever you post a new deal (and at random other times). Every time you do that, normal humans will be unaffected, but the bots will stop working. They'll have to get a human to work on the reverse engineering, which hopefully will take longer than your deal window. Even better is if you never tell them whether they submitted the right result, so that they don't get any kind of alert that they are doing things wrong. To defeat this solution, they will have to actually automate a real browser (or at least a real JavaScript interpreter), and then you are really jacking up the cost of scamming. Plus, with a real browser, you can do tricks like those suggested elsewhere in this thread, like timing the keystrokes of each entry and looking for other suspicious behaviors.
So for anyone you know you've seen before (a common IP, session, cookie, etc.) you have a way to make each request a little more expensive. That means the scammers will want to always present you with your hardest case - a brand-new computer/browser/IP combo that you've never seen before. But by putting some extra work into being able to even know if they have the bot working right, you force them to waste a lot of these precious resources. Although they may really have an infinite number, generating them is not without cost, and again you are driving up the cost part of their ROI equation. Eventually, it'll be more profitable for them to just do what you want :)
Hope that's helpful,
Eric
Use hashcash.
Hashcash is a denial-of-service countermeasure tool. Its main current use is to help hashcash users avoid losing email due to content-based and blacklist-based anti-spam systems.
Why not make the content the CAPTCHA?
On the page where you display the prize, always have an image file in the same location with the same name. When a bag o' crap sale is on, dynamically generate and load an image with the text etc. advertising the prize; when no sale is on, just have some default image that integrates well with the site. It seems like the same concept as a CAPTCHA... if the bots cannot figure out the meaning of the image, they will not be able to "win" it; if they can, they would have been able to figure out your CAPTCHA images anyway.
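As a rough sketch of what that dynamic image might look like with PHP's GD extension (sale_is_on() is a stand-in for whatever check you already have):

<?php
// offer-image.php - always referenced by the same static-looking URL.
function sale_is_on(): bool
{
    return false; // stand-in: replace with your real sale check
}

header('Content-Type: image/png');
$img = imagecreatetruecolor(400, 60);
$bg = imagecolorallocate($img, 255, 255, 255);
$fg = imagecolorallocate($img, 30, 30, 30);
imagefilledrectangle($img, 0, 0, 399, 59, $bg);

if (sale_is_on()) {
    imagestring($img, 5, 10, 20, 'Random Crap - $1.00 - Buy now!', $fg);
} else {
    imagestring($img, 5, 10, 20, 'Nothing to see here today.', $fg);
}

imagepng($img);
imagedestroy($img);
?>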
I don't know if this has been suggested yet, but rather than keeping a list of the bots' IPs, which you would need to scan through on every single page request, why not set a cookie or a session var to keep track of the bots? Here's an example in PHP:
<?php
session_start();
// bot check
$now = microtime(true);
// bot counter var
$botCounter = 0;
if (array_key_exists('botCheck_panicCounter', $_SESSION))
{
$botCounter = $_SESSION['botCheck_panicCounter'];
}
// if this seems to be a bot
if ($botCounter > 5)
{
die('Die()!!');
}
// if this user visited before
if (array_key_exists('botCheck_lastVisit', $_SESSION))
{
$lastVisit = $_SESSION['botCheck_lastVisit'];
$diff = $now - $lastVisit;
// if the last visit was less than a second ago
if ($diff < 1)
{
// increase the bot counter
$botCounter += 1;
// and save it in the session
$_SESSION['botCheck_panicCounter'] = $botCounter;
}
}
// set the var for future use
$_SESSION['botCheck_lastVisit'] = $now;
// ---------------
// rest of the content goes here
?>
I didn't check for syntax errors, but you get the idea.
First of all, don't try to use technology to defeat technology.
Your issues:
- Usability of the site
- Keeping the site exciting and fun
- Load on server caused by scripters.
Your Goals:
- Keep the site running at a speed not slowed by bots.
- Sell the item to non-scripting humans.
- Don't hassle the 'normal' users with any tasks to complete to prove they're human.
Goal #1: Keep the site running at a speed not slowed by bots.
This is actually pretty simple. Have someone else host the page. Instead of the front page being hosted on your servers, have Amazon S3 / Akamai host the page. Most of the page is 'static' anyhow. Regenerate the page every 5 minutes or so, so the more dynamic items get refreshed. (Hell, regenerate it every 1 minute if you want.) But now the bots are not hitting your server - they are hitting Akamai's CDN, which can certainly take the load.
Of course, do this for RSS feeds as well. There is no reason why some other service can't take the bandwidth / load hit for you. On a related note, have all images served by Akamai, etc. Why take the hit?
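The regeneration step could be as simple as a cron job like the following sketch (the render function and paths are invented for illustration; pushing the file to S3/Akamai would use whatever tooling you already have):

<?php
// regenerate.php - run from cron, e.g.: */5 * * * * php regenerate.php
// Renders the dynamic front page once and writes it as a static file,
// which the CDN then serves to everyone - bots included.
function render_front_page(): string
{
    // Stand-in: pull the current sale from your DB and build the HTML.
    $sale = 'No sale right now';
    return "<html><body><h1>woot!</h1><p>$sale</p></body></html>";
}

// Write atomically so the CDN origin never sees a half-written file.
$tmp = '/var/www/static/index.html.tmp';
file_put_contents($tmp, render_front_page());
rename($tmp, '/var/www/static/index.html');
?>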
Goal #2: Sell the item to non-scripting humans
I am in agreement with others who say to make it so that scripting gives no real advantage. However, scripting is also a sign of a passionate woot customer, so you don't want to be an a*hole either.
So I would say let them buy, but make them pay an inflated amount, or (more preferably) just slow them down so that others have a chance.
So each time a user hits the site, offer the bag of crap at $29.99 and have a timer that drops or raises the price at a random speed. Have an image or some other indicator that tells humans whether the price will go lower if they are patient.
The user has a "Buy now!" button that they click when they see the price/# of items being what they want.
Example:
User:
- 0 sec $29.99 (1 item) Image says:"Wait for a lower price!"
- 7 sec $31.99 (1 item) Image says:"Wait for a lower price!"
- 13 sec $27.99 (1 item) Image says:"Bet you can do better!"
- 16 sec $1.99 (0 item) Image says:"You would be nuts to pay us something for nothing!"
- 21 sec $4.99 (two items) Image says:"That's getting better!"
- 24 sec $4.99 (tres itemos) Image says:"It doesn't get any better than that!"
- 26 sec $8.99 (2 items) Image says:"Bet you can do better!"
repeat....
on a gradually tightening cycle that will lengthen the time the correct "$4.99 (tres itemos)" is displayed
If the bot hits refresh, the cycle restarts. If the user misses and selects the wrong # of items / price -- decide if you want to let them buy at that price.
If they "overspend" - for example, they pay $24.99 for 3 items and woot was only going to charge them $4.99 for 3 items - then include a coupon for $20 off their next woot purchase.
Goal #3: Don't hassle the 'normal' users with any tasks to complete to prove they're human.
You are making a logical fallacy here. You are assuming that any Turing test (http://en.wikipedia.org/wiki/Turing_test) has to be irritating. This is not true!
Here are some ideas:
- Create a game. The reward for playing the game is a $5 off coupon on the next order.
- Pair up 2 random users and have them chat with each other. Each user is told to ask the other user 2 questions: "What color is your hair?" and "What are you going to do next weekend?" Some users get paired with a woot random sentence generator. Each user is then asked if the other user is a human. If a user says the woot random sentence generator is human, then reply "No I am not, and maybe you are from Mars as well. Do you want to try again?"
- Simple flash game that requires the user to maneuver through an obstacle course to get a discount coupon.
- Ask what city they are in. Then reverse geo-code the IP address to see if they are close to being correct.
- Ask silly questions - "Do you think John McCain is a great president?" "Whose picture is on your driver's license?"
Only ask 3 times, since all you really want to do is slow down the script kiddies.
I agree with the poster above who suggested sometimes selling really 'crap' bags of crap.
You appear to have come up with a business model which is severely limited by the technology through which you are trying to deliver it. Yet like most tech-minded individuals (not a criticism; after all, that is what this site is for) you are trying to come up with a technical solution. BUT THIS IS A BUSINESS PROBLEM. It is being caused by a failure in the technology, but that does not mean that technology is the answer. And almost all the solutions that anyone comes up with (and there will be many options) will in the end be bypassed by those determined to 'auto-buy' (for want of a better short description) your 'bags of crap'.
IMHO you are asking the wrong people the wrong question, and you are going to waste a lot of time and resources on the wrong solution.
I'm in agreement with the OP here - no CAPTCHAs please - it's not a very woot way of doing things.
Firstly, set a few bot traps. I'd mention BOC more often on the home page to trap the bots into looking, as bots aren't intelligent - wording it differently each time, e.g. "BOC complaints up!" - so bots just scanning for keywords will get trapped.
However, I think the real issue here is twofold. Firstly, the performance issues that you have need to be addressed; today it's bots causing a problem, but it indicates to me that there is an underlying performance issue to be fixed.
Secondly, it's a business opportunity to shift some real crap at a profit. So I'd keep with the overall woot style and state: "we check for bots. If we think you are a bot, you will get a box of botcrap."
The bot checking would be done offline sometime after the sale has been made, using bot traps, IP numbers, cookies, sessions, browser strings etc. Do some serious analysis on the purchaser data that you've got to decide who gets botcrap. If you decide to ship botcrap - then you can free up some normal crap to sell to someone else.
Some ideas:
Simple: don't name it "Random Crap." Change the name of the item every time so that the bots will have a harder time identifying it. They may still look for the $1.00 items, in which case I suggest occasionally selling $1 sticks of gum for a few minutes. The $5 shipping should make it worth your while.
Harder: don't make the users do anything extra - make the users' computers do something extra. Write a JavaScript function that performs an intensive calculation taking a good amount of processing power - say, finding the ten-millionth prime number - and have the user's computer calculate that value and pass it back before you accept the order (perhaps even to create the "place order" URL). Change the function for every BoC so that bots can't pre-calculate and cache results (but so that you can). The calculation overhead might just slow down the bots enough to keep them off your backs - if nothing else, it would slow the hits on your servers so that they could breathe. You could also vary the depth of the calculation - ten-millionth prime versus hundred-millionth - at random, so that the ordering process is no longer strictly first-come, first-served, and to avoid penalizing customers with slower computers.
- E
Upfront caveats:
I'm not script-literate; I haven't read many of the other comments here.
I stumbled on this from the Woot description this morning. I thought a few comments from a moderate user of the woot sites (and two-time manual purchaser of BOCs) might be helpful.
Woot is in a unique position where it is both a commerce site and a destination with loyal users, and I understand the perceived delicacy of that balance. But personally I feel your concern about the "negative user impact" of a Crap-CAPTCHA ("CRAPCHA" - somehow I doubt I'm the first to make that gag) on users is way overstated. As a user I'd be happy to prove I'm human. And I trust Woot to make the process fun and interesting, integrating it into the overall experience.
Will this lead to the "arms race" posited? I dunno, but it can only help. If, say, key information needed to purchase is included in the product image or implied in the product description (in a different way each time), about the best a script could do would be to open a purchase page on detection of the C-word. Actually, I think this is fine: you are still required to be online, and first-come-first-served still applies -- Wootalyzer and similar tools just increase awareness rather than automating purchase while I sleep or work.
Good luck figuring this out, and keep up the good work.
JGM
How about selling RSA keys to each user :) Hey, if they can do it for WoW, you guys should be able to do it.
I expect a BoC for my answer ;)
Two solutions, one high-tech, one low-tech.
First the high-tech: the BOC offerings sell out in seconds because bots get many of them in the first few milliseconds. So instead of trying to defeat the bots, sell them what they are scanning for: a bag of crap. Worthless crap, of course: bent paper clips and defiled photos of Rosie O'Donnell. Then have built-in random delays on the server of a few seconds at a time. As the sale continues, the actual value of the product sold will increase while the sell price does not. That way the first buyers (bots in the first few milliseconds) will get something worth much less than what they paid (brown onion cakes?), the next buyers (slower bots or faster humans) will get something unspectacular but worth the purchase price (bought on consignment?), and the last buyers (almost all humans) will get something worth more than the purchase price (break out the champagne?). That flat-screen TV might be in the very last BOC purchased.
Anyone who waits too long will miss out, but at the same time anyone who buys too quickly will get hosed. The trick is to wait for some amount of time... but not too much. There's some luck involved, which is as it should be.
The low-tech solution would be to change up the name of the BOC to something humans can interpret but bots can't. Wineskin of excrement? Sack containing smelliness? Topologically flat surface adjacent to assorted goods? Never use the same name twice, use marginally different pictures, and explain in the product description what is actually being sold.
I probably don't understand the problem fully, but this idea occurred to me: use AJAX to draw and update the dynamic content at a fixed interval, while making the full page deliberately slow to load via refresh.
For example, make the whole page take a full 15 seconds to draw the first time it is visited, after which dynamic content is automatically refreshed using AJAX after a set time of, say, 5 seconds. It would be a major disadvantage to do a full page reload. The page may regularly display new information (including ads), but a full page redraw using reload would be considerably slower.
It will be possible for script kiddies to figure out the AJAX query and automate it, but then it would also be very easy to rate-limit those requests from the same IP. Since there is no typical way for a standard human user to initiate those requests from the browser, it would be obvious that high-rate requests to the AJAX URL from the same IP were initiated by some form of automated system.
Instead of blocking suspected IPs, it may be effective to reduce the amount of data you give to an address as its hits/min goes up. So if the bot hits you more often than a secret, randomly changing threshold, it will not see the data. Logged-in users would always see the data. Logged-in users that hit the server too often would be forced to re-authenticate, or be given a captcha.
The solution to this may be to attach a little bit of client-side processing to the actions of logging in and buying. The processing can be a negligible amount, so that individuals are not affected, but bots attempting to do the tasks many times will be hampered by the extra workload.
The processing can be a simple equation solved in JavaScript - unless you don't want to require JavaScript on your site.
Hmm, I remember reading "Linux Firewalls: Attack Detection and Response". The situations there seem very comparable, and someone else has suggested this too: just block a client temporarily, or in progressive steps, to throttle them down. If it's really coming from just a few sites, this should be quite efficient.
Regards
Use JavaScript to dynamically write the info into the page. Without a JS rendering engine, surely the screen-scrapers & bots won't be able to read the information.
The method I will describe has two requirements: 1) JavaScript is enforced, and 2) the client is a web browser with a valid browser session (http://msdn.microsoft.com/en-us/library/bb894287.aspx).
Without either of these you are "by design" out of luck. The internet is built by design to allow anonymous clients to view content. There is no way around this with simple HTML. Oh, and I just wanted to say that simple, image-based CAPTCHA can be defeated easily; even the authors admit to this.
Moving along to the problem and the solution. The problem is in two parts. The first is that you cannot block out an individual for "doing bad things". To fix this, you set up a method that takes in the browser's valid session, generates an md5sum + salt + hash (of your own devising), and sends it back to the browser. The browser then is REQUIRED to return that hashed key during every post / get. If you never get a valid browser session, you reply back with "Please use a valid web browser blah blah blah". All popular browsers have valid browser session IDs.
Now that we have an identity, at least for that browser session (I know it does not lock anyone out permanently, but it is quite difficult to "renew" a browser session through simple scripting), we can effectively lock out a session (i.e., make it annoyingly hard for scripters to actually visit your site, with no penalty to valid users).
Now, this next part is why it requires JavaScript. On the client you build a simple hash for each character that comes from the keyboard versus the value of the text in the textarea. That valid key comes over to the server as a simple hash and has to be validated. While this method could easily be reverse engineered, it does add one extra hoop that individuals have to go through before they can submit data. Mind you, this only prevents auto-posting of data, not DOS with constant visits to the web site. If you have access to ajax, there is even a way to send a salt and hash key across the wire and use JavaScript with it to build the onkeypress characters' "valid token" that gets sent across the wire. Yes, like I said, it could easily be reverse engineered, but you see where I am going with this, hopefully.
Now, to prevent constant abuse via traffic: there are ways to establish patterns once you have a valid session ID. These patterns (even if randomness is used to offset request times) have a lower epsilon than if, say, a human were attempting to reproduce that same margin of error. Since you have a session ID, and you have a pattern that "appears to be a bot", you can block out that session with a simple lightweight response that is 20 bytes instead of 200000 bytes.
You see, the goal here is to 1) make the anonymous non-anonymous (even if it's only per session) and 2) develop a method to identify bots vs. normal people by establishing patterns in the way they use your system. You can't say that the latter is impossible, because I have done it before. While my implementations were for tracking video game bots, I would think that those algorithms for identifying a bot vs. a user can be generalized to web site visits. If you reduce the traffic that the bots consume, you reduce the load on your system. Mind you, this still does not prevent DOS attacks, but it does reduce the amount of strain a bot produces on the system.
I think that sandboxing certain IPs is worth looking into. Once an IP has gone over a threshold, when they hit your site, redirect them to a webserver that has a multi-second delay before serving out a file. I've written Linux servers that can handle 50K open connections with hardly any CPU, so it wouldn't be too hard to slow down a very large number of bots. All the server would need to do is hold the connection open for N seconds before acting as a proxy to your regular site. This would still let regular users use the site even if they were really aggressive, just with a slightly degraded experience.
You can use memcached as described here to cheaply track the number of hits per IP.
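A cheap version of that per-IP counter, sketched with PHP's Memcached extension (the thresholds and the delay curve are invented for illustration):

<?php
$mc = new Memcached();
$mc->addServer('localhost', 11211);

$key = 'hits:' . $_SERVER['REMOTE_ADDR'];

// Create the counter with a 60-second window if it doesn't exist yet,
// then bump it for this request.
$mc->add($key, 0, 60);
$hits = $mc->increment($key);

if ($hits > 600) {
    // Sustained ~10 req/sec: almost certainly a bot; cut them off.
    header('HTTP/1.1 503 Service Unavailable');
    exit;
} elseif ($hits > 60) {
    // Suspicious: slow each response down by 50 ms per hit over the limit.
    usleep(($hits - 60) * 50000);
}
?>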
To solve the first problem of the bots slamming your front page, try making the honeypot exactly the same as a real bag of crap. Make the HTML markup for the front page include the same markup as if it were for a bag of crap, but make it hidden. This would force the bots to include CSS engines to determine whether the bag of crap code is displayed or hidden. Alternatively, you could only output this 'fake' bag of crap HTML a random amount of time (hours?) before a real bag of crap goes up. This would cause the bots to sound the alarm too soon (but not know how soon).
To cover the second step of actually purchasing the bag of crap, add simple questions. I prefer common-sense questions to the math questions suggested above. Things like "Is ice hot or cold?" or "Are ants big or small?" Of course, these would need to be randomized and pulled from a never-ending supply of questions, or else the bots could be programmed to answer them. These questions, though, are still much less of an annoyance than CAPTCHAs.
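A sketch of how the question pool might work in PHP (the three questions here are just placeholders for that never-ending supply):

<?php
session_start();

// Stand-in pool; in practice this would be a large, growing database
// so bots can't simply enumerate the answers.
$questions = array(
    'Is ice hot or cold?' => 'cold',
    'Are ants big or small?' => 'small',
    'Is the sun bright or dark?' => 'bright',
);

// Pick one at random and remember the expected answer.
$q = array_rand($questions);
$_SESSION['expected_answer'] = $questions[$q];

echo '<label>' . htmlspecialchars($q) . ' <input name="answer"></label>';

// On submission, compare case-insensitively:
// strtolower(trim($_POST['answer'])) === $_SESSION['expected_answer']
?>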
What about using Flash?
Yes, I know the overhead of using Flash, plus the fact that some users will be locked out of buying the bag-o-crap (i.e. iPhone users), might make this detrimental, but it seems to me that Flash would prevent screen-scraping, or at least make it difficult.
Am I wrong?
Edited to add
What about including a couple of "hidden" fields on your submission form, like what I found below:
Actually, best practice seems to be to use two hidden fields, one with an initial value and one without. It's the rare bot which can ignore both fields. Check for one field to be blank, and for the other to have the initial value. And hide them using CSS, not by making them "hidden" fields:
.important { display: none; }
Please don't change the next two fields.
Bots tend to like fields with names like 'address'. The text in the paragraph is for those few rare human beings who have a non-CSS-capable browser. If you're not worried about them, you can leave it out.
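The surrounding markup might look something like this - the field names address2/address3 and the "xyzzy" value are taken from the check below, and the paragraph is the non-CSS warning text mentioned above; the rest is a guess at the layout:

<p class="important">Please don't change the next two fields.</p>
<input class="important" type="text" name="address2" value="xyzzy">
<input class="important" type="text" name="address3" value="">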
In the logic for processing the form, you'd do something like:
if (address2 == "xyzzy" and address3 == "") { /* OK to send */ } else { /* probably have a bot */ }
- Go after the money stream. It is much easier than tracking the IP side. Making bots pay too much a few times (via an announcement with white text on a white background, and all variants of that trick) kills their business case quickly. You should prepare this carefully, and make good use of the strong point of bots: their speed. Did you try a few thousand fake announcements a few seconds apart? If they are hitting ten times/second you can go even faster. You want to keep this up as long as they keep buying, so think carefully about the moment of the day/week you want to start this. Ideally, they will stop paying, so you can hand over your case to a bank.
- Make sure your site is fully generated, and each page access returns different page content (html, javascript and css). Parsing is more difficult than generating, and it is easy to build in more variation than bot developers can handle. Keep on changing the content and how you generate it.
- You need to know how fast bots can adapt to changes you make, and preferably the timezone they are in. Is it one botnet or more, are they in the same timezone, a different one, or is it a worldwide developer network? You want your counterattack to be timed right.
- Current state-of-the-art bots have humans enter captchas (offered in exchange for porn/games).
- Make it unattractive to react very fast.
- Use hashes and honeypots, as Ned Batchelder explains.
[edit] It is simply not true that you cannot defend against botnets. My second suggestion in particular provides an adequate defense against automated buyers. It requires a complete rethinking of the technology you're using, though. You might want to do some experiments with Seaside, or alternatively directly in C.
Assumed non-negotiables:
The first screen needs to be dead simple, low-overhead HTML, with a single easily identifiable (bot-wise or people-wise) button to click, or equivalent, to indicate unambiguously "I want my Crap". Because we assume the worst case - you have the equivalent of a DOS attack from a combination of bots and non-bots, all indistinguishable at first click on the site. So let's hand these out as quickly as we can from caches, benign echobots, etc.
(Note: as far as wooters are concerned, this is what happens anyway; it's just as painful for users as for Woot, so anything that helps absorb or mitigate the first-screen acquisition is in the interests of all 3 parties involved.)
Then, the process needs to be no more aggravating for non-bots than it currently is, with no additional steps (or pain) for legits. (Background note on the current design: current wooters usually will already be signed on, or can sign on during the purchase process. New buyers need to register during purchase. So it's practically quicker to be already registered, and quicker yet to already be logged on.)
To complete the crap sale, a progression of transaction screens needs to be navigated (say 5, plus or minus, depending on circumstances). The winners are the first who complete the full navigation. The current process rewards the bots (or anyone else) who complete the entire sequence of 5 screens most quickly; the entire progression is biased toward fast responses (i.e. bots).
No question the bots will have the advantage on the first screen; and whatever edge they have achieved at that point, they keep through the rest of the screens, plus whatever advantage botness provides at other stages as well.
What if Woot were to intentionally decouple the queuing process after the first screen, and feed every session from that point into a sequence of fixed-minimum-time steps? The second screen wouldn't even be presented until 30 seconds had passed; after it was submitted, same for the following screens. I bet wooters would have no problem if they were told that, after the first screen, they would wait in a queue (which is already true) that would spread the load over time in a way that should take no longer than before, be more robust, and help weed out the bots. At this point you can throw in some of the bot speedbumps listed above (subtle variations in DOM objects, etc.). Just the benefit from the perception that Woot is a little more in control of things would help.
If a much higher proportion of the BOC initial hits could segue into a bot-unfriendlier, non-time-critical process on their first hit (or close to it), rather than retrying, then real people who get past that point would have more confidence. For sure it would be less hostile than the current situation. It might cut down on the background-noise ambient bot rate that's going on all the time, even under normal Woot-Off circumstances. And the bots would lay off the main page and sit in the queue with each other (and everyone else) where they have no advantage.
Hmmm... The concept "apartment-threaded" comes to mind. I wonder if the pattern is approximately useful?
A useful core concept here is being able, after the first screen, to track accumulated total time in queue and be able to adjust to standard. As a bot-mitigation strategy, you would have a little bit of flexibility to maybe fudge the very earliest sessions by maybe 5-10 seconds; doing so would probably be undetectable, but would result in a richer non-bot purchase mix. I'm sure you have statistics to help evaluate stuff like this after the fact.
Just for fun, you could (at least for one wootoff) put together your own bot that combines the best features you've seen, and then hand it out to everyone the day before. Then at least everyone would be equally armed. (Then duck ... incoming ...)
I like BradC's answer (using the suggestions in Ned Batchelder's article), but I want to add another level to it. You may be able to randomize not only the field names, but also the field positions and the code that makes them invisible.
Now, this last bit is the hard part, and I don't know exactly how to do it, but someone with more JavaScript and CSS experience might be able to figure it out. Of course, you can't just keep the same positions all the time, because the scripters will just figure out that the element at position (x,y) is the real one. You would have to have some code that changes the positioning of form elements relative to other elements in order to move them off the page, overlay them on each other, etc. Then obfuscate the code that does this, with some randomness introduced into it. Automatically change the obfuscation daily, before a new item is made available. The idea is that without a proper CSS and JavaScript implementation (and code to read the layout of the page as a human would), a bot won't be able to figure out which elements are being shown to the user. Your server-side code, of course, knows which fields are real and which are fake (see the sketch after the summary below).
In summary:
- The field names are random
- The field order is random
- The field hiding code is complex
- The field hiding code is obfuscated - randomly
- The random factors are automatically changed every day by server-side code
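A minimal sketch of the server side of this in PHP, with the decoy handling simplified to one real field and one fake one (the session-based name storage and the offscreen class are assumptions for illustration):

<?php
session_start();

// Generate fresh random field names for this visitor's form.
if (!isset($_SESSION['real_field'])) {
    $_SESSION['real_field'] = 'f' . bin2hex(random_bytes(6));
    $_SESSION['fake_field'] = 'f' . bin2hex(random_bytes(6));
}
$real = $_SESSION['real_field'];
$fake = $_SESSION['fake_field'];

// Emit the two fields in random order; obfuscated CSS/JS (not shown)
// is what hides the fake one from humans.
$fields = array(
    '<input type="text" name="' . $real . '">',
    '<input type="text" name="' . $fake . '" class="offscreen">',
);
shuffle($fields);
echo implode("\n", $fields);

// On submission: the fake field must be empty, the real one filled.
// $ok = ($_POST[$fake] ?? '') === '' && ($_POST[$real] ?? '') !== '';
?>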
With the constraints you've given I don't think there is a way to avoid an "arms race" of some kind, but that doesn't mean all is lost. If you can automate your side of the arms race and the scripters cannot, then you would win it every time.
Make it unprofitable for the bot users and they'll go away pretty quickly - that is, occasionally sell something that no human being could possibly ever want (a bag of literal crap, maybe).
How about a delay page where the user must wait out a delay that is shown in an image?
You only allow ordering from the page they get to if they click within a short enough time period of that specified in the image; maybe the image could be doing a countdown within an animated gif, or a very small JavaScript or Flash timer.
If they jump to the details page outside the time limit, they see an expensive item, as discussed in previous answers.
I am not 100% sure this would work, at least not without trying it.
But it seems as if it should be possible, although technically challenging, to write a server-side HTML/CSS scrambler that takes as its input a normal HTML page + associated files, and outputs a more or less blank HTML page along with an obfuscated JavaScript file that is capable of reconstructing the page. The JavaScript couldn't just print out straightforward DOM nodes, of course... but it could spit out a complex set of overlapping, absolutely-positioned divs and paragraphs, each containing one letter, so it comes out perfectly readable.
Bots won't be able to read it unless they employ a complete rendering engine and enough AI to reconstruct what a human would be seeing.
Then, because it's an automated process, you can re-scramble the site as often as you have the computational power for - every minute, or every ten minutes, or every hour, or even every page load.
Granted, writing such an obfuscator would be difficult, and probably not worth it. But it's a thought.
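Just to make the idea tangible, here's a toy version of the per-letter trick in PHP, skipping the JavaScript indirection and emitting shuffled absolutely-positioned spans directly (a real scrambler would bury this in generated, obfuscated JS):

<?php
// Render $text as one span per character, emitted in random source order;
// the left offsets put the letters back in reading order on screen.
function scramble(string $text): string
{
    $spans = array();
    foreach (str_split($text) as $i => $ch) {
        $x = $i * 12; // 12px per character; assumes a monospace font
        $spans[] = '<span style="position:absolute;left:' . $x . 'px;top:0">'
                 . htmlspecialchars($ch) . '</span>';
    }
    shuffle($spans); // source order now tells a scraper nothing
    return '<div style="position:relative">' . implode('', $spans) . '</div>';
}

echo scramble('Random Crap - $1.00');
?>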
There are a lot of suggestions here, so pardon me if this has already been posted.
The first thing I would do is make the ordering a two-step process. The first step would pass back a GUID while logging the IP address. The second step would receive the GUID and compare it against the IP addresses that have been logged. In conjunction with blocking IP addresses which are spamming the site (i.e., faster than a human can click refresh), this technique could stop spammers from successfully making purchases, thereby solving 1 & 3.
The second item is problematic, but I would keep a running list of your regular users' IP addresses and throttle traffic for any newcomers. This could leave first-time visitors and dial-up users (due to changing IP addresses) out in the cold, but I think it's just making the best of a bad situation by giving preference to repeat business... and dial-up users, well, it's questionable whether they'd "win" even if there weren't any spammers anyway.
Why don't you block the credit cards of users you identify as bots?
- Publish that using bots is illegal on your website
- Find certain heuristics that identify bots (this can be done, for example, by short-term IP tracking or by the time it takes them to fill out the form)
- If someone you tagged as a bot purchased the item, block his credit card for future use
- Next time he tries to make a purchase, disallow it and return the item to stock
I guess even the professionals will run out of credit cards eventually.
Your server load should decrease with time once the botters give up on you. Another idea is to separate your pages between servers - e.g., RSS feed on one server, homepage on another, checkout on another one.
Good luck.
I'm pretty sure your server already logs all the IPs of incoming requests (most do) - so the data is already there.
Maybe you could:
Just validate the "winner" by verifying that their IP shows up fewer than a certain threshold number of times in the logs (I use "grep | wc -l" to get the count). If it's over your threshold, temporarily block that IP (for an hour or so?).
Disqualify any "winner" with the same shipping address or payment info as the "last" winner, or who has won within a certain time frame, to spread the "winning" around.
The bots won't get 'em all that way.
To annoy the crap out of the scrapers: when the "random crap" item goes up, run the HTML output for that page through a "code obfuscator"... which doesn't change the "display" of the page... it just scrambles the code with randomly generated IDs etc.
More insidious:
Increase the price charged for the "won" item based on how many times the winning IP shows up in the logs. Then even if the bots win, so do you. :-)
Trying to target the bots themselves will never solve the problem - whoever is writing them will figure out a new way around whatever you've put in place. However, forcing the user to think before buying would be a much more effective solution. The best way of doing this that I can think of is to run a Dutch auction. Start the price high (2x what you can buy it for in a shop) and decrease it over time. The first person to hit buy gets it. I don't think any bot is intelligent enough to work out what the best price is for the item.
Restrict the times at which you release offers. For example: only from 7 minutes to 8 minutes past the start of an hour. Do not deviate from this, and give penalties on the order of a couple of seconds to IPs which check a lot in the half hour before the release time. It then becomes advantageous for bot owners to only screen-scrape for a couple of minutes every hour instead of all. the. time. Also, because a normal person can check a site once every hour but not every second, you put normal people on a much more even footing with the bots.
Cookies: use a tracking cookie composed of only a unique ID (a key for a database table). Give "release delays" to clients with no cookie, invalid cookies, clients which use the same cookie from a new IP, or cookies used with high frequency.
Identify likely bots: the cookies will cause the bots to request multiple cookies for each IP they control, which is behavior that can be tracked. IPs with only a single issued cookie are most likely normal clients. IPs with many issued cookies are either large NAT-ed networks or a bot. I'm not sure how you would distinguish those, but companies are probably more likely to have things like DNS servers, a web page, and things of that nature.
Perhaps you need a solution that makes it totally impossible for a bot to distinguish between the bag-o-crap sales and all other content.
This is sort of a variation on the captcha theme, but instead of the user authenticating themselves by solving a captcha, the captcha is instead the description of the sale, rendered in a visually pleasing (but perhaps somewhat obscured by the background) manner.
I think your best bet is to watch the IPs coming in, but to mitigate the issues you mention in a couple of ways. First, use a probabilistic hash (e.g., a Bloom filter) to mark IPs which have been seen before. This class of algorithm is very fast, and scales well to absolutely massive set sizes. Second, use a graduated response, whereby a server delay is added to each request, predicated on how much you've seen the IP 'recently'.
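A bare-bones Bloom filter over IPs might look like the following in PHP (the size and hash count are illustrative; a production filter would be tuned to your traffic and live in shared memory rather than a per-request string):

<?php
class BloomFilter
{
    private $bits;
    private $size;
    private $hashes;

    public function __construct(int $size = 1048576, int $hashes = 4)
    {
        $this->bits = str_repeat("\0", $size >> 3); // one bit per slot
        $this->size = $size;
        $this->hashes = $hashes;
    }

    private function positions(string $item): array
    {
        $pos = array();
        for ($i = 0; $i < $this->hashes; $i++) {
            $pos[] = crc32($i . ':' . $item) % $this->size;
        }
        return $pos;
    }

    public function add(string $item): void
    {
        foreach ($this->positions($item) as $p) {
            $byte = $p >> 3;
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p & 7)));
        }
    }

    public function probablyContains(string $item): bool
    {
        foreach ($this->positions($item) as $p) {
            if ((ord($this->bits[$p >> 3]) & (1 << ($p & 7))) === 0) {
                return false; // definitely never seen
            }
        }
        return true; // probably seen before (false positives possible)
    }
}

$seen = new BloomFilter();
$seen->add($_SERVER['REMOTE_ADDR']);
// if probablyContains() returns true later, apply the graduated delay above
?>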
At the expense of usability for those with screen readers, you could, on 90% of the pages, use unlabelled, unannotated picture buttons. Rotate the pictures regularly, and use a random generator and random sorting to lay out two buttons that say "I want this" and "I am a bot". Place them side by side in a different order each time. At each stage a user can make progress towards their target, but a bot is more likely to make a mistake (a 50% chance at each step, compounding with every step). It's like a captcha at every stage, only easier for the user and slower for the bots, who need to prompt their master at EVERY single step. Put the price, the confirm button, and the item description in pictures. It sucks, but it's likely to be more successful.
Just make the bots compete on even ground. Encrypt a timestamp and stick it in a hidden form field. When you get a submission, decrypt it and see how much time has passed. If the elapsed time beats the threshold of human typing ability, reject it. Now bots and humans can only try to buy the bag of crap at the same speed.
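In PHP that could be sketched with openssl, roughly like this (the key handling is simplified and the 5-second minimum is an arbitrary placeholder):

<?php
// Keep this secret on the server; never send it to the client.
$key = 'replace-with-a-long-random-secret';

// When rendering the form: embed an encrypted issue-time.
function stamp_form(string $key): string
{
    $iv = random_bytes(16);
    $cipher = openssl_encrypt((string)time(), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    $token = base64_encode($iv . $cipher);
    return '<input type="hidden" name="ts" value="' . $token . '">';
}

// When processing the submission: reject superhuman fill-in times.
function human_speed(string $token, string $key, int $minSeconds = 5): bool
{
    $raw = base64_decode($token, true);
    if ($raw === false || strlen($raw) <= 16) {
        return false;
    }
    $iv = substr($raw, 0, 16);
    $issued = openssl_decrypt(substr($raw, 16), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    return $issued !== false && (time() - (int)$issued) >= $minSeconds;
}
?>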
If you can't beat them... Change the rules!
Why not provide a better system than the one the scripters have made for themselves?
Modify your site to be fairer for people not using bot scripts. People register (with a CAPTCHA or email verification) and effectively enter a lottery competition to win!
'Winning' makes it more fun, and each person pays a small entry fee, so the winner gets the product for EVEN less.
I'm not a web developer, so take this with a pinch of salt, but here's my suggestion -
Each user has a cookie (containing a random string of data) that determines whether they see the current crap sale.
(If you don't have a cookie, you don't see them. So users who don't enable cookies never see crap sales; and a new user will never see them the first time they view the page, but will thereafter.)
Each time the user refreshes the website, he passes his current cookie to the server, and the server uses that to decide whether to give him a new cookie or leave the current one unchanged; and based on that, decides whether to show the page with or without the crap sale.
To keep things simple on the server side, you could say that at any given time there's only ever one cookie that will let you see crap sales; and there are a couple of other cookies that are labelled "generated in the last 2 seconds", which will always be kept unchanged. So if you refresh the page faster than that, you can't get a new one.
(...ah, well, I guess that doesn't stop a bot from restoring an older cookie and passing it back to you. Still, maybe there's a solution here somewhere.)
Stopping all bots would be quite difficult, especially without using a CAPTCHA. I think you should approach this from the standpoint of implementing a wide variety of measures to make life harder for the scripters.
I believe this is one measure that would weed out some of them:
You could try randomizing the IDs and class names of your tags with each response. This would force bots to rely on the position and context of important tags, which requires a more sophisticated bot. Furthermore, you could randomize the position of the tags if you want to use relative or absolute positioning in your CSS.
The biggest drawback with this approach is that you would have to take steps to ensure your CSS file is not cached client-side, because it would of course need to contain the randomized IDs & class names. One way to overcome this is to not use external CSS files and instead put the CSS with the randomized selectors in the <head> section of the page. This would allow the randomized CSS to be cached client-side along with the rest of the page.
Steps:
(combining ideas from another poster and gif spammers)
Display the entire offer page as an image, ad-copy and all.
Encrypt the price in the URL.
Attacks:
Bots going to the URL to view the price on the checkout page:
- turn the checkout price tag into an image, or
- apply a captcha before users can go to the order page.
Chewing up bandwidth:
- Serve special offers using images, normal offers using HTML.
Reckless bot ordering:
- Some of the special "image" offers are actually at normal prices.
RSS Scraping
RSS feeds must be paid for with hashcash or captchas.
This has to be on a per-request basis.
It can be pre-paid; for instance, a user can enter 20 captchas for 200 RSS lookups.
Once the threat of DDOS has been mitigated, you can implement e-mail notification of offers.
How about coming up with a way to identify bots, probably IP-based, but not blocking them from accessing the site - just don't allow them to actually buy anything. That is, if they buy, they don't actually get it, since bots are against the terms of use.
The problem with a CAPTCHA is that when you see a crap sale on Woot, you have to act VERY fast as a consumer if you hope to receive your bag of crap. So, if you are going to use a form of CAPTCHA, it must be very quick for the customer.
What if you had a large image, say 600 x 600, that was just a white background with dots of different colors or patterns randomly placed on it? The image would have an image map on it, with a link mapped to small chunks of the image - say, 10 x 10 blocks. The user would simply have to click on a specific type of dot. It would be quick for the end user, and it would be somewhat difficult for a bot developer to code. But this alone may not be that difficult for a good bot creator to get past, so I would add ciphered URLs.
I was developing a system some time back that would cipher URLs. If every URL on these pages is ciphered with a random IV, then they all appear unique to the bot. I was designing this to confuse probing bots. I have not completed the technique yet, but I did have a small site coded that functioned in this manner.
While these suggestions are not a full solution, they would make it way harder to build a working bot while still being easy for a human to use.
There's probably no good solution as long as the surprise distribution of the bag o' crap is tied only to a point in time - since bots have plenty of time, and the resources, to keep slamming the site at short intervals.
I think you'd have to add an extra criterion that bots can't screen-scrape or manipulate from their end. For instance, say at any given time there are 5000 humans hitting the page a few times a minute looking for the bag of crap, and 50 bots slamming it every second. In the first few seconds after it appears, the 50 bots are going to snap it all up.
So, you could add a condition that the crap appears first only to users where the modulus 30 of their integer IP equals a random number, say 17. Maybe another random number is added every second, so the crap is revealed incrementally to all clients over 30 seconds.
Now imagine what happens in the first several seconds: currently, all 50 bots are able to snap up all the crap immediately, and the humans get 0. Under this scheme, after 6 seconds only 10 bots have made it through, while 1000 humans have gotten through, and most of the crap goes to the humans. You could play with the timings and the random modulus to try and optimize that interval, depending on user counts and units available.
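The gating check itself is tiny; a sketch in PHP, with the per-second random additions simplified to a straight 30-second rotation ($saleStart, the unix time the sale began, is assumed to come from your sale record):

<?php
// Reveal the sale to IP buckets incrementally over 30 seconds.
function can_see_sale(string $ip, int $saleStart): bool
{
    $bucket = (ip2long($ip) & 0x7FFFFFFF) % 30; // 0..29, fixed per IP
    $elapsed = time() - $saleStart;             // seconds since launch
    $offset = $saleStart % 30;                  // stand-in for a per-sale random draw
    // A bucket is admitted once its slot in the rotation comes up.
    return $elapsed >= (($bucket - $offset + 30) % 30);
}

// $saleStart comes from your sale record.
if (can_see_sale($_SERVER['REMOTE_ADDR'], $saleStart)) {
    // show the bag of crap
}
?>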
Not a perfect solution, but an improvement. The upside is that many more humans than bots will benefit. There are several downsides, mainly that not every human gets an equal shot at the crap on any particular day - though they don't have much of a shot now, and I'd guess even without bots most of them get shut out at random, unless they happen to refresh at just the right second. And it wouldn't work against a botnet with lots of distributed IPs. Dunno if anyone's really using a botnet just for woot crap, though.
Your end goal is to spread sales out to a larger user base who gets to buy stuff.
What if you did something like releasing your bags of w00t over a period of an hour or two, and over a range of IP addresses, instead of releasing them all at the same time and to any IP address?
Let's say you have 255 bags of w00t. 1.0.0.0 can buy in the first minute, 2.0.0.0 can buy in the second minute (potentially 2 bags of w00t available), etc., etc.
Then, after 255 minutes, you have made bags of w00t available to everybody, although it is highly likely that not all 255 bags of w00t are left.
This limits a true attack to users who have >255 computers, although a bot user might be able to "own" the bag of w00t assigned to their IP range.
There is no requirement that you match up bags to IPs fairly (and you definitely should use some type of MD5 / random seed thing)... if you distribute 10 bags of w00t incrementally, you just have to make sure that they get distributed ~evenly~ across your population.
If IPs are a bad fit, you can use cookies instead, and exclude the use case where a non-cookied user gets offered a bag of w00t.
If you notice that a particular IP, cookie, or address range has an extreme amount of traffic, make the bag of w00t available to them proportionally later / last, so that occasional / steady / slow visitors are given opportunities before heavy / rapid / probable bot users.
--Robert
I would recommend a firewall-based solution. Netfilter/iptables, like most firewalls, allows you to set a limit on the maximum number of new page requests per unit time.
For example, to limit the number of page views dispensed to something human - say, 6 requests every 30 seconds - you could issue the following rules:
# create a chain for flagging abusive clients
iptables -N BADGUY
# any packet sent to BADGUY gets its source address recorded in the 'badguys' list
iptables -t filter -I BADGUY -m recent --set --name badguys
# record every new inbound HTTP connection in the 'http' list
iptables -A INPUT -p tcp --dport http -m state --state NEW -m recent --name http --set
# 6 or more new connections within 30 seconds: flag the source as a bad guy
iptables -A INPUT -p tcp --dport http -m state --state NEW -m recent --name http --rcheck --seconds 30 --hitcount 6 -j BADGUY
# 2 or more new connections within 3 seconds: drop the packet outright
iptables -A INPUT -p tcp --dport http -m state --state NEW -m recent --name http --rcheck --seconds 3 --hitcount 2 -j DROP
Note that this limit would apply to each visitor independently, so one user's misuse of the site wouldn't affect any other visitor.
Hope this helps!
You could reduce the load on your server by having the RSS and HTML update at the same time, so there's no incentive for the bots to screen-scrape your site. Of course, this gives the bots an advantage in buying your gear.
If you only accept payments via credit card (might be the case, might not be, but it shows my line of thinking), only allow a user to buy a BOC once every 10 sales with the same account and/or credit card. It's easy for a script kiddie to get a swarm of IPs; less easy for them to get a whole heap of credit cards together. And as you've said, IPs are really hard to ban, while temporary bans on credit cards should be a walk in the park.
You could let everyone know what the limit is, or you could just tell them that, because of the high demand and/or bot interest, there's throttling implemented on purchasing, while being unspecific about the mechanism.
Each attempt to purchase during the throttling period could trigger an exponential backoff - you buy a BOC, you have to wait for 10 sales to pass before you try again. You try again anyway on the next sale, and now you have to wait 20 sales, then 40, then 80...
This is only really useful if it's really unlikely that a human user would manage to get a BOC twice in less than 10 sales. Tune the number as appropriate.
There are a few solutions you could adopt, based on the level of complexity you want to get into.
These are all based on IP tracking, which falls apart somewhat under botnets and cloud computing, but should thwart the vast majority of botters. The chances that Joe Random has a cloud of bots at his disposal are far lower than the chance that he's just running a Woot bot he downloaded somewhere so he can get his bag of crap.
Plain Old Throttling
At a very basic, crude level, you could throttle requests per IP per time period. Do some analysis and determine that a legitimate user will access the site no more than X times per hour. Cap requests per IP per hour at that number, and bots will have to drastically reduce their polling frequency, or they'll lock themselves out for the next 58 minutes and be completely blind. That doesn't address the bot problem by itself, but it does reduce load, and increases the chance that legitimate users will have a shot at the item.
Adaptive Throttling
A variant on that solution might be to implement a load-balancing queue, where the number of requests that one has made recently counts against one's position in the queue. That is, if you keep slamming the site, your requests become lower priority. In a high-traffic situation like the bag of crap sales, this would give legitimate users an advantage over the bots, in that they would have a higher connection priority and would be getting pages back more quickly, while the bots continue to wait and wait until traffic dies down enough that their number comes up.
End-of-the-line captcha
Third, while you don't want to bother with captchas, a captcha at the very end of the process, right before the transaction is completed, may not be a bad idea. At that point, people have committed to the sale and are likely to go through with it even with the mild added annoyance. It prevents bots from completing the sale, which means that at a minimum all they can do is hammer your site to try to alert a human about the sale as quickly as possible. That doesn't solve the problem, but it does mean that the humans have a far, far better chance of obtaining sales than the bots do currently. It's not a solution, but it is an improvement.
A combination of the above
Implement basic, generous throttling to stop the most abusive of the bots, while taking into account the potential for multiple legitimate users behind a single corporate IP. The cutoff number would be very high - you cited bots hitting your site 10x/sec, which is 36,000 requests/hour, obviously far above any legitimate usage, even for the largest corporate networks or shared IPs.
Implement the load-balancing queue so that you're penalized for taking up more than your share of server connections and bandwidth. This penalizes people in the shared corporate pools, but it doesn't prevent them from using the site, and their violation should be far less severe than the botters', so their penalization should be less severe too.
Finally, if you have exceeded some threshold for requests-per-hour (which may be far, far, far lower than the "automatically drop the connection" cutoff), require that the user validate with a captcha.
That way, users who are legitimately using the site and only make 84 requests per hour, even when they're mega-excited, don't notice any change in the site's speed at all. However, Joe Botter finds himself stuck with a dilemma. He can either:
- Blow out his request quota with his current behavior and not be able to access the site at all, or
- Request just enough to not blow the request quota, which gives him realtime information at lower traffic levels, but causes him to have massive delays between requests during high-traffic times, which severely compromises his ability to complete a sale before inventory is exhausted, or
- Request more than the average user and end up getting stuck behind a captcha, or
- Request no more than the average user, and thus have no advantage over the average user.
Only the abusive users suffer a degradation of service or an increase in complexity. Legitimate users won't notice a single change, except that they have an easier time buying their bags of crap.
Addendum
Throttle requests for unregistered users at rates far below those for registered users. That way, a bot owner would have to run their bot through an authenticated account to get past what should be a relatively restrictive throttling rate.
The inventive botters will then register multiple user IDs and use those to achieve their desired query rate; you can combat that by considering any IDs that show up from the same IP in a given period to be the same ID, subject to shared throttling.
That leaves the botter with no recourse but to run a network of bots, with one bot per IP and a registered Woot account per bot. This is, unfortunately, effectively indistinguishable from a large number of unassociated legitimate users.
You could use this strategy in conjunction with one or more of the above strategies, with the goal of providing the best service to registered users who do not engage in abusive usage patterns, while progressively penalizing other users, both registered and unregistered, according to their status (anon or registered) and the level of abuse as determined by your traffic metrics.
My first thought was that you say the bots are scraping your webpage, which suggests they are only picking up the HTML content. So you could have your order screen verify (from the http logs) that an offer-related graphic was actually loaded by the client placing the order - a bot that never fetches images would fail that check.
Develop a front-page component and shopping cart that do not run natively in the browser. If you use something like Flex/Flash or Silverlight, it is much more difficult to scrape, you have full control over the server communication, and you can thus shield the content completely from scripters.
This only needs to be a problem if the bot users are paying with invalid credit cards or something. So how about a non-technical solution?
Treat the bot users as normal users as long as their payments are valid, and make sure you have enough stock to satisfy the total demand.
Result: more sales. You're in business to make money, right?
To guarantee selling items only to non-scripted humans, could you detect inhumanly quick responses between the item being displayed on the front page and an order being made? This turns the delay tactic on its head: instead of handicapping everyone artificially through a 0.5-second delay, allow requests as fast as possible and smack the bots that are clearly superhuman. :)
There is some physical limit to how fast a user can click and make a decision, and by detecting after all the requests have gone through (as opposed to purposely slowing down all interactions), you don't affect the performance of non-scripted humans.
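A sketch of that check, assuming you can note the moment the item page was served (here in a per-user session) and compare it at order time; the 0.5-second floor is an assumed value for "physically impossible to read and click":

    import time

    MIN_HUMAN_SECONDS = 0.5   # assumed floor for reading the item and clicking

    def record_item_shown(session):
        session['item_shown_at'] = time.time()

    def order_looks_human(session):
        shown = session.get('item_shown_at')
        if shown is None:
            return False      # ordered without ever loading the page: a bot
        return time.time() - shown >= MIN_HUMAN_SECONDS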
If only using CAPTCHAs some of the time is acceptable, you could raise the threshold to fast-human (as opposed to superhuman) and require a post-confirmation CAPTCHA if someone clicks really fast, akin to how some sites require CAPTCHA confirmation if someone makes multiple posts quickly.
Sadly, I don't know of any good ways to stop screen scrapers of your product listings. :(
I'm just wondering if there might be a simple solution to this.
I assume that the message indicating the crap sale is posted in text, and this is the bit of information the scrapers look for.
What if you made the announcement using an image instead? Doing so might pose some design problems, but they could be overcome, and possibly even serve as the impetus for some ingenious creativity.
Issue #1
There would have to be some design space dedicated to an image. (Want to be really tricky? Rotate a local ad through this slot. Of course, the image's name would need to be static to avoid giving scrapers a scent. That's one slot that would never have to worry about ad blindness...)
Issue #2
RSS. I'm not sure if everyone can view images in their feed readers. If enough of your users can, you could start sending a daily feed update consisting of an image. You could send whatever miscellaneous stuff you wanted on most days and then switch it for your crap-sale alert as desired.
I don't know... would they just program their bots to hit your site every time a feed item went out?
Other issues? Probably a lot. Maybe this will help with some brainstorming, though.
Take care,
Brian
Here are some valid assumptions for you to make:
- Any automated solution can and will be broken.
- Making the site completely require human input (eg CAPTCHA) will greatly increase the difficulty of logging in/checking out/etc.
- You have a limited number of Bandoliers of Cabbage to sell.
- You can track users by session via a client-side cookie.
- You aren't dealing with extremely hardcore criminals here; these are simply technical people who are bending, but not breaking, the law. Successful orders via bots will go to the person's home, and likely not some third-party mail drop.
The solution isn't a technical one. It's a policy one.
- Log all client session ids on your webserver.
- Enact a "limited bots" policy; say, one screen scrape every X seconds, to give people with regular browsers the ability to hit refresh. Any user found to be going over this limit doesn't win the woot.
- Follow this up by sending known bot owners a bunch of Leakfrogs.
Here is what I'd do:
- Require all bidders for bag of crap sales to register with the site.
- When you want to start a sale, post "BOC sale starting soon, check your email to see if you are eligible" on your main page.
- Send out invitations to a random selection of the registered players, with a url unique to that particular sale when sale starts.
- Ensure the URL used is different for each sales event.
- Tweak the random-selection algorithm to pull down eligibility for frequent winners, based upon the credit card used for purchase, PayPal account, or shipping address.
This thwarts the bots, as your main page only shows the pending BOC event. The bots will not have access to the URL without receiving it in email, and they have no guarantee they will receive it at all.
If you are concerned about sales impact, you could also incentivize participation by giving away one or two BOCs with each sale. If you don't see enough uptake on an offer in a given time interval, you automatically mail additional registered users, increasing the participant pool for each offer.
Voila: a level playing field, without tons of heuristics and web-traffic analysis. The system can still be gamed by people setting up huge numbers of email accounts, but tweaking the participant-selection criteria by credit-card number, PayPal account, and shipping address mitigates this.
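A sketch of minting those one-sale-only URLs, assuming a send_email helper (not defined here) and some store of valid tokens; the point is that the path is unguessable and worthless once the sale closes:

    import secrets

    def create_invitations(invitee_emails, send_email):
        """Mint one unguessable sale URL per invited user."""
        valid_tokens = set()
        for email in invitee_emails:
            token = secrets.token_urlsafe(16)   # ~128 bits of randomness
            valid_tokens.add(token)
            send_email(email, "https://example.com/sale/" + token)
        return valid_tokens   # check membership on each /sale/<token> request

Discard the token set when the sale ends, and the URLs from one event tell a bot nothing about the next.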
What about the NoBot control from the ASP.NET AJAX Control Toolkit?
It does some automated JavaScript request and timing tricks to prevent bots from accessing the site, with NO user interaction.
Sorry if this doesn't meet some requirement; I'll just have to call tl;dr. >D
Turn certain parts of the page into images so the bots can't understand them.
For example, create small images of the integers 0-9, the dollar sign, and the decimal point. Cache the images on the client's computer when the page loads... then display the price using images chosen via code running server-side. Most human users won't notice the difference, and the bots won't know the prices of any items.
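A sketch of pre-rendering those glyphs, assuming Pillow (any image library would do) and an assumed /glyphs/ URL path; the server then composes a price out of image tags instead of text:

    import os
    from PIL import Image, ImageDraw, ImageFont

    def glyph_name(ch):
        return ch if ch.isdigit() else ("dollar" if ch == "$" else "dot")

    def render_glyphs(directory="glyphs"):
        """Pre-render one small PNG per character a price can contain."""
        os.makedirs(directory, exist_ok=True)
        font = ImageFont.load_default()
        for ch in "0123456789$.":
            img = Image.new("RGB", (12, 16), "white")
            ImageDraw.Draw(img).text((2, 2), ch, fill="black", font=font)
            img.save(os.path.join(directory, glyph_name(ch) + ".png"))

    def price_as_html(price):
        """Emit image tags for a price string such as '$1.00'."""
        return "".join(
            '<img src="/glyphs/%s.png" alt="">' % glyph_name(ch) for ch in price
        )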
My Opinion as a longtime WOOTer
I would be happy to have a CAPTCHA on ordering, turned on only for the BOC. I think most wooters would agree. Plus, 99.9% of the time you don't even get to the order screen because it sells out so fast, so hardly anybody would even know!!
If you make the CAPTCHA a really hard math problem, I'll finally be able to explain to my mom the practical benefit of so many years of studying math.
I don't see why IP-address filtering HAS to be prohibitively expensive. With IIS you can build an ISAPI filter to do this in native code; I am sure Apache has similar interfaces. Using the IP address of the client, you can write a simple rate limiter for HTTP requests that does not depend on a banned list or other such nonsense.
- Tarpit. Limiting page views to 1 per second won't bother human users.
- Links via JavaScript. Simple bots don't dig that. As for usability, statistics show that less than 1% of users have JS disabled. 2a. The hard-core version of the above: links in Flash.
- Parameters stored in the session rather than in the query string. Most bots are stateless.
Never thought I'd recommend Flash for anything, but what about Flash? Let your server send down asynchronous, encrypted content to the Flash file, signaling whether it's deal time or not. As long as the response is the same size, deal or no deal, the bot can't tell which it is.
At a more general level, you need to focus on the resources a human plus a browser have that a scripted bot doesn't, and take advantage of things that are easy for humans/browsers and hard for bots. Captcha is obviously a simplistic attempt at doing this, but it doesn't suit your site, as you say. Flash would weed out a ton of bots, leaving only the (slower) ones that drive a real browser. The solution could be much simpler than captcha if it just requires the user to click in the right spot.
Take advantage of humans' massively parallel image-processing power!
Make scanning the site expensive.
There is no way I know of that can keep a bot out of your site. I even know of a service where humans scan sites for you. How would you handle that?
The worst thing for bots is when a site changes. After a while it gets too expensive or too boring to keep the bot running. There might be updates on your site that look like a new product but actually are not. If you update irregularly and unpredictably, things get really hard for the bot.
Banning IPs might be a countermeasure, as long as it is a known IP; it forces the offender onto a proxy. The proxies I know of work well, but they slow you down a lot.
My thoughts (I haven't checked all the others, so I don't know if this is novel):
Dealing with swarming:
Convert the front-page matter for each day's stuff into a Flash/Flex object.
- Yes, some people will complain, but we're looking for the common case here, not the ideal.
- You should also randomize the name of your flash objects, so they aren't in any predictable pattern of names.
Using Akamai or another CDN, deploy this Flash object in advance to the outside world. Akamai produces what appear to be random URLs, which makes them hard to predict.
- When it is time for a new sale, you just have to change your URL locally to refer to the appropriate object at Akamai, and people will go fetch the flash object from them to discover if the deal is a BoC or not.
End of the day: you now have Akamai handling your swarms of midnight traffic.
Dealing with auto-buy
- Each of the Flash objects you create can have lots and lots of content hidden inside: images, links, arbitrary IDs, including 'bag of crap' in a thousand places. You should be able to obfuscate the Flash as well.
- When the flash object "goes live", people will start to attack it. But there are so many false positives that a simple string scan is useless - they'll have to simulate running the flash locally.
- But the flash doesn't write text. It draws lines and shapes. Shapes in different colors, all connected to timers that make them appear and disappear at different times.
- If you've seen the Colbert Report, you know how the intro has hundreds of words describing Colbert. Imagine something like that for your intro, which will always include Bag O Crap.
- Now, imagine that the intro takes an arbitrary amount of time - sometimes a few seconds, sometimes as long as a minute or more (make it funny)
- Meanwhile, "Bag O Crap" is constantly showing up, but again, clearly as part of the intro.
- Finally, the actual deal of the day is revealed, with an active 'shimmer' effect that makes it difficult for any single snapshot of the canvas to reveal the actual product name. This is floating above an animated background that still says 'bag O crap' and is constantly in motion
- again, all of this is handled with lines and shapes, not with text strings
End result: your hacker is forced to take lots of image snapshots of the deal and figure out how to separate all the false positives and identify the actual deal. Meanwhile, humans just look at it, and between eye fatigue and our ability to fill in gaps in text, we can read the deal as is.
This won't work forever, but it would work for a while.
Another idea is to simply restrict people from buying BoCs unless they've bought something before with that account, and to never let them buy a BoC again.
Identify bots via IP or a suite of other mechanisms.
Always serve those identified as bots the normal front page.
Real people falsely identified as bots will not get the specials, but they won't notice anyway.
Bot owners won't realize you've identified them, so they will stop adapting their scripts.
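A sketch of that decoy, assuming you keep a snapshot of the front page captured while an ordinary item was up, plus a set of flagged IPs from whatever detection you already run:

    flagged_ips = set()   # filled in by your existing bot detection
    decoy_page = None     # front-page snapshot taken while no sale is on

    def capture_decoy(render_live):
        """Call while an ordinary item is up; this is what bots will see."""
        global decoy_page
        decoy_page = render_live()

    def front_page(ip, render_live):
        if ip in flagged_ips and decoy_page is not None:
            return decoy_page   # flagged clients never see the sale appear
        return render_live()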
My solution is a combination of marketing changes and technology changes.
Currently, the technical side of selling bags of crap is handled as a normal Woot sale: the sale starts, people race to buy, all items are sold. The same statistical charts used for daily sales are used for bag-of-crap sales.
There are several marketing goals involved:
- Get customers to visit the site once every day (impulse purchasing). The possibility of seeing a bag-of-crap sale is the reason/reward.
- A network/viral/gossip effect: when a customer sees that a bag-of-crap sale is on, they will IM/email/telephone their friends.
- There is also what I'd call general "good will". Woot is a really cool place because it occasionally rewards its customers with amazing sales (a bag of crap that included a flat-panel TV)... AND it's done in a fair "first come, first served" manner.
The first two seem to be the most important. The sheer number of visitors has an effect on how fast normal deals sell (or sell out). New customers have traditionally been attracted pretty much by word of mouth, and having customers send their friends to woot.com is a win.
So... my solution is to change the promotion delivery into more of a lottery.
Occasionally, users can do something fun to see if they are eligible for a bag of crap. The something fun could be a silly Flash game along the lines of "punch the monkey" or the Orbitz mini-putt, baseball, or hockey games. The goal here is a game that a bot can't script, so some considerable care will be needed. The goal is also not to award the bag of crap only to game winners... but to all game players.
The technical core of the game is that at the end of the game, a request is made to a server that runs an "instant lottery" to determine whether the user has won a bag-of-crap sale opportunity. The server request will need to include something calculated by the game itself (roughly speaking, "hashcash": a complex, CPU-cycle-consuming calculation, hopefully one that is difficult to reproduce). This is to prevent a bot from repeatedly entering the lottery just by querying the lottery server/service.
The game itself can change over time. You can do special-event games for Halloween, Christmas, Valentine's Day, Easter, etc. There's lots of room for fun marketing ideas that can match Woot's "wootiness".
If the user wins, they can purchase N bags of crap (in a time-limited window)... but they can also send N friends a time-limited invitation to purchase a bag of crap (good for 24 hours). This provides a super-strong network effect... customers will definitely tell their friends. Or you could do it as "buy 1, give 1": let customers buy up to a total of N but force every second one to be shipped to a friend. The key here is to make the network/gossip effect a full-fledged part: help the customer tell the world about the wonderfulness of Woot.
The promotional material around the bag-of-crap concept will also need to be revamped. The graphs of how quickly a bag of crap sold out are no longer relevant; instead, show something like how frequently through the month people had the opportunity to purchase, or how many people told their friends. The materials should subtly emphasize the point that a daily Woot visit is a good idea.
You can also promote the heck out of why bag-of-crap sales are changing, especially that you hired the best bag-of-crap consultants available, for free.
Honestly, I think your best solution is to make items during a Woot-Off visible only to logged-in users, and to limit each logged-in user to one home-page refresh every 500 ms or so. (Or possibly make only a picture of the item visible to unauthenticated users during a Woot-Off, and make sure you don't always use the same picture for Random Crap.) I think Woot users would be willing to accept this if you sell it as a measure to help them get their Bowls of Creaminess, and you can also point out that it'll help them check out quicker. Anything else, even using captchas, is subject to your typical arms race.
Build a better bot
The marketplace is telling you something. They want that bag o' crap. So rather than fight the scripts (RIAA v. file sharing, anyone?), build a better bot.
Offer everyone an installed app that is just as good as, or better than, anything a script kiddie could put together. The user installs your branded app, and every time a bag of crap is offered, the app automatically tries to buy it. If the current b-o-c is missed, the app gets a "ticket" that gives it a better chance at the next b-o-c sale. So if a user rolls their own script, they don't get the "ticket" in line for the next b-o-c sale, while users of the official app do.
Between b-o-c sales, the app can show the current item for sale. Hell, make it so the user can tell the Woot app to look for "memory sticks".
Who will build their own script when the official Woot b-o-c app is just as good, if not better?
Additionally, Woot gets another way of connecting to the customer.
Your customers are telling you what they want.
Give the user a choice between the original price and a much higher price. You will have to find some way to associate the buttons with their respective prices: colour, position, perhaps the "emotional connotation" of the button; something difficult to determine programmatically, but which only requires the user to connect a button to a price. Easy, intuitive, and hassle-free for the user; difficult and, more importantly, risky for the scripter, especially if you vary the method of association.
If you are willing to make JavaScript mandatory, you can use a hashcash scheme to require, for example, ~30 seconds' worth of client-side computation for each request. (Of course, that might be 5 minutes on an iPhone or 1 second on a botnet of 30 computers: a significant drawback.)
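A sketch of the server side of such a scheme: the client must find a suffix whose SHA-256 digest has N leading zero bits, which costs it real CPU time, while verification is a single hash (the difficulty value is an assumption to be tuned):

    import hashlib
    import secrets

    DIFFICULTY_BITS = 22   # tune until solving takes the wall-clock time you want

    def issue_challenge():
        return secrets.token_hex(16)   # unpredictable, single-use nonce

    def leading_zero_bits(digest):
        bits = 0
        for byte in digest:
            if byte == 0:
                bits += 8
                continue
            while byte & 0x80 == 0:
                bits += 1
                byte <<= 1
            break
        return bits

    def verify(challenge, solution):
        """One hash for the server to check the client's expensive search."""
        digest = hashlib.sha256((challenge + solution).encode()).digest()
        return leading_zero_bits(digest) >= DIFFICULTY_BITS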
You can also make scraping more difficult by generating the page with (obfuscated) JavaScript or (gag) Flash.
You can also troll for bots with invisible (via CSS and JavaScript) random crap links.
You can detect "bot-like" IP addresses (by rate, and by visits to honeypot links) and redirect them to a special server (e.g. one with extra CC verification such as "Verified by Visa", or merely one with a captcha).
But really, it's an arms race. :) And one you may very well have to eventually escalate even beyond captchas.
Which brings me to: why not change from a first-come, first-served model to a lottery model, where bots don't have such a large advantage over real shoppers?
Okay, I have a couple of questions more than an answer, because I have no experience with the technology to know if it could/would work or would help.
With the following goals:
1. Sell the item to non-scripting humans.
2. Keep the site running at a speed not slowed by bots.
3. Don't hassle the "normal" users with any tasks to complete to prove they're human.
My questions are:
- Would a Flash application, a Java applet, Silverlight, or anything similar reduce the ease of screen scraping enough to decrease the impact of the bots?
I'm curious whether these are as wide open to external manipulation as typical JavaScript/HTML.
While it is not standard for web development and may not be "good" from an SEO point of view, it sounds like search visibility isn't your problem if you have millions of users. I believe any of these could still offer a very good-looking interface, so your humans wouldn't be put off by the design.
- Could you put all of your information in an image? I've never seen the part of Woot you are referring to, but what I'm suggesting is to place any text that a human needs to know in a human-friendly image instead of a bot-friendly textbox.
Oh, and to second something alluded to in some of the other responses: don't miss the big opportunity you have. You have LOTS of demand from bots, and those people with bots really buy, right? Do you still want their money? (Because if not, I'll take it.)
Do these people with the bots have any alternative to buying from you? Separate out your bags of crap.
Have a Woot subsite built for bots, geared towards bots, and let the scripters have lots of fun AND pay you money for it. Sell them crap and let them challenge themselves against other scripters. It's a whole separate market available to you.
If they have an alternative where they can win something AND get bragging rights about it, they might be a little less inclined to beat up on the little old humans.
Forgive me if this answer was already submitted; there are a lot of answers, and reading and understanding all of them is a chore.
Why couldn't you just change your purchasing API every once in a while? Wouldn't that be completely transparent to the human users and pretty much kill most of the bot purchasers?
One implementation would be to change the names of the fields that the user has to fill in and submit on the page after hitting the "I Want One" button. How many times a year do you actually sell a BOC? Not that often. So it would not be a huge programming burden to have a different purchasing API programmed, tested, and ready for use every time a BOC goes on sale.
Just make sure the bots that are using the old, incorrect API don't bring your server down. Maybe host the BOC purchase API on a different server each time, too. That way the bots can bring down a server that is not actually being used by us human BOC purchasers.
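A sketch of that rotation, assuming a server-side map from logical field names to this sale's random ones; bots replaying yesterday's field names simply fail to parse:

    import secrets

    LOGICAL_FIELDS = ["quantity", "card_number", "shipping_address"]
    current_names = {}   # logical name -> this sale's random name

    def rotate_field_names():
        """Call once whenever a new BOC sale starts."""
        global current_names
        current_names = {f: "f_" + secrets.token_hex(8) for f in LOGICAL_FIELDS}

    def render_form():
        inputs = "".join('<input name="%s">' % n for n in current_names.values())
        return '<form method="post" action="/buy">%s</form>' % inputs

    def parse_submission(post_data):
        """Map random names back; stale bot submissions won't have them."""
        try:
            return {k: post_data[v] for k, v in current_names.items()}
        except KeyError:
            return None   # old field names: reject the order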
If I understand right, your biggest problem is with the screen scraping, not the automated purchase itself.
If so, your most effective step would be to defeat screen scraping by randomly encoding the page so that it looks the same (kind of) but is always different at the code level (use hex codes, Java encoding, pictures, change the surrounding code structure...).
That would force them to constantly rewrite their scraping code and therefore make it that much more expensive for them to buy your "crap" automatically, if they can manage it at all. They would probably continue to hit your website for a while, until they realize they can't gain anything from it and drop it.
The downside of confusing the hell out of the bots is that it will also confuse the hell out of search-engine crawlers.
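One cheap form of that random encoding, as a sketch: emit a random subset of characters as numeric HTML entities, so browsers render identical text while the raw source differs on every load. It defeats naive string matching, though not scrapers that parse the rendered DOM:

    import random

    def randomly_encode(text, probability=0.4):
        """Emit some characters as &#x..; entities; source differs per load."""
        out = []
        for ch in text:
            if ch.isalnum() and random.random() < probability:
                out.append("&#x%x;" % ord(ch))
            else:
                out.append(ch)
        return "".join(out)

    # Two loads of the same description rarely share long raw substrings:
    print(randomly_encode("Random Crap, $1.00"))
    print(randomly_encode("Random Crap, $1.00"))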
Use concurrent-connection limiting per IP address, via either iptables on the server (if it is Linux-based) or a dedicated "router".
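For the iptables route, a sketch using the stock connlimit match; the connection count is an assumption to be tuned so corporate NATs aren't caught:

    # Reject clients holding more than 20 concurrent connections to port 80
    iptables -A INPUT -p tcp --syn --dport 80 \
        -m connlimit --connlimit-above 20 -j REJECT --reject-with tcp-reset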
You should have some record of the users who have purchased BOCs most often; why not just ban those accounts or something? Sure, legit users will be banned in the process, but you are a business providing a product, and if you are being abused by a group of users, you have the right to refuse them service. You have a lot of info on your users, including PayPal and bank accounts; you could ban those accounts too, forcing the bot users to get new ones. Certainly I could come up with a script to buy a BOC every time, or just download one from the net, but I have better morals than that.
Never having successfully purchased a BOC, I know the frustration of legit users who would like to receive one in the hopes of getting a great deal. Perhaps instead of offering a BOC as an individual item every once in a while, you could just give it to random users every day. When they receive an item, they get a little note, and an additional item, saying they also received a BOC. Then the only way someone could get a BOC is by legitimately purchasing something that only an actual human would have wanted. There would be nothing better than purchasing a coffee maker or something and also receiving a 42-inch TV in addition to your legitimate purchase. I think the majority of script kiddies would no longer be interested in your site if, in order to get a BOC, they also had to commit to a purchase of more than 10 dollars.
Why not make the front page just an image-mapped graphic (all one picture, with no labels, tags, etc.)? Easy for a human to read and understand on pretty much any device, but impossible for a bot to interrogate. In essence, make the whole front page a captcha.
You will make enough on the lights today to pay for the CAPTCHA program from Cisco!! We are all used to them from buying concert tickets and other things... It only seems fair. The way it is being done today is upsetting some people and raising questions about a lottery or sweepstakes. I am sure you checked into that before you tried it, but it is not really a fun way to buy BOCs... It takes all the excitement out!
Getting the BOC first, or a great product just by being on the site, draws people to Woot. If there is no reason to hang around and buy tons of stuff you don't need while waiting for the random BOC to come up, sales will drop off. The CAPTCHA may be the only way to defeat these people and still keep the excitement of Woot.
I was one of the first to get in to order a BOC last time; my first order was dumped with the million-dollar shipping, and the second went through but was taken out of my account later. I was upset. I left Woot and have not purchased items like I did in the past on other days. I was willing to try it again, this way, today. I doubt I will in the future without a CAPTCHA for the fun stuff.
There are many sites trying to be like Woot. Of course they are not up to your level. I find myself reading a product description not because I want the product; I check in even just for a laugh. I would hate to see someone come in with a fairer program and take away most of your business.
Just my opinion. I know almost nothing about bots and computers, since I am a nurse... But my vote is to upgrade to the higher level... The guys with the bots would just have to get in line with the rest of us, and that is the way it should be. :) Lori
As for CAPTCHAing everyone: why not use the Google solution of only requiring CAPTCHAs from IPs you suspect of being bots, or even just from users that hammer the site? I'm sure asking someone for a CAPTCHA when they purchase isn't so bad if they've been hammering the site anyway; it's just about the same as staying up and hitting F5 repeatedly. That, or maybe require a periodic CAPTCHA when hammering, say every hundred (maybe fewer?) refreshes, to stop alarm bots from working. You need some sort of CAPTCHA to prevent botting, but you also need to account for the fact that your real users will act like bots.
As a long-time (4-year) user of Woot.com and purchaser of a few bags of crap, amongst the many other items now taking up space in my garage, it seems that the solution should be part of the overall Woot theme.
Use captcha, but in a humorous vein. Much like the $1,000,000 promotion, make a game out of identifying yourself as a person. This has, in the past, delayed the "sell out" of the BOC for a reasonable amount of time while people like myself scramble to figure out the fairly simple but humorous puzzle needed to enter a coupon code.
Also, while people complain endlessly about the server errors, they don't stop coming back. Part of the thrill of a BOC, in my opinion, is the fact that there are a gazillion other people trying to get one. If the servers throw an error, or a funky page, it's a sign that I'm somewhere in a group of way too many people trying to get one of 1500 products.
If you put as much creativity into building the puzzle, and it is original enough, it will delay the bots long enough to give everyone else a chance. By incorporating a random word that's captured as a code, and putting an interim page between the "I Want One" button and the purchase page that requires some uniquely human interaction, you've stopped the bots there until they figure out what needs to happen.
• You haven't implemented a boring, and sometimes painfully difficult to read, captcha.
• You've made the process more fun.
• You've reduced the load on the actual secure purchase server.
• You'll train the users that they will need to DO something to get a BOC.
• You'll have stopped the bots at the interim page, delaying their purchases until most people have at least had a chance to try to figure out the funny, but not terribly difficult, puzzle.
• Since being random is what a BOC is all about, a random, changing puzzle/task would fit in simply with the whole pitch of a BOC.
As you experiment, the technology behind the interim page can become more advanced, with random information that can be captured for use in the purchase page.
I have purchased, without the aid of bots or any scripts other than Wootalyzer (which I feel is an acceptable aid), 7 BOCs since 5/31/05. The best one, which I didn't get, was the Please Please Me BOC. The B&D batteries were also fun, but I'm guessing they didn't stump the bots, only frustrated the regular users.
Sometimes the best solution for a technology issue isn't more technology.
A potential solution to your particular problem (and not the general one) would be to require users to be signed in if they want to see the "crap". Only display the crap prizes to users that happen to be logged in; all other items can remain viewable to non-logged-in users as they always have been. Then your loyal users are given first priority for the crap.
You'd obviously have to notify your users of this, perhaps with a notice that it is being done to increase the chances of real users finding the crap.
If your specific problem is bots harvesting one particular type of item, take the least restrictive alternative and defend only against that particular attack. This option avoids captchas and the usability hit that you're concerned about.
If the bots log in and start spamming, you could force them to log out and lock the account.
If they're only there to get the bag o' crap, they will leave fairly quickly and your page won't be taking the massive hits. Forget the highly technical solutions.
2 things:
server-layer solution: mod_evasive (if you use Apache)
http://www.zdziarski.com/projects/mod_evasive/
front-layer solution: reverse captcha, or another non-intrusive captcha
What if you randomized or encrypted the form names and IDs, randomized the order of the form fields, and made the form labels a random captcha image? That'd make a script attack a lot harder. :-D
Make the whole bloody page a CAPTCHA!
Sorta like Sesame Street... eight of these things don't belong here...
Put 9 items, 9 HTML forms, and 9 I WANT ONE buttons on the screen.
(9 is just the number for the day... pick whatever number you want to make the layout still look good. 12, perhaps. Maybe customize it some for the resolution of the loading browser...)
And scramble them for each person.
Make sure the BOC has to be "seen" to know which one it is... and of course this means the other 8 have to be "seen" as well, to know they are NOT the item to buy.
Make sure you only use crazy-ass numbers to reference everything behind the scenes in the page's source.
Fine, so the BOT sees it's BOC time... but it'll be a wild guess to pick the right HTML form to submit back for processing.
There is probably no magic silver bullet that will take care of the bots, but a combination of these suggestions may help deter them and reduce them to a more manageable number.
Please let me know if you need any clarification on any of these suggestions:
- Any images that depict the item should either always have the same name (such as "current_item.jpg") or have a random name that changes for each request. The server should know what the current item is and deliver the appropriate image. The image should also carry a random amount of padding to keep bots from comparing image sizes; see the sketch after this list. (Possibly change a watermark of some sort to deter more sophisticated bots.)
- Remove the ALT text from these images. This text is usually redundant information that can be found elsewhere on the pages, or make them generic alt text (such as "Current item image would be here").
- The description could change each time a Bag of Crap comes up. It could rotate (randomly) between a number of different names: "Random Crap", "BoC", "Crappy Crap", etc...
- Woot could also offer more items at the "Random Crap" price, or have the price be a random amount between $0.95 and $1.05 (only change price once for each time the Crap comes up, not for each user, for fairness)
- The Price, Description, and other areas that differentiate a BoC from other Woots could be images instead of text.
- These fields could also be Java (not JavaScript) or Flash. While dependent on a third-party plug-in, this would make it more difficult for the bots to scrape your site in a useful manner.
- Using a combination of Images, Java, Flash, and maybe other technologies would be another way to make it more difficult for the bots. This would be a little more difficult to manage, as administrators would have to know many different platforms.
- There are other ways to obfuscate this information. Using a combination of client-side scripting (javascript, etc) and server-side obfuscation (random image names) would be the most likely way to do it without affecting the user experience. Adding some obfuscating Java and/or Flash, or similar would make it more difficult, while possibly minimally impacting some users.
- Combine some of these tactics with some that were mentioned above: if a page is reloaded more than x times per minute, then change the image name (if you had a static image name suggested above), or give them a two minute old cached page.
- There are some very sophisticated things you could do on the back end with user behavior tracking that might not take too much processing. You could off-load that work to a dedicated server to minimize the performance impact. Take some data from the request and send it to a dedicated server that can process that data. If it finds a suspected bot, based on its behavior, it can send a hook to another server (front end routing firewall, server, router, etc OR back-end web or content server) to add some additional security to these users. maybe add Java applets for these users, or require additional information from the user (do not pre-fill all fields in the order page, making a different field empty each time randomly, etc).
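As an illustration of the first suggestion in the list above, a sketch of a handler behind a constant URL such as /current_item.jpg that varies the byte size on every request; most JPEG decoders ignore trailing bytes after the end-of-image marker, though that tolerance is an assumption worth testing against real browsers:

    import secrets

    def serve_current_item_image(current_item_path):
        """Return the image bytes for the constant /current_item.jpg URL."""
        with open(current_item_path, "rb") as f:
            data = f.read()
        padding = secrets.token_bytes(secrets.randbelow(2048))  # 0-2047 bytes
        return data + padding   # renders the same; file size differs per hit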
Why don't you just change the name and picture of the BOC every time you offer it? It would become part of the fun of wooting to see the latest iteration of the BOC.
Create a simple IP firewall rule that blacklists the IP address if you detect more than a maximum number of requests coming in per second.
You are making this way too hard. I will probably kick myself, since I just won a BOC from the site today with a bot, but just put the "RANDOM CRAP" text in captcha images on the site's main page. The bots all look for the text "RANDOM CRAP", so you basically avoid triggering them in the first place. Anyone looking with their eyes will see that it says "Random Crap".
A rather simple solution is to track the time difference between rendering the form and getting the response: bots usually have extremely short response times of milliseconds, which no user could match, or extremely long response times of several hours.
There's a Django snippet doing this, along with a more detailed description:
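In the same spirit as that snippet (this is a sketch, not the snippet itself): embed a signed render timestamp in the form as hidden fields, then reject submissions that come back implausibly fast or implausibly slow; the secret and the thresholds are assumptions:

    import hashlib
    import hmac
    import time

    SECRET = b"keep-this-server-side"    # never emitted into the page
    TOO_FAST, TOO_SLOW = 2.0, 6 * 3600   # seconds; assumed thresholds

    def form_timestamp():
        """Emit both values as hidden form fields at render time."""
        ts = str(time.time())
        sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        return ts, sig

    def plausibly_human(ts, sig):
        expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return False   # tampered or fabricated timestamp
        elapsed = time.time() - float(ts)
        return TOO_FAST <= elapsed <= TOO_SLOW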
You know, if you published your RSS feed using PubSubHubbub, people wouldn't have to hit your web page over and over again to see the next thing in the Woot-Off; they'd just wait for it to show up in their Google Reader.
For this purpose I use Cloudflare, as it doesn't affect my site, blocks malicious users with a CAPTCHA automatically, and gives you more features.