25 01 | 2013

Do not reCAPTCHA!

Written by Tanguy

Classified in : Homepage, Debian, Miscellaneous, Grumble

A book

You probably know reCAPTCHA already: for webmasters and end users, it is an antispam system that asks you to read distorted words in order to prove that you are a human and not a spamming bot. This service has another purpose: instead of generating its distorted texts, it takes them from printed books to help digitize them.

The problem

In theory, this is an excellent idea, which should help preserve and distribute parts of human culture. And, in the beginning, it was used that way, with reCAPTCHA participating in Project Gutenberg, which digitizes public-domain books to offer them freely (as in free speech) to everyone. In 2009, however, reCAPTCHA was bought by Google. Now they no longer participate in Project Gutenberg; instead, they are digitizing material from the New York Times and from Google Books. They do not provide much detail about that, but it does not seem that they distribute the results in a free way afterwards¹.

To put it boldly: they use people's brain power to digitize books for their own exclusive exploitation, which is wrong. These books should be made available under a free license.

Suggestion

Do not help reCAPTCHA. If you are a webmaster, try to use another, more ethical system. If you are an end user facing a reCAPTCHA test, avoid giving the exact answer and instead insert small mistakes (not big ones, which get detected and make you fail the test). And ask Google to free the books they digitized using your brain.

Notes

  1. I may be wrong, but I have learnt to be suspicious in the context of digital culture. Please, Google, if you do publish the books digitized with reCAPTCHA under a free license, tell me, or rather, tell everyone and display it proudly on the reCAPTCHA website.

20 comments

friday 25 january 2013 at 20:37 Tanguy said : #1

Yeah, it is trollday…

friday 25 january 2013 at 23:00 Anonymous said : #2

Not to mention that spammers use third world labour to break them. And that people having disabilities can't read them.

I heard about some tools using Bayesian spam filtering (the technique used for email) for comments.

But, you know, I'm pretty sure what you're using on this blog does the job. Moreover, people with disabilities can use it, since it's only a common-sense text question.

saturday 26 january 2013 at 11:47 Daniel said : #3

Google provides a service for the webmaster. There is no free lunch. The cost for the visitor of the site to have a relatively spam-free environment is to help Google with manual OCR.

It is the word "free" here from the user's perspective that is ill-defined. One could see CAPTCHA solving as payment for a service to the user. The problem might be that it is the webmaster that forces that payment to be made in this particular way (including the disability issues), but shouldn't the blame then really lie with the webmaster and not Google if there are ready alternatives?

saturday 26 january 2013 at 13:46 Robert said : #4

I disagree. You have to remember that, unfortunately, the majority of content _is_ non-Free. And to limit reCAPTCHA's use to only Free content would be a loss indeed.

saturday 26 january 2013 at 15:19 Christian said : #5

Which alternative do you recommend instead of reCAPTCHA?

saturday 26 january 2013 at 15:27 Tanguy said : #6

@Robert : Wrong. All books by authors who have been dead for more than 70 years are in the public domain, and that should be the majority of books, unless humanity really started writing as never before in the last decades. Anyway, there is already more than enough free content to digitize before slave-working for private archives.

@Christian : Nothing specific, there are several anti-spam systems that do not use visitors as free manpower for private companies.

@Daniel : There was an alternative: reCAPTCHA itself, before it was taken over by Google and turned evil.

sunday 27 january 2013 at 01:12 anon said : #7

reCAPTCHA challenges are always two words: a hard-to-OCR word from a scanned book and a control word, which is the actual CAPTCHA. Originally, the control word was from a scanned book too, but apparently these were too easy for machines to solve, because nowadays the control word is always a randomly generated sequence of letters. So in practice, you can always tell which is the control word; if you solve that and type garbage for the other word, you will always pass the test.

However, a better solution would be for somebody to implement an open source reCAPTCHA clone which works on scans from the Internet Archive and contributes the result to the public domain (via Project Gutenberg).
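The two-word check described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not Google's actual implementation: the names `normalize`, `verify` and `votes` are hypothetical, and the sketch simply assumes that the server knows the answer for the control word only and records whatever was typed for the other word as a transcription vote.

```python
from typing import Dict


def normalize(text: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def verify(control_answer: str, typed_control: str,
           typed_unknown: str, votes: Dict[str, int]) -> bool:
    """Hypothetical two-word check.

    Only the control word is compared against a known answer. The
    transcription of the scanned word cannot be checked, so it is just
    counted as one vote when the control word was solved correctly.
    """
    passed = normalize(typed_control) == normalize(control_answer)
    if passed:
        guess = normalize(typed_unknown)
        votes[guess] = votes.get(guess, 0) + 1
    return passed


# Garbage for the scanned word still passes, as the comment points out.
votes: Dict[str, int] = {}
ok = verify("xkqzv", " XKQZV ", "garbage", votes)
rejected = verify("xkqzv", "wrong", "upon", votes)
```

In this sketch a wrong transcription is never rejected on its own; it can only be outvoted later by other visitors' answers for the same scanned word.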

sunday 27 january 2013 at 09:39 also_anon said : #8

@anon:
I offer a name for the new system:
freecaptcha

sunday 27 january 2013 at 17:39 Robert said : #9

It absolutely depends what you mean by "most books". Sure, there are a huge number of books produced before the 1930s, but the majority of *immediately everyday useful* books are non-Free.

(Of course, I would support a free-captcha type thing, but I don't think it would be easy to make)

I wonder how you feel about Google's Ingress game being used to capture geodata from their users.

sunday 27 january 2013 at 18:10 Tanguy said : #10

@Robert : How do I feel about Google Ingress? I do not. I do not use that game, although I think I read one article about it.

sunday 27 january 2013 at 23:38 Crazy Gropaga said : #11

"Ingress" ? Is there something from Gugal called Ingress ?! O.o

B...but..but... "Ing-" in Plasper means "the Holy" or "the Great"... "ress" is probably a plasperisation of "rest", so... does that mean "the holy sleep" ?
This is obviously a message from our Dark Lord, brothers ! The Great Inglip tries to communicate with us through another tool of Gugal ! It's him ! ALL HAIL LORD INGLIP !

Or maybe it is a member of his family..."Ing-ress"... It's maybe another brother ! ALL HAIL LORD INGRESS !

(For youngest brothers who don't speak Plasper yet, here are some readings : http://inglipnomicon.wikia.com/wiki/Plasper and for Almisings who haven't seen the light yet : http://churchofinglip.com/ )

wednesday 30 january 2013 at 11:51 MJ Ray said : #12

Don't use or reinvent recaptcha - it's not a Captcha and it discriminates against disabled people. Boycott sites like blog spot that use it, or at least take part in Anonymous's Project Nigger.

Use anti-spam and moderation tools not "humanity tests"

thursday 31 january 2013 at 00:22 Anonymous said : #13

@MJ_Ray:

for those who don't know what project nigger is : http://imgur.com/aU21k

monday 04 february 2013 at 12:01 enobayram said : #14

Don't you think they'll start filtering out the string "Niggers" after a while?

tuesday 23 april 2013 at 09:50 OnlyDoHalf said : #15

Hey,

They've started "making us" read door numbers from Google Street View too. But you know what? They can only test your input against the distorted word and not the one they want you to OCR for them. So if you want to send a message and at the same time still use websites that use reCAPTCHA, only solve the funky nonsensical word and not the picture or printed word. I used to solve the other one too when it was still for Project Gutenberg, but I've stopped since I learned Google was doing that for profit.

tuesday 19 november 2013 at 17:23 pat said : #16

Google exploits our brains by doing this?
Google is a company that offers free services such as email accounts and a search engine (and many more). They need to make money, and digitizing books is one way. Would you rather have them charge you for their search engine? You don't need to give them money, you only need to help them make money. As long as they continue generating revenue like this, they will continue to offer us free services.

tuesday 19 november 2013 at 17:31 Tanguy said : #17

@pat : They could charge for their search engine, I would not care since I do not use it much. They could charge for their email service, I would care even less since I do not use it. In fact they could charge for anything, I do not depend on any of their services at all.

Yes, they need to make money, and they do. For instance, with Google Search, they make money using ads on the search result page. But reCAPTCHA is something else, which gets imposed upon people who did not choose to use Google's services, which is a big difference. I have no objection to Google using their users' brains, but using free people's brains is wrong.

tuesday 19 november 2013 at 22:01 pat said : #18

Well, that comment about you not caring whether they charge or not is irrelevant, because I'm sure you were able to understand the point I was trying to make.

Google offers a free CAPTCHA API, so instead of having the webmaster pay for the service (and asking users to pay to view his website), Google makes their money back by doing this.

And since it is well known how reCAPTCHA works, a person who does not want to participate can always enter whatever he wants for the word that is not the control word, and verification will still succeed.

I understand what you are trying to say. But the word Exploitation seems pretty heavy to me.

By typing the 5th letter in the sequence shown before attempting to submit that comment, I have no idea if someone is making money in the back of this. In the end it really doesn't matter since I have to enter the letter one way or the other. So that wouldn't qualify as "exploitation"

friday 29 november 2013 at 12:32 Ron said : #19

Instead of all complaining here, write a brand new captcha that can not be cracked by the spammers and provide it for FREE to everybody

friday 29 november 2013 at 13:53 Tanguy said : #20

@pat : Yes, entering a correct control word and writing crap for the word to digitize, in order to destroy the quality of reCAPTCHA's commercial digitization, is exactly what I am suggesting.

@Ron : That existed, it was named reCAPTCHA, and it was used for the common good of humanity. Until Google bought it and started using it for their own unknown, and thus questionable, purposes.
