Battling Signup Spam on the Bill Tracker (Shallow Thoughts)

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Sun, 12 Dec 2021

I've spent a lot of the past week battling Russian spammers on the New Mexico Bill Tracker.

The New Mexico legislature just began a special session to define the new voting districts, which happens every 10 years after the census. When new legislative sessions start, the BillTracker usually needs some hand-holding to make sure it's tracking the new session. (I've been working on code to make it notice new sessions automatically, but it's not fully working yet.) So when the session started, I checked the log files...

and found them full of Russian spam.

Specifically, what was happening was that a bot was going to my new user registration page and creating new accounts where the username was a paragraph of Cyrillic spam. So the new user's username would be something like

Профuт даже неопытнoгo нoвичка нaчинaeтcя oт 799 3eленых. Чтo трeбyетcя? cовcем немного, всeгo пару шaгов Вы неpeaльно 6удетe yдивлeны нacколько вce нeслoжно u будете жaлеть тoлько oб однoм: почемy этoго всегo нe былo в прошлом https : / / forms.yandex.by/u/61b654952ec1745fdd6e4b68 hig
(I added the spaces in the URL.)

Google translates that as

The profit of even an inexperienced newcomer starts from 799 women. What is required? the whole a little, just a couple of steps you are incredibly 6you are surprised how easy it is u you will only regret one thing: why is it not in the past https : / / forms.yandex.by/u/61b654952ec1745fdd6e4b68 hig

These signups were happening once a minute. Yow, that's a lot of accounts being created while I thought the BillTracker was sitting around doing nothing.

I was initially mystified as to the point. What good does it do a spammer to create an account with spam as the username? But fortunately, some experts I knew on a Linux mailing list set me straight.

See, the registrations also included email addresses. An email address is optional when you sign up on the BillTracker: if you register an email address, it will email you when bills you're tracking come up for a hearing, or otherwise change status. But if you do provide an email address, the BillTracker sends you a confirmation message, and doesn't email you about bills unless you've confirmed your email address.

So when the Russian bot signs up with a username of "Я спам, http://evilurl.ru" and an email address of victim@example.com, the BillTracker obligingly sends the victim an email that says, basically, please confirm your account, Я спам, http://evilurl.ru -- in other words, they have used my BillTracker to send their spam.
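To make the mechanism concrete, here is a minimal sketch of how a signup handler can end up relaying spam. It is not the BillTracker's actual code: the function name `build_confirmation` and the wording of the message are assumptions for illustration. The point is only that the username, which the bot controls, gets copied into a message sent to an address the bot also chose.

```python
from email.message import EmailMessage


def build_confirmation(username: str, address: str) -> EmailMessage:
    """Build a confirmation email the way a naive signup handler might.

    Hypothetical helper: the real BillTracker message surely differs.
    """
    msg = EmailMessage()
    msg["To"] = address
    msg["Subject"] = "Please confirm your account"
    # The username is attacker-controlled, yet it is copied verbatim
    # into the body of a message sent to an attacker-chosen address.
    msg.set_content(f"Hello {username},\n\nPlease confirm your account.\n")
    return msg


msg = build_confirmation("Я спам, http://evilurl.ru", "victim@example.com")
print(msg.get_content())
```

The victim never visited the site, but the spam link still lands in their inbox with the site's name on it.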

Of course, as soon as I realized that, I shut down the registration page, then quickly added some checks on username length and presence of suspicious character sequences like :// before I re-enabled it.
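A filter along those lines might look roughly like the sketch below. The length limit of 40 and the names `MAX_USERNAME_LEN`, `SUSPICIOUS_SEQUENCES` and `username_problem` are assumptions for illustration; only the idea of checking length and sequences like `://` comes from the description above.

```python
from typing import Optional

MAX_USERNAME_LEN = 40            # assumed limit, not the real value
SUSPICIOUS_SEQUENCES = ("://",)  # add other sequences as they turn up


def username_problem(username: str) -> Optional[str]:
    """Return a reason to reject the username, or None if it looks OK."""
    name = username.strip()
    if not name:
        return "username is empty"
    if len(name) > MAX_USERNAME_LEN:
        return "username is too long"
    for seq in SUSPICIOUS_SEQUENCES:
        if seq in name:
            return f"username contains {seq!r}"
    return None


for candidate in ("akkana", "Я спам, http://evilurl.ru", "x" * 200):
    print(repr(candidate[:30]), "->", username_problem(candidate))
```

Note that a check like this says nothing about short, URL-free names: a random string such as "zhxqbslrmu" passes untouched.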

(I considered banning anything in Cyrillic. It's admittedly unlikely that a real user would want to sign up for the New Mexico Bill Tracker using a name in Cyrillic. But it seems like a blunt instrument; I'd prefer to be able to solve the problem without banning foreign character sets.)
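For the record, the blunt instrument would only take a few lines. This is a sketch of the option described above as rejected, not something the BillTracker does; `contains_cyrillic` is a hypothetical name, and it relies on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata


def contains_cyrillic(text: str) -> bool:
    """True if any character's Unicode name marks it as Cyrillic."""
    # unicodedata.name() raises ValueError for unnamed characters
    # unless a default is supplied, so pass "" as the fallback.
    return any(unicodedata.name(ch, "").startswith("CYRILLIC") for ch in text)


print(contains_cyrillic("Я спам"))      # Cyrillic letters present
print(contains_cyrillic("zhxqbslrmu"))  # plain Roman alphabet
```

It would have caught the spam paragraphs, but it would also turn away any legitimate user with a Cyrillic name, and it does nothing about Roman-alphabet usernames.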

Adventures in Captchas

Several people suggested that I should have a captcha on my signup page. But I was resistant. My opinion of captchas has been colored by the many hours I have lost to Google's ReCaptcha, which, if you're not signed in to a Google account, sometimes gives you page after page of images, regardless of whether you answered the previous one accurately. I've sometimes had to go through eight or more screens before finally being let through; more often I just gave up. (Admittedly, Google has improved ReCaptcha recently, and now I seldom have to go through more than three screens, though for some reason their "traffic light" captchas never, ever work. I'd love to know what they think "traffic light" means, because it clearly isn't what I think it means.)

But Rick Moen pointed out that there are captchas and captchas. For instance, security legend Bruce Schneier's blog, Schneier on Security, has long included a captcha that says: Fill in the blank: the name of this blog is Schneier on ___________. Obviously any human can fill that out, but a bot can't unless it's specifically coded for that blog.

So I wrote a little captcha script that chooses questions and answers randomly from a file. The file with the questions and answers isn't checked in as part of the code, so you can't get the answers by looking on GitHub, but I made the questions New Mexico specific, so anyone who lives here should be able to answer any of them easily. (For my few out-of-state users, the answers are easy to look up with a search engine.)
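The general shape of such a script can be sketched in a few lines. Everything specific here is an assumption: the `question|answer` line format, the function names, and the two sample questions, which are made up for illustration and are not from the real question file. A real deployment would read the text from a file kept outside the repository rather than from a string.

```python
import random


def parse_questions(text):
    """Parse 'question|answer' lines, skipping blanks and # comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        question, _, answer = line.partition("|")
        if answer.strip():
            pairs.append((question.strip(), answer.strip()))
    return pairs


def pick_question(pairs, rng=random):
    """Choose one (question, answer) pair at random."""
    return rng.choice(pairs)


def answer_is_correct(given, expected):
    """Compare answers, ignoring case and surrounding whitespace."""
    return given.strip().casefold() == expected.strip().casefold()


# Hypothetical sample data standing in for the private question file.
SAMPLE = """\
# question|answer
What is the capital of New Mexico?|Santa Fe
What river flows through Albuquerque?|Rio Grande
"""

pairs = parse_questions(SAMPLE)
question, expected = pick_question(pairs)
print(question)
```

In a web app the expected answer has to stay on the server (in the session, say) between showing the form and checking the submission; putting it in a hidden form field would hand the bot the answer.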

If a single question works for a high-profile target like Bruce Schneier, hopefully whoever's trying to spam the New Mexico Bill Tracker won't be motivated to chase down the answers to state-specific questions.

So far it's working. I've blocked the handful of IPs the spam is coming from (they seem to add a new IP every day or so), and the new IPs that aren't yet blocked haven't been able to create a new account.
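The post doesn't say where the IP block lives (firewall, web server, or the app itself). If it were done in the app, a check could look something like this sketch using Python's standard `ipaddress` module. The addresses are from the ranges reserved for documentation, not the real spam sources, and `BLOCKED` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Placeholder entries from documentation-only ranges; a single address
# is written as a /32 so addresses and whole networks share one list.
BLOCKED = [
    ipaddress.ip_network(net)
    for net in ("203.0.113.7/32", "198.51.100.0/24")
]


def is_blocked(remote_addr: str) -> bool:
    """True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Design choice for this sketch: refuse addresses we can't parse.
        return True
    return any(addr in net for net in BLOCKED)


for ip in ("203.0.113.7", "203.0.113.8", "198.51.100.200"):
    print(ip, is_blocked(ip))
```

Since the spammers add a new IP every day or so, a list like this is always a step behind, which is why the captcha is doing the real work.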

The Mystery Continues

That leaves one more mystery. Once I added the username filter that blocked names that were too long or contained suspicious characters, I noticed that a few accounts per day were still being created with random 10-character usernames like "zhxqbslrmu" (in the Roman alphabet, not Cyrillic).

Adding the captcha shut these down. But I'm still puzzled about why they were happening. I understand (once Ivan and Rick showed me) why a spammer might benefit from spam text and a link in the username, so the spam text gets emailed to the victim. But what were these 10-character usernames accomplishing?

I still have no idea. Maybe someone who knows will see this and enlighten me.

