
Microsoft search bots are in love with my blog.

For the past several days (today definitely included), I’ve had craploads of “visits” from a slew of hosts whose names all look very similar. A few examples:

bl1sch4090511.phx.gbl
bl1sch4084504.phx.gbl
bl1sch4084204.phx.gbl

All of them end with “.phx.gbl”, and when I looked up the IP addresses via WHOIS, they all appeared to belong to Microsoft. All these visits come via searches for inane terms which I know don’t return my site anywhere near the top. Really, do you think my page is even on the first 50 pages of search results for the word “would”?!?

I know these hits are from a search bot, but why is my site registering hits from them at all? Do the bots load all the pages in the search results to confirm they’re reachable? It’s starting to piss me off because it throws off my stats quite a bit, and I wish there were a way to ignore them.

5 Comments

  1. You can’t exclude all “visits” from a host ending in .phx.gbl? Seems like something that any analytics software would allow in its filtering system.

    Friday, June 6, 2008 at 1:45 pm | Permalink
  2. I can have it ignore referrals, but I’m not sure whether that covers bot visits. Maybe it does.

    I just tried modifying the filter I had in there from “phx.gbl” to the following:
    *.phx.gbl

    But I don’t know if it looks at wildcards like that.

    Friday, June 6, 2008 at 2:04 pm | Permalink
  3. Check whether the filters take regex. If they do, your unescaped dots will match any non-newline character, not just a literal dot.

    If they DO take Regex, try this:

    (.*)\.phx\.gbl

    Friday, June 6, 2008 at 3:00 pm | Permalink
  4. The parens aren’t actually necessary. That’s what I get for dreaming about mod_rewrite.

    .*\.phx\.gbl

    Friday, June 6, 2008 at 3:01 pm | Permalink
  5. One of the default filters was typed just like this:

    images.google.com

    So I don’t think I need escapes or anything…but it would be nice if a wildcard worked. I can tell that simply using the asterisk does not.

    Friday, June 6, 2008 at 3:03 pm | Permalink
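
The thread above goes back and forth between three ways a filter might read a pattern: as a plain substring, as a shell-style wildcard, or as a regex. Here is a minimal Python sketch of how each would treat the patterns discussed. It does not describe how any particular analytics package works (the software is never named in the thread); the function names are hypothetical, and the host "examplephxxgbl.net" is made up purely to show what an unescaped dot lets through.

```python
import fnmatch
import re


def substring_match(pattern, host):
    """Treat the pattern as literal text that must appear somewhere in the host."""
    return pattern in host


def glob_match(pattern, host):
    """Treat the pattern as a shell-style wildcard, where '*' matches any run of characters."""
    return fnmatch.fnmatchcase(host, pattern)


def regex_match(pattern, host):
    """Treat the pattern as a regular expression searched anywhere in the host."""
    return re.search(pattern, host) is not None


if __name__ == "__main__":
    bot_host = "bl1sch4090511.phx.gbl"   # one of the hosts from the post
    lookalike = "examplephxxgbl.net"     # made-up host, for illustration only

    # A literal-substring filter matches "phx.gbl" but not "*.phx.gbl",
    # because the asterisk is just another character to it.
    print(substring_match("phx.gbl", bot_host))
    print(substring_match("*.phx.gbl", bot_host))

    # A wildcard filter understands the asterisk.
    print(glob_match("*.phx.gbl", bot_host))

    # A regex filter needs the dots escaped; unescaped, the dot in
    # "phx.gbl" also matches the second "x" in the made-up host.
    print(regex_match(r".*\.phx\.gbl", bot_host))
    print(regex_match("phx.gbl", lookalike))
    print(regex_match(r"phx\.gbl", lookalike))
```

If the filter is a plain substring match, which the default "images.google.com" entry would be consistent with, then the original "phx.gbl" entry should already cover these hosts, and adding the asterisk would be what breaks it. That is only a guess from the behavior described in the comments.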
