Tuesday, January 31st, 2006

foiled again

So I finished my earthquake widget about a week or so ago. I was waiting to send it to the Yahoo Widget Gallery so that the screen snapshot they take would have a nice big earthquake on it. Sure enough, on Friday morning there was a magnitude 7.6 earthquake near Indonesia, which was big enough.

So I uploaded the widget on Friday evening. They only post new widgets on weekdays, so it went online yesterday afternoon. Today, there were several comments vaguely mentioning a load error (nobody actually bothered to specify what the error was). Since my desktop at work runs Linux, I couldn't try it out there. I did notice that the USGS had redesigned the layout of their Earthquakes web site. The RSS feed seemed fine, though.

When I came home I found that my widget at home didn't work either. Further investigation showed that the USGS had set up a redirect from the old RSS URL to the new one, but the widget engine wasn't smart enough to automatically follow the redirect. So, I fixed the RSS URLs that it uses.

Furthermore, the RSS feed has links to the web pages for each specific earthquake. This is used by the widget to open a browser window when you request more info by double-clicking on the earthquake circle. However, the RSS feed currently points to .htm versions of the earthquake info pages - but they are now .php. I coded a workaround to simply strip off the ".htm" extension if present, because it appears that the pages work without having to explicitly include the extension.
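The extension workaround amounts to a single string operation. Here is a minimal sketch, assuming the link arrives as a plain string; the function name stripHtmExtension and the sample URLs are made up for illustration and are not the widget's actual code:

```javascript
// Hypothetical helper: drop a trailing ".htm" from an earthquake info link.
// Links that do not end in ".htm" are returned unchanged.
function stripHtmExtension(link) {
    return link.replace(/\.htm$/i, "");
}

// Made-up example URLs, not real USGS addresses.
console.log(stripHtmExtension("http://example.com/quakes/12345.htm"));
console.log(stripHtmExtension("http://example.com/quakes/12345"));
```

This only handles links that end in the extension; a link with a query string after ".htm" would pass through untouched.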

Anyway, how frustrating: the very next day after I uploaded a shiny new widget to the gallery, and after over 3000 downloads, the USGS switched over to a new web site that broke the widget. Oh well, I'll upload a fixed one right now, which Yahoo should post tomorrow and which should be more resilient in the face of future changes.

Tuesday, January 24th, 2006

email address obfuscation

At the bottom of each of my web pages, I have a footer that looks something like this:

Greg Hewgill greg@hewgill.com

Having my email address so easily accessible for so many years (hewgill.com celebrates its 10th anniversary next year) has undoubtedly helped contribute to the incredible volume of spam I get. However, I don't want to reduce the usability and accessibility of my site, and if people would like to click on my email address to email me, they should be able to. Therefore I have refused to obfuscate my email address as it is published on my own web site.

It occurred to me that the risk of an automated email harvester robot picking up my email address could be reduced significantly by using a bit of JavaScript. If I avoid mentioning the actual email address as literal text in the HTML source, then the email harvester robots will probably miss it. Here's the code I used:

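// Write the address piece by piece so that it never appears as literal
// text in the HTML source. a is an array such as ['greg', 'hewgill', 'com'].
function email_address(a) {
    for (var i = 0; i < a.length; i++) {
        document.write(a[i]);
        if (i == 0) {
            document.write('@');    // after the local part
        } else if (i+1 < a.length) {
            document.write('.');    // between domain labels
        }
    }
}
// Write a clickable mailto: link whose text is the address itself.
function email_link(a) {
    document.write("<a href=\"mailto:");
    email_address(a);
    document.write("\">");
    email_address(a);
    document.write("</a>");
}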

Then, in the HTML source where you want to use a clickable email address, do something like the following:

Greg Hewgill <script type="text/javascript">email_link(['greg', 'hewgill', 'com'])</script>

In this way, users whose browsers understand JavaScript will be able to see a clickable link just like the example at the top of this post. However, automated email harvester robots won't see anything resembling an email address, so they won't pick it up. It seems unlikely to me that an email harvester would go to the trouble of actually executing embedded JavaScript code (if they do, I can think of a lot more fun things to do to them).
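The same idea can also be sketched without document.write, by assembling the address as a string and letting the caller decide where to put it. This is only an illustrative variant; the function names buildEmailAddress and buildEmailLink are hypothetical and are not part of the code above:

```javascript
// Hypothetical variant: assemble the address from its parts and return it
// as a string instead of writing it into the page.
function buildEmailAddress(parts) {
    // parts[0] is the local part; the remaining parts form the domain.
    return parts[0] + "@" + parts.slice(1).join(".");
}

// Build the mailto: link markup as a string.
function buildEmailLink(parts) {
    var address = buildEmailAddress(parts);
    return '<a href="mailto:' + address + '">' + address + "</a>";
}

console.log(buildEmailLink(["greg", "hewgill", "com"]));
```

The address still never appears as literal text in the page source, since it only exists after the script runs.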

One risk with this method is that it makes the email address unusable for somebody whose browser does not understand JavaScript. However, with so many sites using "Ajax" and other JavaScript-based technologies to offer routine functionality, I think this is a small risk today.

Having said all that, I still haven't deployed this method on my own web site. But it's live on Amy's web site if you want to see it in action.


Thursday, January 19th, 2006

minilink reaches 10k links

The minilink (lnk.nu) link shortening service first went online in October 2004. Today, a milestone was passed: more than 10,000 links have been created through the service. Although 3,291 of those links were created and never used, this is a milestone nonetheless. In that time there have been well over half a million hits (558,464 as of this writing) to shortened lnk.nu links.

In recognition of this milestone, I added a few more statistics to the minilink statistics page.

Almost all the nasa.gov links are "demo" links that I preloaded that show up on the minilink home page.

You might be curious about the large spike in link hits near the beginning of December 2005 (see the bottom graph). This was some kind of Flash animation that seemed to be hit by a lot of myspace.com users. Because of the volume of hits, I think it may have been some sort of worm-like behaviour, but I didn't investigate.

Sunday, January 1st, 2006

othello

Many years ago (November 1997 to be exact), I was home sick from work for a few days, and decided to experiment with some of the new browser-side scripting capabilities that were on the leading edge of browser technology at the time. My project was to implement a game of Othello (also known as Reversi, among many other names) entirely in the browser.

I did this by implementing an Othello playing engine in Java, then loaded that applet into a web page and called methods in it using JavaScript embedded in the page. This worked pretty well; all browsers of that time eagerly supported Java applets without complaint. Because Internet Explorer was the foremost browser at the time, I also used a Microsoft ActiveX control to display an animated "genie" (similar to the infamous "Clippy" paperclip in Office), which did speech synthesis and recognition.

Times have changed, and today ActiveX controls are all but dead and Java applets are no longer acceptable for implementing client-side logic. With modern browsers not even including Java by default, my Othello page sat broken and neglected under the tide of progress.

After eight years of ignoring the problem, I finally decided to rework my Othello game implementation. The new version uses only JavaScript (no Java applets), and should work in all modern browsers (I've tested it in Firefox, IE, Opera, and Safari). In the interest of standards compliance, the page also validates as XHTML 1.0 Strict.

Othello Game

Let me know if you have any trouble using this in your browser!

Friday, December 30th, 2005

livejournal backup copy online

One of the reasons I wrote ljdump was to use the backup copy to generate a simple HTML presentation of my journal directly on my web site. This duplicates some of the functionality that is already available on livejournal itself, but I believe my presentation is much easier to navigate, especially when you're looking in the archives for a particular post by topic or date.

My journal is here. The year links near the top permit navigation by date; the tag keywords below that permit navigation by topic. (I went back and added tags to all my previous posts.)

If you're interested in trying this out for yourself, let me know and I'll send you the code. You'll also need the following:

You will need to edit the XSLT to generate something that fits into your web site layout instead of looking like it came from mine.


Tuesday, December 20th, 2005

dns loc with ajax

I've always used the LOC to Maps web page to check whether my DNS LOC records work properly. I decided to try rewriting this application using modern web technologies to see how well it would work.

The result is my DNS LOC lookup page. The front end is all JavaScript, using XML-RPC and Google Maps, with a Python XML-RPC back-end.

Some names to try might be:

  • hewgill.net
  • macnugget.org
  • ivo.nu
  • yahoo.com

Let me know whether it works for you and if you find any problems!


Friday, November 18th, 2005

lnk.nu

Apparently somebody involved in the registration of the "lnk.nu" domain has got something out of whack. The domain currently points to a "Domain Available!" web page, while my registrar (namecheap.com) correctly reports an expiration date of 11/19/2006 (next year). I've sent a support request to namecheap and we'll see how effective their support system is.

Meanwhile, lnk.nu is offline but the minilink service can still be reached through the minilink.org name (although this is somewhat inconvenient). My apologies for the interruption in service. I'll try to get it restored as soon as possible.

Monday, October 31st, 2005

domain hijacking

I recently transferred some domain names from register.com to namecheap.com. The way you do this is to go to the new registrar (namecheap.com), enter all the domain info, and pay for one additional year; they then do all the work of transferring the domain name from the old registrar (register.com). Register.com sent me some emails during the transfer process to notify me that a requested transfer was taking place, and to go to an indicated URL if the transfer request was not correct. That's right: only if it was NOT correct. What on earth are they thinking? I did nothing (as indicated) and the transfer request was successful. Anybody could have hijacked my domain names out from under me if I had not seen the domain transfer emails. Now, namecheap.com supports locking the registration record so that this can't happen again.

For .org domains, transferring a domain requires an authorization code (an EPP code, or something like that) that you get from your old registrar. So I had to log in to my register.com account to get that code. But .com and .net require no such thing. Remember who runs .com and .net? Verisign! They continue to be evil by default.

Wednesday, May 4th, 2005

per-process bandwidth accounting

Dear Lazyweb,

I want to track bandwidth used by processes on my colo server on a per-process basis. I'm running Apache, BitTorrent, Freenet, and miscellaneous other things like automated backups. I have a monthly bandwidth cap of 1800 GB. I'm running FreeBSD 5.3. What are my options?

Wednesday, April 20th, 2005

the accursed www

The common www part of web server addresses has annoyed me for a long time. It's unnecessary, ugly, and annoying to pronounce. There is even a site, no-www, that is dedicated to the effort of eliminating the www wart on web addresses.

dbaker today pointed me to an easy way to eliminate the www using Apache. Instead of using the cumbersome mod_rewrite as suggested on the no-www site, do something like this:

<VirtualHost *>
    ServerName www.example.com
    Redirect permanent / http://example.com/
</VirtualHost>

Then, remove the ServerName www.example.com line from the real server configuration section. This assumes that you are using NameVirtualHost, which isn't the default but probably should be.

This configuration quietly redirects any URL with www to the corresponding URL without. This improves the way your URL looks, improves search engine efficiency, and promotes world peace. Well, maybe not that last thing.
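To make the effect of the redirect concrete, here is a rough sketch of the URL mapping it performs, written as a plain function. The name withoutWww is made up for illustration; Apache does this work itself, and nothing like this function needs to be deployed:

```javascript
// Hypothetical illustration of the mapping: drop a leading "www." from the
// host name and keep the path and query string unchanged.
function withoutWww(urlString) {
    var url = new URL(urlString);
    if (url.hostname.startsWith("www.")) {
        url.hostname = url.hostname.slice(4);
    }
    return url.toString();
}

console.log(withoutWww("http://www.example.com/foo/bar?x=1"));
console.log(withoutWww("http://example.com/"));
```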


Wednesday, March 23rd, 2005

broken paradigm

You know what I hate about Gallery? It breaks the Back button. For example, go here. Then select page 2 (right above the date). Then hit the Back button. You would expect that you would go back to page 1, right? Nope, you're left staring at page 2 wondering why the Back button didn't work.

(Note: For the same reason that this is broken, going here might take you directly to page 2. If that's the case, select page 1 then hit the Back button.)

This is, of course, just one of the many things I hate about Gallery.

Wednesday, March 9th, 2005

be careful what you wish for, you just might get it

A few weeks back I set up a new dedicated server with 800hosting.com. One of the attractive aspects of their dedicated server plan was the 1800 GB/month transfer limit. My web site currently serves about 4 GB/month. No problem!

So a couple of weeks ago I was trying to think up ways to use up all that extra bandwidth. I've still got some ideas for higher-bandwidth projects, but an easy one sort of fell into my lap. South By Southwest is offering a free download of music from over 750 bands via BitTorrent. The full download is 2.6 gigabytes. I thought I'd put some of my spare bandwidth to good use.

This seems to be a very popular download, as my server was uploading at a rate of 29.6 Mbits/s at its peak. It averaged about 24 Mbits/s over a full day. At that rate, I would transfer something like 7500 GB in a month, which would put me way over my transfer limit. I have already transferred 445 GB of my monthly limit, mostly in just the last few days. So, I have had to limit the upload rate to 500 kB/s to make sure I don't exceed my transfer limit.
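The monthly estimate can be checked with a little arithmetic. The sketch below assumes a 30-day month and decimal units (1 GB = 1000 MB); the function name gigabytesPerMonth is just for illustration:

```javascript
// Convert a sustained rate in megabits per second into gigabytes per month.
// Assumes a 30-day month and decimal units (1 GB = 1000 MB).
function gigabytesPerMonth(megabitsPerSecond) {
    var megabytesPerSecond = megabitsPerSecond / 8;
    var secondsPerMonth = 30 * 24 * 60 * 60;
    return megabytesPerSecond * secondsPerMonth / 1000;
}

// The 24 Mbits/s daily average:
console.log(gigabytesPerMonth(24));   // 7776
// The 500 kB/s cap is 0.5 MB/s, which is 4 Mbits/s:
console.log(gigabytesPerMonth(4));    // 1296
```

So the uncapped average works out to roughly 7800 GB a month, in line with the estimate above, while the 500 kB/s cap keeps a full month of uploading under the 1800 GB limit.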

1800 GB used to seem like a huge limit. Now it seems so small.

Friday, February 25th, 2005

time to move on

I've been thinking about setting up a dedicated web server for a while. A couple of days ago I was again browsing hosting solutions, and I tried to view my Analog statistics to see how much data my site transfers per month. Coincidentally, at that moment my web server was down. Two hours later, my DSL connection at home went down and I had to call SBC to make them fix it. It was down for an hour. Those two things together pushed me over the edge.

I've been using moonwick's lasthome.net service for a couple of years, but recently he's decided to exit the hosting business. It's been great, I've had nothing but good service, but it's really too much for a one-man operation. I don't blame him.

So I set up a dedicated server at 800hosting.com and am in the process of moving over my web sites. So far it's working well. With the package I've got, the hardware is their problem but 100% of the software is my problem. It's nice because I have a whole box to myself, but it's also completely my fault if I screw it up. We'll see how that goes.

Once I get my main web site moved over, there are at least four or five other sites that I need to move. For example, minilink is currently running on my desktop machine at home. I have to set up PostgreSQL on the hosted server before I can move that. I think I know what I'm going to be working on this weekend!

Wednesday, February 16th, 2005

site identity and phishing

Netcraft is reporting that the next version of Firefox will turn off support for IDN by default. This support allows web sites to register their names with characters from the full Unicode character set, allowing names from any written language.

This support is being disabled in the name of the fight against phishing. It is possible to register a domain name that appears on the screen exactly like another domain name, but really has different character values. For example, http://pаypal.com looks exactly like http://paypal.com but uses the Unicode character U+0430 (Cyrillic Small Letter A) instead of the usual U+0061 (Latin Small Letter A). This difference may or may not be apparent in your browser, and you may or may not be able to click on the first link.
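What looks identical to a person is trivially different to a program. Here is a small sketch that compares the two names by code point; the helper name codePoints is made up for illustration, and the Cyrillic letter is written as an escape so that it is explicit in the source:

```javascript
// List the Unicode code points of a string as "U+XXXX" labels.
function codePoints(text) {
    return Array.from(text).map(function (ch) {
        return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    });
}

var latin = "paypal.com";
var mixed = "p\u0430ypal.com";   // second letter is Cyrillic Small Letter A

console.log(latin === mixed);        // false
console.log(codePoints(latin)[1]);   // U+0061
console.log(codePoints(mixed)[1]);   // U+0430
```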

The real problem here is that verifying that a link really goes where it claims to go is left to the end user's visual inspection of the link as displayed by the browser. The massive proliferation of phishing scams shows that end users will click on just about anything. The average end user cannot be expected to accurately discern whether a domain name is spelled correctly before clicking.

Since computers are so good at comparing data, site identity should be verified by the browser when requested by the user. For the user who doesn't look before clicking, there isn't much that can be done without impacting the normal browsing process. But for the user who today is expected to manually verify that the site name appears correctly in the status bar, we can do better. It's likely that every site that is subject to phishing attacks has an SSL certificate, so the browser should offer an easy method (perhaps a "Verify Link" option on the right-click menu) to make an SSL connection to the site in question and present the details to the user for inspection. The organizations charged with issuing SSL certificates have an obligation to ensure that they are not supporting the spoofing problem; i.e., I hope they would not issue a certificate to a "M1crosoft Corporation".

There are indications that this feature will be restored sometime in the future. However, right now disabling it is a reactionary response to the desire for a technical solution to the phishing problem. We can do better without disabling important browser features.

Monday, January 24th, 2005

search engine spider gone awry

So I happened to look at the log file for one of my web servers today, and noticed lots of requests for the Apache manual, which is installed and served by the default configuration. The requests were all coming from "msnbot/0.3", a search engine spider for MSN. I changed the Apache config to remove the manual, and it seems that the requests are coming in more frequently now that they return 404. I wonder how much time it will take before msnbot gives up.

Tuesday, March 23rd, 2004

image linking

As a followup to goulo's post about other people linking directly to images on a different server, I finally put together some stats on image inlining from my own server.

The new Image Linking Hall of Shame is the result of my efforts at gathering some statistics on this. The amount of traffic that a single message board post can generate (such as the one at crapforum.nl) is quite surprising.

In comparison to the rest of my server, the amount of traffic generated by image inlining is small. So I'm not really concerned about that aspect of inlining. The thing that bothers me is that people seem to feel free to use my images without mentioning the source. I don't think a single one of the other sites currently listed in the Hall of Shame mentions where the picture came from, or offers a link back to where it was found.

The above reason is why I added the copyright notices to all the images in my image gallery. Under United States copyright law, such a notice is not required to be present to ensure at least a minimal level of copyright protection. However, adding the notice serves a few purposes:

  • Remind people that the image is in fact a copyrighted work
  • Provide a URL link (though unfortunately it can't be clickable) back to the source
  • Include the image caption, if available.

A somewhat more audacious use of somebody else's images is to present them and then claim them as your own. Somehow, "HondaH8r" likes driving my car, and claims to have had problems with it. That's news to me!

Finally, I know that if I don't mention this myself, at least one reader will feel obliged to mention that it's easy to replace an existing image with a different one of, shall we say, questionable taste, or to use mod_rewrite to automatically augment or replace images served via an off-site referer. Hopefully my position is clearer now: I don't really mind people using my images as long as they say where the images came from. My image annotations should help people do this without having to think about it.


Friday, February 27th, 2004

photo gallery copyright

While perusing my web server logs recently, I noticed several referer hits from a German train message board post. While discussing freight trains, somebody wrote a post that linked directly to one of my pictures taken at Tehachapi Loop. Now, I don't really have a big problem with this sort of thing, especially when the poster says things like "found a few spectacular pictures from Tehachapi Pass", but there was no credit given to the photographer (me) or a link back to my site.

So I finally got around to doing something I've been thinking about for a while. I modified my photo gallery software to add a copyright notice to the bottom of every image. The thumbnail images have just a copyright notice with the date and my name; the larger ones have the copyright, URL to the image page, and the description (if available). Now, the inlined image on the train message board shows the description, copyright, and URL to the original.

I still have a little bit of work to do. Currently all the images claim they are copyright 2004, but I'm going to add the ability to set a specific copyright date for each image or set of images. Also there are a couple of images which were not actually created by me, which shouldn't have my copyright. I'll get all this sorted out soon.

Initially I was unhappy about how the copyright notice looked on the thumbnail pages, but I reduced its size a little bit and I think it looks okay now. What do you think?

Wednesday, February 11th, 2004

international domain names

Last year, the IETF published a standard for internationalized domain names, called IDNA (Internationalizing Domain Names in Applications). This standard consists of RFC 3490, RFC 3491, and RFC 3492.

Internet domain names use only the characters A-Z, 0-9, and "-" (hyphen). Internationalized names (in Unicode) are translated to those characters by algorithms called Nameprep and Punycode. The Nameprep algorithm first prepares the name by normalizing the letters to avoid confusion. For example, the German letter "ß" is translated to "ss"; the letter "æ" is translated to the two letters "ae".

The Punycode algorithm is a complex transformation that translates a Unicode name to an ASCII name. For example, the name "ĝangalo.com" translates to "xn--angalo-o5a.com"; the Japanese name "理容ナカムラ.com" translates to "xn--lck1c3crb1723bpq4a.com". The prefix "xn--" indicates that the name is encoded with Punycode. This algorithm minimizes the size of the ASCII name.
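The conversion can be tried out without implementing Punycode by hand. The sketch below assumes a JavaScript runtime whose built-in URL parser performs the host name conversion (current browsers and standard Node.js builds do); the function name toAsciiHostname is made up for illustration. These parsers follow a later revision of the IDNA rules than the 2003 RFCs, so the result for some characters may differ from what Nameprep would produce.

```javascript
// Let the built-in URL parser convert a Unicode host name to its ASCII
// ("xn--") form. The scheme is only there to make the string a valid URL.
function toAsciiHostname(name) {
    return new URL("http://" + name + "/").hostname;
}

// "\u011d" is the letter ĝ, written as an escape to keep the source ASCII.
console.log(toAsciiHostname("\u011dangalo.com"));   // xn--angalo-o5a.com
```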

So, I recently registered the domain name "ĝangalo.com"* at joker.com, because I was curious how this method works. You have to register the name in ASCII (for example, "xn--angalo-o5a.com"). They have a tool for translating a Unicode name to an ASCII name. It works well; I now have a web page at ĝangalo.com that redirects to the real gxangalo.com.

However, not every browser supports internationalized domain names. I know that this works with Mozilla Firefox, but it does not work with Internet Explorer, Opera, or older versions of Mozilla. Does it work with your browser?

* If the owner of gxangalo.com wants it, I will transfer ĝangalo.com to him.