
2013 in review

2013 is the year when everything changed.

The biggest event was the birth of our daughter Lily. She was born prematurely in Shanghai while we were on a short stopover on our way home from Europe in July. Due to the generosity of thousands of people in New Zealand and around the world, we were able to fly her home by private jet in September. The logistics of carrying such precious cargo by air are insanely complicated. We thank everybody who played a part, big or small, in her safe trip home.

Being a dad for the first time has revealed challenges and emotions that I was not previously aware of. But no matter what temporary difficulties I might have, it's all worth it when I get a spontaneous smile from my little girl.

Other things happened too, I guess.

  • Amy and I visited France, Italy, and Switzerland. I haven’t yet done anything with the photos we took.
  • Both my parents and Amy's mom visited shortly after Lily came home from hospital. Everybody enjoyed meeting Lily and exploring the Auckland area.

New software this year:


2012 in review

2012 has been fairly quiet. Maybe it just seems that way because I haven't actually written anything new in this blog since last year's annual summary.

  • Spent an entire year working from home (telecommuting). I miss the social aspect of the workplace, but the zero commute sure is nice.
  • Since I'm no longer cycling to work, I started running regularly. Logged 1052 km last year. I even ran 20 km (nearly a half marathon) before breakfast one day.
  • We tried to buy a house in Auckland, but were immediately priced out of the market at auction. Now renting a nice flat in a great location.
  • Quote of the year: "You don't stop learning because you get old; you get old because you stop learning."

New software this year:

  • NZSL Dictionary is a dictionary for New Zealand Sign Language.
  • Auckland Trains is a train timetable app for Auckland, New Zealand.
  • Free 15C is an HP-15C calculator simulator, which is now available for iOS.

2011 in review

2011 was the year that was pretty much defined by the earthquakes in Christchurch. Everything else seems insignificant, but it turns out I did other things too.

  • Christchurch was devastated by the earthquake of 22 February. Unlike the big one last year, this one was almost directly under the city and destroyed lives, property, and livelihoods. Again we were lucky and did not sustain any major damage.
  • My parents visited in September. I took a couple of weeks off work and we did a big tour of the South Island. They had a great time.
  • Received a Microsoft MVP for C++ award for a second time. This is based on my continuing participation answering programming questions on Stack Overflow.
  • Amy and I officially became New Zealand citizens on 7 November.
  • Passed my NCEA Level 1 Chinese assessment. This involved a one-minute spoken presentation where I talked about our trip with my parents.
  • In December, moved to Auckland. Amy will be attending school at AUT for a three-year programme for sign language interpreting for the Deaf. I'll be working remotely from home, which is new for me. We plan to return to Christchurch after Amy is done.

And, in the software section:

  • Published the Stack Overflow ebooks, a library of the top rated questions and answers on Stack Overflow for the most popular tags.
  • Contributed a number of improvements to Cppcheck, a static analyser for C and C++.
  • Started on a JavaScript implementation of a Java VM. Others have done this too; I just wanted to try.
  • Added the NOAA Weather Radio tools to GitHub (I had written that code years ago, but had never released the source).
  • picomath is a collection of standalone mathematical functions in many different programming languages.

minilink performance improvement

My minilink/lnk.nu link shortener has been getting increasingly popular lately. There was a sharp jump up in activity in mid-October, which has now put the request rate at around 20k requests per day. There have been spikes up to 40k per day, too.

This increase in activity has taken a toll on my server, since minilink had been implemented as a simple Python CGI program. This was the original implementation I started with and I haven't needed to change it. However, CGI is only appropriate for low performance applications, and minilink is probably edging into medium performance at this point. A CGI program has to start up an entire Python runtime environment for every request.

Over the last two days, there have been big enough activity spikes that the load on my server goes through the roof and it can't really handle the requests anymore. This can cause things to crash and run out of memory and other impolite behaviours. This happened once last March, and twice in the last two days.

Today I did something I should have done long ago: I converted minilink from a CGI program to a WSGI program. WSGI improves performance because it keeps the Python runtime loaded in memory, and with Apache's mod_wsgi, the calls into Python code are much faster.
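To illustrate the difference, here is a minimal sketch of what a WSGI entry point for a link shortener could look like. It is not the actual minilink code: the LINKS table and its contents are hypothetical stand-ins for whatever storage minilink really uses. The point is that the module is imported once and the application function is then called for each request, instead of a fresh Python process being started every time.

```python
# Hypothetical in-memory table; a real shortener would look codes up in a database.
LINKS = {"abc": "http://example.com/"}


def application(environ, start_response):
    """WSGI entry point: the server calls this once per request."""
    # PATH_INFO holds the request path, e.g. "/abc" for a short code "abc".
    code = environ.get("PATH_INFO", "/").lstrip("/")
    target = LINKS.get(code)

    if target is None:
        body = b"Not found\n"
        start_response("404 Not Found", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    # Redirect the browser to the full URL.
    start_response("301 Moved Permanently", [
        ("Location", target),
        ("Content-Length", "0"),
    ])
    return [b""]
```

With mod_wsgi, a module exposing a callable named application is what the server looks for by default, so module-level setup such as loading LINKS happens once rather than on every hit.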

I also implemented a simple test suite so that I could refactor the implementation while ensuring that the behaviour stayed the same. That's something I should have done long ago, too.

You may see an increase in performance when using minilink starting today. If you see anything go wrong though, please let me know.

stack overflow questions in ebook format

In my quest to create useful reference ebooks for the Kindle, I've created ebooks of the top rated programming questions on Stack Overflow for the top 20 tags.

Stack Overflow ebooks

These files are created from the monthly Stack Overflow Creative Commons data dump. I've got a combination of Python, Java, and XSLT scripts that process the raw XML database dumps into something usable. Then the Amazon kindlegen program creates the ebook file (in Mobipocket format).

Last year I got a new computer with a fast CPU and lots of memory and disk space. While working on the XML processor, I realised that I was doing a lot of work seeking around a big XML file (it's over 4 GB) and collecting questions and answers together. This was taking quite some time because of the sheer size of the files. Since I am running a 64-bit OS (FreeBSD 8 amd64), I memory mapped the entire 4 GB XML file into memory and then didn't have to think about seeking anymore. Letting the OS manage the caching is a much better approach, and the improved performance really shows.
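As a rough sketch of the memory-mapping idea (not the actual processing scripts), the following uses Python's mmap module to index records in a large file and then pull them out by offset. The function names are made up for illustration, and the code assumes each record is a self-closing <row ... /> element whose text is entity-escaped, so that "/>" only appears at the end of a record.

```python
import mmap


def index_rows(mm, marker=b"<row "):
    """Return the byte offset of every record start in the mapped file."""
    offsets = []
    pos = mm.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = mm.find(marker, pos + len(marker))
    return offsets


def read_row(mm, offset):
    """Return the raw bytes of the record that starts at offset."""
    end = mm.find(b"/>", offset)
    if end == -1:
        raise ValueError("unterminated record at offset %d" % offset)
    # Slicing the map reads straight from the OS page cache; no seek() calls.
    return mm[offset:end + 2]


def open_mapped(path):
    """Map a whole file read-only; the caller should close the returned map."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Once the offsets are known, any record can be fetched in any order with read_row, and the operating system decides which parts of the file stay in memory.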

The preprocessing step (that needs to run once per data dump) creates all the HTML files for each question and its set of answers. I was originally storing all the files in one directory, but a million files in a single directory wasn't working very well. I ended up splitting the question number into groups of three digits, so 1234567.html is actually stored in 123/456/7.html. This step takes about two hours to run.
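The directory split can be sketched in a few lines. The function name is hypothetical, and how numbers with six or fewer digits are laid out is an assumption here, since only the seven-digit case is described above.

```python
def sharded_path(question_id, ext=".html"):
    """Split a question number into groups of three digits, left to right."""
    digits = str(question_id)
    parts = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    # The last group becomes the file name; earlier groups become directories.
    return "/".join(parts) + ext


print(sharded_path(1234567))  # 123/456/7.html
```

With this layout no directory holds more than about a thousand entries at each level, which keeps directory lookups fast.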

Creating each ebook file is then a single XSLT transformation (taking about a minute), plus the kindlegen step which can take several minutes depending on the number of questions. The performance of kindlegen isn't very impressive and appears to be O(n²) in the number of pages.

The source for all this is available on GitHub.

2010 in review

2010 was a pretty interesting year.


chinese-english dictionary for kindle

Further to my previous entry on Chinese text in the Kindle, I've put some more work into the Chinese-English dictionary I mentioned.

I used the CC-CEDICT dictionary as the basis for my dictionary. I also used the radical stroke count database from the Unihan Database. The result is a traditional radical-stroke order dictionary that seems reasonably usable on the Kindle.

One of the great features of the Kindle is the ability to set a default dictionary so that when you're reading text, you can simply point to a word and the Kindle will automatically look it up and show the definition in a small popup window. I've been able to create such dictionary files successfully for English, but have not been successful for Chinese. I currently suspect that the Kindle simply doesn't recognise the Chinese characters as something it can look up in a dictionary, so it doesn't even try. (If anybody has any ideas on this, feedback would be most welcome.)

The current result can be found here: Chinese-English Dictionary.

rss2mobi: google reader to mobipocket creator

I've been using my Kindle to read RSS feeds through Google Reader. Using the Kindle's built-in web browser, Google Reader works surprisingly well (as long as you enable keyboard shortcut commands). However, there are some annoyances:

  • The text is pretty small due to the layout of the Reader web page. It's possible to zoom in, but then you need to scroll left and right to read each line of text (which is awkward). The text is certainly not as readable as regular books on the Kindle.
  • Clicking on a link in Google Reader wants to open a new window, and the Kindle browser simply pops up a message box stating that multiple windows are not supported, with no option to continue.
  • You must be connected to the Internet to use Google Reader, and can't browse it offline.

In order to address these issues, I've created a program that creates proper e-books in Mobipocket format using kindlegen. I had to solve several problems to make this work:

  • I wanted to use the RSS feeds I've already configured in Google Reader. The Google Reader API is undocumented, so I had to rely on (out-of-date) descriptions of the API (e.g. here and here).
  • Downloading just the text of feeds wasn't quite enough, so I had to download all the images and then rewrite the <img> tags in the HTML to refer to the included image files.
  • In order to tell the Kindle that each entry is a "chapter" and therefore allow navigation by skipping to the next entry, I had to create a table of contents in .ncx format.
  • Originally kindlegen refused to recognise my HTML files as UTF-8 encoded, and built Mobipocket files with the wrong encoding. Eventually I figured out through trial and error that the HTML file names must end in .html (I was using no extension).
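The image rewriting step could look something like the sketch below. This is not the rss2mobi code: the function name and the img0000-style file naming are made up for illustration, and a regular expression is used for brevity where a real HTML parser would be more robust. It returns the rewritten HTML together with a mapping of remote URLs to local file names, so the caller knows which images to download.

```python
import os
import re
from urllib.parse import urlsplit

# Matches the src attribute of an <img> tag, with single or double quotes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_images(html):
    """Return (new_html, mapping) where mapping is remote URL -> local name."""
    mapping = {}

    def replace(match):
        url = match.group(3)
        if url not in mapping:
            # Keep the original extension so the image type stays recognisable.
            ext = os.path.splitext(urlsplit(url).path)[1]
            mapping[url] = "img%04d%s" % (len(mapping), ext)
        quote = match.group(2)
        return match.group(1) + quote + mapping[url] + quote

    return IMG_SRC.sub(replace, html), mapping
```

Each URL in the mapping would then be fetched and saved under its local name next to the HTML file before kindlegen is run.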

Currently this rss2mobi program downloads all the new entries from Google Reader and packs them up into a file that I can manually copy to my Kindle. Next to do is:

  • Mark the downloaded entries as "read" in Google Reader so they aren't downloaded next time.
  • Automatically transfer the file to the Kindle. Amazon provides a service where I can email the document to a special email address, and then it will be automatically sent to the Kindle wirelessly.
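The email transfer step is not implemented yet, but building the message could be sketched with Python's standard email package as below. The function name, the subject line, and all the addresses are placeholders, not real values, and the generic application/octet-stream attachment type is an assumption.

```python
from email.message import EmailMessage


def build_kindle_email(sender, kindle_address, filename, data):
    """Build an email carrying the ebook file as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = "rss2mobi delivery"  # placeholder subject
    msg.set_content("Feed digest attached.")
    # Attach the raw bytes of the Mobipocket file under its original name.
    msg.add_attachment(data, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg
```

Sending the result would then be a matter of handing it to an SMTP server, for example with smtplib.SMTP("localhost").send_message(msg), assuming a local mail server is available.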

Once all this is in place, I can set up a daily task that gets the new content from Google Reader, and transfers it to my Kindle without any manual interaction.

You can find the code for rss2mobi on GitHub.