Trac and Subversion

August 4th, 2007

I finally got around to setting up Trac on my server. While I was at it I upgraded to Python 2.5 and got mod_wsgi to work with Trac.

I moved my Subversion base to svn.lolrus.org and my Trac base to trac.lolrus.org.

Anyways, I’ve been working on a little project lately which is a vectorized implementation of Conway’s Game of Life.

You can check it out here

Here’s a little background before I get into the tutorial. I’m studying abroad in Germany this fall and was sent a bunch of documents in German to fill out and return. My German is pretty bad, and these are documents I can’t afford to misinterpret, or else I’d probably mess things up and arrive without housing because I mailed something to the wrong address. My friend was in the same boat, so I did some thinking and came up with a solution:

  1. Scan hard copies to tif files
  2. Turn the images into text using OCR software
  3. Translate the German text into English. It didn’t have to be perfect because I know a small amount of German.

I had some requirements too. Had to run on OS X, and the software I used had to be free. I had access to my roommate’s cheap CanoScan scanner which fortunately had functioning drivers for OS X (albeit a terrible interface).

The first piece of software you need is scanning software. Hopefully you can set that up on your own. Make tiffs of all the documents you want to translate. Black and white, no compression.

Next, we need some OCR software. I tried GOCR first, but I had no luck. Then I came across Tesseract OCR, which is mostly released by Google. It’s open source, so I gave it a shot; it said it might work on OS X but made no promises. Here’s how to install it on OS X. It’s super easy.

Download the source from here. I used 2.0. Extract it, and fire up a terminal.

Navigate to the source directory
cd Desktop/tesseract-2.00

Configure it and make it
./configure
make
sudo make install

Hopefully that went smoothly. Now you’ll need language support for the language you’re scanning. Download the tar.gz language pack from the download page, and extract it. You will want to copy the contents of the extracted tessdata folder to /usr/local/share/tessdata.
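Here is a rough sketch of that copy step as a small shell function. The function name install_lang_data and both of its arguments are my own placeholders, not anything that ships with Tesseract; pass whatever folder your archive actually extracted to.

```shell
# Copy extracted Tesseract language files into the data directory.
# install_lang_data is a made-up helper name; both arguments are placeholders.
install_lang_data() {
  src=$1     # folder that came out of the language archive
  dest=$2    # tessdata directory of your install
  [ -d "$src" ] || { echo "no such folder: $src" >&2; return 1; }
  mkdir -p "$dest" && cp "$src"/* "$dest"/
}
```

Writing into /usr/local usually needs root, so in practice the plain one-liner is probably easier: sudo cp tessdata/* /usr/local/share/tessdata/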

Run /usr/local/bin/tesseract (or tesseract if /usr/local/bin is in your path) in the terminal just to see if it installed properly. It should spit out some usage info.
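If you are not sure whether /usr/local/bin is in your path, something like the following can tell you. It is only a sketch: check_tool is a name I made up, and it relies on nothing more than the shell's built-in command -v.

```shell
# Report where a command lives, or complain if the shell cannot find it.
# check_tool is a hypothetical helper name.
check_tool() {
  if location=$(command -v "$1"); then
    echo "found $1 at $location"
  else
    echo "$1 is not in your path; try /usr/local/bin/$1" >&2
    return 1
  fi
}
```

Calling check_tool tesseract should print the install location if the path is set up, and otherwise remind you to use the full /usr/local/bin path.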

Let’s try running it on a file now. Since I used German my lang keyword is deu (so change this to your appropriate language). My files are also sequentially labeled OCRXX.tif. So run
/usr/local/bin/tesseract OCR01.tif 1 -l deu
And if that succeeds, it will output to 1.txt, so you can make sure everything is okay by running
cat 1.txt

TextEdit.app doesn’t display these files correctly, so cutting and pasting from it isn’t advisable unless you have a better editor that supports UTF-8 (which is what I think the output files are in). Since I have some webspace, I uploaded all the text files to my server and passed the URL of each .txt file to Google Language Tools. It displayed perfectly in my browser, translated and all.
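I only think the output is UTF-8, so here is one way to check rather than guess. This is a sketch that assumes iconv is installed (it normally is on OS X and Linux); is_utf8 is just a name I picked.

```shell
# Succeeds only if the whole file decodes as valid UTF-8.
# is_utf8 is a hypothetical helper; it leans on iconv rejecting bad bytes.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For example, is_utf8 1.txt && echo "looks like UTF-8" prints the message only when the file passes the check.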

Tesseract is good stuff, especially for the price (free).

Even lazier? Here’s some bash code for you! Just save all your tifs in a single folder. The part you should change to suit your needs is the language code (deu).

for f in *.[tT][iI][fF]*; do
  # strip the .tif/.tiff extension to get the output name
  out=$(echo "$f" | sed 's/\.[tT][iI][fF].*//')
  /usr/local/bin/tesseract "$f" "$out" -l deu
done

Now after running that you should have a bunch of text files. If you’re even lazier and don’t want to upload each one and send them through Google Language Tools, you can use cat or something to join them into one file and just send that.
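The cat idea might look something like this. It is a minimal sketch: combine_texts and the output file name are made up, and the output deliberately doesn’t end in .txt so the glob never picks up its own result.

```shell
# Join every .txt file in a folder into a single file.
# combine_texts is a hypothetical helper name.
combine_texts() {
  dir=$1
  out=$2
  : > "$out"                     # start with an empty output file
  for t in "$dir"/*.txt; do
    [ -e "$t" ] || continue      # nothing matched the glob
    cat "$t" >> "$out"
    printf '\n' >> "$out"        # separator between documents
  done
}
```

Running combine_texts . combined.out in the folder with your text files gives you one file to upload instead of many.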

I’ve had some issues with multiple columns in documents. You can use a simple image editor, even Preview.app, to crop and rearrange the columns into a single linear column to help the OCR software along.

I think that’s all. The same methodology should work with other operating systems as well. You might need some MacPorts utilities installed. I didn’t test without them. Sorry!

So… today started pretty good. Accomplished some great stuff at work. I got some stuff threaded in a half a day that my boss thought would take weeks. It scales alright too :). n-threads. word.

That’s about all the good in my day. The power connector (known as a MagSafe connector) on the MacBook Pro I use for work kinda stopped working and then melted and sparked a bit. They were nice enough at the Apple Store to replace it even though the MBP was out of warranty (I don’t think they really want to put up a fuss about something that could have started a fire)… and that was sweet because I don’t have to file an expense report and it’s a good story.

It gets worse. After I left the Apple Store I came back to my car and guess what? My window was smashed in and my iPod and GPS were gone. This is a really nice mall too. Like 5:00 pm. How random is that? It sucks a lot. Especially because I have glass all inside my car and no driver’s side window… and no iPod. Luckily I have a few CDs I can throw in my car.

Yeah that sucks. I got my new Clap Your Hands Say Yeah album on vinyl today. I’m a little disappointed that the first song, Some Loud Thunder, still doesn’t sound so good. Half of the album is solid. There are some songs that are just mediocre too. Am I disappointed? A little bit. The album was hella expensive because it had to be imported.

At least tomorrow is Friday.

test

February 25th, 2007

test

California

February 15th, 2007

Eh, finally something good happened to me as far as getting a co-op is concerned.  I got an offer from Intel today which means I will be moving to California in 3 weeks after finals (driving there).  I will be there for 6 months.

Yup.  No more snow for me this year!  No more -15 degree weather.  No more potholes.  No more 8:00 am classes.

If things work out right I will only have to spend one more quarter at RIT.  And by things working out right I mean going straight to Germany to study abroad for 5 months after my Intel internship.  I won’t be back for a year!  Weeeeee.  Rochester is cool and all in its own way, but it’s just not for me.  I need a change of scenery.

So what am I going to be doing at Intel?  It’s a bit vague now, but I do know a few things.  I will be working on a team that optimizes software for Apple.  They look at everything from the high level algorithms to the low level assembly.  I will be working on Macs.  And apparently, the second day I am there they are flying me to a conference in Phoenix, AZ, which I wasn’t told much about.

Believe it or not, I don’t have much to complain about, except for maybe a lonely Valentine’s day, but not everything always goes your way.

Greasemonkey and Craig’s List

February 11th, 2007

I’ve been searching for an apartment on Craig’s List lately near where I am probably getting an internship.

A couple of things I noticed: the Google Maps links on the pages were often bad because they used “at” instead of “&”, and Google Maps seems to like the ampersand more. Also, I always ended up changing the link to a directions search to a specific address. Inefficient, right?

I was bored this morning and I realized that this was a job for Greasemonkey. I’m not super good at JavaScript (because I never have to use it), but I hacked together a script that turns the Google Maps links into links to directions to the location you define in the script.

It’s not super robust, but it does the job. Here’s what you need to run it.

  • Firefox
  • Greasemonkey add-on for Firefox
  • My script
  • You’ll have to edit one line in the script where it says “Destination Here” and replace it with a location (you pretty much have to search for the location once with Google Maps and paste the part of the URL that holds the address)
    example: 1+lomb+memorial+drive,+rochester,+ny
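The script itself is JavaScript, but the string juggling it does can be sketched in shell, which is what the rest of this page uses. Everything here is an assumption for illustration: the function names are made up, and the saddr/daddr URL layout is just my guess at a workable Google Maps directions link, not a copy of what the script does.

```shell
# Turn a listing address into something Google Maps is happier with:
# " at " becomes an encoded ampersand (%26) and spaces become "+".
fix_address() {
  printf '%s\n' "$1" | sed -e 's/ at / %26 /g' -e 's/ /+/g'
}

# Build a directions link from a listing address ($1) to a destination ($2)
# that is already in URL form, like the example above.
directions_url() {
  printf 'http://maps.google.com/maps?saddr=%s&daddr=%s\n' "$(fix_address "$1")" "$2"
}
```

For instance, fix_address "Main St at 1st Ave" prints Main+St+%26+1st+Ave, and directions_url takes that plus your destination string and glues them into one link.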

No warranty on this script at all. For all I know, it will eat your computer and your soul.

x86

January 23rd, 2007

Heh, so this quarter is killing me… in every aspect except for my grades (yay for trying and getting A’s for once).

Yesterday I started my second project for SysProg and it’s pretty much text manipulation in assembly.  Fun?  Hell yes… if you’re a masochist.  Basically I sat down for 5 hours straight in the DSL lab and cranked out 300 lines of x86 assembly.  That includes the learning curve of a new language and figuring out how to make system calls properly and getting file I/O to work and determining how I am storing my data and such.

I don’t really like the keyboards in the lab, or the way the computers are configured (old version of vim, zsh, and such) and this is a project that you’re more or less forced to work on in the lab.  It’s not all that bad, at least there are speakers for listening to tunes.
Yeah, I am only 1/3 done with this project.  It’s so worth it though.  I understand C and the x86 so much more now.   Also, GDB rocks my socks.

I need to learn to count

January 11th, 2007

So, I was visiting my friend in NYC over break and I decided to bring my 4×5.  So I thought I had shot 8 sheets of color print film.  Then when I emptied out the holders I counted 7.  When I got my negs back from the lab they told me that I only had 6.

I had some interesting results too.  One sheet of film was exposed twice.  It looks pretty cool, but the white balance in the two scenes is completely different, but that might be a good thing?  Also, one of the sheets I had processed was some Velvia 100 that I guess I had in one of the holders.  It got cross-processed.  The exposure seems pretty good, but I have a feeling it will take a very long time trying to get the white balance correct.  It’s somewhat of an abstract scene with no people, so it’s not a huge deal whether or not I get it perfect.  If not, I will just scan it when I get time.  I’m curious.  I never cross-processed before.
I also need to make sure I tighten my movements on my 4×5.  The rear tilt always seems to end up moving when I use my 90mm lens and I realize it after the fact.  It adds an interesting aesthetic by making the bottom of the exposure a little softer, but it’s not like I am going for it.

Downtime

January 7th, 2007

Well, skank was acting a bit funny Friday evening so I restarted it.  It didn’t start back up.  Also, I was on vacation and nobody was in the dorms and had physical access to the server room.  So that’s why there was some downtime.  Fortunately it started up right away when I finally got around to resetting it manually today.  All is well :)

I am updating gallery right now.  It’s probably been 3 or 6 months since I’ve updated it last.  I just hope it doesn’t break because I run SVN versions of it.

Grinding Aspheric Lenses

December 15th, 2006

For those of you that don’t know, I’ve been employed part-time at an optics company.  Well, they do more than just make lenses: they actually make machines for making lenses, as well as sell software.  That’s where I come in.  I’ve been working on software for the manufacturing of Aspheric lenses.

The Aspheric lenses I am referring to are not the kind you get in your Nikon kit lens, or even the large format Rodenstock or Schneider lenses, but the kind that go into aircraft, or the ones that are several inches in diameter that are used in some of the best “amateur” telescopes that cost tens of thousands of dollars.

Anyways, it’s a really cool process how they’re cut and polished.  Each lens takes lots of time to rough, polish, and finish the edges.

I have my camera so I will take pictures of the process and post them.