Archive for the ‘Computer’ Category

Rails, Django, Pylons, CakePHP, TurboGears, etc. Choosing the Right Web Framework.

Friday, August 17th, 2007

I’m planning a web application that will have to scale immensely. It will also need to support complex SQL queries (or stored procedures in PostgreSQL).

The first framework I looked at was Rails, which is an obvious choice. I’ve done some basic apps in it in the past, and I am very familiar with the Ruby language itself. But Ruby is slow: on some benchmarks I’ve seen, it is 3-5x slower than Python. Also, I don’t like how it’s a struggle to write complex queries that are efficient. If there’s an easy way, it’s not well documented. I’m a little beyond the “Here’s a movie that will show you how we made a blog in 10 minutes using scaffolding” stage. RubyGems is pretty nice, but I think I will pass on Rails this time.

I found CakePHP next, but I really would like to stay away from PHP. I’d write my own lightweight framework if I were to use PHP. I’ve also heard that people have had issues with it.

After that, one of my friends suggested Django. From what I can tell, it would be great for an intranet or something like that because it can have many portals and such. It looks like the templating system is pretty solid too, from what I’ve read. Still, it hasn’t sold me. I want to try something different from Rails that isn’t a full stack.

Next, my friend recommended I try TurboGears. Sure, it flaunts its 20-minute wiki, like Rails flaunts writing a blog in 10 minutes or whatnot. But I like how it just ties existing components together. This means that if I need to write my own custom templating engine or model to suit a specific need in the future, it won’t be a huge pain to integrate it with the rest of the site. I’m kind of sold for now, but I still need to set it up. I also REALLY like how TG 2.0 is converging with Pylons to make it play nicely with WSGI.

Reading about TG2 obviously led me to Pylons, which is a web framework as well. It acts as glue between certain Python web technologies to make an MVC framework.

While looking for the perfect framework, I keep going around in circles. I obviously can’t write my app in each one and see which one is the best. None of them are really proven. I mean, I keep considering just using PHP with Smarty for templating. I really want to stay away from PHP though. I’d like to learn something new.

Also, it seems like all the reviews and comparisons of frameworks I have found are at least a year old and subjective. I understand it’s not really possible to have an objective comparison of frameworks. At least with languages you can have benchmarks, but with frameworks it’s not always CPU cycles that win.

I also looked at Zope, but it’s not really practical for what I am trying to accomplish.

Right now I am going to attempt TurboGears (using SQLAlchemy and Genshi). SQLAlchemy looks tempting because I read that you have more control over your queries with the “Data Mapper” paradigm than with the “Active Record” one. Genshi seems sweet too; even the author of Kid (the current default templating engine for TG) endorses Genshi. It seems like there is a lot of momentum behind it.
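To make the “Active Record” versus “Data Mapper” distinction a bit more concrete, here is a rough sketch using only Python’s standard sqlite3 module. It is not SQLAlchemy or Rails code; the users table and every class and function name in it are made up for illustration. The point is only where the SQL lives: on the object itself, or in a separate mapper where queries can be hand-tuned.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical schema used only for this illustration.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, karma INTEGER)"


class ActiveRecordUser:
    """Active Record style: the object knows how to save itself."""

    def __init__(self, conn, name, karma=0):
        self.conn, self.name, self.karma = conn, name, karma

    def save(self):
        self.conn.execute(
            "INSERT INTO users (name, karma) VALUES (?, ?)", (self.name, self.karma)
        )


@dataclass
class User:
    """Data Mapper style: a plain object with no database knowledge."""
    name: str
    karma: int = 0


class UserMapper:
    """All SQL lives here, so each query can be written by hand."""

    def __init__(self, conn):
        self.conn = conn

    def insert(self, user):
        self.conn.execute(
            "INSERT INTO users (name, karma) VALUES (?, ?)", (user.name, user.karma)
        )

    def top_by_karma(self, limit):
        rows = self.conn.execute(
            "SELECT name, karma FROM users ORDER BY karma DESC, name LIMIT ?",
            (limit,),
        )
        return [User(name, karma) for name, karma in rows]


def demo():
    """Insert one row each way, then query through the mapper."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    ActiveRecordUser(conn, "alice", 5).save()
    mapper = UserMapper(conn)
    mapper.insert(User("bob", 9))
    mapper.insert(User("carol", 1))
    return mapper.top_by_karma(2)
```

In the mapper version, swapping the SELECT for a stored procedure call or a more complex join would not touch the User class at all, which is roughly the kind of control being described.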

I will share my experiences with TurboGears or whichever framework I end up going with. Stay tuned!

Also, please feel free to correct me if any of my assumptions are wrong.

Translating Hard Copies of Documents for Lazy and Cheap Folks

Tuesday, July 31st, 2007

Here’s a little background before I get into the tutorial. I’m studying abroad in Germany this fall and was sent a bunch of documents in German to fill out and return. My German is pretty bad, and these are documents I can’t afford to misinterpret, or I’d probably mess things up and arrive without housing because I mailed something to the wrong address. My friend was in the same boat, so I did some thinking and came up with a solution:

  1. Scan hard copies to TIFF files
  2. Turn the images into text using OCR software
  3. Translate the German text into English. It didn’t have to be perfect because I know a small amount of German.

I had some requirements too. Everything had to run on OS X, and the software I used had to be free. I had access to my roommate’s cheap CanoScan scanner, which fortunately had functioning drivers for OS X (albeit with a terrible interface).

The first piece of software you need is scanning software. Hopefully you can set that up on your own. Make tiffs of all the documents you want to translate. Black and white, no compression.

Next, we need some OCR software. I tried GOCR first, but I had no luck. Then I came across Tesseract OCR, which is mostly maintained and released by Google. It’s open source, so I gave it a shot even though it said it might work on OS X but made no promises. Here’s how to install it on OS X. It’s super easy.

Download the source from here. I used 2.0. Extract it, and fire up a terminal.

Navigate to the source directory
cd Desktop/tesseract-2.00

Configure it and make it
./configure
make
sudo make install

Hopefully that went smoothly. Now you’ll need language support for the language you’re scanning. Download the tar.gz language pack from the download page, and extract it. Then copy the contents of the extracted tessdata folder to /usr/local/share/tessdata.

Run /usr/local/bin/tesseract (or tesseract if /usr/local/bin is in your path) in the terminal just to see if it installed properly. It should spit out some usage info.

Let’s try running it on a file now. Since I used German, my lang keyword is deu (so change this to the appropriate code for your language). My files are also sequentially labeled OCRXX.tif. So run
/usr/local/bin/tesseract OCR01.tif 1 -l deu
And if that succeeds, it will output to 1.txt, so you can make sure everything is okay by running
cat 1.txt

TextEdit.app on OS X doesn’t display these files correctly, so cutting and pasting from it isn’t advisable unless you have a better editor that supports UTF-8 (which is what I think the output files are in). Since I have some webspace, I uploaded all the text files to my server and then passed the URL of each .txt file to Google Language Tools, and it displayed perfectly in my browser, translated and all.

Tesseract is good stuff, especially for the price (free).

Even lazier? Here’s some bash code for you! Just save all your tifs in a single folder. The part you should change to suit your needs is the language code (deu here).

for f in *.[tT][iI][fF]*; do
  # Strip the .tif/.tiff extension to get the output base name
  out=$(echo "$f" | sed 's/\.[tT][iI][fF].*//')
  /usr/local/bin/tesseract "$f" "$out" -l deu
done

After running that you should have a bunch of text files. If you’re even lazier and don’t want to upload each one and send them all through Google Language Tools, you can use cat or something to combine them and just send the one combined file.
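If you would rather not do the combining by hand, here is a small Python sketch of the “cat or something” step. It assumes the OCR output really is UTF-8 (a guess, as noted above), and the function name, the combined.txt file name, and the separator lines are all made up for this example.

```python
from pathlib import Path


def combine_text_files(folder, combined_name="combined.txt"):
    """Join every .txt file in `folder` (sorted by name) into one UTF-8 file."""
    folder = Path(folder)
    target = folder / combined_name
    # Skip the combined file itself so reruns don't include old output.
    parts = sorted(p for p in folder.glob("*.txt") if p != target)
    with target.open("w", encoding="utf-8") as out:
        for part in parts:
            # A separator line makes it easy to find each page in the translation.
            out.write(f"===== {part.name} =====\n")
            # Assumption: Tesseract's output files are UTF-8.
            out.write(part.read_text(encoding="utf-8").rstrip("\n") + "\n\n")
    return target
```

Calling combine_text_files(".") in the folder with the OCR output leaves a single combined.txt to upload instead of one file per page.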

I’ve had some issues with multiple columns in documents. You can use a simple image editor, even Preview.app, to crop and rearrange columns into a linear format to help the OCR software along.

I think that’s all. The same methodology should work with other operating systems as well. You might need some MacPorts utilities installed. I didn’t test without them. Sorry!

Greasemonkey and Craig’s List

Sunday, February 11th, 2007

I’ve been searching for an apartment on Craig’s List lately near where I am probably getting an internship.

A couple of things I noticed: the Google Maps links on the pages were often bad because they used “at” instead of “&”, and Google Maps seems to like the ampersand more. Also, I always ended up changing the link into a directions search to a specific address. Inefficient, right?

I was bored this morning and I realized that this was a job for Greasemonkey. I’m not super good at JavaScript (because I never have to use it), but I hacked together a script that turns the Google Maps links into links to directions to the location you define in the script.

It’s not super robust, but it does the job. Here’s what you need to run it.

  • Firefox
  • Greasemonkey add-on for Firefox
  • My script
  • You’ll have to edit one line in the script where it says “Destination Here” and replace it with a location (you pretty much have to search for the location once with Google Maps and copy it out of the URL)
    example: 1+lomb+memorial+drive,+rochester,+ny

No warranty on this script at all. For all I know, it will eat your computer and your soul.
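For the curious, the link rewriting itself is simple. The sketch below is not the Greasemonkey script (that one is JavaScript and runs in the browser); it is just the same idea in Python. It assumes the listing’s location sits in a q parameter and that saddr and daddr are the start and destination parameters of a Google Maps directions link. The function name and the sample destination are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder destination, based on the example above; use your own address.
DESTINATION = "1 lomb memorial drive, rochester, ny"


def directions_url(maps_url, destination=DESTINATION):
    """Turn a Google Maps search link into a directions link."""
    query = parse_qs(urlsplit(maps_url).query)
    # Assumption: the listing's location is carried in the "q" parameter.
    location = query.get("q", [""])[0]
    # Cross streets written as "X at Y" become "X & Y".
    location = location.replace(" at ", " & ")
    # Assumption: saddr/daddr are the start and destination parameters.
    params = urlencode({"saddr": location, "daddr": destination})
    return "http://maps.google.com/maps?" + params
```

Note that urlencode takes care of escaping the ampersand, so the “&” ends up inside the address instead of being read as a parameter separator.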

x86

Tuesday, January 23rd, 2007

Heh, so this quarter is killing me… in every aspect except for my grades (yay for trying and getting A’s for once).

Yesterday I started my second project for SysProg and it’s pretty much text manipulation in assembly.  Fun?  Hell yes… if you’re a masochist.  Basically I sat down for 5 hours straight in the DSL lab and cranked out 300 lines of x86 assembly.  That includes the learning curve of a new language and figuring out how to make system calls properly and getting file I/O to work and determining how I am storing my data and such.

I don’t really like the keyboards in the lab, or the way the computers are configured (old version of vim, zsh, and such) and this is a project that you’re more or less forced to work on in the lab.  It’s not all that bad, at least there are speakers for listening to tunes.
Yeah, I am only 1/3 done with this project.  It’s so worth it though.  I understand C and the x86 so much more now.   Also, GDB rocks my socks.

A Test for Adobe Photoshop Lightroom Beta 4

Wednesday, October 11th, 2006

Well, I had hard drive issues a while ago and I had to recover all the data off of it, and I got a bunch of randomly named files (with the proper extension at least).  Forty gigs of those files are 6 megapixel NEF files (Nikon’s RAW format).  Twenty out of the forty gigabytes of RAWs are from the classes I took this summer which I need to finish filtering out and editing by the end of this quarter.

For a while I have fiddled around a bit with Adobe Lightroom Beta 3 and liked the results and the flashy graphics.  It’s simple to use and does a pretty good job for what I used it for.

The other day I installed Adobe Photoshop Lightroom Beta 4 (I think they are going to push the Photoshop part of the name to entice people). I imported the 40 GB of pictures into Lightroom and let it move them to where it wanted and sort them by date (I figured this way I would at least be able to get a general idea of what I was looking at). I assumed that it would take a while, so I wasn’t too concerned about performance and let it run overnight.

Today I finally decided I wanted to start editing some things, or at least sorting the stuff out so I could have stuff to show tomorrow.

I am completely disappointed.

Let me start off by saying this does not reflect how the final product will behave because I am using beta software.

Here’s a brief description of the box I am running it on:
Athlon M 2500+ (running at 2200 MHz)
1GB of RAM
Vista RC2*

I load up Lightroom, I click on a “shoot” that has about 300 RAWs in it, and my system locks up for a good five minutes.  When I finally get my task manager open I can see it is sucking up about 500 or 600mb of memory.  Finally, I can use the system again.  Everything seems to be going smoothly.  I check my Task Manager.  Lightroom is only using 170mb of RAM.  Lookin’ good.  I scroll through some images.  Perhaps that was just a glitch or it just needed to cache some things…

Nope. It locks up again. This time I have Task Manager open and I see the memory jump back up to its previous peak. Obviously, Lightroom is eating up all my memory and its cache of my images is spilling into swap space. Efficient? Heck no. Reading the cached data back from swap takes longer than starting over from the file: A. because the cached data is not in its compressed form, so there is more to read from disk, and B. even though caching the manipulations that Lightroom applies to the RAW file may be fast when the cache is in memory, once it is in swap it’s probably faster to recalculate them from the freshly loaded RAW file.

It went through a few cycles like that. By this point I had only been running it for about 30 or so minutes and it had racked up a good 6 million page faults. Why can’t Adobe manage its memory responsibly? Relying on the OS for this when dealing with data of this magnitude is silly. Do you think SQL servers that have gigabytes of data store it all in memory and rely on the OS to manage it? I hope (and know) not. It makes me cringe to think of how this software would run on a default MacBook configuration with a mere 512 MB of memory. I hope they’re not just targeting professionals with large budgets who can afford fancy Mac Pros (or in my case, more RAM) and such. Most students don’t get much of a choice when it comes to computers, if they get a choice at all… and if they do use Adobe Lightroom, are they really going to want to use a piece of software that they perceive to be slow when they get out into the photography industry?

Conclusion:

Lightroom is a great product, with great features, a great interface and everything like that. If only it worked well for me at the scale I am working at, I would fall for it. Unfortunately, this is not the case. I think I’ll let it stay on my hard drive and use it for correcting pictures from parties and such. One gigabyte of RAM should be sufficient and not cause this much of an issue.

*I’d like to acknowledge that, given my lack of knowledge of how Lightroom works, there is a very slim chance that all this is Vista’s fault. Adobe should have taken that into consideration, though, and I highly doubt it is the case. Had I more time, I would put XP on another hard drive and run the same tests.

GLens

Sunday, August 27th, 2006

I coded my project a bit more. I decided to name it GLens for now. It’s a lens CAD program licensed under the GPL, so why not? I wrote a class for spherical lenses today that just draws them. You input R1, R2, d, the clipping height, and the center point for each one. I am trying to decide between using real math and just approximated, linearized stuff to find the refraction angles and intersections. Since these are spherical lenses, real calculations wouldn’t be too hard. Kinda wish I had paid a bit more attention in calc.
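For reference, the “real math” for a spherical surface in 2D comes down to two pieces: a ray-circle intersection (a quadratic) and Snell’s law in vector form. The sketch below is in Python rather than the project’s C++, and it is not GLens code; the function names and conventions (unit direction vectors, a normal pointing against the incoming ray) are choices made for this example.

```python
import math


def intersect_circle(origin, direction, center, radius):
    """Return the smallest positive distance t at which the ray
    origin + t * direction hits the circle, or None if it misses.
    `direction` must be a unit vector."""
    ox, oy = origin[0] - center[0], origin[1] - center[1]
    dx, dy = direction
    # Solve |o + t*d|^2 = r^2, which is t^2 + 2*b*t + c = 0 for unit d.
    b = ox * dx + oy * dy
    c = ox * ox + oy * oy - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-9:  # ignore hits behind (or exactly at) the origin
            return t
    return None


def refract(direction, normal, n1, n2):
    """Refract unit vector `direction` at a surface with unit `normal`
    (pointing against the incoming ray), going from index n1 to n2.
    Returns None on total internal reflection."""
    dx, dy = direction
    nx, ny = normal
    eta = n1 / n2
    cos_i = -(dx * nx + dy * ny)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return (eta * dx + k * nx, eta * dy + k * ny)
```

For a lens surface, the normal at the hit point is just the unit vector from the circle’s center to that point, flipped if needed so that it faces the incoming ray.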

Here’s a screenshot:

Pseudo Ray Tracer

Saturday, August 26th, 2006

Well, for a while I have been curious about modeling how light is affected by lenses with a somewhat realistic computer simulation. My curiosity about chromatic aberrations is what started this nonsense, but I don’t think I will implement code for different wavelengths for a while. It won’t actually create images, but it will trace rays.

It will be in just two dimensions (for now) because lenses can be modeled in 2D. Calculations in 2D are also much cheaper.

I plan on making this a dynamic program where you can move the focal plane, objects, and the lenses around. Eventually, I will set it up so lenses can be moved in an animation or something to demonstrate and experiment with zoom lenses.

Spherical lenses will be the first things that I implement, but I definitely plan on supporting aspherical lenses as well, eventually represented with NURBS. I really should brush up on my calculus so I can find how rays intersect with these (Newton’s method)… either that, or I could just turn them into line segments and solve a bunch of equations. Either way, I need to learn more math.
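As a rough idea of what the Newton’s method route could look like, here is a Python sketch. It does not use NURBS; it swaps in a much simpler surface described as x = sag(y), with a parabola as a stand-in aspheric profile. The function names, the parabola, and the value of R are all hypothetical.

```python
def newton_intersect(origin, direction, sag, sag_slope,
                     t0=0.0, tol=1e-12, max_iter=50):
    """Find t where the ray origin + t * direction meets the surface
    x = sag(y), using Newton's method on g(t) = x(t) - sag(y(t))."""
    x0, y0 = origin
    dx, dy = direction
    t = t0
    for _ in range(max_iter):
        y = y0 + t * dy
        g = x0 + t * dx - sag(y)
        dg = dx - sag_slope(y) * dy  # derivative of g with respect to t
        if dg == 0:
            return None  # flat spot in g; Newton cannot proceed
        step = g / dg
        t -= step
        if abs(step) < tol:
            return t
    return None  # did not converge


# Hypothetical aspheric profile: a parabola x = y^2 / (2 * R).
R = 10.0


def parabola(y):
    return y * y / (2.0 * R)


def parabola_slope(y):
    return y / R
```

Newton’s method only finds the root nearest the starting guess, so a real tracer would want a sensible t0 (for example the hit on a bounding circle) and a fallback for rays that miss.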

I am thinking OpenGL and GLUT for now… C++, not C, because I am an OO whore. I really want to have menus and drag & drop too, ideally Blender-like, but probably much simpler.

Edit: Wrote a little code and my repository is at:
http://ohpie.com/svn/lens/trunk/

Rails Woes

Wednesday, August 23rd, 2006

Getting Rails to work was such a bitch. Maybe I am just stupid or my server is a bit nutty, but it took me like two evenings just to get damn scaffolding to work.

I also realized I had used MEDIUMINT for the user IDs in my database for the AOL searches, and the user IDs went beyond the MEDIUMINT limit… so I am rebuilding the database as I sleep tonight.
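For anyone wondering where that limit is: MEDIUMINT in MySQL is a 3-byte integer, and INT is 4 bytes. A quick Python sketch of the arithmetic follows; the helper names are made up, and the 20,000,000 used in the usage note is just a hypothetical ID, not a value from the data.

```python
def int_range(num_bytes, signed=True):
    """Return (min, max) for an integer column of the given byte width."""
    bits = 8 * num_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# MySQL integer widths in bytes: MEDIUMINT is 3, INT is 4.
MEDIUMINT_BYTES, INT_BYTES = 3, 4


def fits(value, num_bytes, signed=True):
    """True if `value` can be stored in a column of that width."""
    low, high = int_range(num_bytes, signed)
    return low <= value <= high
```

So a signed MEDIUMINT tops out at 8,388,607 and an unsigned one at 16,777,215. An ID like 20,000,000 fits in neither, but fits comfortably in a regular INT.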

On a good note, I found a nice little article on making screen have a tab-like status bar on the bottom. It works pretty nicely. Check it out here. I enjoy it thoroughly. Couldn’t live without my screen and my zshell *sigh*.

I’ve also been using sed a lot more. I had a lot of arduous tasks to do at work today and sed helped out. Maybe it didn’t hurry things up very much, but it made them more fun for sure. Regexes are your friend. And so is REXML when you have to change a lot of Visual Studio vcproj files. I mean, sure, using Ruby to alter a bunch of Visual Studio projects sounds like a terrible idea, but it works great. Project files are XML, so you can just read ’em right into a document. The only issue I had is that we use post-build steps with newline characters… VS.NET turns them into a character-entity type of thing in one of the attributes. REXML reads it in as &amp#x0d…which is annoying. Yay for the function String.gsub!() I just put them back in their place like a good little attribute. Writing these scripts is turning projects that would take days of cut & paste work and carpal tunnel syndrome into projects that take hours.

Writing a script to do something is always more fun than doing the actual something…..

AOL Search Logs

Tuesday, August 22nd, 2006

I was curious how many queries in there contained a number in the form of a phone number (###-###-####):

mysql> select COUNT(*) from srec where Query
    -> regexp "[0-9]{3}-[0-9]{3}-[0-9]{4}";
+----------+
| COUNT(*) |
+----------+
|     4626 |
+----------+

Well, that’s certainly interesting.
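The same check is easy to try outside the database. Here is a small Python sketch using the same pattern as the MySQL REGEXP above; the function name is made up, and the sample queries in the usage note are invented, not taken from the logs.

```python
import re

# Same pattern as the MySQL REGEXP above: ###-###-#### anywhere in the string.
PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def count_phone_like(queries):
    """Count how many queries contain a phone-number-shaped substring."""
    return sum(1 for q in queries if PHONE.search(q))
```

For example, count_phone_like(["call 555-867-5309 now", "pizza near me"]) counts only the first query. Like the SQL version, the pattern is not anchored, so it also matches inside longer strings of digits and dashes. Treat the count as an upper bound on actual phone numbers.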

OMGentoo

Tuesday, August 8th, 2006

Well, last week I finally got around to plugging my desktop in after two months and a move. I had Gentoo on it and a lot of packages were out of date and broken and whatnot. It’s understandable because the installation is probably a year and a half old and I was running ~x86 (the testing packages). I also wanted to set my drives up with normal partitioning. I’ve been running a software (md) RAID 0 with three 80 GB drives for 2 years now and I haven’t had any problems, but I decided it’s best not to push my luck.

The plan was to install FreeBSD. Heh. I got it almost working. I got frustrated with the package system and I figure I will just save FreeBSD for a rainy day when I feel like upgrading my server (which is going on 331 days without a reboot). Yeah, I fail at that. I really like Portage though… a lot. I can pretty much do a stage 1 Gentoo install without docs.

Why am I going through all this effort to get a Unix/Linux machine up and running again? Well, I have an idea for another project. It involves a spotter plugin for the GIMP (somewhat like the Spot Healing Brush in Photoshop), but better? Probably not better, but I have plans for a special feature that makes sure you inspect an entire image. I’ll be damned if I have to develop GNU software on a Windows laptop with Cygwin! Especially on a single monitor.