April 2009 Archives

Python code to find duplicate texts on Google

I came across an accusation of plagiarism on the web today and thought it would be interesting to code up some Python that finds texts that may be plagiarized from a reference text. The obvious idea: query Google for matches.

So the first step was to find the unusual words that would form the signature to search Google for. I tracked down a list of English word frequencies and saved it to disk. Then I wrote code to load the reference text into memory and count the occurrences of each word. The top-scoring words are the ones that occur much more frequently in the reference text than the word frequency list would predict.
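The scoring step can be sketched roughly as follows. This is a minimal illustration, not the contents of plagiarism.py: the function name `unusual_words`, the `background_freq` dictionary (word mapped to its relative frequency in general English), and the `floor` value used for words missing from the list are all assumptions made for the example.

```python
import re
from collections import Counter


def unusual_words(text, background_freq, top_n=5, floor=1e-7):
    """Rank the words of `text` by observed frequency / expected frequency.

    `background_freq` maps a word to its relative frequency in general
    English. Words absent from it are given the (assumed) tiny frequency
    `floor`, so unknown words score as highly unusual.
    """
    # Lowercase the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())

    scores = {}
    for word, count in counts.items():
        observed = count / total
        expected = background_freq.get(word, floor)
        scores[word] = observed / expected

    # Highest ratio first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy frequency table, made up for illustration only.
    background = {"the": 0.05, "of": 0.03, "dance": 0.0001}
    reference = "the shim sham of the dance the shim"
    signature = " ".join(unusual_words(reference, background, top_n=3))
    print(signature)  # the string one would paste into a search engine
```

Using a ratio rather than a raw count is what keeps common words such as "the" out of the signature, even when they dominate the reference text.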

Downloads: plagiarism.py, word frequency file

Shim Sham for Frankie's birthday

Lori and I performed in a shim sham at Grand Central in March. Yehoodi, a local dance group, got together between 50 and 100 people on a Saturday afternoon to perform for Frankie Manning's 95th birthday. We assembled in the main hall, but the conspicuous presence of so many people dressed in vintage, together with our slowness in getting organized, gave train station security time to shut us down. They directed us to a cramped, out-of-the-way location where we were permitted to dance.

You can see the performance on YouTube. Lori and I are way at the back -- I'm on the left and Lori's on the right. Or at least, that's what I remember. We're not particularly visible.

Later, we went over to TKTS in Times Square and did a real performance on the steps. At least, Lori did -- I elected to watch from the side. (The first half of the performance has not been posted -- not sure why.)

If you want to learn the shim sham, I think that Mandy Gould of Toronto has the best instructional video on YouTube.

Django

Recently, I've gone crazy for Django. It's a fabulous platform for developing web applications (primarily content management systems) really fast. I love it!

How to use Django with Apache and mod_python

Serving media files

Django cheatsheets

About this Archive

This page is an archive of entries from April 2009 listed from newest to oldest.
