Recently in programming Category

Stupefying Django error

| | No TrackBacks

Wow. I spent a lot of time on this this afternoon. I had a Django application with an admin interface. Every time I tried to create an object with no foreign keys, I got this error on the save:

Traceback (most recent call last):

  File "C:\Python26\lib\site-packages\django\core\servers\basehttp.py", line 279, in run
    self.result = application(self.environ, self.start_response)

  File "C:\Python26\lib\site-packages\django\core\servers\basehttp.py", line 651, in __call__
    return self.application(environ, start_response)

  File "C:\Python26\lib\site-packages\django\core\handlers\wsgi.py", line 241, in __call__
    response = self.get_response(request)

  File "C:\Python26\lib\site-packages\django\core\handlers\base.py", line 134, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)

  File "C:\Python26\lib\site-packages\django\core\handlers\base.py", line 154, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 40, in technical_500_response
    html = reporter.get_traceback_html()

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 86, in get_traceback_html
    frames = self.get_traceback_frames()

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 205, in get_traceback_frames
    pre_context_lineno, pre_context, context_line, post_context = self._get_lines_from_file(filename, lineno, 7, loader, module_name)

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 186, in _get_lines_from_file
    context_line = source[lineno].strip('\n')

IndexError: list index out of range

Totally mysterious: there is not a line of my code in the trace stack and I have no idea what file it is trying to index into. I really made no progress on this until I tried the shell:

>>> from django_league.league.models import *
>>> Sport.objects.create(sport_name='Hockey')
__unicode__
Traceback (most recent call last):
  File "lgt;console>", line 1, in lgt;module>
  File "C:\Python26\lib\site-packages\django\db\models\base.py", line 328, in __
repr__
    u = unicode(self)
  File "C:\Users\hughdbrown\Documents\django\django_league\..\django_league\leag
ue\models.py", line 17, in __unicode__
    return u"%s" % (sport_name, )
NameError: global name 'sport_name' is not defined

Now we're getting somewhere, I thought. I have a function of mine in scope. The problem was that I had defined my __unicode__ method without using self in the code:

class Sport(models.Model):    
    sport_name = models.CharField(max_length=20)
    def __unicode__(self):
        return u"%s" % (self.sport_name, )

    @models.permalink
    def get_absolute_url(self):
        return ('sport', None, {'object_id' : self.id})

    class Meta:
        ordering = ['sport_name']

So fixing it was pretty easy once I knew that.

MoreLinq

| | No TrackBacks

Recently, I wanted a way to iterate over two sequences in LINQ the same way you do in functional languages like python. It's most commonly called zip() and it works like this:

seq = [fn(a1, b1) for a1, b1 in zip(a, b)]

This applies a function fn() to each pair of variables pulled from a and b together. There is no direct way to do this in LINQ 3.5, but I came across MoreLinq which has an implementation of zip() and many other useful functions.

VB.NET code to implement Excel Rank()

| | No TrackBacks

Recently, I had to implement the Excel function Rank() for a project. I came up with this templatized LINQ code:

Public Class ReversedComparer(Of T As IComparable)
    Implements IComparer(Of T)
    Public Function Compare(ByVal x As T, ByVal y As T) As Integer _
    Implements System.Collections.Generic.IComparer(Of T).Compare
       Return -x.CompareTo(y)
    End Function
End Class

Private Shared Function BinarySearch(Of T As IComparable)(ByRef x() As T, _
                                     ByVal val As T, _
                                     ByVal comparer As IComparer(Of T)) As Integer
    Dim lo As Integer = 0, hi As Integer = x.Count - 1
    While lo <= hi
        Dim mid As Integer = (hi + lo) \ 2
        If comparer.Compare(val, x(mid)) <> 1 Then
            hi = mid - 1
        Else
            lo = mid + 1
        End If
    End While
    Return CInt(If(comparer.Compare(x(lo), val) = 0, lo, -1))
End Function

Public Shared Function Rank(ByRef X() As Double) As Integer()
    Dim sorted() As Double = (From xx In X Order By xx Descending Select xx).ToArray()
     Dim comparer As IComparer(Of Double) = New ReversedComparer(Of Double)
     Return X.Select(Function(val) 1 + BinarySearch(Of Double)(sorted, val, comparer)).ToArray()
End Function

And it's pretty good as far as it goes. It takes a copy of the data in reverse-sorted order and then does a binary search to find the 0-based position in the array for each element in the original array. And that's fine, but I wanted to implement something more like my python code:

import collections
def rank(arr):
	"""
	>>> a = list(range(10))
	>>> print rank(a)
	[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
	>>> b = list(range(10))
	>>> b.reverse()
	>>> print rank(b)
	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
	>>> c = [5] * 10
	>>> print rank(c)
	[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
	>>> d = ([5] * 5) + ([1] * 5)
	>>> d
	[5, 5, 5, 5, 5, 1, 1, 1, 1, 1]
	>>> print rank(d)
	[1, 1, 1, 1, 1, 6, 6, 6, 6, 6]
	"""
	d = collections.defaultdict(list)
	for i, v in enumerate(arr):
		d[v].append(i)
	result = [0] * len(arr)
	i = 1
	for k in sorted(d, reverse = True):
		for j in d[k]:
			result[j] = i
		i += len(d[k])
	return result

if __name__ == '__main__':
	import doctest
	doctest.testmod()
 

So I worked through the LINQ issues using the LinqPad interpreter and came up with this:

Function Rank(Of T)(Byref x() as T) as Integer()
    Dim original_pos = x.Select(Function(xx, index) New With {.Val = xx, .Index = index}) _
             .ToLookup(Function(xxx) xxx.Val)
    Dim keys = original_pos.OrderByDescending(Function(yy) yy.Key)    
    Dim result(x.Count-1) as Integer
    Dim i as Integer = 1
    For Each item in keys
        For Each v in item
            result(v.Index) = i
        Next
        i = i + item.Count
    Next
    Return result
End Function

And I think that's pretty good code: templatized LINQ code that uses lambda functions and implements the algorithm as quickly as I know how.

The return of CoWhereIs

| | No TrackBacks

A long time ago, I wrote CoWhereIs in C/C++. It was a small program to search the registry for where a COM server was to be found. Back in the mists of time, this was something you needed to do from time to time if you were a COM developer, as I was. And somewhere along the way, I stopped needing to do that and I lost the source code. But it was no big problem because I was no longer working with COM.

And now I find that I have some .NET code that implements user defined functions (UDFs) in Excel 2007 and it turns out that the integration between Excel and .NET is stuck in 2005: the code has to be exposed as a COM server that is installed with regasm on your machine. So now I need an installer and I need to test the installer to see if it is, in fact, calling regasm to patch the registry so that my .NET/COM server is actually available.

And so I found myself rewriting CoWhereIs to search the registry. This time, I've written it in python because the times have changed. Enjoy.

from __future__ import with_statement
from _winreg import *
import os.path

class MissingProgIDException(Exception):
    pass
class MissingCLSIDException(Exception):
    pass

def getCLSID(progID):
    key = r'%s\CLSID' % progID
    try:
        with OpenKey(HKEY_CLASSES_ROOT, key) as h:
            return QueryValue(h, "")
    except WindowsError, err:
        print r"Missing progID: HKEY_CLASSES_ROOT\%s" % key
        raise MissingProgIDException()

def CoWhereIs(progID):
    clsid = getCLSID(progID)
    print "ProgID %s is CLSID %s" % (progID, clsid)
    key = r'CLSID\%s\InProcServer32' % clsid
    try:
        with OpenKey(HKEY_CLASSES_ROOT, key) as h2:
            location = QueryValueEx(h2, "CodeBase")
            file_name = location[0].replace("file:///", "")
            file_loc = file_name.replace(r'/', r'\\')
            print file_loc, ": File exists?", os.path.exists(file_loc)
    except WindowsError, err:
        print r"Missing CLSID: HKEY_CLASSES_ROOT\%s" % key
        raise MissingCLSIDException()

if __name__ == '__main__':
    import sys
    for arg in sys.argv[1:]:
        try:
            CoWhereIs(arg)
        except:
            pass

C# and const correctness

| | No TrackBacks

On a recent interview, I was asked: "Why doesn't C# have keywords for const-correctness?"

There are a few answers. First, C# does have constants and const class members. You can declare class members as const or readonly to get compile-time and run-time constants, respectively.

And it's even possible to write classes that are immutable, so that you cannot modify an existing one but have to make a new one, possibly based on an existing one. It's like strings:

    s.replace("The", "An");
does not modify the original string. Instead, you have to take the return value:
    s = s.replace("The", "An");

It won't come as a great surprise to hear that I favor making classes immutable wherever they can be, as a means of clarifying their usage and to make them safer.

But the real question is: why doesn't C# have const methods (methods where the code promises not to modify class members) or methods that respect const parameters? The nearest I have found to an answer on that is Anders Hejlsberg in an online interview:

Immutables

Bill Venners: In addition to being a C# and CLR construct, the value type is a general object-oriented programming concept. Another such concept is immutable types.

When I went to the first JavaOne back in 1996, everyone seemed to have one complaint about what they missed from C++ that Java didn't have. Different people had different complaints, but they all seemed to have at least one complaint. My complaint was const. I really liked what I could do with const in C++, though somehow I have gotten along in Java just fine without it.

Did you consider including support for the concept of immutable directly in C# and the CLR?

Anders Hejlsberg: There are two questions in here. With respect to immutability, it's tricky because what you're saying when you say something is immutable, is that from an external perspective, I cannot observe any mutation. That doesn't necessarily mean that it doesn't have a cache inside that makes it go more efficiently. It's just on the outside it looks immutable. That's hard for a compiler to figure out. We could certainly have a rule that says you can only modify the fields of this object in the constructor. And we could make certain usability guarantees. But it actually rules out some scenarios that are in use. So we haven't codified immutability as a hard guarantee, because it's a hard guarantee to make. The concept of an immutable object is very useful, but it's just up to the author to say that it's immutable.

Bill Venners: Immutability is part of the semantics of the class.

Anders Hejlsberg: Yes. With respect to const, it's interesting, because we hear that complaint all the time too: "Why don't you have const?" Implicit in the question is, "Why don't you have const that is enforced by the runtime?" That's really what people are asking, although they don't come out and say it that way.

The reason that const works in C++ is because you can cast it away. If you couldn't cast it away, then your world would suck. If you declare a method that takes a const Bla, you could pass it a non-const Bla. But if it's the other way around you can't. If you declare a method that takes a non-const Bla, you can't pass it a const Bla. So now you're stuck. So you gradually need a const version of everything that isn't const, and you end up with a shadow world. In C++ you get away with it, because as with anything in C++ it is purely optional whether you want this check or not. You can just whack the constness away if you don't like it.

The interviewees approach "Immutability [as] part of the semantics of the class." They consider this the relevant feature, not semantics of individual methods of a class.

Eric Gunnerson also considers const-correctness "a good idea in the small, but doesn't work too well in the large." Furthermore, he points out that it is hard to enforce across multiple languages.

There are also many entries on StackOverFlow.com on this question (searchable here), and the best answer provided is here:

I suspect there are some practical reasons, and some theoretical reasons:

  • Should the constness apply to the object or the reference? If it's in the reference, should this be compile-time only, or as a bit within the reference itself? Can something else which has a non-const reference to the same object fiddle with it under the hood?
  • Would you want to be able to cast it away as you can in C++? That doesn't sound very much like something you'd want on a managed platform... but what about all those times where it makes sense in C++?
  • Syntax gets tricky (IMO) when you have more than one type involved in a declaration - think arrays, generics etc. It can become hard to work out exactly which bit is const.
  • If you can't cast it away, everyone has to get it right. In other words, both the .NET framework types and any other 3rd party libraries you use all have to do the right thing, or you're left with nasty situations where your code can't do the right thing because of a subtle problem with constness.

There's a big one in terms of why it can't be supported now though:

  • Backwards compatibility: there's no way all libraries would be correctly migrated to it, making it pretty much useless :(

I agree it would be useful to have some sort of constness indicator, but I can't see it happening, I'm afraid.

So, approaches you can use to get immutability at the class level:

  1. immutable object pattern
    Class is immutable by design, made so by the developer
  2. collections returned as AsReadOnly
    List and Array generic classes both support this method.
  3. class members that are marked as readonly or const

The first two would support immutable method parameters -- you could not change them in the scope of the function and get side-effects. But C++-style const-correctness on methods is not available by language syntax.

Open source components

| | No TrackBacks

Last week, I went to a django presentation at HUGE Inc. in Brooklyn. For me, the highlight was Kevin Fricovsky's five minute (well, ten minute) lightning talk in which he demonstrated the power of using open source components. He took a fairly small blog project of his and pulled it down from github.com. Then he used pip to get the dependencies listed in the requirements.txt file. Inside of ten minutes, he had a working blog that used five to ten open source components.

In contrast to this, I got a big reality check on using open source code. I wanted to try out some of Steve Souders's ideas on improving website performance on a django open source project that is already running. I forked some code from github and was going to apply django-compressor to the CSS and JS to see the performance improvement. I never got to this point.

The project was missing a couple of important pieces:

  • a list of the requirements and how to obtain them
  • the settings.py file to list the installed apps

Without these, I was pretty much dead. I soldiered on, though, and found eight required components to add, but:

  • one of the components did not work with python 2.5 at head rev
  • I got a subtle templetags error message that I eventually realized meant that I had a wrong similarly-named open source component

I've written to the project author to find out where to get the correct component. It's a bracing counterpoint to the presentation. Taking your own code and adding components -- likely to work. Taking a project in an unknown state and trying to back out what the components are -- much more likely to fail.

By the way, if you get a message like this:

'XXX' is not a valid tag library: Could not load template library from django.templatetags.XXX, No module named models

then I have a technique for you. Assuming that templatetags module XXX has a ton of import statements in its files, you have to figure out which file it is importing that is causing the failure. Take code like this:

from django import template
from django.conf import settings
from django.core import template_loader
from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.core.exceptions import ObjectDoesNotExist
from django.utils.safestring import mark_safe
from syncr.twitter.models import Tweet
from syncr.delicious.models import Bookmark
from syncr.flickr.models import Photo
from tagging.templatetags import tagging_tags
from tumblr.models import TumblrPost
from lastfm.models import LastfmPost
import re
import datetime

and turn it into this:

try:
    from django import template
    from django.conf import settings
    from django.core import template_loader
    from django.db import models
    from django.contrib.contenttypes.models import ContentType
    from django.core.exceptions import ObjectDoesNotExist
    from django.utils.safestring import mark_safe
except ImportError, exc:
    print "django error in %s: %s" % (__file__, exc)

try:
    from syncr.twitter.models import Tweet
    from syncr.delicious.models import Bookmark
    from syncr.flickr.models import Photo
    from tagging.templatetags import tagging_tags
    from tumblr.models import TumblrPost
    from lastfm.models import LastfmPost
except ImportError, exc:
    print "package error in %s: %s" % (__file__, exc)

try:
    import re
    import datetime
except ImportError, exc:
    print "python error in %s: %s" % (__file__, exc)

Do this for every file in the module. Run the dev server and watch for the import error location reported. Once you know which block is failing in which file, add print statements to see which particular import you get to before it fails. Then you'll know which import is not working. In my case, this showed me that I had used the wrong open source tumblr module for this project.

James Bennett has some more detailed observations on this point, too.

Recent reading

| | No TrackBacks

I've been doing a lot of reading recently. Here are Django books I've finished:

Here are general web development books I've read:

And a book on Git: Version Control with Git.

In addition, here are the books I am currently reading:

I particularly recommend Pro Django for django developers. It is advanced and offers a lot of insight into django and the python techniques it is built on.

Binary search coding test

| | No TrackBacks

The other day, I had an interview in which I was asked to "find an integer in a sorted array or integers if it was present." This is obviously a binary search, so I coded up this (but in C/C++):

def binarysearch(iter, val):
	lo, hi = 0, len(iter)-1
	while lo <= hi:
		mid = lo + (hi - lo) / 2
		if iter[mid] >= val :
			hi = mid - 1
		else:
			lo = mid + 1
	return (lo if iter[lo] == val else -1)

There are a couple of things that make this noteworthy:

  • Calculation of mid

    If you calculate the midpoint of the array in the more straightforward way (mid = (lo + hi) / 2), then you risk integer overflow when working with fixed sized integers.

  • Use of a single comparison in the loop

    It seems that you should actually test for an exact match on finding the element you are searching for. This is not so. More below.

  • Returning iter[lo] instead of iter[mid]

    It is not necessary to keep track of the last midpoint. When you fall out of the loop, either iter[lo] is the desired value or the value was not found in the array. If it is found, then in an array that has multiple instances of the value, lo is the location of the first occurrence.

It was on point two that the interview went a bit astray. The interviewer wanted to see code that looked more like this:

def binarysearch(iter, val):
	lo, hi = 0, len(iter)-1
	while lo <= hi:
		mid = lo + (hi - lo) / 2
		if iter[mid] == val :
			return mid
		elif iter[mid] > val:
			hi = mid - 1
		else:
			lo = mid + 1
	return -1

And that's all very well -- it's a correct algorithm -- but it's not as efficient. In my algorithm, there is only one comparison per loop. In the other, there is either one if there is a match or two on a failure. For comparison purposes and to make this concrete, let's imagine we have an array or 63 elements and we look for every element of the array.

Cost of my code

63 nodes searched * 6 comparisons per search
378 comparisons

Cost of other code

1 nodes searched * 1 comparisons = 1
2 nodes searched * 3 comparisons = 6
4 nodes searched * 5 comparisons = 20
8 nodes searched * 7 comparisons = 56
16 nodes searched * 9 comparisons = 144
32 nodes searched * 11 comparisons = 352
579 comparisons

So contrary to the expectations of the interviewer, it is actually more efficient to search to the end of the array every time using fewer comparisons than to search less of the array with more comparisons.

My StackOverflow flair

| | No TrackBacks

I don't think my column layout is wide enough to put this on the right-hand side, so for now I am posting my StackOverflow rating in this article:

Python code to find duplicate texts on google

| | No TrackBacks

I came across an accusation of plagiarism on the web today and thought it would be interesting to code up some python that finds candidate texts that are plagiarisms of the reference text. The obvious idea: query google for matches.

So the first thing was to find the unusual words that would form the google signature. I tracked down a list of English word frequencies and saved it to disk. Then I wrote code to load the reference text into memory and to count occurrences of each word. Then the top-scoring words are the ones that occur much more frequently than expected in the word frequency list.

Downloads: plagiarism.pyword frequency file

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 4.32-en

About this Archive

This page is an archive of recent entries in the programming category.

products is the previous category.

python is the next category.

Find recent content on the main index or look in the archives to find all content.