Recently in python Category

Stupefying Django error


Wow. I spent a lot of time on this this afternoon. I had a Django application with an admin interface. Every time I tried to create an object with no foreign keys, I got this error on the save:

Traceback (most recent call last):

  File "C:\Python26\lib\site-packages\django\core\servers\basehttp.py", line 279, in run
    self.result = application(self.environ, self.start_response)

  File "C:\Python26\lib\site-packages\django\core\servers\basehttp.py", line 651, in __call__
    return self.application(environ, start_response)

  File "C:\Python26\lib\site-packages\django\core\handlers\wsgi.py", line 241, in __call__
    response = self.get_response(request)

  File "C:\Python26\lib\site-packages\django\core\handlers\base.py", line 134, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)

  File "C:\Python26\lib\site-packages\django\core\handlers\base.py", line 154, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 40, in technical_500_response
    html = reporter.get_traceback_html()

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 86, in get_traceback_html
    frames = self.get_traceback_frames()

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 205, in get_traceback_frames
    pre_context_lineno, pre_context, context_line, post_context = self._get_lines_from_file(filename, lineno, 7, loader, module_name)

  File "C:\Python26\lib\site-packages\django\views\debug.py", line 186, in _get_lines_from_file
    context_line = source[lineno].strip('\n')

IndexError: list index out of range

Totally mysterious: there is not a line of my code in the stack trace and I have no idea what file it is trying to index into. I really made no progress on this until I tried the shell:

>>> from django_league.league.models import *
>>> Sport.objects.create(sport_name='Hockey')
__unicode__
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Python26\lib\site-packages\django\db\models\base.py", line 328, in __
repr__
    u = unicode(self)
  File "C:\Users\hughdbrown\Documents\django\django_league\..\django_league\leag
ue\models.py", line 17, in __unicode__
    return u"%s" % (sport_name, )
NameError: global name 'sport_name' is not defined

Now we're getting somewhere, I thought. I have a function of mine in scope. The problem was that I had written my __unicode__ method without self: it referred to sport_name instead of self.sport_name. Here is the corrected model:

class Sport(models.Model):    
    sport_name = models.CharField(max_length=20)
    def __unicode__(self):
        return u"%s" % (self.sport_name, )

    @models.permalink
    def get_absolute_url(self):
        return ('sport', None, {'object_id' : self.id})

    class Meta:
        ordering = ['sport_name']

So fixing it was pretty easy once I knew that.
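
The failure mode is easy to reproduce without Django. Here is a minimal sketch in plain Python 3, so __str__ stands in for __unicode__, and the class and function names are made up for illustration. A name used without self is looked up as a local or global, so the NameError only appears when something asks for the object's string form.

```python
class BrokenSport:
    """Hypothetical stand-in for the model; this is not a Django class."""

    def __init__(self, sport_name):
        self.sport_name = sport_name

    def __str__(self):
        # Bug: sport_name is looked up as a local/global name, not on the instance.
        return "%s" % (sport_name,)


class FixedSport(BrokenSport):
    def __str__(self):
        # Fix: go through self to reach the instance attribute.
        return "%s" % (self.sport_name,)


def describe(obj):
    """Return str(obj), or the error text if building the string fails."""
    try:
        return str(obj)
    except NameError as exc:
        return "NameError: %s" % exc


if __name__ == "__main__":
    print(describe(BrokenSport("Hockey")))
    print(describe(FixedSport("Hockey")))
```

Creating the object works fine in both cases; only converting it to a string fails, which is why the error surfaced on the save in the admin and at the shell prompt.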

VB.NET code to implement Excel Rank()


Recently, I had to implement the Excel function Rank() for a project. I came up with this templatized LINQ code:

Public Class ReversedComparer(Of T As IComparable)
    Implements IComparer(Of T)
    Public Function Compare(ByVal x As T, ByVal y As T) As Integer _
    Implements System.Collections.Generic.IComparer(Of T).Compare
       Return -x.CompareTo(y)
    End Function
End Class

Private Shared Function BinarySearch(Of T As IComparable)(ByRef x() As T, _
                                     ByVal val As T, _
                                     ByVal comparer As IComparer(Of T)) As Integer
    Dim lo As Integer = 0, hi As Integer = x.Count - 1
    While lo <= hi
        Dim mid As Integer = (hi + lo) \ 2
        If comparer.Compare(val, x(mid)) <= 0 Then
            hi = mid - 1
        Else
            lo = mid + 1
        End If
    End While
    Return CInt(If(lo < x.Length AndAlso comparer.Compare(x(lo), val) = 0, lo, -1))
End Function

Public Shared Function Rank(ByRef X() As Double) As Integer()
    Dim sorted() As Double = (From xx In X Order By xx Descending Select xx).ToArray()
    Dim comparer As IComparer(Of Double) = New ReversedComparer(Of Double)
    Return X.Select(Function(val) 1 + BinarySearch(Of Double)(sorted, val, comparer)).ToArray()
End Function

And it's pretty good as far as it goes. It takes a copy of the data in reverse-sorted order and then does a binary search to find the 0-based position in the array for each element in the original array. And that's fine, but I wanted to implement something more like my python code:

import collections
def rank(arr):
	"""
	>>> a = list(range(10))
	>>> print rank(a)
	[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
	>>> b = list(range(10))
	>>> b.reverse()
	>>> print rank(b)
	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
	>>> c = [5] * 10
	>>> print rank(c)
	[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
	>>> d = ([5] * 5) + ([1] * 5)
	>>> d
	[5, 5, 5, 5, 5, 1, 1, 1, 1, 1]
	>>> print rank(d)
	[1, 1, 1, 1, 1, 6, 6, 6, 6, 6]
	"""
	d = collections.defaultdict(list)
	for i, v in enumerate(arr):
		d[v].append(i)
	result = [0] * len(arr)
	i = 1
	for k in sorted(d, reverse = True):
		for j in d[k]:
			result[j] = i
		i += len(d[k])
	return result

if __name__ == '__main__':
	import doctest
	doctest.testmod()
 

So I worked through the LINQ issues using the LinqPad interpreter and came up with this:

Function Rank(Of T)(Byref x() as T) as Integer()
    Dim original_pos = x.Select(Function(xx, index) New With {.Val = xx, .Index = index}) _
             .ToLookup(Function(xxx) xxx.Val)
    Dim keys = original_pos.OrderByDescending(Function(yy) yy.Key)    
    Dim result(x.Count-1) as Integer
    Dim i as Integer = 1
    For Each item in keys
        For Each v in item
            result(v.Index) = i
        Next
        i = i + item.Count
    Next
    Return result
End Function

And I think that's pretty good code: templatized LINQ code that uses lambda functions and implements the algorithm as quickly as I know how.
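
For comparison, the sort-then-binary-search idea from the first VB.NET version can also be sketched in python with the standard bisect module. This is a sketch for Python 3 under the assumption that the values are numeric, so negating them turns a descending order into an ascending one. The name rank_by_search is made up for this example.

```python
from bisect import bisect_left


def rank_by_search(values):
    """Excel-style descending rank: 1 + the count of strictly larger values."""
    # Negating the numbers lets an ascending sort stand in for a descending one.
    negated = sorted(-v for v in values)
    # bisect_left finds the leftmost slot of -v, i.e. how many values exceed v.
    return [1 + bisect_left(negated, -v) for v in values]


if __name__ == "__main__":
    print(rank_by_search([5, 5, 5, 5, 5, 1, 1, 1, 1, 1]))
```

Both approaches sort once. The lookup version then pays a binary search per element, while the grouping version walks each group a single time.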

The return of CoWhereIs


A long time ago, I wrote CoWhereIs in C/C++. It was a small program to search the registry for where a COM server was to be found. Back in the mists of time, this was something you needed to do from time to time if you were a COM developer, as I was. And somewhere along the way, I stopped needing to do that and I lost the source code. But it was no big problem because I was no longer working with COM.

And now I find that I have some .NET code that implements user defined functions (UDFs) in Excel 2007 and it turns out that the integration between Excel and .NET is stuck in 2005: the code has to be exposed as a COM server that is installed with regasm on your machine. So now I need an installer and I need to test the installer to see if it is, in fact, calling regasm to patch the registry so that my .NET/COM server is actually available.

And so I found myself rewriting CoWhereIs to search the registry. This time, I've written it in python because the times have changed. Enjoy.

from __future__ import with_statement
from _winreg import *
import os.path

class MissingProgIDException(Exception):
    pass
class MissingCLSIDException(Exception):
    pass

def getCLSID(progID):
    key = r'%s\CLSID' % progID
    try:
        with OpenKey(HKEY_CLASSES_ROOT, key) as h:
            return QueryValue(h, "")
    except WindowsError, err:
        print r"Missing progID: HKEY_CLASSES_ROOT\%s" % key
        raise MissingProgIDException()

def CoWhereIs(progID):
    clsid = getCLSID(progID)
    print "ProgID %s is CLSID %s" % (progID, clsid)
    key = r'CLSID\%s\InProcServer32' % clsid
    try:
        with OpenKey(HKEY_CLASSES_ROOT, key) as h2:
            location = QueryValueEx(h2, "CodeBase")
            file_name = location[0].replace("file:///", "")
            file_loc = file_name.replace('/', '\\')
            print file_loc, ": File exists?", os.path.exists(file_loc)
    except WindowsError, err:
        print r"Missing CLSID: HKEY_CLASSES_ROOT\%s" % key
        raise MissingCLSIDException()

if __name__ == '__main__':
    import sys
    for arg in sys.argv[1:]:
        try:
            CoWhereIs(arg)
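        except (MissingProgIDException, MissingCLSIDException):
            # Already reported above; carry on with the next ProgID.
            pass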
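
The one fiddly step is turning the CodeBase value, which is a file:/// URL, into a Windows path. Here is a minimal sketch of just that step for Python 3. The function name codebase_to_path is made up, and the percent-decoding is an assumption about how the value may be encoded rather than something the script above does.

```python
from urllib.parse import unquote


def codebase_to_path(codebase):
    """Convert a file:/// URL (as in a CodeBase value) to a Windows-style path."""
    prefix = "file:///"
    if codebase.lower().startswith(prefix):
        codebase = codebase[len(prefix):]
    # Assumption: spaces and similar characters may arrive percent-encoded.
    decoded = unquote(codebase)
    # A single backslash per separator; '\\' is one character in a normal string.
    return decoded.replace("/", "\\")


if __name__ == "__main__":
    print(codebase_to_path("file:///C:/Tools/MyUdf.dll"))
```

Because it is pure string handling, this piece can be tried out on any platform, even though the registry lookup itself only runs on Windows.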

Open source components


Last week, I went to a django presentation at HUGE Inc. in Brooklyn. For me, the highlight was Kevin Fricovsky's five minute (well, ten minute) lightning talk in which he demonstrated the power of using open source components. He took a fairly small blog project of his and pulled it down from github.com. Then he used pip to get the dependencies listed in the requirements.txt file. Inside of ten minutes, he had a working blog that used five to ten open source components.

In contrast to this, I got a big reality check on using open source code. I wanted to try out some of Steve Souders's ideas on improving website performance on a django open source project that is already running. I forked some code from github and was going to apply django-compressor to the CSS and JS to see the performance improvement. I never got to this point.

The project was missing a couple of important pieces:

  • a list of the requirements and how to obtain them
  • the settings.py file to list the installed apps

Without these, I was pretty much dead. I soldiered on, though, and found eight required components to add, but:

  • one of the components did not work with python 2.5 at head rev
  • I got a subtle templatetags error message that I eventually realized meant that I had the wrong, similarly named open source component

I've written to the project author to find out where to get the correct component. It's a bracing counterpoint to the presentation. Taking your own code and adding components -- likely to work. Taking a project in an unknown state and trying to back out what the components are -- much more likely to fail.

By the way, if you get a message like this:

'XXX' is not a valid tag library: Could not load template library from django.templatetags.XXX, No module named models

then I have a technique for you. Assuming that templatetags module XXX has a ton of import statements in its files, you have to figure out which file it is importing that is causing the failure. Take code like this:

from django import template
from django.conf import settings
from django.core import template_loader
from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.core.exceptions import ObjectDoesNotExist
from django.utils.safestring import mark_safe
from syncr.twitter.models import Tweet
from syncr.delicious.models import Bookmark
from syncr.flickr.models import Photo
from tagging.templatetags import tagging_tags
from tumblr.models import TumblrPost
from lastfm.models import LastfmPost
import re
import datetime

and turn it into this:

try:
    from django import template
    from django.conf import settings
    from django.core import template_loader
    from django.db import models
    from django.contrib.contenttypes.models import ContentType
    from django.core.exceptions import ObjectDoesNotExist
    from django.utils.safestring import mark_safe
except ImportError, exc:
    print "django error in %s: %s" % (__file__, exc)

try:
    from syncr.twitter.models import Tweet
    from syncr.delicious.models import Bookmark
    from syncr.flickr.models import Photo
    from tagging.templatetags import tagging_tags
    from tumblr.models import TumblrPost
    from lastfm.models import LastfmPost
except ImportError, exc:
    print "package error in %s: %s" % (__file__, exc)

try:
    import re
    import datetime
except ImportError, exc:
    print "python error in %s: %s" % (__file__, exc)

Do this for every file in the module. Run the dev server and watch for the import error location reported. Once you know which block is failing in which file, add print statements to see which particular import you get to before it fails. Then you'll know which import is not working. In my case, this showed me that I had used the wrong open source tumblr module for this project.
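
The same hunt can be done from a shell without editing the files. This is a minimal sketch for Python 3 that tries each module by name with importlib and reports the ones that fail. The helper name find_failing_imports and the bogus module name are made up for the example; in practice you would pass the module names from the failing file.

```python
import importlib


def find_failing_imports(module_names):
    """Try each module in order; return (name, error message) for each failure."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures.append((name, str(exc)))
    return failures


if __name__ == "__main__":
    # The last name is deliberately bogus to show what a failure looks like.
    names = ["re", "datetime", "no_such_package_for_demo.models"]
    for name, error in find_failing_imports(names):
        print("import of %s failed: %s" % (name, error))
```

For a Django project this would need to run where the settings are loaded, for example inside the manage.py shell, since model modules generally cannot be imported otherwise.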

James Bennett has some more detailed observations on this point, too.

Recent reading


I've been doing a lot of reading recently. Here are Django books I've finished:

Here are general web development books I've read:

And a book on Git: Version Control with Git.

In addition, here are the books I am currently reading:

I particularly recommend Pro Django for django developers. It is advanced and offers a lot of insight into django and the python techniques it is built on.

I am always forgetting how to do dictionary construction. Here is a piece of code I restructured for a friend. Notice that the dictionary is created from a sequence of tuples. (Slightly modified from the original for better pretty-printing.)

def slug_to_lower(fn):
    """
    Decorator to lowercase string arguments to a function.
    """
    import string
    def is_string(x):
        return isinstance(x, str) or isinstance(x, unicode)
    def trans(x):
        return (x if not is_string(x) else string.lower(x))
    def wrapped(*args, **kwargs):
        neo_args = [trans(x) for x in args]
        tuple_list = [(k,trans(v)) for k,v in kwargs.items()]
        neo_kwargs = dict(tuple_list)
        return fn(*neo_args, **neo_kwargs)
    return wrapped
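
As a reminder to myself, here is the dictionary construction on its own, sketched for Python 3 (where str covers what unicode did in the code above). The function names are made up for illustration. The first form builds the dict from a list of (key, value) tuples, and the second does the same thing as a dict comprehension.

```python
def lower_strings(mapping):
    """Build a new dict from a list of (key, value) tuples, lowercasing strings."""
    pairs = [(k, v.lower() if isinstance(v, str) else v) for k, v in mapping.items()]
    return dict(pairs)


def lower_strings_comprehension(mapping):
    """Same result, written as a dict comprehension."""
    return {k: (v.lower() if isinstance(v, str) else v) for k, v in mapping.items()}


if __name__ == "__main__":
    kwargs = {"slug": "Hello-World", "page": 2}
    print(lower_strings(kwargs))
    print(lower_strings_comprehension(kwargs))
```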

The things I don't know about python


Sometimes, I amaze myself with what I don't know. Here is a simple case: in python, you can use ** to pass a function a dictionary whose keys have the same names as the function's arguments. For some reason, I thought this was only available when you identified keyword arguments.

>>> def foo(a, b) :
...     return a + b
...
>>> foo(2, 5)
7
>>> d = {'a':2, 'b':5}
>>> foo(**d)
7
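
A slightly longer sketch (Python 3) shows where the edges are: the unpacked dictionary can be mixed with positional arguments, but a key that is not a parameter, or a missing parameter, raises TypeError. The helper call_with exists only for this example.

```python
def foo(a, b):
    return a + b


def call_with(mapping):
    """Call foo(**mapping); return the result or the TypeError text."""
    try:
        return foo(**mapping)
    except TypeError as exc:
        return "TypeError: %s" % exc


if __name__ == "__main__":
    print(call_with({"a": 2, "b": 5}))          # keys match the parameters
    print(foo(2, **{"b": 5}))                   # positional and unpacked, mixed
    print(call_with({"a": 2, "b": 5, "c": 1}))  # extra key is rejected
    print(call_with({"a": 2}))                  # missing key is rejected
```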

EvanJones.ca


A guy I play frisbee with turns out to have a cool developer's website, EvanJones.ca. Here are a couple of the cool links I found there:

Python code coverage [via EvanJones.ca]

Yahoo Zookeeper (distributed systems infrastructure) [via EvanJones.ca]

Functional programming how-to

A tour of Python's features suitable for implementing programs in a functional style.

Downloading from notepad.yahoo.com


I have a lot of content in notepad.yahoo.com -- links, articles, etc. I'd like to download it, format it programmatically, and post it to my blog. I'd like to do this in python, and I've already tried HTTPBasicAuthHandler and urllib2 with no success -- I get HTML that asks for login, rather than HTML that shows my notepad entries.

Here's what I have tried:

import urllib2, base64
username, password = 'hughdbrown@yahoo.com', '?'
url = 'http://notepad.yahoo.com/'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)

req = urllib2.Request('http://notepad.yahoo.com/')
base64string = base64.encodestring('%s:%s' % (username, password))[:-1]
req.add_header("Authorization", "Basic %s" % base64string)

handle = urllib2.urlopen(req)
print handle.read()

Does anyone have an idea of how to get into notepad.yahoo.com to download notes?

[2008-05-08]

I got email from Alex Angelopoulos who had this recommendation:

Actually, I used WSH because I'm most used to it, even though it was Python that initially got me excited about scripting. I also _write_ about admin scripting, and the usual name of the game is "what can you do with the bits already in the box?"

That said, what I'm doing could easily be translated into Python using its COM interop. All I do is automate the browser. This is of course dramatically more resource-intensive than curl, but it means that any ugly bits of the page are handled exactly the way the browser handles them, and the job changes from a lot of parsing and submission-handling to selecting elements by name or id and then setting values or invoking actions.

I've attached a demo of it so far. It logs in, then takes the first page - presumably listing all of the notes - and returns the URLs for each and every note. It's possible to push this farther, of course, actually walking through the notes and grabbing the content of each one.

Notes on the demo:

(1) It's VBScript. I know, don't say it. : / However, I correctly cased object properties and methods so translation should be straightforward if you want to Pythonize. The only terminally ugly bits should be the setAttribute methods; those are both arguments and should be parenthesized in more rational languages. To run the demo as-is, you need to modify the username/password, and then I advise running from the console WSH host cscript so you don't get lots of popups for the URLs.

(2) It uses hardcoded username/password which I've set topside (and will likely make into commandline arguments in a better version).

(3) If you haven't done IE automation before, note the little readyState wait loops. You need to wait for IE to finish each navigation event, and in the case of complex documents, it's also necessary to wait for the document to render.

At this point, I'm not certain what you _really_ want to do from here - or if the browser automation approach isn't feasible for some reason. What I particularly like about this approach, though, is that you can work with objects instead of text, and it's always guaranteed to work with pages designed to the "Well, it worked in Internet Explorer..." spec.

Alex

And he provided WSH code:

username = "yyyyyyyy@yahoo.com"
passwd = "xxxxxxx"

Dim ie: Set ie = CreateObject("InternetExplorer.Application")

' Initial navigation needed to stabilize IE
ie.Navigate("about:blank")
Do Until ie.readyState = 4 : WScript.Sleep 100: Loop

' Show IE for debugging; can be hidden once it works.
ie.Visible = true

ie.Navigate "http://notepad.yahoo.com"

Do Until ie.readyState = 4 : WScript.Sleep 100: Loop

Dim doc: Set doc = ie.Document

Set usernameNode = doc.getElementById("username")
usernameNode.setAttribute "value", username

Set passwordNode = doc.getElementById("passwd")
passwordNode.setAttribute "value", passwd

' There's only one form, but it doesn't have an id -
' so we get by tagname and treat it like a collection...
'Dim body: Set body = doc.Body
Set forms = doc.Body.getElementsByTagName("form")
For each form in forms
	if form.getAttribute("name") = "login_form" then form.submit()
next

' we're navigating in to the folders.

Do Until ie.readyState = 4 : WScript.Sleep 100: Loop
Set doc = ie.Document

' Apparently the document builds after the browser is technically
' finished navigating. So we wait until the document readyState
' "complete" - yes, it's a text state, not a flag...
Do Until doc.readyState = "complete"
	WScript.Sleep 100
Loop

Set table = ie.Document.getElementById("datatable")

Set links = table.getElementsByTagName("a")
for each link in links
	WScript.Echo link.href
next
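
The readyState wait loops from note (3) are the part most worth carrying over to a Python translation. Here is a minimal sketch of a polling helper for Python 3. The name wait_until and its timeout are my own additions, and the script above waits forever instead. With IE driven over COM, the predicate would presumably compare the browser's readyState to 4, as the VBScript does, but that wiring is an assumption and is not shown here.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.1):
    """Poll predicate() until it returns true; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a browser object: reports "ready" on the third poll.
    polls = {"count": 0}

    def fake_ready():
        polls["count"] += 1
        return polls["count"] >= 3

    print(wait_until(fake_ready, timeout=2.0, interval=0.01))
```

The timeout keeps the script from hanging when a page never finishes loading.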

[2008-05-09]

And that reminded me that I have used WatiN, a .NET library ostensibly for testing GUIs but really ideal for screen-scraping HTML pages -- particularly because it has Intellisense. So I wrote a program using WatiN that does everything:

using System;
using System.Globalization;
using System.Collections.Generic;
using WatiN.Core;

namespace YahooNotepad
{
  class YahooLink
  {
    private string url, folder, title;
    DateTime dt;
    public YahooLink(string url, string folder, 
      string title, DateTime dt)
    {
      this.dt = dt;
      this.title = title;
      this.url = url;
      this.folder = folder;
    }
    public string Url { get { return this.url; } }
  }
  class Program
  {
    static DateTime hackParse(string dateStr)
    {
      string[] ds = dateStr.Split(new char[] { ' ' });
      string[] datePart = ds[0].Split(new char[] { '/' });
      string[] timePart = ds[1].Split(new char[] { ':' });
      int year = 2000 + int.Parse(datePart[2]);
      int month = int.Parse(datePart[0]);
      int day = int.Parse(datePart[1]);
      int hour = int.Parse(timePart[0]);
      if (ds[2] == "pm" && hour < 12)
        hour += 12;
      int minute = int.Parse(timePart[1]);
      DateTime dt = new DateTime(year, month, day, 
        hour, minute, 0);
      return dt;
    }
    static bool processPages(IE ie, Queue<YahooLink> notes)
    {
      Table table = ie.Table("datatable");
      //    Skip the <TH> row
      for (int i = 1; i < table.TableRows.Length; i++)
      {
        TableRow row = (TableRow) table.TableRows[i];
        TableCell href = (TableCell) row.TableCells[1];
        Link link = (Link) href.Links[0];
        TableCell folder = (TableCell) row.TableCells[2];
        TableCell date = (TableCell)row.TableCells[3];

        DateTime dt = hackParse(date.Text);
        Console.WriteLine("{0} (in {1} on {2}): {3}",
        link.OuterText, folder.Text, dt, link.Url);
        YahooLink yl = new YahooLink(link.Url, folder.Text, 
          link.OuterText, dt);
        notes.Enqueue(yl);
      }
      try
      {
        ie.Link(Find.ByText("Next")).Click();
        return true;
      }
      catch (Exception)
      {
        return false;
      }
    }
    [STAThread]
    static void Main(string[] args)
    {
      string userName = "hughdbrown@yahoo.com";
      string password = "?";
      string website = "http://notepad.yahoo.com";
      using (IE ie = new IE(website))
      {
        try
        {
          ie.TextField(Find.ById("username"))
            .TypeText(userName);
          ie.TextField(Find.ById("passwd"))
            .TypeText(password);
          ie.Button(Find.ByValue("Sign In"))
            .Click();
        }
        catch (Exception)
        {
          //    Carry on -- probably already logged in
        }

        Queue<YahooLink> notes = new Queue<YahooLink>();
        while (processPages(ie, notes))
          ;

         foreach (YahooLink yl in notes)
         {
           ie.GoTo(yl.Url);
           TextField tf = ie.TextField(Find
             .ById("charCountTA"));
           Console.WriteLine(tf.Text);
         }
         ie.Link(Find.ById("Sign Out")).Click();
       }
     }
  }
}

There are a couple of hacks:

  • The DateTime parsing failed on 4/30/08 for no apparent reason, so I just hacked up a quick-and-dirty date parser for American-style dates.
  • If you are already logged in to notepad.yahoo.com, it skips the extra login by catching the exception that is thrown when the sign-in fields are not found.
  • Yahoo seems to find it peculiar that I am looking at 2000 notes and pulling them all down. It raises a security warning sometime after I have pulled down a few hundred. Maybe I'll have to add a call to sleep().
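
For what it's worth, the date hack from the first bullet is a one-liner in python. This sketch (Python 3) assumes the notes show dates like "4/30/08 3:15 pm", which is what the C# parser above implies, and an English locale for the am/pm marker. The names parse_note_date and NOTE_DATE_FORMAT are made up for the example.

```python
from datetime import datetime

# Assumed layout: month/day/two-digit-year, 12-hour clock, then am or pm.
NOTE_DATE_FORMAT = "%m/%d/%y %I:%M %p"


def parse_note_date(text):
    """Parse an American-style date such as '4/30/08 3:15 pm'."""
    return datetime.strptime(text.strip(), NOTE_DATE_FORMAT)


if __name__ == "__main__":
    print(parse_note_date("4/30/08 3:15 pm"))
```

The %I and %p directives take care of the 12-hour clock, including the 12 am and 12 pm cases that a hand-rolled parser can get wrong.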

