April 2008 Archives

Amazon Wishlist Buddy

| | No TrackBacks
Wishlist Buddy helps you save money on the things you have in your Amazon Wishlist.

Repair UNIX files for use on Windows

| | No TrackBacks
from __future__ import with_statement

def fixFile(fileName) :
   with open(fileName, "r") as f :
      str = f.read().replace("\r\n", "\n").replace("\r", "\n")
   with open(fileName, "w") as f :
      f.write(str)

Thread pools in python

| | No TrackBacks

Rules for Parallel Programming for Multicore

| | No TrackBacks
James Reinders on parallel programming

Programming for multicore processors poses new challenges. Here are eight rules for multicore programming to help you be successful:

  1. Think parallel. Approach all problems looking for the parallelism. Understand where parallelism is, and organize your thinking to express it. Decide on the best parallel approach before other design or implementation decisions. Learn to "Think Parallel."
  2. Program using abstraction. Focus on writing code to express parallelism, but avoid writing code to manage threads or processor cores. Libraries, OpenMP, and Intel Threading Building Blocks are all examples of using abstractions. Do not use raw native threads (pthreads, Windows threads, Boost threads, and the like). Threads and MPI are the assembly languages for parallelism. They offer maximum flexibility, but require too much time to write, debug, and maintain. Your programming should be at a high-enough level that your code is about your problem, not about thread or core management.
  3. Program in tasks (chores), not threads (cores). Leave the mapping of tasks to threads or processor cores as a distinctly separate operation in your program, preferably an abstraction you are using that handles thread/core management for you. Create an abundance of tasks in your program, or a task that can be spread across processor cores automatically (such as an OpenMP loop). By creating tasks, you are free to create as many as you can without worrying about oversubscription.
  4. Design with the option to turn concurrency off. To make debugging simpler, create programs that can run without concurrency. This way, when debugging, you can run programs first with--then without--concurrency, and see if both runs fail or not. Debugging common issues is simpler when the program is not running concurrently because it is more familiar and better supported by today's tools. Knowing that something fails only when run concurrently hints at the type of bug you are tracking down. If you ignore this rule and can't force your program to run in only one thread, you'll spend too much time debugging. Since you want to have the capability to run in a single thread specifically for debugging, it doesn't need to be efficient. You just need to avoid creating parallel programs that require concurrency to work correctly, such as many producer-consumer models. MPI programs often violate this rule, which is part of the reason MPI programs can be problematic to implement and debug.
  5. Avoid using locks. Simply say "no" to locks. Locks slow programs, reduce their scalability, and are the source of bugs in parallel programs. Make implicit synchronization the solution for your program. When you still need explicit synchronization, use atomic operations. Use locks only as a last resort. Work hard to design the need for locks completely out of your program.
  6. Use tools and libraries designed to help with concurrency. Don't "tough it out" with old tools. Be critical of tool support with regards to how it presents and interacts with parallelism. Most tools are not yet ready for parallelism. Look for threadsafe libraries--ideally ones that are designed to utilize parallelism themselves.
  7. Use scalable memory allocators. Threaded programs need to use scalable memory allocators. Period. There are a number of solutions and I'd guess that all of them are better than malloc(). Using scalable memory allocators speeds up applications by eliminating global bottlenecks, reusing memory within threads to better utilize caches, and partitioning properly to avoid cache line sharing.
  8. Design to scale through increased workloads. The amount of work your program needs to handle increases over time. Plan for that. Designed with scaling in mind, your program will handle more work as the number of processor cores increase. Every year, we ask our computers to do more and more. Your designs should favor using increases in parallelism to give you advantages in handling bigger workloads in the future.

I wrote these rules with explicit mention of threading everywhere. Only rule #7 is specifically related to threading. Threading is not the only way to get value out of multicore. Running multiple programs or multiple processes is often used, especially in server applications.

These rules will work well for you to get the most out of multicore. Some will grow in importance over the next 10 years, as the number of processor cores rises and we see an increase in the diversity of the cores themselves. The coming of heterogeneous processors and NUMA, for instance, makes rule #3 more and more important.

You should understand all eight and take all eight to heart. I look forward to any comments you may have about these rules or parallelism in general.

Reduce FireFox memory usage

| | No TrackBacks

After you do this, Firefox gives up all but 8MB of memory when minimized.

  1. Type about:config in the Address Bar
  2. Right Click in the page and select New -> Boolean
  3. Create variable config.trim_on_minimize
  4. Set variable to True

FIX protocol

| | No TrackBacks

QuickFIX is a full-featured open source FIX engine, currently compatible with the FIX 4.0-4.4 spec. It runs on Windows, Linux, Solaris, FreeBSD and Mac OS X. API's are available for C++, Java, .NET, Python and Ruby.

Links to documents

The following was scraped from an online PowerPoint presentation:

Session level messages

  • logon/logout
  • heartbeat
  • test request
  • resend request
  • reject
  • sequence reset

Application level messages

  • new order
  • execution report
  • order cancel/replace request
  • allocation
  • trade capture report
  • confirmation

For each field:

  • tag (a unique number)
  • field name
  • data type (string, char, price, quantity)
  • description
  • FIXML name (XML element name)

User-defined messages

User-defined fields

Message syntaxes
  1. 'tag=value' syntax
    1. tag (tag number)
    2. =
    3. value
    4. delimiter (ASCII SOH)
  2. FIXML syntax
    • XML schema defined
    • application messages only, no session level

Sample transaction
(1) buy side connect to TCP socket on port of sell side FIX engine
(2) sell side accepts TCP connection
(1) send Logon message
(2) send Logon message back
(1) send New Order message
(2) send Execution Report acknowledging order
(2) send Execution Report containing fill

Buy 5000 IBM at 110.75

8=FIX4.2^		# BeginString
9=251^			# BodyLength
35=D^			# MsgType (New Order)
49=AFUNDMGR^		# SenderCompID (AFUNDMGR)
56=ABROKER^		# TargetCompID (ABROKER)
34=2^			# MsgSeqNum (2)
52=20030615-01:14:49^	# SendTime
11=12345^		# ClOrderID (CLient Order ID)
21=1^			# HandleInst (automated exec)
55=IBM^			# Symbol (IBM)
54=1^			# Side (buy)
60=2003061501:14:49^	#
38=5000^		# OrderQty (5000)
40=2^			# OrdType (Limit)
44=110.75^		# Price (110.75)
10=127^			# Checksum
<FIXML>
<FIXMLMessage>
<Header>
<SendingTime></SendingTime>
<Sender>
<CompID>AFUNDMGR</CompID>
</Sender>
<Target>
<CompID>ABROKERR</CompID>
</Target>
</Header>
<ApplicationMessage>
<Order>
<ClOrdID></ClOrdID>
<HandlInst Value="1" />
<Instrument>
<Symbol>IBM</Symbol>
</Instrument>
<Side Value="1" />
<TransactTime></TransactTime>
<OrderQtyData>
<OrderQty>5000</OrderQty>
</OrderQtyData>
<OrderType Value="2" />
<Price>110.75</Price>
</Order>
</ApplicationMessage>
</FIXMLMessage>
</FIXML>

BrainBench technical tests

| | No TrackBacks
My BrainBench results (transcript 2723251):
  • Python 2.4: 96th percentile
  • C# 2.0: 96th percentile
  • Written English (UK): 98th percentile
  • Written English: 99th percentile
  • SQL ANSI: 99th percentile
Other tests I plan to take:
  • MS SQL Server programming
  • .NET Framework
  • .NET Framework 2.0
  • .NET Framework Fundamentals
  • ADO.NET

Lincoln Center 2008 Swing Dance schedule

| | No TrackBacks

A list of the nights that Lincoln Center Midsummer Night Swing will be hosting swing bands:

DateBand
2008-07-08Nellie McKay and the Aristocrats Swing Band
2008-07-09Chuck Brown
2008-07-10Michael Arenella and His Dreamland Orchestra
2008-07-15Alan Gresik Swing Shift Orchestra
2008-07-17Orquesta Tipica Imperial
2008-07-22Harlem Renaissance Orchestra
2008-07-24Lavay Smith and Her Red Hot Skillet Lickers

Craig's List humor

| | No TrackBacks

Lonely Planet scandal

| | No TrackBacks

Lonely Planet scandal

There has recently been a minor scandal in the tourist book publication business because of the admission of a Lonely Planet editor that he did not always go to the places he described in his books.

Thomas Kohnstamm says that he was in San Francisco at the time. "They didn't pay me enough to go to Colombia," he told The Australian Herald Sun. So he "got the information from a chick I was dating who was an intern in the Colombian Consulate".

[...]

Speaking from his home in Seattle, Kohnstamm told The Times that his comments had been misrepresented. He denied inventing content. However, he stood by his original argument that overstretched and underfunded freelance travel writers, whose recommendations are relied upon by generations of travellers, cannot possibly visit every budget hotel and jungle outpost they are asked to.

Death of the guidebook: lost in a cutthroat world

Guidebook publishers will deny this, but the travel publishing industry is bound to exploit demand for what is widely seen as a glamour job - travel and get paid for it. But with so many competing guidebook series, many titles do not generate sales revenue that justifies the legwork that results in genuine personal recommendations. Most publishers who make claims to the contrary are being disingenuous.

Using XSD

| | No TrackBacks
Using XSD

  • Creating a DataSet Schema
  • Elements and Attributes
  • Using Simple Types
  • Using Server Explorer with the XML Designer
  • Adding Keys to a DataSet Schema
  • One-to-Many Relationships
  • Nested Relationships
  • Using the Component Designer to create a strongly typed DataSet Object
  • Creating a Strongly Typed DataSet from a DataSet schema
  • Using a Strongly Typed DataSet object
  • Understanding the DOM
  • Using and XMLReader Object
  • The XmlNode class
  • The XmlDocument class
  • Synchronizing a DataSet objectwith an XmlDataDocument object
  • XPath
  • XSD Schemas
  • Using XML with SQL Server

Windows Defender

| | No TrackBacks

US foreclosure hotspots

| | No TrackBacks

Some places are hotter than others.

C# POP3 library

| | No TrackBacks

Statistics library

| | No TrackBacks

Wiki comparison Matrix

| | No TrackBacks

Marshalling results to the UI thread

| | No TrackBacks
private delegate
[ReturnType] [MethodName] Delegate([parameters]);
private [ReturnType] MethodName
{
    if(this.InvokeRequired)
    {
        [MethodName]Delegate del = 
                new [MethodName]Delegate([MethodName]);
        object[] o = new object[]{[parameter values]};
        this.Invoke(del,o);
    }
    else
    {
        [your logic here]
    }
}

Moving archived email to GMail

| | No TrackBacks

Populate python dictionary using generator

| | No TrackBacks

Two ways to load a dictionary with keys and values:

# Using builtin zip()
d = dict(zip(the_keys, the_values))

# Using itertools.izip()
import itertools
d = dict(itertools.izip(the_keys, the_values)

Luhn checksum in python

| | No TrackBacks
def cardLuhnChecksumIsValid(card_number):
    """ checks to make sure that the card passes a luhn mod-10 checksum """
    def luhnDigit(digit, count) :
        return  ( a[digit] if ((count & 1) == oddeven) else digit )
    a = (0,2,4,6,8,1,3,5,7,9)
    oddeven = len(card_number) & 1
    total = sum(luhnDigit(int(x), count) for count, x in enumerate(card_number))
    return (total % 10) == 0

NetFlix prize

| | No TrackBacks

NetFlix is sponsoring a programming contest with a $1 million prize.

The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.

Pyflix is a small package written in Python that provides an easy entry point for getting up and running in the Netflix Prize competition. It combines an efficient storage scheme with an intuitive high-level API that allows contestants to focus on the real problem, the recommendation system algorithm.

2008-03-08 links

| | No TrackBacks

IL Visualization

The Marvels of Monads

Generics in C#, Java, and C++

AmazNode] You give it a topic and it searches Amazon, displaying a network of related items.

.NET BackgroundWorker class

Great PowerPoint presentation

| | No TrackBacks
Lawrence Lessig speaks on intellectual property rights as a social phenomenon. It's an amazing PowerPoint presentation for its simplicity and its power.

ETF map

| | No TrackBacks

A heatmap that shows performance of ETFs, S&P 500, and the world over different time frames.

Articles on python programming

| | No TrackBacks

Links to programming resources

| | No TrackBacks

Running the Python Profiler

| | No TrackBacks

From Thomas Guest's The Third Rule of Program Optimisation

$ python -m cProfile -o prof.dump program.py

You then load the prof.dump output file and analyse it using the pstats module. For example, to find which 10 functions took the most time:

>>> import pstats 
>>> prof = pstats.Stats("prof.dump") 
>>> prof.sort_stats("time").print_stats(10) 

Reduce's greatest hits

| | No TrackBacks

Reduce's greatest hits

def unite(sets):
    return reduce(set.union, sets, set())

def intersect(sets):
    return reduce(set.intersection, sets) if sets else set()

def bits_to_integer(bits):
    return reduce(lambda acc, bit: acc << 1 | bit, bits, 0)

def concatenate(items, initial):
    from operator import concat
    return reduce(concat, items, initial)

def product(items):
    from operator import mul
    return reduce(mul, items, 1)

Stuff Nobody Likes

| | No TrackBacks

ImprovEverywhere: Best Game Ever

| | No TrackBacks

Freak Lori out

| | No TrackBacks
The wife's a bit acrophobic, so these YouTube videos would be just the thing for her.


MTA trip planner

| | No TrackBacks

Maximum subsequence

| | No TrackBacks

From WordAlinged.org. The simple solution:

def maximum_subsequence(seq):
    maxsofar, maxendinghere = 0, 0
    for s in seq:
        # invariant: maxendinghere and maxsofar are accurate
        # are accurate up to s
        maxendinghere = max(maxendinghere + s, 0)
        maxsofar = max(maxsofar, maxendinghere)
    return maxsofar

A streaming library for reuse:

def stream_accumulate(stream):
    total = 0
    for s in stream:
        total += s
        yield total

def stream_floor(stream):
    m = 0
    for s in stream:
        m = min(m, s)
        yield m

def stream_diff(s, t):
    import itertools
    for ss, tt in itertools.izip(s, t):
        yield ss - tt

def stream_diff(s, t):
    import itertools
    import operator
    return itertools.imap(operator.sub, s, t)

A maximum subsequence solution that uses the stream library:

import stream

def maximum_subsequence_stream(ss):
    "Return the stream of max sum contiguous subsequences of the input iterable."
    accu1, accu2 = itertools.tee(stream.accumulate(ss))
    return stream.ceil(stream.diff(accu1, stream.floor(accu2, baseline=0)))

def maximum_subsequence(ss):
    "Return the max sum of a contiguous subsequence of the input iterable."
    return stream.last(maximum_subsequence_stream(ss))

Structure and Interpretation of Computer Programs

| | No TrackBacks

Python groupby() code

| | No TrackBacks

I think that python's itertools.groupby() function is pretty cool. Here is some sample code:

import itertools
import operator

city_list = [
  ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
  ('Anchorage', 'AK'), ('Nome', 'AK'),
  ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]

a = itertools.groupby(city_list, operator.itemgetter(1))
print [ (state, ",".join(i[0] for i in it)) 
  for state, it in a ]

Highlights from wordaligned.org

| | No TrackBacks
[Echo]
def echo(fn):
    "Returns a traced version of the input function."
    from itertools import chain
    def wrapped(*v, **k):
        name = fn.__name__
        print "%s(%s)" % (
            name, ", ".join(map(repr, chain(v, k.values()))))
        return fn(*v, **k)
    return wrapped
[MapReduce: Simplified Data Processing on Large Clusters]
[Idiomatic python]
[Functional programming in python]

Python anagram code

| | No TrackBacks

More code lifted from wordaligned.org:

from __future__ import with_statement
from collections import defaultdict

def word_iter(fileName) :
    with open(fileName, "r") as f:
        for line in f:
            yield line.strip().lower()

anag_dict = defaultdict(list)
for word in word_iter("wordlist.txt") :
    key = "".join(sorted(word))
    anag_dict[key].append(word)

anagrams = [(key, words) 
  for key, words in anag_dict.iteritems() 
  if len(words) > 1]

NTFS behavior changes

| | No TrackBacks

NTFS behavior changes

Fsutil behavior set disable8dot3 1
FSUTIL behavior set disablelastaccess 1
Fsutil behavior set mftzone 1

Where are your taxes going

| | No TrackBacks

.NET threading resources

| | No TrackBacks

Wine search engines/online vendors

| | No TrackBacks
[bountyhunterwine.com] Producer of some top Napa Valley wines, including their Justice Series
[Beltramo] One of the best online wine stores in the country
[wine-searcher.com] My favorite general search engine for wine
[wine.woot.com] Blog that serves up an excellent new wine selection every day
[NapaCabs] This is where I bought a Rombauer 2004 merlot.

UniqueChars class

| | No TrackBacks

Adapted from [research!rsc]

public class UniqueChars<T>
{
	private int n;
	private T dense[((unsigned long)((T)-1)) + 1];
	private T sparse[((unsigned long)((T)-1)) + 1];

	public UniqueChars() { clearSet(); }
	public void clearSet() { n = 0; }

	public void addMember(T val)
	{
		dense[n] = val;
		sparse[val] = n++;
	}
	public isMember(T val)
	{
		const T index = sparse[val];
		return index < n && dense[index] == val;
	}
	public void removeMember(T val)
	{
		if (isMember(val))
		{
			const T old_val = dense[--n];
			const T new_index = sparse[val]
			dense[new_index] = old_val;
			sparse[old_val] = new_index;
		}
	}
};

void findUniqueChars(const unsigned char *str)
{
	UniqueChars<unsigned char> s;
	unsigned char c;
	while (0 != (c = *str++))
	{
		if (! s.isMember(c))
		{
			s.addMember(c);
			putc (c, stdout);
		}
	}
}

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 4.32-en

About this Archive

This page is an archive of entries from April 2008 listed from newest to oldest.

March 2008 is the previous archive.

May 2008 is the next archive.

Find recent content on the main index or look in the archives to find all content.