April 2008 Archives
from __future__ import with_statement
def fixFile(fileName) :
with open(fileName, "r") as f :
str = f.read().replace("\r\n", "\n").replace("\r", "\n")
with open(fileName, "w") as f :
f.write(str)
Programming for multicore processors poses new challenges. Here are eight rules for multicore programming to help you be successful:
- Think parallel. Approach all problems looking for the parallelism. Understand where parallelism is, and organize your thinking to express it. Decide on the best parallel approach before other design or implementation decisions. Learn to "Think Parallel."
- Program using abstraction. Focus on writing code to express parallelism, but avoid writing code to manage threads or processor cores. Libraries, OpenMP, and Intel Threading Building Blocks are all examples of using abstractions. Do not use raw native threads (pthreads, Windows threads, Boost threads, and the like). Threads and MPI are the assembly languages for parallelism. They offer maximum flexibility, but require too much time to write, debug, and maintain. Your programming should be at a high-enough level that your code is about your problem, not about thread or core management.
- Program in tasks (chores), not threads (cores). Leave the mapping of tasks to threads or processor cores as a distinctly separate operation in your program, preferably an abstraction you are using that handles thread/core management for you. Create an abundance of tasks in your program, or a task that can be spread across processor cores automatically (such as an OpenMP loop). By creating tasks, you are free to create as many as you can without worrying about oversubscription.
- Design with the option to turn concurrency off. To make debugging simpler, create programs that can run without concurrency. This way, when debugging, you can run programs first with--then without--concurrency, and see if both runs fail or not. Debugging common issues is simpler when the program is not running concurrently because it is more familiar and better supported by today's tools. Knowing that something fails only when run concurrently hints at the type of bug you are tracking down. If you ignore this rule and can't force your program to run in only one thread, you'll spend too much time debugging. Since you want to have the capability to run in a single thread specifically for debugging, it doesn't need to be efficient. You just need to avoid creating parallel programs that require concurrency to work correctly, such as many producer-consumer models. MPI programs often violate this rule, which is part of the reason MPI programs can be problematic to implement and debug.
- Avoid using locks. Simply say "no" to locks. Locks slow programs, reduce their scalability, and are the source of bugs in parallel programs. Make implicit synchronization the solution for your program. When you still need explicit synchronization, use atomic operations. Use locks only as a last resort. Work hard to design the need for locks completely out of your program.
- Use tools and libraries designed to help with concurrency. Don't "tough it out" with old tools. Be critical of tool support with regards to how it presents and interacts with parallelism. Most tools are not yet ready for parallelism. Look for threadsafe libraries--ideally ones that are designed to utilize parallelism themselves.
- Use scalable memory allocators. Threaded programs need to use scalable memory allocators. Period. There are a number of solutions and I'd guess that all of them are better than malloc(). Using scalable memory allocators speeds up applications by eliminating global bottlenecks, reusing memory within threads to better utilize caches, and partitioning properly to avoid cache line sharing.
- Design to scale through increased workloads. The amount of work your program needs to handle increases over time. Plan for that. Designed with scaling in mind, your program will handle more work as the number of processor cores increase. Every year, we ask our computers to do more and more. Your designs should favor using increases in parallelism to give you advantages in handling bigger workloads in the future.
I wrote these rules with explicit mention of threading everywhere. Only rule #7 is specifically related to threading. Threading is not the only way to get value out of multicore. Running multiple programs or multiple processes is often used, especially in server applications.
These rules will work well for you to get the most out of multicore. Some will grow in importance over the next 10 years, as the number of processor cores rises and we see an increase in the diversity of the cores themselves. The coming of heterogeneous processors and NUMA, for instance, makes rule #3 more and more important.
You should understand all eight and take all eight to heart. I look forward to any comments you may have about these rules or parallelism in general.
After you do this, Firefox gives up all but 8MB of memory when minimized.
- Type about:config in the Address Bar
- Right Click in the page and select New -> Boolean
- Create variable config.trim_on_minimize
- Set variable to True
QuickFIX is a full-featured open source FIX engine, currently compatible with the FIX 4.0-4.4 spec. It runs on Windows, Linux, Solaris, FreeBSD and Mac OS X. API's are available for C++, Java, .NET, Python and Ruby.
Links to documents
The following was scraped from an online PowerPoint presentation:
Session level messages
- logon/logout
- heartbeat
- test request
- resend request
- reject
- sequence reset
Application level messages
- new order
- execution report
- order cancel/replace request
- allocation
- trade capture report
- confirmation
For each field:
- tag (a unique number)
- field name
- data type (string, char, price, quantity)
- description
- FIXML name (XML element name)
User-defined messages
User-defined fields
Message syntaxes
- 'tag=value' syntax
- tag (tag number)
- =
- value
- delimiter (ASCII SOH)
- FIXML syntax
- XML schema defined
- application messages only, no session level
Sample transaction
(1) buy side connect to TCP socket on port of sell side FIX engine
(2) sell side accepts TCP connection
(1) send Logon message
(2) send Logon message back
(1) send New Order message
(2) send Execution Report acknowledging order
(2) send Execution Report containing fillBuy 5000 IBM at 110.75
8=FIX4.2^ # BeginString 9=251^ # BodyLength 35=D^ # MsgType (New Order) 49=AFUNDMGR^ # SenderCompID (AFUNDMGR) 56=ABROKER^ # TargetCompID (ABROKER) 34=2^ # MsgSeqNum (2) 52=20030615-01:14:49^ # SendTime 11=12345^ # ClOrderID (CLient Order ID) 21=1^ # HandleInst (automated exec) 55=IBM^ # Symbol (IBM) 54=1^ # Side (buy) 60=2003061501:14:49^ # 38=5000^ # OrderQty (5000) 40=2^ # OrdType (Limit) 44=110.75^ # Price (110.75) 10=127^ # Checksum
<FIXML> <FIXMLMessage> <Header> <SendingTime></SendingTime> <Sender> <CompID>AFUNDMGR</CompID> </Sender> <Target> <CompID>ABROKERR</CompID> </Target> </Header> <ApplicationMessage> <Order> <ClOrdID></ClOrdID> <HandlInst Value="1" /> <Instrument> <Symbol>IBM</Symbol> </Instrument> <Side Value="1" /> <TransactTime></TransactTime> <OrderQtyData> <OrderQty>5000</OrderQty> </OrderQtyData> <OrderType Value="2" /> <Price>110.75</Price> </Order> </ApplicationMessage> </FIXMLMessage> </FIXML>
- Python 2.4: 96th percentile
- C# 2.0: 96th percentile
- Written English (UK): 98th percentile
- Written English: 99th percentile
- SQL ANSI: 99th percentile
- MS SQL Server programming
- .NET Framework
- .NET Framework 2.0
- .NET Framework Fundamentals
- ADO.NET
A list of the nights that Lincoln Center Midsummer Night Swing will be hosting swing bands:
| Date | Band |
|---|---|
| 2008-07-08 | Nellie McKay and the Aristocrats Swing Band |
| 2008-07-09 | Chuck Brown |
| 2008-07-10 | Michael Arenella and His Dreamland Orchestra |
| 2008-07-15 | Alan Gresik Swing Shift Orchestra |
| 2008-07-17 | Orquesta Tipica Imperial |
| 2008-07-22 | Harlem Renaissance Orchestra |
| 2008-07-24 | Lavay Smith and Her Red Hot Skillet Lickers |
Lonely Planet scandal
There has recently been a minor scandal in the tourist book publication business because of the admission of a Lonely Planet editor that he did not always go to the places he described in his books.
Thomas Kohnstamm says that he was in San Francisco at the time. "They didn't pay me enough to go to Colombia," he told The Australian Herald Sun. So he "got the information from a chick I was dating who was an intern in the Colombian Consulate".
[...]
Speaking from his home in Seattle, Kohnstamm told The Times that his comments had been misrepresented. He denied inventing content. However, he stood by his original argument that overstretched and underfunded freelance travel writers, whose recommendations are relied upon by generations of travellers, cannot possibly visit every budget hotel and jungle outpost they are asked to.
Death of the guidebook: lost in a cutthroat world
Guidebook publishers will deny this, but the travel publishing industry is bound to exploit demand for what is widely seen as a glamour job - travel and get paid for it. But with so many competing guidebook series, many titles do not generate sales revenue that justifies the legwork that results in genuine personal recommendations. Most publishers who make claims to the contrary are being disingenuous.
- Creating a DataSet Schema
- Elements and Attributes
- Using Simple Types
- Using Server Explorer with the XML Designer
- Adding Keys to a DataSet Schema
- One-to-Many Relationships
- Nested Relationships
- Using the Component Designer to create a strongly typed DataSet Object
- Creating a Strongly Typed DataSet from a DataSet schema
- Using a Strongly Typed DataSet object
- Understanding the DOM
- Using and XMLReader Object
- The XmlNode class
- The XmlDocument class
- Synchronizing a DataSet objectwith an XmlDataDocument object
- XPath
- XSD Schemas
- Using XML with SQL Server
Some places are hotter than others.
private delegate
[ReturnType] [MethodName] Delegate([parameters]);
private [ReturnType] MethodName
{
if(this.InvokeRequired)
{
[MethodName]Delegate del =
new [MethodName]Delegate([MethodName]);
object[] o = new object[]{[parameter values]};
this.Invoke(del,o);
}
else
{
[your logic here]
}
}
Two ways to load a dictionary with keys and values:
# Using builtin zip() d = dict(zip(the_keys, the_values)) # Using itertools.izip() import itertools d = dict(itertools.izip(the_keys, the_values)
def cardLuhnChecksumIsValid(card_number):
""" checks to make sure that the card passes a luhn mod-10 checksum """
def luhnDigit(digit, count) :
return ( a[digit] if ((count & 1) == oddeven) else digit )
a = (0,2,4,6,8,1,3,5,7,9)
oddeven = len(card_number) & 1
total = sum(luhnDigit(int(x), count) for count, x in enumerate(card_number))
return (total % 10) == 0
NetFlix is sponsoring a programming contest with a $1 million prize.
The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.
Pyflix is a small package written in Python that provides an easy entry point for getting up and running in the Netflix Prize competition. It combines an efficient storage scheme with an intuitive high-level API that allows contestants to focus on the real problem, the recommendation system algorithm.
IL Visualization
AmazNode] You give it a topic and it searches Amazon, displaying a network of related items.
A heatmap that shows performance of ETFs, S&P 500, and the world over different time frames.
From Thomas Guest's The Third Rule of Program Optimisation
$ python -m cProfile -o prof.dump program.py
You then load the prof.dump output file and analyse it using the pstats module. For example, to find which 10 functions took the most time:
>>> import pstats
>>> prof = pstats.Stats("prof.dump")
>>> prof.sort_stats("time").print_stats(10)
Reduce's greatest hits
def unite(sets):
return reduce(set.union, sets, set())
def intersect(sets):
return reduce(set.intersection, sets) if sets else set()
def bits_to_integer(bits):
return reduce(lambda acc, bit: acc << 1 | bit, bits, 0)
def concatenate(items, initial):
from operator import concat
return reduce(concat, items, initial)
def product(items):
from operator import mul
return reduce(mul, items, 1)
From WordAlinged.org. The simple solution:
def maximum_subsequence(seq):
maxsofar, maxendinghere = 0, 0
for s in seq:
# invariant: maxendinghere and maxsofar are accurate
# are accurate up to s
maxendinghere = max(maxendinghere + s, 0)
maxsofar = max(maxsofar, maxendinghere)
return maxsofar
A streaming library for reuse:
def stream_accumulate(stream):
total = 0
for s in stream:
total += s
yield total
def stream_floor(stream):
m = 0
for s in stream:
m = min(m, s)
yield m
def stream_diff(s, t):
import itertools
for ss, tt in itertools.izip(s, t):
yield ss - tt
def stream_diff(s, t):
import itertools
import operator
return itertools.imap(operator.sub, s, t)
A maximum subsequence solution that uses the stream library:
import stream
def maximum_subsequence_stream(ss):
"Return the stream of max sum contiguous subsequences of the input iterable."
accu1, accu2 = itertools.tee(stream.accumulate(ss))
return stream.ceil(stream.diff(accu1, stream.floor(accu2, baseline=0)))
def maximum_subsequence(ss):
"Return the max sum of a contiguous subsequence of the input iterable."
return stream.last(maximum_subsequence_stream(ss))
I think that python's itertools.groupby() function is pretty cool. Here is some sample code:
import itertools
import operator
city_list = [
('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
('Anchorage', 'AK'), ('Nome', 'AK'),
('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]
a = itertools.groupby(city_list, operator.itemgetter(1))
print [ (state, ",".join(i[0] for i in it))
for state, it in a ]
def echo(fn):
"Returns a traced version of the input function."
from itertools import chain
def wrapped(*v, **k):
name = fn.__name__
print "%s(%s)" % (
name, ", ".join(map(repr, chain(v, k.values()))))
return fn(*v, **k)
return wrapped
[MapReduce: Simplified Data Processing on Large Clusters][Idiomatic python]
[Functional programming in python]
More code lifted from wordaligned.org:
from __future__ import with_statement
from collections import defaultdict
def word_iter(fileName) :
with open(fileName, "r") as f:
for line in f:
yield line.strip().lower()
anag_dict = defaultdict(list)
for word in word_iter("wordlist.txt") :
key = "".join(sorted(word))
anag_dict[key].append(word)
anagrams = [(key, words)
for key, words in anag_dict.iteritems()
if len(words) > 1]
NTFS behavior changes
Fsutil behavior set disable8dot3 1 FSUTIL behavior set disablelastaccess 1 Fsutil behavior set mftzone 1
[Beltramo] One of the best online wine stores in the country
[wine-searcher.com] My favorite general search engine for wine
[wine.woot.com] Blog that serves up an excellent new wine selection every day
[NapaCabs] This is where I bought a Rombauer 2004 merlot.
Adapted from [research!rsc]
public class UniqueChars<T>
{
private int n;
private T dense[((unsigned long)((T)-1)) + 1];
private T sparse[((unsigned long)((T)-1)) + 1];
public UniqueChars() { clearSet(); }
public void clearSet() { n = 0; }
public void addMember(T val)
{
dense[n] = val;
sparse[val] = n++;
}
public isMember(T val)
{
const T index = sparse[val];
return index < n && dense[index] == val;
}
public void removeMember(T val)
{
if (isMember(val))
{
const T old_val = dense[--n];
const T new_index = sparse[val]
dense[new_index] = old_val;
sparse[old_val] = new_index;
}
}
};
void findUniqueChars(const unsigned char *str)
{
UniqueChars<unsigned char> s;
unsigned char c;
while (0 != (c = *str++))
{
if (! s.isMember(c))
{
s.addMember(c);
putc (c, stdout);
}
}
}
