I thought I would give Seth's Squidoo thing a go.
Of course, what other subject to create content on, but CSV files! 
Let's see if it generates any traffic…
I've decided to change the title of my blog. I haven't been happy with the old one for a while.
It was a play on the film How to Succeed in Business Without Really Trying, but I don't think it ever really worked.
I've been working on some new directions for the business, with new products due for release later this year, so the 2.0 moniker seems right. Now I just need to check with Tim...
Yep, you read right. Too many fonts and bye-bye Firefox.
And by "too many" I mean one more than the number of standard Windows fonts…
You know, it really sucks. Firefox is so cool, but this is just not on. I was using Opera before I found this fix and you know what? Opera kicks ass. It's a damn fine browser and it hardly ever crashes these days. Back in the day I never used Opera because it was always bloody crashing. Must be Firefox's turn eh?
This "Interwebnet" thing will never catch on…
I am not a happy camper.
Firefox and Thunderbird on my Windows 2000 box are crashing completely at random. It seems like some sites and emails are toxic! Thunderbird even dies sometimes when scrolling.
Being of the software mindset I have poked and prodded at this problem a bit, but I'm stuck. Here's the killer: both work perfectly on my XP laptop. Killer huh?
And yes, I've done all the usual stuff like reinstalling, new profiles, compacting, yada yada, etc. etc.
Like I said, not a happy camper at all, at all.
I've been playing around with Ruby lately, trying to write a UDP server. Of course, the way to write servers in Ruby is apparently (I'm still a bit new at this Ruby malarkey) to subclass GServer.
And what a fine piece of code GServer is! Really does the job and shows just how easy it is to do stuff in Ruby. If you want a TCP server.
But I needed a UDP server. So after much messing about I came up with the following half-baked solution. Please feel free to tear it to pieces. I should remind you however that this code is WOMM certified.
# with apologies to John W. Small
require "socket"
require "thread"
class UServer
  DEFAULT_HOST = "127.0.0.1"
  DEFAULT_TRANSPORT = "TCP"

  # Override in a subclass to handle one request.
  def serve(io,content)
  end

  @@services = {} # Hash of opened ports, i.e. services
  @@servicesMutex = Mutex.new

  def UServer.stop(port,
                   host = DEFAULT_HOST,
                   trans = DEFAULT_TRANSPORT)
    @@servicesMutex.synchronize {
      @@services[host][port][trans].stop
    }
  end

  def UServer.in_service?(port,
                          host = DEFAULT_HOST,
                          trans = DEFAULT_TRANSPORT)
    @@services.has_key?(host) and
      @@services[host].has_key?(port) and
      @@services[host][port].has_key?(trans)
  end

  def stop
    @connectionsMutex.synchronize {
      if @serverThread
        @serverThread.raise "stop"
      end
    }
  end

  def stopped?
    nil == @serverThread
  end

  def shutdown
    @shutdown = true
  end

  def connections
    @connections.size
  end

  def join
    @serverThread.join if @serverThread
  end

  attr_reader :port, :host, :trans, :maxConnections
  attr_accessor :stdlog, :audit, :debug

  def connecting(client)
    addr = client.peeraddr
    log("#{self.class.to_s} #{@host}:#{@port} #{@trans} client:#{addr[1]} " +
        "#{addr[2]}<#{addr[3]}> connect")
    true
  end

  def disconnecting(clientPort)
    # Note the trailing space, so the transport and "client:" do not run together.
    log("#{self.class.to_s} #{@host}:#{@port} #{@trans} " +
        "client:#{clientPort} disconnect")
  end
  protected :connecting, :disconnecting

  def starting()
    log("#{self.class.to_s} #{@host}:#{@port} #{@trans} start")
  end

  def stopping()
    log("#{self.class.to_s} #{@host}:#{@port} #{@trans} stop")
  end
  protected :starting, :stopping

  def error(detail)
    log(detail.backtrace.join("\n"))
  end

  def log(msg)
    if @stdlog
      @stdlog.puts("[#{Time.new.ctime}] %s" % msg)
      @stdlog.flush
    end
  end
  protected :error, :log

  def initialize(port, host = DEFAULT_HOST, trans = DEFAULT_TRANSPORT, maxConnections = 4,
                 stdlog = $stderr, audit = false, debug = false)
    @serverThread = nil
    @port = port
    @host = host
    @trans = trans
    @maxConnections = maxConnections
    @connections = []
    @connectionsMutex = Mutex.new
    @connectionsCV = ConditionVariable.new
    @stdlog = stdlog
    @audit = audit
    @debug = debug
  end

  def start(maxConnections = -1)
    raise "running" if !stopped?
    @shutdown = false
    @maxConnections = maxConnections if maxConnections > 0
    @@servicesMutex.synchronize {
      if UServer.in_service?(@port,@host,@trans)
        raise "Port already in use: #{host}:#{@port} #{@trans}!"
      end
      if "TCP" == @trans
        @server = ContentTCPServer.new(@host,@port)
      else
        @server = ContentUDPServer.new(@host,@port)
      end
      @@services[@host] = {} unless @@services.has_key?(@host)
      @@services[@host][@port] = {} unless @@services[@host].has_key?(@port)
      @@services[@host][@port][@trans] = self
    }
    @serverThread = Thread.new {
      begin
        starting if @audit
        while !@shutdown
          # Wait until there is room for another connection.
          @connectionsMutex.synchronize {
            puts "start @con.size=#{@connections.size}"
            while @connections.size >= @maxConnections
              @connectionsCV.wait(@connectionsMutex)
            end
          }
          client, port, close, content = @server.accept
          Thread.new(client,port,close,content) { |myClient, myPort, myClose, myContent|
            # The request thread registers itself before doing any work.
            @connectionsMutex.synchronize {
              @connections << Thread.current
            }
            begin
              serve(myClient,myContent) if !@audit or connecting(myClient)
              puts "finished serve"
            rescue => detail
              error(detail) if @debug
            ensure
              begin
                if myClose
                  myClient.close
                end
              rescue
              end
              @connectionsMutex.synchronize {
                @connections.delete(Thread.current)
                @connectionsCV.signal
              }
              disconnecting(myPort) if @audit
            end
          }
        end
      rescue => detail
        error(detail) if @debug
      ensure
        begin
          @server.close
        rescue
        end
        if @shutdown
          # Graceful shutdown: wait for the in-flight requests to finish.
          @connectionsMutex.synchronize {
            while @connections.size > 0
              @connectionsCV.wait(@connectionsMutex)
            end
          }
        else
          @connections.each { |c| c.raise "stop" }
        end
        @serverThread = nil
        @@servicesMutex.synchronize {
          @@services[@host][@port].delete(@trans)
        }
        stopping if @audit
      end
    }
    self
  end
end
class ContentTCPServer
  def initialize( host, port )
    @server = TCPServer.new(host,port)
  end

  # Returns [client socket, client port, close-when-done flag, request content].
  def accept
    client = @server.accept
    return [client,client.peeraddr[1],true,client.gets(nil)]
  end

  def close
    @server.close
  end
end
class ContentUDPServer
  def initialize( host, port )
    puts "init"
    @socket = UDPSocket.new
    puts "new s: #{@socket} on #{host}:#{port}"
    @socket.bind(host, port)
    puts "bound: #{@socket}"
  end

  # Same return shape as ContentTCPServer#accept; the shared socket is never closed per request.
  def accept
    puts "accept"
    packet = @socket.recvfrom(1024)
    return [@socket, 0, false, packet[0]]
  end

  def close
    @socket.close
  end
end
Lovely! Who says Java can't be translated into Ruby? I even used duck typing!
Anyway, see if you can spot the significant difference in thread handling between this and the original GServer.
Well, OK, here it is:
GServer says:
@connections << Thread.new(client) { |myClient|
  ...
}
I say:
Thread.new(client,port,close,content) { |myClient, myPort, myClose, myContent|
  @connectionsMutex.synchronize {
    @connections << Thread.current
  }
  ...
}
What happened was that UDP packets were handled so quickly that the thread was (mostly) never placed in the @connections list until after it was (supposedly) removed from the list by
@connectionsMutex.synchronize {
  @connections.delete(Thread.current)
  @connectionsCV.signal
}
at the end of the request thread. Seems like a nasty race condition to me. My code just makes sure that the thread is placed into the @connections list before nasty stuff can happen.
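Here is the same idea boiled down to a few lines, as a sketch rather than anything taken from GServer itself. The method name run_workers and the doubling "work" are made up for illustration. Each thread adds itself to the list under the mutex before doing anything else, so the matching delete in the ensure block can never run first.

```ruby
# Sketch only: run_workers is a hypothetical helper, not part of GServer or UServer.
# Each worker registers itself before working, then removes itself when done.
def run_workers(count)
  connections = []
  mutex = Mutex.new

  workers = Array.new(count) do |n|
    Thread.new(n) do |job|
      # Register first, inside the thread, so the delete below always finds us.
      mutex.synchronize { connections << Thread.current }
      begin
        job * 2 # stand-in for serve(); a UDP packet is handled about this fast
      ensure
        mutex.synchronize { connections.delete(Thread.current) }
      end
    end
  end

  workers.each(&:join)
  connections.size # number of stale entries left behind
end
```

With this ordering the list should always drain back to empty, however quickly each request finishes.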
Am I right? You tell me...
I'm going to give you some insider information. When I was starting up my components business I found it nearly impossible to get any data on what my sales would look like. So now that I've had a couple of years to collect some data on my own products, I'm going to let you have a little look at it.
The most critical decision you face for a new product is how to price it. There's all sorts of stuff written about this. And all sorts of names for the various strategies: skimming, penetration, cost-plus, etc. Go read Joel for a nice introduction.
Anyway, what I'm going to give you is my demand curve, and my revenue curve. Hmm. I don't know if it's called a revenue curve, but it seems like a good name. It's the one where instead of showing price, like in the demand curve, you show total revenue. Basically, price by units sold. Both curves show values per month. Right, on with the show:
My Demand Curve

My Revenue Curve

Ah. Numbers, eh? Actual revenue numbers? ROFL mate!
Nope, sorry, can't give you actual revenue numbers or units sold. Confidential business information and all that.
Still, if you're thinking of getting into the software components business, these curves will give you something to chew on. Let me explain them a bit more.
First, the demand curve does show the product price. That's public knowledge. I have collected my data from three pricing periods. I started with $97 and stayed with that price for the first 16 months. I then moved to $47 for a further 10 months. And I've been on $170 for the last two months. This gives a nice bit of data to work with. We can plot three points on the demand and revenue curves and interpolate between them to get some idea of the shape of the curves.
Second, the product I'm focusing on is actually the single developer versions of CSV Manager and XML Manager. The single developer version is the entry level version of these components and the biggest seller. Both products are also sufficiently similar in style and function that we can acceptably merge their data sets. These curves are representative of the majority of Ricebridge sales.
Third, the scales are linear and they are not zero-based. The curves give you information about the relative values between the three prices.
So what are the curves telling you? Well first, software components ain't no Giffen good. Bummer dude. The demand curve is pretty normal. Push up the price and no-one buys. Flog 'em cheap and you'll get a load of buyers.
More interesting is the revenue curve. This is what actually helps you decide what price is going to make the most money. For software you can pretty much ignore unit costs. The main cost to you of selling an extra unit is just your payment processing fees and they increase linearly. Looking at my curve you can see that there's a sweet spot between the $97 and $170 prices.
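If you want to build the same curve from your own data, the arithmetic is just price times units sold per month. A rough sketch follows. The method name revenue_points is made up, and the unit counts in the usage comment are placeholders invented for illustration. They are not Ricebridge figures.

```ruby
# Sketch only: revenue_points is a hypothetical helper.
# Takes a hash of price => units sold per month, returns price => revenue per month.
def revenue_points(units_by_price)
  units_by_price.map { |price, units| [price, price * units] }.to_h
end

# Usage with placeholder numbers (invented, NOT real sales data):
#   curve = revenue_points(47 => 20, 97 => 14, 170 => 9)
#   best_price, best_revenue = curve.max_by { |_price, total| total }
```

With only three price points the maximum always lands on one of them, so any sweet spot in between has to come from interpolating the curve.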
So should I reprice at $120? Would you?
Well I'm not going to. There's not enough data for the $170 price yet. I have a feeling it will hit sales harder than it has to date. I think I've just been lucky so far. That pushes the maximum point of the curve closer to $97. Remember, don't panic. Get the data.
You can tell that the $47 price point was a disaster. I stuck with it for too long. The data does not lie. It may have generated higher sales volumes, but overall revenue was significantly down. But hey, I had to know.
Of course, the problem with this entire analysis is that software products change. New versions get released. CSV Manager 1.2 came out last November. It has new features so you get more for a given price. Let's just ignore that inconvenient factoid — it sorta messes up my lovely graphs!
To return to my pricing decision, I think the only way is up, actually. Given that the revenue difference between $170 and $97 is not that big, and given that new versions will be better value for money, I think on the whole that my price is pretty much about right for the moment. Yes, I am sacrificing volume. It's probably a skimming strategy. Then again, a price penetration strategy against open source (most of the alternative components) would be pretty nuts.
So there you go, if you're thinking of entering the software components market, now you have a bit more to go on. If you're already in the market, you might want to post something similar…
I'd like to announce the release of a really cool new product by J-Stels Software: StelsXML.
"StelsXML is a JDBC type 4 driver that allows you to perform SQL queries and other JDBC operations on XML files. With the StelsXML JDBC driver, you can easily access data contained in your XML documents by using standard SQL syntax and XPath expressions. The driver is completely platform-independent."
StelsXML uses Ricebridge XML Manager as its underlying XML engine. As you can imagine we're pretty happy to have been chosen for the job and we can heartily recommend the final product. Yet another way to break the impedance mismatch between Java and XML.
So go check StelsXML out!
This is a new release of Ricebridge CSV Manager. New features include support for Java Beans, a pull/push streaming API for loading and saving CSV, and a simplified set of load and save methods. The example code has been expanded and now includes line-by-line explanations. It's all detailed on the What's New page.
We're also introducing a new pricing scheme. A free single developer XML Manager license (worth $170) is included FREE with every CSV Manager license. PLUS, we include a free SIX MONTH email support package (worth $1500) with each purchase. And if you're an independent contractor, we've introduced a new option just for you: claim a 50% discount when you link to us from your site or blog!
This version is fully backwards compatible with CSV Manager 1.1 so you won't need to change your existing code.
Existing Ricebridge customers are invited to download the free upgrade from their user accounts.
We're just putting the finishing touches on the next release of CSV Manager.

It's been a bit of a long haul this one. I decided to alter the naming convention for our CSV loading and saving methods, so that they would be easier to learn. Instead of having loads of methods for each type of data source (File, InputStream, etc), I refactored the API so that the load and save methods take an Object as the data source, and work out for themselves what to do with it. Much easier! Except that it meant rewriting lots of documentation…
I've also decided to add a six-month email support package into every purchase. We were supporting all customers anyway, so let's make it official. This package is worth $1500 by the way. Nice!
And because CSV Manager and XML Manager are really designed to work together, you now get a FREE single-developer license of XML Manager with every CSV Manager purchase. If you'd like to know more about how these two products can be integrated, read our XML to CSV and back again article.
But we are going to put our prices up. There's no way round that one. The price change mostly covers the six-month email support package. We're not a fire-and-forget company. If you use our stuff, we do want to help you get it working for your project. So this seems a better match with what people need. But yes, prices have gone up. Sorry guys! 
There is some good news. If you're an independent contractor (and we'll have a pretty loose definition of this), and you mention us on your blog, then you can claim a 50% discount on ALL our products. And if you don't have a blog, we'll work something out. This will be a trust thing, so don't be shy – just ask!
And if you've been surfing round our site in the last two months and you think you should get the old prices, well, we agree. Just send us a mail explaining the situation and we'll sort you out. This should help ease the pain a bit. Only valid for 2006, etc., etc.
So what are the new prices? We'll be launching very soon, so you'll find out this week… stay tuned!
We're about to launch the next version of CSV Manager. It's coming out Real Soon Now. 
The thing is, I'm pretty much obsessed with having really high-quality API documentation. I mean, most Java API documentation is really tragic. One liners, out-of-date, misspellings, rotten links, no context. All that sort of evil stuff.
I want Ricebridge documentation to be different. But by God, it's hard. Writing good API documentation is the most difficult thing I've ever done. It's not technically difficult, it's psychologically difficult. You see, every item of text has to be context-independent. You have to be able to jump into the API docs at any point and get pretty much all the information you need, or find links to it at least. That means that each item of documentation is highly repetitive and must contain lots of redundancy. This is what makes the API documentation "good". You can always find out what you need to know right away, right where you are.
But it makes it a right royal pain to write. A major major job. But I think it's worth it. It's a differentiator, to go all business-speak on you. It's part of what makes Ricebridge components different. We don't answer support questions by saying "go ask in the forums, and you might get an answer..." Documentation, which you pay for, is where you should find what you need.
Are we there yet? No way. We're not even 10%. But I do know we're a lot better than most. We care, at least.
It's very frustrating though, because I don't know how to do it better. Writing good API documentation just seems to be a hard problem. You can't outsource it easily. Only the person who wrote the code really understands it. You can't just dump a lot of text on someone and expect them to "make it better". You need someone who has domain knowledge. And there just aren't that many copywriters out there who know enough about Java.
I don't mean to dump on copywriters in general by the way, but I've just seen far too much software documentation that was obviously written by someone who was not given sufficient technical support. It's unfair on the copywriter as well. I want to be specific about this problem. I'm not talking about user manuals or introductory guides or even reference material. I'm talking about hard-core technical documentation for developers. Our products do not have a GUI, so the only way you can even understand how they work is to program with them. If you can't program, you can't even experience them, which seems essential for writing about them.
I don't know. Am I being too hard? Too stuck in my box? If you reckon you're a dab hand at writing Java API documentation and you're freelance, drop me a line and we'll look you up the next time. I am willing to try.
So in the meantime we soldier on and I have to write it myself. It actually takes longer than the coding. No it really does. But you know what, I'm proud of the results and every single customer loves our documentation, so we'll stick with it.
I am still working away on the touch typing. My drills have taken a hit recently as some of my projects have ramped up quite a bit. It was ever thus…
I'm also getting pretty frustrated with learn2type. They have 6 levels of typing skill and you are supposed to progress through the levels by gaining better accuracy and speed. However I have been stuck at level 3 for a long time now. This level is quite a big jump from the first two. My performance has been stuck at around 75%. Learn2Type has this performance measure that applies to each practice run — I don't know how they calculate it. Anyway, I just don't seem to be able to break through the barrier.
Obviously, I just have to keep at it until it "clicks". But I do think that the design of the system is incorrect. Level progression is an important reward strategy for skill building. All good computer game concepts have level-progression very carefully fine-tuned, so that the player gets constant positive encouragement. The same thing applies in this case. Except that learn2type have got it wrong. Instead of 6 levels there should be 60. That way you can keep moving up levels all the time. That's how you generate "addiction". Right now, learn2type has turned into a chore.
The other niggly thing about learn2type is that the font used to display the text you have to type in is a very bad font. It's hard to tell the difference between 1 (one) and l (el). Or 0 (zero) and O (Oh). Now this is just silly. Not choosing a good monospace font for typing lessons is a pretty major blunder. Not sure how they let this one get through. It's not a show-stopper, but it is annoying.
Anyway, onwards…
We're hard at work on CSV Manager 1.2 - coming out Real Soon Now! But in the meantime we're bringing out a bug fix release of 1.1. This release (1.1.11) fixes bug #0013: comment characters at the start of a data field were causing parse errors.
This was a real nasty bug. We added comment support recently as it was a common feature request from customers. And this bug is a really classic example of the law of unintended consequences. It's a great new feature, but it introduced a hard-to-find bug. We had no test cases for the standard comment character '#', in the case where it was just ordinary data. In fact we have no test cases to check that all valid characters can in fact be parsed. We should have had them. So this is a classic lesson: think hard about regression testing, and try really really hard to check against all input conditions.
One easy way to perform this type of test is something called fuzz-testing. Basically you chuck a load of random data at your program and see if it breaks. We actually do have a test like this. We use massive randomly generated CSV files to check performance and parsing. But here's the thing. At no point did any of those files ever have a '#' character in just the right place to trigger this bug. The laws of probability are against you on this one.
I think this shows the importance of a combined testing approach. You must apply all the techniques. None of them are silver bullets. You must create tests based on really hard thinking about test conditions, and you should also have randomised testing. And don't forget your test coverage either. Or multi-platform testing. etc. etc. Eric Sink had some good points on this stuff recently. Anyway, the long and the short is, it's not easy to cover all the bases and you have to put in the leg work. No wonder software products take ten times more work than software projects.
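To make the "all valid characters" test concrete, here is a rough sketch in Ruby. It is not CSV Manager code (that's Java). parse_line is a toy stand-in for a real parser and failing_first_chars is a made-up helper. The point is that walking every printable character through the start of a data field is cheap and deterministic, where random files may never put a '#' in the right spot.

```ruby
# Sketch only: a toy parser, hypothetical and far simpler than a real CSV parser.
# A line is a comment only when the line itself starts with the comment character.
def parse_line(line, comment_char = "#")
  return nil if line.start_with?(comment_char)
  line.split(",", -1)
end

# Put every printable ASCII character (except the separator and the quote)
# at the start of a data field and collect the ones that fail to round-trip.
def failing_first_chars
  candidates = (32..126).map(&:chr) - [",", '"']
  candidates.reject do |ch|
    field = "#{ch}data"
    parse_line("id,#{field}") == ["id", field]
  end
end
```

An empty result from failing_first_chars means every candidate character survived, '#' included. A parser with the bug described above would show up here on the first run.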
Back to the bug fix release. This is an important one to download and install. If your input data ever contains a '#' you could be in trouble with older versions of CSV Manager. So login to your user account and grab the latest release. We strongly encourage you to do this. And let me apologise for introducing such a nasty bug. Data integrity and loss prevention is a design mantra for CSV Manager, so we're not happy campers at all about messing that up.
Finally, thanks to Dan for discovering the bug. He also got our first bug-bounty – a $15 ThinkGeek gift cert.
Let’s say you’ve got a list of globs: '*', 'a*', '*a', '*a*'. Now sort them. Sort them based on how specific they are. More specific globs match fewer things. Less specific globs match more things.
If Dr. Seuss had had anything to say about globs, what would he have said?
Glob on String
Blob on Thing
Glob on String for Blog on Thing
Blog on Glob for Thing on String
When Globs on Strings match Blogs on Things
Then... Blogs on Globs match Strings on Things
And When Blogs match Globs and Strings match Things
Then... Blogs on Globs match Blogs on Things on Globs on Strings!
So how do you sort globs? And why would you want to sort them? I’m working on a project where you use globs to pick out the error conditions that you are interested in. These error conditions have names. So you can match entire classes of error by using globs. For example 'foo.*' matches the errors 'foo.bar' and 'foo.baz'.
But what if I also add '*.*' to the mix. This also matches 'foo.bar' and 'foo.baz'. But I want 'foo.*' to match first. It’s more specific. For some definition of specific.
Again, 'a*' is more specific than '*a*'. 'a*' can match fewer things. Let’s look at this more closely. Say we have the set of words: [aa,ab,ba,bb]. Then
* matches aa,ab,ba,bb
a* matches aa, ab
*a matches aa, ba
*a* matches aa, ab, ba
So we can order the globs like so: 'a*', '*a', '*a*', '*', in order of increasing matches. The ones at the front are more specific. They match fewer things.
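You can check that table by brute force. A quick sketch using Ruby's File.fnmatch, with made-up helper names match_counts and by_match_count. Ties keep their input order, which is arbitrary at this point.

```ruby
# Sketch only: helper names are hypothetical.
# Count how many words each glob matches, using Ruby's built-in File.fnmatch.
def match_counts(globs, words)
  globs.map { |glob| [glob, words.count { |word| File.fnmatch(glob, word) }] }.to_h
end

# Order globs by increasing match count; ties keep their original order.
def by_match_count(globs, words)
  counts = match_counts(globs, words)
  globs.each_with_index.sort_by { |glob, index| [counts[glob], index] }.map(&:first)
end
```

This is evaluation, of course, which is exactly what I want to avoid in the final rule. It only works because the word set is tiny and known.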
What about 'a*' and '*a'? How do we decide which comes first? Rather arbitrarily, let’s say that the more specific prefix of 'a*' makes it the winner. Prefixes are more specific than suffixes. Hmm. That’s getting way too philosophical. Take Java packages: 'com.ricebridge.csvman.*' is more specific than '*.csvman.test'. The latter can occur in any top level package, but the former in only one. Yeah it’s pretty weak, but hey!
Now here’s the thing: what is the rule for sorting globs? I want a rule which:
1. is human computable 'just by looking' — that is, the rule is obvious
2. creates a full ordering of any set of globs — no arbitrary ordering of non-identical globs
3. has reasonable efficiency
The first idea: Use the number of stars. More stars means more specific. When the number of stars is the same, sort alphabetically. Neat and easy. And wrong. Sure it puts 'a*b*c' before 'a*c', but it also puts '*a*' before 'a*', which is incorrect.
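For the record, the first idea is a one-liner in Ruby, which is part of its charm. A sketch, with the made-up name star_count_sort:

```ruby
# Sketch only: the rejected "count the stars" rule, hypothetical helper name.
# More stars sorts first; ties are broken alphabetically.
def star_count_sort(globs)
  globs.sort_by { |glob| [-glob.count("*"), glob] }
end
```

Feed it 'a*c', 'a*', '*a*' and 'a*b*c' and you can watch '*a*' jump ahead of 'a*'.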
Let’s look at globs. There are stars, and then there are normal characters. They always form an alternating pattern: 'a*b*c'. Or '*a*b*'. So this looks like an essential (and not accidental) feature of globs. The number of stars and the number of 'normals'. Yes, in general, the more stars you have, the more constrained you are, but there’s more going on.
After thinking about this for a bit, I realised that the outside stars are special. Very special. A glob with a star at the start or end, or both, can match a lot of things. Way more things than a glob with stars inside other characters. What if we just define an ordering for the outside stars? And we have one already. From the preceding discussion: 'a', 'a*', '*a', '*a*', '*'.
Now that 'a' can stand for anything inside the outside stars. Other globs in fact. But those globs can’t have outside stars! The glob '**' is the same as the glob '*'. Adjacent stars merge into one. So '*a*', with a = '*b*' becomes '**b**' becomes '*b*'. On the other hand with a = 'b*c', '*a*' becomes '*b*c*'. A different beast entirely.
Now at this point you might be saying, "hold on a sec, you’re trying to order regular expressions! That's insane! You need to evaluate them to do that!" Because globs are just regular expressions: 'a*' is really /a.*/ in regex land. True. But I think that for the special case of globs built from stars only (let’s drop the use of '?'), we can in fact define an ordering without requiring evaluation. The problem is sufficiently complex to require some thought, but it's not a 'hard' problem.
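The star-only glob to regex translation is short enough to sketch. glob_to_regexp is a made-up name. It escapes the normal characters and turns each star into '.*', anchored at both ends.

```ruby
# Sketch only: hypothetical helper for star-only globs (no '?', no character classes).
def glob_to_regexp(glob)
  # The -1 keeps trailing empty strings, so a trailing star is not lost.
  literals = glob.split("*", -1).map { |literal| Regexp.escape(literal) }
  Regexp.new("\\A" + literals.join(".*") + "\\z")
end
```

So glob_to_regexp('foo.*') accepts 'foo.bar' but not 'food', because the dot is escaped and has to match literally.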
I would think that due to back-tracking, solving the sorting problem for standard regular expressions is probably going to involve solving the halting problem, since you have to evaluate each regex. But I could be wrong. I often am.
Back to the globs. We can sort based on the outside stars. Fine. We have five subsets to sort now: 'a', 'a*', '*a', '*a*', '*'. '*' is a special case — it’s always last. It's the least specific glob of all, matching anything. The glob with no stars, 'a', is also easy. Just sort alphabetically.
For the other three we note that a will always take the forms 'b', 'b*b', 'b*b*b', ..., and so on. That is, the essential thing is the number of inside stars. In this case, the more stars you have, the less you can match. For example 'a*a' will match 'azbza', 'azcza'. But 'a*b*a' will only match 'azbza'. More stars are more specific. Problem solved!
Nope. Not quite. What about globs having the same number of inside stars? What about 'a*a' versus 'a*ab'? Which is more specific? I think 'a*ab' is. It has more information, more constraints on what it can match. For any given finite set of words, there are more ways to end in 'a' alone than 'ab'. So the next criterion is: more normal characters, more specific.
You can see the next catch, can't you? What if we have the same number of normal characters? Alphabetic sort? No, doesn’t work – the characters may be spread out between the stars in different ways: 'a*ab' versus 'ab*a'. Which is more specific? Hmm. Let’s invoke our prefix rule again. This makes 'ab*a' more specific than 'a*ab', because it has a longer prefix. The criterion is: whoever’s got the mostest on the leftest.
And finally, if the character spread is identical, then we go alphabetic. So 'aa*a' precedes 'ab*a'. Let's see what we have...
The human rules are:
1. Prefixes are more specific than suffixes
2. Outside stars first: 'a', 'a*', '*a','*a*'
3. Inside stars: more stars means more specific
4. Normal characters: more characters, more to the left, is more specific
5. Otherwise go alphabetic
Here’s a sample list of globs, starting with the most specific. See if this list matches your idea of how they should be ordered.
com.ricebridge.csvman.CsvManager
com.*.*Manager
com.ricebridge.*.CsvManager
com.ricebridge.csvman.*
*.ricebridge.csvman
*.ricebridge.*
*
This is almost right. But really I think com.ricebridge.*.CsvManager is more specific than com.*.*Manager. How can we arrange this? Do more stars really mean more specific? Or do normal characters provide far more specificity?
The more normal characters you use, the more specific you are being. Stars provide very little information. But strings of normal characters really narrow things down. But in which set of words? What is the total set of words that we are matching in the com.ricebridge scenario? Infinite? Is there some structure? If these are the names of Java classes (my target domain), then com.ricebridge.*.CsvManager is definitely more specific (matches fewer items) than com.*.*Manager. Seems like prefixes really do rule. I wonder is this somehow connected to Zipf's Law...
Anyway, let’s drop rules 3 and 4, and replace them with:
Inside characters: longest prefix wins
So even if you have loads of stars, if you have a short prefix, you lose. This gives:
com.ricebridge.csvman.CsvManager
com.ricebridge.*.CsvManager
com.*.*Manager
com.ricebridge.csvman.*
*.ricebridge.csvman
*.ricebridge.*
*
That’s a lot better. Let’s restate the human rules:
1. Outside stars first: 'a', 'a*', '*a', '*a*'
2. Inside characters: longest prefix wins
3. Prefixes same length? More stars wins
4. Otherwise go alphabetic
That new rule number 3 means that 'a*b*c' is more specific than 'a*b'. So you only compare as many prefixes as the shorter glob has. If they are equal, take the longest glob as more specific.
A much better ruleset! And it pretty much conforms to the criteria for the sorting algorithm: human computable, full ordering, and reasonable performance.
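Here is a first stab at it in Ruby. Treat it as a sketch of one reading of the rules, not the final word. The names glob_parts, compare_globs and sort_globs are made up, and two choices are my own assumptions: "longest prefix" is compared run by run on length alone, and the bare '*' gets its own last-place class.

```ruby
# Sketch only: one possible reading of the human rules; all names are hypothetical.
# Returns [outside-star class, lengths of the literal runs between stars].
def glob_parts(glob)
  merged = glob.squeeze("*") # adjacent stars merge: '**b**' becomes '*b*'
  outside =
    if merged == "*"
      4 # bare star: least specific, always last
    else
      # 0 = 'a', 1 = 'a*', 2 = '*a', 3 = '*a*'
      (merged.start_with?("*") ? 2 : 0) + (merged.end_with?("*") ? 1 : 0)
    end
  [outside, merged.split("*").reject(&:empty?).map(&:length)]
end

# Negative result means glob a is more specific than glob b.
def compare_globs(a, b)
  class_a, lengths_a = glob_parts(a)
  class_b, lengths_b = glob_parts(b)
  shared = [lengths_a.size, lengths_b.size].min
  [
    class_a <=> class_b,                                 # outside stars first
    lengths_b.first(shared) <=> lengths_a.first(shared), # longest prefix wins
    lengths_b.size <=> lengths_a.size,                   # same prefixes: more stars wins
    a <=> b                                              # otherwise alphabetic
  ].find { |result| !result.zero? } || 0
end

def sort_globs(globs)
  globs.sort { |a, b| compare_globs(a, b) }
end
```

Running sort_globs over the seven sample globs in any starting order should give back the second list above, with com.ricebridge.*.CsvManager ahead of com.*.*Manager. No evaluation against a word set is needed, just string splitting.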
Time to go code it up. Post a comment if you want to save me from myself!

I was in Berlin recently and while wandering around the Brandenburger Tor I came across this drain cover. Another example of European design I suppose…
I'm trying out all the free typing lessons on the web in the vain hope that something will stick and I'll eventually be able to touch-type. Here’s all the typing posts, including site reviews, if you're interested.
I've moved on from goodtyping.com. Now I'm with learn2type.com. Whereas goodtyping eventually moves you to a pay-per-use model, learn2type is purely ad-supported. And that's fine. The ads don't really get in the way.
Where the sites do differ is their approach to learning. goodtyping is much easier when you're getting started. It follows the traditional model. You start with the home row and learn progressively more keys over time and the typing drills use real words and sentences (mostly). learn2type on the other hand dumps you right in at the deep end with all the keys. This is great if you are already comfortable with the basics, but I don't think an absolute beginner would find it much fun. I'm glad I started with goodtyping.

That said, learn2type is a much better site once you are beyond the beginner stage. The drills are more challenging, you get nice graphs showing your progress (love those!), and you can learn the numeric keypad as well. The site also claims that it uses a learning algorithm to figure out which keys are giving you trouble so that you can concentrate on them in the drills. Not sure how well that works (then again, I find most of the keys tricky!).
On the whole this "learning to touch-type" project is taking a while. I originally started so that I would be able to write more for this blog. My "natural" typing is pretty fast but has a very high error rate. Writing blog entries just takes too long and the constant backspacing kills my flow. I don't seem to have this problem when I'm coding, but then I use auto-complete a lot, so I never have to type that much. Also, when coding, you're mostly thinking, not bashing on the keyboard, so reduced speed is not a big deal.
The biggest challenge is finding enough time to do the drills. I don't know about you but I'm not exactly blessed with much free time between work, clients, kids and required downtime. I get about 5–10 minutes drilling in on a weekday. Probably not enough.
The other thing that is frustrating is the transition to full-time touch-typing. That is really hard. My touch-typing is nowhere near fast enough to use on a daily basis. So I stick with old bad habits with most of my typing. I hope that is not "unwiring" my touch-typing. I'm kind of hoping that my brain will regard touch-typing as a different "language" and store it in a new brain module (apparently this is the way natural language learning works).
If you know of any good online typing resources, let me know! I intend to try them all out…