Pricing Changes for the Next Release of CSV Manager

We're just putting the finishing touches on the next release of CSV Manager.

It's been a bit of a long haul this one. I decided to alter the naming convention for our CSV loading and saving methods, so that they would be easier to learn. Instead of having loads of methods for each type of data source (File, InputStream, etc), I refactored the API so that the load and save methods take an Object as the data source, and work out for themselves what to do with it. Much easier! Except that it meant rewriting lots of documentation…
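The dispatch-on-source-type idea can be sketched in a few lines. This is just an illustration of the pattern, not the real CSV Manager API; the names CsvLoader, readerFor, and firstLine are mine.

```java
import java.io.*;

// Sketch of a "single load method" that works out what to do with its
// source object at runtime, instead of exposing one method per type.
public class CsvLoader {

    // Map any supported source type onto a BufferedReader.
    public static BufferedReader readerFor(Object source) {
        try {
            if (source instanceof Reader) {
                return new BufferedReader((Reader) source);
            } else if (source instanceof InputStream) {
                return new BufferedReader(new InputStreamReader((InputStream) source));
            } else if (source instanceof File) {
                return new BufferedReader(new FileReader((File) source));
            } else if (source instanceof String) {
                // treat a plain String as literal CSV content
                return new BufferedReader(new StringReader((String) source));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalArgumentException("unsupported source type: " + source.getClass());
    }

    // Convenience: the first record of whatever the source is.
    public static String firstLine(Object source) {
        try {
            return readerFor(source).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("a,b,c")); // same call works for File, InputStream, ...
    }
}
```

The caller learns one method instead of four, which is exactly the "easier to learn" payoff, at the cost of moving the type check from compile time to runtime.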

I've also decided to add a six-month email support package into every purchase. We were supporting all customers anyway, so let's make it official. This package is worth $1500 by the way. Nice!

And because CSV Manager and XML Manager are really designed to work together, you now get a FREE single-developer license of XML Manager with every CSV Manager purchase. If you'd like to know more about how these two products can be integrated, read our XML to CSV and back again article.

But we are going to put our prices up. There's no way round that one. The price change mostly covers the six-month email support package. We're not a fire-and-forget company. If you use our stuff, we do want to help you get it working for your project. So this seems a better match with what people need. But yes, prices have gone up. Sorry guys! :)

There is some good news. If you're an independent contractor (and we'll have a pretty loose definition of this), and you mention us on your blog, then you can claim a 50% discount on ALL our products. And if you don't have a blog, we'll work something out. This will be a trust thing, so don't be shy – just ask!

And if you've been surfing round our site in the last two months and you think you should get the old prices, well, we agree. Just send us a mail explaining the situation and we'll sort you out. This should help ease the pain a bit. Only valid for 2006, etc., etc.

So what are the new prices? We'll be launching very soon, so you'll find out this week… stay tuned!

Posted in General | Leave a comment

Would You Like Good Documentation?

So would I, kid. You got any?

We're about to launch the next version of CSV Manager. It's coming out Real Soon Now. :)

The thing is, I'm pretty much obsessed with having really high-quality API documentation. I mean, most Java API documentation is really tragic. One liners, out-of-date, misspellings, rotten links, no context. All that sort of evil stuff.

I want Ricebridge documentation to be different. But by God, it's hard. Writing good API documentation is the most difficult thing I've ever done. It's not technically difficult, it's psychologically difficult. You see, every item of text has to be context-independent. You have to be able to jump into the API docs at any point and get pretty much all the information you need, or find links to it at least. That means that each item of documentation is highly repetitive and must contain lots of redundancy. This is what makes the API documentation “good”. You can always find out what you need to know right away, right where you are.

But it makes it a right royal pain to write. A major major job. But I think it's worth it. It's a differentiator, to go all business-speak on you. It's part of what makes Ricebridge components different. We don't answer support questions by saying “go ask in the forums, and you might get an answer…” Documentation, which you pay for, is where you should find what you need.

Are we there yet? No way. We're not even 10%. But I do know we're a lot better than most. We care, at least.

It's very frustrating though, because I don't know how to do it better. Writing good API documentation just seems to be a hard problem. You can't outsource it easily. Only the person who wrote the code really understands it. You can't just dump a lot of text on someone and expect them to “make it better”. You need someone who has domain knowledge. And there just aren't that many copywriters out there who know enough about Java.

I don't mean to dump on copywriters in general by the way, but I've just seen far too much software documentation that was obviously written by someone who was not given sufficient technical support. It's unfair on the copywriter as well. I want to be specific about this problem. I'm not talking about user manuals or introductory guides or even reference material. I'm talking about hard-core technical documentation for developers. Our products do not have a GUI, so the only way you can even understand how they work is to program with them. If you can't program, you can't even experience them, which seems essential for writing about them.

I don't know. Am I being too hard? Too stuck in my box? If you reckon you're a dab hand at writing Java API documentation and you're freelance, drop me a line and we'll look you up the next time. I am willing to try.

So in the meantime we soldier on and I have to write it myself. It actually takes longer than the coding. No it really does. But you know what, I'm proud of the results and every single customer loves our documentation, so we'll stick with it.

Posted in General | Leave a comment

Still Learning to Touch Type

I am still working away on the touch typing. My drills have taken a hit recently as some of my projects have ramped up quite a bit. It was ever thus…

I'm also getting pretty frustrated with learn2type. They have 6 levels of typing skill and you are supposed to progress through the levels by gaining better accuracy and speed. However I have been stuck at level 3 for a long time now. This level is quite a big jump from the first two. My performance has been stuck at around 75%. Learn2Type has this performance measure that applies to each practice run — I don't know how they calculate it. Anyway, I just don't seem to be able to break through the barrier.

Obviously, I just have to keep at it until it “clicks”. But I do think that the design of the system is incorrect. Level progression is an important reward strategy for skill building. All good computer game concepts have level-progression very carefully fine-tuned, so that the player gets constant positive encouragement. The same thing applies in this case. Except that learn2type have got it wrong. Instead of 6 levels there should be 60. That way you can keep moving up levels all the time. That's how you generate “addiction”. Right now, learn2type has turned into a chore.

The other niggly thing about learn2type is that the font used to display the text you have to type in is a very bad font. It's hard to tell the difference between 1 (one) and l (el). Or 0 (zero) and O (Oh). Now this is just silly. Not choosing a good monospace font for typing lessons is a pretty major blunder. Not sure how they let this one get through. It's not a show-stopper, but it is annoying.

Anyway, onwards…

Posted in General | Leave a comment

CSV Manager 1.1.11 Released

We're hard at work on CSV Manager 1.2 – coming out Real Soon Now! But in the meantime we're bringing out a bug fix release of 1.1. This release (1.1.11) fixes bug #0013: comment characters at the start of a data field were causing parse errors.

This was a real nasty bug. We added comment support recently as it was a common feature request from customers. And this bug is a really classic example of the law of unintended consequences. It's a great new feature, but it introduced a hard-to-find bug. We had no test cases for the standard comment character '#', in the case where it was just ordinary data. In fact we have no test cases to check that all valid characters can in fact be parsed. We should have had them. So this is a classic lesson: think hard about regression testing, and try really really hard to check against all input conditions.

One easy way to perform this type of test is something called fuzz-testing. Basically you chuck a load of random data at your program and see if it breaks. We actually do have a test like this. We use massive randomly generated CSV files to check performance and parsing. But here's the thing. At no point did any of those files ever have a '#' character in just the right place to trigger this bug. The laws of probability are against you on this one.
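Purely random fuzzing is unlikely to put the interesting character in the interesting place, but you can stack the deck. Here's a small sketch (my own illustration, not CSV Manager's actual test code; the names CsvFuzz and recordWithCharAt are made up) of directed fuzzing: generate random records, but systematically plant a chosen character like '#' at every field and offset position.

```java
import java.util.Random;

// Directed fuzz sketch: random CSV records, but with an "interesting"
// character deliberately planted at a chosen field and offset, so that
// low-probability placements (like '#' at the start of a field) are
// guaranteed to be covered.
public class CsvFuzz {
    static final Random RND = new Random(42); // fixed seed: reproducible runs

    // Build one CSV record of `fields` fields, each `width` chars wide,
    // with character c planted in field `field` at offset `offset`.
    public static String recordWithCharAt(char c, int field, int offset,
                                          int fields, int width) {
        StringBuilder sb = new StringBuilder();
        for (int f = 0; f < fields; f++) {
            if (f > 0) sb.append(',');
            for (int i = 0; i < width; i++) {
                if (f == field && i == offset) sb.append(c);
                else sb.append((char) ('a' + RND.nextInt(26)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '#' at the very start of the first field: the shape of input
        // that triggered the comment-character bug
        String line = recordWithCharAt('#', 0, 0, 3, 4);
        System.out.println(line);
    }
}
```

Looping field and offset over all positions gives you every placement of '#' in a few hundred records, instead of hoping a multi-megabyte random file happens to contain one.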

I think this shows the importance of a combined testing approach. You must apply all the techniques. None of them are silver bullets. You must create tests based on really hard thinking about test conditions, and you should also have randomised testing. And don't forget your test coverage either. Or multi-platform testing. etc. etc. Erik Sink had some good points on this stuff recently. Anyway, the long and the short is, it's not easy to cover all the bases and you have to put in the leg work. No wonder software products take ten times more work than software projects.

Back to the bug fix release. This is an important one to download and install. If your input data ever contains a '#' you could be in trouble with older versions of CSV Manager. So login to your user account and grab the latest release. We strongly encourage you to do this. And let me apologise for introducing such a nasty bug. Data integrity and loss prevention is a design mantra for CSV Manager, so we're not happy campers at all about messing that up.

Finally, thanks to Dan for discovering the bug. He also got our first bug-bounty – a $15 ThinkGeek gift cert.


Posted in General | Leave a comment

How To Make Your Globs Specific

Let’s say you’ve got a list of globs: '*', 'a*',
'*a', '*a*'. Now sort them. Sort them based on how
specific they are. More specific globs match fewer things. Less specific
globs match more things.

If Dr. Seuss had had anything to say about globs, what would he have said?

Glob on String
Blob on Thing
Glob on String for Blog on Thing
Blog on Glob for Thing on String

When Globs on Strings match Blogs on Things
Then… Blogs on Globs match Strings on Things

And When Blogs match Globs and Strings match Things
Then… Blogs on Globs match Blogs on Things on Globs on Strings!

So how do you sort globs? And why would you want to sort them? I’m
working on a project where you use globs to pick out the error
conditions that you are interested in. These error conditions have
names. So you can match entire classes of error by using globs. For
example 'foo.*' matches the errors '' and

But what if I also add '*.*' to the mix. This also matches
'' and 'foo.baz'. But I want 'foo.*' to match
first. It’s more specific. For some definition of specific.

Again, 'a*' is more specific than '*a*'. 'a*' can
match fewer things. Let’s look at this more closely. Say we have the
set of words: [aa,ab,ba,bb]. Then

* matches aa,ab,ba,bb
a* matches aa, ab
*a matches aa, ba
*a* matches aa, ab, ba

So we can order the globs like so: 'a*', '*a',
'*a*', '*', in order of increasing matches. The ones at
the front are more specific. They match fewer things.
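The match counts above are easy to sanity-check mechanically. Here's a throwaway snippet (mine, not the project's) that converts star-only globs to regexes and counts matches over the word set:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Count how many words a star-only glob matches, by converting the
// glob to a regex ('*' becomes '.*') and testing each word.
public class GlobCount {
    public static int countMatches(String glob, List<String> words) {
        // safe here: these globs contain only letters and stars
        Pattern p = Pattern.compile(glob.replace("*", ".*"));
        int n = 0;
        for (String w : words)
            if (p.matcher(w).matches()) n++;
        return n;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("aa", "ab", "ba", "bb");
        for (String g : Arrays.asList("a*", "*a", "*a*", "*"))
            System.out.println(g + " matches " + countMatches(g, words) + " words");
    }
}
```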

What about 'a*' and '*a'? How do we decide which comes
first? Rather arbitrarily, let's say that the more specific prefix of
'a*' makes it the winner. Prefixes are more specific than
suffixes. Hmm. That’s getting way too philosophical. Take Java
packages: 'com.ricebridge.csvman.*' is more specific than
'*.csvman.test'. The latter can occur in any top level
package, but the former in only one.
Yeah it’s pretty weak, but hey!

Now here’s the thing: what is the rule for sorting globs? I want a rule which:

1. is human computable 'just by looking' — that is, the rule is obvious
2. creates a full ordering of any set of globs — no arbitrary ordering of non-identical globs
3. has reasonable efficiency

The first idea: Use the number of stars. More stars means more
specific. When the number of stars is the same, sort
alphabetically. Neat and easy. And wrong. Sure it puts 'a*b*c'
before 'a*c', but it also puts '*a*' before 'a*',
which is incorrect.

Let’s look at globs. There are stars, and then there are normal
characters. They always form an alternating pattern: 'a*b*c'. Or
'*a*b*'. So this looks like an essential (and not accidental)
feature of globs. The number of stars and the number of
'normals'. Yes, in general, the more stars you have, the more
constrained you are, but there’s more going on.

After thinking about this for a bit, I realised that the outside stars
are special. Very special. A glob with a star at the start or end, or
both, can match a lot of things. Way more things than a glob with
stars inside other characters. What if we just define an ordering for
the outside stars? And we have one already. From the preceding
discussion: 'a', 'a*', '*a', '*a*', '*'.

Now, that 'a' can stand for anything inside the outside
stars. Other globs, in fact. But those globs can't have outside stars!
The glob '**' is the same as the glob '*'. Adjacent stars
merge into one. So '*a*', with a = '*b*' becomes
'**b**' becomes '*b*'. On the other hand with a = 'b*c', '*a*'
becomes '*b*c*'. A different beast entirely.
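The star-merging rule is a one-liner to make explicit. A quick normalisation step (my own helper, not part of the project) collapses runs of stars before any comparison happens:

```java
// Adjacent stars merge into one: "**" behaves exactly like "*".
// Normalising first means substitutions like a = '*b*' inside '*a*'
// can't produce a glob that looks different but matches the same.
public class GlobNorm {
    public static String normalize(String glob) {
        return glob.replaceAll("\\*+", "*");
    }

    public static void main(String[] args) {
        // substituting a = '*b*' into '*a*' gives '**b**'...
        System.out.println(normalize("**b**")); // ...which is just *b*
    }
}
```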

Now at this point you might be saying, “hold on a sec, you’re
trying to order regular expressions! That's insane! You need to
evaluate them to do that!” Because globs are just regular
expressions: 'a*' is really /a.*/ in regex land. True. But I
think that for the special case of globs built from stars only (let’s
drop the use of '?'), we can in fact define an ordering without
requiring evaluation. The problem is sufficiently complex to require
some thought, but it's not a 'hard' problem.

I would think that due to back-tracking, solving the sorting problem
for standard regular expressions is probably going to involve solving
the halting problem, since you have to evaluate each regex. But I
could be wrong. I often am.

Back to the globs. We can sort based on the outside stars. Fine. We
have five subsets to sort now: 'a', 'a*',
'*a', '*a*', '*'. '*' is a special case —
it’s always last. It's the least specific glob of all, matching
anything. The glob with no stars, 'a', is also easy. Just sort

For the other three we note that a will always take the forms
'b', 'b*b', 'b*b*b', …, and so on. That is, the essential thing is
the number of inside stars. In this case, the more stars you have, the
less you can match. For example 'a*a' will match
'azbza', 'azcza'. But 'a*b*a' will only match
'azbza'. More stars are more specific. Problem solved!

Nope. Not quite. What about globs having the same number of inside
stars? What about 'a*a' versus 'a*ab'? Which is more
specific? I think 'a*ab' is. It has more information, more
constraints on what it can match. For any given finite set of words,
there are more ways to end in 'a' alone than 'ab'. So the
next criterion is: more normal characters, more specific.

You can see the next catch, can't you? What if we have the same number
of normal characters? Alphabetic sort? No, doesn’t work – the
characters may be spread among the stars in different ways:
'a*ab' versus 'ab*a'. Which is more specific? Hmm. Let’s
invoke our prefix rule again. This makes 'ab*a' more specific
than 'a*ab', because it has a longer prefix. The criterion is:
whoever’s got the mostest on the leftest.

And finally, if the character spread is identical, then we go
alphabetic. So 'aa*a' precedes 'ab*a'. Let's see what we have…

The human rules are:

1. Prefixes are more specific than suffixes
2. Outside stars first: 'a', 'a*', '*a', '*a*'
3. Inside stars: more stars means more specific
4. Normal characters: more characters, more to the left, is more specific
5. Otherwise go alphabetic

Here’s a sample list of globs, starting with
the most specific. See if this list matches your idea of how they
should be ordered.


This is almost right. But really I think com.ricebridge.*.CsvManager
is more specific than com.*.*Manager. How can we arrange this? Do more
stars really mean more specific? Or do normal characters provide far
more specificity?

The more normal characters you use, the more specific you are
being. Stars provide very little information. But strings of normal
characters really narrow things down. But in which set of words? What is the
total set of words that we are matching in the com.ricebridge
scenario? Infinite? Is there some structure? If these are the names of
Java classes (my target domain), then com.ricebridge.*.CsvManager is
definitely more specific (matches fewer items) than
com.*.*Manager. Seems like prefixes really do rule. I wonder if this is
somehow connected to Zipf's Law.

Anyway, let’s drop rules 3 and 4, and replace them with:

Inside characters: longest prefix wins

So even if you have loads of stars, if you have a short prefix, you lose. This gives:


That’s a lot better. Let’s restate the human rules:

1. Outside stars first: 'a', 'a*', '*a', '*a*'
2. Inside characters: longest prefix wins
3. Prefixes same length? more stars wins
4. Otherwise go alphabetic

That new rule number 3 means that 'a*b*c' is more specific than 'a*b'. So you only compare the minimum number of prefixes. If they're equal, take the longest glob as more specific.

A much better ruleset! And it pretty much conforms to the criteria for the
sorting algorithm: human computable, full ordering, and reasonable
efficiency.
Time to go code it up. Post a comment if you want to save me from myself!
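For what it's worth, here is one possible reading of those rules as a Java Comparator. This is my sketch of the ruleset as stated, not the project's eventual code, and the class name GlobComparator is made up. Most specific glob sorts first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sorts globs most-specific-first using the "human rules":
// 1. outside-star class: a < a* < *a < *a* < *
// 2. longest prefix wins
// 3. same prefix length: more stars wins
// 4. otherwise alphabetic
public class GlobComparator implements Comparator<String> {

    // 0: no outside stars, 1: trailing star, 2: leading star,
    // 3: both, 4: the bare "*"
    static int outsideClass(String g) {
        if (g.equals("*")) return 4;
        boolean lead = g.startsWith("*"), trail = g.endsWith("*");
        return lead && trail ? 3 : lead ? 2 : trail ? 1 : 0;
    }

    // normal characters before the first star
    static String prefix(String g) {
        int star = g.indexOf('*');
        return star < 0 ? g : g.substring(0, star);
    }

    static int starCount(String g) {
        int n = 0;
        for (char c : g.toCharArray()) if (c == '*') n++;
        return n;
    }

    public int compare(String a, String b) {
        // rule 1: outside-star class
        int oc = Integer.compare(outsideClass(a), outsideClass(b));
        if (oc != 0) return oc;
        // rule 2: longest prefix wins (strip a leading star so the
        // inner prefix is what gets compared)
        String ia = a.startsWith("*") ? a.substring(1) : a;
        String ib = b.startsWith("*") ? b.substring(1) : b;
        int pc = Integer.compare(prefix(ib).length(), prefix(ia).length());
        if (pc != 0) return pc;
        // rule 3: more stars means more specific
        int sc = Integer.compare(starCount(b), starCount(a));
        if (sc != 0) return sc;
        // rule 4: alphabetic
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        List<String> globs = new ArrayList<>(
            Arrays.asList("*", "a*", "*a", "*a*", "ab*a", "a*ab"));
        globs.sort(new GlobComparator());
        System.out.println(globs); // [ab*a, a*ab, a*, *a, *a*, *]
    }
}
```

Note that rule 2 fires before rule 3, so a long prefix beats a pile of stars, which matches the com.ricebridge.*.CsvManager versus com.*.*Manager intuition above.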

Posted in Java | Leave a comment

Berlin Drain Cover

I was in Berlin recently and while wandering around the Brandenbürger Tor I came across this drain cover. Another example of European design I suppose…

Posted in General | Leave a comment

Back To The Typing

I'm trying out all the free typing lessons on the web in the vain hope that something will stick and I'll eventually be able to touch-type. Here’s all the typing posts, including site reviews, if you're interested.

I've moved on from goodtyping. Now I'm with learn2type. Whereas goodtyping eventually moves you to a pay–per–use model, learn2type is purely ad–supported. And that's fine. The ads don't really get in the way.

Where the sites do differ is their approach to learning. goodtyping is much easier when you're getting started. It follows the traditional model. You start with the home row and learn progressively more keys over time and the typing drills use real words and sentences (mostly). learn2type on the other hand dumps you right in at the deep end with all the keys. This is great if you are already comfortable with the basics, but I don't think an absolute beginner would find it much fun. I'm glad I started with goodtyping.

That said, learn2type is a much better site once you are beyond the beginner stage. The drills are more challenging, you get nice graphs showing your progress (love those!), and you can learn the numeric keypad as well. The site also claims that it uses a learning algorithm to figure out which keys are giving you trouble so that you can concentrate on them in the drills. Not sure how well that works (then again, I find most of the keys tricky!).

On the whole this “learning to touch–type” project is taking a while. I originally started so that I would be able to write more for this blog. My “natural” typing is pretty fast but has a very high error rate. Writing blog entries just takes too long and the constant backspacing kills my flow. I don't seem to have this problem when I'm coding, but then I use auto–complete a lot, so I never have to type that much. Also, when coding, you're mostly thinking, not bashing on the keyboard, so reduced speed is not a big deal.

The biggest challenge is finding enough time to do the drills. I don't know about you but I'm not exactly blessed with much free time between work, clients, kids and required downtime. I get about 5–10 minutes drilling in on a weekday. Probably not enough.

The other thing that is frustrating is the transition to full–time touch–typing. That is really hard. My touch–typing is nowhere near fast enough to use on a daily basis. So I stick with old bad habits with most of my typing. I hope that is not “unwiring” my touch–typing. I'm kind of hoping that my brain will regard touch–typing as a different “language” and store it in a new brain module (apparently this is the way natural language learning works).

If you know of any good online typing resources, let me know! I intend to try them all out…


Posted in General | Leave a comment

Be the Best Available

There are only two (profitable) markets. The cheapest available and the best available. Play the middle ground and you play to lose. Seth hits the mark again. Imagine how much it would have cost to hire this guy before blogs? Now you can get his head for free.

OK, I know this is not exactly rocket science. High volume versus high margin. Clean and simple. Lots of people have written about it. But it's nice to get a reminder. It gives you motivation and inspiration.

When I started Ricebridge I had this in mind, if not in words. Give something a name and you give it power. Give an idea a name and you manifest it. “Be the best available”. What a brilliant way to give yourself focus.

In my own market (software components) I have really tried to live this idea. Yes, there are loads of Open Source alternatives. That's the high–volume competition. And I go for the high–end business. When you really need to get the best there is, I'm there. People need this option.

Of course, it's a lot more work to build full-service components. All that other stuff outside the code. All the additional support that comes with the product. That's really where my market is, not the code itself. I've been thinking about this truth for quite a while now. Fully embracing it means providing high-level support for my components, as standard.

Pretty soon I'm going to announce an entirely new licensing model. Prices will change, but what you get will be far more than just a piece of code. Instead of providing support as separate (expensive) products, I'm going to make support a part of the core offering. When you buy a Ricebridge component, you will get high-level support as standard. Your questions will be answered directly and quickly. No more searching through documentation or FAQs. What you buy is another team member who just happens to be a complete specialist in one part of your project.

I really think that the lack of support is what puts people off buying software components. Commercial components are usually so badly supported that people are driven to Open Source. We've all had our nightmares. At least Open Source has some sort of support structure. It may be random, but at least you have some hope. If you look at the component vendors that are well-respected, you find that they are the exception that proves the rule. Everybody raves about them.

The thing is, supported components are what is needed, not software components. There is no such thing as plug–and–play when it comes to building software. You always need a helping hand.

It's time to start playing to my strengths.


Posted in Business | Leave a comment

Further Versioning Contortions

Yesterday I had a plan to use reflection to solve the compatibility versioning problem with my software components.

Basically I need to change an interface that existing customers have implemented. And I want to maintain compatibility so that no-one has to recompile anything.

I did try to anticipate this problem by providing for changes in an abstract support class. Users were advised to extend this class rather than implement the interface directly so that future changes could be hidden from them.

Well it looks like I might be able to get away with it after all. And without any reflection. Here is a concrete example of the problem, and a proposed solution.

The solution satisfies the requirement that existing code must not be changed. Existing binaries must still work, and existing code must still compile without errors. But the interface will change. Here goes…

First, here's the current situation, demonstrated in code. I've used a simple example to show the essence of the problem.

We have a ColorManager class, to which Colors can be added. Colors have names and numbers. To implement a new Color you extend the ColorSupport abstract support class, which in turn implements the Color interface. Instead of implementing the interface methods directly, you implement protected *Impl methods instead. These are called by ColorSupport. This insulates you from changes to Color.

The change is that the Color.getName method is to be renamed Color.getCode, and the ColorSupport.getNumberImpl method is to be made protected (it was accidentally released as public).

Anyway, here's the initial code:

public class ColorManager {

  public void addColor( Color pColor ) {
    System.out.println( "color:"+pColor.getName()
                        +","+pColor.getNumber() );
  }

  public static final void main( String[] args ) {
    ColorManager cm = new ColorManager();

    Red red = new Red();
    cm.addColor( red );
  }
}

public interface Color {
  public int getNumber();

  // this will be changed to getCode
  public String getName();
}

// this is the abstract support class
public abstract class ColorSupport implements Color {
  public int getNumber() {
    return getNumberImpl();
  }

  public String getName() {
    return getNameImpl();
  }

  // this needs to be made protected
  public abstract int getNumberImpl();
  protected abstract String getNameImpl();
}

public class Red extends ColorSupport {

  public int getNumberImpl() {
    return 0;
  }

  protected String getNameImpl() {
    return "red";
  }
}

These classes mirror the current design of the LineListener and LineProvider callback interfaces in CSV Manager.

So the class Red represents a user created class. It cannot change. And neither can any existing Red.class bytecode.

The solution has to take into account the following: The old ColorSupport will be deprecated, but still supported. The next major version will use a changed ColorSupport and remove all compatibility code. This is allowed as compatibility can be changed on a major release. ColorSupport is a name we want to keep (for consistency across product lines). So if we use a new support class in the meantime, we have to insulate new users who implement the new, correct, Color methods. We must make sure that their code requires no changes when we move to the next major version!

So here's the basic idea: apply the changes to Color, which breaks the old ColorSupport and Red classes. Detach ColorSupport from Color and make it a standalone class. Add a method to ColorManager that can accept ColorSupport. This ensures that old Color implementations that extend ColorSupport still work with ColorManager.

Next, create a ColorSupportImpl class. This is the new ColorSupport. It will replace the old ColorSupport with the next major version. ColorSupportImpl implements the new Color interface directly. It works just the same as the old design. But we know that the name ColorSupportImpl is temporary and will be dropped. So we need to place an insulation class in between the concrete color classes and ColorSupportImpl. To do this we change the recommended way to implement colors. For every color, there is a specific color support class. For example, Green will extend GreenSupport which then extends ColorSupportImpl.

That still leaves one little problem. What about colors that we have not defined? What about user-defined colors? We need to specify that custom colors extend an insulation class rather than ColorSupport, as is currently the case. We'll use CustomColor. So this suggests a change to the standard policy across product lines. Custom concrete user classes extend an abstract custom class which extends an abstract support class that implements the interface in question.

Wow, that seems like a really complicated way to do something simple. In a normal environment you would never do this. You would refactor and modify client code. And for released software components you can't do this. Releasing commercial software creates an entirely different set of issues. In this case it is far far more important to support existing customers than it is to refactor to a clean design. The vendor has to accept the responsibility for maintaining compatibility for reasonable periods and between clear boundaries. You only need to take a look at the situation with plugins to Eclipse or Firefox to see how difficult this problem is. And they get it mostly right!

Here's the code for the new version. Watch out, we've got lots more classes!

public class ColorManager {

  public void addColor( Color pColor ) {
    System.out.println( "color:"+pColor.getCode()
                        +","+pColor.getNumber() );
  }

  // this keeps the old colors working
  public void addColor( ColorSupport pColorSupport ) {
    addColor( new ColorSupportFixer(pColorSupport) );
  }

  public static final void main( String[] args ) {
    ColorManager cm = new ColorManager();

    // this is an old color
    // old custom colors will work this way as well
    Red red = new Red();
    cm.addColor( red );

    // this is a new standard color
    Green green = new Green();
    cm.addColor( green );

    // this is a new custom color
    Blue blue = new Blue();
    cm.addColor( blue );
  }
}

public interface Color {

  public int getNumber();

  // this is the new version
  public String getCode();
}

// this is the same as before, but no longer
// implements Color
public abstract class ColorSupport {
  public int getNumber() {
    return getNumberImpl();
  }

  public String getName() {
    return getNameImpl();
  }

  public abstract int getNumberImpl();
  protected abstract String getNameImpl();
}

// this is unchanged - just what we want!
public class Red extends ColorSupport {

  public int getNumberImpl() {
    return 0;
  }

  protected String getNameImpl() {
    return "red";
  }
}

// this is the new version of ColorSupport
public abstract class ColorSupportImpl 
  implements Color {

  public int getNumber() {
    return getNumberImpl();
  }

  public String getCode() {
    return getCodeImpl();
  }

  protected abstract int getNumberImpl();
  protected abstract String getCodeImpl();
}


// an insulation class, currently does nothing
public abstract class GreenSupport 
  extends ColorSupportImpl {}

// a new standard color
public class Green extends GreenSupport {

  protected int getNumberImpl() {
    return 1;
  }

  protected String getCodeImpl() {
    return "green";
  }
}


// an insulation class for custom colors
public abstract class CustomColor 
  extends ColorSupportImpl {}

// a custom color
public class Blue extends CustomColor {

  protected int getNumberImpl() {
    return 2;
  }

  protected String getCodeImpl() {
    return "blue";
  }
}

// this hooks up the old and new interfaces
public class ColorSupportFixer 
  extends ColorSupportImpl {

  private ColorSupport iColorSupport;

  public ColorSupportFixer( ColorSupport pColorSupport ) {
    iColorSupport = pColorSupport;
  }

  protected int getNumberImpl() {
    return iColorSupport.getNumber();
  }

  // convert getCode to getName
  protected String getCodeImpl() {
    return iColorSupport.getName();
  }
}


Like I said, it's not pretty. But it does allow the API to move forward with full backwards compatibility.

The insulation classes (GreenSupport and CustomColor) are empty in the example above and will probably also be empty in the next CSV Manager release (1.2). Their purpose is to allow ColorSupportImpl to change its name in release 2.0.

And they serve another very important purpose. If in the future further changes arise that require more compatibility workarounds, they allow for the use of a reflection-based solution in ColorSupport and/or the insulation classes. Thus one layer of changes can be applied on the interface side, and one on the implementation side. This “feels” like the right solution.

Of course, some types of changes (for example, changing method access from public to protected) may not be amenable to a reflection-based solution. They may require a third layer of insulation. We'll cross that bridge when we come to it, if we cross it at all. I rely on the belief that as the API converges on an acceptable design, these types of changes will become less of a problem. Once the API has been in use for a longer period, changes become so exponentially expensive that it is better to put up with design mistakes. This is what happened with the standard Java API.

I reckon I can still pull this off at this stage in the life-cycle of CSV Manager. The cost will be more complex documentation until 2.0, when the compatibility code can be ditched, plus increased code complexity inside CSV Manager, which means more bug-fixing work for me. I have a large set of unit tests, though, so that should not be a big problem.

Well, it looks like we're set. Any final thoughts before I dive into the code?

Posted in Java | Leave a comment

More On Versioning

Brian Smith made some good points on my last post about software component versioning. They're too detailed to reply to in a comment so I'm posting a reply as a full entry.

You'll probably need to read that last post for this one to make sense. Also, I kind of have a strategy now. More on that at the end of this post.

I would just change the version number to 2.0, and then tell your customers that they can have a free upgrade even though this isn't the normal policy.

There's not enough new stuff to go to 2.0. It's not really fair to customers either. I think it has to be 1.2. I'd like 2.0 to be a bigger bang.

Unless you are taking away functionality I have a hard time understanding why you cannot maintain backward-compatibility. Compatibility is something that everybody expects, and doubly so if they are paying for the product. It seems like the problem could be solved by adding a new interface instead of replacing an existing one.

When adding methods, this is the case. In fact I will be adding methods and maintaining the old ones. But if you want to change method access permissions it's basically impossible. I want to move a public method to protected.
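To make the access-change problem concrete, here is a minimal sketch (all names hypothetical, not the real CSV Manager API). Narrowing a method from public to protected is a breaking change in two ways: client code outside the package stops compiling, and clients already compiled against the old version fail at runtime with an IllegalAccessError.

```java
// Sketch: why moving a method from public to protected breaks clients.
// All names here are illustrative.
public class AccessDemo {

  // In "1.1" this is public, so any client can call it directly.
  // If "1.2" made it protected, the client call below would no longer
  // compile, and previously compiled clients would fail at runtime.
  public String load() {
    return "loaded";
  }

  public static void main(String[] args) {
    // Stand-in for client code living outside our package.
    AccessDemo api = new AccessDemo();
    System.out.println(api.load()); // loaded
  }
}
```

No amount of added interfaces helps here, because the existing public signature itself is the thing that has to go away.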

Also, adding a new interface is not an option, as I need to keep the name of the interface consistent across product lines. A bit of a catch-22, really, and a pain when you have to manage more than one product. I'm starting to have some sympathy for Sun and the mess that is java.*.

The nice thing about backward-compatibility is that it helps encourage people to upgrade to the latest version. If people refuse to upgrade then you end up supporting old versions longer. If a customer finds a bug in version 1.1 then they will ask you for a 1.1.1 to fix it, instead of upgrading to 2.x where it was already fixed.

Ah. Yes. But I do this in any case. All major versions will be supported for as long as possible. This is important. Part of the reason developers hate external components is because they can't be trusted and a lot of vendors just don't care if there are bugs. Well I care, and I won't let bugs stand if I can help it.

I don't really like the idea that the first component of the version number only changes when there is an incompatible change. If you maintain compatibility for five years then the version number would be something like 1.9. But, 1.9 probably would have much, much more functionality than 1.0. Yet, the numbers 1.9 and 1.0 don't seem that much different from each other (it is easy to misread it as 1 & 23/100). The result is that this scheme is counterproductive to marketing the product: the main version number increases when something bad happens.

The worst case would be when you publish an interface in, say, version 1.6.1 that for some reason has to change immediately (e.g. the way the interface was structured facilitates some kind of security problem). To fix this problem you want to release a fixed version of the API. But, now you have to call this version 2.x even though it might be a very, very small change. Maybe the jump from 1.0.1 to 1.6.1 was a huge improvement in functionality. Yet, the increase from 1.6.1 to 2.0.0 was a single bugfix, perhaps just a few characters changed in the source code. It is counter-intuitive.

This is a very strict interpretation. I would say that one can bump to 2.0 at any time, for marketing reasons say. It would only be forced by a compatibility issue.

In any case, I think I am going to drop the minor version restriction and say that minor versions can include incompatible changes. This means that you can only be 100% sure of compatibility within the same major.minor revision. But that's pretty much OK.

It works okay for open source products because marketing for them is totally different (often ignored).

Very true. And there is something to learn from Open Source. Version numbers are an important way to give a quick overview of the state of the project. I like that and it's very useful, even necessary, for software components. It's much less important for applications, where the version number is just a marketing gimmick. But for my stuff the version number is, in a way, part of the customer service. It's very important that it has a crystal clear meaning.

So this all seems like a very difficult problem to solve. However I was forgetting one thing about Java: Just-In-Time compilation!

How does that help? Well, it means that reflection is not as bad as you think it is. The “common knowledge” is that reflection is slow. Well sure, the lookup is slow, but once you have the method reference the JIT compiler will optimize the calls for you. In the case of data processing where you perform the same operation many times, this effect comes into play very quickly.
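As a sketch of this point (class and method names made up for illustration): the expensive reflective lookup happens once, up front; the cached Method reference is then reused in the hot loop, where the JIT can optimize the repeated invocations.

```java
import java.lang.reflect.Method;

// Sketch: cache the reflective lookup once, then reuse the Method
// reference in a loop. The lookup is the slow part; the repeated
// invoke calls are what the JIT compiler gets to optimize.
public class ReflectionCacheDemo {

  public static String greet() {
    return "green";
  }

  public static void main(String[] args) throws Exception {
    // Slow part: look the method up once.
    Method m = ReflectionCacheDemo.class.getMethod("greet");

    // Fast part: reuse the cached reference many times, as in a
    // data-processing loop that applies the same operation per row.
    String last = null;
    for (int i = 0; i < 1000; i++) {
      last = (String) m.invoke(null);
    }
    System.out.println(last); // green
  }
}
```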

So reflection is OK! I can use it! I'm going to look at trying to rewrite the support classes to handle this situation. They will introspect themselves to see if the old methods are declared. They can thus recognize the old interface and work with it. Old code should just keep working.
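A minimal sketch of what that introspection could look like, with hypothetical names standing in for the real support classes: the base class asks, via getDeclaredMethod, whether the concrete subclass declares its own copy of the old method, and routes calls accordingly.

```java
import java.lang.reflect.Method;

// Sketch (names hypothetical): a support class detects whether a
// subclass still implements the old method name (getCode) and, if so,
// uses it; otherwise it uses the new name (getName).
public class IntrospectDemo {

  // Simulates an old-style subclass that only knows getCode.
  static class OldStyleColor extends IntrospectDemo {
    protected String getCode() {
      return "blue";
    }
  }

  // New-style method name.
  protected String getName() {
    return "unknown";
  }

  // Old-style method name, kept so old subclasses still compile.
  protected String getCode() {
    return getName();
  }

  // True if the concrete subclass declares its own getCode.
  boolean declaresOldMethod() {
    try {
      Method m = getClass().getDeclaredMethod("getCode");
      return m.getDeclaringClass() != IntrospectDemo.class;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  String resolvedName() {
    // Old subclasses are recognized and keep working unchanged.
    return declaresOldMethod() ? getCode() : getName();
  }

  public static void main(String[] args) {
    System.out.println(new OldStyleColor().resolvedName()); // blue
    System.out.println(new IntrospectDemo().resolvedName()); // unknown
  }
}
```

The lookup in declaresOldMethod could itself be done once and cached, which ties back to the JIT point above.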

Yes this will make my support classes messy and more complex. Tough for me. That's the price to be paid. If you want to maintain compatibility for paying customers, as Brian noted above, you must do this. Anything else is just lazy.

Of course, this may not work completely. We'll see. But even if I can reduce the changes required, that's a good thing.

Thank You Blogosphere, and Thank You Brian!

Posted in General | Leave a comment