Saturday, December 6, 2008

Pesticide paradox

Do you have a lot of automated tests? Do you continuously add new test cases and improve existing ones? No? You shouldn't sleep well... ;-)

Please be aware that the value of the same tests executed over and over again (e.g. regression tests) decreases with time. This is known as the "pesticide paradox".

According to the ISTQB "Certified Tester Foundation Level Syllabus":

If the same tests are repeated over and over again, eventually the same set of test cases will no longer find any new defects. To overcome this “pesticide paradox”, the test cases need to be regularly reviewed and revised, and new and different tests need to be written to exercise different parts of the software or system to potentially find more defects.

The conclusion:

It's not enough to have an automated test suite. It's essential to keep improving it over time.

Sunday, November 23, 2008

C# needs Linq to be a (really) functional language

C# is a functional language. Although this is true for all versions of the language, there is a very significant difference in usability between C# 3.0 and older releases.

To see some of the differences, let's compare the same code fragment, using some typical functional constructs, written in both C# 2.0 and 3.0. First, let's write a simple function in C# 2.0 which inspects an array (converts its content to a readable string).


using System;

public class Arrays
{
    // Converts a single element to a readable string.
    public static string Inspect<T>(T obj)
    {
        return obj != null ? obj.ToString() : "null";
    }

    // Converts a whole array to a readable string such as "[a, b, c]".
    public static string Inspect<T>(T[] array)
    {
        Converter<T, string> map = new Converter<T, string>(Inspect);
        string[] converted = Array.ConvertAll<T, string>(array, map);
        return "[" + String.Join(", ", converted) + "]";
    }
}
The above example illustrates that functional programming with C# may be very hard to read, verbose and... so ugly!

Let's see the difference when using C# 3.0, which introduces a lot of new features, including:

  • Lambda Expressions (you can easily define functions "in-line"),
  • Extension Methods (you can use a bunch of helper methods added to existing types, or you can add your own methods to any type, including System types),
  • Type Inference For Generics (in most cases you don't have to write types in <> brackets when invoking a generic method),
  • Implicitly Typed Local Variables (you don't have to declare types for local variables),
  • Query Expressions (the most valuable and unique? feature introduced with LINQ: you can write select-from-where queries directly in the C# language for different data sources, including relational databases and XML files).

The same function now written in C# 3.0 is... different ;-)

using System;
using System.Linq;

public class Arrays
{
    public static string Inspect<T>(T[] array)
    {
        // Select maps every element to its string form; ToArray materializes the result.
        var converted = array.Select(obj => obj != null ? obj.ToString() : "null").ToArray();
        return "[" + String.Join(", ", converted) + "]";
    }
}
Let's look at two key lines of this code in detail...

using System.Linq;
The above line is crucial for the example, because without it we wouldn't be able to use the bunch of useful extension methods that Linq provides for the IEnumerable<T> interface (in our case the Select method wouldn't be visible to the compiler). Here is one "trick" that I wasted some time on before I got this working. When, after adding "using System.Linq", you see the following error:

The type or namespace name 'Linq' does not exist in the
namespace 'System' (are you missing an assembly reference?)

then it is most probably caused by a missing reference to the assembly containing Linq. This is quite obvious(?), but the name of the assembly containing Linq is not. It is System.Core.

The second line, although not very pretty, looks quite innocent, doesn't it?

var converted = array.Select(obj => obj != null ? obj.ToString() : "null").ToArray();
Let's go through the key (new) C# 3.0 elements in this line:
  1. We don't have to declare the type of a local variable, so in our example we can use

    var converted =
    instead of

    string[] converted =
  2. We can call the extension methods that Linq provides for the IEnumerable<T> interface directly on any IEnumerable<T> instance, such as our array. In our case we can use

    array.Select
    instead of

    Array.ConvertAll
  3. We can rely on type inference when invoking generic methods, so we don't have to write the types in <> brackets. We can simply type:

    Select( ToArray(
    instead of

    Select<T, string>( ToArray<string>(
  4. We can use lambda expressions instead of declaring separate methods or delegates.

    obj => obj != null ? obj.ToString() : "null"
    instead of

    public static string Inspect<T>(T obj)
    {
    return obj != null ? obj.ToString() : "null";
    }
Unfortunately, there are things we cannot do, although we would expect to be able to do them. One example is that we cannot assign a lambda expression to an implicitly typed ("var") variable, because a lambda expression has no type of its own until it is converted to a delegate (or expression tree) type. In our example we cannot write:

var converter = obj => obj != null ? obj.ToString() : "null";
var converted = array.Select(converter).ToArray();
To make this compile, we would have to change it into

Func<T, string> converter = obj => obj != null ? obj.ToString() : "null";
var converted = array.Select(converter).ToArray();
That's why it's better to put this lambda expression inline, even at the cost of a long and harder to read line.

As we can see, C# 3.0 gives us (especially when using Linq) powerful language features that make the language more friendly for functional programming. Although the new features are not perfect, they are indeed useful.

Tuesday, April 29, 2008

Books about Software Engineering I recommend

My favorite book about Agile:
Agility and Discipline Made Easy by Per Kroll and Bruce MacIsaac (no Polish edition)

Another great book about Agile:
Lean Software Development by Mary and Tom Poppendieck (no Polish edition)

Introduction to XP (short and nice, a lot of good practices):
Extreme Programming Explained by Kent Beck and Cynthia Andres (Polish edition)

Very good book about one of the most important "best practices":
Continuous Integration by Paul Duvall (no Polish edition)

You cannot do "real" refactoring without reading it first:
Refactoring by Martin Fowler (Polish edition)

Two not perfect, but very valuable books about data:
A First Course in Database Systems by Jeffrey Ullman and Jennifer Widom (Polish edition)
Database Systems by Thomas Connolly and Carolyn Begg (Polish edition)

Very nice book about combining the business and technical points of view:
Beyond Software Architecture by Luke Hohmann (Polish edition)

Two boring, but very valuable (necessary!) books providing an overview of Software Engineering:
Software Engineering by Roger Pressman (Polish edition)
Software Engineering by Ian Sommerville (Polish edition)

Want to borrow one in Krakow? Don't hesitate to contact me at AdamCzepil@gmail.com.

Friday, March 7, 2008

Pair Programming in Practice

Is pair programming a viable method?

Yes, it is. In our small (8 people) company we use pair programming almost every day. But... it is not as easy as I thought at the beginning.

1. I think it is almost impossible to work in a pair for the whole day (I cannot explain this very precisely, but when I work in a pair I also need some solitude from time to time, just to take a breath). For me 6h is the max.
2. Working in a pair is very tiring. After a whole day of pair programming I am usually not in the mood for parties ;-) I just want to go to sleep.
3. Two people do not always make a good pair for a particular task. It can be a problem for me to say "no" to a colleague from the team if I don't feel we are the right pair for the task.
4. I need several days to get used to a new colleague. Usually, at the beginning, mostly one person writes the code and we rarely switch. We need some time to learn how to switch roles often, so that we are "equal" parts of the pair (i.e. 50% of the time coding, 50% of the time helping).
5. Some tasks I prefer to do alone. This includes very simple but arduous tasks (it's faster that way), but also tasks in which I know I am the best expert in the team (then I have more fun because I don't have to answer all those questions :)
6. It is a problem for me to find a balance between being a teacher and being a student. In theory, in the second case above I should work in a pair with my colleagues to teach them ("exchange knowledge"). And I do. My colleagues also do this with me. But not always; it's very hard to be a teacher or even a student all the time. I think we usually avoid "teacher-student" situations. They are the exception rather than the rule. But we keep in mind that... "today you teach me A, tomorrow I'll teach you B".

Some benefits:

1. The knowledge exchange is unbelievable. Even though we don't work in pairs all the time. Even though we don't like to teach/learn all the time.
2. I believe hard/complex tasks are done faster in a pair than alone.
3. I believe that the code we produce in pairs is better, especially in terms of maintainability.
4. Two heads sometimes come up with amazing, surprisingly good solutions (creativity!)
5. I like it; I learn a lot from the other guys (and I hope vice versa :)

Some problems I know:

1. It is very hard to prove that working in pairs is better than working alone. There are several research papers about it, but the conclusions are contradictory, or at least "fuzzy".
2. I think that in the usual case (a task that is not very complex) it is simply not true that pair programming is more effective. I believe that it is better for other reasons (better code, knowledge sharing and fun).

In our city several companies claim to use pair programming in some projects. These include Sabre (which is Agile-based, so nothing surprising) and... Motorola! AFAIK it is still a very, very rare practice.

Sunday, February 24, 2008

Deep equality in modern languages

I was recently writing a simple Extract-Transform-Load tool in Ruby for importing a data set from an XML file into our database. At the beginning it seemed a very easy, although painful, task. The first problem I was trying to solve was testing. Nothing interesting, I thought. And I was almost right. Almost...

The testing algorithm was very simple:

1. expected = predefined data set (so called test fixture)
2. import data from XML file to DB
3. actual = load imported data from DB
4. assert_equal expected, actual, "all data should be equal"

The problem was in assert_equal, because the "default" equality was not what I wanted it to be. Default equality is shallow, i.e. it compares only the objects themselves, without their dependencies. I needed deep equality.

Example 1

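author_a = ['Gauss', 'C']
title_a = ['Some title', author_a]

In this example the title and the author are in a "has a" relation: title_a has author_a. (The arrays are only shorthand for model objects here; real Ruby arrays already compare element by element.) This relation causes the "standard" equality of such objects to not work as expected. Let, for example,

author_b = ['Gauss', 'C']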

Authors a and b are equal but have different identities (references), so if we define

title_b = ['Some title', author_b]

we'll get title_a not equal to title_b. To obtain the expected equality result I must redefine the default equality operator. So far, so obvious...
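A minimal sketch of what such a hand-written redefinition can look like in Ruby. The Author and Title classes and their attribute names are hypothetical, invented only for this illustration; they are not the data model of the ETL tool.

```ruby
# Hypothetical model classes; the names Author and Title are illustrative only.
class Author
  attr_reader :surname, :initial

  def initialize(surname, initial)
    @surname = surname
    @initial = initial
  end
end

class Title
  attr_reader :name, :author

  def initialize(name, author)
    @name = name
    @author = author
  end
end

author_a = Author.new('Gauss', 'C')
author_b = Author.new('Gauss', 'C')

# Object#== compares identity by default, so equal-looking objects differ.
puts author_a == author_b   # false

# Redefining == by hand, class by class.
class Author
  def ==(other)
    other.is_a?(Author) && surname == other.surname && initial == other.initial
  end
end

class Title
  def ==(other)
    # author == other.author delegates to Author#==, following the "has a" relation.
    other.is_a?(Title) && name == other.name && author == other.author
  end
end

title_a = Title.new('Some title', author_a)
title_b = Title.new('Some title', author_b)
puts title_a == title_b     # true
```

Every class needs its own == here, and every new field has to be added to it by hand.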

The problem was that I had to redefine this operator in all data model classes. An ordinary thing, I thought. I had always done it this way. It is very simple: I have to choose which "fields" to compare directly and which "through references". Very simple. And stupid. And error prone. And time consuming. And very hard to maintain. Why the hell do the languages I work with not provide such obvious functionality!? Maybe I missed something? Maybe everyone knows how to do this very easily, except me?

I did a little research and it turned out that such functionality is implemented in... Eiffel, but only partially [1]. Java, C++, Ruby and even Smalltalk don't have such a language feature.

Maybe it is hard? I thought. My first idea was to compare two directed graphs created from object instances and (some*) references between them. The problem is well known; it is called graph isomorphism. It quickly turned out that no polynomial-time algorithm is known for the graph isomorphism (GI) problem. Curiously, it is believed that GI is neither in P nor NP-complete... There is a special complexity class, called GI, for this problem and all problems with a polynomial-time Turing reduction to it. It is believed, although obviously (P=NP?) not proven, that in terms of difficulty:

P < GI < NP-complete

Anyway, it seemed that although a solution is possible, it may not be efficient...

* - more about this in my next post

My second idea was that, compared to the GI problem, we have additional information: the "root" vertices we start the comparison from. Thus the solution may be more efficient than for the GI problem. After some googling I found I was right. I found a very interesting paper [2] which, besides taking a formal point of view on deep equality, defines a polynomial algorithm for the problem, even in the case of circular references! Additionally, the algorithm looks quite simple to implement.
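To make the "start from the roots" idea concrete, here is a rough sketch of a rooted comparison that tolerates circular references. It is not the algorithm from [2]; deep_equal? and the Node class are hypothetical names, and the sketch assumes that all instance variables take part in the comparison and that objects without instance variables can be compared with their own ==.

```ruby
require 'set'

# Hypothetical helper: compares two object graphs starting from their roots.
def deep_equal?(a, b, seen = Set.new)
  return true if a.equal?(b)
  return false unless a.class == b.class

  # A pair that is already being compared is assumed equal. Any real mismatch
  # found elsewhere still makes the whole comparison return false.
  pair = [a.object_id, b.object_id]
  return true if seen.include?(pair)
  seen << pair

  if a.is_a?(Array)
    a.size == b.size && a.zip(b).all? { |x, y| deep_equal?(x, y, seen) }
  elsif a.instance_variables.empty?
    a == b # leaf values such as String, Integer, Symbol
  else
    vars = a.instance_variables
    vars.sort == b.instance_variables.sort &&
      vars.all? do |v|
        deep_equal?(a.instance_variable_get(v), b.instance_variable_get(v), seen)
      end
  end
end

# Hypothetical class used only to build a circular structure.
class Node
  attr_accessor :value, :next_node

  def initialize(value)
    @value = value
  end
end

n1 = Node.new(1)
n2 = Node.new(2)
n1.next_node = n2
n2.next_node = n1 # n1 -> n2 -> n1

m1 = Node.new(1)
m2 = Node.new(2)
m1.next_node = m2
m2.next_node = m1 # m1 -> m2 -> m1

puts deep_equal?(n1, m1) # true
m2.value = 3
puts deep_equal?(n1, m1) # false
```

Each pair of objects is visited at most once, so the number of comparisons is bounded by the product of the two graph sizes.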

My third idea was that the solution may be even simpler if we assume no circular references. I don't know if this is true, but I would like to check it, because my...

Fourth and final idea was to implement a deep equality solution for Ruby. But before I do this I will write the next post, entitled "Deep equality and Aggregation", describing among other things how to declare which relations should be taken into consideration by the deep_equal operation.

References:

1. Copying and Comparing: Problems and Solutions, P. Grogono & M. Sakkinen, 2000
2. Deep Equality Revisited, S. Abiteboul & J. Van den Bussche, 1995