Thursday, July 03, 2008

Selector Function Object (Part 2)

This post is a long time waiting, and surprisingly the TR1/Boost.Fusion supporting implementation basically wraps whatever Joel De Guzman has already implemented in Fusion (more precisely, making a function object that wraps a call to at_c<...>). Someone has informed me that the Boost.Fusion selectors now have function object equivalents, so the problem being addressed by this implementation is somewhat already solved.

Now I'm really looking forward to seeing more of the functional programming approach trickling more into C++. Looking at functional programming languages like Haskell, Lisp, and Erlang, I'm slowly getting more and more convinced that functional programming will play a major role in the future of programming -- and that what we have right now are barely scratching the surface of the scale of problems we're able to solve using functional programming, high performance parallel computing, and idioms translated into libraries/frameworks in existing languages.

If you're still interested in the implementation I have, please send me an email so that I can share the implementation with you.

Monday, April 21, 2008

Selector Function Object

Over the downtime this past week I've been dabbling again at writing a generic enough function object that I plan to bring into the Boost C++ Library collection. I found out that programming every day with C++, I still find some relatively simple things really hard to do even with the STL. One of these simple(r) problems is selecting the first or second element of a pair contained in STL containers.

The problem that recurs pretty much in every program that uses STL maps/multi-maps is to be able to first select the first or second element from every pair in the container, and the second to be able to apply a function only on the first or second element to every pair in the container. There's already an SGI extension to the STL called select1st and select2nd but these extensions require you to give the type of the pair it will take upon instantiation. This isn't very generic and tends to be a very rigid coupling between the solution that is selecting the element and the type(s) of the elements contained in a pair.

Rudimentary Solution

The first solution I came up with is a simple select object. This first solution is really simple -- it's a function object which determines the result of getting the first/second element of a pair, and then takes a pair as an argument to return the correct element chosen. It looks a little like this:



namespace helper {
template <int N, class T0, class T1>
struct result;

template <class T0, class T1>
struct result<0, T0, T1> {
typedef T0 type;
};

template <class T0, class T1>
struct result<1, T0, T1> {
typedef T1 type;
};
};

template <int N>
struct select;

template <>
struct select<0> {

template <class T0, class T1>
typename helper::result<0, T0, T1>::type
operator() (std::pair<T0, T1> const & input)
const {
return input.first;
};
};

template <>
struct select<1> {
template <class T0, class T1>
typename helper::result<1, T0, T1>::type
operator() (std::pair<T0, T1> const & input)
const {
return input.second;
};
};



Whew! That was a handful. So if you notice, there are a few problems with this approach, and let me go into detail into some of them.


  • Repetitive code with full specialization of select. It is obvious that the repetitive nature of the code makes it really brittle and "ugly". However, technically we've captured the required functionality if only for the simpler cases.

  • Reference and const awareness in the result type computer. Advanced C++ developers will see that the implementation of the result computer will break in cases where the pair contains references or non-copyable types. This is because we return copies of the element instead of references -- in certain cases as well, this is a performance problem especially if we're copying huge objects around.

  • We can only select elements, not immediately perform operations on them. This is because this implementation pretty much makes it impossible to chain this with the other STL function objects (binder comes to mind, among others).

  • Reliance on STL pair's. Obviously, if we used TR1 or Boost (Fusion) Tuples, we wouldn't be able to extend this readily -- unless we're prepared to write out all the specializations of the select function object from select<3> upwards.[0]



Let me then try to address these issues in the next sections.

Take 2: Remove Explicit Repitition

Our first bullet item talks about repetitive code, and is one of the relatively easier things we can deal with. In this approach, we will require three things: a wrapper type to return a convertible object (convertible to the correct type), a generator function which will determine the types for us at compile time, and a "generalized" element getter. So in this step, we add another layer of indirection we can deal with.

The implementation would then include the following additions/changes:


namespace helper {
template <int N, class T0, class T1>
struct get_impl;

template <class T0, class T1>
struct get_impl<0, T0, T1> {
typedef typename
result<0, T0, T1>::type
result_type;

explicit get_impl(
std::pair<T0, T1> const & pair
) : _pair(pair) { };

operator result_type const & () {
return _pair.first;
};

private:
std::pair<T0, T1> const &
_pair;
};

template <class T0, class T1>
struct get_impl<1, T0, T1> {
typedef typename
result<1, T0, T1>::type
result_type;

explicit get_impl(
std::pair<T0, T1> const & pair
) : _pair(pair) { };

operator result_type const & () {
return _pair.second;
};

private:
std::pair<T0, T1> const &
_pair;
};

// generator function
template <int N, class T0, class T1>
get_impl<N, T0, T1>
get(std::pair<T0, T1> const & pair) {
return get_impl<N, T0, T1>(pair);
};
};

template <int N>
struct select {
template <class T0, class T1>
typename result<N, T0, T1>::type const &
operator() (std::pair<T0, T1> const & pair)
const {
return get<N>(pair);
};
};


The above looks like more code and it is, but it actually allows us to scale our implementation of the act of getting the 'Nth' element [1]. But let me go through the approach we've taken here.

If you notice, helper::get_impl<...> is convertible to result_type const &. This means when we return an object of type helper::get_impl<0, int, double>, it can be converted to an int that is encapsulated in the pair passed in the constructor of the wrapper object. This approach allows us to make the implementation of template <int N> struct select { ... } simpler.

The next problem we need to deal with is the issue of reference and const-awareness.

Take 3: Handling References and Const Awareness

So now we'll have problems if we had pairs that had references included instead of just plain types. It is not impossible to have std::pair's passed to instances of the selector -- even though highly unlikely to be part of containers. So we'll deal with references, so that we have a proper solution.

To deal with this, we need to check if the type we're going to return in helper::result<...>::type will be the correct reference. So we need to remove the references from T0 or T1, in the typedef. We can do this with Boost.Type_Traits:


namespace helper {
template <class T0, class T1>
struct result<0, T0, T1> {
typedef typename
boost::remove_reference<T0>::type
type;
};

template <class T0, class T1>
struct result<1, T0, T1> {
typedef typename
boost::remove_reference<T1>::type
type;
};
};


So it's getting more and more complex as we go, and admittedly the solution to this seemingly simple problem touches some of the subtleties of the genericity we achieve with C++ and the STL.

To Be Continued...

In the next article in this line, I will dive a little deeper into the problems we're trying to avoid and real-life use cases for this simple utility. For now, you can imagine a simple solution involving an STL algorithm and the select<#>. You can even try it for yourself by getting the latest version of this implementation from the Boost C++ Developers Mailing List (or if you'd like to get it privately, send me an email).

Basically, the idea I want to be able to convey is that in C++, with great power comes great flexibility -- and this flexibility borders between being really generic and being very useful. What this flexibility and power gives us C++ developers are the tools we can use and manipulate to be able to make simple things really simple -- even though implemented in a not so simple manner.

If you haven't been diving deeper into C++'s features to make your solutions more generic and more usable outside the context of your application, this may be a good time to be able to solve bigger problems that not only you are encountering, but also what others are solving. You never know what simple problems you'd run into and what kinds of challenges you'll be facing. Even in this simple case of implementing a generic selector, we're already running into issues you might not be facing in your everyday programming -- but it allows us to get more acquainted with the very powerful C++ features already available to you at your disposal.

Notes & References

[0] - Surprisingly enough, the solution underneath any tuple supporting approach pre-C++0x will require having to use Preprocessor Programming and to some extent code generation/repitition, albeit in a "managed" manner.

[1] - I'm still currently testing the implementation of the extension to supporting TR1/Boost.Fusion tuple selection. The next article should follow soon with more details as to what's involved in making that implementaiton possible.

Thursday, April 17, 2008

Lisp'isms in C++

In the previous post, someone commented about the lack of exposition on how my re-learning Lisp has directly affected the C++ I write. It is true that functional programming per se is not a Lisp only domain, and it would be unfair to attribute domain decomposition and bottom up design to learning Lisp. So in this post I then expound on the "Lisp'isms" in the C++ code I write on a daily basis now.

One of the things that Lisp has are macros that not every other language out there has. Admittedly, in C++ there's nothing close to that -- until the advent of template metaprogramming. So let me try and do justice (with the very limited Lisp knowledge I still have about macro's) and give an example.

Generating Function Objects

In Lisp (Common Lisp, for those language lawyers out there) you have the capability to define macros that take an argument (or set of arguments in the form of a list or lists) and return generated code. This is built into the language and is nothing really new in the Lisp world.

There's nothing close to this in C++, but you can emulate this with template metaprogramming. The trick is to use the available libraries to make generating code at compile time simpler. There's the Boost.MPL and Boost.Fusion libraries that allow you to exploit C++'s template system to generate types and eventually function objects you can use throughout your code.

An example of these applications can be seen in Boost.Proto where you're able to implement your domain specific embedded language using, well, a domain specific embedded language. You can call it a meta domain specific embedded language hosted in C++. Essentially, like in Lisp where you use Lisp to extend Lisp, Boost.Proto allows you to extend C++ using -- well, C++. This post is too short and limited to discuss the details of Boost.Proto in depth (which is something that might be worth doing some other time). [0]

That being said though, you'll still need to create the new DSEL within the rules of C++ -- unlike in Lisp where your macro's can take in even illegal Lisp forms.

Lambda

In another previous post I wrote a little bit about Lambda, and how it will change C++. Well actually, we don't need to wait for C++0x to be able to approximate Lambda support in C++. All we'd need to do is right now use existing libraries like Boost.Lambda, Boost.Bind, and Boost.Phoenix (in Boost.Spirit) to create *almost* nameless function objects.

The idea behind all the above libraries is to leverage C++'s template system to compose functions out of already defined primitive function object types. Using expression templates, these function objects are essentially created at compile time and converted to machine code by the compiler. These function objects are usually -- but not exclusively -- used in conjunction with the STL's algorithms that take function objects applied to a range of iterators.

In Lisp, lambda is the way of creating functions -- nameless functions that is -- and can be bound to symbols which can then in turn be referred to to refer to functions. In C++, we (almost) have this capability by using function object wrappers -- we have the TR1 function object wrapper, or Boost.Function which is an implementation of the same TR1 function object wrapper only in Boost.

CLOS and Boost.Fusion's Struct Adapter

In Lisp, you have the Common Lisp Object System, where you were able to essentially bring in Object Oriented design to influence the implementation of certain concepts in the solution you're writing. Until recently, there was no easy way in C++ to be able to create structs or types that you can inspect even at compile time. I say until recently, but because now you can do this with Boost.Fusion's Struct Adapter.

The CLOS is a collection of macros (in the Lisp sense) and functions that allows you to represent classes in Lisp. Boost.Fusion's Struct Adapter is a set of preprocessor macro's (in the C++ sense) that allows you to generate a type that looks and feels like a normal C++ struct when being used, but in reality is a Boost.Fusion sequence. This is a very important development because this means we can now start writing generic functions that deals with Fusion sequences at compile time *and* runtime.

Not only that, but now that means we can leverage other CLOS goodness available in Lisp in C++ as well. This I'll have to write about in a later post.

Multimethods

In Lisp, since functions are generic and are external to objects and dispatched through pattern/type matching, you get for free the efficient implementation of the multimethod solution. This post is too limited to discuss the multimethod problem in object orientation, so I won't go into it in detail. [1]

Essentially, with Boost.Fusion's Struct Adapter, we can now perform some magic by following "the Lisp way" of making free functions that dispatch on matching two or more types of the parameters. Here's an example implementation using first, pure C++ templates:

template <class T1, class T2>
static void draw(T1 const &, T2 const &);

template <>
static void draw<rectangle,absolute_canvas>(rectangle const & object, absolute_canvas const & canvas) { ... };

template <>
static void draw<circle, absolute_canvas>(circle const & object, absolute_canvas const & canvas) { ... };

There are a number of problems with the above approach, which involves ambiguities and the combinatorial explosion of the number of implementations. There is a cleaner way of doing this with template metaprogramming -- but I will have to write another post about that some day soon.

Summary

So how does knowing Lisp change how you think and program? At least programming in C++, you get introduced to concepts like functional programming and template metaprogramming that are very powerful enablers for making bottom-up design and top-down decomposition a reality. It doesn't stop with the concepts though, you get to reflect the concepts in code that directly translates to solutions whatever the particular domain you're working in.

Yes, C++ will never be a functional programming language. But just like Lisp, it supports multiple programming paradigms -- object oriented, imperative, functional, generic, and sometimes declarative. The parallels exist and there is a case to be made to put C++ and Lisp in the same category as definitely extensible and very powerful programming language.

I agree with the great Lisp writers and developers out there -- if only to change the way you think about programming, you should learn Lisp. But as with any performance critical project, nothing beats machine code -- and the best solution I personally will still choose C++ when developing high performance and highly scalable systems overy any other language out there.

But pound for pound, I'll take Lisp for anything remotely complex for quick prototyping and deployment. For performance criticial stuff (like the projects I'm working with), I will choose C++ in a heartbeat.

References

[0] - Boost.Proto documentation
http://boost-sandbox.sourceforge.net/libs/proto/doc/html/index.html

[1] - Multiple Dispatch or Multimethods
http://en.wikipedia.org/wiki/Multiple_dispatch

Tuesday, April 15, 2008

Thinking Functional

I think I now know why LISP will change the way you think. There is a certain appeal to "simplicity" when you think in functional terms -- where you have functions applied to data to produce data, as opposed to thinking in terms of operations per se. This breaks the 'assembly' way of thinking where you look at the details of how a certain operation affects the state of the whole system. Let me cite a few examples of concrete events while programming where I find the functional programming paradigm has crept into my psyche for solving problems.

Remember the 'for' loop? It seemed like the most powerful thing that was added to higher level programming languages at a time when assembly programming was the only thing available. It seemed so cool to be able to iterate through elements of a collection or a range of values with a simple construct. 'While' had its appeal, but it's too close to the 'LOOP' instruction avilable in assembly language. What seemed so cool then now seems so lacking when you think in terms of functional programming.

The 'for' loop was fine until you didn't know the range from which to iterate over. In C++, sure you can use iterators and use these for checking the ranges -- and if you've tried this before, you'll find out that it gets really unweildy when you do it by hand. And then there's the issue of not being able to filter "automagically", but rather you'd have to put in an (ugly) "if" inside the 'for' loop to filter only on values that you want to operate on.

So the STL fortunately provides us with facilities for functional/generic programming. One facility is 'for_each', which applies a function (object) on each element of the given range. This is an improvement of the the bare 'for' loop, but is still a little lacking -- we are not able to "automatigically" store the result of the function into another container/destination. Plus, it also doesn't allow us to filter "automagically" elements from the range.

A variation on 'for_each' is the 'transform' algorithm which 1) applies a function (object) on each element of the given range and 2) stores the result of the function into another container/destination using an output iterator. We're getting somewhere here, we already have a mechanism for which we're able to apply a function over a range and get the results stored through an output iterator. However, we still don't have a way to "automagically" filter elements from the range.

In the Boost C++ Library, we then have the 'filter_iterator' concept -- a proposed new style iterator which will create an iterator that referred to elements that satisfy a given filtering function (object). Then this allows us to define a filtering function, and create a range from the normal iterators we can get from STL containers which "automagically" filters for us.

Notice what we've done in that given exercise: 1) we defined a separate filtering function 2) we defined an operation to be applied to data and 3) we composed a higher level application from building blocks from the STL. If we thought in terms of objects, we'd have a completely different (and with my bias, uglier) solution.

In the above exercise we were thinking in terms of functions being applied to data, to generate the data we want to end up with. We can then layer each function application on top of each other to come up with a higher level transformation of the data from one form to serveral different forms. The process of finding the functional elements of a higher level operation is usually called 'decomposition' -- allowing us to decompose a higher level concept into smaller concepts we can refer to as 'primitives'. In mathematics, the primitives are usually drawn as operations which can no longer be broken down to smaller operations (addition, subtraction, identity operation, etc.).

So what will thinking functionally do to your application design? Just as in LISP, when we know already the primitives of the domain from which we are working on, we are then able to create functions that reflect these primitives in (usually) simple functions we can apply on data. Then just like bottom up development, we are able to define other operations based on the primitives we've decomposed and then compose these other operations.

It's really surprising that thinking functionally can lead to simpler solutions and simpler designs by merely taking the time to look at the big picture, define the primitive operations, and then composing other operations from these primitives. It's already worked for me in numerous circumstances -- although admittedly, coming up with the primitives does take a lot of effort to test, design, and implement. Once the primitives have been defined though, writing other operations defined from the primitives becomes a lot easier and solving problems takes less effort.

Now that I've been immersing myself in functional programming thought, I notice myself thinking first before I write that next 'for' loop and first determine if there's a better way using functions and the STL algorithms. More often than not, with the STL algorithms I get a cleaner more robust solution than if I coded it by hand with a bare 'for' loop.

I personally think is a good thing. I'm happy I've been re-acquianted with functional programming and LISP.

NOTE: I have been experimenting with Boost.Spirit's Phoenix framework and parallel computing. I'm currenly interested in writing parallel algorithms that automagically 'packaged' tasks and executed these in thread pools. Maybe there's already something in Intel's TBB but I'm looking at other approaches that incorporated functional programming from the start. For example, a parallel transform, or a parallel reduce. Drop me a line if you're interested.

Monday, April 07, 2008

Why Lambda will Change C++

Of all the new features coming into C++'s next version of the language standard, I think definitely one of the things which will make the single biggest change will be the addition of support for lambda. Before I go into that, let me give some context into why I've gone into thinking about this a little, and why I've chosen to blog about it.

It's a long weekend here in the Philippines and I thought to myself (in between learning Python and polishing my C++ skills) I should take a look at functional programming and immerse myself in the feel of thinking more in terms of functions. So I picked up a college textbook I used to read a lot and turned the pages towards functional programming languages. The first pure functional programming language that I've been able to learn and use to some extent is LISP.

So I tried to be ambitious and use the HTTP client that was available as an open source library. I quickly found out that I don't have enough LISP voodoo to pull that off so I said let me get back to the basics first.

After (re)learning how to use 'mapcar' and revisiting my recursive programming archives, I got back into the groove.

However, one thing I noticed is that working in a functional programming language's environment and context changed the way I looked at problem solving: instead of working at the level of types and classes and design, I was looking at functional solutions -- contexts, function applications. I wasn't thinking about concurrency, storage, efficiency -- what mattered to me was that when I implemented a factorial function, it would be alright to get the factorial of 100 and not worry about efficiency.

It was changing the way I approached problem solving by going at a higher level than I was used to in my day-to-day programming. This allowed me to think of the problem at an abstract level and solve it in that level instead of worrying about the lower level details -- like memory allocation, locality, and concurrency concerns.

Then reality kicks in, and I realize that the problems I'm solving are nowhere near the domain of 'abstract computation'. Although admittedly, the higher level problems I face I would rather solve in a higher level language like LISP/Scheme/Haskell -- but crossing that language barrier means more moving parts and more problems in the long run.

So back to the context of the post: if there was a way to be able to construct C++ programs in a functional manner -- or support functional programming more closely in the language -- then somehow I might just be able to approach the higher level problems I've been tackling in a more familiar language, and also in a familiar way of computing. Thinking in terms of functions instead of strictly types and the interaction between "objects" would greatly abstract the problems and therefore the solutions being created.

One thing that surprised me with my experiments with LISP recently is the ease of defining 'nameless functions'. Of course in C++ there are libraries that allows you to do these -- Boost.Lambda, Boost.Bind, and Phoenix (part of the Boost.Spirit parser framework). The STL algorithms in C++ also allowed you to think in functional programming terms, using terms like 'transform', 'for_each', among others. But there was something missing.

In LISP, you were able to do (mapcar lambda(x) (* x x) '(1 2 3)) and get a list (1 4 9). In C++, unless you used one of the earlier mentioned libraries, you'd have to write your own simple functor which returned the square of a number. In C++, you'd have to worry about types -- for example, would you return a long when you took in an int to account for the possibility of overflow, or a double, or a special type of integer. To be fair about it, I'm not exactly comparing apples to apples. But the point of it is that the above one-liner was practically impossible to turn into a one-liner in C++... until C++0x where you were able to do lambda's.

The promise of lambda support (and new style iterator support) in C++ is the ability to do transform(container, [](auto x)->decltype(x) {return x * x;}, back_inserter(result_container)) -- which transformed each element of container by applying the lambda function [](x)->decltype(x) {return x*x;} and then storing the result into the 'result_container'.

This then allows C++ programmers to think in functional programming terms and think about how functions will be applied to data instead of how to create loops that performed operations in a sequential manner. Consequently, this allows for simple(r) parallelization of programs by the compiler or by supporting facilities (libraries, frameworks, etc.). For example, compilers in the future can choose to determine if the lambda expression are re-entrant and can be parallelized automatically (Intel compilers are already leveraging the Streamed SIMD Extensions in their compilers). Even better would be the use of parallel-aware containers and STL implementations which automatically distribute load among worker threads under the hood (transform might do runtime analysis on the size of the container and automatically map the computations to processing threads and then collect the information in a final collection step).

The possibilities that lambda opens up for C++ are endless -- consider for example being able to expressively and consisely define operations on data to be performed on multiple machines. One day perhaps the functional programming roots of the Google MapReduce system would take on a whole new level -- with open source frameworks in C++ which allowed programmers to define bite-sized chunks of C++ to be the mappers and reducers. One interesting thought is the possibility of packaging these lambda expressions into network-streamable nuggets self-contained to be executed and distributed even on global scale computers (think SETI@home and global scale utility computing).

All this and more can be made possible just by adding better support for functional programming in C++. Couple this with the very powerful object oriented programming support, generic programming support of C++, and efficient very high performance machine code, and you've got yourself a language fit to handle the programming/engineering problems of the future. Suddenly, C++ turns into a must-learn language to be able to leverage the shift from sequential computing and object oriented programming towards parallel computing and functional programming. If there's ever a good time to be learning/using C++, now is definitely the time.

Definitely exciting times ahead.