* rename OperatorEquals -> Assign * move Util.h and FwDecl.h to a util/ subdir
- make use of CoeffReadCost to determine when to unroll the loops, for now only in Product.h and in OperatorEquals.h performance remains the same: generally still not as good as before the big changes.
internaly uses OpenMP if enabled at compile time. * added a bench/ folder with a couple benchmarks and benchmark tools.