* comparison (<, <=, ==, !=, ...)
* selection
* nullary ops such as random or constant generation
* misc unary ops such as log(), exp(), or a user defined unaryExpr()
Cleaned up the code a little.
Added the ability to parallelize the evaluation of a tensor expression over multiple cpu cores.
Added the ability to offload the evaluation of a tensor expression to a GPU.
* Added ability to map a region of the memory to a tensor
* Added basic support for unary and binary coefficient wise expressions, such as addition or square root
* Provided an emulation layer to make it possible to compile the code with compilers (such as nvcc) that don't support cxx11.