eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-15 17:41:30 +08:00

Author	SHA1	Message	Date
ChipKerchner	13d7658c5d	Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl). (cherry picked from commit 413bc491f1721afdb9802553b13a5b7aba67ed3b)	2021-08-10 20:40:54 +00:00
Gauri Deshpande	93bff85a42	remove denormal flushing in fp32tobf16 for avx & avx512 (cherry picked from commit e6a5a594a7f3cbe2f9843d4ef57a10d478cbb818)	2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen	4e0357c6dd	Avoid memory allocation in tridiagonalization_inplace_selector::run. (cherry picked from commit a5a7faeb455efd7f6edb1138eda2e37546039b7d)	2021-08-06 21:48:00 +00:00
Antonio Sanchez	5b83d3c4bc	Make inverse 3x3 faster and avoid gcc bug. There seems to be a gcc 4.7 bug that incorrectly flags the current 3x3 inverse as using uninitialized memory. I'm pretty sure it's a false positive, but it's hard to trigger. The same warning does not trigger with clang or later compiler versions. In trying to find a work-around, this implementation turns out to be faster anyways for static-sized matrices. ``` name old cpu/op new cpu/op delta BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96) BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96) BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112) BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111) BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98) BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98) BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114) BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116) ``` (cherry picked from commit 5ad8b9bfe2bf75620bc89467c5cc051fc2a597df)	2021-08-04 22:06:52 +00:00
Antonio Sanchez	237c59a2aa	Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset. The `memset` function and bitwise manipulation only apply to POD types that do not require initialization, otherwise resulting in UB. We currently violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and bitwise operations are applied byte-by-byte in the generic implementations. This is causing issues for scalar types that do require initialization or that contain non-POD info such as pointers (#2201). We either break them, or force specializations of these functions for custom scalars, even if they are not vectorized. Here we modify these functions for scalars only - instead using only scalar operations: - `pzero`: `Scalar(0)` for all scalars. - `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars. - `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars - `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise. For non-scalar types, the original implementations are used to maintain compatibility and minimize the number of changes. Fixes #2201. (cherry picked from commit 3d98a6ef5ce0ba85acaee4ffffc53f0f21bd8fd2)	2021-08-03 16:32:59 +00:00
Antonio Sanchez	3dc42eeaec	Enable equality comparisons on GPU. Since `std::equal_to::operator()` is not a device function, it fails on GPU. On my device, I seem to get a silent crash in the kernel (no reported error, but the kernel does not complete). Replacing this with a portable version enables comparisons on device. Addresses #2292 - would need to be cherry-picked. The 3.3 branch also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get fully working. (cherry picked from commit 7880f10526a11dc5544426c54c5763de576bf285)	2021-08-03 16:15:44 +00:00
hyunggi-sv	7adc1545b4	fix:typo in dox (has->have) (cherry picked from commit 02a0e79c701da7aa8dfad79b13cd1e7fae46d634)	2021-08-03 00:54:41 +00:00
Antonio Sanchez	c0c7b695cd	Fix assignment operator issue for latest MSVC+NVCC. Details are scattered across #920, #1000, #1324, #2291. Summary: some MSVC versions have a bug that requires omitting explicit `operator=` definitions (leads to duplicate definition errors), and some MSVC versions require adding explicit `operator=` definitions (otherwise implicitly deleted errors). This mess tries to cover all the cases encountered. Fixes #2291. (cherry picked from commit 9816fe59b47dc4c07967b5ee93a8e8aaa6e9c308)	2021-08-03 00:52:21 +00:00
arthurfeeney	9c90d5d832	Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector. (cherry picked from commit a77638387dd1aa2d07d2dae240cc30b303b4ef38)	2021-07-22 18:01:55 +00:00
Antonio Sanchez	5d37114fc0	Fix explicit default cache size typo. (cherry picked from commit 297f0f563d916260665d7fadc017f94f1a5e7a03)	2021-07-20 18:42:25 +00:00
Rohit Santhanam	930696fc53	Enable extract et. al. for HIP GPU. (cherry picked from commit beea14a18f76817439b4d8901d29db2e9c4a24c8)	2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen	56966fd2e6	Defer to std::fill_n when filling a dense object with a constant value. (cherry picked from commit 0c361c4899c9042d2b25cd60d7826ab464caacb7)	2021-07-09 03:59:56 +00:00
Guoqiang QI	69ec4907da	Make a copy of the input matrix when computing the inverse in place. This fixes #2285. (cherry picked from commit 4bcd42c271761dc5341f8e08ca7d357c3614cb01)	2021-07-08 17:07:54 +00:00
Rasmus Munk Larsen	05bab8139a	Fix breakage of conj_helper in conjunction with custom types introduced in !537. (cherry picked from commit 7b35638ddb99a0298c5d3450de506a8e8e0203d3)	2021-07-02 20:59:50 +00:00
Chip Kerchner	eebde572d9	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow (cherry picked from commit 91e99ec1e02100d07e35a7abb1b5c76707237219)	2021-07-01 23:32:38 +00:00
Antonio Sanchez	8190739f12	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter. (cherry picked from commit 6035da5283f12f7e6a49cda0c21696c8e5a115b7)	2021-07-01 23:18:10 +00:00
Antonio Sanchez	b6db013435	Fix inverse nullptr/asan errors for LU. For empty or single-column matrices, the current `PartialPivLU` dereferences a `nullptr` or accesses memory out-of-bounds. Here we adjust the checks to avoid this. (cherry picked from commit 154f00e9eacaec5667215784c7601b55024e2f61)	2021-07-01 22:57:25 +00:00
Dan Miller	1f6b1c1a1f	Fix duplicate definitions on Mac (cherry picked from commit eb047759030558acf0764d5d2f913f4f84cf85a8)	2021-07-01 20:49:05 +00:00
Alexander Karatarakis	517294d6e1	Make DenseStorage<> trivially_copyable (cherry picked from commit 60400334a92268272c6bf525da89eec5e99c3e5a)	2021-07-01 20:48:47 +00:00
大河メタル	94e2250b36	Correct declarations for aarch64-pc-windows-msvc (cherry picked from commit c81da59a252b3479753b2eada26ee0cf46280bd0)	2021-06-30 04:10:04 +00:00
Rasmus Munk Larsen	380d0e4916	Get rid of redundant `pabs` instruction in complex square root. (cherry picked from commit 5aebbe9098f53f01c99eed67b52725397e955280)	2021-06-29 23:27:09 +00:00
Rohit Santhanam	e83af2cc24	Commit 52a5f982 broke conjhelper functionality for HIP GPUs. This commit addresses this. (cherry picked from commit 2d132d17365ffc84c0cc7a7da9b8f7090e94b476)	2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen	413ff2b531	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. (cherry picked from commit bffd267d176410a517a0fe9afa6dde99c213c08a)	2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen	a235ddef39	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. (cherry picked from commit 52a5f9821235e5a9f7e9b3e0198d45d42a1cb267)	2021-06-24 23:30:42 +00:00
Antonio Sanchez	c2c0f6f64b	Fix fix<> for gcc-4.9.3. There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes #2267 (cherry picked from commit 35a367d557078462a0793c88c44dcad64fc63698)	2021-06-21 17:26:07 +00:00
Antonio Sanchez	ee4e099aa2	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps. (cherry picked from commit 12e8d57108c50d8a63605c6eb0144c838c128337)	2021-06-17 17:11:08 +00:00
Chip-Kerchner	9fc93ce31a	EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. (cherry picked from commit ef1fd341a895fda883f655102f371fa8b41f2088)	2021-06-16 22:14:17 +00:00
Antonio Sanchez	1374f49f28	Add missing ppc pcmp_lt_or_nan<Packet8bf> (cherry picked from commit 9e94c5957000c38a6553552c96a7a27b1fc2860d)	2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen	47722a66f2	Fix more enum arithmetic. (cherry picked from commit 13fb5ab92c3226f7b9be20882b0418d53516d35a)	2021-06-15 16:40:35 +00:00
Antonio Sanchez	5e75331b9f	Fix checking of version number for mingw. MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this. (cherry picked from commit ad82d20cf649ba8c07352f947fd25766d0328df2)	2021-06-12 00:02:26 +00:00
Rasmus Munk Larsen	1cb1ffd5b2	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled. (cherry picked from commit fc87e2cbaa65e7e93a2c695ce5a9dc048a64a985)	2021-06-11 02:57:02 +00:00
Rasmus Munk Larsen	4b502a7215	Fix c++20 warnings about using enums in arithmetic expressions. (cherry picked from commit f64b2954c711b7846ae6ae228c5f14bd8dd56ec4)	2021-06-11 02:35:19 +00:00
Cyril Kaiser	573570b6c9	Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor. (cherry picked from commit 91cd67f057f90101cf858d63916ee56a58511b0d)	2021-05-26 19:45:25 +00:00
Antonio Sanchez	98cf1e076f	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`. (cherry picked from commit dba753a986b527a17c8cc62474d0487aec7c2b36)	2021-05-25 19:09:50 +00:00
Antonio Sanchez	ee2a8f7139	Modify Unary/Binary/TernaryOp evaluators to work for non-class types. This used to work for non-class types (e.g. raw function pointers) in Eigen 3.3. This was changed in commit 11f55b29 to optimize the evaluator: > `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization. though I cannot reproduce the 16 byte result. Both before the change and after, with multiple compilers/versions, I always get a result of 40 bytes. https://godbolt.org/z/MsjTc1PGe This change modifies the code slightly to allow non-class types. The final generated code is identical, and the expression remains 40 bytes for the `abs2` sample case. Fixes #2251 (cherry picked from commit ebb300d0b4340104dcade3afa656a57da2b7660c)	2021-05-25 18:19:53 +00:00
Steve Bronder	4fbd01cd4b	Adds macro for checking if C++14 variable templates are supported (cherry picked from commit 17200570239f23b2f0d3b434bc0269c46c409791)	2021-05-21 16:43:30 +00:00
Niall Murphy	a883a8797c	Use derived object type in conservative_resize_like_impl. When calling conservativeResize() on a matrix with the DontAlign flag, the temporary variable used to perform the resize should have the same Options as the original matrix to ensure that the correct override of swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> & other)). Calling the base class swap (i.e. in DenseBase) results in assertion errors or memory corruption. (cherry picked from commit 391094c50743f28f9174f455661f650bf07e0177)	2021-05-20 23:43:57 +00:00
guoqiangqi	2f908f8255	Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242. (cherry picked from commit 3d9051ea84a5089b277c88dac456b3b1576bfa7f)	2021-05-12 17:02:19 +00:00
Nathan Luehr	d1825cbb68	Device implementation of log for std::complex types. (cherry picked from commit 7e6a1c129c201db4eff46f4dd68acdc7e935eaf2)	2021-05-11 22:31:53 +00:00
Nathan Luehr	d9288f078d	Fix ambiguity due to argument dependent lookup. (cherry picked from commit 6753f0f197e7b8a8019e82e7b144ac0281d6a7f1)	2021-05-11 22:00:36 +00:00
Rohit Santhanam	85ebd6aff8	Fix for issue where numext::imag and numext::real are used before they are defined. (cherry picked from commit 39ec31c0adbdde6b8cda36b3415e9cc2af20dab6)	2021-05-10 20:14:10 +00:00
Antonio Sanchez	2947c0cc84	Restore ABI compatibility for conj with 3.3, fix conflict with boost. The boost library unfortunately specializes `conj` for various types and assumes the original two-template-parameter version. This change restores the second parameter. This also restores ABI compatibility. The specialization for `std::complex` is because `std::conj` is not a device function. For custom complex scalar types, users should provide their own `conj` implementation. We may consider removing the unnecessary second parameter in the future - but this will require modifying boost as well. Fixes #2112. (cherry picked from commit c0eb5f89a406243f71eae0b705eba4437d9f8565)	2021-05-07 18:38:23 +00:00
Antonio Sanchez	42acbd5700	Fix numext::arg return type. The cxx11 path for `numext::arg` incorrectly returned the complex type instead of the real type, leading to compile errors. Fixed this and added tests. Related to !477, which uncovered the issue. (cherry picked from commit 90e9a33e1ce3e4e7663dd67e6c1f225afaf5c206)	2021-05-07 17:52:07 +00:00
Christoph Hertzberg	9e0dc8f09b	Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242 (cherry picked from commit 722ca0b665666f3af579002ad752541d7319d1b6)	2021-05-07 16:23:03 +00:00
Antonio Sanchez	fc2cc10842	Better CUDA complex division. The original produced NaNs when dividing 0/b for subnormal b. The `complex_divide_stable` was changed to use the more common Smith's algorithm. (cherry picked from commit 1c013be2cc6a999268be2f25575cd6a07bd52c45)	2021-04-29 17:58:45 +00:00
Antonio Sanchez	a33855f6ee	Add missing pcmp_lt_or_nan for NEON Packet4bf. (cherry picked from commit 172db7bfc32def5ed0f885287e352b63dd5cd767)	2021-04-27 21:15:08 +00:00
Jakub Lichman	ac3c5aad31	Tests added and AVX512 bug fixed for pcmp_lt_or_nan (cherry picked from commit d87648a6bea315645b893c3815ca8c6bb00ec5d2)	2021-04-26 18:07:55 +00:00
Antonio Sanchez	8830d66c02	DenseStorage safely copy/swap. Fixes #2229. For dynamic matrices with fixed-sized storage, only copy/swap elements that have been set. Otherwise, this leads to inefficient copying, and potential UB for non-initialized elements. (cherry picked from commit d213a0bcea2344aa3f6c9856da9f5b2a26ccec25)	2021-04-22 21:05:50 +00:00
Rasmus Munk Larsen	54425a39b2	Make vectorized compute_inverse_size4 compile with AVX. (cherry picked from commit 85a76a16ea835fcfa7d4c185a338ae2aef9a272a)	2021-04-22 17:25:25 +00:00
Jakub Lichman	42a8bdd4d7	HasExp added for AVX512 Packet8d (cherry picked from commit 2b1dfd1ba0638e57a50d2f401412e0893064c354)	2021-04-21 12:09:21 +02:00
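
Commit fc2cc10842 above says `complex_divide_stable` was changed to use Smith's algorithm because the original produced NaNs when dividing 0/b for subnormal b. The following is a minimal standalone sketch of Smith's algorithm, not the code from that commit: `smith_divide` and `smith_divide_selfcheck` are hypothetical names chosen for illustration, and the sketch targets plain host C++ rather than CUDA. The idea is to scale by the ratio of the smaller to the larger denominator component, so that `br*br + bi*bi` (which can underflow to zero) is never formed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <limits>

// Hypothetical helper illustrating Smith's algorithm for a / b.
// It is a sketch, not Eigen's complex_divide_stable.
template <typename T>
std::complex<T> smith_divide(const std::complex<T>& a, const std::complex<T>& b) {
  const T ar = a.real(), ai = a.imag();
  const T br = b.real(), bi = b.imag();
  if (std::abs(br) >= std::abs(bi)) {
    // |br| dominates: divide numerator and denominator by br.
    const T r = bi / br;      // |r| <= 1
    const T d = br + bi * r;  // equals (br^2 + bi^2) / br
    return std::complex<T>((ar + ai * r) / d, (ai - ar * r) / d);
  }
  // |bi| dominates: divide numerator and denominator by bi.
  const T r = br / bi;      // |r| < 1
  const T d = br * r + bi;  // equals (br^2 + bi^2) / bi
  return std::complex<T>((ar * r + ai) / d, (ai * r - ar) / d);
}

// Hypothetical self-check: 0 divided by a subnormal value must not be NaN.
inline void smith_divide_selfcheck() {
  const double tiny = std::numeric_limits<double>::denorm_min();
  const std::complex<double> z =
      smith_divide(std::complex<double>(0.0, 0.0), std::complex<double>(tiny, tiny));
  assert(!std::isnan(z.real()) && !std::isnan(z.imag()));
}
```

In the textbook formula, the squared magnitude of a subnormal denominator underflows to zero, so 0/b evaluates as 0/0. Here the denominator `d` stays on the order of the larger component of b, which is nonzero whenever b is.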

6473 Commits