eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-10-16 10:01:29 +08:00

Author	SHA1	Message	Date
Alexander Grund	929bc0e191	Fix alias violation in BFloat16. reinterpret_cast between unrelated types is undefined behavior and leads to misoptimizations on some platforms. Use the safer (and faster) version via bit_cast. (cherry picked from commit b5eaa4269503f77d0aa58d2f8ed9419e1ba7784d)	2021-09-20 14:25:58 +00:00
Antonio Sanchez	f046e326d9	Fix strict aliasing bug causing product_small failure. Packet loading is skipped due to aliasing violation, leading to nullopt matrix multiplication. Fixes #2327. (cherry picked from commit 3c724c44cff3f9e2e9e35351abff0b5c022b320d)	2021-09-19 18:06:17 +00:00
Antonio Sanchez	3395f4e604	Fix tridiagonalization_inplace_selector. The `Options` of the new `hCoeffs` vector do not necessarily match those of the `MatrixType`, leading to build errors. Having the `CoeffVectorType` be a template parameter relieves this restriction. (cherry picked from commit ebd4b17d2f5ca29a5c16ebd35d54d7aeda587820)	2021-09-08 15:47:39 +00:00
Antonio Sanchez	f03d3e7072	Missing EIGEN_DEVICE_FUNCs to get `gpu_basic` passing with CUDA 9. CUDA 9 seems to require labelling defaulted constructors as `EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are ignored. Without these labels, the `gpu_basic` test fails to compile, with errors about calling `__host__` functions from `__host__ __device__` functions. (cherry picked from commit 998bab4b04f26552b9875acfe113e69c7adccec4)	2021-09-02 03:21:43 +00:00
Antonio Sanchez	07cc362238	Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang. Clang doesn't like !621, needs the "g" constraint back. The "g" constraint also works for GCC >= 5. This fixes our gitlab CI. (cherry picked from commit 3a6296d4f198ffbcccda4303919b3b14d5e54524)	2021-09-01 16:40:08 +00:00
Antonio Sanchez	4ef67cbfb2	GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315). GCC 4.8 doesn't seem to like the `g` register constraint, failing to compile with "error: 'asm' operand requires impossible reload". Tested `r` instead, and that seems to work, even with latest compilers. Also fixed some minor macro issues to eliminate warnings on armv7. Fixes #2315. (cherry picked from commit ff07a8a63945d89301d1b29ac59d170ff9be3955)	2021-08-31 21:23:28 +00:00
Antonio Sanchez	c2b6df6e60	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there's therefore no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which required also updating the `TensorReductionGpu` implementation which previously made assumptions about available packets. (cherry picked from commit cc3573ab4451853774cd5c3497373d5fe8914774)	2021-08-31 21:23:11 +00:00
Adam Kallai	277d369060	win: include intrin header in Windows on ARM. The intrin header is needed for `_BitScanReverse` and `_BitScanReverse64`. (cherry picked from commit 1415817d8daa7fa72ec9b26a6b9d166a1d54626a)	2021-08-31 21:22:37 +00:00
Antonio Sanchez	7aee90b8d3	Fix fix<N> when variable templates are not supported. There were some typos that checked `EIGEN_HAS_CXX14` that should have checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch in some of the `Eigen::fix<N>` assumptions. Also fixed the `symbolic_index` test when `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0. Fixes #2308 (cherry picked from commit 5db9e5c77958997856ddbccfa4a52ff22e83bef9)	2021-08-30 16:23:35 +00:00
Rasmus Munk Larsen	3147391d94	Change version to 3.4.0.	2021-08-18 13:41:58 -07:00
Antonio Sanchez	115591b9e3	Workaround VS 2017 arg bug. In VS 2017, `std::arg` for real inputs always returns 0, even for negative inputs. It should return `PI` for negative real values. This seems to be fixed in VS 2019 (MSVC 1920). (cherry picked from commit 2b410ecbefea1bf4b9d50decb946a4ebe4a73f98)	2021-08-18 19:04:50 +00:00
Jakob Struye	1ec173b54e	Clearer doc for squaredNorm (cherry picked from commit 53a29c7e351646efe31ee85666c8f268f8e0d462)	2021-08-18 15:12:36 +00:00
Antonio Sanchez	aef926abf6	Renamed shift_left/shift_right to shiftLeft/shiftRight. For naming consistency. Also moved to ArrayCwiseUnaryOps, and added test. (cherry picked from commit fc9d352432b81210f73d71caecbd7dc5505d6ab8)	2021-08-18 14:44:31 +00:00
Antonio Sanchez	f1032255d3	Add missing PPC packet comparisons. This is to fix the packetmath tests on the ppc pipeline. (cherry picked from commit 2cc6ee0d2e76e88fe1476f6b0eae12edb68b1c8a)	2021-08-17 15:33:55 +00:00
Chip-Kerchner	f57dec64ef	Fix unaligned loads in ploadLhs & ploadRhs for P8. (cherry picked from commit 8dcf3e38ba9913021ce6a831836a59217e21baf2)	2021-08-17 12:48:36 +00:00
andiwand	cd474d4cd0	minor doc fix in Map.h (cherry picked from commit 5c6b3efead69636dec1599aa54dab4617755013c)	2021-08-16 14:26:39 +00:00
Chip-Kerchner	0b56b62f30	Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8). (cherry picked from commit e07227c411cb5ed5c6252b594fe841867bd19f6a)	2021-08-13 18:01:15 +00:00
Chip Kerchner	44cc96e1a1	Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+ (cherry picked from commit 66499f0f172d0758360043e9c578761c0f7d50cd)	2021-08-12 21:39:17 +00:00
Rasmus Munk Larsen	6d2506040c	Revise the meta_least_common_multiple function template: add a bool variable to check whether A is larger than B. This reduces compile time when A is smaller than B and avoids compilation failures when A is small and B is large. Authored by @awoniu. (cherry picked from commit 8ce341caf2947e4b5ac4580c20254ae7d828b009)	2021-08-11 18:11:26 +00:00
ChipKerchner	13d7658c5d	Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl). (cherry picked from commit 413bc491f1721afdb9802553b13a5b7aba67ed3b)	2021-08-10 20:40:54 +00:00
Gauri Deshpande	93bff85a42	remove denormal flushing in fp32tobf16 for avx & avx512 (cherry picked from commit e6a5a594a7f3cbe2f9843d4ef57a10d478cbb818)	2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen	4e0357c6dd	Avoid memory allocation in tridiagonalization_inplace_selector::run. (cherry picked from commit a5a7faeb455efd7f6edb1138eda2e37546039b7d)	2021-08-06 21:48:00 +00:00
Antonio Sanchez	5b83d3c4bc	Make inverse 3x3 faster and avoid gcc bug. There seems to be a gcc 4.7 bug that incorrectly flags the current 3x3 inverse as using uninitialized memory. I'm pretty sure it's a false positive, but it's hard to trigger. The same warning does not trigger with clang or later compiler versions. In trying to find a work-around, this implementation turns out to be faster anyways for static-sized matrices. ``` name old cpu/op new cpu/op delta BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96) BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96) BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112) BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111) BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98) BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98) BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114) BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116) ``` (cherry picked from commit 5ad8b9bfe2bf75620bc89467c5cc051fc2a597df)	2021-08-04 22:06:52 +00:00
Antonio Sanchez	237c59a2aa	Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset. The `memset` function and bitwise manipulation only apply to POD types that do not require initialization, otherwise resulting in UB. We currently violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and bitwise operations are applied byte-by-byte in the generic implementations. This is causing issues for scalar types that do require initialization or that contain non-POD info such as pointers (#2201). We either break them, or force specializations of these functions for custom scalars, even if they are not vectorized. Here we modify these functions for scalars only - instead using only scalar operations: - `pzero`: `Scalar(0)` for all scalars. - `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars. - `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars - `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise. For non-scalar types, the original implementations are used to maintain compatibility and minimize the number of changes. Fixes #2201. (cherry picked from commit 3d98a6ef5ce0ba85acaee4ffffc53f0f21bd8fd2)	2021-08-03 16:32:59 +00:00
Antonio Sanchez	3dc42eeaec	Enable equality comparisons on GPU. Since `std::equal_to::operator()` is not a device function, it fails on GPU. On my device, I seem to get a silent crash in the kernel (no reported error, but the kernel does not complete). Replacing this with a portable version enables comparisons on device. Addresses #2292 - would need to be cherry-picked. The 3.3 branch also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get fully working. (cherry picked from commit 7880f10526a11dc5544426c54c5763de576bf285)	2021-08-03 16:15:44 +00:00
hyunggi-sv	7adc1545b4	fix:typo in dox (has->have) (cherry picked from commit 02a0e79c701da7aa8dfad79b13cd1e7fae46d634)	2021-08-03 00:54:41 +00:00
Antonio Sanchez	c0c7b695cd	Fix assignment operator issue for latest MSVC+NVCC. Details are scattered across #920, #1000, #1324, #2291. Summary: some MSVC versions have a bug that requires omitting explicit `operator=` definitions (leads to duplicate definition errors), and some MSVC versions require adding explicit `operator=` definitions (otherwise implicitly deleted errors). This mess tries to cover all the cases encountered. Fixes #2291. (cherry picked from commit 9816fe59b47dc4c07967b5ee93a8e8aaa6e9c308)	2021-08-03 00:52:21 +00:00
arthurfeeney	9c90d5d832	Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector. (cherry picked from commit a77638387dd1aa2d07d2dae240cc30b303b4ef38)	2021-07-22 18:01:55 +00:00
Antonio Sanchez	5d37114fc0	Fix explicit default cache size typo. (cherry picked from commit 297f0f563d916260665d7fadc017f94f1a5e7a03)	2021-07-20 18:42:25 +00:00
Rohit Santhanam	930696fc53	Enable extract et. al. for HIP GPU. (cherry picked from commit beea14a18f76817439b4d8901d29db2e9c4a24c8)	2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen	56966fd2e6	Defer to std::fill_n when filling a dense object with a constant value. (cherry picked from commit 0c361c4899c9042d2b25cd60d7826ab464caacb7)	2021-07-09 03:59:56 +00:00
Guoqiang QI	69ec4907da	Make a copy of the input matrix when computing the inverse in place. Fixes #2285. (cherry picked from commit 4bcd42c271761dc5341f8e08ca7d357c3614cb01)	2021-07-08 17:07:54 +00:00
Rasmus Munk Larsen	05bab8139a	Fix breakage of conj_helper in conjunction with custom types introduced in !537. (cherry picked from commit 7b35638ddb99a0298c5d3450de506a8e8e0203d3)	2021-07-02 20:59:50 +00:00
Chip Kerchner	eebde572d9	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow (cherry picked from commit 91e99ec1e02100d07e35a7abb1b5c76707237219)	2021-07-01 23:32:38 +00:00
Antonio Sanchez	8190739f12	Fix compile issues for gcc 4.8. - Move constructors can only be defaulted as NOEXCEPT if all members have NOEXCEPT move constructors. - gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter. (cherry picked from commit 6035da5283f12f7e6a49cda0c21696c8e5a115b7)	2021-07-01 23:18:10 +00:00
Antonio Sanchez	b6db013435	Fix inverse nullptr/asan errors for LU. For empty or single-column matrices, the current `PartialPivLU` dereferences a `nullptr` or accesses memory out-of-bounds. Here we adjust the checks to avoid this. (cherry picked from commit 154f00e9eacaec5667215784c7601b55024e2f61)	2021-07-01 22:57:25 +00:00
Dan Miller	1f6b1c1a1f	Fix duplicate definitions on Mac (cherry picked from commit eb047759030558acf0764d5d2f913f4f84cf85a8)	2021-07-01 20:49:05 +00:00
Alexander Karatarakis	517294d6e1	Make DenseStorage<> trivially_copyable (cherry picked from commit 60400334a92268272c6bf525da89eec5e99c3e5a)	2021-07-01 20:48:47 +00:00
大河メタル	94e2250b36	Correct declarations for aarch64-pc-windows-msvc (cherry picked from commit c81da59a252b3479753b2eada26ee0cf46280bd0)	2021-06-30 04:10:04 +00:00
Rasmus Munk Larsen	380d0e4916	Get rid of redundant `pabs` instruction in complex square root. (cherry picked from commit 5aebbe9098f53f01c99eed67b52725397e955280)	2021-06-29 23:27:09 +00:00
Rohit Santhanam	e83af2cc24	Commit 52a5f982 broke conjhelper functionality for HIP GPUs. This commit addresses this. (cherry picked from commit 2d132d17365ffc84c0cc7a7da9b8f7090e94b476)	2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen	413ff2b531	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. (cherry picked from commit bffd267d176410a517a0fe9afa6dde99c213c08a)	2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen	a235ddef39	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. (cherry picked from commit 52a5f9821235e5a9f7e9b3e0198d45d42a1cb267)	2021-06-24 23:30:42 +00:00
Antonio Sanchez	c2c0f6f64b	Fix fix<> for gcc-4.9.3. There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` replacement. Fixes #2267. (cherry picked from commit 35a367d557078462a0793c88c44dcad64fc63698)	2021-06-21 17:26:07 +00:00
Antonio Sanchez	ee4e099aa2	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps. (cherry picked from commit 12e8d57108c50d8a63605c6eb0144c838c128337)	2021-06-17 17:11:08 +00:00
Chip-Kerchner	9fc93ce31a	EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. (cherry picked from commit ef1fd341a895fda883f655102f371fa8b41f2088)	2021-06-16 22:14:17 +00:00
Antonio Sanchez	1374f49f28	Add missing ppc pcmp_lt_or_nan<Packet8bf> (cherry picked from commit 9e94c5957000c38a6553552c96a7a27b1fc2860d)	2021-06-15 22:12:22 +00:00
Rasmus Munk Larsen	47722a66f2	Fix more enum arithmetic. (cherry picked from commit 13fb5ab92c3226f7b9be20882b0418d53516d35a)	2021-06-15 16:40:35 +00:00
Antonio Sanchez	5e75331b9f	Fix checking of version number for mingw. MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC) 10-win32 20210110`, which causes the version extraction to fail. Added support for this with tests. Also added `make_unsigned` for `long long`, since mingw seems to use that for `uint64_t`. Related to #2268. CMake and build passes for me after this. (cherry picked from commit ad82d20cf649ba8c07352f947fd25766d0328df2)	2021-06-12 00:02:26 +00:00
Rasmus Munk Larsen	1cb1ffd5b2	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled. (cherry picked from commit fc87e2cbaa65e7e93a2c695ce5a9dc048a64a985)	2021-06-11 02:57:02 +00:00


6691 Commits