eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2025-06-04 18:54:00 +08:00

Author	SHA1	Message	Date
Tobias Wood	2bf8fe1489	NEON Complex Intrinsics	2024-08-22 22:46:16 +00:00
Rasmus Munk Larsen	32d95bb097	Add vectorized implementation of tanh<double>	2024-08-21 02:29:45 +00:00
Frédéric Chapoton	6331da95eb	fixing a lot of typos	2024-07-30 22:15:49 +00:00
Charles Schlosser	fb95e90f7f	Add truncation op	2024-04-29 23:45:49 +00:00
Damiano Franzò	888fca0e2b	Simd sincos double	2024-04-15 21:12:32 +00:00
Antonio Sánchez	a73970a864	Fix arm32 issues.	2024-01-23 22:04:55 +00:00
Tobias Wood	f38e16c193	Apply clang-format	2023-11-29 11:12:48 +00:00
Charles Schlosser	81b48065ea	Fix arm32 float division and related bugs	2023-08-29 00:36:07 +00:00
Antonio Sánchez	6e4d5d4832	Add IWYU private pragmas to internal headers.	2023-08-21 16:25:22 +00:00
Antonio Sánchez	7465b7651e	Disable FP16 arithmetic for arm32.	2023-06-26 18:39:42 +00:00
Rasmus Munk Larsen	df1049ddf4	Small packet math cleanup.	2023-04-04 16:14:32 +00:00
Antonio Sánchez	2c8011c2dd	Fix arm builds.	2023-03-20 16:59:38 +00:00
Rasmus Munk Larsen	ce62177b5b	Vectorize atanh & add a missing definition and unit test for atan.	2023-02-21 03:14:05 +00:00
Antonio Sánchez	384269937f	More NEON packetmath fixes.	2023-02-14 21:45:25 +00:00
Antonio Sánchez	2dfbf1b251	Fix NEON make_packet2f.	2023-02-14 16:52:07 +00:00
Antonio Sánchez	0a5392d606	Fix MSVC arm build.	2023-02-08 21:46:37 +00:00
Sean McBride	d70b4864d9	issue #2581 : review and cleanup of compiler version checks	2023-01-17 18:58:34 +00:00
Arthur	311cc0f9cc	Enable NEON pcmp, plset, and complex psqrt	2022-12-22 05:38:34 +00:00
Arthur Feeney	c4fb6af24b	Enable NEON pabs for unsigned int types	2022-12-19 17:07:36 +00:00
Charles Schlosser	9b6d624eab	fix neon	2022-11-08 20:03:01 +00:00
Rasmus Munk Larsen	7e398e9436	Add missing return keyword in psignbit for NEON.	2022-11-04 16:13:09 +00:00
Charles Schlosser	82b152dbe7	Add signbit function	2022-11-04 00:31:20 +00:00
Rasmus Munk Larsen	c475228b28	Vectorize atan() for double.	2022-10-01 01:49:30 +00:00
Rasmus Munk Larsen	bd393e15c3	Vectorize acos, asin, and atan for float.	2022-08-29 19:49:33 +00:00
Antonio Sánchez	28e008b99a	Fix sqrt/rsqrt for NEON.	2022-02-15 21:31:51 +00:00
Antonio Sánchez	6b60bd6754	Fix 32-bit arm int issue.	2022-02-04 21:59:33 +00:00
Kolja Brix	8d81a2339c	Reduce usage of reserved names	2022-01-10 20:53:29 +00:00
Erik Schultheis	cc11e240ac	Some further cleanup	2021-12-06 18:01:15 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon. Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	ff07a8a639	GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315). GCC 4.8 doesn't seem to like the `g` register constraint, failing to compile with "error: 'asm' operand requires impossible reload". Tested `r` instead, and that seems to work, even with latest compilers. Also fixed some minor macro issues to eliminate warnings on armv7. Fixes #2315.	2021-08-31 20:20:47 +00:00
Han-Kuan Chen	ab28419298	optimize predux if architecture is aarch64	2021-08-25 19:18:54 +00:00
derekjchow	66ca41bd47	Add support for vectorizing logical comparisons.	2021-07-23 20:07:48 +00:00
大河メタル	c81da59a25	Correct declarations for aarch64-pc-windows-msvc	2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen	bffd267d17	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.	2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.	2021-06-11 02:35:53 +00:00
Antonio Sanchez	dba753a986	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`.	2021-05-25 18:25:35 +00:00
Antonio Sanchez	172db7bfc3	Add missing pcmp_lt_or_nan for NEON Packet4bf.	2021-04-27 14:12:11 -07:00
David Tellenbach	4811e81966	Remove yet another comma at end of enum	2021-03-18 23:30:00 +01:00
Antonio Sanchez	82d61af3a4	Fix rint SSE/NEON again, using optimization barrier. This is a new version of !423, which failed for MSVC. Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to prevent operations involving `X` from crossing that barrier. Should work on most `GNUC` compatible compilers (MSVC doesn't seem to need this). This is a modified version adapted from what was used in `psincos_float` and tested on more platforms (see #1674, https://godbolt.org/z/73ezTG). Modified `rint` to use the barrier to prevent the add/subtract rounding trick from being optimized away. Also fixed an edge case for large inputs that get bumped up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-05 08:54:12 -08:00
Antonio Sánchez	9a663973b4	Revert "Fix rint for SSE/NEON." This reverts commit e72dfeb8b9fa5662831b5d0bb9d132521f9173dd	2021-03-03 18:51:51 +00:00
Antonio Sanchez	e72dfeb8b9	Fix rint for SSE/NEON. It seems sometimes with aggressive optimizations the combination `psub(padd(a, b), b)` trick to force rounding is compiled away. Here we replace with inline assembly to prevent this (I tried `volatile`, but that leads to additional loads from memory). Also fixed an edge case for large inputs `a` where adding `b` bumps the value up a power of two and ends up rounding away more than just the fractional part. If we are over `2^digits` then just return the input. This edge case was missed in the test since the test was comparing approximate equality, which was still satisfied. Adding a strict equality option catches it.	2021-03-03 09:41:46 -08:00
Antonio Sanchez	1e0c7d4f49	Add print for SSE/NEON, use NEON rounding intrinsics if available. In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the current rounding mode. For NEON, we use the provided intrinsics for rint/floor/ceil if available (armv8). Related to #1969.	2021-02-27 22:42:07 +00:00
Antonio Sanchez	29ebd84cb7	Fix NEON sqrt for 32-bit, add prsqrt. With !406, we accidentally broke arm 32-bit NEON builds, since `vsqrt_f32` is only available for 64-bit. Here we add back the `rsqrt` implementation for 32-bit, relying on a `prsqrt` implementation with better handling of edge cases. Note that several of the 32-bit NEON packet tests are currently failing - either due to denormal handling (NEON versions flush to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).	2021-02-26 14:08:40 -08:00
Antonio Sanchez	e19829c3b0	Fix floor/ceil for NEON fp16. Forgot to test this. Fixes bug introduced in !416.	2021-02-25 20:39:56 -08:00
Antonio Sanchez	5529db7524	Fix SSE/NEON pfloor/pceil for saturated values. The original will saturate if the input does not fit into an integer type. Here we fix this, returning the input if it doesn't have enough precision to have a fractional part. Also added `pceil` for NEON. Fixes #1969.	2021-02-25 14:39:26 -08:00
Antonio Sanchez	6cf0ab5e99	Disable fast psqrt for NEON. Accuracy is too poor - requires at least two Newton iterations, but then it is no longer significantly faster than `vsqrt`. Fixes #2094.	2021-02-23 19:52:55 -08:00
Antonio Sanchez	7ff0b7a980	Updated pfrexp implementation. The original implementation fails for 0, denormals, inf, and NaN. See #2150	2021-02-17 02:23:24 +00:00
Ashutosh Sharma	f702792a7c	missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel)	2021-02-16 16:33:59 +00:00

194 Commits