Retest this please.
Retest this please.
Confirmed, these updates have improved the perforce in my little test to what I get from the raw HIP kernel test
What about View::operator[] ?
Not needed it anyway does 64bit for 1D, and in C++23 I don’t want to introduce the shitty overflow semantics we have right now. I want to get rid of that behavior in the medium term anyway.
cc @rgayatri23 @seyonglee
For the OpenACC backend, this issue is addressed in a pending OpenACC PR (#7920).
I don’t quite get the argument that it forces to guard against the version. Doesn’t it always if I deprecate/remove/introduce-something new?
I don’t quite get the argument that it forces to guard against the version. Doesn’t it always if I deprecate/remove/introduce-something new?
Because there is no overlap of versions with the old syntax to avoid and the new thing to use instead, the guards are necessary on this occasion. Because that deprecation is introduced so close to 5.0, users do not have as much time to migrate that we typically like to give them.
oh you mean we didn’t deprecate in 4.7 so you didn’t get the warning in the same thing. I mean whatever we can do deprecated 5.
We currently support Kokkos 4.5, but also test against develop. I decided to write a simple helper to create the state pool as appropriate:
template<typename ExecSpace>
auto randomPool(const std::uint64_t seed, const int num_states)
{
using Pool = Kokkos::Random_XorShift64_Pool<ExecSpace>;
if constexpr (std::is_constructible_v<Pool, std::uint64_t, std::uint64_t>)
{
return Pool(seed, num_states);
}
else
{
Pool pool{};
pool.init(seed, num_states);
return pool;
}
}
Thanks for sharing your solution.
There is currently no consensus to change the deprecation vehicle. I will keep this issue open a bit longer to give an opportunity to others to weigh in.
@crtrott I updated the pull request as disucssed in the developer meeting. Namely
Proposed a modified heuristic in https://github.com/kokkos/kokkos/pull/8512
clang format
Mentioning https://github.com/kokkos/kokkos/pull/7924 to get both benchmarks in while we’re at it.
Using launch bounds
to control the block size with HIP may affect compiler optimizations, though I guess in your case it didn’t make a difference. Using chunk size
would be preferred because you could reuse the same compiled kernel and the same policy.
I don’t see any interaction with ChunkSize
/ chunk_size
in e.g. https://github.com/kokkos/kokkos/blob/b3ec45c33bc8151b92d63a764f01281911c83cf9/core/src/HIP/Kokkos_HIP_BlockSize_Deduction.hpp#L23-L35 so I think for HIP we only have the LaunchBounds
mechanism.
ChunkSize
controls the block size only for SYCL not for the other backends.
ChunkSize
controls the block size only for SYCL not for the other backends.
Sorry, I misread the documentation
No consensus to deprecate, we are proceeding with #8486
on GH200 with cuda 12.8.1 I see the same speedup as in https://github.com/kokkos/kokkos/pull/8495
Test_Atomic<__int128>/100000/iterations:10 7.29 s 7.27 s 10 Passed=1 Size of type=16 Time atomic=11.8644 Time non-atomic=806.09u Time serial=626.88u Value atomic=4.99995G Value non-atomic=160.648k Value serial=4.99995G
Please check the performance on the host side on a system that has 126-bit atomics support
Please check the performance on the host side on a system that has 126-bit atomics support
On X86-64 with OPENMP I see a similar speedup with this (and linking -latomic
and excluding the macro generation for the device trait to make it compile)
NEW
Test_Atomic<__int128>/100000/iterations:10 0.330 s 0.305 s 10 Passed=1 Size of type=16 Time atomic=0.593478 Time non-atomic=0.408238 Time serial=971.4u Value atomic=4.99995G Value non-atomic=188.618M Value serial=4.99995G
OLD
Test_Atomic<__int128>/100000/iterations:10 3.20 s 2.94 s 10 Passed=1 Size of type=16 Time atomic=4.10126 Time non-atomic=0.0639739 Time serial=642u Value atomic=4.99995G Value non-atomic=121.247M Value serial=4.99995G
Since pretty much all the changes are in desul
it should rather be discussed there.
serial.view_64bit (60744 ms)
cuda.view_64bit (733 ms)
sycl.view_64bit (1872 ms)
hip.view_64bit (5081 ms)
Since most of the changes are for SYCL, I’m fine with just changing that one build if we think it’s too expensive otherwise.
I spoke too soon when I said that our tests are passing with ROCm 7.0. Everything passes when core dump is enabled but the death tests fail when it is disabled. This is because of
Exceptions occurring during a kernel execution will not abort the process anymore but will return an error unless core dump is enabled.
This does not show up in our CI?
I spoke too soon when I said that our tests are passing with ROCm 7.0. Everything passes when core dump is enabled but the death tests fail when it is disabled. This is because of
Exceptions occurring during a kernel execution will not abort the process anymore but will return an error unless core dump is enabled.
This does not show up in our CI?
No because the container has core dump enable same as Frontier
(2024.2.1 is the new minimum required in Kokkos 5.0)
@seyonglee would you be a able to look at the OpenACC failure?
@seyonglee would you be a able to look at the OpenACC failure?
I will check the issue.
@seyonglee would you be a able to look at the OpenACC failure?
I will check the issue.
Temporarily disabled failing OpenACC tests in TestBitManipulationBuiltins
unit test.
Temporarily disabled failing OpenACC tests in
TestBitManipulationBuiltins
unit test.
Thank you
Should this be guarded with KOKKOS_ENABLE_DEPRECATED_CODE_4
or KOKKOS_ENABLE_DEPRECATED_CODE_5
?
To be honest I’m not sure it’s worth deprecating the old simd names or even changing them in our code. Our stuff is spelled differently anyway with being in the Kokkos namespace. But if @crtrott had a different opinion I’m fine following that
Please add to the 5.0 changelog
What behavior do we get when/o the changes in the source files?
The data pointer doesn’t change (movnig a pointer doesn’t change it’s value).
Note that https://github.com/kokkos/kokkos-kernels/pull/2771 introduced a new header with the Kokkos 4 header and that is not handled in the current version of this PR
Good point, I’ll fix the header for KokkosKernels_ArithTraits
Any chance to deprecate the header as well and redirect to a new header with appropriate prefix?
Would you be willing to move the traits definition to a new <KokkosKernels_ArithTraits.hpp>
header and have < Kokkos_ArithTraits.hpp >
just include it along with a using directive in the Kokkos:: namespace.
We’ll need to fix Trilinos for these changes, e.g.
In file included from /root/Trilinos/packages/stokhos/src/sacado/kokkos/vector/tpetra/Stokhos_Tpetra_MP_Vector.hpp:29:
/root/Trilinos/packages/stokhos/src/sacado/kokkos/vector/linalg/Kokkos_ArithTraits_MP_Vector.hpp:24:7: error: cannot specialize a dependent template
24 | class ArithTraits< Sacado::MP::Vector<S> > {
@lucbv just to check, these changes only apply to versions >= 5.0 ? (So I know how to guard in Trilinos)
I suppose you could do “`C++ #if __has_include(KokkosKernels_ArithTraits.hpp) // since 5.0
Yes this would only apply to versions of Kokkos >= 5.0 so for a 4.7.2 release we would not want this change.
Outdated (unsupported compilers)
@crtrott ping
I was planning to submit separate PRs for different backends to keep each PR small. Regarding your second comment, what would be the workaround for the overflow error? @romintomasetti @masterleinad
Something like
should do the trick.