You can consider this post as an extension to previously written SIMD usage in C++, C# and Rust.
Example verified on OpenJDK 17 build running on Linux.
Seems that CPU intrinsics are no longer domain of low level AOT compiled languages. We have pretty nice and easy API in .NET Core 3 and 2021 is the first year to include official language support for SIMD in Java. Since JDK 16 there is finally possibility to write explicit vectorized code, though in incubation stage. JDK 17 was released about a month ago, yet API is still young enough to not hit the general-availability channel. Work for SIMD in Java 18 continues.
Java SIMD support is a fruit of cooperation between Intel and Oracle architects.
While we haven’t been able to use SIMD directly, JVM was using intrinsics for a while now. For example, in Java 9 Arrays.mismatch
was added which has scalar and vector implementation.1 This new API will let us - instead of relying on JVM decisions - write efficient code ourselves.
So let’s just start digging in.
Adding Vector incubator module
Java SIMD API is still experimental and may change based on feedback. See JEP 417.
Unfortunately, I haven’t been able to get it working in latest Intellij Idea (2021.2.2) and found suggestions on one forum that for now just use command line, so let’s do it.
The only thing we need to add is vector module so the entire command line looks like so:
1
java --add-modules=jdk.incubator.vector Main.java
API description
We can use SIMD by instancing Vector<e>
class where e
denotes to a boxed version of either byte, short, int, long, float or double. We can also use specialized IntVector
which extends Vector<Integer>
. Oracle also decided to call vector size a Shape. Shape can be of 64, 128, 256, 512 or MAX bits. There is one more new phrase we should remember: species. Specie is a combination of element type Vector operates on and its shape.
Standarization
All the classes and methods resides in jdk.incubator.vector.*
module. For this post we will need to import 2 of them:
1
2
import jdk.incubator.vector.VectorSpecies;
import jdk.incubator.vector.FloatVector;
C# architects decided to use generics for distinction between type that vector operates on (e.g.
Vector<int>
), Java on the other hand went with specialized functions (e.g.IntVector
) that extends genericVector<>
.
Safety
Like C# we can fetch optimal vector size by calling:
1
2
final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
int length = SPECIES.length();
But unlike C# we can also ask for max possible vector size supported on hardware and by runtime (but not necessarily optimal): FloatVector.SPECIES_MAX
. We can go even further and use explicit sizes: FloatVector.SPECIES_64, FloatVector.SPECIES_128, FloatVector.SPECIES_256, FloatVector.SPECIES_512
.
Warning: while
SPECIES_64
andSPECIES_128
should be portable in 99% cases, other shapes cannot be considered as such. I verified explicitSPECIES_512
on my machine which does not support AVX-512 and while the program was running without any crashes; it used scalarized code. It’s actually one of the goals: if some intrinsics are not available, then dograceful degradation
to the most optimal scalar code as fallback.
Pattern mentioned above holds for all other types, we can set the size of the specialized vector by XXXVector.SPECIES_YYY
where XXX
is the type and YYY is the vector size in bits (e.g. ShortVector.SPECIES_256
).
Example
1
2
3
4
5
6
7
8
9
10
11
final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
float[] arr = new float[1_200_000];
Arrays.fill( arr, 23.74F );
var v8sum = FloatVector.zero( SPECIES );
for( int i = 0; i < arr.length; i+=SPECIES.length() ) {
var v8temp = FloatVector.fromArray( SPECIES, arr, i );
v8sum = v8sum.add( v8temp );
}
Although API is very similar to one offered by C#, there are also many explicit operations that we can use. You can .broadcast()
, .blend()
, .fma()
, etc. We can also use VectrorMask<>
type.
Veryfing Assembly
This is an interesting part. Although this exact logic worked in case of C++, Rust and C# it does not in case of Java.
To check the machine-generated assembly use:
1
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly --add-modules=jdk.incubator.vector com/company/Main.java > asm.log
Produced assembly shows that we are not using YMM registers. I can only assume that it’s because of early incubation phase. Maybe something wrong with JDK I was using, and it’s degrading my code gracefully to scalar version… I need to recheck it one more time once it hit stable channel.
Footnotes
https://cr.openjdk.java.net/~vlivanov/talks/2017_Vectorization_in_HotSpot_JVM.pdf ↩