Home SIMD usage in Java
Post
Cancel

SIMD usage in Java

You can consider this post as an extension to previously written SIMD usage in C++, C# and Rust.

Example verified on OpenJDK 17 build running on Linux.

Seems that CPU intrinsics are no longer domain of low level AOT compiled languages. We have pretty nice and easy API in .NET Core 3 and 2021 is the first year to include official language support for SIMD in Java. Since JDK 16 there is finally possibility to write explicit vectorized code, though in incubation stage. JDK 17 was released about a month ago, yet API is still young enough to not hit the general-availability channel. Work for SIMD in Java 18 continues.

Java SIMD support is a fruit of cooperation between Intel and Oracle architects.

While we haven’t been able to use SIMD directly, JVM was using intrinsics for a while now. For example, in Java 9 Arrays.mismatch was added which has scalar and vector implementation.1 This new API will let us - instead of relying on JVM decisions - write efficient code ourselves.

So let’s just start digging in.

Adding Vector incubator module

Java SIMD API is still experimental and may change based on feedback. See JEP 417.

Unfortunately, I haven’t been able to get it working in latest Intellij Idea (2021.2.2) and found suggestions on one forum that for now just use command line, so let’s do it.

The only thing we need to add is vector module so the entire command line looks like so:

1
java --add-modules=jdk.incubator.vector Main.java

API description

We can use SIMD by instancing Vector<e> class where e denotes to a boxed version of either byte, short, int, long, float or double. We can also use specialized IntVector which extends Vector<Integer>. Oracle also decided to call vector size a Shape. Shape can be of 64, 128, 256, 512 or MAX bits. There is one more new phrase we should remember: species. Specie is a combination of element type Vector operates on and its shape.

Standarization

All the classes and methods resides in jdk.incubator.vector.* module. For this post we will need to import 2 of them:

1
2
import jdk.incubator.vector.VectorSpecies;
import jdk.incubator.vector.FloatVector;

C# architects decided to use generics for distinction between type that vector operates on (e.g. Vector<int>), Java on the other hand went with specialized functions (e.g. IntVector) that extends generic Vector<>.

Safety

Like C# we can fetch optimal vector size by calling:

1
2
final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
int length = SPECIES.length();

But unlike C# we can also ask for max possible vector size supported on hardware and by runtime (but not necessarily optimal): FloatVector.SPECIES_MAX. We can go even further and use explicit sizes: FloatVector.SPECIES_64, FloatVector.SPECIES_128, FloatVector.SPECIES_256, FloatVector.SPECIES_512.

Warning: while SPECIES_64 and SPECIES_128 should be portable in 99% cases, other shapes cannot be considered as such. I verified explicit SPECIES_512 on my machine which does not support AVX-512 and while the program was running without any crashes; it used scalarized code. It’s actually one of the goals: if some intrinsics are not available, then do graceful degradation to the most optimal scalar code as fallback.

Pattern mentioned above holds for all other types, we can set the size of the specialized vector by XXXVector.SPECIES_YYY where XXX is the type and YYY is the vector size in bits (e.g. ShortVector.SPECIES_256).

Example

1
2
3
4
5
6
7
8
9
10
11
final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
  
float[] arr = new float[1_200_000];  
Arrays.fill( arr, 23.74F );  
  
var v8sum = FloatVector.zero( SPECIES );
  
for( int i = 0; i < arr.length; i+=SPECIES.length() ) {
	var v8temp = FloatVector.fromArray( SPECIES, arr, i );  
	v8sum = v8sum.add( v8temp );
}

Although API is very similar to one offered by C#, there are also many explicit operations that we can use. You can .broadcast(), .blend(), .fma(), etc. We can also use VectrorMask<> type.

Veryfing Assembly

This is an interesting part. Although this exact logic worked in case of C++, Rust and C# it does not in case of Java.

To check the machine-generated assembly use:

1
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly --add-modules=jdk.incubator.vector com/company/Main.java > asm.log

Produced assembly shows that we are not using YMM registers. I can only assume that it’s because of early incubation phase. Maybe something wrong with JDK I was using, and it’s degrading my code gracefully to scalar version… I need to recheck it one more time once it hit stable channel.

Footnotes

  1. https://cr.openjdk.java.net/~vlivanov/talks/2017_Vectorization_in_HotSpot_JVM.pdf 

This post is licensed under CC BY 4.0 by the author.