In this video Charlie Hunt explains the history and implementation of this new feature. The video is not actually about Compact Strings as such. Compact Strings are introduced as a case study to show how, with a lot of work, the three-legged stool of latency, throughput and memory footprint can all be improved together.
If you have the time I definitely recommend watching the whole video, although the actual part on Compact Strings starts at 26:24.
If you want a 5 minute overview, here are the highlights:
- String density (JEP 254 Compact Strings) is a feature of JDK 9.
- The aims were to reduce memory footprint without hurting performance (latency or throughput), while maintaining full backward compatibility.
- JDK 6 introduced compressed strings but this was never brought forward into later JVMs. This is a complete rewrite.
- To work out how much memory could be saved, 960 disparate Java application heap dumps were analysed.
- The live data size of the heap dumps was between 300MB and 2.5GB.
- char[] consumed between 10% and 45% of the live data
- vast majority of chars were only one byte in size (i.e. ASCII)
- 75% of the char arrays were 35 chars or smaller
- On average, the reduction in application size would be 5-15% (the reduction in char[] size is about 35-45%, rather than a full 50%, because of the header size).
- The way it will be implemented is that if all chars in the String use only 1 byte (the higher byte is 0) then a byte[] with one byte per char will be used rather than a char[] (ISO-8859-1/Latin1 encoding). There will be an extra byte to indicate which encoding was used.
- UTF-8 is not used because it supports variable length chars and is therefore not performant for random access.
- A private final byte coder field on the String indicates the encoding. Note that a byte leaves room to support many more encodings in the future.
- For all 64 bit JVMs no extra memory was needed for the extra field because of the 'dead' space needed for 8 byte object alignment.
- Throughput doesn't suffer, as tested with 400 JMH benchmarks available online.
- The reason for this is that String is highly optimized: there are 55 specific JVM features for String alone.
- Latency also improved, as tested with the industry benchmark SPECjbb2015 and regression tested on SPECjbb2005.
- The feature can be enabled with -XX:+CompactStrings and disabled with -XX:-CompactStrings, but will be enabled by default.
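
To make the encoding idea from the highlights more concrete, here is a minimal sketch of a string-like class that stores its characters in a byte[] plus a one-byte coder. It is not the JDK's actual java.lang.String source: the class name CompactStringSketch, its methods and the coder values are all assumptions made for illustration. It shows why a fixed-width encoding keeps charAt a constant-time lookup, which is the argument made above against UTF-8.

```java
// Illustrative sketch only: NOT the real java.lang.String implementation.
// Class name, method names and coder values are hypothetical.
final class CompactStringSketch {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char

    private final byte[] value; // encoded characters
    private final byte coder;   // which encoding 'value' uses

    CompactStringSketch(String s) {
        char[] chars = s.toCharArray();
        if (fitsInLatin1(chars)) {
            // Every char has a zero high byte, so keep only the low byte.
            byte[] bytes = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                bytes[i] = (byte) chars[i];
            }
            this.value = bytes;
            this.coder = LATIN1;
        } else {
            // At least one char needs two bytes, so store all chars as
            // two bytes each (high byte first in this sketch).
            byte[] bytes = new byte[chars.length * 2];
            for (int i = 0; i < chars.length; i++) {
                bytes[2 * i] = (byte) (chars[i] >> 8);
                bytes[2 * i + 1] = (byte) chars[i];
            }
            this.value = bytes;
            this.coder = UTF16;
        }
    }

    // True if every char fits in a single byte (high byte is 0).
    static boolean fitsInLatin1(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    int length() {
        return coder == LATIN1 ? value.length : value.length / 2;
    }

    // Random access stays O(1) because both encodings are fixed width.
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    byte coder() {
        return coder;
    }

    // Number of bytes used for the character data (excluding headers).
    int payloadBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactStringSketch ascii = new CompactStringSketch("hello");
        CompactStringSketch wide = new CompactStringSketch("hi\u4e16");
        System.out.println("coder=" + ascii.coder() + " bytes=" + ascii.payloadBytes());
        System.out.println("coder=" + wide.coder() + " bytes=" + wide.payloadBytes());
    }
}
```

In this sketch a string made only of single-byte chars needs half the payload of a char[], while a single wider char pushes the whole string back to two bytes per char. The sketch only counts payload bytes; the array and object headers are untouched, which is why the real-world saving is less than half.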