
From the code samples it's hard to tell whether or not this has to do with de-serialization though. It would have been fun to see profiling results for tests such as these.


Author here, I'm away from my computer atm, but I can cook up a repo with each test in a few hours when I get home.

I designed the tests as a drag race because that mimics my real world usage.


That's nice - I'd encourage you to play around with attaching e.g. JMC [1] to the process to better understand why things are as they are.
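If attaching the GUI is inconvenient, a recording that JMC can open can also be produced from inside the JVM with the jdk.jfr API - a minimal sketch (exception handling omitted), where runBenchmark() is just a placeholder for whatever is being measured:

  // Record the run with the built-in "profile" settings and dump a .jfr file for JMC.
  try (var recording = new jdk.jfr.Recording(
          jdk.jfr.Configuration.getConfiguration("profile"))) {
      recording.start();
      runBenchmark();                              // placeholder for the code under test
      recording.dump(java.nio.file.Path.of("benchmark.jfr"));
  }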

I tried recreating your DataInputStream + BufferedInputStream version (wrote the 1brc data to separate output files and read it with your code - I had to guess at the ResultsObserver implementation though, see the sketch below). On my machine it ran in roughly the same time frame as yours - ~1 min.
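For what it's worth, my stand-in ResultsObserver is nothing more than a min/max/sum/count accumulator, roughly:

  // Guessed stand-in for the post's ResultsObserver: accumulates basic statistics.
  static final class ResultsObserver {
      double min = Double.POSITIVE_INFINITY;
      double max = Double.NEGATIVE_INFINITY;
      double sum;
      long count;

      void observe(double temperature) {
          min = Math.min(min, temperature);
          max = Math.max(max, temperature);
          sum += temperature;
          count++;
      }
  }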

According to Flight Recorder:

  - ~49% of the time is spent in reading the strings (city names). Almost all of it in the DataInputStream.readUTF/readFully methods.
  - ~5% of the time is spent reading temperature (readShort)
  - ~41% of the time is spent doing hashmap look-ups for computeIfAbsent()
  - About 50GB of memory is allocated - 99.9% of it for Strings (and the wrapped byte[] arrays in them). This likely causes quite a bit of GC pressure.
Hash-map lookups are not de-serialization, yet the lookup likely affected the benchmarks quite a bit. The rest of the time is mostly spent in reading and allocating strings. I would guess that that is true for some of the other implementations in the original post as well.
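For context, the loop I profiled is my reconstruction of the DataInputStream + BufferedInputStream variant from the post, so the surrounding names (tempFile, records, stats) are guesses:

  try (var in = new DataInputStream(
          new BufferedInputStream(Files.newInputStream(tempFile)))) {
      for (int i = 0; i < records; i++) {
          String city = in.readUTF();        // shows up as readUTF/readFully in JFR
          int temperature = in.readShort();  // the ~5% readShort slice
          stats.computeIfAbsent(city, k -> new ResultsObserver())
               .observe(temperature / 100.); // the HashMap look-up portion
      }
  }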

[1] https://github.com/openjdk/jmc

edit: better link to JMC


JMC is indeed a valuable tool, though what you see in any Java profiler should be taken with a grain of salt. The string parsing and hash lookups are present in most of the implementations, yet some of them are up to 10 times faster than the DataInputStream + BufferedInputStream code.

It doesn't seem like it can be true that 90% of the time is spent in string parsing and hash lookups if the same operation takes 10% of the time when reading from a FileChannel and ByteBuffer.


Aren't the versions that take 10% of the time only reading each city name once, and then doing an array lookup rather than a hashmap lookup?


Nope, see for example "Custom 1":

  var buffer = ByteBuffer.allocate(4096);
  try (var fc = (FileChannel) Files.newByteChannel(tempFile,
                        StandardOpenOption.READ))
  {
    // Start with an empty (flipped) buffer so the first iteration triggers a refill.
    buffer.flip();

    for (int i = 0; i < records; i++) {

        // Refill when fewer than 32 bytes remain, keeping any partial record.
        if (buffer.remaining() < 32) {
            buffer.compact();
            fc.read(buffer);
            buffer.flip();
        }

        // Record layout: 1-byte city length, city bytes, 2-byte temperature (value * 100).
        int len = buffer.get();
        byte[] cityBytes = new byte[len];
        buffer.get(cityBytes);
        String city = new String(cityBytes);
        int temperature = buffer.getShort();

        stats.computeIfAbsent(city, k -> new ResultsObserver())
             .observe(temperature / 100.);
    }
  }


My bad - I got confused because the original DIS+BIS took ~60s on my machine. I reproduced the Custom 1 implementation locally (before seeing your repo) and it took ~48s on the same machine. JFR (which you honestly can trust most of the time) says that the HashMap lookup is now ~50% of the time and the String constructor call ~35%.


JFR only samples running Java methods.

I would guess at least some of the bottlenecks are in hardware, the operating system, or in native code (including the JVM) in this case.


Hi,

Please add https://github.com/apache/fury to the benchmark. It claims to be a drop-in replacement for the built-in serialization mechanism, so it should be easy to try.
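If it helps, going by its README the usage should be something roughly like this - treat the exact package and builder names as approximate, I'm writing this from memory:

  import org.apache.fury.Fury;
  import org.apache.fury.config.Language;

  public class FuryDemo {
      public static void main(String[] args) {
          // Build a serializer for plain Java objects; class registration relaxed for the demo.
          Fury fury = Fury.builder()
                          .withLanguage(Language.JAVA)
                          .requireClassRegistration(false)
                          .build();

          byte[] bytes = fury.serialize("Hamburg;12.3");        // any serializable object
          String roundTripped = (String) fury.deserialize(bytes);
          System.out.println(roundTripped);
      }
  }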


Will do!



