Hashing performance problem and how it was solved

Recently I run into a problem where hashing some data had a big impact on the application performance. There are few factors that influenced this but it required 2.5x more instances, imagine when you have 15 instances of you application in production and it requires 38 to serve the same traffic.

In order to understand the problem and to find a what to fix I had to dig deeper. It this post will talk about what and how I found the problem.

The setup was:

  • hashing algorithm: SHA256
  • you find in some Map<String, List<String>> data that should be hashed
  • data that should be hashed matches some regex patterns
  • when your pattern matches, the content is replaced with the hashed one.
  • Guava Hashing class was used

Because there were some iterations over data and string operations, the initial thought was that hashing algorithm is slow and should be replaced. In order to prove the theory and compare their performance, JMH (Java Microbenchmarking Harness) was used.

Setup JMHproject:

mvn archetype:generate 
  -DinteractiveMode=false 
  -DarchetypeGroupId=org.openjdk.jmh 
  -DarchetypeArtifactId=jmh-java-benchmark-archetype 
  -DgroupId=com.iftodi 
  -DartifactId=jmh.hashing 
  -Dversion=1.0

Add Guava and Apache Commons Lang dependencies for Hashing and RandomStringUtils classes.


    com.google.guava
    guava
    28.0-jre


    org.apache.commons
    commons-lang3
    3.9

Benchmark:

@State(Scope.Thread)
public class HashingBenchmark {

    private static final int STRING_LENGTH = 26;

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void testMethod(Blackhole blackhole) {
        HashCode hashedString = Hashing.sha256().hashString(RandomStringUtils.random(STRING_LENGTH),
            Charset.defaultCharset());
        blackhole.consume(hashedString);
    }

}

SHA256:

Throughput:

Result "com.iftodi.HashingBenchmark.testMethod":
  232.410 ±(99.9%) 5.036 ops/ms [Average]
  (min, avg, max) = (214.934, 232.410, 238.504), stdev = 6.723
  CI (99.9%): [227.374, 237.446] (assumes normal distribution)
Benchmark                     Mode  Cnt    Score   Error   Units
HashingBenchmark.testMethod  thrpt   25  232.410 ± 5.036  ops/ms

Average time:

Result "com.iftodi.HashingBenchmark.testMethod":
  0.004 ±(99.9%) 0.001 ms/op [Average]
  (min, avg, max) = (0.004, 0.004, 0.004), stdev = 0.001
  CI (99.9%): [0.004, 0.004] (assumes normal distribution)
Benchmark                    Mode  Cnt  Score    Error  Units
HashingBenchmark.testMethod  avgt   25  0.004 ±  0.001  ms/op

SHA1:

Throughput:

Result "com.iftodi.HashingBenchmark.testMethod":
  245.732 ±(99.9%) 2.870 ops/ms [Average]
  (min, avg, max) = (233.948, 245.732, 251.042), stdev = 3.831
  CI (99.9%): [242.862, 248.602] (assumes normal distribution)
Benchmark                     Mode  Cnt    Score   Error   Units
HashingBenchmark.testMethod  thrpt   25  245.732 ± 2.870  ops/ms

Average time:

Result "com.iftodi.HashingBenchmark.testMethod":
  0.004 ±(99.9%) 0.001 ms/op [Average]
  (min, avg, max) = (0.004, 0.004, 0.004), stdev = 0.001
  CI (99.9%): [0.004, 0.004] (assumes normal distribution)
Benchmark                    Mode  Cnt  Score    Error  Units
HashingBenchmark.testMethod  avgt   25  0.004 ±  0.001  ms/op

From the results there is no big difference that should have a big impact to the performance.

It was necessary to dig deeper and few interesting things were found:

I think this also may affect performance but I have not benchmarked this, it’s a theory:

  • when using streams, use them correctly, when I was debugging I found code that instead of doing collect(Collectors.toList()) at the end, they were doing a forEach and adding data to a list.

Lasă un răspuns

Adresa ta de email nu va fi publicată. Câmpurile obligatorii sunt marcate cu *

Acest site folosește Akismet pentru a reduce spamul. Află cum sunt procesate datele comentariilor tale.