Hashing performance problem and how it was solved

Recently I ran into a problem where hashing some data had a big impact on application performance. A few factors contributed to it, but the end result was that the application needed 2.5x as many instances. Imagine having 15 instances of your application in production and needing 38 to serve the same traffic.

To understand the problem and find out what to fix, I had to dig deeper. In this post I will talk about what the problem was and how I found it.

The setup was:

  • hashing algorithm: SHA256
  • the data that should be hashed lives in a Map<String, List<String>>
  • the data that should be hashed matches some regex patterns
  • when a pattern matches, the content is replaced with its hash
  • Guava's Hashing class was used
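
To make the setup concrete, here is a minimal sketch of that flow. It is not the production code: the class name, the EMAIL pattern, and the choice to match whole values are assumptions made for illustration, and it uses the JDK's MessageDigest instead of Guava's Hashing class so that it runs without extra dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class SensitiveValueHasher {

    // Hypothetical pattern: anything shaped like an email address counts as sensitive.
    static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+");

    // SHA-256 of the UTF-8 bytes of the input, as lowercase hex.
    // (JDK stand-in for the Guava call used in the real application.)
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns a copy of the map in which every value that fully matches
    // the pattern is replaced by its hash; other values are kept as-is.
    static Map<String, List<String>> hashMatching(Map<String, List<String>> data, Pattern pattern) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : data.entrySet()) {
            List<String> values = new ArrayList<>(entry.getValue().size());
            for (String value : entry.getValue()) {
                values.add(pattern.matcher(value).matches() ? sha256Hex(value) : value);
            }
            result.put(entry.getKey(), values);
        }
        return result;
    }
}
```

Note that the pattern is compiled once and reused. Compiling it for every value would add string and regex work on top of the hashing itself.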

Because the code iterated over the data and did string operations, the initial thought was that the hashing algorithm was slow and should be replaced. To test this theory, JMH (Java Microbenchmark Harness) was used to compare the performance of SHA256 and SHA1.

Set up the JMH project:

Add the Guava and Apache Commons Lang dependencies for the Hashing and RandomStringUtils classes.

Benchmark:

SHA256:

Throughput:

Average time:

SHA1:

Throughput:

Average time:

The results show no difference between the two algorithms big enough to explain the performance impact.

It was necessary to dig deeper, and a few interesting things were found:

I think the following may also affect performance, but I have not benchmarked it, so it is only a theory:

  • When using streams, use them correctly. While debugging, I found code that, instead of calling collect(Collectors.toList()) at the end, called forEach and added each element to a list.
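
The two styles can be sketched side by side. The class name, method names, and the toUpperCase step are hypothetical stand-ins for whatever the real pipeline did. The sketch shows only the difference in shape and makes no claim about speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class StreamCollectExample {

    // The style found while debugging: forEach with a side effect
    // that fills a list declared outside the stream.
    static List<String> upperCaseWithForEach(List<String> input) {
        List<String> result = new ArrayList<>();
        input.stream().map(String::toUpperCase).forEach(result::add);
        return result;
    }

    // The idiomatic style: let the stream build the list itself.
    static List<String> upperCaseWithCollect(List<String> input) {
        return input.stream().map(String::toUpperCase).collect(Collectors.toList());
    }
}
```

Both methods return the same elements for a sequential stream. Apart from reading better, the collect version stays correct if the stream is later made parallel, while adding to an ArrayList from forEach is not thread-safe.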
