Pattern Recognition and Optimization in Mendix

I recently received a few requirements for a project related to security of passwords.

  1. If a user attempts to change their password and the new password repeats any sequence of three characters, the password change should be rejected.

    1. Example: ‘!abc1abc4’ is rejected because ‘abc’ is repeated somewhere in the string
  2. If a user attempts to change their password and the new password shares a 6-character sequence with their user name, the password change should be rejected.

    1. Example: If the user name is ‘pgriffin’ and the new password is ‘!3riffin#$’, the change is rejected because ‘riffin’ appears in both

In this post I will detail how I solved it with one microflow and then how I optimized it. If you like to dive into the weeds, read along!

The first thing I did was evaluate the two requirements and determine whether I could solve them with a single microflow. After all, both stories were ultimately about comparing parts of two strings to find a match (or not). The number of characters to test is variable: one story asks for 3, while the other asks for 6. I then thought more broadly about how I could build this in a way that could be reused for multiple purposes and came up with the idea to build in an offset, meaning that my test string could start at the nth position in the string instead of always at 0 (the first character in the string). Additionally, I decided that the number of pattern matches allowed before breaking out and flagging the string should also be variable, to allow for as much reuse as possible.

Like most developers, my immediate reaction was to go search for a Java method that accomplishes what I need. However, I wanted to challenge myself to start with a blank whiteboard and develop the algorithm naturally and, more importantly, in a Mendix microflow. Occasionally, I field questions about the ‘limits’ of the Mendix platform, and I wanted to prove that I could develop complex algorithms using the visual code modeler that Mendix provides, and also make them efficient.

On my whiteboard, I started to map out what needed to happen with each, which helped me frame the problem as well as identify the variables needed. Let’s look at the first problem.

Problem 1: 3 characters in string shouldn’t match

I passed in the test string twice (into the SourceString and the TestString parameters). I passed in the number of characters to test (which I set up as a constant) and an offset of 1 because I wanted to test from the second character; otherwise I would always get a match on the first three because the two strings were identical. Remember, I’m trying to isolate whether a pattern is repeated. Here’s an example of what I needed:

Source String: ‘!bc1abc12’

Test String: ‘!bc1abc12’

TestSubString (pass 1): ‘!bc’ in SourceString more than once? No

TestSubString (pass 2): ‘bc1’ in SourceString more than once? Yes

I wrote a microflow to do this. It grabbed the first three characters in my source string and offset the test string start position by 1 so I didn’t test the first three characters, which would of course match. It then looped through all of the test string possibilities, and if none of them matched, it increased the start position of my source substring by 1 and repeated the process. Here is what that microflow looks like:

 

Ever wonder what a For loop nested in another For loop looks like in Mendix? There you go. This code tested accurately. No matter where the pattern was in the test string, if it was found in the source string twice it was flagged and returned accordingly.
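For anyone who would rather read text than a microflow diagram, here is a rough Python sketch of the same nested loop. It is not an export of the microflow: the function name `has_repeated_pattern` and the parameter names are made up for illustration, and two details are my reading of the description rather than something confirmed by the model. The first is that the inner loop starts at the source position plus the offset. The second is that matches are counted as a running total across all windows.

```python
def has_repeated_pattern(source, test, length, offset=0, matches_allowed=0):
    """Return True once more than `matches_allowed` window matches are found.

    Assumption: the inner loop starts at `source_start + offset`, so with
    identical strings and offset=1 a window is never compared with itself
    or with anything before it.
    """
    matches = 0  # assumption: running total across all source windows
    last_source_start = len(source) - length
    last_test_start = len(test) - length

    # Outer loop: slide a window of `length` characters over the source string.
    for source_start in range(last_source_start + 1):
        window = source[source_start:source_start + length]

        # Inner loop: compare that window with each remaining test window.
        for test_start in range(source_start + offset, last_test_start + 1):
            if test[test_start:test_start + length] == window:
                matches += 1
                if matches > matches_allowed:
                    return True
    return False


# Problem 1: the password is compared with itself, 3 characters, offset 1.
print(has_repeated_pattern("!abc1abc4", "!abc1abc4", 3, offset=1))
```

One caveat about this sketch: because the inner loop never looks behind the current source position, it suits the case where both arguments are the same string. With two different strings, a match that sits earlier in the test string than in the source string would not be visited by this indexing.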

Problem 2: 6 characters in the user name cannot be matched

This is the same problem, so I used the same code. The only differences were that I passed in two different strings and the offset was 0 because I needed to start at the very beginning.

Source String: ‘pgriffin’

Test String: ‘!3riffin#$’

TestSubString (pass 1): ‘!3riff’ in SourceString? No

TestSubString (pass 2): ‘3riffi’ in SourceString? No

TestSubString (pass 3): ‘riffin’ in SourceString? Yes

This tested and worked flawlessly using the same microflow as the 3 character test.
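The pass-by-pass listing above can be reproduced with a few lines of Python. This is only an illustration of the trace, not the microflow itself: `trace_passes` is a hypothetical helper, it leans on Python’s built-in `in` operator instead of an explicit inner loop, and it assumes the comparison is case-sensitive, which the requirement does not spell out.

```python
def trace_passes(source, test, length):
    """Yield (pass_number, window, found) for each window of `test`.

    Stops after the first window that also occurs in `source`.
    Assumption: the comparison is case-sensitive.
    """
    window_starts = range(len(test) - length + 1)
    for pass_number, start in enumerate(window_starts, start=1):
        window = test[start:start + length]
        found = window in source  # substring check stands in for the inner loop
        yield pass_number, window, found
        if found:
            return


for pass_number, window, found in trace_passes("pgriffin", "!3riffin#$", 6):
    answer = "Yes" if found else "No"
    print(f"TestSubString (pass {pass_number}): {window!r} in SourceString? {answer}")
```

For the user name and password in the example, the loop stops on the third pass, when ‘riffin’ is found in the user name.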

Most developers would call it a day and move on, but I had a sinking feeling that while this worked in under a second for these small strings, if I really wanted to make this a reusable function, it needed to be optimized. I went back to the drawing board and came up with another way to accomplish the same logic using Lists based on temp tables, iterator loops, and list operations. I ended up writing a lot more code to accomplish the same thing. The result is this:

 

I then ran performance tests. I generated a random 10,000-character string (larger than nearly anything needed for something like this), and after some cleanup, none of the sequences in the first 700 characters were duplicated anywhere in the string. The first function I built took 17 seconds to process the 10,000-character string until it found a match at character 742; the second method managed to find the same match in 6 seconds! Now, there is a mathematical reason for this, but I’ll explain it in more layman’s terms.

The first function iterated through the first sequence and then compared all sequences in the entire string against it, then progressed one position and repeated the process from there. Yes, each time it moved forward in the string, there was one less sequence to test (it doesn’t retest preceding values that were tested), but I was still asking the engine to process this logic sequentially.

In the second function, I built two lists that contained all of the possible substrings of the strings being tested. I then iterated through the first list, but took advantage of the Mendix list operation ‘filter’, which returns all values in the second list that match the iteration value, and ‘count’, which tells me how many values were returned. If that number is greater than the NumberOfMatchesAllowed parameter value, then you know you have at least one pattern recognized more than allowed; otherwise you move on to the next substring in the SourceString list and repeat the process. The list operations ‘filter’ and ‘count’ are much, much faster than asking the engine to step through the logic in serial. The in-memory Mendix engine that handles this is optimized for just this. That is why the run time dropped from 17 seconds to 6, a reduction of roughly 65%.
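Here is a minimal Python sketch of the list-based structure, with a list comprehension standing in for ‘filter’ and `len` standing in for ‘count’. The names `windows` and `exceeds_allowed_matches` are hypothetical. The sketch also assumes that when a password is compared with itself the caller passes `matches_allowed=1`, because every window matches itself once. How the real microflow accounts for that self-match is not something I can show here.

```python
def windows(text, length):
    """Return every substring of `text` that is `length` characters long."""
    return [text[i:i + length] for i in range(len(text) - length + 1)]


def exceeds_allowed_matches(source, test, length, matches_allowed):
    """Return True if any source window occurs in the test list more often
    than `matches_allowed` times."""
    source_windows = windows(source, length)
    test_windows = windows(test, length)

    for candidate in source_windows:
        # Stand-in for the 'filter' list operation: keep the equal values.
        matching = [w for w in test_windows if w == candidate]
        # Stand-in for the 'count' list operation.
        if len(matching) > matches_allowed:
            return True
    return False


# Problem 1: same string twice; one self-match per window is expected.
print(exceeds_allowed_matches("!abc1abc4", "!abc1abc4", 3, matches_allowed=1))
# Problem 2: user name against the new password; no match is allowed.
print(exceeds_allowed_matches("pgriffin", "!3riffin#$", 6, matches_allowed=0))
```

The sketch mirrors the structure only. It says nothing about the timings, which depend on how the Mendix runtime executes list operations compared with a modeled loop.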

In conclusion, I hope that this helps even one of you who stumbles across it. I was able to implement an algorithm using just the Mendix platform and then further optimize it using the tools provided. Sometimes writing more steps in a piece of code can actually optimize it!