Programming Divertimentos

README

This article presents a Java implementation of a "legible word scrambler" and contrasts it with an equivalent Kotlin implementation that showcases Kotlin's strengths. The reader is assumed to be proficient with Java.

Github Repository

What's in a Word?

A word is a sequence of letters (duh!). A letter is an alphabetic character coming in various flavors:

lowercase
UPPERCASE
Àcçëntúãtêd

A legible scrambled word is a modified word where

The first and last letters are left undisturbed
The inner letters are shuffled so some or all of them change their positions

We'll be inmepltmneig a leglibe wrod sramecblr tcwie (oh, yaeh!). Frsit in Jvaa, for the reaedr to conecnt and tehn in Koitln for the radeer to be enlgtnieehd.

More on Scrambled Words

A legible scrambled word must retain its first and last letters and randomly shuffle its inner letters at will...

... but restrictions apply:

It must be at least four letters in length for scrambling to make sense
Inner letters must contain at least two distinct letters

To ascertain the four-letter rule, consider the following words:

Word	Comment
I	No inner letters to speak of
We	No inner letters to speak of
You	Only one inner letter; can't shuffle
They	Four letters; now we can shuffle! (Tehy)

Having four or more letters is not enough, though. Consider now the following words:

Good
Seed
Coool (teen spelling)

Note how, despite their being four letters or more in length, the inner letters are all the same. Such words would all scramble to themselves and are not to be processed.

The Art of Shuffling

Shuffling relies on a randomizer. However, and especially for short words, a randomizer may occasionally yield the same, original inner letter ordering. This can be quite annoying for our goal. Thus, shuffling must be repeated until a different inner letter order is produced

Baetnigs wlil cntinoue utinl morael imporevs

For long words, on the other hand, too much shuffling may cause the scrambled word to become unreadable.

A string similarity metric measures how similar two strings are. Similarity scores oscillate between 0.0 (unrelated) and 1.0 (identical.) Applying the normalized Levenshtein similarity metric to the word unmistakable and a few of its scrambled forms we get:

Similarity	Legibility	Shufflings
0.83	😃	Unmistabkale Unlmistakabe Unmstakaible
0.50	😐	Utamislkabne Uniasbamktle Unmbtlaikase
0.25	😔	Uaknlibsmtae Usialbntamke Usinabmlatke
0.16	😖	Uaaltmbnksie Utambkilnsae Ualmakbnitse

An interesting improvement could be to keep shuffling until we find a scrambled form that has a Levenshtein similarity of at least 0.75 with its source word. For the sake of simplicity, we won't implement any such refinements here.

Scrambling Words within Free-form Text

Scrambling words within text requires scrambling only words while leaving all other, non-alphabetic content (whitespace, punctuation) unchanged.

Consider the following quote and its scrambled form:

🄸'🄼 🅃🅆🄾 🅆🄸🅃🄷 🄽🄰🅃🅄🅁🄴 ― 🅆🄾🄾🄳🅈 🄰🄻🄻🄴🄽

🄸'🄼 🅃🅆🄾 🅆🅃🄸🄷 🄽🄰🅃🅁🅄🄴 ― 🅆🄾🄳🄾🅈 🄰🄴🄻🄻🄽

Note that nothing changes length and (except for inner letters, of course) nothing else changes position either.

We can clone the original text into a character array and operate on this copy overwriting only the inner letter word regions with their scrambled incarnations.

All we need is the starting and ending position of each word; neat!

Armed with this knowledge we're ready for our Java implementation.

All code below split for readability on small-screen devices

First in Java...

Our Java implementation, in all its shining glory, is:


x
public class WordScrambler {
  // 4+ letters, 2+ distinct inners
  // \p{IsLatin} equals [a-zA-ZÀ-ÿ]
  // Range [À-ÿ]: accented letters
  private static final Pattern WORD_REGEX =
    Pattern.compile("\\p{IsLatin}(\\p{IsLatin})\\1*(?!\\1)\\p{IsLatin}\\p{IsLatin}+");
  // Scramble words within text
  public static String
    scrambleWords(String text) {
    // Copy input text to output array
    final var result =
      text.toCharArray();
    // Create randomizer for this run
    final var random = new Random();
    // Examine text looking for matches
    WORD_REGEX.matcher(text).results()
      .forEach(match -> {
        // 2nd letter
        final var start =
          match.start() + 1;
        // Penultimate letter
        final var end = match.end() - 1;
        final var length = end - start;
        do {
          // Shuffle inner letter array
          for (var i = start;i < end;i++){
            // Choose random inner index
            final var rndIdx = start +
              random.nextInt(length);
            // Swap current/random chars
            final var save = result[i];
            result[i] = result[rndIdx];
            result[i] = save;
          }
          // Ensure shuffling took place!
        } while (
            IntStream.range(start, end)
              .allMatch(i ->
                result[i] ==
                  text.charAt(i))
          ); // do/while
      }); // forEach
    // Return scrambled text as string
    return new String(result);
  }
}

To the trained Java eye, the above code should be self-explanatory (or so we hope 😉)

We offer plenty of clarifying commentary below so read on!

...Then in Kotlin

Having shown the Java implementation, the Kotlin one can now be presented in all its dazzling splendor:


xxxxxxxxxx
package wscrambler
import java.io.File
// 4+ Latin letters, 2+ distinct inners
private val WORD_REGEX =
  """\p{IsLatin}(\p{IsLatin})\1*(?!\1)\p{IsLatin}\p{IsLatin}+"""
    .toRegex()
// Scramble words within text
fun scrambleWords(text: String): String{
  // Copy input text to output array
  val result = text.toCharArray()
  // Examine text looking for matches
  WORD_REGEX.findAll(text)
    .forEach { match->
      // Define range of inner letters
      val range = match.range.first + 1
        until match.range.last
      do {
        // Shuffle inner letter array
        for (i in range) {
          // Choose random region index
          val rndIdx = range.random()
         // Swap current/random chars
          result[rndIdx]=result[i].also {
              result[i] = result[rndIdx]
          }
        }
          // Ensure shuffling took place!
      } while (range.all {
          result[it] == text[it]
        })
    }
  // Return scrambled text as string
  return String(result)
}

Again, to the trained Java eye, the above code should be readily understandable. Anything obscure in either implementation should hopefully be clarified by the explanations below.

Kotlin is Familiar

Seasoned Java developers should have little difficulty parsing the above Kotlin code even if a few constructs don't have obvious Java counterparts.

This is by design: Kotlin was conceived to depart as little as possible from established Java syntax. This is also true of API's: JVM Kotlin builds upon familiar Java API's while retaining compatibility and enriching them in intuitive ways.

Thus, when we say:


xxxxxxxxxx
val reader = File(filename).reader()

we're actually implying:


xxxxxxxxxx
val reader: java.io.InputStreamReader =
 /*new*/ java.io.File(filename).reader()

which is Kotlinese for Java's:


xxxxxxxxxx
final var reader =
  new InputStreamReader(
    new FileInputStream(
      new File(filename)));

Kotlin is Compact

Our complete, commented Java implementation is 84 lines while the Kotlin one is a mere 54.

Unlike some other languages, such brevity is not achieved at the expense of readability; quite the contrary. It could be argued that, to the casual reader, the Kotlin version is probably easier to follow than the Java one.

While Java lambdas allow for concise, crisp code (often one-liners), things can get complicated as shown below.

Compare the following Kotlin code (stolen from our scrambler's main method):


xxxxxxxxxx
// Collect readers from args/stdin
val readers =
  if (args.isNotEmpty())
    args.map { File(it).reader() }
  else
    listOf(System.`in`.reader())
// Swallow all readers into one string
val content =
  readers.joinToString("\n") {
    // Read all file as a string
    it.readText()
  }

and its Java counterpart:


xxxxxxxxxx
// Collect readers from args/stdin
final Stream<BufferedReader> readers;
if (args.length > 0) {
  readers = Arrays.stream(args)
    .map(filename -> {
      try {
        return new BufferedReader(
          new FileReader(filename));
      } catch (Exception e) {
        // No lambda checked exceptions
        throw new RuntimeException(e);
      }
    });
} else {
  readers = Stream.of(
    new BufferedReader(
     new InputStreamReader(System.in)));
}
// Swallow all readers into one string
final var content = readers
  .flatMap(BufferedReader::lines)
  .collect(Collectors.joining("\n"));

Here, Kotlin is much more concise (and readable) because:

Arrays have high-level methods such as isNotEmpty()
If/Else is an expression, not a statement. Thus, it can be used in assignments
Extension functions (like File.reader() and reader.readText()) enrich existing classes with handy, macro-like functionality (more on this below)
All exceptions are treated as unchecked
Common lambda patterns, like collecting results in a String, are idiomatic

👉 A note on terminology: Kotlin uniformly calls executable units functions, rather than methods. For a function to be deemed a method it must be contained in an object or a class; it must have a target. Otherwise, functions stand on their own.

(Far) Fewer Imports!

When presenting the full Java implementation above we omitted imports for brevity. We didn't have to do this with the Kotlin version: there's only one import (import java.io.File.)

The reason is that while Java's prelude (the set of classes that can be used without importing) is limited to package java.lang, Kotlin's prelude includes a carefully selected set of additional packages that covers a lot of the most commonly used classes (I/O, ranges, collections, text, etc.)

In our Java implementation, on the other hand, we require 9 imports that can be abbreviated to, at most:


xxxxxxxxxx
import java.io.*;
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

Kotlin Extension Functions

Where in Java we say:


xxxxxxxxxx
final var wordRegex =
  Pattern.compile("\\p{InLatin}{4,}")

in Kotlin, we say:


xxxxxxxxxx
val wordRegex =
  """\p{InLatin}{4,}""".toRegex()

This looks as if String possessed a toRegex() method to convert it to a regular expression (which, of course, it doesn't.)

This is an instance of extension function: a function that can be attributed to an existing class even if we don't have access to that class's source code or, as is the case with String, even if it's a system, final class!

If we wanted to implement the toRegex() extension ourselves we'd say:


xxxxxxxxxx
fun String.toRegex(): Regex {
    return Regex(Pattern.compile(this))
}

Here:

The name of the target class to be "extended" is prepended to the function name
Inside the function, this refers to the target instance of the "extended" class
The function's return type is specified by the trailing ": Regex." In Kotlin types are specified after variable names.

More on Crispness

When a function is a simple one-liner we can define it with an equals sign and do without the curly braces and the return statement. We can also omit the function's return type if it's patently obvious. The above String.toRegex() function is more idiomatically spelled as:


xxxxxxxxxx
fun String.toRegex() =
  Regex(Pattern.compile(this))

Note also that, in Kotlin, when using triple quotes around strings, we don't need to escape the contents. Thus, while in Java we have to escape:


xxxxxxxxxx
// Two backslashes
"""\\p{InLatin}{4,}"""

in Kotlin, we don't:


xxxxxxxxxx
// One backslash
"""\p{InLatin}{4,}""""""

But... We're Not Using `\p{InLatin}{4,}`

No, we aren't. When we want to ensure the inner letters contain at least 2 distinct characters then Dr. Jekyll becomes Mr. Hyde:


xxxxxxxxxx
\p{IsLatin}(\p{IsLatin})\1*(?!\1)\p{IsLatin}\p{IsLatin}+

Ouch! 😀

This regular expression is comprised of the following five parts:

\p{IsLatin}: we require the first character to be a Latin letter
(\p{IsLatin}): we require the second character to be a Latin letter as well but this time we enclose it in parentheses. This enables us to refer to this second letter (as \1) later in the same regular expression
\1*: we allow for the third character (and any subsequent characters) to be repetitions of the second letter. The second letter is referenced as \1 and the * quantifier allows it to be repeated zero or more times
(?!\1)\p{IsLatin}: the nub of our regex! We require a Latin letter such that it is not equal to the second letter ((?!\1)). This is a back-reference negation
\p{IsLatin}+: after the previous, non-equal-to-the-second letter we allow for one or more (+) trailing letters

Thus, this regular expression:

Matches...	But not...	Because...
Kotlin	C	Just 1 letter
Gödel	in	Just 2 letters
neato	Zöe	Just 3 letters
compsci	cool	All inner letters equal

Kotlin Ranges: Less Looping, More Power

Where, in our scrambleWords(String) Java method, we wrote:


xxxxxxxxxx
// Examine text looking for matches
WORD_REGEX.matcher(text).results()
  .forEach(match -> {
    // Second letter
    final var start = match.start() + 1;
    // Penultimate letter
    final var end = match.end() - 1;
    final var length = end - start;
    // ... shuffling stuff ...
  }

In Kotlin we write:


xxxxxxxxxx
// Examine text looking for matches
WORD_REGEX.findAll(text)
  .forEach { match ->
    // Define range of inner letters
    val range: IntRange =
      match.range.first + 1
        until match.range.last
    // ... shuffling stuff ...
  }

In Kotlin, ranges are first-class citizens: we can iterate over them and also treat them as lambda targets (map, filter, fold, etc.)

Thanks to Kotlin ranges the following Java code:


xxxxxxxxxx
// Shuffle inner letter array
for (var i = start; i < end; i++) {
  // Choose a random index in region
  final var rndIdx =
    start + random.nextInt(length);
  // Swap current/random chars
  final var save = result[rndIdx];
  result[rndIdx] = result[i];
  result[i] = save;
}

becomes, simply:


xxxxxxxxxx
// Shuffle inner letter array
for (i in range) {
  // Choose a random index in region
  val rndIdx: Int = range.random()
  // Swap current/random chars
  result[rndIdx] = result[i].also {
    result[i] = result[rndIdx]
  }
}

Interestingly, ranges have their own random() extension function that can be invoked without providing a randomizer. That's why our Kotlin implementation doesn't have a Random instance while the Java one does.

Note how we exploit the also extension function to simplify swapping. It may look "too idiomatic" to some, but it nicely accommodates some Kotlin showing off... 😏

Boolean Lambda Expressions

Our shuffling logic is enclosed in a do/while loop in both implementations:


xxxxxxxxxx
// Java
do {
  // ... shuffling stuff...
  // Ensure shuffling took place!
  } while(IntStream.range(start, end)
      .allMatch(i ->
        result[i] == text.charAt(i)
  }));

The multi-line while condition might look a bit unusual to some, but bear in mind the lambda inside the while is actually a boolean expression, not a statement!

Whenever all characters in the inner letter region are the same in the shuffled array and in the input text we have a false scrambling, and we must re-shuffle.

The corresponding Kotlin code is arguably simpler and more readable despite being based, too, on a boolean lambda expression:


xxxxxxxxxx
do {
  // ... shuffling stuff...
  // Ensure shuffling took place!
} while (range.all {
    result[it] == text[it] })

Here it is the implicit name assigned to the lambda parameter when one is not explicitly declared.

Note how both arrays (result) and strings (text) are uniformly indexed with the [] operator. This is also true of collections such as List and Map!

The `main` Function

A naked main function is executable from the command line. Thus, if we have:


xxxxxxxxxx
// File: Echo.kt
package example
fun main(args: Array<String>) {
  println(args.joinToString(" "))
}

We execute it with:


xxxxxxxxxx
$ kotlin example.EchoKt Testing 1 2 3...
Testing 1 2 3...

main's (synthesized) fully-qualified class name is formed concatenating:

The package name
The file name, and
The "Kt" suffix

Note the type Array<String>. Kotlin arrays are regular generic types, just like collections! They're indexed with [], as in Java, and are as efficient as their Java counterparts because, despite appearances, they are Java arrays.


xxxxxxxxxx
val words = arrayOf(
    "out", "of", "mind",
    "back", "in", "five")
println(words[2]) // prints "mind"

If a main function requires no arguments they can be omitted, and the function is still executable:


xxxxxxxxxx
fun main() = println("Greetings Earth!")

For illustration purposes here is the complete Kolin implementation of the main function for our word scrambler:


xxxxxxxxxx
// Scramble from files/stdin to stdout
fun main(args: Array<String>) {
  // Collect readers from args/stdin
  val readers =
    if (args.isNotEmpty())
      args.map { File(it).reader() }
    else
      // in: quoted because reserved
      listOf(System.`in`.reader())
  // Swallow all readers into string
  val content =
      readers.joinToString("\n") {
        // Reads entire file
        it.readText()
      }
  // Scramble words and print to stdout
  println(scrambleWords(content))
}

Packages Can Have Members

The astute reader may have noticed that, unlike in Java, in Kotlin our scrambleWords() and main() functions are not contained in a class. This is also true of the WORD_REGEX value.

Functions scrambleWords and main, as well as value WORD_REGEX, belong to their enclosing wscrambler package and, if not imported, can be referred to as:


xxxxxxxxxx
wscrambler.scrambleWords("Hey there!")

In general, a Kotlin package can directly contain objects, values, variables, functions, classes, and annotations; the entire zoo!

Executable statements are not allowed inside packages, though. They can only appear inside functions.

Kotlin Objects

We've mentioned "objects" twice so far. What on Earth is a Kotlin object?

A Kotlin object is a singleton instance that may (or may not) extend a class, implement interfaces, and have internal members.

In our example (and for the sake of simplicity) the WORD_REGEX value and the scrambleWords function belong directly to their enclosing package wscrambler.

This is something a purist may frown upon as value WORD_REGEX should be actually private to function scrambleWords (but not wastefully rebuilt on every invocation!)

One sensible way to organize things would be:


xxxxxxxxxx
package wscrambler
object WordScrambler {
  // Regex compiled only once:
  // at object initialization.
  // Inaccessible to others,
  // even in the same package.
  private val WORD_REGEX =
    """\p{IsLatin}(\p{IsLatin})\1*(?!\1)\p{IsLatin}\p{IsLatin}+"""
        .toRegex()
  fun scrambleWords(text: String):String {
    // scrambling logic goes here...
  }
} // object WordScrambler
// more package stuff ...
fun main() {
  // Function called w/qualified name
  println(WordScrambler.scrambleWords(
   "I'm two with nature ― Woody Allen"))
}

Kotlin classes can have associated companion objects holding what in Java-land we'd call static members. A companion object, however, is much more than a container for static stuff: it doesn't (have to) extend its associated class, it may implement interfaces, and it may have its own members.


xxxxxxxxxx
// Nullable constructor arg w/default
class Numberer(pattern: String? = null) {
  companion object {
    // Public constant
    const val DEF_PATTERN = "000,000"
  }
  // Private field w/fallback value (?:)
  private val formatter =
    DecimalFormat(pattern ?: DEF_PATTERN)
  fun numberLines(lines: List<String>) =
    // lines.indices: 0 until lines.size
    lines.indices.map { index ->
      formatter.format(index + 1) +
        " " + lines[index]
    }
}
fun main() {
    val numberer = Numberer("00")
    val words = listOf(
        "out", "of", "mind",
        "back", "in", "five"
    )
    // lambda w/method reference
    numberer.numberLines(words)
        .forEach(::println)
    /*
       prints:
         01 out
         02 of
         03 mind
         04 back
         05 in
         06 five
    */
}

Kotlin doesn't have the notion of static members as Java does (it doesn't need them.) It does, however, transparently play nice with Java's static members and, where they are required, annotations can be used to specify them.


xxxxxxxxxx
package wscrambler
object WordScrambler {
  private val WORD_REGEX =
    """\p{IsLatin}(\p{IsLatin})\1*(?!\1)\p{IsLatin}\p{IsLatin}+""".toRegex()
  @JvmStatic // JVM static method
  // CLI FQN: wscrambler.Scrambler
  fun main(args: Array<String>) {
    println(scrambleWords(
      args.joinToString(" "))
  }
  fun scrambleWords(text: String) {
    // Scrambling logic goes here...
  }
}

👉 Also: classes and functions are public by default (which reduces verbosity.) Classes, however, are closed (final) by default which discourages "unintended" inheritance abuse.

Conclusion

Uff! a rather long ride! You made it this far: kudos! You're on your way to becoming a fulfilled Kotlin developer.

Kotlin has gained traction in the recent years in the Android world, thanks in no small part to Google's endorsement of it as their preferred Android language. Kotlin has also gained lots of traction in the JVM backend world with Spring openly supporting its use.

Beyond the JVM, Kotlin also compiles to native binaries (via LLVM) as well as to Javascript.

What's not to love?

Code is available as a Github Repository

Wednesday, December 16, 2020