Mapping the Similarity of Speech Accents

I've been working on a fun mapping project using data provided by George Mason University's Speech Accent Archive.

Each subject recorded the same paragraph of text containing a wide variety of English language sounds:

Please call Stella. Ask her to bring these things with her from the store: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.

You can play around with the prototype here: https://jonathantweedy.com/projects/SpeechAccents/map/

The map shows a pin for each sample at the subject's birthplace. I would like to expand it to visualize birthplace relative to current residence, age, length of exposure to English, etc., but for now it's just a fun way to click around and hear accents from different parts of the world.

For now, a very sloppy comparison score is generated for each unique pair of speech samples, and you can view these similarities by clicking "Compare with Color" on any map pin. The selected pin turns black, and the rest of the pins are shaded on a grayscale, with darker pins being more similar to the selected sample. This comparison was done using PHP's similar_text() function and is, for all practical purposes, useless.

I would like to replace it with a significantly better comparison algorithm that takes phonemic similarity into account to produce more accurate accent similarity scores between all of these samples. I've found resources for doing so, but COVID, a sick dog, and an uptick in work over 2020 threw a wrench in that, so I'm finally revisiting it during some downtime over the holidays. If I succeed, the grayscale color coding on the map will become much more interesting and informative.
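
To give a rough idea of how the pairwise scoring and the grayscale coloring fit together, here is a minimal Python sketch. It is not the code behind the map: it uses Python's difflib.SequenceMatcher as a stand-in for PHP's similar_text() (the two do not produce identical scores), and the sample IDs, the transcription strings, the function names, and the linear score-to-gray mapping are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample IDs and transcription snippets, invented for this sketch.
samples = {
    "sample_a": "pliz kol stela",
    "sample_b": "plis kal stela",
    "sample_c": "bliz gol sdella",
}


def similarity(a: str, b: str) -> float:
    """Crude string similarity in the range 0..1 (stand-in for similar_text)."""
    return SequenceMatcher(None, a, b).ratio()


def pairwise_scores(texts: dict[str, str]) -> dict[frozenset, float]:
    """Score every unique pair once; the key is the unordered pair of IDs."""
    return {
        frozenset((i, j)): similarity(texts[i], texts[j])
        for i, j in combinations(texts, 2)
    }


def gray_for(score: float) -> str:
    """Map a score to a hex gray: 1.0 gives black, 0.0 gives white."""
    clamped = max(0.0, min(1.0, score))
    level = round(255 * (1.0 - clamped))
    return f"#{level:02x}{level:02x}{level:02x}"


def colors_for_selection(selected: str, texts: dict[str, str],
                         scores: dict[frozenset, float]) -> dict[str, str]:
    """Selected pin is black; every other pin is darker the more similar it is."""
    colors = {selected: "#000000"}
    for other in texts:
        if other != selected:
            colors[other] = gray_for(scores[frozenset((selected, other))])
    return colors


scores = pairwise_scores(samples)
pin_colors = colors_for_selection("sample_a", samples, scores)
for sample_id, color in pin_colors.items():
    print(sample_id, color)
```

Scoring each unordered pair once keeps the table at n(n-1)/2 entries, and a phoneme-aware comparison could later be swapped in by replacing only the similarity() function.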