Snippet: Remove accents

By email:

Hi!

I have an idea for an extension. In Poland, we use accented characters such as ęóąśłżźćń. It would be great to be able to convert them to eoaslzzcn with one click. There are probably similar accents in other languages. What should it be called? Deaccent?

ęóąśłżźćń > eoaslzzcn

Solution

Here is a PopClip snippet with my attempt at this. To install: select the whole block below with the mouse then click Install Extension when PopClip pops up.

// #popclip
// name: Deaccent
// icon: square ãa
// after: paste-result
// language: javascript
/* normalise to NFD, then remove all combining diacritic characters */
let result = popclip.input.text;
result = result.normalize("NFD");

/* debug aid: print the first UTF-16 code unit of each decomposed character, in hex */
let hexValues = [...result].map(char => char.charCodeAt(0).toString(16));
print(hexValues);

result = result.replace(/[\u0300-\u036f]/g, "");

/* deal with pairs not covered by the above; each entry is the accented letter followed by its replacement */
const pairs = ['łl', 'ŁL'];
for (const pair of pairs) {
  result = result.replaceAll(pair[0], pair[1]);
}
return result;

Demo

[Image: CleanShot 2023-01-24 at 14.30.09]

Notes

The code works by normalising the Unicode characters into their canonical decomposition (NFD), then removing all combining diacritical marks (U+0300 to U+036F) from the string. The ł character has no canonical decomposition into a base letter plus a combining diacritic, so I had to add that mapping manually. You should be able to see where to modify the code to add other character pairs if needed.
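For anyone who wants to try the same idea outside PopClip, here is a minimal standalone sketch of the steps described above. The function name `deaccent` and the `extraPairs` parameter are made up for illustration and are not part of the snippet or of PopClip; the only built-in calls assumed are `String.prototype.normalize`, `replace`, `split` and `join`.

```javascript
// Sketch only: a plain-JavaScript version of the same technique.
// `deaccent` and `extraPairs` are hypothetical names chosen for this example.
function deaccent(text, extraPairs = ["łl", "ŁL"]) {
  // Step 1: canonical decomposition, e.g. "ó" becomes "o" + U+0301.
  let result = text.normalize("NFD");

  // Step 2: drop everything in the Combining Diacritical Marks block.
  result = result.replace(/[\u0300-\u036f]/g, "");

  // Step 3: handle letters that do not decompose, such as ł.
  // Each entry is two characters: the letter to find, then its replacement.
  for (const pair of extraPairs) {
    result = result.split(pair[0]).join(pair[1]);
  }
  return result;
}

console.log(deaccent("ęóąśłżźćń"));
```

To cover another letter that survives normalisation, pass a longer list as the second argument, in the same two-character format the snippet uses.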

EDIT:

I just realised the existing Slugify extension already does this (along with other side effects).