

You’d think things would be simple, given the existence of UTF-8.
And yet, for the last 17 years, every company I’ve worked at has had some sort of horrible mess involving Unicode and non-Unicode text. Nobody recognised the problem, or, when someone did, nobody knew how to solve it (“well, the £ turns into a ? so we just replace any ? in the filename with a £”).
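To see why that “fix” can’t work, here is a small illustrative sketch. It assumes the £ was lost by encoding to plain ASCII with a replacement character, which is one common way this happens, not necessarily what those systems did. The filenames and the helpers `to_ascii_lossy` and `naive_repair` are hypothetical. Once two different characters have both collapsed to ?, no later substitution can tell them apart.

```python
# Hypothetical filenames, chosen to show how the "replace ? with £" hack fails.
names = ["price_£10.txt", "price_€10.txt", "what?.txt"]


def to_ascii_lossy(name: str) -> str:
    """Encode to ASCII, replacing every unrepresentable character with '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")


def naive_repair(name: str) -> str:
    """The 'fix' from the anecdote: assume every '?' was originally a '£'."""
    return name.replace("?", "£")


for original in names:
    mangled = to_ascii_lossy(original)
    repaired = naive_repair(mangled)
    print(f"{original!r} -> {mangled!r} -> {repaired!r}")
```

Only the first name survives the round trip. The euro sign comes back as a pound sign, and a genuine question mark is turned into £ as well.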
In my experience, things are fine while you work in a single environment, or while you control the entire data pipeline. Things quickly turn into the Tower of Babel once different systems start trying to communicate.
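A minimal sketch of what that miscommunication can look like, assuming one system writes UTF-8 and another reads the same bytes as Latin-1. Both encodings are real, but the pairing is an illustrative assumption, and `misread_as_latin1` is a hypothetical helper. Each side is behaving “correctly” by its own rules, and the text is still garbled.

```python
def misread_as_latin1(text: str) -> str:
    """Hypothetical pipeline: system A writes UTF-8, system B decodes Latin-1."""
    data = text.encode("utf-8")   # "£" becomes the two bytes C2 A3
    return data.decode("latin-1")  # each byte is read as its own character


garbled = misread_as_latin1("£5")
print(garbled)

# This particular mistake happens to be reversible, but only if you know
# exactly which wrong encoding was applied along the way.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored)
```

The first line prints `Â£5`, the classic stray “Â” in front of a pound sign, and the second restores `£5`.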