StringPairReplacer Example

From Fmepedia

StringPairReplacer was always a bit mysterious to me. I couldn't figure out what it could be used for. Our Workbench Help gives a wonderful example of making "sassy" out of "bobby", but why would anyone want to make such a conversion?

Here is the story of how I discovered how great and useful StringPairReplacer is.

Once my dear wife found a small contract - she had to convert a database in Russian (that is, with Cyrillic field names and field contents) to an Excel file adapted for English-speaking users. This included translating the field names and transliterating the contents, that is, expressing Russian words with Latin characters (for example, "Москва" becomes "Moskva").

The database itself deserves a few words. This is what we can call "real world" data - 7 million records, 170 columns per record, hundreds of characters in some fields.

My part was simple - I had to transliterate the data, and I was sure it would not be difficult. This task is very common among Russian-speaking people around the world, and there are plenty of programs on the Internet that do this kind of transformation, so I thought it would be a quick and easy exercise. Well, it wasn't. It's hard to imagine how many ways programmers can get it wrong. I admit that all the programs work well with a few lines of text - handling bigger datasets is the problem. The most common issue is a file size limit. It seems that the whole dataset is read into a single variable of varchar type, which would explain limits of 32 and 64 KB, but some limits are harder to explain - 10,000 characters, for example. Other programs work too slowly: 4-5 hours for 200-300 KB of test data is not acceptable at all when the total size is measured in megabytes.

I spent two evenings downloading and testing free programs, shareware programs, and evaluation versions of paid programs - no luck, our dataset was simply way too heavy for all of them. Being well over budget, I started to think about alternatives, such as writing our own application or cutting the data into hundreds of smaller pieces, but then suddenly a better idea struck me - FME! That mysterious transformer, StringPairReplacer, was exactly what I needed!

Now I wonder why I didn't start with the best tool for data transformation.

The setup is as simple as placing the transformer and typing the replacement string parameter (see the attached translitparameter.png).

Then we pass features through this transformer and get the transliterated text. Could anything be easier?
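For readers who want to see the idea outside FME, here is a minimal Python sketch of pair-based transliteration. It is not how StringPairReplacer is implemented, and it does not reproduce the parameter value used in the workspace. The pair table is an assumed, simplified Cyrillic-to-Latin scheme chosen for illustration, and the names PAIRS, build_table and transliterate are hypothetical.

```python
# Illustrative only: a simplified Cyrillic-to-Latin table (an assumption),
# not the parameter value from the workspace and not an official standard.
PAIRS = [
    ("а", "a"), ("б", "b"), ("в", "v"), ("г", "g"), ("д", "d"),
    ("е", "e"), ("ё", "yo"), ("ж", "zh"), ("з", "z"), ("и", "i"),
    ("й", "y"), ("к", "k"), ("л", "l"), ("м", "m"), ("н", "n"),
    ("о", "o"), ("п", "p"), ("р", "r"), ("с", "s"), ("т", "t"),
    ("у", "u"), ("ф", "f"), ("х", "kh"), ("ц", "ts"), ("ч", "ch"),
    ("ш", "sh"), ("щ", "shch"), ("ъ", ""), ("ы", "y"), ("ь", ""),
    ("э", "e"), ("ю", "yu"), ("я", "ya"),
]


def build_table(pairs):
    """Build a str.translate table covering lower- and upper-case letters."""
    table = {}
    for src, dst in pairs:
        table[ord(src)] = dst
        # Upper-case source letter maps to a capitalized replacement.
        table[ord(src.upper())] = dst.capitalize()
    return table


TABLE = build_table(PAIRS)


def transliterate(text):
    """Replace every Cyrillic letter in one pass; other characters pass through."""
    return text.translate(TABLE)


print(transliterate("Война и мир"))  # Voyna i mir
```

Because str.translate makes a single pass over the text, the running time grows with the length of the input rather than with the number of pairs, which is the property that matters for a dataset of this size.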

Later, I ran a standalone performance test. I took the first volume of "War and Peace" by Leo Tolstoy, which contains about 15,000 lines and a million characters. The text was transliterated in 3 seconds.

So, the transformer can be quite useful for users dealing with international data that contains diacritical or non-Latin characters. For example, German umlauts can be replaced with digraphs.
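The umlaut case can be sketched the same way. The Python snippet below is only an illustration of ordered pair replacement, not FME syntax. The pair list is an assumption (ß is not an umlaut but is often handled alongside them), and replace_pairs is a hypothetical helper.

```python
# Assumed, illustrative pairs; adjust to the convention your data requires.
UMLAUT_PAIRS = [
    ("ä", "ae"), ("ö", "oe"), ("ü", "ue"),
    ("Ä", "Ae"), ("Ö", "Oe"), ("Ü", "Ue"),
    ("ß", "ss"),
]


def replace_pairs(text, pairs):
    """Apply each (search, replacement) pair in order."""
    for search, replacement in pairs:
        text = text.replace(search, replacement)
    return text


print(replace_pairs("Müller, Österreich, Straße", UMLAUT_PAIRS))
# Mueller, Oesterreich, Strasse
```

Applying pairs one after another is safe here because none of the replacements contains a character that a later pair searches for. With overlapping pairs the order would matter.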

---

To be honest, I should say that I eventually found a program called Shtirlitz that can do all kinds of transliterations and encoding conversions and seems to be able to manage huge files, but it still can't outperform FME, especially when the transliteration is part of a bigger data transformation procedure.

Attached Files
file                     size      date
translitexample.png      3.3 kB    07/29/08
translitparameter.png    3.5 kB    07/29/08
translitworkspace.png    20.7 kB   07/29/08