Friday, December 19, 2014

Oxygen and Processing Greek Manuscript Data

While collecting data from Codex Alexandrinus, I was discouraged by the lack of Unicode support in Windows applications. In case there are others out there wishing to do the same thing, here are a few tips based on my own experience.

First, you need a good editor. Since I was transcribing the New Testament into XML files, I needed an XML editor that would work well with Unicode Greek. I ended up selecting the Oxygen IDE, which I found very solid; it has everything a manuscript editor would want in an XML editor, it is affordable, and it is easy to use.

Second, I processed those XML files primarily with the Perl programming language. I wouldn't recommend Perl to anyone without a programming background (Perl code can look a bit cryptic), but for those who already program, it handled Unicode data without trouble.

Third, you may want to consider working in a Linux environment. If you have a Windows PC, that is not a problem: Oracle VM VirtualBox runs on top of your Windows operating system and lets you work in Linux (which I found to be perfectly happy with Unicode files) without replacing your existing OS. Being able to use shell tools (like grep, for example) makes searching and manipulating Unicode data very simple. There is plenty of online help available for those tools, and VirtualBox is free.
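To give a feel for the kind of shell work I mean, here is a minimal sketch of searching a transcription with grep. The file name, the XML element names, and the sample text are all made up for illustration; they are not my actual transcription format. The sample is unaccented on purpose: with accented Greek, the same letter can be stored as different byte sequences (precomposed or combining), so a literal search may need the text normalized first.

```shell
# Hypothetical sample data: a tiny XML transcription in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/john.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<book name="John">
  <verse n="1.1">εν αρχη ην ο λογος</verse>
  <verse n="1.2">ουτος ην εν αρχη</verse>
  <verse n="1.3">παντα δι αυτου εγενετο</verse>
</book>
EOF

# Count the lines containing the literal string αρχη.
# -F treats the pattern as a fixed string; -c prints the number of matching lines.
matches=$(grep -F -c 'αρχη' "$workdir/john.xml")
echo "$matches"

# Remove the temporary sample data.
rm -r "$workdir"
```

Dropping the -c option prints the matching lines themselves instead of the count, which is handy for a quick look at every place a word occurs.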

Altogether, these elements combine to provide a powerful work environment for manuscript data processing.
