Friday 16 April 2010

Getting annoyed with Windows

This morning I woke up at about 6am and went back to sleep, but unlike yesterday I didn't get woken up by L.

In the morning I added metadata to a few more photos. After lunch I was trying to amend the hierarchical category of some photos. I had put photos from the National Folk Museum of Korea in the Gyeongbokgung category (as the National Folk Museum of Korea is inside Gyeongbokgung). However, I have now decided that it would be more sensible to have National Folk Museum of Korea as a separate category from Gyeongbokgung.

Now, while it would be easiest to use Adobe Bridge to remove and add the hierarchical keywords, Bridge has a nasty habit of adding the hierarchical keyword to the keywords, and re-arranging all existing keywords assigned to an image so they are in alphabetical order. The way I structure my keywords for Korean items is:
Translation or transcription if no translation available; Hangeul; Transcription if translation was used as first keyword; Hanja;

For example, one of my images has the following keywords:
Nongak; 농악; 農樂; Pungmul; 풍물; 風物; Pungak; 풍악; 風樂; National Folk Museum of Korea; 국립민속박물관; Gungnip Minsok Bangmulgwan; 國立民俗博物館; Asia; Korea; 한국; Hanguk; 韓國;

Writing the keywords in this format makes it relatively easy to see how each keyword is related to the Hangeul and Hanja version of that same keyword.
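The ordering scheme above can be sketched in Python. This is only an illustration: the tuples are my reading of how the example keywords group together, and `flatten_keywords` is a made-up helper name, not part of any tool mentioned in this post.

```python
# Each tuple holds the variants of ONE keyword, in the order used above.
# The grouping and the helper name are illustrative assumptions.
KEYWORD_GROUPS = [
    ("Nongak", "농악", "農樂"),
    ("National Folk Museum of Korea", "국립민속박물관",
     "Gungnip Minsok Bangmulgwan", "國立民俗博物館"),
    ("Korea", "한국", "Hanguk", "韓國"),
]


def flatten_keywords(groups):
    """Join grouped variants into one '; '-separated keyword string."""
    return "".join(f"{variant}; " for group in groups for variant in group).rstrip()


print(flatten_keywords(KEYWORD_GROUPS))
```

Keeping the variants of a keyword next to each other in the flat list is the only thing that records which Hangeul and Hanja entries belong to which English entry.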

Ideally, each keyword would be enclosed in an alt-lang block, since Korea; 한국; Hanguk; 韓國 aren't actually separate keywords, but rather different versions of the same keyword. Sadly the Dublin Core XMP specs don't allow for this, and I didn't think it was worth creating my own XMP keywords spec just to allow for this.

Anyway, Bridge re-orders the keywords. According to the spec, keywords are an unordered list (a bag), so there's not actually anything inherently wrong with Bridge doing this. But it means my keywords get all messed up, so that I can't tell which Hangeul keyword relates to which English keyword, and likewise with the Hanja and transcribed keywords.
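A small sketch of why re-ordering loses the grouping. It uses Python's default code point ordering as a stand-in; exactly how Bridge sorts is not something I know, so treat the ordering here as an assumption.

```python
# The variants of "Korea" are written together, followed by a separate keyword.
keywords = ["Korea", "한국", "Hanguk", "韓國", "Asia"]

# Sorting treats every entry as an independent keyword, so the Latin-script
# entries end up together and the Hangeul and Hanja entries are split off.
reordered = sorted(keywords)
print(reordered)
```

After sorting, nothing in the list says that 한국 and 韓國 belong with "Korea" rather than with "Asia".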

So instead of using Bridge, I decided to use exiftool.

Removing the existing hierarchical subject is easy. Just open a DOS prompt, cd to the directory containing the photos that need modifying, then run
exiftool.pl -xmp-lr:HierarchicalSubject= ./

Adding the new hierarchical subject should be just as easy. In my case I ran
exiftool.pl -xmp-lr:HierarchicalSubject="Places|Asia|Korea (한국; Hanguk; 韓國)|Seoul Special City (서울특별시; Seoul Teukbyeolsi; 서울特別市)|Jongno-gu (종로구; 鐘路區)|National Folk Museum of Korea (국립민속박물관; Gungnip Minsok Bangmulgwan; 國立民俗博物館)" ./
But all the non-Latin characters were rendered in the command prompt as question marks (?). I checked the XMP of the modified file, and that had question marks instead of the Korean characters as well.

So I did some googling, and found the entry "Special characters don't display properly in my Windows console" in the exiftool FAQ. Following this, I changed the font to TT Lucida Console and changed the code page to 65001 (UTF-8). Now when I ran the same command as earlier, I got squares instead of Korean characters. (Interestingly, when I try to copy one of the squares from the console output and paste it into this blog post, it turns back into a Korean character.)

I checked the XMP of the modified file, but the Korean characters had been saved as question marks again.
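Question marks like these are what you typically get when text is pushed through a code page that has no Korean characters. The Python sketch below only demonstrates that general effect; whether this is exactly what happened between the console, Perl and exiftool is an assumption, and `lossy_roundtrip` is a made-up helper name.

```python
def lossy_roundtrip(text, codepage):
    """Encode text to a code page, replacing anything it cannot represent."""
    return text.encode(codepage, errors="replace").decode(codepage)


sample = "Korea (한국; Hanguk; 韓國)"

# cp1252 (Windows Latin1) has no Hangeul or Hanja, so they become "?".
print(lossy_roundtrip(sample, "cp1252"))

# UTF-8 can represent every character, so the text survives unchanged.
print(lossy_roundtrip(sample, "utf-8"))
```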

According to Phil:
On some Windows systems, using UTF‑8 doesn't seem to work. In this case, a Windows character set may be the best alternative: For instance, for Windows Latin1 (cp1252) type "chcp 1252" in the console to switch to cp1252, then run exiftool with "-charset cp1252" (or -L). This same technique can be used for other supported Windows code pages.


So I looked for the code page number for Korean. According to Windows Code Pages, it seems to be 949. But when I ran chcp 949, I just got back "Invalid code page".

So I decided to try a simple-as-you-can-get Perl script instead:
#!/usr/bin/perl

use strict;
use warnings;

# Backticks run the command through the shell and capture its output.
my $output = `exiftool.pl -xmp-lr:HierarchicalSubject="Places|Asia|Korea (한국; Hanguk; 韓國)|Seoul Special City (서울특별시; Seoul Teukbyeolsi; 서울特別市)|Jongno-gu (종로구; 鐘路區)|National Folk Museum of Korea (국립민속박물관; Gungnip Minsok Bangmulgwan; 國立民俗博物館)" ./`;
print "$output\n";

Now, since all this script does is make a shell call, I didn't hold out much hope of it working. I thought it would be essentially the same as typing the command in a command prompt.

However, I ran it, and it did work! Windows is so annoying!
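Another way to keep the Korean text away from the console is exiftool's `-@ ARGFILE` option, which reads its arguments from a text file, one argument per line. The sketch below is in Python rather than Perl and only builds such a file. The file name `args.txt` and the helper names are made up for illustration, and I have not checked that this avoids the problem on the same Windows setup.

```python
from pathlib import Path

# Levels of the hierarchical subject, exactly as used in the command above.
LEVELS = [
    "Places",
    "Asia",
    "Korea (한국; Hanguk; 韓國)",
    "Seoul Special City (서울특별시; Seoul Teukbyeolsi; 서울特別市)",
    "Jongno-gu (종로구; 鐘路區)",
    "National Folk Museum of Korea (국립민속박물관; Gungnip Minsok Bangmulgwan; 國立民俗博物館)",
]


def argfile_text(levels, target="./"):
    """Return argfile contents: one exiftool argument per line, no quoting."""
    subject = "|".join(levels)
    return f"-xmp-lr:HierarchicalSubject={subject}\n{target}\n"


def write_argfile(path, levels, target="./"):
    """Save the arguments as UTF-8 so the Korean text is stored as-is."""
    Path(path).write_text(argfile_text(levels, target), encoding="utf-8")


if __name__ == "__main__":
    write_argfile("args.txt", LEVELS)  # "args.txt" is a hypothetical name
```

The file would then be passed to exiftool with something like `exiftool.pl -@ args.txt`, so the console never has to carry the non-Latin characters itself.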

I managed to update one folder of files, but when I went on to another folder, I came across another problem. When running the exiftool command to remove the Hierarchical Subject from the XMP, I got an error:
Error renaming ./HONCHE~1.JPG
It seemed that it didn't like a filename that had Korean characters in it.

I installed Windows PowerShell and tried executing the command from that, but that just brought up a DOS prompt which flashed up (looking like it had some error messages in it), then closed.

I looked at the exiftool FAQ, and saw that adding new tags to a list overwrites any existing tags by default, so there wasn't actually any need to remove the old Hierarchical Subject tag before adding the new one. So I tried just running the Perl script to add the new Hierarchical Subject, but I still got the same problem.

I tried downloading the Windows executable version of exiftool, to see if this had the same problem (though I'm pretty sure the problem is Windows, not exiftool). But I couldn't get it to process the current directory. I tried '.', './', and '.\', but all gave me a message saying Error: File not found.

Next I tried cd-ing back to the parent directory of the directory I wanted to process, and specifying the directory name. I tried several variations, including the full path to the directory, but always got No file specified.

I did some googling, and found the answer, on the exiftool home page:
In Windows, ExifTool will not process files with Unicode characters in the file name. This is due to an underlying lack of support for Unicode filenames in the Windows standard C I/O libraries


It's enough to make you wonder if the cost of a Mac might be worth it, despite the lack of power compared to a similarly priced PC.

The sun had come out, so I went in the garden for a bit. I saw a bee fly, so I tried to take a photo of it, but the camera's battery had run out. I guess I should remember to check it (or just always put it on to charge) whenever I finish using it.

So I came back inside. I decided to just rename the files with non-Latin characters in their names, run exiftool to replace the Hierarchical Subject tag, and then rename the files back to their original filenames. Luckily there was only one image with a non-Latin-script filename.
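The rename, process, rename-back workaround could be scripted roughly like this in Python. It is a sketch only: the temporary name pattern `exiftool_tmp_N` and the function names are made up, and the exiftool step itself is left as a comment.

```python
import os


def plan_renames(filenames):
    """Map each non-ASCII filename to a temporary ASCII-only name."""
    non_ascii = [name for name in filenames if not name.isascii()]
    return {
        name: f"exiftool_tmp_{index}{os.path.splitext(name)[1]}"
        for index, name in enumerate(non_ascii)
    }


def apply_renames(directory, mapping, reverse=False):
    """Rename originals to temporary names, or back again if reverse=True."""
    for original, temporary in mapping.items():
        source, dest = (temporary, original) if reverse else (original, temporary)
        os.rename(os.path.join(directory, source), os.path.join(directory, dest))


if __name__ == "__main__":
    folder = "."  # hypothetical: the folder of photos being processed
    mapping = plan_renames(os.listdir(folder))
    apply_renames(folder, mapping)
    # ... run exiftool on the folder here ...
    apply_renames(folder, mapping, reverse=True)
```

One thing to watch: unless told otherwise, exiftool keeps a backup copy with `_original` added to the filename, and a backup made during the middle step would keep the temporary name.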

After updating all the images that needed to be updated (they were in various folders, so it took quite a while), I took my camera battery off charge and went out in the back garden again.

I actually saw three bee flies in the same place. Mainly I just got photos of one bee fly that was quite co-operative and stayed on the fence. But being on the fence also meant there weren't a lot of angles I could shoot from, especially after it moved and went below some ivy. Really I wanted to try and get a photo of one feeding on a grape hyacinth.

After dinner Mauser, L and I watched an episode of Star Trek TNG. Most of the episodes haven't really been that good so far.

After that I went out in the garden again, but there weren't any insects around other than the midges, which were flying about and not really photographable. So I came back in and watched a silent Japanese film with music from 'Once Upon a Time in the West' and Jean-Jacques Perrey (the film was missing a soundtrack). The film was quite boring, but it was interesting to see that it was quite Americanised, despite being made in the 1920s.

One of the women wore one of those head-hugging hats (a bit like a thin beanie), they went to eat at a European restaurant (with European-style tables, chairs, and cutlery), and the main woman's boss's house was very Western in style as well. Actually, the only really Japanese thing about the film was the main woman's and her mother's dresses. One of the blokes in it also looked like Charlie Chaplin.

One difference from Western films was that it had quite large title boards, with lots of text on each one, whereas Western silent films tend to only have a couple of lines of text on each title board, and you have to guess what's going on from the acting.

The weather was a mixture of clouds and sun all day.

Food
Breakfast: Bowl of Maple and Pecan Crunch Cereal; Cup o' Tea.
Lunch: 2x Cheese on Toasts; Orange; 2x delee Chocolate things that Shaz and Mark made yesterday; Cup o' Tea.
Dinner: Battered Fish Portion; Peas; Potatoes; Mushrooms; Ground Black Pepper. Pudding was a Muller Light Yoghurt. Coffee.

1 comment:

Pepper said...

It is so refreshing to see that someone reads the exiftool documentation and FAQ. If I gave out gold stars, you'd get one!