Professional Perl Programming (Wrox, 2001), part 10

Shared by: Hà Nguyễn Thúy Quỳnh | File type: PDF | Pages: 120




Chapter 25: Unicode

The text outside the quotes has an embedding level of 0, whereas the text within the quotes (shown in ALL CAPITALS and assumed to be in the Arabic script) has an embedding level of 1. Each level has a default direction called the embedding direction. This direction is L (left to right) if the level number is even and R (right to left) if the level number is odd.

Every paragraph has a default embedding level, and thus a default direction associated with it. This is also called the base direction of the paragraph. For example, a paragraph that begins like this one, in the Latin script, has a default embedding level of 0, and hence its base direction is left to right.

What the bidi Algorithm Does

The bidi algorithm uses all these formatting codes and embedding levels to analyze text and decide how it should be rendered. Briefly, it works as follows:

❑ It breaks up the text into paragraphs by locating the paragraph separators. This is necessary because all the directional formatting codes are only effective within a paragraph. Furthermore, this is where the base direction is set. The rest of the algorithm treats the text on a paragraph-by-paragraph basis.
❑ The directional character types and the explicit formatting codes are used to resolve all the levels of embedding in the text.
❑ The text is then broken up into lines, and the characters are re-ordered on a line-by-line basis for rendering on the screen.

Perl and bidi

Since Perl is a language frequently used for text processing, it is natural that Perl should have bidi capabilities. There is an implementation of the bidi algorithm on Linux that can be used from Perl. It requires a C library named FriBidi, a free implementation of the bidi algorithm written by Dov Grobgeld.
A Perl module has also been written by the same author, acting as an interface to the C library. It is available as FriBidi-0.03.tar.gz from http://imagic.weizmann.ac.il/~dov/freesw/FriBidi. The FriBidi module enables us to do the following:

❑ Convert an ISO 8859-8 string to a FriBidi Unicode string:

    iso8859_8_to_unicode($string);

❑ Perform a logical to visual transformation. In other words, run the string obtained above through the bidi algorithm:

    log2vis($UniString, $optionalBaseDirection);

  This calculates the base direction if it is not passed as the second argument. It returns the re-ordered string in scalar context, and additionally returns the base direction as the second element of the list in list context.
❑ Convert the string obtained above to the ISO 8859-8 character set:

    unicode_to_iso8859_8($toDisplay);

  This makes sure that it is in a 'ready-to-display' format, assuming the terminal can display ISO 8859-8 characters (such as xterm).

❑ Translate a string from a FriBidi Unicode string to capRTL and vice versa:

    caprtl_to_unicode($capRTLString);
    unicode_to_caprtl($fribidiString);

  The capRTL format is one in which the CAPITAL LETTERS are mapped as having a strong right to left character property (RTL). This format is frequently used for illustrating bidi properties on displays with limited ability, such as ASCII-only displays.

The following is a small example to demonstrate FriBidi's capabilities. First, we create a small file named bidisample with the following text:

    THUS, SAID THE CAMEL TO THE MEN, "...there is more than one way to do it."
    AND THE MEN REPLIED "...now we see what you mean by bidi",
    RISING WITH CONTENTMENT WRIT ON THEIR FACES.

This is the code to render the above file in bidi fashion:

    #!/usr/bin/perl
    # bidirender.pl
    use warnings;
    use strict;

    use FriBidi;

    my ($uniStr, $visStr, $outStr);

    open (BIDISAMPLE, "bidisample") or die "Cannot open bidisample: $!";
    while (<BIDISAMPLE>) {
        chomp;                                  # remove line separator
        $uniStr = caprtl_to_unicode ( $_ );     # convert line to FriBidi string
        $visStr = log2vis ( $uniStr );          # run it through the bidi algorithm
        $outStr = unicode_to_caprtl ( $visStr );   # convert it back to a format that can
                                                   # be displayed on a usual ASCII terminal
        print $outStr, "\n";
    }
    close (BIDISAMPLE);

    > perl bidirender.pl
    "theres more than one way to do it..." ,NEM EHT OT LEMAC EHT DIAS SUHT
    ,"now we see what you mean by bidi..." DEILPER NEM EHT DNA
    .SECAF RIEHT NO TIRW TNEMTNETNOC HTIW GNISIR
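The even/odd rule for embedding directions, and the general idea of a logical to visual transformation, can be sketched in plain Perl without FriBidi. This is only a toy under stated assumptions: the names embedding_direction and toy_log2vis are hypothetical, the base direction is assumed to be right to left, and neutral characters, numbers, explicit formatting codes and mirroring are all ignored. It is not the bidi algorithm and not what FriBidi's log2vis does internally.

```perl
use strict;
use warnings;

# Embedding direction for a level: even levels run left to right (L),
# odd levels run right to left (R).
sub embedding_direction {
    my ($level) = @_;
    return $level % 2 == 0 ? 'L' : 'R';
}

# Toy reordering of one capRTL line (hypothetical helper, NOT the real
# bidi algorithm). CAPITALS stand for right-to-left text, lowercase for
# left-to-right text, and the base direction is assumed to be R.
sub toy_log2vis {
    my ($logical) = @_;

    # Lay the whole line out from right to left.
    my $visual = scalar reverse $logical;

    # Lowercase runs were reversed as well, so flip each run back.
    $visual =~ s/([a-z]+(?:\s+[a-z]+)*)/scalar reverse $1/ge;

    return $visual;
}

print "level 0: ", embedding_direction(0), "\n";
print "level 1: ", embedding_direction(1), "\n";
print toy_log2vis('ABC def ghi JKL'), "\n";
```

The lowercase run keeps its reading order while the capitals around it are laid out in reverse, which is the effect visible in the bidirender.pl output above.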
Perl, I18n and Unicode

Now let us take a brief look at a solution to the problem of language barriers. A more extensive view of internationalization can be found in Chapter 26. Unicode helps us out in this matter by providing a uniform way of representing the characters of all the living languages in the world. We are about to see how easy it is to enable people all over the world to understand what we are saying in their own language.

This example may be tried out by anyone with a day or two of Perl experience. Although it is in no way complete, with no error checking and no pretense of handling any real-world complexity, it demonstrates the ease with which Perl handles Unicode.

Let us imagine the following scenario. An airport wants to have information kiosks at various locations outside the arrival lounge for foreign tourists. The information needs to be displayed in Arabic, Japanese, Russian, Greek, English, Spanish, Portuguese, and a whole host of other languages. The kiosks should enable the user to view information about the city, events, weather, flight schedules, and sight-seeing tours, and also to make and confirm reservations in affiliated hotels.

Our task here is to create a Perl program that is able to handle Unicode and therefore, to an extent, solve this problem. The first thing we need to do is create a template HTML file containing a few HTML tags, but with the text replaced by text 'markers': M1, M2, M3, and so on.
We need one file for each language, in the following format (obviously, all the files should contain Unicode text encoded in UTF-8):

    M1:charset    "string corresponding to charset"
    M2:title      "string corresponding to title"
    M3:heading    "string corresponding to heading"
    M4:text       "text string"

To put the task another way, we need to write a program that takes the language name as its input and generates a file called outfile.html by filling in the template file with the strings in the language requested. The output file should be UTF-8 encoded Unicode.

This involves a few things such as installing Unicode fonts, installing a Unicode editor, creating template HTML files, writing scripts, and so on. Let us look at these step by step.

Installing Unicode Fonts

For UNIX with the X Window System and Netscape Navigator, information regarding Unicode fonts for X11 in general can be found at http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html. The latest version of the UCS fonts package is available from http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz.

For Windows with IE 5.5, Unicode fonts can be selected during installation or can be downloaded from http://www.microsoft.com/typography/multilang/default.htm.

Another good place for links to fonts is http://www.ccss.de/slovo/unifonts.htm.

Installing a Unicode Editor

For UNIX with the X Window System, Yudit is a good choice of editor that supports UTF-8. It is available from http://www.yudit.org/.

For Windows 95 and 98, Sharmahd Computing's UniPad is a good editor, available from http://www.sharmahd.com/unipad/.

For Windows NT and 2000, Notepad is able to handle Unicode.
Creating the HTML Template

Now we can go about creating the template HTML files and the string resource files. The following is simply an HTML template, called templateLeft.html, that we will use with our program:

    M2:title
    M3:heading
    M4:text

Note that this is intuitively correct for languages written from left to right, but for languages written the other way we need a modified template. For languages such as Arabic, we can simply right-justify the displayed text by using the ALIGN attribute and setting its value to RIGHT. This should be done for all text in the body of the document so that it is displayed in the correct direction, that is, from right to left. This is the template templateRight.html that we will use with right-to-left languages:

    M2:title
    M3:heading
    M4:text

This next image is the sample string resource file for the English language using UniPad:
The following image is a screenshot of a sample string file for the Arabic language. Note the rendering of Arabic text from left to right:

Processing the Resource Files

The fourth stage in our solution to the problem is creating the Perl script. This script will process the resource files it is given and generate the localized pages:

    #!/usr/bin/perl
    # Xlate.pl
    use warnings;
    use strict;

    my ($langname, $filename, $marker, $mark, $value, $wholefile,
        $thisval, $template, %valueof);

    print "Enter the language for the output html file: \n";
    $langname = lc(<STDIN>);    # get language name and turn it into lowercase
    chomp $langname;
    $filename = $langname . ".str";    # generate filename from language name

    open(LANGFILE, "$filename");

    # read in the markers & values in a hash
    while (<LANGFILE>) {
        chomp($_);
        ($marker, $value) = split("\t", $_);
        $valueof{$marker} = $value;
    }
    close(LANGFILE);

    # use the correct template
    if ($langname =~ /arabic|hebrew/) {
        $template = 'templateRight.html';
    } else {
        $template = 'templateLeft.html';
    }

    open(TMPLT, $template);
    open(OUTFILE, ">$langname.html");
    $wholefile = join('', <TMPLT>);    # slurp entire file into a string
    close TMPLT;

    foreach $mark (keys %valueof) {
        $thisval = $valueof{$mark};          # get the value related to the marker
        $wholefile =~ s/$mark/$thisval/g;    # do the replacement
    }
    print OUTFILE $wholefile;    # write out complete langname.html file
    print "output written to $langname.html \n";
    close OUTFILE;

This is the big surprise: the script looks too simple. In fact, no extra processing is required to handle Unicode.

Running the Script

Now we can execute the code in the usual way and provide the language required:

    > perl Xlate.pl
    Enter the language for the output html file:
    ENGLISH
    output written to english.html

    > perl Xlate.pl
    Enter the language for the output html file:
    Arabic
    output written to arabic.html

The Output Files

After running the script and having it produce the *.html files, we can open them in a browser and see what has been written. The following is a screenshot of the english.html file generated by the script:
Next is the arabic.html file generated by the script. Note the direction of Arabic script as rendered by the browser:

There are more examples of the same phrase written in different languages at http://www.trigeminal.com/samples/provincial.html, thanks to Michael Kaplan. A few more such sites are http://www.columbia.edu/kermit/utf8.html (hosted by the Kermit Project), http://www.unicode.org/unicode/standard/WhatIsUnicode.html (hosted by the Unicode Consortium) and http://hcs.harvard.edu/~igp/glass.html (hosted by the IGP).

This simple method of replacing text markers is still widely used. However, localizing a large web site takes much more than just being able to handle Unicode strings. Things such as cultural preferences, date formats, and currency (which are covered in Chapter 26) need to be taken into consideration. This means we should probably turn to tools such as HTML::Template, HTML::Mason, HTML::Embperl, or maybe something like XML::Parser in order to create an industrial strength multilingual site.

Work in Progress

There are still a few things about Unicode support in Perl that are under development. For instance, it is not possible right now to determine whether an arbitrary piece of data is UTF-8 encoded or not. We cannot force the encoding used when performing I/O to be anything other than UTF-8, and we will have a problem if the pattern in a match does not contain Unicode but the string to be matched at run time does. Also, the use utf8 pragma is on its way out.

In order to follow the current state of Unicode support in Perl, one can join the perl-unicode mailing list by sending a blank message to majordomo@perl.org with a subject line saying subscribe perl-unicode.
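For readers on a later Perl, the first two limitations can be approached with the Encode module and PerlIO encoding layers. The following is a minimal sketch, assuming Perl 5.8 or newer (which postdates the situation described above); the helper names looks_like_utf8 and write_as_utf8 are hypothetical, and an in-memory filehandle stands in for a real output file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Return 1 if the byte string is well-formed UTF-8, 0 otherwise.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when a CHECK flag is given
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Write a character string through an explicit UTF-8 output layer and
# return the raw bytes that were produced.
sub write_as_utf8 {
    my ($chars) = @_;
    my $buffer = '';
    open(my $out, '>:encoding(UTF-8)', \$buffer)
        or die "Cannot open in-memory buffer: $!";
    print {$out} $chars;
    close($out);
    return $buffer;
}

my $word  = "\x{C1}ntico";    # the firm name "Ántico", as characters
my $bytes = write_as_utf8($word);

printf "%d characters became %d bytes\n", length($word), length($bytes);
print "the output is well-formed UTF-8\n" if looks_like_utf8($bytes);
```

Naming a different encoding in the layer, for example ':encoding(iso-8859-1)', selects another output encoding in the same way.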
Summary

All said and done, we should not need to worry ourselves about the support of Unicode within Perl unless we really have to. All code we create will work just as well as it did before Unicode support came on the scene. So we only need to use it when dealing with foreign scripts, for example; there is more on using Perl around the world in Chapter 26.

In this chapter we have looked at the details concerning the use of Perl with non-standard characters such as symbols or some foreign writings. We began with the problems we face as Perl is used across the globe, and then we looked at what can be done to provide solutions. As a summary, we have:

❑ Seen how people have tackled the issue of providing an international coding system.
❑ Looked at how Unicode can be used in regular expressions and tried our hand at writing our own character property.
❑ Demonstrated how Perl can be used to deal with texts in languages that are written from right to left, as opposed to left to right as in English.
❑ Provided a real world example of how we can deal with language barriers across the world.
Locale and Internationalization

Not everybody understands English, and not everybody expresses dates in the standard US format, for example 2/20/01. Instead, people worldwide speak hundreds of different languages and express themselves in almost as many alphabets. Furthermore, even if they do speak the same language, they may have different ways of expressing dates, times, and so on. This chapter aims to provide us with the tools to write Perl programs in many different languages, and to see the kinds of problems we may encounter along the way.

We will take a look at the ways we can develop a multilingual application in Perl, be it for a web site or for something entirely different. Going multilingual does not only mean expressing messages in different languages, but also adapting the minor details of our site to obey the conventions of other cultures. To use the example mentioned above, it means that we must change the presentation of a date or time. Catering for other languages will always be a good thing to do, attracting more visitors to our web site.

As an example, we may be surprised by the results if we translate a personal home page from Spanish into German. Its popularity and number of hits would rise rapidly, and visitors from Germany would be especially surprised to find the page served to them in German straight away. The same goes for all multilingual web sites: the site does its best to guess the country of origin of the user, and shows messages accordingly. For instance, if we connect to Google from a machine in a Spanish domain (.es), for example melmac.ugr.es, we will immediately get the URL http://www.google.com/intl/es/, which is in Spanish. This is a neat trick that deduces the location from the Top-Level Domain (TLD) of our machine, and it will score points with the Spanish-speaking population.
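The TLD trick can be sketched in a few lines of Perl. This is an illustration only: the mapping table and the guess_language_from_host name are hypothetical, English is assumed as the fallback, and nothing here reflects how Google or any other site actually implements the redirect.

```perl
use strict;
use warnings;

# Hypothetical table: top-level domain => language code to serve.
my %language_for_tld = (
    es => 'es',
    de => 'de',
    fr => 'fr',
    pt => 'pt',
);

# Guess a language from the last label of a host name, falling back
# to English when the TLD is missing or not in the table.
sub guess_language_from_host {
    my ($host) = @_;
    my ($tld) = lc($host) =~ /\.([a-z]+)\.?$/;
    return 'en' unless defined $tld && exists $language_for_tld{$tld};
    return $language_for_tld{$tld};
}

for my $host ('melmac.ugr.es', 'www.example.com') {
    printf "%s -> /intl/%s/\n", $host, guess_language_from_host($host);
}
```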
However, localization is not as easy as showing two different pages depending on where the client comes from. It is also a matter of knowing the language and conventions that the user, or more correctly the user's client application, is immersed in. The site will then show information in such a way that the client, and therefore the user, can understand it. For example, a quantity such as 3'5 will not mean much to an English person, but it means 3.5 to a Spaniard. The application must be aware of its cultural 'location', and act accordingly.
If we live in a country whose native language is not English, undergoing this process is a must. We will need it for tasks as simple as alphabetical sorting, showing quantities, or matching regular expressions. Our program will not work correctly, at least from the point of view of the user, if we do not use the right locale settings.

The good news is that many people have already thought about this problem in depth. There are several frameworks that make writing multilingual and localized applications easier. It will come as even better news to readers of this book that Perl has excellent support for this, as we shall soon see.

In this chapter, we will see several ways of creating Perl applications for a multilingual environment, from simple tricks such as storing messages in different languages and sorting according to local usage, to recognizing a foreign language or conjugating foreign verbs.

At this early stage it is important to note that most of the programs presented in this chapter will not work on non-POSIX machines, including Win9x and Macs.

Why Go Locale?

Suppose we want to create a Spanish web site to show the number of firms investing in a particular venture. Users should be able to access the site and find out the names of the primary investors, and the site should be designed accordingly. One basic hurdle to overcome would be the need to show a list of those firms in alphabetical order.

First, we need to create a plain text file in which we can list the firms. Note that these are in no alphabetical order.
We call this file firms.txt:

    Chilindrina
    Ántico
    Cflab.org
    Cantamornings
    Cántaliping
    Andalia
    Zinzun.com

We can use a simple Perl command line that prints a sorted list on the screen (note that on Windows the special characters are not displayed properly):

    > perl -e 'print sort <>;' firms.txt
    Andalia
    Cantamornings
    Cflab.org
    Chilindrina
    Cántaliping
    Zinzun.com
    Ántico

This is not ideal though. In Spanish, 'Ch' is used as a stand-alone letter, coming after 'C' in the alphabet (although admittedly, this is slowly being phased out). The letters 'A' and 'Á' are in fact the same, the only difference being that the latter has an acute accent, so they should go together. To help solve this problem, we can find a useful tutorial included with Perl itself, by typing:

    > perldoc perllocale
This should tell us most of what we need to know about locale, the mechanism that most versions of UNIX use for taking alphabets and time/date expressions into account. Once we are happy with the basic principles and concepts, we can leave the reading for later on, and just add a simple part to our command line. When applied to the same file, and when issued from a computer with the Spanish locale installed, we can expect something like the following output:

    > perl -e 'use locale; print sort <>;' firms.txt
    Andalia
    Ántico
    Cántaliping
    Cantamornings
    Cflab.org
    Chilindrina
    Zinzun.com

There is a great likelihood that something slightly different will be displayed on our computer (if it uses Linux or another strain of UNIX), but this will allow us to gauge the differences between English, Spanish, and whatever locale sorting we have. In any case, this is not completely correct even for traditional Spanish sorting, although it is an improvement on the last attempt. The 'ch' was still not considered to be a different letter, but at least accented vowels were not alphabetized as different characters from their unaccented counterparts.

The bad ordering of the 'ch' could be a bug in the locale implementation of our system (or maybe a bug in our understanding of the local alphabetization rules, as we will see later on). It could be improved in other Spanish locale implementations, so in order to fix it, we have to delve a little deeper into what locale actually means.

The locale framework is in keeping with the saying 'When in Rome, do as the Romans do': computers have to use the local alphabet and local numbers. This is the reason why localization was included in the POSIX standard (that is, the set of functions and files that should be understood by all UNIX platforms).
There are some non-UNIX platforms, such as Windows NT (which has a slightly flawed implementation of the POSIX standard), which can use the locale set of functions, also known as NLS (for National Language Support).

For those arriving at Perl from C, all functions, constants, and so on related to locale are in the locale.h header file. This header distributes localization elements into several categories:

    LC_COLLATE     Collation or sorting.
    LC_CTYPE       Character types, distinguishing whether a character is alphanumeric or not.
    LC_MONETARY    Handling monetary amounts.
    LC_TIME        Displaying time.
    LC_MESSAGES    Messages, not used in Perl by default.

There are a few more categories that are not so widely used, whose names should give away their meanings: LC_PAPER, LC_NAME, LC_ADDRESS, LC_MEASUREMENT, and LC_IDENTIFICATION.
These constants can be used as environment variables, or else with the POSIX setlocale function. This means we have to use POSIX to have them available in our program.

Going back to our problem regarding the alphabet of the Spanish language, we can locate a locale definition on the Internet which includes the traditional sorting, that is, 'ch' after 'c'. One place in which such a locale definition can be found, amongst many others, is under the name locales-es-2.1-1mdk.noarch.rpm, at http://www.ping.be/linux/locales/. We can install it by issuing the following command:

    > rpm -Uvhf locales-es-2.1-1mdk.noarch.rpm

The f option is needed since one of the files, LC_COLLATE, conflicts with the existing settings; the file was renamed and then moved back to its original name (we need to do this, or the locale settings will not work properly in Spanish). One of the locales included in that package is es@tradicional. This should sort the 'ch' in the traditional way. However, this implementation does not work very well, as we will see.

There is a more complete list of locales for Linux available from IBM DeveloperWorks, at http://oss.software.ibm.com/developerworks/opensource/locale/download.html. This includes locales such as Maltese, which will be required later. These locales are still in beta, and thus look set to change considerably in the future.

After installing the package, we can create the following short script:

    #!/usr/bin/perl
    # sort.pl
    use warnings;
    use strict;

    use POSIX qw(LC_COLLATE setlocale strcoll);

    setlocale(LC_COLLATE, 'es_US');
    print sort {strcoll($a, $b);} <>;

When we run this program we will obtain the following output:

    > perl sort.pl firms.txt
    Andalia
    Ántico
    Cántaliping
    Cantamornings
    Cflab.org
    Chilindrina
    Zinzun.com

Before we continue, this program probably needs a bit of explanation.
Instead of using the use locale pragma, it relies on POSIX calls to do the sorting correctly. By default, locale uses the locale settings contained in the LC_* UNIX environment variables. However, in this case we want to be able to change the locale at run time. The POSIX setlocale function allows us to do this: it takes the locale category we want to change as its first argument, and the locale we want to change it to as its second argument. The locale must be valid, that is, it must correspond to a file already installed on our system. If the call works correctly, it returns the name of the locale that has been set; in this case, es_US. The reason for using this locale is that, for some strange reason, it seems to be the only one to use the 'traditional' Spanish ordering.
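Because setlocale reports success through its return value, a script can try several candidates and keep the first one the system accepts. A minimal sketch follows, assuming only the POSIX module; the helper name first_available_locale is hypothetical, and which of the Spanish locales are installed will vary from system to system (the 'C' locale is used as a last resort because it is always present).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Try each candidate in turn and return the name reported by setlocale
# for the first one that is accepted. setlocale returns undef when the
# requested locale is not installed.
sub first_available_locale {
    my ($category, @candidates) = @_;
    for my $name (@candidates) {
        my $set = setlocale($category, $name);
        return $set if defined $set;
    }
    return;    # nothing could be set
}

my $locale = first_available_locale(LC_COLLATE, 'es_US', 'es_ES', 'C');
print "Collation locale in use: $locale\n";
```

Checking the return value this way avoids silently sorting with the wrong rules when a locale package is missing.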
The problem with setting locales using this function (instead of the pragma) is that, despite what is said in the perllocale documentation, the Perl functions cmp and sort do not use it by default. For this reason we need to use an alternative form of sort, sort {expr}, and the POSIX function strcoll, which compares (or collates, hence the name) using the setting specified in the setlocale call.

Delving Deeper into Local Culture

Now, suppose we have a scenario where a user views the site and wants to add another firm to the list of investors. Using the current code, they would have to undertake this process in Spanish, which makes it unlikely that non-Spanish speaking firms will invest through the site. At first this may seem a difficult task, because it is impossible to find out where the user is from, since Spanish autonomous regions do not have their own top-level domains. It would, however, be possible if there were only one user in each region responsible for liaising through our web site.

So, how can we use those different locales? To start with, we need to find out how locales are named. Locale names usually have three, sometimes four parts, written in the following way: xx_YY.charset@dialect. Breaking this down makes it easier to understand:

❑ xx represents the language.
❑ YY represents the country.
❑ charset is the character set, such as ISO8859-15 (for Latin languages) or KOI8 (for Russian).
❑ @dialect is a particular modality, for instance a dialectal variety, such as no@nynorsk in Norwegian. It is also used for special symbols, as in ca_ES@euro (a variant of the ca_ES locale that includes the euro symbol).

Following this form, a typical Spanish locale would be es_ES.iso885915, or es_ES@euro, which can be found on any of the above-mentioned web sites if they are not already installed on the system.
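The xx_YY.charset@dialect form can be taken apart with a regular expression. The sketch below is illustrative rather than a complete grammar: the parse_locale_name helper and its hash keys are hypothetical, and it assumes a two or three letter lowercase language code and a two letter uppercase country code.

```perl
use strict;
use warnings;

# Split a locale name of the form xx_YY.charset@dialect into its parts.
# Every part except the language is optional; missing parts are undef.
# Returns nothing if the name does not fit the pattern.
sub parse_locale_name {
    my ($name) = @_;
    my ($language, $country, $charset, $dialect) =
        $name =~ /^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([^\@]+))?(?:\@(.+))?$/
        or return;
    return {
        language => $language,
        country  => $country,
        charset  => $charset,
        dialect  => $dialect,
    };
}

for my $name ('es_ES.iso885915', 'ca_ES@euro', 'no@nynorsk') {
    my $parts = parse_locale_name($name);
    print "$name:";
    for my $key (qw(language country charset dialect)) {
        print " $key=$parts->{$key}" if defined $parts->{$key};
    }
    print "\n";
}
```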
On the same page we can find locales for Basque, Catalan, and Galician, the three other official languages in Spain.

Getting back to the example, we decide to greet the visitors to our site in their own language and show them the local time (of our site) using the format they favor. The site will then show them how much money each company has invested. In order to do this, we start by creating a new plain text file called firms_money.txt:

    Ántico           1000000
    Chilindrina      2000.35
    Cflab.org        123456.7
    Cantamornings    6669876
    Cántaliping      46168.5
    Andalia          4567987
    Zinzun.com       33445

We also need to remember that a Spanish language web site can be accessed from all over the world, and that there are large Spanish-speaking populations in South America, Central America, and North America. This adds an extra issue: different countries will more than likely have different currencies. Not everyone would understand the value of a given number of pesetas offhand, and users may want to use the site to find out the amount of money invested in their own currency. A more general solution would be to consider the names of the people connecting to our site from different parts of the world.
    #!/usr/bin/perl
    # invest.pl
    use warnings;
    use strict;

    use Finance::Quote;
    use POSIX qw (localeconv setlocale LC_ALL LC_CTYPE strftime strcoll);

    my %clientMap = (
        'Jordi'      => ['ca_ES', 'Hola',  "aquí está el teu inform per"],
        'Patxi'      => ['eu_ES', 'Kaixo', "egunkaria hemen hire"],
        'Miguelanxo' => ['gl_ES', 'Olá',   "aquí está o seu relatório pra"],
        'Orlando'    => ['es_AR', 'Holá',  "aquí está tu informe para"],
        'Arnaldo'    => ['es_CO', 'Holá',  "aquí está tu informe para"],
        'Joe'        => ['en_US', 'Hi',    "here's your report for"]
    );

This script takes as its arguments the name of the client and the name of the file that contains the firms and the amounts of money invested. In our case, the file is firms_money.txt.

    # Firm Quantity
    die "Usage: $0 \n" if $#ARGV
Currency conversion is used in the lines below. The objective of these lines is to show that information about the local and international currency name and symbol is also contained in the locale settings; it is retrieved using the localeconv POSIX function. This function returns a hash reference, but we only use two of its keys: int_curr_symbol, which holds the three letter international name of the currency, such as USD (for US dollars) or ESP (for Spanish pesetas), and currency_symbol, which holds the symbol of that currency, such as $ for the US dollar (amongst others) and £ for the British pound.

    # Set up currency conversion
    my $q = Finance::Quote->new;
    $q->timeout(60);
    my $lconv = localeconv();
    chop($lconv->{int_curr_symbol});    # drop the trailing separator character
    my $conversion_rate = $q->currency("ESP", $lconv->{int_curr_symbol}) || 1.0;

    for (sort {strcoll($a,$b);} keys %investments) {
        printf ("%s %.2f ESP %.2f %s %s \n",
            $_, $investments{$_}, $investments{$_}*$conversion_rate,
            $lconv->{int_curr_symbol}, $lconv->{currency_symbol} );
    }

Now we can use the script and obtain output such as the following. Of course the exact output will depend on the country we are in, the date, the time, currency rates, and so on.
    > perl invest.pl Arnaldo firms_money.txt
    Holá Arnaldo, aquí está tu informe para 13 dic 2000 14:56:22 CET
    Andalia 4567987.00 ESP 52530160.34 USD $
    Ántico 1000000.00 ESP 11499630.00 USD $
    Cántaliping 46168.50 ESP 530920.67 USD $
    Cantamornings 6669876.00 ESP 76701106.15 USD $
    Chilindrina 2000.35 ESP 23003.28 USD $
    Cflab.org 123456.70 ESP 1419706.37 USD $
    Zinzun.com 33445.00 ESP 384605.13 USD $

    > perl invest.pl Patxi firms_money.txt
    Kaixo Patxi, egunkaria hemen hire 00-12-13 14:57:38 CET
    Andalia 4567987.00 ESP 4567987.00 ESP Pts
    Ántico 1000000.00 ESP 1000000.00 ESP Pts
    Cántaliping 46168.50 ESP 46168.50 ESP Pts
    Cantamornings 6669876.00 ESP 6669876.00 ESP Pts
    Chilindrina 2000.35 ESP 2000.35 ESP Pts
    Cflab.org 123456.70 ESP 123456.70 ESP Pts
    Zinzun.com 33445.00 ESP 33445.00 ESP Pts

    > perl invest.pl Orlando firms_money.txt
    Holá Orlando, aquí está tu informe para 13 dic 2000 14:57:59 CET
    Andalia 4567987.00 ESP 23995.27 USD $
    Ántico 1000000.00 ESP 5252.92 USD $
    Cántaliping 46168.50 ESP 242.52 USD $
    Cantamornings 6669876.00 ESP 35036.33 USD $
    Chilindrina 2000.35 ESP 10.51 USD $
    Cflab.org 123456.70 ESP 648.51 USD $
    Zinzun.com 33445.00 ESP 175.68 USD $

    > perl invest.pl Joe firms_money.txt
    Hi Joe, here's your report for Wed 13 Dec 2000 02:59:49 PM CET
    Andalia 4567987.00 ESP 24019.30 USD $
    Cantamornings 6669876.00 ESP 35071.41 USD $
    Chilindrina 2000.35 ESP 10.52 USD $
    Cflab.org 123456.70 ESP 649.16 USD $
    Cántaliping 46168.50 ESP 242.76 USD $
    Zinzun.com 33445.00 ESP 175.86 USD $
    Ántico 1000000.00 ESP 5258.18 USD $

Barring bugs such as the alphabetic ordering in Basque (whose alphabet does not include the letter 'c', so no word starting with 'c' can possibly go before 'd') and the dancing of the 'ch' from one place to another, this should look like something native speakers would feel comfortable with.

The first thing we put into this program is the Finance::Quote module (available from CPAN). The main use of this module is for stock quotes, but it is basically a front end for doing searches over the Yahoo Finance servers. This also means that we must be on-line to take advantage of it. Each time we try to get a currency conversion rate, the request is forwarded to a UserAgent, which requests the rate from Yahoo, manipulates it, and gives it back to us. It might also take a while, depending on the speed of the connection, and it could result in some additional errors, depending on the state of the Yahoo site. The program would work perfectly well without this currency conversion, but the point of including it here is to show another aspect of localization: local currencies.

The subsequent lines create a hash of arrays keyed by the (theoretically unique) names of the customers, and set a different locale, currency, and greeting according to the local language. If the name given is none of the above, the script defaults to the es_ES locale and greetings.

From that line on, the following actions take place: the locale is set, the file is read, and the system time is read and converted to local time.

The strftime function is then used. This function formats a time into a string according to a format specification; in this case, %c instructs it to use the preferred date and time format for the current locale. This format even translates weekday names, in addition to arranging the date and time in the locally favored order. For instance, Americans prefer 12-hour clocks plus AM/PM, while many Europeans opt for 24-hour clocks.

A Finance::Quote object is then created and its timeout is set, so that it will wait only 60 seconds for an answer. The object is used in the next line, where it is asked for the conversion rate between the Spanish peseta (ESP) and the local currency. If the quantities in the file were in euros, we could have used the es_ES@euro locale instead, and conversions would have been made from euros instead of from pesetas.

Finally, the hash containing the firm names and investments is printed in three columns: name, investment in pesetas, and investment converted into the local currency. It should be noted, however, that the period (.) is used as the decimal separator in English-speaking countries, whilst the comma is used in Latin countries.

We could even mix and match several settings; for instance, in the case of our friend Alberto, who lives in Miami, we would have used en_US for currency and numbers, and es_MX (for México) or es_CU (Cuba) for sorting:

    setlocale(LC_COLLATE, 'es_MX');
    setlocale(LC_NUMERIC, 'en_US');
    setlocale(LC_MONETARY, 'en_US');
    setlocale(LC_TIME, 'en_US');
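
The effect of setting the categories separately can be checked with a short sketch like the one below. It assumes nothing about which locales are installed: the set_category helper is a hypothetical wrapper, added for this example, that falls back to the portable C locale when setlocale reports failure.

```perl
#!/usr/bin/perl
# categories.pl - sketch: setting locale categories independently
use warnings;
use strict;
use POSIX qw(setlocale strftime localeconv
             LC_COLLATE LC_NUMERIC LC_MONETARY LC_TIME);

# setlocale returns a false value when the requested locale is not
# installed, so fall back to the always-available "C" locale.
sub set_category {
    my ($category, $wanted) = @_;
    my $got = setlocale($category, $wanted);
    $got = setlocale($category, 'C') unless $got;
    return $got;
}

# Same combination as in the text: Mexican Spanish collation,
# US conventions for numbers, money, and dates.
my %in_use = (
    collate  => set_category(LC_COLLATE,  'es_MX'),
    numeric  => set_category(LC_NUMERIC,  'en_US'),
    monetary => set_category(LC_MONETARY, 'en_US'),
    time     => set_category(LC_TIME,     'en_US'),
);
print "$_: $in_use{$_}\n" for sort keys %in_use;

# %c asks strftime for the preferred date and time layout of LC_TIME.
my $now = strftime('%c', localtime);
print "$now\n";

# localeconv reports the numeric and monetary conventions in effect.
my $conv = localeconv();
print "decimal point: '$conv->{decimal_point}'\n";
```

On a system without the es_MX or en_US locales, every category simply reports C, which is a quick way of finding out what is actually installed.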

We will probably be better off using the es_US locale, which can be downloaded from one of the above-mentioned pages, if it is not already on the system.

However, there are several things that may cause problems. Every time a new language shows up, we will have to go back to the original code, grok what was there, and modify it a little. What would happen if, all of a sudden, the user needed another phrase? Would they have to modify the whole hash table they had created from scratch? This is why we start to look for something better: a whole framework for the internationalization of programs. The idea would be to have a way to store program messages in several languages, and retrieve them using a key, preferably the short version of the message. In reality there are two of them:

❑ gettext: a GNU standard tool for internationalization. It is widely used by many GNU applications. The current version at the time of writing is 0.10.35. It is directly available from http://www.gnu.org/software/gettext/gettext.html and is supported by several tools, and an Emacs mode. For more information regarding this subject, refer to a book such as Professional Linux Programming from Wrox Press, ISBN 1861003013.

❑ Locale::Maketext: a complete and purely Perl-based solution. At the time of writing the latest version is 0.18. The documentation (available by typing > perldoc Locale::Maketext, after installation) includes a synopsis. It is important to note that at present, Locale::Maketext is still in its early stages.

For the purposes of our site, we have decided upon gettext, since it has very good support in Perl. The first thing we have to do is to create a so-called Portable Object (PO) file, which contains the information needed to translate messages.
The main elements of PO files are the keywords msgid and msgstr, which introduce the original string (in this case, Spanish) and the translated string, respectively. Such a file can be created with a text editor (or Emacs with PO mode) in this form:

    msgid "Hola"
    msgstr "Olá"

A set of these messages in a file, along with other information such as comments and context, is called a catalogue, and corresponds to a domain, which is usually a language. For instance, the file above could be saved as GA.po, for the Galician language.

There are also a couple of dedicated editors we can use to edit PO files. POedit, which is available from http://www.volny.cz/v.slavik/poedit/, has a simple-to-use interface based on the GTK library. As an alternative, if we favor the KDE desktop environment, KBabel is available as a part of the KDE software development kit, which can be obtained from http://www.kde.org.

For the time being, we opt for the Perl way, and choose to use Locale::PO, an object-oriented class for creating PO files. The following program is written using this module:

    #!/usr/bin/perl
    # pocreate.pl
    use warnings;
    use strict;
    use Locale::PO;

    my $i;
    my %hash = (
        "EN" => ['Hello', "here's your report for"],
        "EU" => ['Kaixo', "egunkaria hemen hire"],
        "CA" => ['Hola', "aquí está el teu inform per"],
        "GA" => ['Olá', "aquí está o seu relatório pra"],
        "FR" => ['Salut', "voici votre raport pour"],
        "DE" => ['Hallo', "ist hier ihr Report für"],
        "IT" => ['Ciao', "qui è il vostro rapporto per"]
    );
    my @orig = ("Hola", "aquí está tu informe para");

For each element in this hash, a PO file is created. The file is built from a reference to an array of Locale::PO objects, whose main elements are a msgid, containing the key to the message, and a msgstr, the translation of the message into the language represented by the file. After the file has been saved using the Locale::PO->save_file_fromarray function, it is converted to an internal format using the MsgFormat command. We are using the Spanish strings as keys, but any other language, such as English, could be used. Theoretically, these are the files that should be handled by the team of translators in charge of localizing a program.

    for (keys %hash) {
        my @po;
        # one Locale::PO object per original message
        for ($i = 0; $i < @orig; $i++) {
            $po[$i] = Locale::PO->new();
            $po[$i]->msgid($orig[$i]);
            $po[$i]->msgstr($hash{$_}->[$i]);
        }
        Locale::PO->save_file_fromarray("$_.po",\@po);
        `MsgFormat . $_ < $_.po`;
    }

Running this program produces files similar to the following CA.po (for the Catalan language):

    msgid "Hola"
    msgstr "Hola"
    msgid "aquí está tu informe para"
    msgstr "aquí está el teu inform per"

These files can be used by any program compliant with gettext (that is, any program that uses the libgettext library). In the case of Perl, there are two modules that use them: Locale::PGetText and Locale::gettext. The main difference between them is that the first is a pure Perl implementation of gettext, using its own machine-readable file format, while the second needs to have gettext installed. In both cases, .po files have to be compiled to a machine-readable file.
We will opt for the first, Locale::PGetText, but both are perfectly valid. Indeed, a program that belongs to the Locale::PGetText module is called in the following line of the above code:

    `MsgFormat . $_ < $_.po`;
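
To make the msgid/msgstr pairing concrete, here is a deliberately simplified sketch that reads PO text into a hash and looks messages up by key. It is not how Locale::PGetText or gettext work internally (both use compiled catalogues); the parse_po and translate helpers are hypothetical names invented for this illustration, and only one-line msgid/msgstr pairs are handled.

```perl
#!/usr/bin/perl
# polookup.pl - simplified sketch of msgid/msgstr lookup
use warnings;
use strict;
use utf8;

# Parse the simplest form of PO text: one-line msgid/msgstr pairs.
# Comments, multi-line strings, and plural forms are not handled.
sub parse_po {
    my ($text) = @_;
    my (%catalogue, $msgid);
    for my $line (split /\n/, $text) {
        if ($line =~ /^msgid\s+"(.*)"\s*$/) {
            $msgid = $1;
        } elsif (defined $msgid and $line =~ /^msgstr\s+"(.*)"\s*$/) {
            $catalogue{$msgid} = $1;
            undef $msgid;
        }
    }
    return \%catalogue;
}

# Return the translation, or the key itself when none is available.
sub translate {
    my ($catalogue, $key) = @_;
    my $msgstr = $catalogue->{$key};
    return (defined $msgstr and length $msgstr) ? $msgstr : $key;
}

# An English catalogue keyed on the Spanish originals.
my $po_text = <<'END_PO';
msgid "Hola"
msgstr "Hello"
msgid "aquí está tu informe para"
msgstr "here's your report for"
END_PO

my $catalogue = parse_po($po_text);
print translate($catalogue, 'Hola'), " Joe\n";
print translate($catalogue, 'Gracias'), "\n";   # no entry: key comes back
```

Falling back to the key when no translation exists means that a missing entry shows up as untranslated text rather than as an empty string.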