
Scientific report: "Mechanical Determination of the Constituents of German Substantive Compounds"




[Mechanical Translation, vol. 2, no. 1, July 1955; pp. 3-14]

Mechanical Determination of the Constituents of German Substantive Compounds

Erwin Reifler, Far Eastern Department, University of Washington, Seattle

[1] This paper is a revised version of my Studies in Mechanical Translation, No. 7, September 3, 1952.

The MT process comprises four distinctive sub-processes called the input, the identification of input forms, the translation process proper and the output. Initially certain linguistic phenomena seemed likely to prevent the complete mechanization of the identification process. The problem is the following.

Identification presupposes a record of things remembered, with which everything to be identified is compared. An essential feature of all MT systems will be the “mechanical memory” which corresponds to the bi-lingual dictionary plus the knowledge at the disposal of the human translator. The head entries of this memory will consist of individual free and bound forms and idiomatic sequences. All input units, whether they be words, portions of words, or groups of words, will first have to be identified with their “memory equivalents” before their “output equivalents” can be determined mechanically.

Many important languages include large numbers of compound words which, though they are mostly of low frequency, are essential for understanding the context in which they occur. These compound words are made up of a comparatively small number of constituents, many of which also occur as free forms of higher frequency. German examples of the latter are Hoch (high) and gefühl (feeling) in Hochgefühl (exalted feeling) and mittag (noon) in Nachmittag (afternoon); Nach (after) in Nachmittag is an example of a very high frequency constituent.

It is natural to think of economizing coding and access time by excluding large and, in fact, continuously increasing numbers of compounds from the mechanical memory, and adding instead the comparatively few constituents which are productive (that is, are found in more than one compound) and do not occur as free forms. An example is German seitig (-sided) in einseitig, zweiseitig, etc. (one-, two-sided, etc.). Constituents which also occur as free forms are entitled to a place in the mechanical memory a priori. Such an arrangement would permit the identification of compounds by means of the mechanical identification of their constituents. This would result in a welcome reduction of the size of the mechanical memory. It is true that the matching of each compound would be replaced by the matching of its two or more constituents, and the design of the matching mechanism would have to include provisions for the dissection of compounds into their constituents. Nevertheless, because of the comparatively low frequency of most compounds, dissection would not be very frequent and would be amply compensated for by the reduction in the size of the mechanical memory and the resulting decrease in access time.

There are, however, two problems which complicate the situation. One is the fact that the semantic content of many constituents differs according to whether they are bound or free forms. The second is that the conventional written form of the majority of the compounds of certain important languages lacks graphic indication of the “seam” between their constituents. Moreover, many compounds permit more than one dissection into constituents identifiable in the mechanical memory. In most cases, however, only one of these is linguistically correct, whilst those in which two dissections are linguistically permissible are extremely rare coincidences. Numerous examples demonstrating these phenomena will be found below.

These complications are such that it seemed at first impossible to create a mechanism which would supply only correct dissections in every case. No wonder Professor Victor A. Oswald, in his paper Microsemantics read at the first Conference on Mechanical Translation at M.I.T. in June 1952, stated: “We know of no mechanical process by which this could be accomplished, but an intelligent . . . pre-editor could indicate the dissection for any sort of context.” The only alternative to the intervention of a human agent seemed to be the inclusion in the mechanical memory of all compounds of the source language, an alternative hardly relished by any linguist or engineer. Nor is it humanly possible, as will be seen as soon as we consider the phenomenon of unpredictable compounding, customary
in many languages and particularly extensive in German, whose vocabulary is continuously being replenished by this method. Unpredictable compounds can not be coded into the mechanical memory. If no mechanical solution can be found for the problem of the linguistically correct determination of the constituents of compounds, then human intervention can not be eliminated from the identification process of MT.

In the following I shall show that there actually is a very simple mechanical solution to the problem presented by unpredictable compounds.

1. Ascertainable and Extemporized Substantive Compounds.

For MT purposes we distinguish two kinds of substantive compounds, which we abbreviate to “SC”:

Ascertainable SC, that is, those which are long established and, therefore, can be located in German dictionaries. Examples are Kleiderbürste, Hochachtung, Gehwerk, Nachgeschmack, Buchstabe, Hochzeit, Unternehmer, Gegenstand, etc. They could all be entered into the “capital memory.” But, as we shall see, a large number of these ascertainable SC can, without sacrificing source-target semantic clarity, be mechanically synthesized out of “memorized” constituents.

Extemporized SC, that is, those which are the result of new free composition, for example Marsuraniummonopolskandal. Their potential number is practically infinite. They can, therefore, not be entered into any memory.

2. The “X-Factor” In German Substantive Compounds.

A number of SC are characterized by what I call an “X-factor.” It is this occurrence of X-factors which presents the main difficulty in the mechanization of the determination of the constituents of SC. X denotes a letter or letter sequence which could be part of the preceding as well as of the following constituent of a SC. See the following examples, some of which have not yet occurred:

The “t” in Wachtraum, which is either Wach/traum (day dream) or Wacht/raum (guard room).

The “er” in Bluterzeugung, which might be either Blut/erzeugung (blood production) or Bluter/zeugung (the begetting of children suffering from haemophilia).

The “in” in Arbeiterinformationsstelle, which is either Arbeiter/informationsstelle (workmen information office) or Arbeiterin/formationsstelle (female worker formation office; wrong dissection).

The “ur” in Literaturkunde, which is either Literat/urkunde (man of letters’ document; wrong dissection) or Literatur/kunde (knowledge or textbook of literature).

The problem becomes more complex when two or more “X-factors” occur in one substantive compound. For example, Kulturinfiltrierung, which is either Kult/ur/infiltrierung (cult earliest infiltration), Kult/urin/filtrierung (cult urine filtering; a semantically impossible interpretation) or Kultur/infiltrierung (culture infiltration). Such coincidences are comparatively rare, for formal and semantic reasons, and some of the dissections which are possible in terms of forms listed in the dictionary are not likely to prove correct for formal and/or semantic reasons. Thus one would rather say Allmähliche Durchdringung einer Kultur or Beeinflussung einer Kultur (gradual penetration of a culture) than Kulturinfiltrierung. One will find Arbeiterinnenformationenstelle (office for the military formations of female laborers) instead of Arbeiterinformationsstelle, and Literatenurkunde (document of men of letters) instead of Literaturkunde, because Arbeiterin and Literat, though they are substantive forms listed in the German dictionary, would not be used as first constituents in these compounds. And Dichterinbrunst can only be Dichter/inbrunst (poet’s fervour), but hardly Dichterin/brunst (a poetess’ male-animal-like sexual excitement).

Nevertheless, since the only basis for the mechanical determination of the constituents of a SC is the occurrence or non-occurrence of the memory equivalent of an input form in the MT memory, such cases have to be considered in the solution of the problem.

In order to meet these conditions, a solution is suggested here for the mechanical determination of the “seam” or junction between every set of two constituents of a compound. This solution requires a special memory apparatus based on the following considerations:

The primary aim of all translation is access to the meaning of a foreign text. In MT
the primary aim is quick access to the meaning. Access time depends largely on storage economy. If in matching every input form the whole store of entries has to be scanned, then access time will play a great role. But if, through the exhaustive utilization of all distinctive graphic features of the different types of source forms (letter sequence, capital initials, occurrence or absence of space, punctuation marks, conventional diacritic marks, etc.) and through the use of a categorized storage system, the different types of source forms can be directed to specific sections of the storage system, then the dependence of access time on storage economy decreases in proportion to the increase of categorization.

Consequently, full utilization of all distinctive graphic features of the source text and a categorization on different levels of the storage system are important requirements of this scheme. In planning the contents of the memory I have given precedence to source-target semantic requirements over storage economy wherever possible.

3. The Capital Memory.

One of the facts on which this solution is based is the conventional capitalization in German of the initial letters of all forms occurring immediately after a final punctuation mark, and of the overwhelming majority of German substantive forms and of a number of other forms in all positions (for examples see below). The graphic distinctiveness thus enjoyed by German substantives not preceded by a final punctuation mark makes it easy to direct them immediately to a special memory. But since substantives also occur as first words after a final punctuation mark, certain measures have to be taken to make sure that all substantives reach their matching centre via the shortest possible route.

These measures are the dissection of compounds, economy of access time, and considerations of source-target semantics. They make it necessary to divide the German MT memory into a number of sub-memories. One of these sub-memories is the capital memory for the treatment of all substantives.

At this point, it is desirable to consider German words beginning with a capital letter in some detail.

Words With Initial Capital Letter.

The following German forms have initial capitals:

a) After final punctuation marks (period, question mark, exclamation mark, the colon preceding direct discourse) all first words.
b) In all positions:
   1. All forms of pronouns used in address instead of du, and, in letter writing, all pronouns (including du) referring to the addressed person.
   2. All adjectives derived from personal names by the suffix -isch.
   3. All adjectives, pronouns and ordinal numbers in titles and in historical and geographical names.
   4. All invariable word forms with the suffix -er, derived from place names of provinces or federal states.
   5. All substantives with the exception of certain petrified forms and certain forms used in idiomatic expressions.

All words with initial capital letter, other than demonstrative adjectives, pronouns, non-adjectival adverbs, prepositions, conjunctions and interjections, are directed to the capital memory. (In a separate paper [2] I have discussed how they are sorted and how those not directed to the capital memory can, immediately after input, be directed to their specialized memory.)

[2] This subject is treated in some detail in my chapter “The Mechanical Determination of Meaning” in Machine Translation of Languages, New York (John Wiley & Sons), 1955.

Special provision has to be made for cases of initial-capital words after final punctuation marks which may belong to more than one form class. A striking example is Dichter ist der Hahn geworden, which could mean either “The faucet has become tighter” or “The cock has become a poet.” The ambiguity is here due to antiposition which, though not a feature of the normal word order, is fairly frequent in German.

All substantives with initial capitals are treated in the capital memory. Those without initial capitals are, through the combination of this fact with their letter sequence and with the fact that they are preceded by certain types of words, highly distinctive. They can be dealt with by mechanical processes tailored to the different problems they present.

All other initial-capital words directed to the capital memory are first matched there, that
is, if they occur also as constituents of SC. If, however, no match is found there, they are passed through the remaining memories in a fixed sequence.

4. The Contents of the Capital Memory.

Certain forms are not included in the capital memory, though they may begin with a capital letter. They are:

a) Extemporized SC.
b) Ascertainable SC whose target meaning is inferable from the meaning of the target equivalents of their constituents. For example, Hochland, composed of Hoch (high) and land (land). The target meaning of Hochland is “highland.”
c) All unproductive constituents which do not occur as free forms, if all ascertainable SC in which they occur are listed in the capital memory. For example, Ohn in Ohnmacht (fainting fit).

Most capitalized forms are included in the capital memory, as follows:

a) All non-compound substantives.
b) Every SC constituent which:
   1. Occurs as a free substantive form. For example, Zeit (time) in Hochzeit (wedding).
   2. Occurs as a free, though not substantive, form, if not all of the ascertainable SC in which it occurs are entered into the capital memory or if it is still productive. An example is Hoch- in Hochzeit. Hochland will not be “memorized” because its target meaning “highland” is inferable from the meaning of the target equivalents of the constituents, “high” and “land.” An example showing the continued productivity of such forms is “Gross” in Grossneptunien (the world empire on the planet Neptune).
   3. Does not occur as a free form, if not all of the SC in which it occurs are “memorized” or if it is still productive. This rule takes care of all compounding forms such as Geschichts (history) in Geschichtsunterricht (teaching of history), or Ur in Ureinwohner meaning “aborigine” (this Ur- is not of the same origin as the free substantive form Ur denoting the European buffalo) as against Ohn in Ohnmacht.
c) All ascertainable SC whose target meanings cannot be inferred from the meanings of the target equivalents of their constituents because the juxtaposition of those meanings:
   1. does not make sense. For example, Mitgift (dowry), composed of mit (with) and Gift (poison).
   2. makes the wrong sense. For example, Hochzeit, composed of hoch (high) and Zeit (time), together “high time,” but actually meaning “wedding” or “nuptials.” An example showing that the difference can sometimes be very great is Unternehmer, composed of unter, meaning “under,” and Nehmer, meaning “taker”; the combined form actually means “contractor” or “employer,” not “undertaker.”
   3. permits multiple interpretation because of the multiple meanings of the target equivalent of at least one of the constituents. For example, Ein in Einverständnis may mean “in” as in Eingang (“ingoing,” that is, “entry, entrance”) or “one” as in Einklang (“unison”). In Einverständnis (agreement) it means “one.”

5. Source-Target Semantics in the Planning of the Capital Memory.

The rules stated and exemplified in 4 and especially in 4c will prevent a large number of potential source-target ambiguities and nonsensical target results. But there is another potential cause of source-target semantic difficulties. Many SC share a first or second constituent which has only two possible meanings, one characteristic of one group of the SC concerned and the other characteristic of the other group. The most satisfactory solution of this problem is as follows:

a) If the target meanings of all SC involved can be inferred from the meanings of the target equivalents of both their constituents, then we enter the smaller one of the two groups of SC into the memory unless the constituent or constituents concerned are still productive in one of their two meanings. If both groups happen to have an equal number of members, then we choose either one or the other group for “memorization.”
b) If the target meanings of one group cannot
be inferred from the meanings of the target equivalents of both their two constituents, then this group is entered.
c) In all these cases we enter the two constituents of that group of SC which are not “memorized,” and the constituent which both groups share is entered into the capital memory with that meaning in the first position it has in that group of SC which are not “memorized” (see e). For example, Brech- in Brecheisen (break-iron, i.e., crowbar) and Brechstange (break-stick, i.e., crowbar), etc., means “break,” whereas in Brechdurchfall (vomit-diarrhoea), Brechweinstein (vomit-tartar, tartar emetic), etc., it means “vomit.” If the group of SC in which Brech means “break” is the smaller one, then we enter all SC of this group and enter the constituent Brech in the sense of “vomit” in the first position.
d) If, as far as such cases are concerned, a constituent also occurs as a free form (that is, if its free form is identical with its compounding form), then there are the following two possibilities:
   1. The free form has only that one of the two meanings of its compounding form which the latter has in the group of SC not entered. The treatment of this case is identical with that of a free form which has the same meaning or meanings as its graphically identical compounding form none of whose SC are entered (as, for example, the free form Arbeiter and the compounding form Arbeiter- or -arbeiter). In both these cases only the free form needs to be entered. The graphio-mechanical arrangements in the input and matching system and in the capital memory, required to make this possible, will be discussed elsewhere.
   2. The free form has both meanings of its graphically identical compounding form or it has more or entirely different meanings. (The question of the common or different origin of the free and the compounding form plays here no role whatsoever.) Here both forms have to be entered. This situation is exemplified by the free substantive form Ur, the two graphically identical composing forms Ur-1 and Ur-2 and the SC containing these composing forms. The free form Ur means “aurochs” (primitive European bison) and occurs as a constituent (Ur-1) only in one SC, Urochs (aurochs). The free form of Ur-1 belongs to the poetical style and is not commonly used. Wherever else Ur- occurs in an SC, it will be first understood to be “Ur-2.” “Extemporizers” will, therefore, avoid forming new SC with Ur-1. They will use the more common synonym Auerochs (or, rarer, Urochs) instead. Since Urochs is thus the only SC in which Ur-1 (aurochs) will occur, it will be entered into the capital memory in order to avoid confusion with the highly productive Ur-2. “Ur-2” occurs in a number of ascertainable SC and is still productive. It means “original, earliest, first.” The target meanings of one group of the ascertainable SC containing it can not be inferred from the meanings of the target equivalents of their constituents, as, for example, Urkunde (document), Urteil (judgment). Thus, as far as the problem of Ur-2 itself and the group of SC containing it is concerned, the procedure described above, especially in b, will take care of it. But for the solution of the problem presented by the contrast between Ur-2 and the free form Ur certain graphio-mechanical arrangements are necessary. These can be understood only after a description of the matching procedure has been given and they will be discussed in a separate paper. I should like to say here, however, that these graphio-mechanical arrangements and the solution of the Ur vs. Ur-2 problem based on them are remarkably simple.
e) The target meanings of extemporized SC are mostly inferable from the meanings of the target equivalents of their constituents. These constituents are not likely to carry meanings they do not have as free forms or as components of ascertainable SC. But they may carry a meaning occurring only in SC which are “memorized.” Therefore, wherever this is the case, the criterion for the choice between the two groups of compounds described in a) can not be their size, but must be the continued productivity of one of the two meanings of the constituents concerned. The group of compounds none of whose constituents is still productive will be coded into the memory. The other group will be excluded and the still productive constituent or constituents will be coded only with the meaning characteristic of this group, which is the meaning in which the constituent or constituents concerned are still productive. Also, if a group of compounds which has to be “memorized,” because the meanings of their target equivalents can not be inferred from the meanings of the target equivalents of their constituents, has a constituent which is still productive, the constituent has to be “memorized” too.

6. All Possible Types of German Substantive Constituents

We shall now break down German SC into all possible types of constituents relevant for their determination. Substantive constituents not accompanied by an “X”-factor I call “trunk” or “T,” the left trunk “LT,” the right trunk “RT.” If the left constituent contains an “X”-factor, it will be denoted by “LTX,” the right constituent containing an “X”-factor by “XRT.” If the left or right constituent occurs in the capital memory, their notation will have the prefix “P” (possible); if they do not occur, it will have the prefix “I” (impossible). Theoretically speaking, this gives us the following types of substantive constituents.

        Left        Right
I.      PLT         PRT
II.     ILT         IRT
III.    P(PLTX)     P(XPRT)
IV.     P(ILTX)     P(XIRT)
V.      I(PLTX)     I(XPRT)
VI.     I(ILTX)     I(XIRT)

Of these the left and right forms under VI drop out at once because substantive compounds which have the form “I(ILTX) plus I(XIRT)” or in which either the first constituent has the form “I(ILTX)” or the second constituent the form “I(XIRT)” are linguistically impossible in all languages. Consider, for example, the following monstrosities concocted from English material: “literatuin” (“literatu-” from “literature” and “-in” from “aspirin, insulin, etc.”) and “reecutive” (“re-” from “resumption, resource, etc.” and “-ecutive” from “executive”). “I(ILTX) plus I(XIRT)” would then be the English substantive compound “literatuin-reecutive.” If the right constituent is the possible “executive,” then we get the impossible “literatuin-executive”; if the left constituent is the possible “literature,” we would arrive at “literaturereecutive.”

7. All Possible Types of Substantive Compounds With Two Constituents.

Consequently we need consider only the first five alternatives for both the first and the second constituent. This gives us the following 25 theoretical combinations. (For semantic reasons the examples given are partly unlikely to occur.)

I.
1. PLT plus PRT. Senn-idyll: Alpine herdsman’s idyll.
2. PLT plus IRT. Senn-dustrie: An impossible compound. The trunk Dustrie from Industrie (industry) does not occur.
3. PLT plus P(XPRT). Senn-inschrift (cf. 11a): Senn, Inschrift (inscription), Schrift (writing) and also Sennin (Alpine herdswoman) occur.
4. PLT plus P(XIRT). Senn-industrie (cf. 12): Alpine herdsman’s industry. The trunk Dustrie does not occur.
5. PLT plus I(XPRT). Senn-ingabe (cf. 11b): Ingabe does not occur, but Senn, Sennin and Gabe (gift) occur.

II.
6. ILT plus PRT. Insul-halt: An impossible SC. Halt occurs but Insul does not occur.
7. ILT plus IRT. Insul-dustrie: An impossible SC. Neither the trunk Dustrie of Industrie nor the trunk Insul of Insulin occurs.
8. ILT plus P(XPRT). Insul-intoleranz (cf. 16a): Insul does not occur, but Intoleranz, Toleranz and also Insulin all occur.
9. ILT plus P(XIRT). Insul-industrie (cf. 17): An impossible SC. Both Insulin and Industrie occur, but neither Insul nor Dustrie occur.
10. ILT plus I(XPRT). Insul-ingabe (cf. 16b): Neither Insul nor Ingabe occur, but Insulin and Gabe (gift) occur.

III.
11. P(PLTX) plus PRT. a) Sennin-schrift, b) Sennin-gabe (cf. 3 and 5): Sennin, Schrift (or Gabe) all occur. Also Senn and Inschrift occur, but Ingabe does not occur.
12. P(PLTX) plus IRT. Sennin-dustrie (cf. 4): The trunk Dustrie does not occur, but both Industrie and Senn occur.
13. P(PLTX) plus P(XPRT). Sennin-inschrift: Alpine herdswoman’s inscription. But also Senn and Schrift occur, though Senninin and Ininschrift do not occur.
14. P(PLTX) plus P(XIRT). Sennin-industrie: Alpine herdswoman’s industry. Senn, Sennin and Industrie all occur, but Dustrie and Inindustrie do not occur.
15. P(PLTX) plus I(XPRT). Sennin-ingabe: An impossible SC. Senn, Sennin and Gabe occur, but neither Ingabe nor Senninin nor Iningabe occur.

IV.
16. P(ILTX) plus PRT. a) Insulin-toleranz, b) Insulin-gabe (cf. 8 and 10): Insulin tolerance or insulin gift. Intoleranz occurs, Ingabe does not occur; the important fact is, however, that Insul does not occur.
17. P(ILTX) plus IRT. Insulin-dustrie (cf. 9): An impossible SC. Both Insulin and Industrie occur, but neither Insul nor Dustrie occur.
18. P(ILTX) plus P(XPRT). Insulin-information: Insulin information. Insulin, Information and Formation all occur, but Insul, Insulinin and Ininformation do not occur.
19. P(ILTX) plus P(XIRT). Insulin-industrie: Insulin industry. Neither Insul, Dustrie, Insulinin nor Inindustrie occur.
20. P(ILTX) plus I(XPRT). Insulin-ingabe: An impossible SC. Insulin and Gabe occur, but neither Insul, Ingabe, nor Insulinin occur.

V.
21. I(PLTX) plus PRT. Steinin-schrift: Steinin does not occur, although Schrift occurs. But both Stein and Inschrift occur.
22. I(PLTX) plus IRT. Steinin-sel: Both Steinin and Sel do not occur, but Stein (stone) and Insel (island) occur.
23. I(PLTX) plus P(XPRT). Steinin-inschrift: An impossible SC. Stein, Inschrift and Schrift occur, but neither Steinin nor Ininschrift occur.
24. I(PLTX) plus P(XIRT). Steinin-insel: An impossible SC. Stein and Insel occur, but neither Steinin nor Ininsel occur.
25. I(PLTX) plus I(XPRT). Steinin-ingabe: An impossible SC. Stein and Gabe occur, but neither Steinin nor Iningabe occur.

Of these 25 combinations 2, 6, 7, 9, 15, 17, 20, 23, 24 and 25 are linguistically impossible. Of the remaining 15 combinations, 3 and 11a, 4 and 12, 5 and 11b, 8 and 16a, and 10 and 16b represent the same SC; 3 and 11a present, moreover, two possible dissections of the same SC (i.e. Senn/inschrift, Alpine herdsman’s inscription, and Sennin/schrift, Alpine herdswoman’s writing). Thus only 5, 8, 10, and 12 can be ignored. This leaves us with the following eleven possible types of SC:

1, 3, 4
11 a & b, 13, 14
16 a & b, 18, 19
21 and 22.

Of these eleven types only two types with an identical graphic form, 3 and 11a, are ambiguous. From the point of view of the matching mechanism these two types are only one type, so that only ten types remain. Thus only in one out of ten possible types will the matching mechanism have to supply a double answer. (But see “Compounds With An X-Factor,” section II, below.) In all other cases the answer will be unique. Furthermore, since all the unique answers and the one double answer are obtained in one to four matching steps, the remaining ten types present only four possible matching situations with which the design engineer has to deal. For these I refer to Section 10, below.
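The occurrence tests behind the list of combinations can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the matching mechanism the paper goes on to describe: the toy `MEMORY` set, the function name `dissections`, the lower-casing of forms, and the brute-force scan over every possible seam are all hypothetical choices made for the example. The forms in the set are those the text says "occur".

```python
# Toy stand-in for the capital memory (hypothetical data structure).
# It holds only forms that the list of 25 combinations says "occur";
# Insul, Dustrie, Ingabe, Steinin and Sel are deliberately absent.
MEMORY = {
    "senn", "sennin", "stein", "insulin",
    "schrift", "inschrift", "industrie", "insel",
    "gabe", "toleranz", "intoleranz", "idyll",
    "information", "formation",
}


def dissections(compound, memory=MEMORY):
    """Return every two-constituent split whose halves both occur in memory.

    Forms are lower-cased before lookup (an assumption of this sketch).
    """
    word = compound.lower()
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in memory and word[i:] in memory
    ]


if __name__ == "__main__":
    for sc in ["Sennidyll", "Senninschrift", "Steininschrift",
               "Insulintoleranz", "Senningabe", "Insulindustrie"]:
        print(sc, dissections(sc))
```

With this toy memory only Senninschrift (types 3 and 11a) yields two dissections, Insulindustrie (types 9 and 17) yields none, and the other examples yield exactly one, which agrees with the counts given in the text.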
  8. 10 e. reifler 8. Matching Procedure for Substantives jected to the same process until the memory equivalents of all substantive components have W hich Have A Complete Memory been located. The constituents established by this Equivalent And For Substantive process are individually translated in their original Constituents. sequence. As we have seen in 4, only free substan- All substantives not found as complete tive forms and productive substantive constitu- e ntries or determined through the matching ents are entered into the capital memory. Substan- p rocess described above appear on the target tive constituents which also occur as free, though side in their original form. In the following each completed matching not substantive, forms are entered only as com- p rocedure will be called “one matching step.” pounding forms. Thus the “substantivized” adjec- tive Rot (Das Rot der Vorhange passt nicht zur 9. Matching Procedure For Farbe der Teppiche “the red of the curtain does Mechanical Determination Of not suit the colour of the carpets”), the compound- Constituents Of All ing forms Rot (Rotstift, red crayon), -gelb- and Substantive Compounds. “grün” (das Rotgelbgrün der bolivianischen I . Left To Right Matching. Handelsflagge “the red-yellow-green of the Boli- P (PLTX) v ian merchant flag”), and M it- i n the sense of A. If RT has no memory equivalent, ( Sennin/ “co-” (Mitarbeiter, Mitbesitzer, Mitbürger, co- I RT P(PLTX) IRT worker, co-owner, co-citizen) etc., will be entered, d ustrie, Schülerin/vasion, c f. 7/12), then but not the free adjective forms rot, gelb, grün, the matching mechanism feeds back LT (Senn, h och, n or the free preposition form m it. T hese Schüler, male student) and XRT (Industrie, will be entered in their own specialized memories. I nvasion) a nd determines the memory code O n the other hand SC like M itgift a nd M ittag f or LT and XRT. would be “memorized.” P (ILTX) The capital memory is subdivided into B. 
If RT has a memory equivalent, ( Insulin/ s ections characterized by the number of com- P RT P(ILTX) PRT ponent minimal symbols (space and letter sym- toleranz, Insulin/gabe, cf. 7/16), then the bols) of entries. Thus entries with five minimal matching mechanism feeds back LT (Insul) symbols will be in the five-symbol section, en- and, tries with four symbols in the four-symbol section, I LT and so forth. Within each section the order is l.if LT has no memory equivalent, (Insul/ alphabetical. The input mechanism counts the P (XPRT) ILT P(XPRT) m inimal symbols of each form fed into it and intoleranz, Insul/ingabe, cf. 7/8,10), then directs those forms which have not previously the matching mechanism supplies the mem- been directed to other memories2 at once to the o ry code for LTX (Insulin) p lus RT (Tol- capital memory section indicated by the number eranz, Gabe). of symbols. P LT Such an arrangement will go far to cut 2 . If LT has a memory equivalent, ( Stein/ down the access time: substantives are checked P (XPRT) only against the capital memory, and within the inschrift, cf. 7/21), then the matching mech- capital memory only against memory equivalents a nism feeds back XRT (Inschrift) a nd, with the same number of letters. If the memory P LT counterpart of a substantive form does not occur a) if XRT has no memory equivalent, (Senn/ in the section characterized by the number of its I (XPRT) PLT I(XPRT) s ymbols, the matching mechanism ignores the ingabe, Wäscher/inzeichen, cf. 7/5), then l ast symbol and checks the remainder against the matching device supplies the memory the section with the next smaller number of sym- code for LTX (Sennin, Wäscherin, laun- bols. This process is repeated until the first agree- d ress) plus RT (Gabe, Zeichen, mark). ment is found. The sequence of symbols previously P LT ignored is then fed back as a new input and sub-
      b) If XRT has a memory equivalent (PLT/P(XPRT): Senn/inschrift, cf. 7/3 and 11a), then the matching mechanism has to supply two answers: the memory code for LTX plus RT (Sennin/schrift) and for LT plus XRT (Senn/inschrift).

II. Right-To-Left Matching.

Note: Left-To-Right matching presents the simpler engineering problem. Right-To-Left matching has the advantage that it tackles first the final constituent, which can only be the compounding form of an existing or non-existing (cf. “-nahme” in “Landnahme,” land taking) substantive and contains all the grammatical information there is about the SC in which it occurs.

A. If LT has no memory equivalent (ILT/P(XPRT): Insul/intoleranz, Insul/ingabe, cf. 7/10), then the matching device feeds back LTX (Insulin) and RT (Toleranz, Gabe) and determines the memory code for LTX and RT.

B. If LT has a memory equivalent (PLT/P(XIRT): Senn/industrie, Schüler/invasion, cf. 7/4), then the matching mechanism feeds back RT (Dustrie, Vasion) and,

   1. if RT has no memory equivalent (P(PLTX)/IRT: Sennin/dustrie, Schülerin/vasion, cf. 7/12), then the matching mechanism supplies the memory code for LT (Schüler, Senn) plus XRT (Invasion, Industrie).

   2. If RT has a memory equivalent (I(PLTX)/PRT: Steinin/schrift, cf. 7/21), then the matching mechanism feeds back LTX (Steinin) and,

      a) if LTX has no memory equivalent (I(PLTX)/PRT: Steinin/schrift), then the matching device supplies the memory code for LT (Stein) plus XRT (Inschrift).

      b) If LTX has a memory equivalent (P(PLTX)/PRT: Sennin/schrift, cf. 7/11), then the matching mechanism has to supply two answers: the memory code for LT plus XRT (Senn/inschrift) and for LTX plus RT (Sennin/schrift).

10. Number of Matching Steps Necessary for Mechanical Dissection of Substantive Compounds with Two Constituents.

The matching mechanism always determines first the longest memory equivalent. We are here concerned with the number of matching steps of only those SC which do not occur in the capital memory. We distinguish the following possibilities:

a) No constituent occurs in the memory.
b) Only one constituent occurs in the memory.
c) Both constituents occur in the memory.

Those with only one or no constituent occurring in the capital memory are at once directed to the output print system and put out in their source form, as are all other words not found in the memory.

For SC both of whose constituents occur in the capital memory we distinguish between:

a) Compounds without an “X”-factor.
b) Compounds with an “X”-factor.

In the following only “left-to-right” matching will be considered. The examples represent types of compounds. They need not actually occur.

Compounds Without an “X”-Factor

For compounds without an “X”-factor (e.g. Nach/geschmack, “after-taste,” Senn/idyll, “Alpine herdsman’s idyll”; cf. 7/1) we receive a unique answer after the last letter (in right-to-left order) of the second constituent (that is, the g of -geschmack and the i of -idyll) has been ignored by the matching mechanism, that is, after the first matching step. The determination of Nach- and Senn- as largest memory equivalents, that is, as first constituents, determines -geschmack and -idyll as second constituents.

Compounds With an “X”-Factor

I. Compounds Always Yielding a Unique Answer

A. After the First Matching Step

Compounds yielding a unique answer after the first matching step because the form with first trunk plus “X” (Steinin- in the following examples) does not exist.

The following facts can be ignored by the machine and the memory designers:

   1. The second trunk exists:
      Steinin-schrift (Cf. 7/21. Solution: Stein/inschrift, “stone inscription.”)
   2. The second trunk does not exist:
      Steinin-sel (Cf. 7/22. Solution: Stein/insel, “stone island.”)

B. After the Second Matching Step

Compounds yielding a unique answer after the second matching step because the second trunk (-dustrie, -vasion in the following examples) does not exist.

The following facts can be ignored by the planners:

   1. The first constituent has only one “X”-factor:
      Sennin-dustrie (Cf. 7/4. Solution: Senn/industrie, “Alpine herdsman’s industry.”)
   2. The first constituent has two “X”-factors:
      Arbeiterin-vasion (Solution: Arbeiter/invasion, “workmen’s invasion.”)

C. After the Third Matching Step

Compounds yielding a unique answer after the third matching step because the first trunk (Insul- in the following examples) does not exist:

   1. There is only one “X”-factor between the two trunks. The following facts can be ignored by the planners:
      a) The second trunk cannot have an “X”-factor prefix (-ingabe in the following example does not exist):
         Insulin-gabe (Cf. 7/16b. Solution: Insulin/gabe, “insulin gift.”)
      b) The second trunk can have an “X”-factor prefix (-intoleranz in the following example exists):
         Insulin-toleranz (Cf. 7/16a. Solution: Insulin/toleranz, “insulin tolerance.”)
   2. There are two identical “X”-factors between the two trunks. The following facts can be ignored by the planners:
      a) The second trunk (-dustrie in the following example) does not exist:
         Insulin-industrie (Cf. 7/19. Solution: Insulin/industrie, “insulin industry.”)
      b) The second trunk (-formation in the following example) exists:
         Insulin-information (Cf. 7/18. Solution: Insulin/information.)

D. After the Fourth Matching Step

Compounds yielding a unique answer after the fourth matching step because the form with “X”-factor plus second constituent (-ingabe, -inindustrie, -ininschrift in the following examples) does not exist:

   1. There is only one “X”-factor between the two trunks:
      Sennin-gabe (Cf. 7/5. Solution: Sennin/gabe, “Alpine herdswoman’s gift.”)
   2. There are two identical “X”-factors between the two trunks. The following facts can be ignored by the planners:
      a) The trunk of the second constituent (-dustrie in the following example) does not exist:
         Sennin-industrie (Cf. 7/14. Solution: Sennin/industrie, “Alpine herdswoman’s industry.”)
      b) The trunk of the second constituent (-schrift in the following example) exists:
         Sennin-inschrift (Cf. 7/13. Solution: Sennin/inschrift, “Alpine herdswoman’s inscription.”)

II. Compounds Yielding a Double Answer After the Fourth Matching Step Unless the “Ur”-Problem Solution Is Incorporated in the Matching Mechanism.

Compounds all of whose trunks (Literat and Welt in the following example) and forms with trunk plus “X”-factor as well as “X”-factor plus trunk (Literatur and Urwelt in the following example) occur in the capital memory, but whose left trunk (Literat) does not occur as a left constituent of SC, would, unless the “Ur”-problem solution (cf. 5/Db) is applied, yield a double answer after the fourth matching step.

Such compounds are, for formal and semantic reasons, rare coincidences:

   Literatur-welt:
      Solution a): Literatur/welt, world of literature (correct dissection).
      Solution b): Literat/urwelt, literary man’s primeval world (wrong dissection).

Since Literat cannot be a first constituent, the “Ur”-problem solution is applicable and a unique answer will be supplied by the matching mechanism after the third matching step: the compounding form Literat- will not be found in the capital memory.

The case of the following Russian example is similar:

   rybo-lovu
      Solution a): rybo/lovu, to a fisherman (correct dissection).
      Solution b): ryb/olovu, to the tin of the fishes (wrong dissection).

Both trunks ryb (genitive plural of ryba, fish) and -lovu (compounding form meaning “to a catcher”; cf. ptitse/lovu, to a fowler, and kryso/lovu, to a rat-catcher), and also the composing form rybo- (a trunk-plus-“X”-factor form) and the free form olovu meaning “to the tin” (an “X”-factor-plus-trunk form) will occur in the capital memory. The connective vowel -o- is an “X”-factor. But the trunk ryb cannot be a first constituent and the compounding form ryb- will, therefore, not be found in the capital memory. Consequently, the matching mechanism will supply a unique and correct answer after the third matching step.

III. Compounds to Which the “Ur”-Problem Solution Cannot Be Applied and Which, Therefore, Always Yield a Double Answer After the Fourth Matching Step.

For compounds in which two dissections are formally correct and semantically valid, the “Ur”-problem solution is not applicable. These will, therefore, always yield a double answer after the fourth matching step. Such composita are, however, extremely rare coincidences:

   1. “Sennin-schrift”
      Solution a): Sennin/schrift, Alpine herdswoman’s writing (cf. 7/11a).
      Solution b): Senn/inschrift, Alpine herdsman’s inscription (cf. 7/3).
   2. “Wacht-raum”
      Solution a): Wacht/raum, guard room.
      Solution b): Wach/traum, waking dream, daydream.

In such cases the MT mechanism will supply two alternative translations.

11. The Mechanical Dissection of Substantive Compounds With More Than Two Constituents.

The solution for the mechanical dissection of SC with two constituents includes the solution for the mechanical dissection of SC with more than two constituents. For the matching mechanism such composita are nothing but SC with two immediate constituents, namely the largest first signal sequence which has a memory equivalent, plus the rest. Once the longest first signal sequence with a memory equivalent is established, the matching mechanism feeds back the rest, and the procedure is repeated until all constituents are determined.

Let us assume that all non-compounded constituents of Grieselbärintelligenzexperiment occur in the capital memory. The first longest signal sequence with a memory equivalent established by the matching device will then be Griesel- (grizzly), and Bärintelligenzexperiment will be fed back. Note the “X”-factor -in- after Bär. Bär means “bear,” Bärin “female bear.” The first longest signal sequence now established will be Bärin, and -telligenzexperiment will be fed back. Since no portion of this rest can be found in the memory (-telligenz does not exist), the matching device will feed back Bär (cf. 9/I), locate its memory equivalent and feed back Intelligenzexperiment. It will now establish Intelligenz as the first longest signal sequence occurring in the capital memory and Experiment as the last constituent. Solution: Griesel/Bär/Intelligenz/Experiment, grizzly bear intelligence experiment.

12. Vocabulary Research: Lexical Information Required.

The solution suggested in the preceding pages for the mechanical determination of the constituents of all substantive compounds indicates the type of qualitative and quantitative lexical information required for the planning of the capital memory and the matching mechanism. The most important points of this information are:

1. How many and which non-compound substantives, substantive compounds and non-substantive forms belonging to the general language, or only to a specialized language, are eligible for the capital memory?
2. How many and which ascertainable SC can be “synthesized” without any loss in source-target semantic clarity?
3. How many signal number sections will be necessary? What will be the number of source forms in each section?
4. How many and which eligible forms are unproductive, have been productive, are still productive; are or are not “X”-factor forms; have non-distinctive, distinctive or both types of composing forms; can only occur as left constituents (cf. Lehr-), or only as right constituents (cf. -lehre, -kandidat, -nahme), or as both
(cf. Arbeiter-, -arbeiter); which forms cannot, or are not likely to, occur as constituents of proper names of source language origin (cf. Erziehung, education, Verwundung, wounding, Tisch, table, Sessel, chair, etc.); which forms only occurring as right constituents are not listed in dictionaries (cf. -nahme)?
5. In how many and which cases do the free and the non-distinctive compounding forms have the same, different, or only two meanings, one carried only by the free, the other only by the compounding form?
6. In how many and which cases does the compounding form have the same meaning in all SC in which it occurs (cf. Arbeiter-, -arbeiter); when does it have two meanings, one associated with one, the other with a second group of SC in which it occurs?
7. How many and which SC permit double dissection? To how many and which ones can the “Ur”-problem solution be applied, i.e.:
   a) How many and which “X”-factor forms have a “possible” trunk or an “impossible” trunk?
   b) How many and which “X”-factor forms occur with the same “X”-factor?
   c) How many and which “X”-factors occur?

   I may add here that some “X”-factors are, for morphological reasons, of frequent occurrence (for example -er-, -in- and -ur-); others, for formal and semantic reasons, are rare (for example the -t- in Wachtraum).

   “X”-factors can be easily located in the vocabulary by determining whether, after one or more final or initial letters of a productive or potential substantive constituent are dropped, the remaining letter sequence represents another productive or potential substantive constituent. Examples are the finals and initials in Wacht (guard), Wach- (waking), Traum (dream), Raum (room), in relation to Wachtraum; Traum (dream), Trau- (wedding), Mahnung (exhortation), Ahnung (foreboding), in relation to Traumahnung (dream foreboding); Lehrer (teacher), Lehr- (teaching), Erzeugnis (produce), Zeugnis (certificate), in relation to Lehrerzeugnis (teacher’s certificate); Bärin (female bear), Bär (male bear), Instinkt (instinct), containing an “impossible” trunk -stinkt, in relation to Bärinstinkt; Kultur (culture), Kult (cult), Urwelt (primeval world), Welt (world), in relation to Kulturwelt (civilized world).
8. Since all German words after a final punctuation mark have an initial capital letter, vocabulary research will also have to determine all ascertainable substantives whose graphic form, apart from the initial capital letter, is identical with that of a form belonging to another form class.
9. Another important category which should be established in the course of this vocabulary research is all two-initial-letter combinations possible in the source language and the size of the membership in each combination group. To go beyond the second initial letter would not be practical because three-letter words are frequent. The membership of each signal-number section of the capital memory could then be further subdivided into groups of source forms with the same two-initial-letter combinations. The matching mechanism would then compare each source form only with those memory equivalents in the signal-number section concerned which have the same two-initial-letter sequence. This procedure would further reduce access time to a degree where it would be negligible from the MT point of view.

13. Conclusion

The mechanical identification, demonstrated here for the German language, of all compounds which are not included in the mechanical memory and lack graphic indication of the boundaries between their constituents is, of course, applicable to other languages. Only minor modifications in the mechanical design and in the programming will be necessary to take care of differences in the graphic distinctiveness of form classes, such as the absence of the capitalization of substantives, other than proper names, in non-initial positions. Other minor adjustments in this scheme will make it possible to eliminate from the mechanical memory most free and bound forms of dual nationality which has been treated separately.

The importance of the mechanization of this part of the identification process of MT lies in the fact that it solves the problem of unpredictable compounds and makes possible a substantial reduction in the size of the mechanical memory with a resultant decrease in access time. The compound effect of these results in the lowering of the cost of MT is obvious.
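
For a present-day reader, the left-to-right matching procedure of sections 8, 9 and 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of the proposed machine: the names `CapitalMemory`, `prefixes_longest_first` and `dissect` are hypothetical, the memory holds bare letter sequences without memory codes or translations, and instead of counting matching steps the sketch simply enumerates every dissection whose parts all have a memory equivalent, trying the longest first constituent first.

```python
from collections import defaultdict


class CapitalMemory:
    """Toy memory whose entries are grouped into sections by symbol count.

    Hypothetical class for illustration; it stores no memory codes.
    """

    def __init__(self, entries):
        self.sections = defaultdict(set)
        for entry in entries:
            self.sections[len(entry)].add(entry.lower())

    def contains(self, form):
        # Only the section with the same number of symbols is consulted.
        return form.lower() in self.sections.get(len(form), ())

    def prefixes_longest_first(self, form):
        # Ignore the last symbol repeatedly and check the remainder
        # against the section with the next smaller number of symbols.
        for n in range(len(form), 0, -1):
            if self.contains(form[:n]):
                yield form[:n]


def dissect(form, memory):
    """Return all dissections of `form` into memory entries.

    The rest after each matched first constituent is fed back and
    treated as a new input. An empty list means no complete dissection
    exists, in which case the form would be put out unchanged.
    """
    if not form:
        return [[]]
    results = []
    for head in memory.prefixes_longest_first(form):
        for tail in dissect(form[len(head):], memory):
            results.append([head] + tail)
    return results


# Assumed toy vocabulary built from the paper's own examples.
memory = CapitalMemory([
    "Griesel", "Bär", "Bärin", "Intelligenz", "Experiment",
    "Senn", "Sennin", "Schrift", "Inschrift", "Stein",
])

print(dissect("Grieselbärintelligenzexperiment", memory))
print(dissect("Steininschrift", memory))
print(dissect("Senninschrift", memory))
```

With this toy vocabulary the first two calls each give a single dissection (Griesel/bär/intelligenz/experiment and Stein/inschrift), while Senninschrift gives the two answers Sennin/schrift and Senn/inschrift discussed in section 10. The “Ur”-problem solution is not modelled separately; in this sketch it corresponds to leaving a compounding form such as Literat- out of the memory.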