Thursday, December 20, 2007

Subtleties of hyphenation and the spanish package

Following a question on the ES-TEX list, I learned that TeX's line breaking does not behave as expected around certain hyphen-like signs (or at least it does not follow Spanish typographic conventions). These are the most surprising behaviors:

  1. When a hyphen (-) appears in an expression, the line may be broken at that point (logical behavior), but the words attached to the hyphen cannot be hyphenated (shocking behavior). So if we have supercalifragilístico-espialidoso, the line may be broken just after the hyphen, leaving "espialidoso" on the next line, but "supercalifragilístico" cannot be divided. Since it is a very long word, it is likely to stick out into the right margin. If "supercalifragilístico-espialidoso" fits completely on one line, the hyphen is still visible, so it reads as a compound word (a more realistic example: "sistema lógico-matemático").
  2. The command \- inserts a discretionary hyphen. It tells TeX that the word may be broken at that point, if appropriate. However, once that command is placed in a word, it prevents the word from being broken anywhere else. Again, the example would be supercalifragilístico\-espialidoso. With \- instead of -, if the word fits on one line the hyphen does not appear (it is not a compound word). If it does not fit, it is broken at the specified point and nowhere else. The command can be used several times within the same word to specify several possible break points. It is useful for overriding TeX's hyphenation rules when they give a bad result.
  3. The long dashes, the em dash (raya) and the en dash (semirraya), obtained in TeX with --- and --, behave like the hyphen - described above. That is, they allow the line to be broken after the dash and prevent hyphenation of the word attached to it. This is certainly unwanted in Spanish, where it is customary to use the dash like a parenthesis to set off an aside, ---like this---, and under TeX's rules the opening dash may end up at the end of a line while the word that should be attached to it moves to the next line. For the same reason, if a punctuation mark follows the closing dash, ---as here---, the mark could end up at the very beginning of the next line.

Fortunately, the spanish package of babel has all of this covered, as we can read in its documentation. The package defines a number of useful shorthands for dealing with these problems, so that TeX's behavior in these cases is under the author's control.
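
Before turning to the shorthands, the three default behaviors listed above can be reproduced with a small test file. This is only a sketch: the \narrow helper and the 3.5cm column width are my own illustrative choices, not something taken from the ES-TEX thread or the spanish manual, and babel is loaded here only to get the Spanish hyphenation patterns.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[spanish]{babel} % loaded only for the Spanish hyphenation patterns

\begin{document}
% Hypothetical helper: a narrow framed column forces TeX to look for breaks.
\newcommand{\narrow}[1]{\par\medskip\noindent\fbox{\parbox{3.5cm}{#1}}}

% 1. Explicit hyphen: the only break allowed is right after "-".
\narrow{Palabra supercalifragilístico-espialidoso y más texto.}

% 2. Discretionary hyphen: the only break allowed is at "\-".
\narrow{Palabra supercalifragilístico\-espialidoso y más texto.}

% 3. Em dash used as a parenthesis: a break may occur right after
%    the opening "---", stranding it at the end of a line.
\narrow{Un inciso ---como este--- dentro de una frase corta.}
\end{document}
```

Whether a given break actually occurs depends on the column width and the surrounding text, so the width may need adjusting to see each effect; overfull boxes in the log point to the words that could not be broken.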

These are the shorthands that solve each of the problems above:

  1. Shorthand "= : lets you write a compound word so that the words on either side of the hyphen can also be hyphenated. The spanish manual discourages its use (it may produce ugly results), but I think cases like supercalifragilístico"=espialidoso are justified.
  2. Shorthand "- : adds an optional break point to a word. The parts to the left and right of "- can still be hyphenated using TeX's usual rules. An example: the word "semiopaco" could in principle be broken by TeX at several points, according to the Spanish hyphenation patterns. If we write semi"-opaco, TeX will also consider a break after "semi".
  3. Shorthands ~-, ~-- and ~--- : produce the hyphen and the dashes without allowing a line break after them when they are attached to another word. They do not, however, prevent the words attached to them from being hyphenated. So, if I write ~---ejemplo~---, the break points TeX considers are those inside the word itself (ejem-plo), not the position after the dash.

A related case is a compound whose second part begins with "r" while the first part ends in a vowel (as in "semi-raya"): if it is written together without the hyphen, the "r" should be doubled (giving "semirraya"). Although there is no agreement on this rule (recommended by the RAE), if we apply it, spanish provides the shorthand "rr and we could write semi"rraya. The effect is that if the word fits on one line it appears as "semirraya" (with a double "r"), but if it has to be broken after "semi-", then "raya" appears on the next line (with a single "r").

In addition to the above, there is the problem of hyphenating accented words. This problem occurs only if we use OT1-encoded fonts. Since these fonts have no accented versions of the vowels, TeX "simulates" those characters by placing two glyphs at the same point (the vowel and the accent), but this confuses the hyphenation mechanism, which stops working for that word from the point where the accent appears. The spanish package also tries to address this problem, but warns that the solution is not perfect. With T1 fonts the problem disappears, since these fonts do have accented versions of each vowel and nothing needs to be faked. Therefore, if you see that LaTeX hyphenates your accented words poorly, make sure you are using a font with T1 encoding.
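
The shorthands and the T1 advice above can be tried together in one small file. Again this is a sketch under assumptions: it presumes a reasonably current babel with the spanish style installed, and the \narrow helper, the column width and the sample sentences are mine. \showhyphens is the standard LaTeX command that writes the break points TeX finds to the log, which is a quick way to check accented words.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}    % T1 fonts contain real accented glyphs
\usepackage[utf8]{inputenc}
\usepackage[spanish]{babel} % activates the " and ~ shorthands

\begin{document}
% Hypothetical helper: a narrow framed column forces TeX to look for breaks.
\newcommand{\narrow}[1]{\par\medskip\noindent\fbox{\parbox{3.5cm}{#1}}}

% "= : visible hyphen; both halves may still be hyphenated.
\narrow{Palabra supercalifragilístico"=espialidoso y más texto.}

% "- : extra optional break; the usual patterns still apply.
\narrow{Un cristal semi"-opaco y supercalifragilístico"-espialidoso.}

% ~--- : dash that does not allow a line break after it.
\narrow{Un inciso ~---como este~---, dentro de una frase corta.}

% "rr : "semirraya" when unbroken, "semi-" then "raya" when broken there.
\narrow{Se llama semi"rraya a la raya corta.}

% Writes the break points TeX finds for these words to the log file.
\showhyphens{matemático lógico}
\end{document}
```

Replacing T1 with OT1 in the fontenc line and comparing the \showhyphens output in the two logs should make the accent problem described above visible.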