To: "Erica Chapin" <erica.chapin@xxxxxxxxxxxx>, "Jason Aiken" <jason.aiken@xxxxxxxxxxxxx>, <framers@xxxxxxxxx>
Subject: RE: Seeking SGML character entity to Unicode mapping/filter
From: "Peter Ring" <pri@xxxxxx>
Date: Mon, 16 Apr 2001 16:02:22 +0200
Importance: Normal
In-Reply-To: <NEBBJKAAGLALMMEDGIIFOEGGCHAA.erica.chapin@starfish.com>
Sender: owner-framers@xxxxxxxxx
AFAIK, there is no 100% official answer, if your question is: "How do I map ISO-8879:1986 character entities to Unicode, e.g., as hexadecimal numeric character references?"

Here are two almost-official resources:

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT

and:

http://www.docbook.org/xml/4.1.2/ent/

The file at the Unicode site appears to have lost its line endings entirely, but a bit of massaging with a text editor should make it legible. The character entity files at the DocBook site are very complete and may be your best bet. You could rather easily set up a sed or perl script to do a translation based on these files.

If you just want a quick-and-dirty table of Latin 1 (ISO-8859-1) vs. HTML 4 character entities, there's of course http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html.

To avoid common misconceptions about many of the Latin 1 characters, study http://mirror.subotnik.net/jkorpela/latin1/index.html.

For comprehensive coverage of the ISO-8859 alphabet soup, see http://czyborra.com/charsets/iso8859.html.

To learn more about character code issues, see http://mirror.subotnik.net/jkorpela/chars.html.

If you want to know more about Unicode, you could start at http://www.cl.cam.ac.uk/~mgk25/unicode.html or http://www.unicode.org/.

If you want to play around with character sets, try http://www.eki.ee/letter/.

In case you need to recode SGML or XML files, you may have to use a proper parser in order to avoid recoding character data that shouldn't get recoded. I have not tried this, but I would try James Clark's SGML-to-XML converter 'sx', http://www.jclark.com/sp/.

In case you have an OmniMark license or still have one of the free versions, there's the Unimap script at http://www.xmeta.com/omlette/.

For general recoding tasks, there's a nifty utility called 'recode', http://www.iro.umontreal.ca/contrib/recode/HTML/index.html. It does not (yet?) recode from ISO-8879 (SGML) character entities.
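For the SGML entities that recode does not cover, the script-based translation suggested above could be sketched roughly as follows. This is only a minimal sketch in Python rather than sed or perl. It assumes the entity files contain declarations of the form <!ENTITY name "&#xHHHH;">, and the two sample declarations, the regular expressions, and the function names load_entity_map and to_numeric_refs are all made up for illustration. Check the format against the actual files before relying on it.

```python
import re

# Hypothetical excerpt in the assumed declaration format; in practice you
# would read the real .ent files instead of this inline sample.
SAMPLE_ENT = '''
<!ENTITY agr "&#x03B1;"> <!-- GREEK SMALL LETTER ALPHA -->
<!ENTITY zgr "&#x03B6;"> <!-- GREEK SMALL LETTER ZETA -->
'''

# Matches declarations whose replacement text is a single hex character reference.
DECL = re.compile(r'<!ENTITY\s+([A-Za-z][A-Za-z0-9.]*)\s+"&#x([0-9A-Fa-f]+);"\s*>')
# Matches named entity references such as &zgr; in the document text.
REF = re.compile(r'&([A-Za-z][A-Za-z0-9.]*);')


def load_entity_map(ent_text):
    """Build a dict from entity name to Unicode code point."""
    return {name: int(hexval, 16) for name, hexval in DECL.findall(ent_text)}


def to_numeric_refs(text, entity_map):
    """Replace known named references with hexadecimal numeric references."""
    def repl(match):
        code_point = entity_map.get(match.group(1))
        # Leave unknown entities untouched rather than guessing.
        return match.group(0) if code_point is None else "&#x%04X;" % code_point
    return REF.sub(repl, text)


entity_map = load_entity_map(SAMPLE_ENT)
result = to_numeric_refs("small zeta: &zgr; unknown: &foo;", entity_map)
print(result)
```

Note that a plain substitution like this also rewrites references inside comments or CDATA sections, which is exactly why a proper parser may be needed for real SGML or XML files.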
You can, however, recode from HTML 4 character entities to Unicode (e.g. UTF-8) like this:

    recode h4..u8 < inputfile > outputfile

get a table of the HTML 4 characters in Unicode like this:

    recode h4/test8..dump < /dev/null

and a lot more that you'll never ever need.

recode is available in most Unix/Linux/*BSD distributions. It is also ported to Win32, http://www.weihenstephan.de/~syring/win32/UnxUtils.html, and compiles out-of-the-box on Cygwin, http://sources.redhat.com/cygwin/.

Kind regards,
Peter Ring

-----Original Message-----
From: owner-framers@omsys.com [mailto:owner-framers@omsys.com]On Behalf Of Erica Chapin
Sent: Friday, 13 April, 2001 7:33 PM
To: Jason Aiken; framers@omsys.com
Subject: RE: Seeking SGML character entity to Unicode mapping/filter

I would also be interested in this info, so if it is not too specialized to be of general interest, perhaps it could go to the whole list - or - please include me on your reply to Jason.

and a good friday it is!
Erica

> -----Original Message-----
> From: owner-framers@omsys.com
> [mailto:owner-framers@omsys.com]On Behalf
> Of Jason Aiken
> Sent: Friday, April 13, 2001 10:07
> To: framers@omsys.com
> Subject: Seeking SGML character entity
> to Unicode mapping/filter
>
>
> Greetings FrameMuddlahs,
>
> I'm wondering if any of you SGML-crazed
> Unicode fanatics blessed with the
> pleasure of publishing in 10+ languages
> might have any idea where to look for a
> complete table or filter mapping SGML
> character entities to appropriate
> Unicode values. If you don't know what
> I'm talking about, consider yourself
> lucky. If you do, you know that
> Unicode-friendly tools are kinda nice (ahem!).
>
> For example, to map small zeta for
> Greek, I'd need a Unicode value for &zgr;.
>
> Surely someone has gone through the
> pain of making SGML applications work
> for Unicode stuff?
>
> Any insults, commiserations, scathing
> grammatical criticisms, cryptic clues,
> URLs, or actual assistance is deeply
> appreciated.
>
> Maybe I should go post this on the
> Adobe User-to-User forum for FM+SGML...
>
> Happy Good Friday the Thirteenth (muahahahah),
> Jason
>
>
> ** To unsubscribe, send a message to
> majordomo@omsys.com **
> ** with "unsubscribe framers" (no
> quotes) in the body. **