Welcome to the development home of the u8u16 high-speed UTF-8 to UTF-16 conversion software

This site hosts the SVN repository for u8u16, high-speed UTF-8 to UTF-16 conversion software based on the parallel bit stream technology developed by Prof. Rob Cameron. International Characters, Inc., an SFU spin-off company, makes the software available as open source under the terms of the Open Software License 3.0. The software demonstrates encoding form conversion several times faster than typical industry-standard iconv implementations.

See also the following paper.

Robert D. Cameron, A Case Study in SIMD Text Processing with Parallel Bit Streams - UTF-8 to UTF-16 Transcoding, Proceedings of the 2008 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Salt Lake City, Utah, Feb. 20-23, 2008, pp. 91-98.
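As a rough illustration of the parallel bit stream idea named in the paper title, the sketch below transposes a byte sequence into eight bit streams (one per bit position) and then classifies UTF-8 bytes with bitwise logic alone. It is not taken from the u8u16 sources: the function names are hypothetical, Python integers stand in for SIMD registers, and the length calculation assumes the input is already valid UTF-8.

```python
def basis_bit_streams(data: bytes) -> list[int]:
    """Transpose bytes into 8 bit streams held in Python ints.

    Bit i of streams[k] equals bit k of data[i], where k = 0 is the
    most significant bit of the byte (an illustrative convention).
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return streams


def classify_utf8(data: bytes) -> tuple[int, int]:
    """Return (continuation, lead4) marker streams for the input bytes."""
    b = basis_bit_streams(data)
    mask = (1 << len(data)) - 1
    # Continuation bytes have the form 10xxxxxx.
    continuation = b[0] & ~b[1] & mask
    # Lead bytes of 4-byte sequences have the form 11110xxx.
    lead4 = b[0] & b[1] & b[2] & b[3] & ~b[4] & mask
    return continuation, lead4


def utf16_length(data: bytes) -> int:
    """Count the UTF-16 code units needed for valid UTF-8 input.

    Each character contributes one code unit, plus one more when it is
    encoded in 4 bytes (it then needs a surrogate pair).
    """
    continuation, lead4 = classify_utf8(data)
    characters = len(data) - bin(continuation).count("1")
    return characters + bin(lead4).count("1")
```

For example, `utf16_length("aé€😀".encode("utf-8"))` returns 5: three characters that each fit in one code unit plus one surrogate pair. Each bitwise operation here processes every byte position at once, which is the property a SIMD implementation exploits with fixed-width registers.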

Browse the source code, download and read the documentation, or download u8u16-0.92.tar.bz2.

Development Notes.

UTF-8 to UTF-16 transcoding is widely cited as a significant bottleneck in XML processing.

Giuseppe Psaila, "On the Problem of Coupling Java Algorithms and XML Parsers" (Invited Paper), 17th International Conference on Database and Expert Systems Applications (DEXA'06), September 2006, pp. 487-491.

  • Breaking down the instruction counts by subroutine revealed the top 3 expensive parser components: ... (2) transcoding the input document encoding to UTF-16. Matthias Nicola and Jasmi John, "XML Parsing: A Threat to Database Performance", Proceedings of the Twelfth International Conference on Information and Knowledge Management, New Orleans, Louisiana, 2003.

  • In DOM-based parsing, transcoding is reported as consuming 45% of parsing time, while in SAX-based parsing, transcoding accounts for 52%. Eric Perkins, Margaret Kostoulas, Abraham Heifets, Morris Matsa, Noah Mendelsohn, "Performance Analysis of XML APIs", XML 2005, Atlanta, Georgia, November 2005.

The PPoPP reprint below is copyright 2008 Association for Computing Machinery. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in the Proceedings of the 2008 SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '08, Feb. 20-23, 2008, Salt Lake City, Utah, USA. Copyright (C) 2008 ACM 978-1-59593-960-9/08/0002...$5.00