Skip to content
IRI Logo
Solutions Products
  • Solutions
  • Products
  • Blog
  • BI
  • Big Data
  • DQ
  • ETL
  • IRI
    • IRI Business
    • IRI Workbench
  • Mask
  • MDM
    • Master Data Management
    • Metadata Management
  • Migrate
    • Data Migration
    • Sort Migration
  • Test Data
  • Transform
  • VLDB
  • VLOG
unicode logo

What is Unicode?

  • by Jason Koivu

unicodeUnicode began as a project in 1987 between Apple and Xerox engineers in response to a need for an international standard of representation for every character in all major languages of the world. As the exchange of information and data became more prevalent electronically and internationally, there was a need for a unified code that could be read on any platform running any program. Prior to the development of Unicode, the primary ASCII coding scheme which used an 8-bit character representation only allowed for 256 characters.

These early Unicode pioneers discovered that there were about 27,000 characters in the modern world and this resulted in a 16-bit fixed length character code which allowed for 65,000 characters, enough even for future expansion. Joe Becker, one of the Xerox engineers, coined the term Unicode from their requirements for a universal,  uniform, and unique bit sequence to represent characters.

The initial success of Unicode naturally relied on its adoption by other companies. Early in its development, major computer manufacturers, networking and software companies began making significant contributions to the design. In addition to Xerox and Apple, participating companies included Metaphor, Claris, Research Libraries Group, Sun, Microsoft, SHARE, IBM, Pacific Rim, Aldus, NeXT, and Novell.

By 1991, Unicode, Inc. was incorporated with the original purpose to standardize, extend, and promote the Unicode character encoding. The original release date of Unicode was in October of that year. Version 1.0.0 contained codes for 7,161 characters. The most recent version, 6.0.0, was released in October 2010 and provided codes for 109,449 characters from the world’s alphabets, ideograph sets, and symbol collections.

IRI first began developing support for Unicode data in CoSort Version 7.5. But with the release of CoSort 9.5 there was a major re-design necessary to support the updates in characters that occurred.

CoSort’s Sort Control Language (SortCL) program supports Unicode files and fields which may be mapped to database tables. SortCL can collate (sort), merge, join or convert Unicode characters and numerals in delimited or fixed-position fields.

Conversion between Unicode and single-byte (e.g. ASCII) or native multi-byte characters (e.g. Chinese GBK/Big5, Japanese, and Korean) is supported.  Conversion between a variety of numeric data formats and Unicode digits is supported in both CoSort (SortCL) and NextForm, IRI’s standalone data migration package.

Database Reorgs – Why They Matter
Eclipse: A Brief History
CoSort data exchange programming language SortCL Unicode

Related articles

Connecting MariaDB and MySQL to…
The IRI Platform
IRI Data Migration and Modernization
Real-time Database Data Replication
Getting Started with IRI Ripcurrent
Mapping DB Data Types
Connecting to Microsoft Access in…
Creating New Tables in IRI…
Automating IRI Jobs Using File…
Migrating RDB Data via the…
Introducing the ASN.1 Format and…

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Categories

  • Big Data 66
  • Business Intelligence (BI) 77
  • Data Masking/Protection 163
  • Data Quality (DQ) 41
  • Data Transformation 94
  • ETL 122
  • IRI 229
    • IRI Business 86
    • IRI Workbench 162
  • MDM 37
    • Master Data Management 12
    • Metadata Management 25
  • Migration 65
    • Data Migration 60
    • Sort Migration 6
  • Test Data 102
  • VLDB 78
  • VLOG 40

Tracking

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact