Character encoding of SAP data

Direct Link enables users to work with both non-Unicode and Unicode SAP systems. The way Direct Link reads character fields from these systems depends on the edition of Analytics you are using.

Analytics Unicode Edition

If you are using the Unicode edition of Analytics, data is always read directly from the SAP system. When you connect to a Unicode SAP system, data is downloaded as Unicode and displayed as Unicode data in Analytics. When you connect to a non-Unicode system, data is downloaded in the code page it is encoded in and displayed in Analytics using that code page. The Unicode edition of Analytics supports ASCII, EBCDIC, and Unicode character data.

Analytics Non-Unicode Edition

If you are using the non-Unicode edition of Analytics to access an SAP system, the behavior depends on the type of system you are accessing. If you access a non-Unicode SAP system, data is downloaded in the code page it is encoded in and displayed in Analytics using that code page.

If you access a Unicode SAP system, the Unicode SAP fields must be converted before they can be read in Analytics because the non-Unicode edition of Analytics does not include a Unicode data type. When you run a Direct Link query on a Unicode SAP system from the non-Unicode edition of Analytics, character fields are converted from Unicode fields to ASCII fields when the query results are extracted from the SAP system. The character encoding of the computer that submits the query determines the code page used to encode the results of the query. For example, if you submit a query to a Unicode SAP system from a workstation where the language is set to English (United States), the code page will be set to Windows-1252, which is the Windows code page for English and other Western European languages.
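The conversion described above can be approximated with the following Python sketch. It is illustrative only: Direct Link performs the conversion internally, and the function name to_code_page and the sample text are assumptions made for this example, not part of Direct Link or Analytics.

```python
# Illustrative sketch only. Python's built-in codecs stand in for the
# conversion that Direct Link performs when query results are extracted.

def to_code_page(text: str, code_page: str = "cp1252") -> bytes:
    """Encode Unicode text into a single code page (hypothetical helper)."""
    return text.encode(code_page)

# Sample text that fits entirely within Windows-1252.
encoded = to_code_page("Café Zürich")

print(encoded)       # the bytes as stored in the Windows-1252 code page
print(len(encoded))  # one byte per character in a single-byte code page
```

Because every character in the sample exists in Windows-1252, the conversion loses nothing. Text outside the workstation's code page does not convert this cleanly.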

Keep the following points in mind when you import data from a Unicode SAP system with the non-Unicode edition of Analytics:

  • The Regional and Language Options settings in the Windows Control Panel determine the code page setting on the computer. Direct Link gets this code page information when a query is submitted and converts the data from Unicode to the appropriate code page when the data is extracted.

  • If the characters in your SAP data span more than one code page (for example, a column has some rows with Russian text and others with Chinese text), any text that does not use the code page the computer is set to will not be interpreted correctly and will be displayed as unrecognizable characters.

  • The record length and field length cannot be calculated accurately for double-byte code pages that do not always use two bytes per character.

  • If a character cannot be converted to the destination code page, it will be replaced by a number sign character (#).

  • Only text fields are affected by code page conversions. Numeric and datetime fields are not affected.
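Two of the points above, the number sign substitution and the unpredictable field length in double-byte code pages, can be illustrated with a short Python sketch. It is a rough model built on Python's standard codecs, not a description of how Direct Link is implemented. The handler name number_sign, the helper functions, and the sample strings are all assumptions made for this example.

```python
import codecs

def _number_sign(err):
    """Error handler: substitute '#' for each character the code page lacks."""
    if isinstance(err, UnicodeEncodeError):
        # Replace the whole unencodable run, then resume after it.
        return "#" * (err.end - err.start), err.end
    raise err

# Register the handler under a hypothetical name chosen for this sketch.
codecs.register_error("number_sign", _number_sign)

def convert(text: str, code_page: str) -> bytes:
    """Encode text into one code page, replacing unconvertible characters with '#'."""
    return text.encode(code_page, errors="number_sign")

def byte_length(text: str, code_page: str) -> int:
    """Return the number of bytes the text occupies in the given code page."""
    return len(convert(text, code_page))

# Russian and Chinese text converted with a Cyrillic code page (Windows-1251):
# the Russian characters survive, the Chinese characters become '#'.
mixed = convert("Москва 北京", "cp1251")
print(mixed)

# In a double-byte code page such as cp932 (Japanese), a character takes
# one or two bytes, so the character count does not predict the byte count.
for value in ("ABC", "A日本", "日本語"):
    print(len(value), byte_length(value, "cp932"))
```

All three sample strings in the loop contain three characters, yet they occupy 3, 5, and 6 bytes in cp932. This is why field and record lengths cannot be derived from character counts alone.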