[Info-Ingres] base64(), UTF-16, UTF-8 madness

Grant Croker grant.croker at ingres.com
Fri May 25 03:21:11 CDT 2007


Martin,

You appear to be sane and everyone else is mad, or at least the person who 
wrote the MySQL base64 encoding/decoding code is:

 > echo "" | awk '{ORS=""}{ print "\x67\x1D"}'  | base64
Zx0=

 > cat base64.php
<?php
$str = b"\x67\x1D";
echo base64_encode($str);
?>
 > php base64.php
Zx0=
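
The same check can be sketched in Python. This is only an illustration: b64_by_hand is a hypothetical helper (nothing from Ingres, MySQL or OME) that follows the 6-bit grouping described in the quoted message below. It is compared against the standard library's base64.b64encode for the three byte sequences in question.

```python
import base64

# Standard base64 alphabet: A-Z, a-z, 0-9, +, /
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_by_hand(data: bytes) -> str:
    """Hypothetical helper: base64-encode by splitting into 6-bit groups."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 bytes missing from this group
        # Pad the short group with zero bytes to get a full 24-bit value.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [B64_ALPHABET[s] for s in sextets]
        if pad:
            # Overwrite the positions that came purely from padding.
            chars[-pad:] = "=" * pad
        out.extend(chars)
    return "".join(out)


ch = "\u671d"  # U+671D, the character from the original question
candidates = {
    "UTF-16BE": ch.encode("utf-16-be"),  # bytes 67 1D
    "UTF-16LE": ch.encode("utf-16-le"),  # bytes 1D 67
    "UTF-8": ch.encode("utf-8"),         # bytes E6 9C 9D
}

results = {name: b64_by_hand(raw) for name, raw in candidates.items()}

for name, raw in candidates.items():
    # Cross-check the hand-rolled encoder against the standard library.
    assert results[name] == base64.b64encode(raw).decode("ascii")
    print(f"{name:8} {raw.hex().upper():6} -> {results[name]}")
```

With these inputs the big-endian bytes give Zx0=, the little-endian bytes give HWc= and the UTF-8 bytes give 5pyd. None of the three starts with 's6'.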

regards

grant

On 24/05/07 12:02, martin.bowes at ctsu.ox.ac.uk scribbled:
> Hi Everyone,
>
> This is a little off-topic, but I'm at my wits' end on this one...
>
> I've been asked to write an OME function that does a base64 encoding 
> on nvarchar and nchar types.
>
> Now this seems simple enough...
>
>    * Allow for Ingres being little endian when storing the Unicode
>      (UTF-16) characters, i.e. U+671D is stored as 1D67.
>
>    * Allow for the standard rules on 'short strings' by padding with zero
>      bytes, and overwriting the output with the requisite number of '='.
>
>    * Divide the input into 6-bit chunks and then use each value as an
>      offset into the standard base64 array of characters, i.e. A-Z, a-z,
>      0-9, +, /.
>
>
> So 671D is 0110 (6) 0111 (7) 0001 (1) 1101 (D) 0000 0000
> Which in groups of 6 becomes:
> 011001 (25) == Z, 110001 (49) == x, 110100 (52) == 0
>
> Hence we should get a return of 'Zx0='.
>
> Trouble is that's not what MySQL gives my programmers for the same 
> data. It insists that this is a string starting with 's6\'. I've 
> cross-checked this conversion with some web-based conversion 
> utilities and they seem to agree.
>
> So it occurred to me that the problem was that MySQL must be using UTF-8 to 
> represent the character. Which is cool, so I thought I can convert the 
> UTF-16 into UTF-8 and convert the output of that into base64.
>
> Trouble is that in UTF-8, U+671D becomes E6 9C 9D, which when 
> converted to base64 becomes the string: '5pyd'. I've confirmed this 
> UTF-16 --> UTF-8 conversion using Ingres to copy the nvarchar into a 
> file and running 'od -ax' on that file.
>
> If I decode the s6\ string it means that my first UTF-8 character must 
> be B3 AF D1. But that's not well formed UTF-8!
>
> Does anyone have any idea what I'm doing wrong?
>
> Martin Bowes
> --
> Random Duckman Quote #114:
> King Chicken: How dare you insult me in front of my wife, who's still
> dangerously coherent.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Info-Ingres mailing list
> Info-Ingres at kettleriverconsulting.com
> http://www.kettleriverconsulting.com/mailman/listinfo/info-ingres
>   


-- 
Grant Croker, grantc at php.net   
Software Engineer, Ingres - http://ingres.com 
tel: +44 (0)1753 559505 UK / +34 676 518209 España
--
SCRANTON (n.)
A person who, after the declaration of the bodmin (q.v.), always says,'... But I only had the tomato soup.'


