[Info-Ingres] base64(), UTF-16, UTF-8 madness
Grant Croker
grant.croker at ingres.com
Fri May 25 03:21:11 CDT 2007
Martin,
you appear to be sane and everyone else is mad, or at least the person who
wrote the MySQL base64 encoding/decoding code is:
> echo "" | awk '{ORS=""}{ print "\x67\x1D"}' | base64
Zx0=
> cat base64.php
<?php
$str = b"\x67\x1D";
echo base64_encode($str);
?>
> php base64.php
Zx0=
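
The same check can be sketched in Python, assuming only the standard-library base64 module. The names ch, b64_of and results are made up for illustration. It encodes U+671D as UTF-16 big-endian, UTF-16 little-endian and UTF-8, then base64-encodes each byte sequence:

```python
import base64

ch = "\u671d"  # the character from Martin's example

def b64_of(text, encoding):
    """Encode text with the given codec, then base64 the raw bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")

# Try the three byte layouts discussed in this thread.
results = {enc: b64_of(ch, enc) for enc in ("utf-16-be", "utf-16-le", "utf-8")}

for enc, out in results.items():
    # Print the codec, the raw bytes in hex, and the base64 text.
    print(enc, ch.encode(enc).hex(), out)
```

This should give Zx0= for the big-endian bytes 67 1D, HWc= for the little-endian bytes 1D 67, and 5pyd for the UTF-8 bytes E6 9C 9D. None of these starts with 's6'.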
regards
grant
On 24/05/07 12:02, martin.bowes at ctsu.ox.ac.uk scribbled:
> Hi Everyone,
>
> This is a little off-topic but, I'm at my wits end on this one....
>
> I've been asked to write an OME function that does a base64 encoding
> on nvarchar and nchar types.
>
> Now this seems simple enough...
>
> * Allow for Ingres being little endian when storing the unicode
>   (UTF-16) characters, ie U+671D is stored as 1D67.
>
> * Allow for standard rules on 'short strings' by padding with zero
>   bytes, and overwriting output with a requisite number of '='.
>
> * Divide the input into 6bit chunks and then use that value as an
>   offset into the standard base64 array of characters ie. A-Z,
>   a-z, 0-9, +, /.
>
>
> So 671D is 0110 (6) 0111 (7) 0001 (1) 1101 (D), plus a zero pad byte 0000 0000
> Which in groups of 6 becomes:
> 011001 (25) == Z, 110001 (49) == x, 110100 (52) == 0
>
> Hence we should get a return of 'Zx0='.
>
> Trouble is that's not what MySQL gives my programmers on the same
> data. It insists that this is a string starting with 's6\'. I've
> counter checked this conversion with some web based conversion
> utilities and they seem to agree.
>
> So it occurred to me that the problem was that MySQL must be using UTF-8
> to represent the character. Which is cool, so I thought I could convert
> the UTF-16 into UTF-8 and convert the output of that into base64.
>
> Trouble is that in UTF-8, U+671D becomes E6 9C 9D, which when
> converted to base64 becomes the string: '5pyd'. I've confirmed this
> UTF-16 --> UTF-8 conversion using Ingres to copy the nvarchar into a
> file and running 'od -ax' on that file.
>
> If I decode the s6\ string it means that my first UTF-8 character must
> be B3 AF D1. But that's not well formed UTF-8!
>
> Does anyone have any idea what I'm doing wrong?
>
> Martin Bowes
> --
> Random Duckman Quote #114:
> King Chicken: How dare you insult me in front of my wife, who's still
> dangerously coherent.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Info-Ingres mailing list
> Info-Ingres at kettleriverconsulting.com
> http://www.kettleriverconsulting.com/mailman/listinfo/info-ingres
>
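
For anyone following the hand calculation in the quoted message, the three steps (zero-byte padding, 6-bit chunks, '=' overwrite) can be sketched in Python roughly as below. ALPHABET and b64_by_hand are hypothetical names for illustration only, not anything from Ingres or OME. The function assumes it is handed bytes already in the order to be encoded, so the byte swap from the little-endian storage is left to the caller:

```python
import base64

# The standard base64 character table: A-Z, a-z, 0-9, +, /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64_by_hand(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)
        # Pad a short final group with zero bytes.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        # Split the 24 bits into four 6-bit indices, most significant first.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Overwrite the characters that came only from padding with '='.
        if pad:
            chars[-pad:] = "=" * pad
        out.extend(chars)
    return "".join(out)

print(b64_by_hand(b"\x67\x1d"))
```

For the two bytes 67 1D this reproduces the Zx0= worked out by hand, and it can be compared against base64.b64encode on any other input.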
--
Grant Croker, grantc at php.net
Software Engineer, Ingres - http://ingres.com
tel: +44 (0)1753 559505 UK / +34 676 518209 España
--
SCRANTON (n.)
A person who, after the declaration of the bodmin (q.v.), always says,'... But I only had the tomato soup.'