SM: Unicode, encoding used

Davi Leal-2
Hi,

About the "JS string type implementation", I have read at js/src/jsstr.h that
"A JS string is a counted array of unicode characters".  According to the
ECMA-262 specification, the Unicode character encoding must be UTF-16.

We have the JavaScript SpiderMonkey C Engine js-1.5-rc4a.tar.gz embedded in
our application.  What character encoding is our SpiderMonkey using?


This is the call stack we traced through the SpiderMonkey source code. At the
end we get a "text/html; charset=iso-8859-1" string. I would like to know why
that iso-8859-1 charset is set:

 1------------------------------------------------

  JS_CallFunctionValue(JSContext *cx, JSObject *obj, jsval fval,
     uintN argc, jsval *argv, jsval *rval)

 2------------------------------------------------

  js_InternalCall(JSContext *cx, JSObject *obj, jsval fval,
     argc=0, argv, rval)

 3------------------------------------------------

  js_Invoke(JSContext *cx, argc=0, JSINVOKE_INTERNAL)

 4------------------------------------------------

  js_Interpret(JSContext *cx, &v)

 5------------------------------------------------

  js_Invoke(JSContext *cx, argc=1, 0)

 6------------------------------------------------

  JSStackFrame *fp, frame;

  frame.argv = sp - argc;

  if (argc)
     memcpy(newsp, frame.argv, argc * sizeof(jsval));
  frame.argv = newsp;

  native(JSContext *cx, JSObject *frame.thisp, argc=1,
      jsval *frame.argv=0x81582b8, &frame.rval)

 7------------------------------------------------

  JS_GetStringBytes( JS_ValueToString(JSContext *cx, argv=0x81582b8) )


The JS_GetStringBytes function returns the string
  "text/html; charset=iso-8859-1"
Where does SpiderMonkey set that value?

Where does it get the "text/html" part, and where does it get the
"charset=iso-8859-1" part?  Does it come from the default system locale and
encoding?  Is the JSContext or the JSRuntime related to it?

Note that the application that embeds SpiderMonkey does not call the
setlocale() function, so I think the POSIX locale is the default.

Which SpiderMonkey functions manage the decoding and encoding of JS strings
to other character encodings?

Regards,
David

_______________________________________________
mozilla-jseng mailing list
[hidden email]
http://mail.mozilla.org/listinfo/mozilla-jseng

Re: SM: Unicode, encoding used

Brendan Eich
Davi Leal wrote:
> Hi,
>
> About the "JS string type implementation", I have read at js/src/jsstr.h that
> "A JS string is a counted array of unicode characters".  According to the
> ECMA-262 specification, the Unicode character encoding must be UTF-16.

Note well, though: string length does not count combining sequences or
non-BMP code points as single characters.  A string's length property
just counts 16-bit storage units.
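
To make the length point concrete, here is a minimal sketch in plain C. It is
not SpiderMonkey code: the function name utf16_code_points is hypothetical,
and uint16_t merely stands in for the engine's 16-bit character type. It
counts code points in a UTF-16 buffer, treating a well-formed surrogate pair
as one character, which is exactly what a JS length property does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count Unicode code points in a UTF-16 buffer of n code units.
 * A high surrogate (0xD800-0xDBFF) immediately followed by a low
 * surrogate (0xDC00-0xDFFF) counts as one code point.  Any other
 * unit, including an unpaired surrogate, counts as one on its own.
 * Combining sequences are NOT merged; that would need Unicode data.
 */
size_t utf16_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    assert(s != NULL || n == 0);

    while (i < n) {
        int is_pair = s[i] >= 0xD800 && s[i] <= 0xDBFF &&
                      i + 1 < n &&
                      s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += is_pair ? 2 : 1;   /* skip both halves of a pair */
        count++;
    }
    return count;
}
```

For the four units { 0x0061, 0xD834, 0xDD1E, 0x0062 }, that is "a", U+1D11E
(a non-BMP code point stored as a surrogate pair), and "b", this function
returns 3, whereas a length that counts 16-bit storage units reports 4.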

> We have the JavaScript SpiderMonkey C Engine js-1.5-rc4a.tar.gz embedded in
> our application.

That's quite old; we have fixed many bugs and made many enhancements
without sacrificing API compatibility.  Please upgrade when you can.

>  What character encoding is our SpiderMonkey using?

[snip]

>   native(JSContext *cx, JSObject *frame.thisp, argc=1,
>       jsval *frame.argv=0x81582b8, &frame.rval)
>
>  7------------------------------------------------
>
>   JS_GetStringBytes( JS_ValueToString(JSContext *cx, argv=0x81582b8) )
>
>
> The JS_GetStringBytes function returns the string
>   "text/html; charset=iso-8859-1"
> Where does SpiderMonkey set that value?

It doesn't.  If you are showing the first argument (*frame.argv, or
frame.argv[0] more naturally), in hex, and it looks like that string, it
was passed into the native function being called here.  Go up the stack
to js_Invoke's caller and show that frame.

> Where does it get the "text/html" part, and where does it get the
> "charset=iso-8859-1" part?  Does it come from the default system locale and
> encoding?  Is the JSContext or the JSRuntime related to it?

The engine did not generate this string.  A server may have, since it is
a MIME type.  Look to your embedding code, in particular the frame that
called that js_Invoke whose native receives this string as its first
argument.

/be

Re: SM: Unicode, encoding used

Davi Leal-2
You were right, Brendan. It was an external function that generated the string.

Now the problem is solved.

Thanks,
Davi



Brendan Eich wrote:

> > The JS_GetStringBytes function returns the string
> >   "text/html; charset=iso-8859-1"
> > Where does SpiderMonkey set that value?
>
> It doesn't.  If you are showing the first argument (*frame.argv, or
> frame.argv[0] more naturally), in hex, and it looks like that string, it
> was passed into the native function being called here.  Go up the stack
> to js_Invoke's caller and show that frame.
>
> > Where does it get the "text/html" part, and where does it get
> > the "charset=iso-8859-1" part?  Does it come from the default system
> > locale and encoding?  Is the JSContext or the JSRuntime related to it?
>
> The engine did not generate this string.  A server may have, since it is
> a MIME type.  Look to your embedding code, in particular the frame that
> called that js_Invoke whose native receives this string as its first
> argument.