Tuesday, December 20, 2011

New Emscripten tutorial: C/C++ to JavaScript now easier than ever with "emcc"

A new compiler frontend for Emscripten, emcc, has landed recently. emcc can be used basically as a drop-in replacement for gcc, making it much easier to compile C and C++ into JavaScript. For example,

   emcc src.cpp

will generate a.out.js, and

  emcc src.cpp -o src.html

will generate a complete HTML file with the compiled code as embedded JavaScript, including SDL support so the code can render to a Canvas element. Optimizing code is now easy as well,

  emcc -O2 src.cpp

will generate optimized code (optimizing in LLVM, in the Emscripten compiler itself, in the Closure Compiler, and in the Emscripten JavaScript optimizer). Note that there is an even faster setting, -O3; see the docs for more.

emcc is presented in more detail in the new Emscripten Tutorial. Check it out! Feedback is welcome :)

Saturday, December 10, 2011

Typed Arrays by Default in Emscripten

Emscripten has several ways of compiling code into JavaScript; for example, it can use typed arrays or not (for more, see Code Generation Modes). I just merged the 'ta2 by default' branch into master in Emscripten, which makes one of the typed array modes the default. I'll explain here the reasons for that change and its results.

Originally Emscripten did not use typed arrays. When I began to write it, typed arrays were supported only in Firefox and Chrome, and even there they were of limited benefit due to lack of optimization and incomplete implementation. Perhaps more importantly, it was not clear whether they would ever be universally supported in all browsers. So to generate code that truly runs everywhere, Emscripten did not use typed arrays; it generated "plain vanilla" JavaScript.

However, that has changed. Firefox and Chrome now have mature and well-performing implementations of typed arrays, and Opera and Safari are very close to the same. Importantly, Microsoft has said that IE10 will support typed arrays. So typed arrays are becoming ubiquitous, and have a bright future.

The main benefits of using typed arrays are speed and code compatibility. The speed comes simply from JS engines being able to optimize typed arrays better than normal ones, both in how they are laid out in memory and in how they are accessed. The compatibility stems from the fact that by using typed arrays with a shared buffer, you get the same memory behavior as C: for example, you can read an 8-bit byte from the middle of a 32-bit int and get the same result C would get. It's possible to do that without typed arrays, but it would be much, much slower. (There is, however, a downside to such C-like memory access: your code, if it was not 100% portable in the first place, may depend on the CPU's endianness.)
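To illustrate, here is a minimal sketch of the shared-buffer approach (the HEAP8 and HEAP32 names are illustrative, in the style of Emscripten's generated code):

  var buffer = new ArrayBuffer(16);    // one shared buffer, like the C heap
  var HEAP8  = new Int8Array(buffer);  // 8-bit view
  var HEAP32 = new Int32Array(buffer); // 32-bit view over the same bytes

  HEAP32[0] = 0x12345678;              // write a 32-bit int
  console.log(HEAP8[1]);               // read a byte from its middle: 0x56 on
                                       // little-endian CPUs, 0x34 on big-endian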

Because of those benefits, I worked towards using typed arrays by default. To get there, I had to fix various problems with accessing 64-bit values, problems which arise only when doing C-like memory access, because unaligned 64-bit reads and writes do not work (due to how the typed arrays API is structured). The settings I64_MODE and DOUBLE_MODE control reading those 64-bit values: if set to 1, reads and writes will be done in two 32-bit parts, in a safe way.
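As a rough sketch of the idea (simplified, not the exact code Emscripten emits), a double at a 4-byte-aligned address can be read safely in two 32-bit parts via a small scratch buffer:

  var scratch    = new ArrayBuffer(8);
  var scratchI32 = new Int32Array(scratch);
  var scratchF64 = new Float64Array(scratch);

  function readDouble(HEAP32, ptr) {        // ptr: a 4-byte-aligned byte address
    scratchI32[0] = HEAP32[ptr >> 2];       // copy the first 32-bit half...
    scratchI32[1] = HEAP32[(ptr + 4) >> 2]; // ...and then the second half
    return scratchF64[0];                   // reinterpret the 8 bytes as a double
  }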

Another complication is that typed arrays cannot be resized. So when sbrk() is called with a value larger than the maximum size, we can't simply enlarge the typed arrays we are using. The current implementation creates new typed arrays and copies the old values into them, which works but is potentially slow.
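In outline, that resize-by-copy looks something like this (a sketch assuming a single Int8Array heap; in the shared-buffer mode every other view must be rebuilt as well):

  function enlargeHeap(HEAP8, newByteSize) {
    var newHEAP8 = new Int8Array(newByteSize); // allocate a larger array...
    newHEAP8.set(HEAP8);                       // ...and copy the old data over
    return newHEAP8;                           // callers must then recreate the
  }                                            // other views (HEAP16, HEAP32, ...)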

Typed arrays have already worked in Emscripten for a long time (in two modes, even, shared and non-shared buffers), but the issues mentioned in the previous two paragraphs limited their use in some areas. So the recent work has been to smooth over all the missing pieces, to make typed arrays ready as the default mode.

The current default in Emscripten, after the merge, is to use typed arrays (in mode 2, that is, with a shared buffer and C-like memory access), with all the other settings set to safe values (I64_MODE and DOUBLE_MODE are both 1, etc.). This means that all the code that worked out of the box before will continue to work, and additional code will now work out of the box as well. Note that these are just the defaults: if your makefile sets all the Emscripten settings itself (like defining whether to use typed arrays or not), then nothing will change.

The only thing to keep in mind with this change is that, by default, you will need typed arrays to run the generated code. If you want your code to run in as many places as possible right now, you should set USE_TYPED_ARRAYS to 0 to disable typed arrays. Another possible issue is that not all JS console environments support typed arrays: recent versions of SpiderMonkey and Node.js do, but the V8 shell has some issues (note that this is a problem only in the commandline shell, not in Chrome), so if you test your generated code using d8 it will not work. Instead, you can test it in a browser, or with Node.js or the SpiderMonkey shell, for now.
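If in doubt, a trivial feature test (a sketch) tells you whether a given JS environment supports typed arrays at all:

  // A minimal capability check before loading Emscripten-generated code
  if (typeof Int32Array === 'undefined') {
    // no typed arrays: use a browser, Node.js or the SpiderMonkey shell,
    // or rebuild with USE_TYPED_ARRAYS set to 0
    throw new Error('typed arrays are not supported in this environment');
  }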

Monday, December 5, 2011

Emscripten in node.js and on the web

Until now, to use Emscripten to compile LLVM to JavaScript you had to install a JavaScript engine shell (like SpiderMonkey's or V8's), both to run Emscripten itself and to run the generated code. This meant you had to get the latest source code of one of those shells and build it, which isn't hard but isn't super convenient either. So over the weekend I landed support for running Emscripten itself in node.js and in web browsers, as well as support for running the generated code in node.js (it always ran in browsers).

What this means is that if you have node.js, Python and Clang, you have everything you need to use Emscripten. For more, see the updated Getting Started page. (Regarding running Emscripten itself in a web browser, see src/compiler.html. This isn't really intended as a serious way to use it, but there are some interesting use cases for it, or will be.)

It is still strongly recommended to install the JavaScript engine shells themselves, though. One reason is that the trunk engine shells contain the very latest code, so to see the maximum speed your code can run at, you should use them. Also, some tests require the SpiderMonkey shell, because the others do not yet fully support the latest typed arrays spec. But if you already have node.js installed anyhow, it is now easier to use Emscripten, because you can just use that.

Tuesday, November 15, 2011

Code Size When Compiling to JavaScript

When compiling code to JavaScript from some other language, one of the questions is how big the code will be. This is interesting because code must be downloaded on the web, and large downloads are obviously bad. So I wanted to investigate this, to see where we stand and what we need to do (either in current compilers, or in future versions of the JavaScript language - being a better compiler target is one of the goals there).

The following is some preliminary data from two real-world codebases, the Bullet physics library (compiled to JavaScript in the ammo.js project) and Android's H264 decoder (compiled to JavaScript in the Broadway project):

Bullet

.js        19.2  MB
.js.cc      3.0  MB
.js.cc.gz   0.48 MB

.o          1.9  MB
.o.gz       0.56 MB

Android H264

.js       2,493 KB
.js.cc      265 KB
.js.cc.gz    61 KB

.o          110 KB
.o.gz        53 KB

Terms used: 

.js         Raw JS file compiled by Emscripten from LLVM bitcode
.js.cc      JS file with Closure Compiler simple opts
.js.cc.gz   JS file with Closure, gzipped

.o          Native code object file
.o.gz       Native code object file, gzipped

Notes on methodology:
  • Native code was generated with -O2. This leads to smaller code than without optimizations in both cases.
  • Closure Compiler advanced optimizations generate smaller JS code in these two cases, but not by much: while advanced mode optimizes better for size, it also does inlining, which increases code size. In any case it is potentially misleading here, since its dead code elimination rationale differs from the one used for LLVM and native code, so I used simple opts instead.
  • gzip makes sense here because you can compress your scripts on the web using it (and probably should). You can even do gzip compression in JS itself (by compiling the decompressor).
  • Debug info was not left in any of the files compared here.
  • This calculation overstates the size of the JS files, because they have the relevant parts of Emscripten's libc implementation statically linked in. But, it isn't that much.
  • LLVM and clang 3.0-pre are used (rev 141881), Emscripten and Closure Compiler are latest trunk as of today.
Analysis

At least in these two cases, it looks like compiled, optimized and gzipped JavaScript is very close in size to (also gzipped) native object files. In other words, the effective size of the compiled code is pretty much the same as you would get when compiling natively. This was a little surprising; I was expecting the size to be bigger, and to then proceed to investigate what could be improved.

Now, the raw compiled JS is in fact very large. But that is mostly because the original variable names appear there, which is basically fixed by running Closure. After Closure, the main reason the code is still large is that it's in string format, not an efficient binary format, so there are things like JavaScript keywords ('while', for example) that take a lot of space. That is basically fixed by running gzip, since the same keywords repeat a lot. At that point, the size is comparable to a native binary.
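A hypothetical example of both effects:

  // Before Closure: the original identifiers survive compilation
  function accumulate(values) {
    var total = 0;
    for (var index = 0; index < values.length; index++) total += values[index];
    return total;
  }
  // After Closure simple optimizations: locals are renamed, but keywords like
  // 'function', 'for', 'var' and 'return' remain; gzip then compresses those
  // repeats well.
  function accumulate(a){var b=0;for(var c=0;c<a.length;c++)b+=a[c];return b}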

Another comparison we can make is to LLVM bitcode. This isn't an apples-to-apples comparison of course, since LLVM bitcode is a compiler IR: It isn't designed as a way to actually store code in a compact way, instead it's a form that is useful for code analysis. But, it is another representation of the same code, so here are those numbers:

Bullet

.bc         3.9  MB
.bc.gz      2.2  MB

Android H264

.bc         365 KB
.bc.gz      258 KB

LLVM bitcode is fairly large, even with gzip: gzipped bitcode is roughly 4x larger than either gzipped native code or gzipped JS. I am not sure, but I believe the main reason LLVM bitcode is so large here is that it is strongly and explicitly typed. Because of that, each instruction carries explicit types for the expressions it operates on, and elements of different types must be explicitly converted. For example, in both native code and compiled JS, taking a pointer of one type and converting it to another is a simple assignment (which can even be eliminated depending on where it is later used), but in LLVM bitcode the pointer must be explicitly cast to the new type, which takes an instruction.
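As a concrete illustration (a sketch, not actual compiler output):

  // C:    float *f = (float*)intPtr;  - reinterpret an int* as a float*
  // LLVM: %f = bitcast i32* %intPtr to float*  - an explicit instruction
  // JS:   pointers are plain numbers, so the "cast" is at most an assignment,
  //       which the optimizer can often remove entirely:
  var intPtr = 1024; // some address in the heap
  var f = intPtr;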

So JS and native code are similar in their lack of explicit types, and in their gzipped sizes. This is a little ironic, since JS is a high-level language and native code is the exact opposite. But it turns out both JS and native code are pretty space-efficient, while something that seems to be in between them - LLVM bitcode, which is higher-level than native code but lower-level than JS - ends up being much larger. But again, this actually makes sense, since native code and JS are designed to simply execute, while LLVM bitcode is designed for analysis, so it really isn't in between the two.

(Note that this is in no way a criticism of LLVM bitcode! LLVM bitcode is an awesome compiler IR, which is why Emscripten and many other projects use it. It is not optimized for size because that isn't what it is meant for; as mentioned above, it's a form that is useful for analysis, not compression. The reason I included those numbers here is that I think it's interesting to see the size of another representation of the same compiled code.)

In summary, it looks like JavaScript is a good compilation target in terms of size, at least in these two projects. But as mentioned before, this is just a preliminary analysis (for example, it would be interesting to investigate specific compression techniques for each type of code, and not just generic gzip). If anyone has additional information about this topic, it would be much appreciated :)

Friday, October 7, 2011

JSConf.eu, Slides, SQLite on the Web

I got back from JSConf.eu a few days ago. I had never been to JSConf before, and it was very interesting! Lots of talks about important and cool stuff. The location, Berlin, was also very interesting (the mixture of new and old architecture in particular). Overall it was a very intensive two days, and the organizers deserve a ton of credit for running everything smoothly and successfully.

I was invited to give a talk about Emscripten, the LLVM to JavaScript compiler I've been working on as a side project over the last year. Here are my slides from the talk; links to the demos are in them. There was also a fourth, unplanned demo which isn't in the slides; here is a link to it.

If you've seen the previous Emscripten demos, then some of what I showed had new elements, like the Bullet/ammo.js demo, which shows the new bindings generator that lets you use the compiled C++ objects directly from JS in a natural way. One demo was entirely new, though: SQLite ported to JS. I haven't had time to do any rigorous testing of the port or to optimize it for speed, but it appears to work properly in all the basic tests I tried: creating tables, doing selects, joins, etc. With WebSQL not moving forward as a web standard, compiling SQLite to JS directly might be the best way to get SQL on the web. The demo is just a proof of concept, but I think it shows the approach is feasible.

Saturday, June 25, 2011

Long-term support for Firefox?

A debate is currently ongoing regarding long-term support of Firefox, with some surprised that Firefox 4 is being EOL'ed, and wondering what that means for the viability of Firefox in the enterprise if Mozilla is going to release a new version every 6 weeks.

Without stating an opinion either way (since I don't have one), there is one potential solution here that I haven't seen mentioned enough. The whole point of open source is that anyone, anywhere, can take the source code and customize it for any purpose they see fit. If there is indeed a need for long-term support for Firefox in the enterprise (and again, I am voicing no opinion on the matter), then anyone can do that. It doesn't need to be Mozilla. Anyone can form a new company, do the work to backport security fixes and QA them, and sell long-term support for Firefox (or this could be done in various other ways).

This is possible since Firefox is 100% open source. It isn't possible with proprietary software like IE, and isn't possible with software that mixes the two (like Chrome: you can sell long-term support for Chromium, but it will not have Chrome-only features like print preview, etc., so it would be a different product).

I want to stress the fact that you can sell open source software. Complying with Firefox's license doesn't preclude that. I won't get into the details, but this model works fine; just ask Red Hat. In fact, I assume Red Hat already does exactly this: it sells support for old versions of Firefox as part of Red Hat Enterprise Linux for a very long time.

Mozilla is driving Firefox forward as fast as possible, but Mozilla doesn't 'own' Firefox in the sense that it is the only party that can do things with it. Anyone can. If there is a business need for long-term support for Firefox, anyone can serve that need by selling that support.

Note that this is not a case of me saying "if you want some feature, you can always do it yourself". The situation we are talking about here is enterprise users. Big corporations do pay for support for the software they use; it is worthwhile for them to do so. My point is that open source software like Firefox fits into this model perfectly - if there is a business need, anyone can step in and fill it.

Again, I don't have an opinion myself as to whether such support is important - I don't know enough about the enterprise software market to have one. I also have nothing to do with Mozilla policy and planning; I just write code. My only point is that Firefox is open source, and that means it is business-friendly in the sense that you are not under the control of a single vendor selling you a product: you can customize the software yourself, or you can get someone else to do it for you. As an open source advocate, I wanted to point this out, since Firefox is the only major browser that is 100% open source, and that means there are solutions when people say there is a need for something that Mozilla does not currently do.

Wednesday, June 8, 2011

Device Orientation API Changes

We recently moved from our temporary, Mozilla-specific onMozOrientation API for device orientation events - which lets you detect, for example, the angle at which you are holding your mobile phone - to the W3C DeviceOrientation spec.

The main difference in the orientation data is that onMozOrientation returned an (x,y,z) vector, whereas the W3C spec returns (alpha,beta,gamma) Euler angles.

The easiest way to understand what those mean is to look at the examples in the W3C spec linked to above (search for "The following code extracts illustrate basic use of the events"). For now, on Android we don't provide the alpha value, which is the compass heading (azimuth). You can use the beta and gamma values, though. Basically, if the user holds the device flat on a surface, pointing up, then tilting it towards or away from the user changes beta, and tilting it to the right or left changes gamma.
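Listening for the event is straightforward (a minimal sketch using the W3C event name):

  window.addEventListener('deviceorientation', function (event) {
    // event.alpha: compass heading (azimuth) - not provided on Android for now
    // event.beta:  front-to-back tilt, in degrees
    // event.gamma: left-to-right tilt, in degrees
    console.log(event.beta, event.gamma);
  }, true);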

We hope to implement the rest of the spec soon.

Friday, June 3, 2011

No more stackQuota()

If you use SpiderMonkey and have had to run code like this:

   js -e "stackQuota(100000000)" myFile.js

then as of Bug 644241 landing on tracemonkey, calling stackQuota is no longer necessary (in fact, it will now cause an error, as stackQuota no longer exists).

For more details see the bug. In brief summary: the script stack quota originally limited the amount of memory scripts could use, but its effectiveness diminished over time, while it caused more and more problems as people ran larger JavaScript files and kept hitting the quota (we had a bug in Firefox, then another bug for worker threads, and as mentioned above it was necessary to call stackQuota in the console). So the script stack quota has been removed. A followup bug has plans to introduce a new way to limit the amount of memory scripts can use.

Monday, April 11, 2011

Rendering PDFs in JavaScript...?

I released Emscripten 1.0 over the weekend, which came with a demo of rendering PDFs entirely in JavaScript (warning: >12MB will be downloaded for that page). Emscripten is an LLVM-to-JavaScript compiler which allows running code written in C or C++ on the web. In the linked demo, Poppler and FreeType were compiled to JavaScript from C++.

The goal of the demo was to show Emscripten's capabilities. Over the last year it has gotten very usable, and can probably compile most reasonable C/C++ codebases (albeit with some manual intervention in some cases). It is my hope that Emscripten can help against the tendency to write non-web applications, such as native mobile applications (for iOS, Android, etc.) or using plugins on the web (Flash, NaCl, etc.). Simply put, the web is competing with these platforms. Emscripten can make the web a more attractive platform for developers, by letting them use their languages of choice, such as C, C++ or Python (without necessarily compromising on speed: the code generated by Emscripten can be optimized very well, and it is my hope that things like type inference will make it very fast eventually).

Meanwhile, getting back to the PDF rendering demo, I was thinking: how about making a Firefox plugin with it, so that when a PDF is clicked in Firefox it is shown in an internal PDF viewer? Aside from the novelty, I think this would be cool to do because it would be an extremely secure PDF viewer (since it would be entirely in JavaScript). If you are a plugin or frontend hacker and think it's a cool idea too, please get in touch and let's make it happen! :)

Friday, March 18, 2011

massdiff - Diff for Massif Snapshots

Massif (part of Valgrind) is a super-useful tool for finding out how a program allocates memory. It gives very detailed graphs about exactly which lines of code perform allocations that are not freed later. For example,

96.42% (22,771,482B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->39.75% (9,388,032B) 0x8058AC1: arena_malloc_large (jemalloc.c:3831)
| ->39.75% (9,388,032B) 0x8058DB9: arena_malloc (jemalloc.c:3856)
|   ->34.74% (8,204,288B) 0x8058E98: imalloc (jemalloc.c:3866)
|   | ->34.27% (8,093,696B) 0x805D833: malloc (jemalloc.c:5882)
|   | | ->12.23% (2,887,680B) 0x6BBB63F: sqlite3MemMalloc (sqlite3.c:14221)
  ...

This shows a snapshot at a particular point in time, during which the whole program allocated ~22MB of memory, of which almost 3MB was due to SQLite.

One limitation, though, is that you can't easily see what changed from one snapshot to another, which is important in order to spot gradual memory increases (these may or may not be actual leaks). So I wrote a small Python script, massdiff, which diffs two Massif snapshots. Here is an example of the output:


Diffing snapshots 30 50

- (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,286,738
    [ diff: +502,088 ]
+ (heap allocation functions) malloc/new/new[], --alloc-fns, etc. - 22,788,826

-   0x8058927: arena_malloc_small (jemalloc.c:3794) -  7,494,594
      [ diff: +499,552 ]
+   0x8058927: arena_malloc_small (jemalloc.c:3794) -  7,994,146

-     0x8058D9D: arena_malloc (jemalloc.c:3854) -  7,494,594
        [ diff: +499,552 ]
+     0x8058D9D: arena_malloc (jemalloc.c:3854) -  7,994,146

-       0x8058E98: imalloc (jemalloc.c:3866) -  6,055,702
          [ diff: +499,232 ]
+       0x8058E98: imalloc (jemalloc.c:3866) -  6,554,934

-         0x805D833: malloc (jemalloc.c:5882) -  5,552,542
            [ diff: +499,232 ]
+         0x805D833: malloc (jemalloc.c:5882) -  6,051,774

-           0x52D2FA3: js_malloc (jsutil.h:213) -  1,635,252
              [ diff: +432,064 ]
+           0x52D2FA3: js_malloc (jsutil.h:213) -  2,067,316

-             0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) -  1,635,252
                [ diff: +432,064 ]
+             0x52D9D27: JSRuntime::malloc(unsigned int, JSContext*) (jscntxt.h:1358) -  2,067,316

-               0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) -  1,600,532
                  [ diff: +431,904 ]
+               0x52D9DEC: JSContext::malloc(unsigned int) (jscntxt.h:2027) -  2,032,436

-                 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) -    161,456
                    [ diff: +344,832 ]
+                 0x5D5B41D: JSObject::allocSlots(JSContext*, unsigned int) (jsobj.cpp:4032) -    506,288

-                   0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) -    161,456
                      [ diff: +344,832 ]
+                   0x5D5B594: JSObject::growSlots(JSContext*, unsigned int) (jsobj.cpp:4078) -    506,288


The diff only shows what changed between the two snapshots, and shows it in tree format, just like Massif's ms_print does. Each group of three lines shows a line from the first snapshot on top, the corresponding line from the later snapshot on the bottom, and the difference between them in the middle.

The output here is from loading about:blank 500 times in Fennec. There is overall ~500K of additional allocation (so, ~1K per page load), of which JSObject::growSlots is responsible for ~340K (this is later all deallocated at once, presumably due to GC being run).

So far this has been useful in helping discover one specific case, bug 641663, and I'm still investigating some additional issues. Hopefully it can be a useful tool for other people too. To use it:
  • Run Massif on your program, see here and here
  • Run ms_print on the output
  • Run massdiff on that file, with the two snapshot numbers you want to diff as parameters

Wednesday, February 16, 2011

High-Level Fennec Profiling

I've been working on a series of patches to let us do high-level profiling on Fennec (and Firefox). The goal is to get a "big picture" view of what processing happens in Fennec, so we can investigate what should be optimized. More specifically, the idea is to see what events, runnables, and IPC messages are run, and how much time is spent on each.

Here is some example data, from ~9 seconds of panning and zooming on the Mozilla crash stats page, in the parent process. Notes:
  • Event 10 is a mouse event, and 37 is a simple gesture. So these are events that are triggered by the panning and zooming actions. A lot of these events happen, and they can take up to 1/20th of a second to process.
  • I am not sure why the RefreshDriver is called here (since this is the parent process). Perhaps worth looking into.
  • Layers Update IPC messages are working very well, with a mean processing time of 0.0004 seconds.
And here is some data from loading google.com (nonmobile). Notes:
  • AsyncAssociateIconToPage and InsertVisitedURIs take a while, but they are at least on a side thread.
  • RecvGetValue is web storage. We are considering what to do with that in bug 627635.
  • nsGlobalWindow.cpp:8834 is likely a setTimeout or setInterval that google.com created.
  • We receive 39 RecvHTMLDNSPrefetch IPC messages. Offhand I don't know why so many would occur on a blank search page. Filed bug 634653.

So, the data shows on a high level what code is run, unlike a low-level profiler (like oprofile), which shows how much time individual functions or lines of code take. Low-level profilers are very useful of course, but sometimes it is also good to get higher-level data (and it isn't always possible or easy to deduce that from the low-level data). Also, running low-level profilers depends on the OS, so you need one to be available for your OS (a problem on Android - but see the very cool piranha), and even if you do have one, your data is for that OS only, and not necessarily directly comparable to data from a different low-level profiler on a different OS (in other words, you would be changing both profiler and OS, not just OS).

Of course this approach has limitations as well, so a combination of both approaches is the best thing.

The patches are as follows:
  • Timer logger: A script that automatically rewrites the Fennec source code to add identifying strings to all timers, and code to generate log output for that. This is an improvement of my previous IPC profiling patch.
  • Runnables and events logger: Another rewriting script, that does something similar for Runnables. Also adds log output for Runnables and Events.
  • IPC traffic logger: A few hooks to generate log output for each IPC message.
  • Android log processor: Tiny script that splits an android adb logcat dump into separate files for each process, ready for further processing. This is only needed on Android.
  • Data analyzer: Processes the generated logs and creates readable output (like the data shown above).
To use the patches, apply them, then run timer_rewriter.py and runnable_rewriter.py. Build Fennec (or Firefox), then run it, and capture the log output into a file. You should have a separate file for each process; on Android, use the android_prettifier.py script mentioned above. Then run the analyze_profiling_output.py script on the data to get the summary.