Spidermonkey JIT improvements in FF53

On the 23rd of January the code of Firefox 53 was merged into the stabilization tree. While we work on the next releases, the code of FF53 has time to stabilize before its release on April 18th.

In FF53 a lot has happened. Narrowing down on the JITs, the following was committed:

CacheIR

CacheIR improved drastically in this release. The goal of this project is twofold. One part is to unify the inline cache (IC) stubs in Baseline and IonMonkey. As a result we only have to implement a new stub once, leading to less code duplication. Secondly it uses an intermediate representation, allowing us to reuse parts between stubs.

Starting in this release IonMonkey uses this infrastructure for generating ICs. New ICs were also ported, and we now have complete coverage of JSOP_GETPROP (e.g. reading obj.prop where obj is an object) and JSOP_GETELEM (e.g. reading array[42]) in CacheIR. Besides this milestone, inline caches were added for getting DOM expando properties (i.e. properties added on DOM objects), for getting own properties of expandos on DOM proxies, and for lookups of plain data properties on WindowProxies.
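
For illustration, these are the kinds of reads now covered end to end (a hedged sketch; the names are made up):

```js
var obj = { prop: 1 };
var array = [0, 1, 2];

obj.prop;    // JSOP_GETPROP: a named property read on an object
array[42];   // JSOP_GETELEM: an indexed element read (undefined here)

// A DOM expando is a plain JS property added onto a DOM object, e.g.:
//   document.body.myExpando = "hello";
// Reading such a property back now also has a dedicated inline cache.
```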

Our regular contributor evilpie helped a lot with this effort and implemented a logger that shows when we are missing specific stubs. This allowed us to find missing edge cases on popular websites, enabling optimizations notably on Google Docs and Twitter. This work will continue in FF54.

WebAssembly

Since we implemented the draft specification of WebAssembly we haven’t stopped improving it, be it for throughput or for compilation time, and we’ve been polishing our implementation to fix bugs and incorporate last-minute spec changes.

In order to improve the experience we have moved validation to a helper thread and we’re doing more of the compilation in parallel. Lastly we added some optimizations to achieve better parallelism while compiling. As a result the compilation of WebAssembly code should be smoother.

IonMonkey

IonMonkey also got its fair share of improvements in this release.

On Google Docs we noticed a lot of compilation time was spent in a particular function, “FlagAllOperandsAsHavingRemovedUses”. We were able to decrease the time spent in the loop in that function by removing some extra checks. As a result this is now a very tight loop and no longer visible in profiles.

We adjusted a part of our engine, IonBuilder, whose job is to create an SSA graph from a JS script, to return a “Result” type. This annotation makes it easier to differentiate between different kinds of failures and to act on them correctly. It already pointed out places where we didn’t handle out-of-memory failures correctly. In the future it will also allow us to backtrack after an inlining failure and continue compiling without having to start over.

Another improvement to IonBuilder is that we now split the creation of the Control Flow Graph (CFG) from the rest of what IonBuilder does. IonBuilder has a lot of different roles and as a result the code could be cleaner. It is also one of the few parts that cannot run on a background thread. This split simplifies the IonBuilder code a bit and allows us to cache the CFG, so a recompilation should be a little bit faster now.

Taahir Ahmed added extra code that allows us to constant-fold powers. IonMonkey can now use the result of a power with constant operands directly, instead of computing it every time at runtime.
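
For example (a hedged sketch of the kind of code that can benefit; the exact set of folded patterns is in the patch):

```js
// Both operands are compile-time constants, so instead of computing the
// power on every call, IonMonkey can bake the result into the compiled code.
function kilobytes(n) {
  var KB = Math.pow(2, 10);   // folded to the constant 1024 at compile time
  return n * KB;
}
```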

Addition to the team

I’m also happy to announce that Ted Campbell has joined the JIT team. He started January 9th and is located in the Toronto office. He is helping the CacheIR project and will also look into making new ECMAScript 2016 features faster in IonMonkey.

Closing notes

This is not a full list of the changes that happened, but should cover the big ones. If you want the full list I would recommend you read the bug list. I want to thank everybody for their hard work. If you are interested in helping out, we have a list of mentored bugs at bugsahoy or you can contact me (h4writer) online at irc.mozilla.org #jsapi.

Spidermonkey JIT improvements in FF52

Last week we signed off our hard work on FF52 and we will start working on FF53. The expected release date of this version is the 6th of March. In the meantime we still have time to stabilize the code and fix issues that we encounter.

In the JavaScript JIT engine a lot of important work was done.

WebAssembly has made great advances. We have fully implemented the draft specification and are requesting final feedback as part of a cross-browser Browser Preview in the W3C WebAssembly Community Group. Assuming the Browser Preview concludes without major changes before 52 is released, we’ll enable WebAssembly by default in 52.

Step by step our CacheIR infrastructure is improving. In this release getprop on primitive values was ported to CacheIR.
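
A “getprop on a primitive value” is a property read where the receiver is a primitive rather than an object, for example:

```js
"hello".length;       // property read on a string primitive
(123.456).toFixed;    // property read on a number primitive
true.toString;        // property read on a boolean primitive
```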

GC sometimes needs to discard the compilations happening on the helper threads. Previously we waited for the compilations to stop one after another, so it could take a while until all of them were discarded. Now we signal all threads at the same time to stop compiling and then wait for all of them to finish. This was a major win in our effort to make sure GC doesn’t interrupt execution for too long.

The register allocator also received a nice improvement. Sometimes we were adding spills (moves between registers and the stack) even when they were not needed. A fix was added to combat this.

Like in every release, a long list of differential bugs and crashes has been fixed as well.

This release also includes code from contributors:

  • Emanuel Hoogeveen improved our crash annotations. He noticed that we didn’t save the crash reason in our crash reports when using “MOZ_CRASH”.
  • Tooru Fujisawa has been doing incredible work throughout the codebase.
  • Johannes Schulte has landed a patch that improves code for “testArg ? testArg : 0.0” and also eliminates needless control flow.
  • Sander Mathijs van Veen made sure we could use unsigned integer modulo instead of a double modulo operation and also added code to fold additions with multiple constants.
  • André Bargull added code to inline String.fromCodePoint in IonMonkey. As a result the performance should now be equal to using String.fromCharCode.
  • Robin Templeton noticed that we were spewing incorrect information in our debug logs and fixed that.
  • Heiher has been updating MIPS to account for all the changes we made to platform-dependent code for WebAssembly.

I want to thank every one of them for their help! They did a tremendous job! If you are interested in helping out, we have a list of mentored bugs at bugsahoy or you can contact me (h4writer) online at irc.mozilla.org #jsapi.

Tracelogger gui updates

Tracelogger is one of the tools JIT devs (especially me) use to look into performance issues and to improve the JS engine of Firefox performance-wise. It traces which functions are executing, together with extra information like which engine is running, how long compilation took, how many times we are GC’ing, whether we are calling VM functions, and so on.

I made the GUI a bit more powerful. First of all I moved the computation of the overview to a web worker, which should help the usability of the tool. Next to that I made it possible to toggle categories on and off, which might make it easier to understand the graphs. I also introduced a settings popup, in which you can now choose to see absolute (CPU ticks) or relative (%) timings.
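
Moving the overview computation off the main thread follows the standard web worker pattern; here is a minimal sketch of that pattern (the file name, message shapes and helper functions are made up, not the actual tracelogger code):

```js
// main.js — hand the raw trace data to the worker and render when it replies.
var worker = new Worker("overview-worker.js");
worker.onmessage = function (event) {
  renderOverview(event.data);        // hypothetical GUI rendering function
};
worker.postMessage(traceData);       // hypothetical raw tracelogger data

// overview-worker.js — do the heavy aggregation without blocking the page.
onmessage = function (event) {
  var overview = computeOverview(event.data);   // hypothetical computation
  postMessage(overview);
};
```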

[Screenshot of the updated tracelogger GUI]

There are still a lot of improvements possible. Eventually it should be possible to zoom in on graphs, toggle scripts on/off, see full times of scripts (instead of self time only), and maybe show another kind of graph (like a flame chart). Hopefully one day.

This is of course open source and available at:
https://github.com/h4writer/tracelogger/tree/master/website

Year in review: Spidermonkey in 2014 part 3

In the first two parts I listed the major changes in the JavaScript engine of Mozilla Firefox in 2014 and enumerated the major changes that happened from Firefox 29 till Firefox 34. In part 3 I will go over the major changes in the JavaScript engine in the last two releases that were developed in 2014.

If you haven’t read the first parts yet, I would encourage you to do that first.
<- Year in review: Spidermonkey in 2014 part 1
<- Year in review: Spidermonkey in 2014 part 2

Firefox 35

JIT RegExp.prototype.exec and RegExp.prototype.test

When Firefox 32 was released the regular expression engine was replaced with Irregexp. Just like its predecessor, the new engine has a small JIT where regular expressions get compiled to native code. And just like its predecessor, the easiest way to embed the regular expression engine is through C code. Consequently a normal execution of a regular expression looked as follows: the JS code calls into C code that prepares the regular expression JIT, after which the regular expression engine gets called. In this release we eliminated the middle step (the C code) and now jump directly from the JS JIT to the regular expression JIT, removing the overhead the C code added when calling RegExp.prototype.exec or RegExp.prototype.test.
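
Concretely, these are the calls that now avoid the round-trip through C code (a small illustration):

```js
var re = /(\d+)-(\d+)/;

// Both calls now jump straight from JIT-compiled JS into the compiled
// regular expression code, skipping the C glue in between.
re.test("2014-2015");   // true
re.exec("2014-2015");   // ["2014-2015", "2014", "2015"]
```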

Read more about this in the bug report

GVN + UCE combined

Just like most compilers IonMonkey has an optimization called Global Value Numbering (GVN). It tries to remove or replace redundant instructions. In our implementation it is also the place where most replacements based on inputs happen, like constant folding, identity removal and so on. After this pass we run Unreachable Code Elimination (UCE), which eliminates branches that are never taken. Optimizations taking place during GVN can improve the efficiency of UCE: more folded instructions can increase how much code is found to be dead. On the other hand, removed code can in turn make it possible for GVN to optimize some extra instructions. Before Firefox 35 we ran both passes only once, so we sometimes didn’t find the most optimal code. With this release GVN and UCE are combined, making it possible to get the same optimizations as running GVN and UCE multiple times after each other, but in only one pass.
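
A hedged sketch of why the two passes feed each other:

```js
function f(x) {
  var size = 4 * 256;      // GVN constant-folds this to 1024
  if (size !== 1024) {     // ...so this condition is provably false
    x = -x;                // ...and UCE removes this unreachable branch
  }
  // With the branch gone, GVN can in turn treat `x` as unchanged here
  // and simplify anything that depends on it.
  return x + size;
}
```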

Learn more about this in an explanatory video

Lazy linking when recompiling code

The compilation of Ion code happens in three phases. First we have the graph creation phase (which happens in IonBuilder), afterwards we do all sorts of optimizations on that graph, ending with a linking phase, which finishes the compilation. In this sequence only the optimizing part can be done off the main thread; the other two phases block execution of JavaScript code. Lazy linking is about improving the last phase. Currently linking happens eagerly: as soon as a graph is ready we try to link it, even if we will never execute that code (again) or if it gets invalidated due to better types. With lazy linking we wait until we want to execute that code before linking.

Read more about this in the bug report.

Compile non-CNG functions

Compile and Go (CNG) functions give extra performance since the caller cannot modify objects on the scope chain between compilation and execution [1]. With this guarantee compilers can optimize access to these objects better. Until now IonMonkey could only compile such functions; non-CNG functions were stuck in the Baseline compiler. In Firefox 34 and 35 these limitations were mostly removed. Given that our most important class of non-CNG functions lives in add-ons and chrome content, this should again give a nice boost to their performance.

Read more about this in the bug reports: bug 1064777, bug 1045529 and bug 911570

Firefox 36

Baseline compile generators

As mentioned in part 1, Firefox 30 saw the introduction of ES6 generators. In that release only support in the interpreter (our first tier) was added, because the initial implementation tried to support the feature while touching as little code as possible. That approach, however, also ruled out support in the higher tiers. Six releases later we are proud to also have support for generators in our second tier, the Baseline compiler, and that even before ES6 is released.
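
For reference, this is the kind of function we are talking about, which can now be compiled by Baseline instead of only running in the interpreter:

```js
// An ES6 generator: each call to next() resumes the function where the
// previous yield left off.
function* fibonacci() {
  var a = 0, b = 1;
  while (true) {
    yield a;
    var next = a + b;
    a = b;
    b = next;
  }
}

var it = fibonacci();
it.next().value;   // 0
it.next().value;   // 1
it.next().value;   // 1
it.next().value;   // 2
```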

Wingo started this huge task and has written a blogpost about it.
Read the blogpost about support of compiling generators in Baseline

ES6 Symbols

For the first time in a very long time a new primitive type was added to the engine. This has to do with the upcoming ES6 Symbols spec. Next to null, undefined, boolean, number and string, symbol is now present on that list. A Symbol is a unique and immutable primitive value. Without going too deep, it enables hiding properties or avoiding name clashes between properties. It can also help avoid breaking existing codebases when new property names are introduced to the language. Not something people immediately need to use, but it can open new and maybe better ways to do some things.
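
A short example of what a symbol-keyed property looks like:

```js
// Each Symbol() call produces a value unequal to every other symbol, so the
// property below can never clash with a string-named property.
var hidden = Symbol("internal state");

var obj = {};
obj[hidden] = 42;

obj[hidden];                 // 42
Object.keys(obj);            // [] — symbol-keyed properties don't show up here
Symbol("a") === Symbol("a"); // false — every symbol is unique
```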

Read more about this in the developer reference
Stackoverflow: why bring symbols to JS

Selfhosting String.prototype.substr, String.prototype.substring and String.prototype.slice

In Firefox 20 the selfhosting infrastructure landed. Since that release we can implement JavaScript features in JavaScript itself, instead of writing them in C. At runtime such a selfhosted function simply gets executed as if somebody had written it in JavaScript. The major improvement is that we remove the overhead of calling from JavaScript out to C and back. This gave improvements for e.g. “array.map(function() { /* … */})”, since the C step between calling the function and the function given as argument was fully eliminated. In this release substr, substring and slice are now also selfhosted. For these functions the speedup comes mostly from the edge case checks (start is positive and length is smaller than the string length) now being done in JavaScript, so IonMonkey can reason about them and potentially remove those checks!
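
To give a rough idea, here is a simplified, runnable sketch of the shape of such a selfhosted slice. This is not the actual code in SpiderMonkey (the real version uses engine intrinsics rather than substring), but it shows that the edge-case clamping is ordinary JavaScript the compiler can reason about:

```js
function mySlice(str, begin, end) {
  var len = str.length;
  // Argument clamping: plain JS, so IonMonkey can prove checks unnecessary.
  if (begin < 0) begin = Math.max(len + begin, 0);
  if (end === undefined) end = len;
  else if (end < 0) end = Math.max(len + end, 0);
  end = Math.min(end, len);
  if (begin >= end) return "";
  return str.substring(begin, end);
}

mySlice("selfhosting", 4, 8);   // "host"
```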

Read more about this in the bug report

Year in review: Spidermonkey in 2014 part 2

In the first part of this series I noted that the major changes in Spidermonkey in 2014 were about EcmaScript 6.0 compliance, JavaScript performance and GC improvements. I also enumerated the major changes related to Spidermonkey that happened from Firefox 29 till Firefox 31. If you haven’t read the first part yet, I would encourage you to do that first. In part two I will continue going over the major changes, from Firefox 32 till Firefox 34.

<- Year in review: Spidermonkey in 2014 part 1

Firefox 32

Recover instructions

In the Firefox 32 release recover instructions were introduced. They add the possibility to do more aggressive optimizations in IonMonkey. Before, we couldn’t eliminate some instructions since their result was needed if we had to switch back to a lower compiler tier (bailout from IonMonkey to Baseline). This new infrastructure makes it possible to recover the needed result by adding instructions to the bailout path. As a result we still have the needed results to bail out, but don’t have to keep a normally unused instruction in the code stream.
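
A hedged illustration of the idea (whether a given instruction is actually recovered depends on the engine's heuristics):

```js
function f(x) {
  // `tmp` is not used by the code below, but if Ion has to bail out to
  // Baseline somewhere in this function, the Baseline frame still needs a
  // value for the local. With recover instructions the addition can be
  // dropped from the hot code and recomputed only on the bailout path.
  var tmp = x + 1;
  return x * 2;
}
```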

Read more about this change
Read more about this in the bug report

Irregexp

Internally there has been a lot of discussion around the regular expression engine used in Spidermonkey. We used to use Yarr, the regular expression engine of WebKit, but there were some issues with it: we sometimes got wrong results, there were security issues, and Yarr hadn’t been updated in a while. As a result we switched to Irregexp, the regular expression engine in V8 (the JS engine in Chrome). In the long run this gives us a more up-to-date engine, which has JIT-to-JIT support and is more actively tested for security issues.

Read more about this in the bug report

Keep Ion code during GC

Another achievement that landed in Firefox 32 was the ability to keep IonMonkey JIT code during GC. This required Type Inference changes. Type information (TI) was always cleared during GC, and since IonMonkey code depends on it, the code became invalid and we had to throw the Ion code away. Since a few releases we can release type information in chunks, and since Firefox 32 we keep IonMonkey code that is active on the stack. This results in fewer bumps in performance during GC, since we don’t lose the Ion code.

Read more about this in the bug report

Generational GC

Generational GC was announced as far back as 2011, but it took a lot of work before it was finally accomplished. In Firefox 29 exact rooting landed, which was a prerequisite. Now we finally use a more advanced GC algorithm to collect the memory, and this also paves the way for even more advanced GC algorithms. This narrowed the performance gap we had on the Octane benchmark compared to V8, which already had a generational GC.

Read more about this change

Firefox 33

8 bit strings

Since the first Spidermonkey release strings were always stored as a sequence of UTF-16 code units, so every string takes 16 bits per character. This is quite wasteful, since most strings don’t need the extended format and their characters fit nicely into 8 bits (Latin1 strings). From this release on strings are stored in the smaller format when possible. The main idea was to decrease memory usage, but there were also some performance increases for string-intensive tasks, since they now only need to operate over half as much data.

Read more about this change

Firefox 34

ES6 template strings

ES6 will introduce something called template strings. These template strings are surrounded by backticks ( ` ) and, among other things, allow creating multi-line strings, which wasn’t possible before. Template strings also make it possible to embed expressions into strings without using concatenation. Support for this feature was introduced in Firefox 34.
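
For example:

```js
var user = "h4writer";
var bugs = 3;

// Expressions are embedded with ${…} and line breaks are kept as-is.
var message = `Hello ${user},
you have ${bugs * 2} items to review.`;
```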

Read more about ES6 template strings.

Copy on write

A new optimization has been added for arrays. Before, when you copied an array, the JavaScript engine had to copy every single item from the original array to the new array, which can be an intensive operation for big arrays. In this release this pain has been reduced with the introduction of “copy on write” arrays: we only copy the contents of an array when we start modifying the new array. That way we don’t need to allocate and copy the contents for arrays that get copied but never modified.
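
Conceptually it looks like this (a hedged sketch; whether a particular copy is optimized this way depends on the engine’s heuristics):

```js
var original = [1, 2, 3, 4, 5];

// The copy can share the original's storage behind the scenes…
var copy = original.slice();

// …and only when the copy is actually modified does the engine need to
// allocate separate storage and copy the elements over.
copy[0] = 99;
```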

Read more about this in the bug report.

Inline global variable

A second optimization was about inlining constant global names and constant name accesses from singleton scope objects. This optimization helps performance for code that has constant variables defined in the global scope. Instead of reading such a constant out of memory every time, the constant gets embedded into the code, just as if you had written the constant yourself. So it is now possible to have a “var DEBUG = false” and everywhere you test for DEBUG it will get replaced with ‘false’ and that branch will be removed.
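
Using the DEBUG example from above, a minimal sketch of what this enables:

```js
var DEBUG = false;   // a constant global: assigned once, never changed

function process(item) {
  if (DEBUG) {                       // Ion embeds the constant `false` here…
    console.log("processing", item); // …so this whole branch gets removed
  }
  return item * 2;
}
```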

Read more about this in the bug report

SIMD in asm.js

SIMD, or Single Instruction Multiple Data, makes it possible to execute the same operation on multiple inputs at the same time. Modern CPUs already support this using 128-bit vectors, but currently we don’t really have access to these speedups in JavaScript. SIMD.js is a proposal to expose this to JavaScript. In Firefox 34 experimental support (only available in Firefox Nightly builds) for these instructions in asm.js code was added, making it possible to create code that can run up to 4x faster (subject to Amdahl’s law) using SIMD. Now work is underway to fully optimize SIMD.js in all Ion-compiled JS. When this work is complete, and assuming continued progress in the standards committee, SIMD.js will be released to all Firefox users in a future version.
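
A hedged sketch using the SIMD.js proposal’s API as it stood at the time (the feature was experimental and Nightly-only, and the exact names evolved with the proposal):

```js
// Four single-precision floats packed into one 128-bit value; a single add
// operation processes all four lanes at once.
var a = SIMD.float32x4(1, 2, 3, 4);
var b = SIMD.float32x4(5, 6, 7, 8);
var sum = SIMD.float32x4.add(a, b);   // (6, 8, 10, 12)
```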

Read more about this change
Read more about this in the bug report
View the SIMD.js demos

Update: The next post in this series has been published:
Read part 3 of ‘Year in review: Spidermonkey in 2014’ ->