I Learned How To Read Source Maps

Source maps are these magical things that translate output code back into its source form. They're essentially the Rosetta Stone or the Google Translate of the programming world. But have you ever wondered how they really work? That's what I've been up to the past few weeks, looking for resources that explain the concept like I was five. And when I learned how they worked, my mind was blown by their simplicity.


A quick look at a source map

The structure that defines the source map is very self-explanatory. This is what you normally see when you open one, and usually the only part you'll care to read:

{
  "version" : 3,                          // Revision 3
  "file": "out.js",                       // The output file being mapped
  "sourceRoot": "",                       // Root for sources if not absolute
  "sources": ["foo.js", "bar.js"],        // Sources, relative or absolute
  "sourcesContent": [null, null],         // Embedded content, if source not available
  "names": ["src", "maps", "are", "fun"], // Symbols, i.e. variable/method names
  "mappings": "A,AAAB;;ABCDE;"            // The fun part
}

Things needed in a mapping

  1. Output line
  2. Output column
  3. Source line
  4. Source column
  5. Source file name or content
  6. Source symbol

Basically, a mapping is just the coordinates of a symbol in both source and output, plus a bit of metadata about the files on both ends. A source map is just a comma-separated sequence of these mappings. That's all there is to it. You've learned half of what source maps are!
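To make that concrete, here's a rough sketch of what a few mappings could look like before any optimization is applied. The property names, the sample values and the findSource helper are all made up for illustration; nothing in a real source map is spelled out like this.

```javascript
// Hypothetical, fully expanded mappings: one object per symbol, nothing optimized yet.
const expandedMappings = [
  { outputLine: 1, outputColumn: 0, sourceLine: 3, sourceColumn: 4, sourceFile: "foo.js", symbol: "src" },
  { outputLine: 1, outputColumn: 2, sourceLine: 3, sourceColumn: 10, sourceFile: "foo.js", symbol: "maps" },
  { outputLine: 2, outputColumn: 0, sourceLine: 1, sourceColumn: 0, sourceFile: "bar.js", symbol: "fun" },
];

// Hypothetical helper: find where an exact position in the output came from.
function findSource(mappings, outputLine, outputColumn) {
  return mappings.find(
    (m) => m.outputLine === outputLine && m.outputColumn === outputColumn
  );
}

const hit = findSource(expandedMappings, 1, 2);
console.log(`${hit.sourceFile}:${hit.sourceLine}:${hit.sourceColumn} (${hit.symbol})`);
// prints "foo.js:3:10 (maps)"
```

Everything that follows is just a way of squeezing this list into fewer characters.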

Eliminating redundancy

Because mappings are arranged in the order of the output code, mappings occur line after line, left to right. Mappings on the same line can be found in a contiguous sequence with identical first values, the output line. To optimize this, the mappings are grouped by output line, each group is separated from the next by a ;, and the output line value is removed from the mapping. Within a group, the mappings are still separated by commas. This means I can do a mappings.split(';') and each item in the array is a line, with index + 1 as the line number.
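Here's a small sketch of that split, using the mappings string from the sample map earlier. The lines and segments names are my own, not anything from the spec.

```javascript
const mappings = "A,AAAB;;ABCDE;";

// One entry per output line; a line with no mappings becomes an empty list.
const lines = mappings.split(";").map((line, index) => ({
  outputLine: index + 1,
  segments: line === "" ? [] : line.split(","),
}));

for (const { outputLine, segments } of lines) {
  console.log(outputLine, segments);
}
// 1 [ 'A', 'AAAB' ]
// 2 []
// 3 [ 'ABCDE' ]
// 4 []
```

Note the trailing ; in the string produces a fourth, empty line.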

The next optimization is removing the strings in the mappings, the symbol and the file name or content. This is done by putting the strings in an array, and having the mapping use the index of that string instead. It's kinda like how dictionary-based compression works, storing redundant chunks of data in a dictionary, and replacing them where they're found with references. This is done for the file name/content and symbol parts of the mapping.
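A minimal sketch of that idea, assuming a made-up internStrings helper: it collects each distinct string once and hands back indices in place of the strings. This is only an illustration of the technique, not how any particular tool builds its sources or names arrays.

```javascript
// Hypothetical helper: replace repeated strings with indices into a table.
function internStrings(values) {
  const table = [];
  const indexByValue = new Map();
  const indices = values.map((value) => {
    if (!indexByValue.has(value)) {
      indexByValue.set(value, table.length);
      table.push(value);
    }
    return indexByValue.get(value);
  });
  return { table, indices };
}

const { table, indices } = internStrings(["foo.js", "foo.js", "bar.js", "foo.js"]);
console.log(table);   // [ 'foo.js', 'bar.js' ]
console.log(indices); // [ 0, 0, 1, 0 ]
```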

Reduction of representation

The values are optimized further by representing field values in a succeeding mapping as offsets from the preceding mapping's field values. For instance, the number sequence 9,12,21,17 can be represented as 9,3,9,-4, where 9 is 0(+9), 12 is 9(+3), 21 is 12(+9), and 17 is 21(-4). That's not a whole lot of savings for small numbers, but for larger numbers, especially the values that represent line numbers, this greatly reduces the size of the value's representation.
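The same arithmetic as a quick sketch. toOffsets and fromOffsets are names I picked for illustration.

```javascript
// Turn absolute values into offsets from the previous value (starting from 0).
function toOffsets(values) {
  let previous = 0;
  return values.map((value) => {
    const offset = value - previous;
    previous = value;
    return offset;
  });
}

// Undo it: keep a running total of the offsets.
function fromOffsets(offsets) {
  let current = 0;
  return offsets.map((offset) => (current += offset));
}

console.log(toOffsets([9, 12, 21, 17]));   // [ 9, 3, 9, -4 ]
console.log(fromOffsets([9, 3, 9, -4]));   // [ 9, 12, 21, 17 ]
```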

Here's where the funny jumble of characters comes in. Enter variable-length quantity (VLQ). The idea is that instead of representing a value using a fixed-width chunk (which is wasteful for smaller values that don't utilize the entire width), VLQ encodes data so that smaller values use fewer "chunks". In source maps, each chunk is 6 bits wide and is written out as a single Base64 character. For larger values, it daisy-chains several chunks together.

C4321S C43210 C43210 C43210 ....
  • C is the continuation bit. A 1 tells the decoder that the next chunk is a continuation of the current chunk. A 0 means there's no continuation or terminates an ongoing continuation.

  • S is the sign bit. A 1 indicates that the value is a negative number. A 0 means a positive number. It only appears on the very first chunk, and its place on continuation chunks is used by data.

  • The remaining bits on the chunk are used by data, with the least significant bits going in first.
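Here's a sketch of an encoder that follows the layout above, assuming the standard Base64 alphabet and values small enough for 32-bit bitwise math. encodeVlq is my own name for it. Shifting the value left by one and putting the sign in the lowest bit gives the same C4321S layout for the first chunk without treating it as a special case.

```javascript
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as Base64 VLQ (assumes |value| fits in 30 bits).
function encodeVlq(value) {
  // Sign goes into the lowest bit, the magnitude sits above it.
  let remaining = value < 0 ? (-value << 1) | 1 : value << 1;
  let encoded = "";
  do {
    let chunk = remaining & 0b11111; // take the 5 least significant bits
    remaining >>>= 5;
    if (remaining > 0) chunk |= 0b100000; // more chunks follow: set the continuation bit
    encoded += BASE64_CHARS[chunk];
  } while (remaining > 0);
  return encoded;
}

console.log(encodeVlq(0));  // A
console.log(encodeVlq(1));  // C
console.log(encodeVlq(-1)); // D
console.log(encodeVlq(16)); // gB
```

Small values fit in one character, and 16 is the first positive value that spills into a second chunk.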

Contrived example time!

Here's an example of a random Base64 value broken down into its constituent parts. Note the reversal of the first three chunks in the values row because least significant bits go first.

Base64   = s      p      d      V
Decimal  = 44     41     29     21
Binary   = 101100 101001 011101 010101
Position = C4321S C43210 C43210 C4321S
Values   = 11101  01001  0110   1010
           14998                -10
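To double-check the breakdown, here's a sketch that decodes the four 6-bit values from the Binary row. decodeSextets is a name I made up, and it works on the raw 6-bit numbers rather than on Base64 characters.

```javascript
// Decode a list of 6-bit chunks into signed integers.
function decodeSextets(sextets) {
  const values = [];
  let accumulated = 0;
  let shift = 0;
  for (const sextet of sextets) {
    const hasContinuation = (sextet & 0b100000) !== 0;
    // The low 5 bits are payload; earlier chunks hold the less significant bits.
    accumulated |= (sextet & 0b011111) << shift;
    shift += 5;
    if (!hasContinuation) {
      const isNegative = (accumulated & 1) === 1; // sign bit from the first chunk
      const magnitude = accumulated >>> 1;
      values.push(isNegative ? -magnitude : magnitude);
      accumulated = 0;
      shift = 0;
    }
  }
  return values;
}

console.log(decodeSextets([0b101100, 0b101001, 0b011101, 0b010101]));
// [ 14998, -10 ]
```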

Accuracy

You may have heard of the terms "lores" and "hires" source maps from some documentation. That's because, of the 5 possible fields left in a mapping once the output line is dropped, not all of them have to be present. You can simply have a blank line with no mappings at all or, as the spec says, a mapping with 1, 4 or 5 fields. In addition, due to offsets, screw up the calculation of a value in a mapping and you doom the succeeding mappings. This is the reason why breakpoints can be accurate to the column, to the line, or no line at all (and jump to erroneous lines).
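Putting the pieces together, here's a rough sketch of a decoder that accepts mappings with 1, 4 or 5 fields and keeps running totals of the offsets. The function names and the sample map are my own, and this is a simplified illustration rather than a replacement for a real source map library. The decoded source lines and columns are left as the raw zero-based numbers.

```javascript
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one mapping such as "KAAKA" into its list of signed offsets.
function decodeSegment(segment) {
  const values = [];
  let accumulated = 0;
  let shift = 0;
  for (const char of segment) {
    const sextet = BASE64_CHARS.indexOf(char);
    if (sextet === -1) throw new Error(`Not a Base64 character: ${char}`);
    accumulated |= (sextet & 0b011111) << shift;
    shift += 5;
    if ((sextet & 0b100000) === 0) { // no continuation bit: the value is complete
      const magnitude = accumulated >>> 1;
      values.push(accumulated & 1 ? -magnitude : magnitude);
      accumulated = 0;
      shift = 0;
    }
  }
  return values;
}

function decodeMappings(map) {
  const decoded = [];
  // These running totals carry over from mapping to mapping, even across lines.
  let source = 0, sourceLine = 0, sourceColumn = 0, name = 0;
  map.mappings.split(";").forEach((line, index) => {
    let outputColumn = 0; // the output column starts over on every line
    if (line === "") return; // blank line: nothing mapped
    for (const segment of line.split(",")) {
      const fields = decodeSegment(segment);
      if (![1, 4, 5].includes(fields.length)) {
        throw new Error(`Segment "${segment}" has ${fields.length} fields`);
      }
      outputColumn += fields[0];
      const entry = { outputLine: index + 1, outputColumn };
      if (fields.length >= 4) {
        source += fields[1];
        sourceLine += fields[2];
        sourceColumn += fields[3];
        Object.assign(entry, { source: map.sources[source], sourceLine, sourceColumn });
      }
      if (fields.length === 5) {
        name += fields[4];
        entry.name = map.names[name];
      }
      decoded.push(entry);
    }
  });
  return decoded;
}

// Made-up sample with a 4-field, a 5-field and a 1-field mapping.
const sample = { sources: ["foo.js"], names: ["src"], mappings: "AAAA,KAAKA;AACA,G" };
console.log(decodeMappings(sample));
```

Because every entry depends on the totals before it, one wrong offset early in the string shifts every position that comes after it.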

Conclusion

And there you have it! Now normally, you don't read the mappings directly but it's nice to know how they work. You might have to debug one someday. For the visually-inclined (like me), there is a tool online that visualizes the mapping, showing what the mapping points to in source and in output. Now if only there was a VS Code extension for that thing. Hmm...

Resources