Varun Ramesh's Blog

Archive     Feed

Loop, Autoplay, Muted - Say Goodbye to Animated GIFs

Tuesday, April 17th 2018

Animated GIFs are awesome for showing off projects. Unfortunately, GIFs have significant limitations. For one, they only support 256 colors per frame. This means that some color information is lost, resulting in that classic 90s webpage aesthetic. Futhermore, GIFs are big - they are losslessly compressed. This GIF below of Mario Kart from one of my AI projects is almost 10 MB!

Animated GIF, 15 FPS, 9.4 MB

If only we could use videos - compression techniques for video are lossy and more advanced, using techniques like motion vectors. Well, it turns out we can combine the loop, autoplay, and muted attributes on HTML5 video elements to make them behave exactly like GIFs.

<video loop autoplay muted src="..."></video>

Here’s the video that the GIF above was generated from. It’s way smoother, and it’s only 1.1 MB!

H.264 Video, 30 FPS, 1.1 MB

Some things to note:

Better GIF Compression

If you must use a GIF, either for a GitHub README or some other restricted environment, I highly recommend giflossy. It improves GIF size by enabling lossy compression. You can adjust how lossy the compression is in order to reach a target file size.

View Comments

The Boehm GC Feels Like Cheating

Thursday, March 15th 2018

A while ago, I worked on a simple interpreted Lisp, primarily as a way for me to understand the philosphy and ideas behind Lisp languages.

However, I ran into a problem when trying to integrate garbage collection. Because I didn’t plan garbage collection from the beginning, I shot myself in the foot, due to the fact that I had pointers to Lisp objects (expr*) all over the C stack. As an example, take a look at my implementation of the CONS function, which is a builtin (a language primitive written in C). CONS takes two expressions, evaluates them, and returns a cell where the head and tail point to the first and second evals respectively.

expr *builtin_cons(scope *scope, expr *arguments) {
  expr *first = eval(scope, nth(0, arguments));
  expr *second = eval(scope, nth(1, arguments));
  return create_cell(first, second);

Note how expr *first is stored on the stack while expr *second is being evaulated. This is a problem - what if a GC triggers during the eval of the second argument 1. How will I know that there is still a reference to the first expression so that it isn’t freed?

I considered rewriting my interpreter to do the following:

The code would basically look like this:

expr *builtin_cons(scope *scope, expr *arguments) {
  expr *first = expr_stack_push(eval(scope, nth(0, arguments)));
  expr *second = expr_stack_push(eval(scope, nth(1, arguments)));

  expr* result = create_cell(first, second);
  return result;

Well, I already wrote most of the interpreter, so it’s too much work to go back and fix everything. Turns out there’s a shortcut - enter conservative garbage collection.

Conservative Garbage Collection

Conservative garabage collectors are drop-in GCs that work with any C program. At GC points, they look at the C stack and registers directly to figure out what objects are still reachable. However, the GC doesn’t actually know if something is a pointer to the stack, thus it must behave conservatively. If a number happens to look like a heap pointer, then the GC will assume that it’s a heap pointer. I decided to integrate the Boehm GC, which has been used by Mono and other production languages.

To start, we add a call to GC_INIT() before we do any allocations. Now, we replace all our malloc(size) calls with GC_MALLOC(size) and explicit free(ptr) calls with GC_FREE(ptr). We can also use GC_MALLOC_ATOMIC(size) to allocate atomic buffers - the GC won’t scan inside these when looking for more heap pointers, so it’s a slight optimization. I used these for strings, and you probably want to use it for things like matrices.

Now we… Wait! That’s it! Full garbage collection in a few lines. This is the diff where I added garbage collection.

This is pretty awesome - the Boehm GC works for any program, as long as pointers aren’t obfuscated in some way. It was far easier than having to do the hard work of writing my interpreter properly!

Precise Mode

I noticed that the Boehm GC has a precise mode, where you can tell it exactly where in structs the pointers to other objects might be. This should make GC faster, as it doesn’t have to scan as much, but I couldn’t find any documentation on how to use this.

Another Method for Tracking the Root Set

I later came across another lisp project - minilisp by rui314. It features an interesting approach to tracking the root set.

In each C function, you create an array void* root of n + 2 pointers, where n is the number of expressions that this C function will have handles to. Obj* (the equivalent of my expr*) pointers will go in here. The C code will then hold Obj** double pointers in local variables, which point into the root array.

Now, when you recieve an Obj*, you save it using code like this *obj = read_expr(...);, where obj is an Obj**. Thus, all Obj* pointers used in a C function are in this array.

Here’s the important part - the first of the n + 2 pointers is a pointer to the root array of the caller of the current function. Furthermore, whenever we call a function, we always pass in the current root as the first argument, so that our callee’s root can point back to our root. Therefore we have a stack of root arrays.

Finally, the last of the n + 2 pointers is a sentinel ROOT_END. Now, when we do a GC, we start with our current root, scan until the ROOT_END, then we traverse to the caller’s root and scan until ROOT_END, and so on. Thus we traverse the entire set of reachable objects.

  1. Interpreters typically trigger GC during the allocation routine, so if the second expression allocates a new object, it could trigger a GC. 

View Comments

No More Primitives - What Python and Java Get Wrong

Wednesday, March 14th 2018

Many languages such as Java distinguish between primitives and objects. Primitives are stored directly on the stack, and don’t have any callable methods. Objects have callable methods and fields, and are typically allocated on the heap with only a pointer stored on the stack 1.

Furthermore, in Java, every primitive has a corresponding object that wraps the value of the primitives - these objects have methods, and can also be used to allocate primitives on the heap when necessary. These wrappers are typically known as boxed values. This duplication can lead to confusion, as in the snippet below. We have two variables that hold the same integer, but we can only call methods on one.

class Main {
  public static void main(String[] args) {
    Integer a = new Integer(3);
    int b = 3;
    System.out.println(a.floatValue()); // Works
    System.out.println(b.floatValue()); // Doesn't work

In theory, this confusion is worth it because we get a performance benefit - primitive ints take up exactly 32 bits and don’t require heap allocations, which are slow relative to stack operations. They are also stored directly on the stack, and thus don’t require a dereference (and possible cache miss) to access the value.

However, I believe primitive types could be removed entirely without hurting performance. All you need are objects. First let’s see how dynamic languages do this.

Dynamic Languages

Lots of dynamic languages unify values under one object model, meaning that there are only objects. Python does this, and as a result integers are always boxed and stored on the heap. This means that addition requires dereferences, as well as the allocation of new objects, which makes integer operations slow 2.

Unlike Python, Ruby stores integers, floating points, and other simple values directly on the stack. However, at the same time, they act as objects, and are part of a unified object model. #=> 2

Ruby seems to have the best of both worlds - we have callable methods and we don’t have dereferences/allocations for primitive operations. Ruby does this through a special-case hack for certain types called immediates, which it stores directly on the stack. Every stack value has a simple enum tag which allows us to differentiate between an object pointer on the stack and integers and floats on the stack3.

Now assume that we are searching for the next method on a reciever4. We start by getting the class of the object on the stack. For immediates, the rb_class_of method returns hard-coded pointers to the appropriate class for the immediate - for regular objects we use the stored class pointer. Now we can scan the class’s method table to find the method implementation. An outline of the ruby VM code that does this is below.

static void vm_search_method(..., VALUE recv)
  VALUE klass = rb_class_of(recv);

static inline VALUE rb_class_of(VALUE obj)
  if (RB_IMMEDIATE_P(obj)) {
    if (RB_FIXNUM_P(obj)) return rb_cInteger;
    if (RB_FLONUM_P(obj)) return rb_cFloat;
    if (obj == RUBY_Qtrue)  return rb_cTrueClass;
    if (RB_STATIC_SYM_P(obj)) return rb_cSymbol;
  else if (!RB_TEST(obj)) {
    if (obj == RUBY_Qnil)   return rb_cNilClass;
    if (obj == RUBY_Qfalse) return rb_cFalseClass;
  return RBASIC(obj)->klass;

Back to Statically Typed Languages

It’s clear that we can have the best of both worlds in dynamic languages. However, we can also have this in statically typed languages like Java. Consider C#, where primitive types can be dispatched on.

class MainClass {
  public static void Main (string[] args) {
    double a = 0.3;
    Console.WriteLine (a.ToString());

This is likely implemented in a similar fashion where certain types are hard-coded to look up in the appropriate method table. In this situation there is no type enum on the stack, because we can statically know the types of objects at any position on the stack. Java could have done something similar. Primitives would still need to be boxed when dealing with generics and polymorphism 5, but that could happen transparently without the user noticing. Furthermore, the Integer class would be both immutable and final, in order to prevent mutation and subclassing.

Ultimately, we would have had a cleaner, object-oriented heirarchy suitable for higher level languages such as Java.

  1. The Java VM can use escape analysis to determine if an object will leave the method where it was created. If it doesn’t escape, then the object can be allocated on the stack. Note that objects still need space for a class pointer, which primitives don’t have. 

  2. Python implements an optimization whereby small integers are preallocated in an array so that they don’t need to be reallocated. Some discussion into this can be found here. The CPython code related to this is here

  3. Check out NaN Tagging for an interesting discussion on how to keep the tag and data for primitives in one 64-bit value. 

  4. In, a is the “reciever.” 

  5. Java generics use the same byte-code for different specializations. For example ArrayList<Integer> must have the same bytecode code as ArrayList<Object>. This means primitives must be boxed to look like objects. C# does better - it actually generates ‘reified’ bytecode for different specializations of a generic class. This means that List<int> can use stack values, while List<object> uses pointers to the heap. 

View Comments

I Allowed All Web Push Notifications for a Week

Sunday, March 11th 2018

You can’t escape them. Web push notifications are the new e-mail newsletter, and every site wants you to subscribe. Up until last week, I had never allowed any site to send me notifications. Because I’m curious what it’s like to have sites constantly ping you with content, I decided that for a week, I would enable all push notifications on every site that I visit.

Pre-2013, I used Google Reader to keep up with articles published by various sites. There were advantages and disadvantages. For blogs and other sources with a high signal-to-noise, it was pretty great. Unfortunately, there was no good way to keep up with general tech or world news. I subscribed to sites like The Verge, Joystiq, and Reuters, but these sources published hundreds of articles a day, of which I would only find a few interesting. Then Google Reader shutdown.

Fast-forward a few years, and I started to use Facebook as my primary general news feed. Over the years, Facebook has moved away from posts by friends and towards posts by pages 1. While many people complain about the algorithm, I’ve actually found it to be reasonable - it manages to sort out interesting news items from the mass of sites I “like.” I also use Hacker News as a form of discovery - it’s otherwise hard to find articles out in the developer blogosphere. Finally, I still use RSS through Feedly, though I only subscribe to low volume blogs.

The experiment began on a Sunday. I quickly ended up subscribing to notifications from Facebook, Google Calendar, and other sites I frequently use. I then ended up subscribing to sites like CNET and Product Hunt. I noticed that lots of cryptocurrency news sites use push notifications. I also found that if one site used push notifications, their sister sites under the same digital media company would likely use notifications as well.

The notifications were extremely annoying, as I expected. However, I actually expected them to be worse. It turns out that most sites don’t send a push notification for every article, but only for certain articles. Still, I found that notifications interrupted me whenever I was working. I ended up using “do not disturb” mode frequently. RSS readers are a better experience in every way for keeping up with the sites you care about.

Going forward, I would not even subscribe to notifications from messaging or email tools, since I found those to be unnecessarily distracting.

  1. Facebook has recently announced that they want to move back to posts by friends. It will be interesting to see how this affects my use of Facebook as a news feed. 

View Comments

Editing Gameplay Videos without Re-encoding using FFmpeg

Tuesday, December 26th 2017

I recently worked on a Lego Island 2 let’s play with my brother. It was recorded using Dxtory with the x264vfw codec, meaning that the saved recordings are H.264 streams in an AVI container1. Our recordings are 1920x1080 at 60fps. Audio commentary was recorded separately in Audacity.

When it came time to edit the videos, I fired up Adobe Premiere2, but quickly ran into a problem. Rendering the 1080p/60fps videos was taking upwards of an hour. Futhermore the output videos had significantly lower quality due to the re-encoding. I knew that after YouTube transcoded the videos for it’s internal format, the final output would look even worse. To fix this, I did some experimenting with FFmpeg, a command-line based video processor, and found a worflow for editing our let’s play without re-encoding.

These tips require the command-line, but if you can get past that barrier, you’ll be able to edit your videos without long rendering times and quality downgrades. Also, you can do this for free without purchasing any software!

Trimming Footage

You can use FFmpeg to trim footage off of the beginning and end of your video. Below is an example of trimming a 20-second clip from the 100 second mark 3.

ffmpeg -ss 100 -i input.avi -t 20 -c copy output.avi

If you run this, you might find that your video isn’t exactly 20 seconds, but a little bit longer. This is because this method trims the video starting from the nearest keyframe to the provided timestamp. Unfortunately, there’s no way to get around this without re-encoding.

Concatenating Footage

Now let’s say you have two separate gameplay recordings that you want to concatenate together. FFmpeg lets you do this without having to re-encode. The interface for this is a bit strange - you have to create a file with file '{VIDEO_FILE_NAME}' on each line. Here’s a snippet for cutting two videos out of a source file and concatenating them together:

ffmpeg -ss 100 -i input.avi -t 20 -c copy output_1.avi
ffmpeg -ss 200 -i input.avi -t 20 -c copy output_2.avi
echo "file 'output_1.avi'\nfile 'output_2.avi'" > concat_list.txt
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy concatenated.avi

Adding in Audio Commentary

This is the snippet that I use to mix in audio commentary with game audio, assuming that the audio commentary is already synced with the video. It’s a bit complicated, as it uses FFmpeg’s filtergraph functionality. Note that the audio is encoded using the mp3 codec. I keep my audio in FLAC form up till this point, so this is the first point where the audio is encoded.

ffmpeg -i input.avi -i commentary.flac \
    -filter_complex "[0:a]volume=0.5[volumeadj];[volumeadj]aformat=sample_fmts=s16:channel_layouts=stereo[volumeadj_fmt];[volumeadj_fmt][1:a]amerge=inputs=2[merged];[merged]volume=2.0[aout]" \
    -map 0:v -map "[aout]" \
    -c:v copy \
    -c:a libmp3lame -ac 2 \

However, this assumes that the audio is already synced. To sync it, you have two options:

  1. Pad silence at the beginning of your recording in Audacity.
  2. Concatenate silence to the beginning of the audio in FFmpeg, which is shown in the snippet below.
ffmpeg \
  -t 10 -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 \
  -i commentary.flac \
  -filter_complex "[0:a][1:a]concat=n=2:v=0:a=1[out]" \
  -map "[out]" \
  -c:a flac -ac 2 \

To find sync points, what I’ve done is move the cursor up and down in the menu, while saying the words “up” and “down.” This gives me a point in both the commentary and game-play recording to sync up 4.

Putting it All Together

You shouldn’t use this process for videos with a lot of edits as it’s much easier to use an NLE such as Premiere or Vegas 5. However, if you’re creative about how you split-up and sync your videos, I think this approach is worth it. The output videos are the same quality as your raw recordings, and the encoding process is often faster than real-time. Check out our let’s play below!

More Complicated Editing

Here’s an example of a more complicated edit involving multiple overlays and zoom levels that show up at different times. The entire edit, including trimming, audio syncing, and audio merging is done in one FFmpeg command. Since overlays are applied to the video, it must be re-encoded.

ffmpeg \
    -framerate 60 -loop 1 -i "zoomed-out-frame.png" \
    -i "gameplay-recording.ts" \
    -framerate 60 -loop 1 -i ".zoomed-in-frame.png" \
    -i "commentary.flac" \
    -t 4.11 -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 \
    -filter_complex "[1]scale=1024:640:flags=neighbor[scaled];[0][1]overlay=25:144:shortest=1[zoomedout];[2][scaled]overlay=50:38:shortest=1[zoomedin];[zoomedout][zoomedin]overlay=shortest=1:enable='gte(t,105)'[out];[3:a]atrim=19.131[trimmedaudio];[4][trimmedaudio]concat=n=2:v=0:a=1[adjustedaudio];[1:a]volume=0.35[volumeadj];[volumeadj][adjustedaudio]amerge=inputs=2[aout]" \
    -map "[out]" -map "[aout]" \
    -ss 00:00:21 -to 00:25:07 \
    -c:v libx264 -crf 18 \
    -c:a libmp3lame -ac 2 \
    -y output.mp4

And the resulting video…

For this kind of video, I highly recommend using an NLE!

  1. The CRF is set to 23 (the default). I also had to increase the encoding speed, at the expense of having larger files. 

  2. I’m using CS5.5 

  3. More info on trimming videos with FFmpeg

  4. Once I forgot to do this and had to sync up by trying to match button press sounds with in-game actions! 

  5. The reality is that, right now, there aren’t any good open-source NLEs so you’ll have to open your wallet for that kind of editing. The most promising one that I’ve looked at is OpenShot, and it just got an update that’s supposed to improve stability. 

View Comments