Recent Posts

Removing the Close Button on Chromium's Tabs January 15, 2016

I have fat mouse syndrome – there have been numerous times when I accidentally closed a tab that I actually meant to switch to. I tend to keep many tabs open, which probably makes the problem worse. It looks something like this:

While Chromium automatically hides the close button when the tabs become very narrow, that does not help much, since fat mouse errors can occur well before that point. Also, to close a tab I normally use the middle mouse button or the keyboard shortcut Ctrl-W, so these close buttons have strictly negative utility for me. Apparently, I’m not the only one who got annoyed.

Patching Chromium

Unfortunately, this aspect of the UI isn’t user-configurable. To get rid of these close buttons, some hacking on Chromium’s source code is required. I did just that, and it turned out to be a little simpler than I had imagined, thanks to the excellent code search engine and the very high-quality source code. After some digging, it turns out there is a well-isolated function that decides whether the close button should be shown for each tab. To always hide the close button, simply short-circuit this function.

As mentioned, the patch is quite simple:

From f6283382f6f92581bd52202f400841753c5418e0 Mon Sep 17 00:00:00 2001
From: Yung Siang Liau <liauys@gmail.com>
Date: Sun, 1 Nov 2015 14:18:20 +0800
Subject: [PATCH] Always hide close button

---
 chrome/browser/ui/tabs/tab_utils.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/chrome/browser/ui/tabs/tab_utils.cc b/chrome/browser/ui/tabs/tab_utils.cc
index cd088c4..3129461 100644
--- a/chrome/browser/ui/tabs/tab_utils.cc
+++ b/chrome/browser/ui/tabs/tab_utils.cc
@@ -132,12 +132,15 @@ bool ShouldTabShowMediaIndicator(int capacity,
 bool ShouldTabShowCloseButton(int capacity,
                               bool is_pinned_tab,
                               bool is_active_tab) {
+  return false;
+  /*
   if (is_pinned_tab)
     return false;
   else if (is_active_tab)
     return true;
   else
     return capacity >= 3;
+  */
 }
 
 TabMediaState GetTabMediaStateForContents(content::WebContents* contents) {
-- 
2.6.3

Result (after two hours of compilation):

Yes!

Packaging (for ArchLinux)

I made an ArchLinux package description (PKGBUILD) for the patched Chromium above, based on the chromium package in the ArchLinux official repository. It can be found here.

Processifying Bulky Functions in Python October 3, 2015

Long-running Python scripts that repeatedly call “bulky” functions can incur growing memory usage over time. Other than calling gc.collect() and praying hard, can we do a bit better in dealing with the heavy memory footprint?

The problem with bulky functions

Some real examples I have faced include scripts that evaluate the performance of many different combinations of features and models on a dataset:

def evaluate(features, model):
    # Load some big features
    # Train a big model
    # Measure the performance
    # After all the bulky work, return a number or a small dict, etc.
    return score

In a single Python script, repeatedly calling the above bulky function can cause memory usage to grow over time, even though the calls are really independent of each other. That is to say, some amount of lingering memory accumulates after every call to the function, as if there were an ongoing memory leak. This continues until one has to restart the script to trigger a “refresh” that frees up the memory.
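One rough way to watch this happen (a sketch; configurations stands for a hypothetical iterable of inputs) is to log the process’s peak resident set size after each call:

import resource

def peak_rss_mib():
    # ru_maxrss is the process's peak resident set size so far (reported in KiB on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

for features, model in configurations:
    score = evaluate(features, model)
    print 'peak RSS so far: %.1f MiB' % peak_rss_mib()

If the printed number keeps climbing from call to call, some memory is indeed lingering.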

Possible solutions

0. Find out where the memory leak is with a profiler

This is probably the only real solution to the problem. However, the effort needed might be too much for exploratory code that will not go into production. Sometimes, something more pragmatic is needed.
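For what it’s worth, the memory_profiler package makes the first step fairly painless; a minimal sketch (assuming memory_profiler is installed):

from memory_profiler import profile

@profile          # prints a line-by-line memory report when the function runs
def evaluate(features, model):
    # ... the bulky work as before ...
    return score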

1. gc.collect()

This attempts to force the garbage collector to do its work, and is often preceded by some del ... statements. Unfortunately, there is no guarantee that the memory will be released.
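In code, the attempt typically looks something like this (a sketch):

import gc

result = evaluate(features, model)
del features, model   # drop the big references first
gc.collect()          # cycles may be collected, yet the allocator may not return the memory to the OS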

2. Invoke each execution of evaluate() in a new process, e.g. with a “driver” shell script

Now we are getting somewhere. The OS will be forced to clean up the memory used by the function when the process terminates. However, this requires a lot of extra boilerplate code – a command-line interface and a shell script (a sketch follows below). Besides the annoying mental context switching between languages, more importantly this approach is only feasible for a simple “end-user” function such as the one shown in this example. Generally, for a bulky function that needs to be called periodically within another function, we need to stay within the Python interpreter.
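For reference, the boilerplate might look roughly like this hypothetical command-line wrapper, which a driver shell script would invoke once per configuration and whose printed output it would collect:

# evaluate_cli.py (hypothetical)
import argparse
import json

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('features')
    parser.add_argument('model')
    args = parser.parse_args()
    score = evaluate(args.features, args.model)  # assuming evaluate() is defined or imported here
    print json.dumps({'score': score})           # the driver shell script parses this line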

3. Spawn a new process in Python that will execute the function

Indeed, we can spawn a new process to execute the function using the multiprocessing module. This requires modifying the evaluate() function so that it takes in a Queue object for communicating the result back.
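A bare-bones sketch of that Queue-based modification might look like this (evaluate_in_subprocess is a hypothetical wrapper):

from multiprocessing import Process, Queue

def evaluate_in_subprocess(q, features, model):
    q.put(evaluate(features, model))   # the child sends the result back through the queue

q = Queue()
p = Process(target=evaluate_in_subprocess, args=(q, features, model))
p.start()
result = q.get()   # read the result before joining the child
p.join()

Perhaps more preferably, we can make use of the Pool interface to help with the result communication: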

from multiprocessing import Pool
pool = Pool(processes=1)
result = pool.apply(evaluate, (features, model))

Looks acceptable. However, what if an exception is raised somewhere in the function? The error traceback now becomes

/usr/lib64/python2.7/multiprocessing/pool.pyc in apply(self, func, args, kwds)
    242         '''
    243         assert self._state == RUN
--> 244         return self.apply_async(func, args, kwds).get()
    245
    246     def map(self, func, iterable, chunksize=None):

/usr/lib64/python2.7/multiprocessing/pool.pyc in get(self, timeout)
    565             return self._value
    566         else:
--> 567             raise self._value
    568
    569     def _set(self, i, obj):

ValueError: operands could not be broadcast together with shapes (10,2) (10,20)

regardless of where the exception is actually raised. As one can guess, staring at the source code of the multiprocessing module is not particularly informative for debugging an error in user-level code [1]. Some further hacking is necessary to surface the actual context to the user on exception. Also, having to interact with the Pool object every time is kind of distracting. In other words, this solution is close, but not yet ideal.

The processify decorator

Recently, I found a neat piece of code in the wild that solves this problem – the processify decorator. Full credit goes to schlamar, who posted it as a GitHub Gist. For completeness, I will show the (slightly modified) code here:

import os
import sys
import traceback
from functools import wraps
from multiprocessing import Process, Queue


def processify(func):
    '''Decorator to run a function as a process.
    Be sure that every argument and the return value
    is *picklable*.
    The created process is joined, so the code does not
    run in parallel.
    '''

    def process_func(q, *args, **kwargs):
        try:
            ret = func(*args, **kwargs)
        except Exception:
            ex_type, ex_value, tb = sys.exc_info()
            error = ex_type, ex_value, ''.join(traceback.format_tb(tb))
            ret = None
        else:
            error = None

        q.put((ret, error))

    # register the original function under a different name
    # in sys.modules so that it is picklable
    process_func.__name__ = func.__name__ + 'processify_func'
    setattr(sys.modules[__name__], process_func.__name__, process_func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        q = Queue()
        p = Process(target=process_func, args=[q] + list(args), kwargs=kwargs)
        p.start()
        ret, error = q.get()
        p.join()  # safe to join now that the result has been read

        if error:
            ex_type, ex_value, tb_str = error
            message = '%s (in subprocess)\n%s' % (ex_value.message, tb_str)
            raise ex_type(message)

        return ret
    return wrapper

Note: In the original version, calling p.join() before q.get() created a risk of deadlock when the object returned from the target function is too large: the child blocks while writing a large result into the queue’s underlying pipe, so joining it before reading the result can hang forever. The code above has been modified to read from the queue first, which seems to have solved the problem. More information can be found in the comments section of the Gist.

To use it, simply decorate the bulky function:

@processify
def evaluate(features, model):
    # Load some big features
    # Train a big model
    # Measure the performance
    # After all the bulky work, return a number or a small dict, etc.
    return score

and the function will be run cleanly in a new process, as long as the input and output of the function are picklable.

Very nice! What I like about it is:

  1. It is very easy to use – no modification to the original function is required (in fact, this is probably a perfect use case for a decorator)
  2. It uses a trick that seems to successfully pickle the nested function
  3. It is able to show useful context on exception, as opposed to the Pool case above.

Regarding point (3) above, consider the following code:

@processify
def work():
    """Get things done here"""
    import numpy as np
    np.random.rand(10,2) + np.random.rand(10,20)
    return np.random.rand(10,2)

if __name__ == '__main__':
    work()

The error traceback is shown as:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-c2d43235dcae> in <module>()
     66
     67 if __name__ == '__main__':
---> 68     work()

<ipython-input-3-c2d43235dcae> in wrapper(*args, **kwargs)
     43             ex_type, ex_value, tb_str = error
     44             message = '%s (in subprocess)\n%s' % (ex_value.message, tb_str)
---> 45             raise ex_type(message)
     46
     47         return ret

ValueError: operands could not be broadcast together with shapes (10,2) (10,20)  (in subprocess)
  File "<ipython-input-3-c2d43235dcae>", line 18, in process_func
    ret = func(*args, **kwargs)
  File "<ipython-input-3-c2d43235dcae>", line 55, in work
    np.random.rand(10,2) + np.random.rand(10,20)

which is quite helpful.

As long as a bulky function is called repeatedly in a coarse-grained manner (i.e. the time spent in a single execution of the function dominates the caller’s loop) and has lightweight input and output, this processifying trick is a sensible technique for keeping memory usage under control.
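As a rough sanity check for the coarse-grained criterion, you can time a single call (a sketch; the per-call cost of spawning a process and pickling the input and output is typically a fraction of a second):

import time

start = time.time()
score = evaluate(features, model)   # the @processify-decorated version
print 'one call took %.1f s' % (time.time() - start)
# If this is seconds or more, the extra per-call process overhead is negligible.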

I’ve added this to my standard utility toolbox for quick data analytics projects, and it has already saved me once or twice in a recent competition, which I’ll probably write about soon.

No more calling gc.collect() blindly!


  [1] This issue seems to have been fixed in Python 3.4: see the bug tracker.

How Not to Use pandas' "apply" August 28, 2015

Recently, I tripped over a use of the apply function in pandas in perhaps one of the worst possible ways. The scenario is this: we have a DataFrame of moderate size, say 1 million rows and a dozen columns, and we want to perform some row-wise computation on it and generate a few new columns based on the result.

Let’s also assume that the computation is rather complex, so the wonderful vectorized operations that come with pandas are out of the question (the official performance enhancement tips are a nice read on this). Luckily, the computation has been packaged as a function that returns a few values:

def complex_computation(a):
    # do lots of work here...
    # ...
    # and finally it's done.
    return value1, value2, value3

We want to put the computed results together into a new DataFrame.

A natural solution is to call the apply function of the DataFrame and pass in a function that performs the computation:

def func(row):
    v1, v2, v3 = complex_computation(row[['some', 'columns']].values)
    return pd.Series({'NewColumn1': v1,
                      'NewColumn2': v2,
                      'NewColumn3': v3})
df_result = df.apply(func, axis=1)

According to the documentation of apply, the result depends on what func returns. If we pass in such a func (returning a Series instead of a single value), the result would be a nice DataFrame containing three columns as named.

Expressed in a more loopy manner, the following yields an equivalent result:

v1s, v2s, v3s = [], [], []
for _, row in df.iterrows():
    v1, v2, v3 = complex_computation(row[['some', 'columns']].values)
    v1s.append(v1)
    v2s.append(v2)
    v3s.append(v3)
df_result = pd.DataFrame({'NewColumn1': v1s,
                          'NewColumn2': v2s,
                          'NewColumn3': v3s})

However, at first glance, the loopy version just does not seem as elegant as the apply version. Plus, leaving the work of putting the results together to pandas seems like a good idea – could some magic be performed in the background by pandas, making the loop complete faster?

That was what I thought, but it turns out we have just constructed a silent memory-eating monster with this use of apply. To see that, let’s put the above pieces of code together into a minimal reproducible example (the pandas version here is 0.16.2):

import pandas as pd
import numpy as np
%load_ext memory_profiler

def complex_computation(a):
    # Okay, this is not really complex, but this is just for illustration.
    # To keep reproducibility, we can't make it order a pizza here.
    # Anyway, pretend that there is no way to vectorize this operation.
    return a[0]-a[1], a[0]+a[1], a[0]*a[1]

def func(row):
    v1, v2, v3 = complex_computation(row.values)
    return pd.Series({'NewColumn1': v1,
                      'NewColumn2': v2,
                      'NewColumn3': v3})

def run_apply(df):
    df_result = df.apply(func, axis=1)
    return df_result

def run_loopy(df):
    v1s, v2s, v3s = [], [], []
    for _, row in df.iterrows():
        v1, v2, v3 = complex_computation(row.values)
        v1s.append(v1)
        v2s.append(v2)
        v3s.append(v3)
    df_result = pd.DataFrame({'NewColumn1': v1s,
                              'NewColumn2': v2s,
                              'NewColumn3': v3s})
    return df_result

def make_dataset(N):
    np.random.seed(0)
    df = pd.DataFrame({
            'a': np.random.randint(0, 100, N),
            'b': np.random.randint(0, 100, N)
         })
    return df

def test():
    from pandas.util.testing import assert_frame_equal
    df = make_dataset(100)
    df_res1 = run_loopy(df)
    df_res2 = run_apply(df)
    assert_frame_equal(df_res1, df_res2)
    print 'OK'

df = make_dataset(1000000)  

Before anything else, let’s check correctness on a small set of input data (i.e. that both implementations yield identical results):

test()
# OK

And now it’s time for some %memit. The loopy version gives:

%memit run_loopy(df)
# peak memory: 272.18 MiB, increment: 181.38 MiB

How about the elegant apply?

%memit run_apply(df)
# peak memory: 3941.29 MiB, increment: 3850.10 MiB

Oops, that’s more than ten times the memory usage! Not good. Apparently, in order to achieve its flexibility, the apply function somehow has to store all the intermediate Series that appear along the way, or something like that.

Speed-wise we have:

%timeit run_loopy(df)
# 1 loops, best of 3: 36.2 s per loop

%timeit run_apply(df)
# 1 loops, best of 3: 2min 48s per loop

Looping is slow, but it is actually a lot faster than this way of using apply! The overhead of creating a Series for every input row is just too much.

Given both its memory and time inefficiency, I have just presented to you one of the worst possible ways to use the apply function in pandas. For some reason, this did not appear obvious to me when I first encountered it.

TL;DR: When applying a function to a DataFrame row by row with DataFrame.apply, be careful about what the function returns – making it return a Series so that apply produces a DataFrame can be very memory-inefficient on input with many rows. And it is slow. Very slow.
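For completeness, a variant of the loopy version above that also skips constructing a Series per row might look like this (a sketch using itertuples; the column names are those of the example, and complex_computation is as defined earlier):

# each row comes back as a plain (a, b) tuple, which complex_computation can index
results = [complex_computation(row) for row in df[['a', 'b']].itertuples(index=False)]
df_result = pd.DataFrame(results, columns=['NewColumn1', 'NewColumn2', 'NewColumn3'])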