Revisions of LaTeX documents

latexdiff example output used on lebowski ipsum.

Like lots of other people I have come to like latexdiff for highlighting changes between versions of LaTeX documents. It takes two LaTeX files as input and outputs a new LaTeX file in which the changes are highlighted. I use a lot of pgf/tikz for graphics, and latexdiff will destroy tikzpicture code. To avoid that I call latexdiff as follows (the hint on how to do this came from stackexchange):

latexdiff -c PICTUREENV='(?:picture|tikzpicture|DIFnomarkup)[\\w\\d*@]*' VersionBefore.tex VersionAfter.tex > Diff.tex

Pyorbit and numpy

On Wednesday I started looking into py-orbit (the orbit tracking code for python). In order to quickly see what happens with the particles I wanted to be able to directly access the particle coordinates as a numpy array. The builtin C functions will only give you a pointer to the particle array, and the builtin python functions give single particle coordinates. Accessing those for turn-by-turn diagnostics would give you a huge overhead of python code. This is not a flaw, as everything time-critical in py-orbit is written on the C++ level. Nevertheless, as a beginner and cython/python lover I would like to be able to use numpy for a fast view on the particle coordinates. For the direct access to a C array pointer via numpy I found help in a gist by Gael Varoquaux.

Unfortunately it turns out that py-orbit allocates memory in chunks of 10 particles. So the two-dimensional array of particles is not one contiguous chunk of memory, and therefore there is, as I see it, no easy way to access it as a numpy array. Nevertheless I think it is worthwhile to post this here so others do not waste time on this.

If you want it really badly, I think the way to go is to modify py-orbit to allocate the memory for all particles at once in the beginning. But you can also just access the arrays in cython directly, so I decided it was not worth the effort.

PDF-OCR: Sorting documents into searchable PDFs

I’ve gotten rid of paper at home by installing an automatic scanner/OCR/document sorting system based on an all-in-one printer-scanner and a Raspberry Pi.

For years I’ve been struggling to keep up with bureaucracy. I do really dislike everything to do with official papers. In most years that meant that I would just briefly read official letters and documents before putting them in a box. That summarizes my sorting system pretty well. Come the end of the year I would take a day or two to sort them into folders by category. I’ll never understand why, at least for those letters, we have not yet gone digital. In Germany the laws would have permitted that for more than 10 years.

Since I started my PhD I have been making an effort to be more careful about my bureaucracy. I began to use the printer/scanner combination at work to archive a digital version of the most important documents in order to be able to find them quickly. But most documents still lived at home in a box. The main reason was that I was reluctant to bring a box of personal documents to work and scan them; even off-hours it seemed inappropriate.

Then my father told me that his all-in-one printer supposedly does OCR (optical character recognition) on documents he scans (unlike the machine at the office). OCR means that your PDF is not made only of images, as it is with most scanners. The computer also reconstructs the text from the images and allows you to search through the PDF and jump to a page, as you can in PDFs that you generate from, say, Word documents. Searchable PDFs of course have the important quality of being searchable. In theory you don’t have to sort them at all.

In practice you may want some presorting, say by the company that sent you a letter. But that is something you can do easily once you have searchable PDFs.

When I bought the printer/scanner I paid attention that it offers the possibility to scan to a network drive without having a computer attached. This way it can directly deposit scanned documents on the hard disk of our network attached storage system (we’ve got a synology DS213 but really any NAS would be fine).

For the scanning I thought I’d use the other computer constantly running in our home, a Raspberry Pi whose tasks so far include logging the temperature in different rooms and remote control of power outlets. At first I thought I’d have to do everything myself, but I soon found that somebody had already done the work: pypdfocr, a great python software by Virantha Ekanayake, takes multipage image-only PDFs as input, disassembles them, runs them through the open source tesseract OCR engine and puts them back together as conveniently searchable PDFs. Then it puts the PDFs into folders depending on configurable keywords (think “Invoice”, “Insurance”, “Tax”).

More than that, it can conveniently be installed from the python package index (PyPI) using the command

pip install pypdfocr

The first time running it on the Raspberry Pi though the output was unfortunately not searchable PDFs. In my case the reason was that the tesseract-version on the Raspberry Pi package repositories was either outdated or a modified version. The fix in my case was downloading tesseract from Google (they develop it) at google code and compiling it myself. The necessary steps are:

  1. download tesseract and unpack it on the raspberry pi

  2. run the setup and compilation from the main folder of the source:
    make && sudo make install

    If your Raspberry Pi complains during the configuration/compilation that software or libraries are missing, install them using the package manager. Pay attention that you might need the -dev version of the libraries.

On top of that I wanted to make sure that pypdfocr automatically scans all PDFs that go into the incoming folder on the NAS. To do that I did the following:

    1. I mounted the documents directory from the NAS on


    2. I instructed the scanner to scan into the documents directory, subfolder


      on the NAS

    3. I wrote a short shell script that regularly checks for new PDFs and, if it finds any, runs pypdfocr on them:


while sleep 60; do # every minute
  for i in /nfs/documents/paul/*.pdf; do # for all pdfs in the folder
    [ -e "$i" ] || continue # the pattern stays literal when no pdf is there
    echo "found file $i" # output name
    sleep 20 # wait for 20 seconds to make sure it is there
    pypdfocr "$i" -f -v -c config.yaml # run pypdfocr.
  done
done

In theory, pypdfocr can do the last step itself (heck it can even upload the stuff to evernote if you’re into that). However depending on your system of network shares you can not always be sure that a file is correctly locked. Then it can happen that you start the conversion process on a file that is currently being written by the scanner. In my case the scanner usually takes less than 10 seconds to write a file. Therefore I wait for 20 seconds after I notice a new file. This way I am sure the file is fully on the NAS before starting the conversion.
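Instead of a fixed waiting time, one could also wait until the file has stopped growing. The following is only a minimal sketch of that idea in python, not something pypdfocr or the script above does; the function name wait_until_stable and its parameters are made up for illustration.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3):
    """Block until the size of `path` is unchanged for `checks` polls in a row.

    Hypothetical helper: it only looks at the file size, so it assumes the
    scanner writes the file in one go and does not pause for long.
    """
    last_size = -1
    stable = 0
    while stable < checks:
        size = os.path.getsize(path)
        if size == last_size:
            stable += 1          # same size as at the previous poll
        else:
            stable = 0           # still growing, start counting again
            last_size = size
        time.sleep(interval)
    return last_size
```

Calling wait_until_stable(path) before handing the file to pypdfocr would replace the sleep 20, at the price of a few extra polls per file.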

The file config.yaml contains a dictionary of disk folders and corresponding keywords. If the text in a scanned document matches a keyword, the document automatically gets sorted into the corresponding folder on the disk.
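To illustrate the idea of keyword-based sorting, here is a small python sketch. It is not pypdfocr's implementation and does not use its configuration format; the function pick_folder, the rule list and the folder names are assumptions chosen for the example.

```python
def pick_folder(text, rules, default="unsorted"):
    """Return the first folder whose keywords occur in `text` (case-insensitive)."""
    lowered = text.lower()
    for folder, keywords in rules:
        if any(keyword.lower() in lowered for keyword in keywords):
            return folder
    return default


# Hypothetical rules: (folder name, keywords), checked in this order.
rules = [
    ("insurance", ["insurance", "policy number"]),
    ("tax", ["tax office", "tax return"]),
    ("invoices", ["invoice"]),
]

print(pick_folder("Invoice no. 12 for your order", rules))
```

The order of the rules matters in this sketch: a letter from an insurance company that also contains the word "invoice" ends up in the insurance folder because that rule is checked first.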

Fixing missing ia32-libs on Ubuntu 13.10

Some commercial packages (namely AftershotPro) still depend on the old 32-bit compatibility libs for Ubuntu. The ia32-libs package has been obsolete for a long time now, as the new Ubuntu distributions are compatible with 32-bit stuff from the start (at least that’s what I gather from various sources).

For aftershot there were three approaches:

  1. Install the aftershot 32 bit deb version with dpkg -i --force-architecture. This approach is now obsolete as the 32-bit version is outdated.
  2. Manually install ia32-libs from somewhere else and install the aftershot 64 bit deb. This step is unnecessary as stated before.
  3. Force-install the aftershot 64 bit deb. This leads to complications later on, as every time packages are installed the installer tries to fix the supposedly broken dependencies by removing aftershot.

To fix approach 3, I prepared an ia32-libs dummy package with no content. It does nothing, but it makes the apt system believe that you have the package installed, after which aftershot and any other packages that require ia32-libs run just fine.

You can download the package here: ia32-libs_1.0_all.deb

The commands used to prepare the package were:

equivs-control ia32-libs
# edit ia32-libs to give the package name 
# ia32-libs and a sensible description
equivs-build ia32-libs

Coding: python particle to grid interpolation

In particle tracking simulations you often need to interpolate particles onto a grid in one or more dimensions. Recently I decided to write a linear particle-to-grid interpolation in one dimension in python. This is an educational introduction to interpolating particles onto a one-dimensional grid.

Interpolation of particles onto a grid

Particles are usually described by a vector of coordinates in an $n$-dimensional phase space. Often people want to compute a density of particles along one of the coordinate axes. Let us first start with the example of particles in a two-dimensional space; they have coordinates $(x,x')$. A common question will be: What is the density of the particles projected on the $x$ axis?

note: code blocks from here on are executed one after another in an ipython notebook. You can download the notebook here.

To answer the question you first have to think about: How do you compute the density? A builtin way in python/pylab is the histogram function.

%pylab inline
# Generate 1 Million particles
numbers, bins, patches = hist(x,bins=10) # Calculate and plot the histogram
ylabel("number of matches")

The histogram function computes the number of particles between the left and right edges of the bars respectively. We can also look at the data:

print "there are %d particles with x between\
 %0.2f and %0.2f" % (numbers[0], bins[0],bins[1])

there are 154841 particles with x between 0.00 and 0.1

The histogram function does exactly this, count the number of particles between the edges of a bin. Visually the edges are represented by filling a bar between the left and the right edge up to a height proportional to the resulting number of particles.

Another way to look at this would be to say: the histogram function computes the density at the centers of the bins by attributing each particle to its nearest neighbouring bin. We could then compute a value for the density as follows:

# compute the width of a bin
binwidth = bins[1] - bins[0]
# compute the positions of the centers of the bins
centers = bins[:-1] + binwidth / 2.
# plot the density (particles per length unit)
plot(centers, numbers / binwidth)

This looks nice and smooth, you might say. But if you think about it: for a particle halfway between two bin centers, say near the right edge of the first bin at 0.95, is it really justified to assume that it only contributes to the grid point at 0.5? It has virtually the same distance to the grid point at 1.5!
[Figure: A grid with one particle at 0.95]

This problem can become even worse. Let us define point positions $x_i=10(1-1/i)$. Here is a figure with particles with $i$ between 10 and 25:

[Figure: A grid with particles of decreasing spacing]

You can see how the particle distance (which is inversely proportional to the intuitive density) changes smoothly. The density per bin can be calculated analytically to be $\rho(x)=\frac{x \delta}{(-10 + x)^2}$ with $\delta$ the bin width. Let us plot the point positions and the resulting histogram for $i$ between 10 and 25. The histogram however has a step-shape as you can see below and does not compare too well to the analytic result (grey vertical lines indicate particle positions, the green curve is the analytic particle density):

x=10*(1-1/numpy.arange(10,25.)) # the points
density=lambda x: (10/(10/(-10 + x) + (10/(-10 + x))**2))**(-1)*.05833
for i in x:
  axvline(i,color="grey",linewidth=1) # grey lines for the point positions
hist(x,bins=10) # histogram of point positions
plot(numpy.arange(9,9.6,.01),density(numpy.arange(9,9.6,.01))) # analytic density function
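The analytic density used above can be checked numerically. The sketch below is an addition for illustration: it assumes that the density is taken as the inverse of the distance between a particle and its predecessor, which for $x_i=10(1-1/i)$ is $10/(i(i-1))$, and compares that with $x/(x-10)^2$, the analytic expression without the factor $\delta$.

```python
import numpy as np

i = np.arange(11, 25, dtype=float)
x = 10 * (1 - 1 / i)                 # particle positions x_i
x_prev = 10 * (1 - 1 / (i - 1))      # positions of the previous particles x_{i-1}
spacing = x - x_prev                 # distance to the previous particle
analytic = x / (x - 10) ** 2         # particles per unit length

# the inverse spacing and the analytic expression agree at every particle
assert np.allclose(1 / spacing, analytic)
```

Multiplying by the bin width $\delta$ turns this into the expected number of particles per bin, which is the quantity the histogram shows.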

You see how steppy the histogram looks when you compare it to the analytically calculated density? Maybe we can do better. If we think about the single particle from above again
[Figure: A grid with one particle at 0.95]
how about we say: It should contribute to its neighbouring bins according to its distance to the bin center. There are the two bin centers at $x_1=0.5$ and $x_2=1.5$ neighbouring the particle. We should pay attention that the total amount of density created by the particle stays the same. Let us for the time being focus on the assumption that a particle only contributes to two bins. So let us zoom in on the particle and its two neighbouring bins:
[Figure: A grid with one particle at 0.95, zoomed in on its two neighbouring bin centers at 0.5$\delta$ and 1.5$\delta$; the particle lies 0.45$\delta$ from the left center and 0.55$\delta$ from the right center]
The easiest way to distribute the particle density between its neighbouring bins is now to just measure its distance $d_i$ to its neighbouring bin centers, in the example $d_1=0.45\delta$ and $d_2=\delta-d_1=0.55\delta$, and add $1-d_i/\delta$ to the grid point $i$. This way we make sure that the particle number is conserved (the total number added to the grid is 1, as for the histogram). In our example, the particle contributes $\rho_1=0.55$ and $\rho_2=0.45$ to the grid bins 1 and 2.
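The weights for the example particle can be written down in a few lines of python. This is just a sketch of the rule described above; the helper name linear_weights is made up for illustration.

```python
def linear_weights(x, left_center, delta):
    """Split one particle between the bin centers at left_center and left_center + delta."""
    d_left = x - left_center      # distance to the left bin center
    w_right = d_left / delta      # the closer to the right center, the larger its share
    w_left = 1.0 - w_right        # the two shares always add up to one particle
    return w_left, w_right


# the particle at 0.95 between the bin centers at 0.5 and 1.5 (delta = 1)
w1, w2 = linear_weights(0.95, 0.5, 1.0)
print(w1, w2)
```

Up to floating point rounding this gives 0.55 for the left and 0.45 for the right bin center, as in the text.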

Now expressing this in python and adding the bin width to calculate a proper density we write a function pics2gridpy(particles, left, right, bins). The function is called with four arguments:

particles: the Array of coordinates
left: The left edge of the grid (also the zeroth grid point)
right: The right edge of the grid (no grid point here)
bins: the number of bins (including the one at the left edge)

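def pics2gridpy(particles, left, right, bins):
    binwidth = (right - left) / float(bins) # distance between grid points
    grid = numpy.zeros(bins)
    for i in range(particles.shape[0]):
        position = (particles[i] - left) / binwidth # position in units of the bin width
        leftIndexF = numpy.floor(position) # grid point to the left of the particle
        binPosition = position - leftIndexF # fractional distance to that grid point
        leftIndex = int(leftIndexF)
        # the modulo makes the grid circular
        grid[leftIndex % bins]+=1-binPosition
        grid[(leftIndex + 1) % bins]+=binPosition
    return numpy.array([left + binwidth*numpy.arange(bins),grid])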

Let’s try to run it on our example particles and again compare to the analytic density!

# because actually the method makes a circular grid, we need to add one bin
# to the left and right of the distribution
plot(bins,density,label="density (grid)")
plot(numpy.arange(9,9.6,.01),density(numpy.arange(9,9.6,.01)),label="density (analytic)")
pylab.legend(loc="upper left")

We see, this is much better. But one thing is notable: The spike for the bin at 9.0, where does it originate? In this region, the density of particles per bin is smaller than one. Because of that you have to expect aliasing effects that can produce these spikes. In fact, normally you try to have at least a few tens of particles in the bins that matter to you because you want to have a good approximation of the real density. In our case we needed very few because we had a smooth distribution. In most cases however you sum random particles and therefore you expect some noisiness due to the random fluctuations of your distribution.

Now that we have the method let’s get an estimate of its performance for very many particles:

%timeit bins,density=pics2gridpy(x,0,1,100)

gives a time of 844 msec on my machine. The histogram however only takes 8.5msec. There has to be a way of optimizing this. Next post we will look at how fast we can make it using cython!
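Even before turning to cython, the python loop can be avoided with plain numpy. The following is a sketch of such a vectorized variant, not the cython version announced above and not benchmarked here. It assumes, as in the argument list of pics2gridpy, that the grid points sit at left, left + binwidth and so on, and that the grid is circular; the function name particles_to_grid is made up for illustration.

```python
import numpy as np


def particles_to_grid(particles, left, right, bins):
    """Linear particle-to-grid deposit on a circular grid with `bins` points."""
    binwidth = (right - left) / float(bins)
    pos = (np.asarray(particles, dtype=float) - left) / binwidth
    left_index = np.floor(pos).astype(int)   # grid point to the left of each particle
    frac = pos - left_index                  # 0 at the left grid point, 1 at the right one
    # bincount sums the weights of all particles that share a grid point
    grid = np.bincount(left_index % bins, weights=1.0 - frac, minlength=bins)
    grid += np.bincount((left_index + 1) % bins, weights=frac, minlength=bins)
    return left + binwidth * np.arange(bins), grid


points, grid = particles_to_grid(np.random.rand(1000), 0.0, 1.0, 100)
print(grid.sum())  # the particle number is conserved up to rounding
```

The two calls to numpy.bincount replace the two += lines of the loop version; everything else is the same arithmetic applied to whole arrays.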

Backup strategy on untrusted FTP

Update: Before implementing this method you might want to look at duply which uses gpg to implement the same functionality but does not need large image files.

We own a server which is used to keep e-mail and other personal data off the commercial cloud. Partly because of paranoia, partly because of the enjoyment of maintaining our own server.

We run the server ourselves so we have to backup all the software and configuration to get timely recovery after catastrophic failure. If you do the same, depending on the availability you want you will probably be running on a hardware RAID. But: Even that can fail, be it through software failure fucking your file system good or through some bigger hardware blackout. If the server going down is home to your e-mail, your dropbox-equivalent and your website you want to be able to get it back up as quickly as possible.

For this goal, our hosting provider allows for 100 GB of backups on one of their machines. It is vastly less than the 3 TB of RAID5 storage on our server, but roughly covers the amount of data really vital to operating the server. We’re talking mysql databases, mail boxes, config files, hosted websites etc.

The big caveat is that the 100 GB of free backup space can only be accessed through FTP. This means:

  1. It’s not encrypted; everything can be read by our hoster. Not really desirable when you just spent hours on end making your system as safe as possible with encrypted disks.
  2. It does not preserve file permissions.
  3. It is too slow for large numbers of files. (We tried to run rsync on a curlftpfs-mounted ecryptfs-overlaid FTP share and it was just not usable).

Creation of the encrypted disk image

We decided to do things differently: As a backup target we use a disk image file (~95 GB), one copy of which lives on the local disk of our server. This disk image is encrypted using cryptsetup. I used the following commands to create the disk image (they are needed only once):

# create 95Gig file of zeros (takes time depending on disk speed)
dd if=/dev/zero of=/backupimage.dat bs=1M count=95000
# format it as an encrypted disk image
cryptsetup luksFormat /backupimage.dat
# open a loopback device for said image
cryptsetup luksOpen /backupimage.dat backupvolume
# format the volume
mkfs.ext4 /dev/mapper/backupvolume
# make a directory for the "disk"
mkdir /backupvolume
mount /dev/mapper/backupvolume /backupvolume

Now the encrypted image is mounted in /backupvolume. You can toy around with it, write some data to it etc. When you’re done, you should unmount the volume and close the loopback device:

umount /backupvolume
cryptsetup luksClose backupvolume

The backup process

The backup process needs to be automated. Only automatic backups will be run regularly, and only regularly running backups are good backups. To automate it, I need a foolproof way of starting and stopping the loopback device and uploading the data to the FTP server. For the time being I’m using three bash scripts:

1. Mount the encrypted backup volume

This script opens the loopback device and mounts the encrypted disk image to /backupvolume. Note that the password for the encrypted image is in clear text. This is necessary because you want the backup to run automatically. Additionally, the script can only be read by root, and once someone has root access to your server he can access all the data anyway.

if ! [ -e /dev/mapper/backupvolume ]; then # make sure loopback is closed
  if echo "password" | cryptsetup luksOpen /backupimage.dat backupvolume; then
    if mount /dev/mapper/backupvolume /backupvolume/; then
      echo "mount successful"
      exit 0
    else # if we cannot mount we should close the loopback
      cryptsetup luksClose backupvolume
      echo "had problems mounting, closed cryptloop device"
      exit 200
    fi
  else # if we cannot even open the loopback there was an error
    echo "error"
    exit 200
  fi
fi

1.1. Back up

Now is the time you want to backup stuff to /backupvolume. Make sure you leave the directory after, otherwise the following will throw an error.

2. Unmount the encrypted backup volume

After backup you need to properly close the backup volume:

if [ -e /dev/mapper/backupvolume ]; then # make sure there is a cryptoloop
  if ! umount /backupvolume; then # try to unmount it
    # throw error if fail
    echo "unable to unmount"
    exit 200
  else # otherwise go on and unhook the loopback
    if ! cryptsetup luksClose backupvolume; then
      echo "luksClose failed"
      exit 200
    fi
    exit 0
  fi
fi

3. Upload the backup disk image to the FTP server

Once you have backed up to your image, the image file on your server needs to be sent to the backup location. In our case that is an FTP server, and if you want to use this method of backing up you are likely in a similar situation. For uploading data we use the following script:

HOST=<address of the FTP server>
USER=<FTP user name>
PASS=<FTP password>
MACHINE=<name of the machine>
ftp -inv $HOST <<END >ftp.log
user $USER $PASS
binary
put /backupimage.dat backupimage-$MACHINE.dat
END
if fgrep "226 Transfer complete" ftp.log
   then exit 0
else
   echo "FTP upload failed"
   exit 200
fi

In the next chapter: How to use these three scripts to backup your data automatically and be informed via e-mail whenever something goes wrong.

Raspberry Pi – Animated temperature plot

Raspberry pi with realtime temperature plot

For the Christmas party yesterday I was asked to bring a large pot. The contents were supposed to be, as is customary in Germany, mulled wine (Glühwein). Mulled apple wine in this case. Besides a pot we needed a hotplate, which was brought by a colleague. Well, having mulled wine on a hot plate has one risk: You easily run your wine so hot you lose all the precious alcohol. With it, usually also the fruity part of the flavour is gone. Enter the Raspberry Pi. I still had a foodsafe temperature sensor from a sous-vide cooking experiment I did recently. A quick search on the internet told me that 65°C was the temperature to go for, warm enough to have the Glühwein feeling, cold enough to preserve alcohol and taste.

In sous-vide cooking the task of the Raspberry Pi is usually exactly this: You put a temperature sensor into a water bath, have the Pi heat it to a specific temperature and keep it there for several hours. To do so it switches the heating for the water bath on and off at an appropriate frequency. Some months ago I experimented with sous-vide cooking and used the code I found here. It can be used for keeping Glühwein temperature just as well.
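The simplest way to switch a heater on and off around a target temperature is a two-point controller with a small dead band. The sketch below only illustrates that idea; it is not the sous-vide code referred to above, and the function name, the band of 0.5°C and the sample temperatures are assumptions.

```python
def heater_should_be_on(temperature, target, currently_on, band=0.5):
    """Two-point controller: on below target - band, off above target + band."""
    if temperature < target - band:
        return True
    if temperature > target + band:
        return False
    return currently_on  # inside the band: keep the current state


state = False
for temp in [60.0, 64.0, 64.8, 65.3, 65.6, 65.2, 64.4]:
    state = heater_should_be_on(temp, 65.0, state)
    print(temp, state)
```

The dead band keeps the relay from switching on every tiny fluctuation around 65°C: in this sketch the heater stays on until the wine exceeds 65.5°C and is only switched on again below 64.5°C.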

But if you are anything like me, you will find Glühwein of the perfect temperature tasty but not very exciting. What was needed was a display of temperature over time, so the raspberry could show off how it was doing a good job. As prerequisites I installed a current version of python (2.7) and matplotlib on the pi. Before installing matplotlib (you can get a current version using pip) you should check that you have the dev packages and libraries for tk / gtk, otherwise the graphical frontend you need for the realtime display will not be built.

To have the realtime plot display I use two scripts:

  1. A script to periodically fetch the temperature from the temperature sensor (DS18B20) and store it in a file (in this case hardcoded to /home/pi/Desktop/test.txt)
    from subprocess import Popen, PIPE
    import time
    import re
    import numpy
    def tempdata():
        # Replace 10-000801d31e81 with the address of your DS18B20
        pipe = Popen(["cat","/sys/bus/w1/devices/w1_bus_master1/10-000801d31e81/w1_slave"], stdout=PIPE)
        result = pipe.communicate()[0]
        result_list = result.split("=")
        temp_C = int(result_list[-1])/1000. # the sensor reports milli-degrees Celsius
        # 85.000 is the power-on value of the DS18B20, not a real measurement
        if(re.match(".*YES.*",result) and temp_C!=85.000):
            return temp_C
        else:
            print(result)
            print("invalid result")
            return tempdata()
    times = [] # assumed storage format: one "time temperature" pair per line
    temps = []
    while True:
        times.append(time.time())
        temps.append(tempdata())
        numpy.savetxt("/home/pi/Desktop/test.txt", numpy.array([times, temps]).T)
        print(temps[-1])
        time.sleep(1) # assumed sampling interval
  2. A script to plot the resulting temperature, nicely and christmassy, with dark picture background and red writing.
    # -*- coding: utf-8 -*-
    font = {'family' : 'sans serif',
            'weight' : 'bold',
            'size'   : 22}
    import matplotlib
    import time
    matplotlib.rc('font', **font)
    matplotlib.use('TkAgg') # if this fails install matplotlib AFTER installing tk. And install it using pip.
    import matplotlib.pyplot as plt
    import matplotlib.image
    import numpy as np
    import matplotlib.animation as animation
    img=matplotlib.image.imread("bg.png") # load background image.
    limits=[-30,0,10,26] # plot limits: -30 to 0 minutes, 10-26°C (for the demo picture, there was no Glühwein)
    def main(): # everything in the beginning is just preparation of the plot
        fig = plt.figure(figsize=(8,6),tight_layout=True)
        ax = fig.add_subplot(111) # assumed: a single set of axes
        for i in ax.spines.keys():
            print(i)
            ax.spines[i].set_color("red") # assumed: red frame to match the red writing
        plt.imshow(img, zorder=0, extent=limits,aspect='auto')
       # secondAxes.set_ylim((55,70))
        thePlot, = plt.plot([], [],color="red",linewidth=4,animated=True)
        theText = plt.text(-29, 24, "", color="red", animated=True) # assumed position of the readout
        ani = animation.FuncAnimation(fig, update_plot, frames=xrange(1000),
                                      fargs=(thePlot, theText), interval=1000, blit=True)
        plt.show() # here we start the animation
    def readRetry(): # reads data from the text file, retries if there are problems
        while True:
            try:
                # assumed file format: one "time temperature" pair per line
                data = np.loadtxt("/home/pi/Desktop/test.txt", ndmin=2)
                x = (data[:,0] - data[-1,0])/60. # minutes relative to the latest sample
                y = data[:,1]
                return (x,y)
            except (ValueError, IndexError, IOError): # file missing or only half written
                time.sleep(0.1)
    def update_plot(i, thePlot, theText): # this is the function that updates the plot for the "FuncAnimation"
        print(i)
        x,y = readRetry()
        thePlot.set_data(x,y)
        theText.set_text("T=%0.3f" % y[-1])
        return thePlot,theText
    main()

Now starting the scripts after one another (and adding your temperature sensor id to the first / your background picture to the second) you will get a nice animated temperature plot. If you press f in the window with the plot, it will go full screen.

Tune diagram in python

Resonance line diagram in python.

I am trying to move whatever I can to Python, mainly because I want my tools to be free and not require expensive commercial software. So recently I took the time to revisit my Mathematica code for the tune diagram and port it to Python. Basically I rewrote the whole thing, and it has the nice feature that you can overlay it on top of whatever you want. Here you find my Python code for plotting a resonance line diagram like the one shown above.

The code should be self-explanatory. Running the script directly with the option -o shows an example plot. When you import it, you can use the function plotTuneDiagram(maxOrder,xlim=[0,1],ylim=[0,1],tickOrders=False,tuneLineColor="black")

Parameters are

  • maxOrder the maximum order of resonance line to be displayed
  • xlim,ylim the intervals in horizontal and vertical tune in which to plot
  • tickOrders whether or not to place x- and y labels on full fractions up to selected order
  • tuneLineColor color for the tune lines, sensible when plotting on top of measurement or simulation data.
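For readers who want to see what goes into such a diagram: resonance lines are the lines $m Q_x + n Q_y = p$ with integers $m$, $n$, $p$, and the order of a line is $|m|+|n|$. The sketch below enumerates the lines that cross a tune window. It is an illustration of the idea only, not the code behind plotTuneDiagram; the function name resonance_lines and its return format are made up.

```python
from math import ceil, floor


def resonance_lines(max_order, xlim=(0.0, 1.0), ylim=(0.0, 1.0)):
    """Return sorted (m, n, p) with m*Qx + n*Qy = p crossing or touching the window.

    Only lines with 1 <= |m| + |n| <= max_order are kept. The sign is fixed
    (m >= 0, and n > 0 if m == 0) so that each triple describes one line.
    """
    lines = set()
    for m in range(0, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if order == 0 or order > max_order:
                continue
            if m == 0 and n < 0:
                continue  # (0, -n, -p) is the same line as (0, n, p)
            # m*Qx + n*Qy is linear, so its extremes sit at the window corners
            corners = [m * qx + n * qy for qx in xlim for qy in ylim]
            for p in range(int(ceil(min(corners))), int(floor(max(corners))) + 1):
                lines.add((m, n, p))
    return sorted(lines)


print(resonance_lines(1))
```

For the unit square and first order this lists the four edges of the window, $Q_x=0$, $Q_x=1$, $Q_y=0$ and $Q_y=1$. Multiples such as (2, 0, 2) describe the same line as (1, 0, 1) and are not filtered out in this sketch.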

Server setup 1 (seafile vs OwnCloud)

Notice (27. Feb 2014): If you came here searching for “seafile lighttpd”: I wrote a comment on how to get it running on here, this article is only on my perceptions of seafile vs OwnCloud.

In the aftermath of the revelations about NSA spying, a few friends and I decided to buy a hardware (as opposed to virtual) root server together. We wanted to have a more private replacement for the majority of cloud services, most notably e-mail, Dropbox, rss reader and backup solutions.

Before getting the server I used a paid Dropbox account, so moving away from Dropbox automatically freed up money to pay the server cost. I needed the paid plan because I kept around 30 gigabytes of stuff in my Dropbox. My main requirement was to have a replacement that can handle large directories with tens of thousands of small files smoothly. I looked into two solutions:


I tried Owncloud, first in a virtualbox ubuntu on my local box. There it was rather ok, working with Dropbox-like performance. During the setup of the server I ran a copy remotely and tested it with the contents of my Dropbox. I have a few sourcecode folders in there which, including their git repositories, amount to about 100k files, most of them very small. Over the internet, Owncloud was really slow. Analogue modem-like slow. In the end the transfer speeds were below 10 kB per second. Really not what you want your 30 gig full Dropbox folder to sync with. The test folder of about 2 gigabytes was still syncing after one day on a 10 Mbit connection. So it is not a problem of “getting up to speed”. The reason for this seems to be that Owncloud (currently) uses one request per file. And maybe these are not even parallelized, adding a lot of delay to the file transfers.


Seafile follows a different concept from most of the other self-hosted cloud storage solutions. Instead of a file-based system it uses a revision-based approach based on a modified version of git. If you know the github interface, you know the seafile (web) interface. For seafile the test with the 100k file folder managed to max out my connection bandwidth. For me this clearly indicated seafile as the preferable solution and it is what we are now running for “production”. I deployed it with lighttpd and wrote a short post on the seafile mailing list here.

Another plus of seafile is that it supports client-side encryption. This makes it interesting even for people who want to run it on untrusted hardware or on a hosted solution. The server never sees the plain text data after all.

Platform independent, crude timing of code

I work a lot on MacOS where I write code that I want to use on linux later. Unfortunately some of the normal timing functions are missing on MacOS. I don’t quite remember which. So I have some crude platform independent code for timing c++ code. I am aware it does not really count the CPU usage, but putting it around code you want to time you can still get a feeling for how much time is spent in that part of your code.

#include <sys/time.h>
#include <cstdio>

class pTimer {
 timeval t1, t2;
public:
 double t; // accumulated time in milliseconds
 pTimer(): t(0){}
 void start() {gettimeofday(&t1, NULL);}
 void stop() {gettimeofday(&t2, NULL);
  t += (t2.tv_sec - t1.tv_sec) * 1000.0 + (t2.tv_usec - t1.tv_usec) / 1000.0;}
 void reset() {t=0;}
 void restart() {t=0;gettimeofday(&t1, NULL);}
};

it is used like this:

pTimer t1;
t1.start();
<computationally intensive part of the code>
t1.stop();
printf("took %f milliseconds", t1.t);

restart(); sets the timer to zero and starts it at the same time, using start(); again will add to the time already on the counter (useful if you want to accumulate time from different parts of your source in one timer).