Since the camera can be controlled through a device connected to the audio jack, the first thing I did was look up the documentation (you can find it in the links section). It says that four functions are available, and that they are selected by the impedance the phone sees between pins 3 and 4 of the audio jack. Action A (play, pause, answer a call...) is selected by short-circuiting the two pins; action B (volume up) requires an impedance between 210 and 290 ohms; action C (volume down) requires an impedance between 360 and 680 ohms; finally, action D (reserved) requires an impedance between 110 and 180 ohms.
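To make the mapping concrete, here is a minimal Python sketch that turns an impedance reading into the action the phone would select. The ranges are the ones quoted above; the function name `classify_impedance` and the 1 ohm threshold used to decide what counts as a short circuit are my own assumptions, not something taken from the specification.

```python
from typing import Optional

# Impedance windows in ohms, as quoted above from the documentation.
ACTION_RANGES = {
    "B (volume up)": (210.0, 290.0),
    "C (volume down)": (360.0, 680.0),
    "D (reserved)": (110.0, 180.0),
}

# Assumption: anything at or below this value is treated as a short circuit.
SHORT_CIRCUIT_MAX_OHMS = 1.0


def classify_impedance(ohms: float) -> Optional[str]:
    """Return the action selected by an impedance between pins 3 and 4.

    Returns None when the value falls outside every documented window.
    """
    if ohms < 0:
        raise ValueError("impedance cannot be negative")
    if ohms <= SHORT_CIRCUIT_MAX_OHMS:
        return "A (play/pause, answer a call)"
    for action, (low, high) in ACTION_RANGES.items():
        if low <= ohms <= high:
            return action
    return None


if __name__ == "__main__":
    for value in (0, 135, 240, 470, 200):
        print(value, "ohms ->", classify_impedance(value))
```

Note that the windows do not touch each other: a value such as 200 ohms falls in the gap between actions D and B and selects nothing in this sketch.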

Although the specification looked simple, there is one detail that is not very clear: the microphone impedance.

According to the specification, the microphone must have a DC impedance of "1000 ohms or higher", but no maximum value is given, so an "infinite" value (an open circuit) might seem valid. Unfortunately that's not the case: you can't just leave pins 3 and 4 unconnected while no button is pressed, or it won't work (at least, it doesn't on my phone).

To fix this, I added a 2 kohm resistor between pins 3 and 4, thus emulating a microphone, and it worked like a charm.
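One thing worth checking with this fix: the 2 kohm resistor stays permanently between pins 3 and 4, so when a button connects its own resistor across the same pins, the phone sees the two in parallel. The sketch below computes that combined value. The button resistor values (240, 470 and 135 ohms) are illustrative assumptions picked from inside each documented range, not necessarily the ones used in this build, and the helper names are hypothetical.

```python
MIC_RESISTOR_OHMS = 2000.0  # the resistor emulating the microphone

# Assumed button resistors (ohms) with the window each one must land in.
BUTTONS = {
    "B (volume up)": (240.0, (210.0, 290.0)),
    "C (volume down)": (470.0, (360.0, 680.0)),
    "D (reserved)": (135.0, (110.0, 180.0)),
}


def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    if r1 == 0 or r2 == 0:
        return 0.0  # a short circuit dominates
    return (r1 * r2) / (r1 + r2)


def effective_impedances(mic_ohms: float = MIC_RESISTOR_OHMS) -> dict:
    """Map each action to (seen impedance, whether it is still in range)."""
    result = {}
    for action, (button_ohms, (low, high)) in BUTTONS.items():
        seen = parallel(button_ohms, mic_ohms)
        result[action] = (seen, low <= seen <= high)
    return result


if __name__ == "__main__":
    for action, (seen, ok) in effective_impedances().items():
        print(f"{action}: {seen:.1f} ohms, in range: {ok}")
```

With these assumed values the combined impedances come out at roughly 214, 381 and 126 ohms, all still inside their windows, although the volume up value sits close to the 210 ohm lower limit. A lower microphone resistor would pull every value down further, so it is worth redoing this calculation if you pick different parts.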