Overview
The whole illusion runs as an app on the phone; no external devices are used. The iPhone X introduced 3D head tracking (face tracking) using its TrueDepth camera. This allows inferring the 3D positions of the user's eyes.
Using the position of an eye and the device screen rectangle, a non-symmetric camera frustum can be defined.
The frustum defines an off-axis projection that, when used for rendering on the device, allows objects to appear in front of, and behind, the screen of the device.
Get the free app from the AppStore: TheParallaxView
Here is a YouTube video that describes the technique:
Details
Source code
If you are an experienced Unity + iOS developer you can download the source code and build the app for your iPhone X. If you just want to try the app, you can grab it free from the AppStore: TheParallaxView
I cannot give support on how to use Unity and build the app. Only get the source if you know what to do with it.
Full source code is available here: TheParallaxView on GitHub
(Xcode 9.2, Unity and Unity's ARKitPlugin required)
Implemented in Unity with UnityARKitPlugin. The technique should easily transfer to native iOS apps and other devices.
Although you are welcome to use the code, the point of sharing it is to show the technique to other developers rather than to serve as actual code building blocks. (The code is under the MIT license, basically requiring attribution. Art assets are CC BY-NC, meaning they can be used non-commercially with attribution.)
Mirrored view, arghh
The first problem I ran into is that with ARKit Face Tracking (at least Unity's version) everything is mirrored. This is not so strange, since the front-facing camera view is mirrored - that is how we are used to seeing ourselves. But when trying to figure out the world position of an eye, this led to many problems. I ended up programmatically inverting the x position coordinate and the orientation quaternion rather than fighting with the transforms in the scene hierarchy.
// invert on x because ARFaceAnchors are inverted on x (to mirror in display)
headCenter.transform.position = new Vector3 (-pos.x, pos.y, pos.z);
headCenter.transform.rotation = new Quaternion (-rot.x, rot.y, rot.z, -rot.w);
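For context, pos and rot come from the face anchor update callback. A minimal sketch of that hookup, assuming the UnityARKitPlugin's ARFaceAnchorUpdatedEvent and UnityARMatrixOps helpers (the actual source may wire this up differently):

// subscribed in Start(): UnityARSessionNativeInterface.ARFaceAnchorUpdatedEvent += FaceUpdated;
void FaceUpdated (ARFaceAnchor anchorData)
{
    // anchor pose in ARKit's (mirrored) space
    Vector3 pos = UnityARMatrixOps.GetPosition (anchorData.transform);
    Quaternion rot = UnityARMatrixOps.GetRotation (anchorData.transform);

    // un-mirror as shown above
    headCenter.transform.position = new Vector3 (-pos.x, pos.y, pos.z);
    headCenter.transform.rotation = new Quaternion (-rot.x, rot.y, rot.z, -rot.w);
}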
Non-Symmetric camera frustum and off-axis projection
When trying to figure this out my Google-Fu led me to this thread. They were talking about a display on a wall acting like a window, essentially the same problem. User "dorbie" says:
"Your difficulty (and it is shared by MANY), is that you assume that the view vector cannot be perpendicular to the viewing plane. However, reguardless of where the eye is w.r.t. the window on the wall, there is always a line towards the wall which is perpendicular to the imaging plane. Even if it does not fall within the window. Using this way of thinking about the problem the view vector is that line and the frustum is an asymmetric frustum relative to that line (the line intersecting at 0,0 on the near clip)."
Kudos to dorbie for helping me understand!
Imagine a plane on the device screen that stretches out infinitely. The eye is just a point, so it is always possible to draw a line from the eye to the plane that is perpendicular to the plane. The length of that line is the near distance. The rest of the frustum (left, right, top, bottom) can be somewhere else, way off to the side. That is what off-axis projection means.
Now let's look at the code. First, the eye camera is pointed towards the device plane by using the rotation of the device rotated 180 degrees on the Y axis:
// look opposite direction of device cam
Quaternion q = deviceCamera.transform.rotation * Quaternion.Euler (Vector3.up * 180);
eyeCamera.transform.rotation = q;
Then, to find the near value for the frustum, the distance from the eye to the plane defined by the device screen is measured:
// find device camera in rendering camera's view space
Vector3 deviceCamPos = eyeCamera.transform.worldToLocalMatrix.MultiplyPoint (deviceCamera.transform.position);
// normal of plane defined by device camera
Vector3 fwd = eyeCamera.transform.worldToLocalMatrix.MultiplyVector (deviceCamera.transform.forward);
Plane device_plane = new Plane (fwd, deviceCamPos);
Vector3 close = device_plane.ClosestPointOnPlane (Vector3.zero);
near = close.magnitude;
It helps that ARKit's world space is in physical meters, so it was fairly easy to measure the device screen size and set up the rest of the frustum correctly. The numbers used here would be the same when using, for example, glFrustum() to set up the camera projection matrix. Note that these numbers are unique to the iPhone X and would need re-measuring for any other device. Also note that the position of the front-facing camera is used as the origin - that is where the device camera is. This might seem self-evident, but some diagrams I've seen define the center of the device as the origin, which I found did not work well. The far value is chosen simply so the scene fits.
left   = deviceCamPos.x - 0.000f;
right  = deviceCamPos.x + 0.135f;
top    = deviceCamPos.y + 0.022f;
bottom = deviceCamPos.y - 0.040f;
far = 10f; // may need bigger for bigger scenes, max 10 meters for now

// I left out a part here that visualises the frustum and moves
// the near plane closer to the eye
// so rendering can extend in front of the device screen plane

Matrix4x4 m = PerspectiveOffCenter (left, right, bottom, top, near, far);
eyeCamera.projectionMatrix = m;
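PerspectiveOffCenter builds the off-axis projection matrix from these six values. The actual implementation is in the GitHub source; a typical version, following the example in Unity's Camera.projectionMatrix documentation, looks like this:

static Matrix4x4 PerspectiveOffCenter (float left, float right, float bottom, float top, float near, float far)
{
    float x = 2.0f * near / (right - left);
    float y = 2.0f * near / (top - bottom);
    float a = (right + left) / (right - left);   // horizontal off-axis shift
    float b = (top + bottom) / (top - bottom);   // vertical off-axis shift
    float c = -(far + near) / (far - near);
    float d = -(2.0f * far * near) / (far - near);

    Matrix4x4 m = new Matrix4x4 ();
    m[0, 0] = x;  m[0, 1] = 0f; m[0, 2] = a;   m[0, 3] = 0f;
    m[1, 0] = 0f; m[1, 1] = y;  m[1, 2] = b;   m[1, 3] = 0f;
    m[2, 0] = 0f; m[2, 1] = 0f; m[2, 2] = c;   m[2, 3] = d;
    m[3, 0] = 0f; m[3, 1] = 0f; m[3, 2] = -1f; m[3, 3] = 0f;
    return m;
}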
Interpupillary Distance (IPD)
In this implementation the user sets the IPD using a slider. The default is 64 mm, which is the average IPD for males. This could be improved by automatically measuring the IPD using ARKit face tracking or some other camera-based method. The IPD can be set as high as a very unrealistic 150 mm; this is so a camera can be placed next to the eye and record a correct view for demonstration purposes. Use the Device camera mode to set the IPD.
For the current manual method, eye height and depth relative to the face anchor may still need to be added.
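As a rough sketch of what the manual IPD offset amounts to (the variable names here are illustrative, not necessarily those in the actual source), the rendering camera sits half the IPD to the side of the tracked head centre, along the head's local x axis:

// hypothetical sketch: offset the rendering camera half the IPD from the head centre
float ipd = 0.064f;                   // 64 mm default, adjustable with the slider
float sign = useRightEye ? 1f : -1f;  // sign may need flipping due to the mirrored face data
Vector3 eyeOffset = new Vector3 (sign * ipd * 0.5f, 0f, 0f);
eyeCamera.transform.position = headCenter.transform.position
                             + headCenter.transform.rotation * eyeOffset;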
Selecting which eye to use
By default the app uses the right eye (the left eye should be closed for the best viewing experience). In settings the user can select which eye to use, and there is also an Auto mode which uses ARKit's blend shapes to try to decide which eye is closed and which is open. It works pretty well, but I still decided to use the right eye as the default.
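The Auto mode idea, sketched roughly below, assumes the UnityARKitPlugin blend shape dictionary with the keys "eyeBlink_L" and "eyeBlink_R" (check the plugin's ARBlendShapeLocation constants; the actual source may differ):

// rough sketch of Auto mode: pick the open eye from the blink blend shapes
void UpdateEyeSelection (Dictionary<string, float> blendShapes)
{
    float blinkL, blinkR;
    if (blendShapes.TryGetValue ("eyeBlink_L", out blinkL) &&
        blendShapes.TryGetValue ("eyeBlink_R", out blinkR))
    {
        // left/right may need swapping because the face data is mirrored
        if (blinkL > 0.5f && blinkR < 0.5f)
            useRightEye = true;   // left eye closed, render for the right eye
        else if (blinkR > 0.5f && blinkL < 0.5f)
            useRightEye = false;  // right eye closed, render for the left eye
    }
}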
Why one eye only?
Since only one view can be presented to the user, only one eye can be used (monoscopic 3D). A future thing to try would be anaglyph (red/green) glasses and rendering both views for stereoscopic viewing. Polarisation techniques probably wouldn't work since the device can be moved freely. Timing-based stereo (active shutter) is another option that should work. Parallax barriers probably would not work that well, again since the device can be moved freely.
Some people I've shown the app to think it "works just as well when viewing with both eyes". That worries me a bit - is the illusion not working for them? For me personally, when using one eye that is correctly tracked, the image really pops out. When using both eyes it feels "3D" in a way, but it doesn't pop.
Device orientation
The app currently uses landscape mode only. Getting the other orientations to work made my head hurt too much; there was already the problem of the mirrored view. The illusion itself works in any orientation (just hold the phone in portrait, for example, and try); it's just the controls (settings) that are locked to landscape orientation.
Scene "attached" to device
When face tracking, ARKit (at least Unity's version) does not provide 6DOF tracking of the device camera (Face Anchors are, of course, fully 6DOF tracked). It does provide 3DOF (orientation) tracking of the device camera, but I found that just confusing. Instead the scene is imagined as completely "attached" to the device, which makes for a nice illusion.
Moving the device vs moving the head around
When holding the device in hand and rotating or moving it, ARKit can infer the head position pretty well even if the head is well outside the camera view (it presumably uses the gyroscope and accelerometer). But when the device is still, such as laid on a table or otherwise affixed, tracking is very sensitive and only a small range of head motion is allowed. Use the Device camera view to figure out the allowed viewing positions when the device is affixed.
It seems like ARKit is a little primed to think your device is held in front of you and not above eye height, which seems logical - because that is where you normally hold it! So tracking works best at eye height and below.
Don't hold the device too close to your eye, ARKit seems to need at least about 30cm (12 inches) distance to track correctly.
On the correctness of this solution
Please note that there is no guarantee that this solution is perfectly correct. The end result works very well - the illusion really is there - which suggests it is correct. But the code is the result of iterative experimentation, so there may be errors that happen to cancel each other out. If you find any errors, please let me know here in the comments.
Why is it called TheParallaxView?
It exploits the parallax effect - that objects closer to the observer appear to move more than objects further away when the observer (or the observed) is moving. The name is also a reference to the book by philosopher Žižek, who writes:
I don't pretend to fully understand this book by Žižek but I always liked the title. And in the app, the subject's gaze is directly used to generate the picture that is presented back to the subject.
Related work
Johnny Lee - Head Tracking for Desktop VR Displays using the WiiRemote
Good-Feel - Rittai Kakushi E Attakoreda / Tales in a Box: Hidden shapes in perspective!
kode80 - HoloToy iPhone Hologram
Jeremie Francone, Laurence Nigay - i3D — Head Tracking for iPhone: Glasses-Free 3D Display
Stewart Smith - How to: VR with Face Tracking
Mission Impossible Ghost Protocol Hallway Scene
Future work
- Anaglyph and active shutter stereoscopic rendering would be fun to explore.
- Automatic measuring of IPD / exact eye position relative to the face anchor would be useful and seems possible. If not, at least eye height and depth relative to the face anchor should be added to the current manual method.
- Using the light estimation from ARKit (directional and/or spherical harmonics) could be cool.
- If Apple/Unity add 6DOF camera tracking in the Face Tracking mode that would open up new possibilities. The scene would no longer have to be "attached" to the device. The code is (should be!) fully prepared for 6DOF movement of the device camera.
- Using the back-facing camera for transparency effects.
- Using ARKit without Face Tracking or even raw gyro+accelerometer data for a limited effect on devices that do not have TrueDepth camera. The head would need to be held still, but the device could move and rotate.
- One possible direction for the app would be to allow 3D model import (FBX, OBJ, glTF) or even direct integration with 3D packages.
- Fixed point / camera tracking mode where a fixed point / camera is tracked rather than an eye.
Known Issues
- Sometimes the app freezes for a few seconds. This is due to Unity starting an "Enlighten" thread, which I haven't been able to turn off. The app doesn't use Enlighten (as far as I know), so it would be nice if that thread didn't start.