Wednesday, April 20, 2011

Source Code Update

Today I worked on the code that is already checked in at CodePlex:
  1. Refactoring: I changed some classes to structs and added some Equals and GetHashCode implementations. I'm also trying to apply what I'm learning from "Clean Code: A Handbook of Agile Software Craftsmanship" as I read it.
  2. Better cluster separation (3-dimensional): if the hands are far enough apart on the z-axis, the points are assigned correctly even if the x/y distance is close to 0
3D Cluster Separation
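The effect can be sketched as follows (an illustrative Python sketch, not the project's C# code; the cluster centers and point coordinates are made-up values): assigning each point to the nearest center by full 3D distance separates the hands even when they overlap in x/y.

```python
# Hedged sketch: why a 3D distance fixes cluster assignment when two hands
# overlap in x/y but are far apart in depth (z).
import math

def assign(point, centers):
    """Assign a point to the index of the nearest cluster center (3D distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Two hypothetical hand clusters: same x/y position, well separated on z (mm).
centers = [(320, 240, 600), (320, 240, 780)]
print(assign((322, 241, 610), centers))  # → 0 (near hand)
print(assign((318, 238, 770), centers))  # → 1 (far hand)
```

With a 2D (x/y only) distance both points would be ambiguous; the z component resolves them.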

Tuesday, April 19, 2011

Clustering Settings

There are some settings in the class CCT.NUI.Core.Clustering.ClusterDataSourceSettings that you might have to adjust to your needs:
  • ClusterCount
    Maximum number of clusters. Keep this as low as possible (default is 2, i.e. two hands)
  • LowerBorder
    Number of rows (in pixels) at the lower border that are not processed. My Kinect is on my desk, and there is usually stuff on it that gets in the way in the bottom part of the image. Set it to 0 to process the full y range
  • PointModulo
    Clustering gets more expensive the more points there are to distribute. This setting lets you use fewer points (default is 5, which means only every 25th point is used: x % 5 == 0 and y % 5 == 0)
  • MinimumDepthThreshold
    Only points farther away than this are used in clustering (default is 500 millimeters)
  • MaximumDepthThreshold
    Only points closer than this are used in clustering (default is 800 millimeters)
  • MinimalPointsForClustering
    Minimal number of points within the thresholds for clustering to run (default is 50)
  • MinimalPointsForValidCluster
    Minimal number of points a cluster must contain to be valid (default is 10)
  • MergeMinimumDistanceToCluster
    There are always ClusterCount clusters formed. Two clusters are merged if the distance between the points of each cluster that are closest to the other cluster's center is smaller than this value
  • MergeMaximumClusterCenterDistances
    There are always ClusterCount clusters formed. Clusters are merged if their centers are closer than this value
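To illustrate how PointModulo and the depth thresholds interact, here is a minimal Python sketch (names mirror the settings above, but this is not the library's code):

```python
# Illustrative point filter: a depth pixel takes part in clustering only if
# it falls on the PointModulo grid and lies inside the depth segment.
def passes_filters(x, y, depth,
                   point_modulo=5,   # PointModulo (default 5)
                   min_depth=500,    # MinimumDepthThreshold in mm
                   max_depth=800):   # MaximumDepthThreshold in mm
    """True if this depth pixel should be used for clustering."""
    if x % point_modulo != 0 or y % point_modulo != 0:
        return False                      # keep only every 25th point
    return min_depth < depth < max_depth  # inside the depth segment

print(passes_filters(10, 15, 650))  # → True
print(passes_filters(10, 16, 650))  # → False (y % 5 != 0)
print(passes_filters(10, 15, 450))  # → False (closer than 500 mm)
```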

Monday, April 18, 2011

Going Open Source (partly)

I have put the first part of the code on CodePlex. It contains some data sources (depth image, RGB image and clustering). It can be found here:

It's not a final release, so it may (and probably will) change soon. There is no documentation available yet, but it comes with a small demo application. If you have problems running it, feel free to contact me.

I'm still working on the hand and fingertip detection part, but it's not ready to be published yet. It is based on the clustering data source, though, so maybe that is already useful on its own.

Sample Application

To run the sample application, you have to edit the file "config.xml" and replace the key in the following line with a valid PrimeSense vendor key: <License vendor="vendor" key="*****"/>
EDIT: The sample runs on my PC without a vendor key

The depth segment used for clustering is set to 500-800 mm by default, so please make sure your hands are close enough to the Kinect (or you'll only see a black screen).

Monday, April 11, 2011

New Address for this Blog

The main URL of my blog has changed, please use:

(the old address is still working)

Saturday, April 9, 2011

Center of the Palm (Hand Tracking)

Until now I was using the center of the cluster as the center of the hand (the dark blue dot). But this is not ideal: if you open and close your hand, the center moves up and down. The same happens if you rotate the hand or if more of the arm becomes visible.

The solution is to find the center of the palm, which stays quite stable while rotating, opening and closing the hand (the light green dot). This is done by finding the biggest circle inside the hand's contour. The center of this circle is in most cases the center of the palm.

This circle can be found by identifying the point inside the contour that maximizes the minimum distance to the contour points. The line through the center of the cluster and the center of the palm could also serve as a hand orientation indicator.
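A brute-force version of this search can be sketched in a few lines (illustrative Python, not the project's C#; the square "contour" is a toy stand-in for a real hand contour):

```python
# Find the largest inscribed circle: the interior point whose minimum
# distance to the contour points is greatest is the palm center candidate.
import math

def palm_center(interior_points, contour_points):
    """Return (center, radius) of the biggest circle fully inside the contour."""
    best_point, best_radius = None, -1.0
    for p in interior_points:
        # The nearest contour point limits the circle radius at p.
        radius = min(math.dist(p, c) for c in contour_points)
        if radius > best_radius:
            best_point, best_radius = p, radius
    return best_point, best_radius

# Toy example: a 10x10 square contour; the best center is the middle.
contour = [(x, 0) for x in range(11)] + [(x, 10) for x in range(11)] \
        + [(0, y) for y in range(11)] + [(10, y) for y in range(11)]
interior = [(x, y) for x in range(1, 10) for y in range(1, 10)]
center, radius = palm_center(interior, contour)
print(center, radius)  # → (5, 5) 5.0
```

In practice one would only test candidate points on a coarse grid (or use a distance transform) instead of every interior pixel, since this scan is O(interior × contour).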

Blue point: Center of the cluster
Green point: Center of the palm
Red points: Finger tips
White line: Convex hull
Yellow line: Contour
Red line: Distance from the palm point to the contour
Green circle: Biggest circle that is completely within the contour

Wednesday, April 6, 2011

"Social" Interactive Image Manipulation

The maximum number of hands the application detects is now configurable and theoretically unlimited. But due to performance and resolution constraints (fingers can't be detected if the hand is too far away from the Kinect), it's already quite difficult to work with 3-4 hands.

Currently the hand should not be farther than 100-120 cm from the Kinect for fingers to be detected reliably. Should the Microsoft Kinect SDK (or another driver, or even another device) offer better resolution, then hands and fingers could also be detected farther away.

Plans for the weekend
  • Find the center of the palm (in addition to the center of the whole hand cluster)
  • Improve hand and finger detection performance
  • Hand detection without fixed depth segment

Sunday, April 3, 2011


While refactoring the clustering and hand detection code, I noticed a drop in the frame rate. I hadn't added any new functionality, so I did some profiling (I lack a decent .NET profiler and won't pay 400-600 USD for one for a private project). Here is what I found:

I had added a method call inside the loop over the depth values to filter each value, instead of putting the expression directly into an if. I had also moved the width and height of the map into a Size field and used it as the boundary in the two for loops, and I had introduced a ClusteringSettings class holding some boundary values.
  1. if(IsValidValue(...)) instead of if(depthValue > minDepth ...)
  2. for(int x = 0; x < this.size.Width; x++) instead of for(int x = 0; x < width; x++)
  3. this.settings.MinDepth instead of ushort minDepth = ...
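The same trade-off can be shown in Python terms (a sketch, not the original C#): hoisting the thresholds into locals and inlining the condition avoids a per-value call and lookup in the hot loop, while both variants must produce identical results.

```python
# Two equivalent depth filters: a "refactored" version with a helper call
# and settings lookups per value, and an "optimized" version with the
# thresholds hoisted into locals and the condition inlined.
def filter_calls(depth_map, settings):
    """Refactored style: per-value helper call, settings looked up each time."""
    def is_valid(v):
        return settings["min"] < v < settings["max"]
    return [v for v in depth_map if is_valid(v)]

def filter_inline(depth_map, settings):
    """Optimized style: thresholds hoisted to locals, condition inlined."""
    lo, hi = settings["min"], settings["max"]
    return [v for v in depth_map if lo < v < hi]

settings = {"min": 500, "max": 800}
depth_map = [400, 550, 700, 900]
assert filter_calls(depth_map, settings) == filter_inline(depth_map, settings)
print(filter_inline(depth_map, settings))  # → [550, 700]
```

The cleaner version costs extra work on every one of the millions of depth values per second, which is exactly where the measured slowdown below comes from.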
The following times were measured on my Intel Core i7-860:
  • Duration of the clustering after the refactoring: 8.9 ms
  • With local settings (undo point 3): 6.9 ms
  • With local width and height (undo point 2): 5.2 ms
  • Expression directly in the if (undo point 1): 3.8 ms
This means the refactoring had more than doubled the time required to iterate over the depth values. I had not paid enough attention to performance.

The Kinect's depth camera works at VGA resolution (640x480) and 30 frames per second, which means you have to process 9'216'000 depth values per second. It also means that processing must not take longer than 33.3 ms per frame; otherwise new frames arrive faster than you can process them, and you have to skip frames to keep up.

Hand and finger detection currently takes around 12 ms per hand per frame. That's 24 ms for both hands; adding the 8.9 ms for clustering brings the total to around 33 ms, and that is where the frames start to get lost.
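The budget arithmetic checks out:

```python
# Back-of-the-envelope check of the numbers above.
values_per_second = 640 * 480 * 30  # VGA depth stream at 30 fps
budget_ms = 1000 / 30               # per-frame time budget
load_ms = 2 * 12 + 8.9              # two hands + clustering, in ms
print(values_per_second)                        # → 9216000
print(round(budget_ms, 1), round(load_ms, 1))   # → 33.3 32.9
```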

Of course the next task is to optimize the hand and finger detection code.