Simple & Cheap Webcam Indoor Positioning System
Purpose: Automatically get the (x,y,z) position of a moving object in 3D space.
Context: A large indoor space, and a desire to measure properties like sound or light or radiation or whatever, that depend on the position. And no desire to manually measure and record all positions.
Method: When two or more fixed webcams can see the moving object, its real-time position can be calculated. Any measurement at the object position can now be automatically annotated with its (x,y,z) coordinates.
Motion capture system? Similar but different: much larger scale, single point, absolute position, much cheaper, lower resolution required.
Accuracy: With modest webcams, expect about 10cm accuracy (all directions) at 10m distance.
Intended audience of this release: Tinkerers and experimenters. Software needs to be combined with various hardware. Soldering required.
Code quality: Reasonably okay. Can be much improved, but is fully operational.
Status: Works For Me [TM]. Unsupported. Abandonware. Please do not contact me, I will not have opportunity to assist you.
Pronunciation: Hard 'c', as in scale and screen. First part sounds like 'squid', last part sounds like 'tips'.
Sorry, there is no quick start. Really.
Before you can use this software, you need to make an object that can be tracked, and you need to set up webcams to track it. And the webcams must be properly characterized first. This is not a fast process. (But once all equipment has been organized, setup at any location will be easy enough.)
For a quick demo: see below for download, compilation, and dependencies. Once all is compiled, point a webbrowser at web/scwipsweb.html, and run the web/test/start-full-demo script to start several background processes. See the top of that script for commands to stop all processes when done.
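A possible command sequence for the demo (a sketch, assuming everything is run from the top-level source directory, and using firefox only as an example browser):

make
firefox web/scwipsweb.html &
web/test/start-full-demo

When done, stop the background processes with the commands listed at the top of that script.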
Most individual programs have a test/
directory with commands and
real-world examples to get you started.
You will need ffmpeg, and a Linux system to run it on, preferably portable and locally-powered. Install development library packages from your Linux distribution; see the Makefile for more details.
(See the Makefile for a workaround for 2.4.2 as shipped in Debian 10, and for upstream 2.4.4.)

Run make. If make reports any issues, solve them and repeat.
There is no make install; just run programs from their own directories. You will probably end up running various programs on separate systems, and writing various starter scripts to get them running.
Set up an MQTT broker (such as mosquitto) on the local system, with plain-MQTT as well as Websockets access. The various programs will by default expect MQTT on port 1883 and Websockets on port 1884.
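A minimal mosquitto configuration sketch providing both listeners (the file name is arbitrary; add proper access control if the broker is reachable beyond your own network):

# scwips-mosquitto.conf -- example only
listener 1883
listener 1884
protocol websockets
allow_anonymous true

Start it with mosquitto -c scwips-mosquitto.conf.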
Now read further to know what to do.
System overview figure: doc/system.svg (SVG), doc/system.png (PNG), doc/system.pdf (PDF).
All boxes in the figure are separate programs/executables or hardware pieces. Depending on the setup, it will make sense to run several programs on separate processors. "Stacked" boxes indicate that multiple instances may be (or need to be) operating in parallel.
The lower part of the figure shows the data flow when the system is in normal operation.
Before the object tracking [objtrack] can start normal operation, the camera locations and orientations need to be known for the specific scene that the system is operating in.
This needs to be done only once per camera system setup; and must be re-done completely whenever camera positions or orientations have been changed.
Before a camera can be used at all, it must be fully characterized so that real-world light rays will have a known correspondence to pixel locations. This needs to be done once per camera, as represented in the green part at the top of the figure.
Most of the system communicates by JSON messages over MQTT, including MQTT over Websockets for the webbrowser UI. Also all configuration files are JSON, which makes them appear somewhat rigid, but delivers high expressivity and easy parsing.
See doc/MqttTopics.txt for an overview of various MQTT traffic. Since MQTT has an arrival guarantee as long as connection is established, there is no request-response operation, but only publishing of new state. When disconnected and reconnected, synchronization is automatically established by re-publishing own state and publish-triggered re-publish of others' state.
To keep MQTT data neatly separated, all SCWIPS programs support specifying a freely choosable prefix topic (or topic-subtopic path). This allows creating a "namespace" for the particular SCWIPS operation and keeping it distinguishable from any other traffic on the MQTT system.
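For example, with a (hypothetical) prefix of myhall/scwips, all traffic of that particular installation can be watched with:

mosquitto_sub -h localhost -p 1883 -t 'myhall/scwips/#' -v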
Before delving deeper into the system's components, it is useful to get some insight into the basic method of operation.
A system in which "multiple cameras see an object" may sound like a triangulation system, but no triangulation is actually done.
Only guesses are made, and guesses are adjusted. The basic procedure is: guess all unknowns, calculate what the cameras would then be seeing, compare that with what the cameras actually report, adjust the guess to make the difference smaller, and repeat until no further improvement is found.
This same approach is used for camera characterization, for scene calculation, and for object tracking.
So the system never calculates the object position. It just guesses the object position until the guess corresponds best to the camera observations.
This has the particular advantage that only "forward" calculations need to be done, namely for light rays from an object at a known (guessed) position, entering a camera that is at a known (guessed) position and orientation, and has a known (guessed) focal length and distortion. This makes for an easy calculation to find the resulting camera pixel. The "backwards" calculation, from camera pixel back to 3D light ray, is never needed, and would indeed be quite impossible to do.
Another advantage is that camera observations do not need to correspond exactly. The 3D light rays as seen by all cameras do not need to cross each other at exactly the same point. In practice, they do not cross each other at all, and that is not needed either. Observation errors are treated the same as guess errors, and the procedure will automatically find the result where errors are minimal.
One notable disadvantage is that guessing and adjusting may take lots of time, especially when the starting guess is very wrong. This mostly concerns the camera characterization and scene calculation procedures, not the regular object tracking.
The mathematical method is non-linear least squares fitting, but used in a rather more generalized way than the specialized function fitting that is usually described in literature. (Library in cmpfit/.)
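Expressed as a least-squares objective (a sketch of the idea, not the exact cmpfit interface), the guessing minimizes the summed squared pixel errors over all guessed unknowns:

\min_{\theta} \sum_{c,i} \left\| \hat{p}_{c,i} - \mathrm{project}_c(\theta, i) \right\|^2

where \theta collects the guessed quantities (positions, orientations, focal length, distortion), \hat{p}_{c,i} is the pixel reported by camera c for observation i, and \mathrm{project}_c is the "forward" calculation described above.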
The system is intended for cameras that see an object. At least two cameras are required to get a 3D position. More cameras can give a more accurate position, and/or cover more space. There is no limit on the number of cameras operating together, as long as corresponding compute power is available.
Some points of consideration for selecting a camera type:
The camera must be able to get its live video stream into a Linux system
so that ffmpeg
can pick it up.
The camera characterization procedure to map real-world points to image pixels can be very accurate, easily to subpixel detail. But it requires that the camera always produces exactly the same result as it did before.
High resolution is not necessarily required; it can be compensated by object size. Proper focus is also not required, since detection will always use a weighted average to find the object center with subpixel accuracy.
Wide-angle cameras are highly advisable for their large area coverage.
Use a wireless setup when assembling a portable system to use at many different locations.
Since the object detection [blinkdetect] locks onto color and brightness, it will work best when these are similar in all circumstances. Whenever possible, force exposure, brightness, contrast, color balance, etc. to manual settings, instead of automatic adjustment which will change depending on the environment.
Use v4l2-ctl to query and set control values (see the sketch below).

The cameras must be installed with extreme stability. Especially the camera angle must not change during use, not even by half a millimeter, as that would lose much of the accuracy.
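A minimal sketch for the v4l2-ctl tip above; the control names differ per camera model (the ones shown are just common examples), so list them first:

v4l2-ctl -d /dev/video0 --list-ctrls                          # see what this camera offers
v4l2-ctl -d /dev/video0 --set-ctrl=white_balance_automatic=0  # example: manual white balance
v4l2-ctl -d /dev/video0 --set-ctrl=auto_exposure=1            # example: manual exposure mode
v4l2-ctl -d /dev/video0 --set-ctrl=exposure_time_absolute=300 # example: fixed exposure time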
But before installation, the camera must be properly characterized.
The goal of camera characterization is to obtain a set of parameters that describe a camera in such detail that every real-world coordinate can be calculated into the corresponding camera pixel with minimal errors. In practice, single-pixel accuracy is obtainable even for quite cheap webcams.
The parameter calculation is fully automated, and the only thing it requires is one or more pictures taken by the camera, that have some number of pixels mapped to real-world coordinates. (The pixel-to-coordinate mapping is mostly automated as well, in many cases.)
It may seem unbelievable that just a snapshot of a known real-world scene tells everything we need to know about a camera. But it is true.
In the simplest case, given a camera with known settings (zoom, etc), and a snapshot containing just three points with known real-world coordinates, for example three points on a paper, there can be only one position where the camera must have been when it took that snapshot. When you try to recreate that same snapshot using that same camera, you will eventually end up at exactly that same position.
In a further simple case, given a zoom camera with unknown zoom setting, and a snapshot containing just four points with known real-world coordinates, for example four points on a paper, there can be only one camera position and one zoom setting to create that snapshot. (Except in the case of an exactly paper-perpendicular orientation.)
Any further points with known real-world coordinates can be used to determine whether the camera has a normal, or wide-angle, or fisheye lens; and how much the actual lens corresponds to the "ideal" lens, and which distortion parameters can best describe what the deviation from that "ideal" lens actually is.
This can all be derived, automatically, from a single picture with many (20+) pixels for which real-world coordinates are known. More is better, in the sense that better accuracy can be obtained. More known pixels per picture, but also multiple pictures to get even more pixels. Especially for wide-angle and fisheye lenses, it is hard to have any single object covering the entire picture, but it is easy to take many snapshots that each feature the same object in a different part of the picture.
There is one particular object which immediately provides many real-world coordinates at once, and also produces easily detectable pixels in any snapshot. This is the chessboard pattern (or checkerboard if you like). Especially the inner corners of a chessboard pattern are excellently visible thanks to the two intersecting lines formed by the contrasting sides of the adjacent squares. And being a flat plane with a highly regular pattern, measuring all coordinates is as easy as measuring any single square. Well, for optimal accuracy, it is best to measure a distance covering many squares, and then divide by the number of squares passed.
Automatic detection of chessboard inner corners is readily available
from the OpenCV library, and is used in the opencv-findcorners
program. For cases where OpenCV detection fails, pixels can also be
enumerated manually using any graphics editor; or especially the hugin
panorama stitching software which is built for creating and handling
pixel-to-pixel mappings.
Find or create a chessboard pattern that is totally flat and has high contrast. A real non-folding chess or checkers board may work well. Or print the pattern from files provided under camcalib/chessboard. Preferably use thick paper to prevent curling, and paste it onto a wall or table or other totally flat surface. The chessboard overall height variation should be less than 1mm.
The chessboard outer edges do not matter. Outer squares may be clipped, as long as all inner corners are still clearly visible.
The number of inner corners does not matter, although "many" are advised. Attempt to get a pattern with 5x5 inner corners (6x6 squares) or more.
The physical size of the chessboard does not matter, as long as it is clearly visible on the camera pictures. No problem if it is out of focus, since the inner corners can usually still be accurately determined even when the picture is somewhat blurry.
Measure the size of the chessboard squares, which is the distance
between the inner corners of the chessboard. Measure separately for
both directions, called "horizontal" and "vertical" for clarity.
Preferably measure the total length of multiple squares, and note the
number of squares passed. camcalibchess
will do the division for
you. Do not include any outer squares in the measurement, since these
may be clipped.
Use the camera to take several pictures of the chessboard pattern. For USB webcams, you can use mpv /dev/video0 or mplayer /dev/video0, and then the "S" key to take a snapshot. For other cameras, it may be useful to record the video stream, then play it back using mpv or mplayer and again use the "S" key.
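As an alternative (an assumption, not the documented route), single frames can also be grabbed directly with ffmpeg, which is needed anyway:

ffmpeg -f v4l2 -i /dev/video0 -frames:v 1 chessboard-01.png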
Take one picture with the chessboard filling almost the entire image.
Take four more pictures, with the chessboard at 1/4th image size, placed in each corner of the image and reaching approximately to the center.
For wide-angle and fisheye cameras, again take five more pictures: with the chessboard at 1/9th image size at the center of all edges, and at the center of the image.
The precise positions and angles of the camera do not matter. Do not use too much rotation, so that "horizontal" and "vertical" remain clear. Make sure lighting is adequate to provide high contrast between the squares.
For each of the pictures, use opencv-findcorners to automatically detect all inner corners. This program needs to be compiled separately because of its one-time use, and the requirement to install the large OpenCV development library. See camcalib/opencv-findcorners/ for details.
opencv-findcorners
requires the expected number of inner corners
(horizontal/vertical) to be specified as commandline arguments.
If detection is fully successful, a resulting picture will be shown with all detected corners marked in red-green-blue sequence. Also a JSON list of pixel coordinates will be printed to standard output, ready to be used in the next step.
Detection may be upside-down; this is no problem.
For chessboards with an equal number of squares in both directions, detection may be sideways, i.e. horizontal and vertical exchanged. This can be fixed by manually editing the JSON result and exchanging the "realh" and "realv" labels.
If detection is partially successful, a pixel coordinate list may still be printed, but probably in a totally wrong order. This may be fixed manually. And/or re-try opencv-findcorners while specifying fewer expected inner corners; this will sometimes help the detection algorithm, while leaving some pixels for manual detection.
For any pixels that do not get detected automatically, manual action is advised. See camcalib/hugin/ for using hugin to easily map multiple pixels. Any picture editor that allows pixel coordinate readout may also be used.
Whatever the method, the output must be a JSON list of points that map the chessboard corner number (horizontal/vertical) to the image pixel. See camcalib/opencv-findcorners/test/test.json for an example.
Not all chessboard corners need to be included, only as many as are easily obtainable. The chessboard is just a convenient method to create many known real-world coordinates, and not more than that. The calculation will consider all points individually, and does not care about any spatial relation between any points.
Paste all pixel coordinate lists into a complete JSON file similar to the example camcalib/test/c170-9-19.json. At the top of the file, enter the camera pixel size, and the measurements of the chessboard horizontal and vertical square size. And be sure to add the closing brackets at the end of the file.
The JSON file may contain coordinates from any number of pictures taken from any number of different chessboard patterns ("grids"). In practice, just one chessboard pattern, and a handful of pictures, will usually suffice to give highly accurate results.
For really special cases, any flat-plane "non-chessboard" set of measured real-world points may be used by setting the grid scale horizontal and vertical measurements to 1 meter, divide-by 1, and then filling all individual points' "realh" and "realv" fields with the actual measured coordinates in meters with fractions. This method is very labor-intensive and not recommended. Similarly, non-flat-plane 3D real-world points may be used, but then the camcalibplanes source must be adjusted to be able to accept them.
Run camcalibplanes < input.json and wait until it has tried all of the various camera and distortion types. When finished, it will print a JSON snippet with resulting parameters that provide the best results. Paste that snippet into the blinkdetect configuration file; see below.
When running camcalibplanes for very similar cameras, the camera type name and distortion name may be forced in the JSON input file; see the examples. This will speed up the process by skipping all irrelevant attempts.
The camcalibplanes program uses the "guessing" method as discussed above. It starts by guessing and adjusting camera positions for each picture, then proceeds to guess and adjust camera positions and parameters for all pictures at once. In the end, it will reach a set of camera positions and parameters that, very accurately, calculate all image pixels belonging to all real-world chessboard points. And since this calculation works reliably for hundreds of points spread out over the entire image, the expectation is that all other points will also be calculated with great accuracy.
Repeat the above procedure for all cameras individually. Even though cameras may be the same brand and model, this procedure will easily correct for any minute differences in construction and mounting, to provide the most accurate results possible.
While camera video streams may be processed to detect most any object, SCWIPS comes with detection of a very specific object, namely a fast-blinking light, also called a "blinker". This gives highly effective detection for a number of reasons.
(Of course, you are free to add detection routines for any other kind of object.)
You will have to design and build the blinker yourself. For testing, a simple flashlight might do, but that will quickly get you tired... This is where electronics and soldering experience come in.
Some considerations when designing your blinker:
Use camplacingvis to get some insight into camera placement and blinker visibility in real situations. Where a 10x10x10cm blinker could possibly reach 20 meters distance, a 20x20x20cm blinker would reach 40 meters.

For a simple example oscillator circuit to drive the blinker, see doc/blinker-idea.pdf.
When using SCWIPS to measure some environmental property at the blinker location, also consider that the sensor system will have to be physically attachable to the blinker (or vice versa), be locally powered, and be able to send measurement results via JSON/MQTT over Wifi to the scwips system.
The SCWIPS processing begins in the blinkdetect program, which receives an uncompressed video stream and detects the pixel(s) where the blinker is seen. The pixel will be continuously reported via MQTT. See blinkdetect/test/Testcmds.txt for some examples.
The standalone blinkdetect executable will normally require ffmpeg (or similar) to decode the incoming compressed video stream. Alternatively, blinkdetect functionality can be patched into ffmpeg as a special output format, see blinkdetect/ffmpeg/. This will significantly reduce CPU load since no uncompressed video needs to be transferred between two separate programs.
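A sketch of such a standalone pipeline; the raw format that blinkdetect expects, the ffmpeg options, and the cam1-config.json file name are assumptions here (blinkdetect/test/Testcmds.txt has the real commands):

ffmpeg -f v4l2 -i /dev/video0 -f yuv4mpegpipe -pix_fmt yuv420p - \
  | ./blinkdetect cam1-config.json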
One blinkdetect
instance (plus one ffmpeg
instance) will handle one
camera. Whenever possible, it should run physically close to the camera,
so that no high-bandwidth video needs to pass the network.
blinkdetect requires configuration via a JSON file. This file contains the camera name and characterization parameters, as obtained from the camcalibchess program, discussed above. Also, it contains blink detection parameters, and details for the MQTT server to connect to. For the standalone executable, the configuration file name is passed as command line argument; for the ffmpeg-integrated variant, the file name is passed by environment variable. See blinkdetect/test/ for an example configuration file.
Blink detection parameters are as follows:
colorfilter: R/G/B 8-bit values that specify the approximate blinker color to look for on the camera video stream. This may vary for different camera models. Only the color (hue) will be used, not the brightness. Once the blinker has been detected, the colorfilter value will be continuously adjusted within a limited deviation to best match the brightest detected pixels. The adjusted colorfilter value will be reported regularly on the camera's status MQTT topic, and can be pasted into the configuration file for future use.
blinkthres_min: A 16-bit value (1-65535) that specifies the minimum off vs. on brightness difference that the blinker must reach before being detected. This mainly limits the maximum distance that the blinker can reach. A too-low value will cause camera noise to be detected as blinks, which should be avoided. Whenever the blinker is detected, the threshold will be increased to match the blinker brightness and avoid misdetection of blinker reflections. The actually used threshold value will be reported regularly to the camera's status MQTT topic.
focusarea_min: The minimum radius in pixels around a detected blink for which the video stream will be processed. Whenever the blinker is detected, large parts of the video images can be excluded from processing, saving much CPU load. However, any movement will still need to be covered by the processed image area, so the specified value should be large enough to cover any motion that can be expected within one or two blink times. Whenever no blinker is detected, the area will be quickly enlarged to enable searching in the full camera image. The actually used focus area, and other details, are posted continuously to the camera's high-traffic debug MQTT topic.
fps_min: The minimum number of input video frames received in any second. Whenever fewer frames are received, blinkdetect will terminate itself, to allow for an external sleep-retry loop to attempt resetting the camera and restarting processing. The actual number of received frames will be posted every second to the camera's status MQTT topic.
Blink detection begins by filtering any received video image for the expected blinker color, and then finding brightness differences with respect to one or two previous images. This is done only for pixels inside the actual focus area to avoid wasting compute power. Any detected brightness difference must be greater than the actual brightness threshold and "near enough" to the previous detection; else it is assumed to be a misdetection and nothing will be reported. These criteria are maintained with floating thresholds that are gradually relaxed as long as no blink has been detected.
Whenever a blink is detected, it will be reported on the camera's pix
MQTT topic. Both "on" and "off" blinks will be reported, so possibly at
twice the frequency of the blinker. Also any detection timeout will be
explicitly reported as lost fix.
The blinkdetect program, and also ffmpeg, will terminate once the input video stream is interrupted. For robust operation, an external sleep-restart loop is advised. To handle any intentional system shutdown such as commanded via MQTT, blinkdetect can create a flag file as specified by the shutdowncmd_touch configuration option. An external sleep-restart loop can check for this file's existence to really terminate when ordered.
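A sketch of such a wrapper loop; the ffmpeg part, the file names and the flag-file path are assumptions, and the flag-file path must match the shutdowncmd_touch setting in the configuration:

FLAG=/tmp/scwips-cam1-shutdown        # same path as shutdowncmd_touch in cam1-config.json
rm -f "$FLAG"
while [ ! -e "$FLAG" ]; do
  ffmpeg -f v4l2 -i /dev/video0 -f yuv4mpegpipe -pix_fmt yuv420p - \
    | ./blinkdetect cam1-config.json
  sleep 5                             # give the camera a moment before retrying
done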
During normal operation, it is no problem to temporarily stop and
restart blinkdetect
for one or more cameras. Of course, blinker
position as calculated by objtrack
may temporarily get lost when not
enough cameras are reporting. But all operations will be resumed
automatically once blinkdetect
has been restarted.
Whenever objtrack
has a successful 3D position fix for the blinker, it
will calculate where every camera should be seeing it, and report that
back on the camera's calcpix
MQTT topic. Any camera that does not see
the blinker because of some occlusion, will use this notice to limit its
focusarea as if it did have a successful detection itself.
blinkdetect does not use any camera parameters as obtained from the camera characterization, but merely acts as gateway to report them on the camera's info MQTT topic, to be picked up by the scenecalc and objtrack programs.
To allow for easy camera setup/alignment and visual inspection, blinkdetect accepts commands to post raw binary images from the video stream to MQTT img subtopics, both for original and color-filtered images. These commands will normally be posted automatically by the scwipsweb webbrowser UI, once per two seconds, in round-robin fashion to all known cameras. This will cause considerable bandwidth use for the MQTT traffic. The raw images are not immediately usable, and an imgproc program should be waiting to receive the images, convert them, and store them as files for retrieval by the webbrowser UI.
Some tips for setting up a number of cameras to watch some scene:
Already mentioned above: make sure that cameras will never move or rotate: fixate them completely.
Putting cameras in corners of a rectangular area will usually provide better area coverage than putting cameras at the center of walls. This holds even for 180-degree fisheye cameras, since these will lose so much resolution at the far edges of the picture that only a 120-140 degree view angle will be really usable.
Small-angle webcams can be placed immediately next to each other to provide a wide-field view. Be sure to have some overlap between the camera images to allow for proper hand-over. The regularly updated images in the webbrowser UI will be helpful to achieve that. Note that cameras need not be exactly aligned to each other; this will be automatically calculated by scenecalc.
Whenever an object is close to the exact straight line between any two cameras, the relative distance to either camera will be undetectable. If possible, have a third camera to provide an additional viewpoint.
The camplacingvis program can be used to evaluate camera placement in advance. It will read a configuration file from stdin and produce a figure of the camera reach throughout the scene. The configuration file can contain any number of cameras with parameters, and any number of coordinates where an identical copy of the camera will be positioned. Further, the blinker object size needs to be specified, and the minimum number of pixels that each camera requires for successful detection. See examples in camplacingvis/test/. In the output figures, areas where accurate 3D positioning is desired should be covered by two or more cameras.
The visualization may be somewhat inaccurate because it considers the blinker as a rectangular box in neatly axis-aligned orientation. It will report a blinker at 45-degree view to be detectable from further away than at frontal view, because it would appear to be 1.4 times wider when viewed at an oblique angle.
Also, the visualization will not show that two cameras are insufficient to obtain 3D positioning on the straight line between the cameras. And it has no way to evaluate visual obstructions (occlusions) that might be in the scene.
The scene coverage may be visualized at runtime by using the collect2d
and render2d
programs that produce figures of positioned data. In
particular, simply visualizing the reported position in the X/Y plane
with Z as color, will provide a quick overview of the situation.
For multi-camera setups, it may be instructive to visualize the reach of each camera separately. This can be done by treating a camera's blink pixel reports as sensor data stream and annotate it with the calculated object coordinate: this will produce a coordinate stream only whenever the camera is actually detecting the blinker.
mqttpositioner \
--datatopic scwips/cam/NNN/pix \
--must-have-key pixx \
--positiontopic scwips/objtrack/result \
--pubtopic scwips/cam/NNN/pix-position \
--shutdowntopic scwips/cmd
The camera's annotated pix-position topic can then be visualized by collect2d and render2d as before.
When a number of cameras have been set up to watch some scene, and a
blinker object is moving around in the scene, the blinkdetect
programs
may detect blinker pixels for each camera. But the blinker's 3D position
can only be calculated once all cameras have a known position and
orientation.
The function of the scenecalc program is to find all camera positions and orientations with great precision, as calculated from an input list of observations that are easy to obtain. Specifically, the useful observations are: blinker position "snapshots" (reference points) that are seen by two or more cameras; heights of cameras and reference points above the floor; and distances between cameras and reference points, or between reference points.
The scwipsweb
webbrowser UI provides easy means to register blinker
position "snapshots", called reference points, and to enter height and
distance measurements.
Heights and distances can be quickly measured using an affordable laser distance measurer. Lacking that, a tape measure can be used as well, but accurately measuring straight distances through free space will then prove to be somewhat challenging.
For measuring distance from one blinker position to a previous blinker position, it may help to leave a "stand-in" object at that previous blinker position. Preferably a white/reflective object with similar size, to allow for reliable results using a laser distance measurer.
Each height or distance measurement should approximately reflect the center of the blinker object, and/or the center of the camera lens. Inaccuracies of a few centimeters will be handled just fine.
Never shine a laser light towards a camera: as soon as laser light enters a camera lens, all of its energy will be focused on the image sensor chip which will most probably destroy part of it. Instead, always measure distance starting from the camera and pointing towards the blinker or stand-in object.
The following procedure can be used to register scene reference points
and measurements in an orderly fashion. Interaction with the SCWIPS
system is done via the scwipsweb
webbrowser UI, particularly its
right-hand sidepanel (which can be collapsed once setup is finished).
SCWIPS considers 3D space with conventional coordinates, having X-axis and Y-axis in the floor plane, and Z-axis pointing upwards. Visualizations in 2D will also be generated with conventional coordinates, having X-axis pointing to the right, and Y-axis pointing upwards.

Before starting, decide where you want the origin (X=0, Y=0) and X-axis to be in the scene, namely in such a way that the generated visualizations have a useful/recognizable orientation. The X-axis may be along a wall, but only if the wall is in full view of at least two cameras opposite that wall; then have the "origin" location not entirely in a corner but about a meter out of it. Of course, it will also be fine to have the X-axis be any other straight line in the middle of the area. In particular, it is very helpful if the X-axis has no obstructions, so that it is easy to measure the distance between the "origin" location and another point far away on the X-axis. Preferably, the "origin" blinker location should be in a stable position such as on a table or a high stool.
Whenever possible, have the blinker active and visible by all cameras at all times: this will cause each blinkdetect program to use the least amount of processing power. Switch on the blinker before starting the blinkdetect programs.
Place the blinker at the intended "origin" location (X=0, Y=0). In the webbrowser UI, use the camera images to verify that the blinker is correctly registered by all relevant cameras (need not be all cameras, but at least two). Then click "Add Current Point" button. The new reference point "R0" will be added below the camera list. This is a "snapshot" of that blinker position as seen by the cameras.
Measure the height of the blinker center above the floor in meters, and enter the value into the "height" field next to the "R0" name. (No enter key or submit button required.)
Set the blinker somewhat aside and replace it by a stand-in object.
Now go to any camera that had a proper blinker detection. Measure the camera height above the floor, and enter it into that camera's "height" field.
Note: Be careful not to touch the camera! Once it has been fixed, it must not move, not even a millimeter.
Also measure the distance from the camera to the (stand-in) blinker position. If needed, measure some additional centimeters, because the distance to the center of the blinker is intended, not to its outside. In the webbrowser UI, select the camera name and the "R0" reference point name in the select lists, click "Add Distance", and enter the measured distance into the field that has newly appeared.
For two other cameras (if present), repeat the same height and distance measurements.
For all other cameras (if any are left), at least height measurements would be nice to have, if possible. Whenever the floor is not immediately accessible beneath a camera, do not attempt to measure its height.
Leave the stand-in object at the "origin" position, and set the blinker as far as possible in the positive X-axis direction that you decided on previously. Height should preferably be somewhat close to the "origin" position height, but need not be equal. Verify that at least two cameras correctly detect the blinker, then click "Add Current Point" button to add it as "R1" reference point.
Measure the height of the blinker (center) above the floor in meters, and enter the value into the "R1" height field.
Measure the distance of the blinker to the stand-in object that is still at the "origin" position, and add it as R0-R1 distance.
Lift the blinker as high as possible, somewhere above the R1 position, and register that as new reference point. No further measurements needed.
Take the blinker to two other positions that form somewhat like a horizontal square when combined with the R0 and R1 reference points. On these two positions, register new reference points for both a low height and a high height.
Finally return to the origin position, and add another reference point somewhere high above it.
After these steps, there now should be eight reference points that, when taken together, form something like an approximately rectangular box in 3D space.
This should be enough for many common situations. The following steps will provide a checklist and further guidance for more complicated setups.
Verify that at least three, but preferably four/five heights have been entered. If there are too few heights, add another reference point, seen by at least two cameras, and measure its height.
Verify that at least one, but preferably two/three distances have been entered. If there are too few distances, add another reference point, seen by at least two cameras, and measure its distance to the stand-in object that is still at the origin position.
For every camera, verify that it has reference points spread out over the entire image. (Use the mouseover function of the reference point list.) When some side or corner of the image has no reference points at all, take the blinker to that area and add some more reference points; make sure that each point is also seen by at least one other camera. Vary between low and high height points.
When done, all cameras should have at least four reference points that are at widely different pixels, and not on a single line.
When cameras are multiple steps away from each other, through multiple corridors or areas, add eight reference points in "box" shape between each of the camera sets to make sure they will be rigidly and accurately connected together. Also, add distance measurements from cameras at all sides to one or two in-between reference points.
If there are any "combined" webcams at the same location but with different orientations to provide a wide-angle result, make sure they have somewhat overlapping views. Take the blinker to the overlap area and add three additional reference points (low/mid/high) that are seen by both of the "combined" cameras and also by at least one other camera from another angle. Keep the blinker at some distance from the "combined" cameras, to avoid misdetection of the real center of the blinker.
Finally click "Submit to Scenecalc" to start the scene calculation. Sit back, and wait a while.
Further notes:
Reference points may be deleted from the webbrowser UI, but their names will not be re-used. This does not matter. The two lowest-numbered reference points will have special meaning (origin and X-axis), even if they are called something else than "R0" and "R1".
Height and distance measurements need not be done exactly as specified in the above procedure, as long as there are at least three height measurements (preferably four/five) and at least one distance measurement (preferably two/three). Measurements should be approximately centimeter-precise; any inaccuracies will be evenly distributed in the calculation result.
A distance measurement between first and second reference points is highly advisable since that will also be used to fixate the positive X-axis direction. When such measurement is not available, there will be two possible solutions, with the second reference point located on either the positive or the negative X-axis. If the negative-oriented result is reached, "upside-down" pictures will be generated. Verify the R1 coordinates in the detailed scene calculation results, and try re-submitting in case the "wrong" result has been reached.
During registering and measuring, all reference points and
measurements will be stored only within the webbrowser's page memory.
Do not navigate away from the SCWIPS page, else all points and
measurements will be lost instantly. Once the scene calculation has
been started, the points and measurements will have been transmitted
via MQTT, so hopefully logged by mqttlogger
and stored for re-use in
emergency cases, see under mqttlogger/repost/.
The only function of the scenecalc program is to calculate camera positions and orientations according to a set of observations. Before and after that, scenecalc will just sit there and do nothing. Well, it will always listen for cameras announcing their parameters; and it will stay around to re-post the calculation results in case objtrack would ever get restarted. But that's it.
scenecalc gets to work after blinker position "snapshots" have been registered as reference points, heights and distances have been measured, and the webbrowser UI submits that entire list to the scene/observ topic.
Using the process discussed under "Magic" above, it will just start with a random guess for all camera positions and orientations and all reference-point positions, then calculate what the cameras would be seeing, and adjust positions and orientations to give progressively "less wrong" results. After several hundred steps, this should provide a result that is mostly "not wrong".
However, this often does not work: many combinations of starting guesses will lead to adjustments that keep going into wrong directions forever, or reach a result that is still very wrong but no adjustment is able to make it less wrong. Whenever such a situation has been reached, calculation will be stopped and re-started from a different set of random guesses. In practice, it will take some 20-100 restarts before reaching the real solution. scenecalc will just keep trying until an acceptable result has been reached, and will then post it to scene/result. Or, in case it really takes too long, the webbrowser UI has a button to stop it. As long as scenecalc is attempting calculations, it will post some progress information to the scene/+/log topic.
When many cameras are used, it will usually be possible to disregard any
one camera and still get a proper calculation result. And also
repeatedly, until just two cameras are left. scenecalc
exploits this
process in reverse, namely by starting with just two selected cameras,
and gradually adding more. The selection process will always attempt to
add a camera with high expectation of success: initially cameras that
have origin and X-axis reference points, followed by cameras that are
well-connected to the previous subset. Once any subset has reached a
correct calculation result, further attempts can build upon that, and
will need random position guesses only for one extra camera and a small
number of extra reference points. This makes calculations progress
rapidly even for many-camera installations. Camera selection is
implemented using a random process to choose which camera to add, or
remove, or to restart completely; and is biased to try common (quick)
cases often, and increasingly uncommon (slow) cases with decreasing
chance.
The calculations are executed in a single thread, using a single processor core. To get the result faster, it makes much sense to run multiple scenecalc instances in parallel to use as many CPU cores as are available. The separate instances do not need separate configuration, only different instance names for traceable MQTT logging. They will all be listening for the same scene observation list as posted by the webbrowser UI; but they will be using different random starting guesses so that chances of early success will be multiplied. Once any instance has reached the correct result, it will immediately instruct all other instances to stop calculating.
In many situations, running multiple scenecalc
instances on a local
system will suffice, and will often reach the correct result within a
few minutes. But for really large installations with very many cameras
and reference points, it could make sense to run very many scenecalc
instances on external systems such as hosted VPS services. This is
easily possible since they only need a low-traffic MQTT connection to the
SCWIPS system; see doc/MqttTopics.txt for some
details. Be sure to properly set up any MQTT link so that it will never
be accessible by unauthorized parties: the SCWIPS programs have no
access control built in.
Once scenecalc
is finished, the detailed results are shown at the
bottom of the scwipsweb
webbrowser UI. Always check that the results
are approximately correct. In particular, incorrect results will often
show reference points with very large coordinates that clearly fall
outside of the actual area. Or heights are calculated as negative, i.e.
inside the floor. See below for approaches to try in such failure cases.
Once a scene calculation result is available, the blinker position will
be calculated continuously by the objtrack
program, and the tracking
result will be shown by a light-red pointer on the camera images in the
scwipsweb
webbrowser UI. If the scene result is correct, the red
pointer will always closely match the green blinkdetect pointer.
Conversely, if the scene result is incorrect, there will always be areas
where the red and green pointer are very far apart for at least one
camera. If you suspect a calculation failure but are not quite sure,
just move the blinker to all edges of the area and closely watch how
well the red and green pointers stay matched.
It may sometimes happen that the scene calculation produces mostly
acceptable results except for a single reference point. In the
scwipsweb
webbrowser UI, select that reference point's height field so
that its position markers are shown on the camera images. Try to move
the blinker to that position again, and you will probably notice that it
is impossible to reach. In other words, the reference point itself was
misdetected. Simply delete the reference point, add a new one that is
correct, and submit to scene calculation again.
It is quite common that scenecalc
will reach a solution that is not
realistic, but it has no way to know what is realistic and what is not.
During the guessing process, it can only try to get the total error as
low as possible, and it will accept any result with total error less
than the "sumsquares limit" value which is settable from the scwipsweb
webbrowser UI. The actual resulting "sumsquares" value is reported in
the calculation results. Whenever a clearly incorrect result is reached,
inspect the resulting "sumsquares" value, then set the limit value
slightly lower than that, and re-submit to scene calculation. This will
force scenecalc
to skip these wrong results and try harder to find a
better one.
In case scenecalc is not able to reach a correct result at all for a very long time, one possible action is to raise the "sumsquares limit" value, for example to 0.050. This applies specifically whenever any camera position or orientation was changed, by whatever small amount, during the scene registering procedure. Any result that contains errors of more than a few pixel sizes will cause the calculation to fail.
Another failure cause is when some blinker position was misdetected due to reflection by mirrors, windows, or even walls. Always carefully check that reference points have been correctly registered.
As soon as scenecalc
has posted its result on the scene/result
topic, the objtrack
program will start combining the detected blink
pixels with the calculated camera positions to obtain the blinker
position in 3D space. This will produce a continuous stream of
coordinates on the objtrack/result
topic, until the blinker stops
blinking or cameras stop reporting.
The internal processing is again as discussed under "Magic" above: the blinker position is guessed and the pixels are calculated where all cameras should be seeing it; then the guess is adjusted until the calculated pixels match the actually reported pixels. This is usually very fast and very accurate. The object position will be displayed in the webbrowser UI, and can be used for other purposes.
Also, the objtrack
program will report the "expected" pixel locations
back to each camera. This will help the cameras to focus on the correct
area, especially when they are temporarily not seeing the blinker
because of obstructions. The expected locations will also be shown on
the scwipsweb
webbrowser UI, indicated by light-red pointers on the
camera images.
The intended use of the SCWIPS system is to provide 3D position
information to measurements that are taken at the blinker location (i.e.
using a sensor attached to the blinker). The measurement samples are
expected to be available on some MQTT topic as JSON objects in
real-time. Depending on the sensor and processing, any suitable way may
be used to achieve that. For example, mosquitto_pub
may be used from
shell environments. Ideally, an actual measurement result should be
posted multiple times per second, since that allows for fastest movement
throughout the area.
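For example (topic name and JSON content are hypothetical), a shell-based sensor could publish its samples like this:

mosquitto_pub -h localhost -t scwips/sensor/lux -m '{"lux": 123.4}'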
The mqttpositioner program is a generic tool that will receive any JSON object on some given MQTT topic, then wait at most a second for a reported position, and finally re-post that same JSON object to another MQTT topic with "x", "y", and "z" coordinates inserted. This can then be logged or visualized in any way desired.
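Continuing the hypothetical lux example above, that sensor stream could then be annotated with positions using the same options as in the camera-reach example earlier:

mqttpositioner \
  --datatopic scwips/sensor/lux \
  --positiontopic scwips/objtrack/result \
  --pubtopic scwips/sensor/lux-position \
  --shutdowntopic scwips/cmd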
To provide some basic filtering, mqttpositioner
can be instructed to
accept measurement data only if it does (or does not) include some
particular JSON key. This may be used to select between different
measurements posted to the same topic, or to choose whether "invalid"
measurements do (or do not) make sense to report with annotated 3D
positioning.
The collect2d and render2d programs can be used to generate runtime-updated 2D planar ("flat") visualizations of any data, exported as image file or .csv data file. Multiple collect2d programs may run in parallel to collect (remember) incoming data into data sets. One single render2d instance can pick up the binary data sets as posted by collect2d, and produce beautiful image files from them.
Rendering is implemented as a separate process since it tends to use very much memory and processing power. Running just one instance will cause all renderings to be run in sequence, using the same memory area, and spreading the CPU load.
Separately from rendering, there may be many data collector processes running in parallel. These do not use much memory or processing at all.
A collector program is used to keep memory of a stream of real-time data points posted to MQTT topics.
The collect2d
program, in particular, keeps memory as an X/Y plane,
with potentially multiple separate data values per X/Y coordinate. The
X/Y coordinates are organized into "cells" with a specifiable size
(interval), and incoming X/Y coordinates are rounded to the nearest
"cell". Multiple values for the same cell may be chosen to overwrite the
cell immediately, or update exponentially, or keep minimum or maximum
for that cell.
Data can be taken from any MQTT topic, also outside of the scwips topic hierarchy, and must be supplied as JSON objects. The JSON object labels to be used for the X and Y coordinate, and any data items, may be specified freely.
For example, X/Y coordinates may also be from "y" and "z" labels, to obtain a side-view effect. Combined with "x" as (further unused) data item using "minimum" replacement strategy, a "nearest data side-view" effect can be obtained. For rendered output, the X/Y coordinates are plotted in usual fashion with positive X pointing to the right, and positive Y pointing upwards.
Multiple data items may be gathered from the same data stream, and they will be organized as separate data planes using the same X/Y coordinates. All data items will be updated together, or none at all. This is to make sure that consistency is maintained when using multiple planes in a single rendering as, for example, color, size and direction values.
Further, there is some basic filtering functionality to only consider values in specific ranges, or only accept values if some other JSON label is (or is not) included in the received data object.
As a special data label, autogenerated:time
may be used. Instead of
registering a data value from the incoming MQTT message, that plane will
then register the time of arrival. Renderings will usually convert this
to a "time-ago" value, so that a history trail of blinker movements may
be shown.
Since received data will be positioned in 3D space by means of blinker detection, it will often be quite sparse. For ease of interpretation, a rough interpolation method is available to show approximate shapes between the measured data points. Whenever the interpolation result shows large differences in a small area, it is advisable to obtain additional data points from that area to enhance detail and accuracy. The interpolation algorithm is described in collect2d/doc/interpolation.html. To maintain consistency between data planes, interpolation will always be applied to all of them together.
The scwipsweb
webbrowser UI will regularly instruct all collector
instances to post their actual data on MQTT. Data is posted in binary
format, with a JSON header containing instructions for the rendering
process. To ease configuration, settings for both collector and renderer
are combined into the collector's configuration file. In particular,
this allows separate collector instances to instruct the renderer to
create totally different renderings.
The render2d program receives binary data dumps from collect2d programs via MQTT, and produces renderings from that data as instructed via the included JSON header (passed through from the collect2d config file). Only one rendering is processed at once, to preserve memory and CPU load. On the commandline, MQTT topics can be specified for the renderer to listen on, which also allows using multiple renderers for separate (or the same) topic subsets.
As the name implies, 2D "flat" ("planar") renderings are produced from corresponding binary data dumps. As discussed above, data is organized into rectangular "cells" with specifiable width (X) and height (Y). There may be multiple planes of data that use the same X/Y cells, such that an X/Y cell either has data for all planes, or has no data for any plane. It is required to specify which plane(s) to use for any rendering; complex renderings may require specifying multiple planes to provide values for color, size, direction, etcetera.
To obtain a view with approximate "height lines", any data plane may
have a rounddata
parameter specified in the collect2d
config file.
This will not be used by collect2d
itself, but may be used by some
render2d
render types to round data values to entire multiples of the
given number. This is most effective when combined with interpolation,
and allows for quick determination of approximate data value differences
by counting the discrete value steps between them.
Several rendering types are available:
mathgl-tile: Generates a .png image showing an X/Y graph with colored tiles representing the data value per cell.

csv: Generates a .csv spreadsheet file with X/Y as rows/columns and data values in cells.
Image and data files will be placed in a directory specified by the --pubdir parameter, with filenames derived from the render "result" name as passed through from the collect2d config file, plus the render2d instance name as specified using the --name parameter. Produced files will be announced on an MQTT subtopic corresponding to the binary data topic, to inform the scwipsweb webbrowser UI that new content may be shown. You can also manually copy the files over for safekeeping and/or offline inspection.
Rendering is a very CPU- and memory-intensive operation, and it is advisable to run all instances at high nice levels. The scwipsweb webbrowser UI will request renderings (or rather collect2d data dumps) separately in sequence, to spread the load.
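For instance (the publish directory is an example value, and the trailing MQTT data-dump topic arguments are left out here):

nice -n 19 ./render2d --pubdir /var/tmp/scwips-render --name render1 ...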
The imgproc program will receive raw PNM or YUV4MPEG images as posted in binary format to MQTT by any camera on request (normally in sequence by the scwipsweb webbrowser UI). imgproc will convert the raw image data to a .jpg file using cjpeg or ffmpeg commands via shell execution, and place the file into a specifiable directory that should be available via file:// and/or http:// from the webbrowser UI. Once the conversion has been completed successfully, imgproc will post the image file location to an MQTT subtopic corresponding to the topic where the binary image had been received.
When no images seem to be converted, first verify that binary images are actually posted to the expected MQTT topics. Then check that cjpeg and/or ffmpeg commands are available, and that the indicated --pubdir directory exists and is writable. Any error messages will be posted on the MQTT imgproc/+/debug topic.
The design allows for multiple image converters to be operational at
once, for example to allow conversion for subsets of cameras and/or
image formats, or to serve subsets of users. This can be organized by
having specific receive topics as commandline arguments. However,
running only a single imgproc
instance will probably be sufficient in
most cases.
The scwipsweb single-page webbrowser UI is the main means of interacting with the SCWIPS system.
Interaction mainly consists of "snapshotting" reference points and entering measurement values for scene calculation, as discussed above. After scene calculation, the right-hand side panel may be collapsed using the [>] button at the top.
The main page content is useful for quick inspection and validation. It
will show images from all cameras, including markers for detected and
calculated object positions, and all visualizations that have been
generated. For cameras, it will also show a number of operational
parameters. Normally, scwipsweb
will request camera and render images
in round-robin fashion, one at a time with some delay between them, to
limit MQTT bandwidth and processor usage to reasonable values.
Images can be shown full-size by clicking on them. Whenever any image is shown in full size, an update for it will be requested up to twice per second, depending on the available processing speed. This is useful for evaluating camera placement and orientation.
For cameras, a checkbox is available to choose between the "original" camera image, and the color-filtered version. The latter is a greyscale image that should show only the blinker and not much else, if the blinker was on at the moment that the image was requested. The color filtering is done only for the current area of interest around the blinker location; and outside of that area, there will usually be remnants of older data.
The scwipsweb UI further offers:
A convenient way to shut down all other parts of the system, at the bottom of the (collapsible) side panel. This just posts the shutdown command to the top-level cmd MQTT topic, and expects every part of the system to react appropriately.
A simple interface to post any textual message to any MQTT topic. This may be useful to add custom entries to MQTT traffic logs, or to send specific commands that are not otherwise available from the webbrowser UI.
A detailed list of results received from object position and scene calculations. These may be useful for inspection and validation.
The mqttlogger program is a simple tool, similar to mosquitto_sub, that receives MQTT traffic on any number of topics and writes it to standard output. The notable difference is that the result is intended for later processing: it includes nanosecond timestamps, strict formatting, and C-style \xHH escapes for any non-printing byte. Also, overly long topics and payloads will be excluded.
In the mqttlogger/repost
directory, some scripts are included to
showcase binary-safe re-posting of previously logged traffic.
The SCWIPS programs and accompanying documentation, including this document, are Copyright © 2021 J.A. Bezemer
These programs are free software: you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
These programs are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
The GNU General Public License version 3 is available in doc/GPL-3.
For convenience and/or reference, the source code archive also contains code by different authors that is covered by different licenses. See the various files for details.