How it Works
The need for helping elderly individuals or couples remain in their home is increasing as our global population ages. Cognitive processing offers opportunities to assist the elderly by processing information to identify opportunities for caregivers to offer assistance and support.
Rather than depending on sensors worn on the body (which must be remembered, recharged, etc.) or installing sensors on individual devices (e.g. sink, cabinet, refrigerator), this solution utilizes passive monitoring devices, notably video camera(s) with multi-channel microphones, to sense activity, record events, and build a model of normative behavior.
Installation on kitchen shelf
Initially we intend to monitor the kitchen and recognize the presence of a person (not a specific individual) in the room. From this simple event detection we will build a normative baseline of daily activity and detect when that daily activity exhibits aberrations (e.g. no activity in the kitchen more than two standard deviations past the median time).
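As a minimal sketch of the aberration test described above, assume the baseline is a list of first-activity times (minutes after midnight, one per day). The function names, the choice of first activity as the tracked event, and the sample-standard-deviation estimate are illustrative assumptions, not the deployed logic.

```python
from statistics import median, stdev


def first_activity_threshold(first_seen_minutes, k=2.0):
    """Alert threshold in minutes after midnight: median + k standard deviations.

    first_seen_minutes: historical first-activity times, one per day (assumed input).
    """
    if len(first_seen_minutes) < 2:
        raise ValueError("need at least two days of history")
    return median(first_seen_minutes) + k * stdev(first_seen_minutes)


def is_aberrant(first_seen_minutes, now_minutes, seen_today, k=2.0):
    """True when nobody has been seen today and the threshold time has passed."""
    if seen_today:
        return False
    return now_minutes > first_activity_threshold(first_seen_minutes, k)
```

With a history clustered around 7:05 AM, for example, the check stays quiet until roughly 7:27 AM and only then reports an aberration if the kitchen is still empty.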
Because the visual recognition algorithm will run on the local device (e.g. RaspberryPi with camera), the issues of round-trip latency to the cloud, required bandwidth, security, and privacy will be eliminated. Only events generated from this recognition will be sent to the cloud to build (and update) the normative behavior model.
This initial scenario may be easily extended to other passive monitoring capabilities (e.g. audio, motion (sonar), electrical circuit) as well as deployment in other areas (e.g. entryway, bathroom, hallway) to provide support for additional scenarios (e.g. medication adherence, diet, exercise).
The home automation package Home Assistant provides an integration point for both AgeAtHome devices as well as other IoT devices from multiple sensors. You can find out more about Home Assistant at http://home-assistant.io
First and Last
The table below has the first and last events today in which a Human classifier was identified. This result comes from Looker-generated SQL that produces a JSON result, which is then cleaned up using a CGI script. There is a bug in that the public dashDB instance I am sharing has its timezone set to GMT; this does not appear to be something I can change in a simple or straightforward fashion.
The results may be NULL until I find a fix.
Location First Seen Last Seen
The solution is patterned on a "do-it-yourself" (DIY) approach and utilizes inexpensive hardware, open source software, and free services from the IBM Bluemix cloud. The local device is comprised of a RaspberryPi computer and PlayStation3 Eye USB camera with four (4) internal microphones. Total cost of hardware is under US$80 on-line.
IBM Bluemix cloud provides a suite of services for the application, including this Web site, which is a Node.js CloudFoundry application.
The Bluemix environment enables this application context to apply to other services,
notably the IBM Watson image recognition services.
In addition, the Cloudant NoSQL repository stores the historical event and image recognition information.
Bluemix also provides the
dashDB hybrid relational data warehousing service.
This service automatically replicates from the Cloudant repository and provides an SQL interface for SELECT, PROJECT and JOIN.
This SQL service can also be consumed by Watson Analytics and other third-party software packages from IBM
and business partners (e.g. Looker).
Also on Bluemix, the IBM
Internet of Things platform and
associated Real-Time Insights provide device registration and system status monitoring for the RaspberryPi.
General purpose conditionals can be applied for integration with email, IFTTT, Node-RED, and arbitrary web-hooks.
In addition, this site utilizes Mixpanel (www.mixpanel.com) to provide user tracking.
This project started utilizing the AlchemyAPI (http://www.alchemyapi.com/) recognition algorithm and then was extended to include the VisualInsights recognition algorithm.
The AlchemyAPI demonstrated a low signal-to-noise ratio, with most images being classified as "NO_TAG," indicating that none of the known objects were identified;
in addition, the algorithm only returned a single result.
The VisualInsights algorithm was made available as beta in December 2015, and I added its analysis capabilities to augment the signal. The VisualInsights algorithm returns up to twenty-five (25)
objects per call. The additional objects the VI algorithm could identify were a great benefit, but the signal, while improved, still included significant noise, specifically in the
identification of "humans" being split across various classifications without any hierarchical organization into groups.
Sadly, the VisualInsights algorithm was deprecated in June 2016, and in 2017 the AlchemyAPI will also cease to operate. The new algorithm, Watson VisualRecognition,
is the child of AlchemyAPI and VisualInsights, with support for multiple entities per image; however, its default classifier generates poor signal and still significant noise (n.b. there is now
a hierarchy, but neither the classes nor the hierarchy is published, so both must be discovered from results).
Therefore, I embarked on building a training loop for whatever recognition algorithm I might utilize. This loop would capture the images from the camera's local storage (n.b. uSD card) and
present those images to the application user community (e.g. elderly individual/couple) and enable manual classification for subsequent training, testing, and deploying of a model specific to both this application context (i.e. people detection) and the local environs (e.g. room location, dogs, cats, residents).
Collecting the images
The images needed to create the training data for Watson VR are stored on each device in a local directory (n.b.
The image file names correspond to the date and time of the image, as well as a monotonically increasing sequence number.
Access to these images is provided through FTP, restricted to access from the local LAN.
When the end-user engages in curating, a.k.a. labeling, the images into their respective distinct classes (see the next section), another service is invoked (
review service periodically collects new events stored by the device in the Cloudant noSQL repository (e.g. ).
New events include the image identifier; the device is accessed via FTP and the image is collected and collated.
When the process is complete, the count of images in each class is updated in Cloudant (e.g.
), in addition to the sequence number of last event processed.
Labeling the images
Below is the user interface for labeling images. Options are available as buttons (e.g. person, kitchen, dog) based on previously assigned labels; new labels can be added in the text entry box, and the
image's initial classification and capture date are shown.
Simple Web application to label images
Ideally, images are labeled if and only if the image contains the entity in question, e.g. a person, and does not contain any of the other entities of interest (e.g. dog or cat).
The training set also requires negative examples which do not include
any entity (i.e. person, dog or cat).
To achieve this distinction, each camera installation has been pre-defined to a corresponding label (e.g. "kitchen") that is used to identify the negative examples.
Similarly, other locations may also be suitably classified (e.g. bathroom, dining room, living room, ...)
Labeled images are collated into a separate directory structure for their new classes, and symbolic links are utilized as a state maintenance indicator (i.e. collected, labeled).
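One way to read the symbolic-link convention is sketched below: the collected image stays where it is, and a link in a per-class directory marks it as labeled. The directory layout and the function names are assumptions made for illustration, not the project's actual script.

```python
from pathlib import Path


def label_image(image: Path, root: Path, label: str) -> Path:
    """Mark a collected image as labeled by linking it into root/<label>/.

    The layout root/<label>/<image name> is a hypothetical convention.
    """
    class_dir = root / label
    class_dir.mkdir(parents=True, exist_ok=True)
    link = class_dir / image.name
    if not link.is_symlink():
        link.symlink_to(image.resolve())
    return link


def is_labeled(image: Path, root: Path) -> bool:
    """An image counts as labeled when any class directory holds a symlink to it."""
    return any(
        (class_dir / image.name).is_symlink()
        for class_dir in root.iterdir()
        if class_dir.is_dir()
    )
```

Because the link itself is the state, no separate database of labeled images is needed, and removing a label is just removing the link.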
Once images are labeled they are deemed ready for training; additional curation of the labeled images is performed in the
Training the classifiers
The Watson VisualRecognition service provides for both initial learning as well as updates with new classes and images. The API does not provide details on images utilized in training for
either positive or negative examples so an independent record of images utilized must be maintained. In addition, no standard of practice is defined for validating or measuring the quality
of the learned model, so independent testing and quality measurement must be constructed. Finally, as the training process appears to be a required constituent component,
other entities (e.g. myself, my wife, my kids) could also be identified and used to train Watson VR.
The training set is limited to 100 megabytes (MB) of data for each class with a total maximum of 430 MB; minimum number of labeled images is ten (10). Updates can be made against a single
labeled set at a time, also including negative examples (i.e. not including any previously labeled entities).
Each learned model is referred to by both a name as well as a specific identifier. The name is being utilized for the device (e.g. rough-fog) and the identifier determines the model and serves
as an index to keep track of which images have been used for training purposes -- both positive and negative examples. The
script is still in process. Evident in the
log are failures of the Watson VR API call, e.g.
413 Request Entity Too Large, and corresponding successful repetition.
Results from Watson VR
Once the process has successfully completed, the updated model is recorded in Cloudant.
I copied a confusion matrix calculator and created a simple Web application to display the matrix for a given model and/or device (i.e. Watson VR
the prototype is available below:
Simple Web application to view model confusion matrix
Training the Watson VR algorithm using the curated examples improved the results, but overall the recall was less than 69% and typically under 40%.
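For readers unfamiliar with the metric, recall for a class is the share of images truly in that class that the model also labeled as that class. A minimal sketch of a confusion matrix with per-class recall and precision follows; it is an independent illustration, not the calculator used for the Web application above.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count (actual, predicted) label pairs from a test run."""
    return Counter(pairs)


def recall(matrix, label):
    """Correct predictions for `label` divided by all images actually in `label`."""
    actual_total = sum(n for (actual, _), n in matrix.items() if actual == label)
    return matrix[(label, label)] / actual_total if actual_total else 0.0


def precision(matrix, label):
    """Correct predictions for `label` divided by all images predicted as `label`."""
    predicted_total = sum(n for (_, pred), n in matrix.items() if pred == label)
    return matrix[(label, label)] / predicted_total if predicted_total else 0.0
```

A recall under 40% for "person" therefore means that more than six in ten images containing a person were assigned to some other class.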
The script executes a number of steps sequentially based on the output of the
aah-classify Web application.
The curated images are organized in the file-system in a directory structure corresponding to
device and class, e.g.
Label images by class
Split each image class into train and test sets
Separate sets into batches of no more than 100 MB
Build model from train set (iterate batches)
Apply classifier to test set
Calculate quality metrics
Curate sets; iterate build
QA/QC vs production
Promote to production
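The split and batch steps in the list above can be sketched as follows. The 80/20 split, the fixed shuffle seed, the greedy in-order batching, and the reading of 100 MB as 100 * 1024 * 1024 bytes are all assumptions for illustration; the actual script may differ.

```python
import random

MAX_BATCH_BYTES = 100 * 1024 * 1024  # assumed reading of the 100 MB per-class limit


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def batch_by_size(sized_items, max_bytes=MAX_BATCH_BYTES):
    """Group (name, size_in_bytes) pairs into batches whose total stays within max_bytes."""
    batches, current, total = [], [], 0
    for name, size in sized_items:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the batch limit")
        if current and total + size > max_bytes:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches
```

Each batch can then be submitted as one training or update call, which may help avoid the 413 Request Entity Too Large failures noted earlier.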
There is an open source package called DIGITS that provides a graphical interface to numerous deep learning techniques using NVIDIA GPUs.
The curated image sets created using the Web application above are loaded into DIGITS and trained using the existing GoogLeNet pre-trained (i.e. weighted) Caffe network.
The results indicate significant performance on the TOP1 prediction (i.e. the algorithm got it right). Below are the training graphs indicating ~ 66% accuracy for rough-fog and ~ 72% accuracy for damp-cloud.
Results from DIGITS using Caffe & GoogLeNet
With several weeks of data collected, patterns are emerging. The initial analysis step was to understand the classifiers tagged in the images.
Step 1: Watson Analytics
Data was processed from JSON in Cloudant into CSV files for each device. The resulting data included a listing of all classifiers by fifteen (15) minute interval per day.
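A fifteen-minute interval divides the day into 96 slots, numbered 0 to 95. As a hedged sketch of the binning, assuming each event has already been reduced to a timestamp and a classifier name (the real Cloudant event schema is not shown here):

```python
from collections import Counter
from datetime import datetime


def interval_of_day(ts: datetime, minutes: int = 15) -> int:
    """Index of the interval containing ts (0 to 95 for 15-minute intervals)."""
    return (ts.hour * 60 + ts.minute) // minutes


def count_by_interval(events):
    """Count events keyed by (date, interval, classifier).

    events: iterable of (datetime, classifier) pairs, an assumed simplified form.
    """
    counts = Counter()
    for ts, classifier in events:
        counts[(ts.date(), interval_of_day(ts), classifier)] += 1
    return counts
```

Each key of the resulting counter corresponds to one row of a per-device CSV file.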
These images are created using
Watson Analytics. It is simple to get a free account and load the CSV files.
Importing these CSV files into Watson Analytics, refining the data model, and exploring the data yields the following graphical displays of the classifier space.
Heatmap of classifiers
Relationship between hour, day and classifier
The heatmap of classifiers as well as the primary classifiers across hour of day were both highly informative. This insight led to the selection of the following classifiers
from the VisualInsights recognition algorithms for further investigation using Looker.
Step 2: Looker
Analysis of the underlying events which generated the classifications led to analysis using Looker. Looker generates SQL for the dashDB replica of the Cloudant JSON event history
and then makes that output available for visualization and download as JSON. The following graphics are "live" views of the rough-fog and damp-cloud events.
Analysis of activity seen by rough-fog for past week
Analysis of activity seen by damp-cloud for past week
Step 3: Excel
The same UNIX script
that converts the Cloudant JSON into CSV files suitable for Watson Analytics is also suitable
for consumption by Excel.
The script builds the classifier (or classifier set, e.g. 'people') population statistics model for the specified device.
The results are processed in the
mkclass Excel spreadsheet to produce the charts below.
All Activity; the sum of 'person' events in the kitchen
Expected Activity; when 'person' count by 'interval' is "near" the mean of interval count
Over Activity; when 'person' count by 'interval' is more than +2SD over the mean of interval count
Under Activity; when 'person' count by 'interval' is more than -2SD under the mean of interval count
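The three categories can be expressed as a small classifier. This sketch treats "near" as anything within two sample standard deviations of the mean, which is an assumption; the spreadsheet may define it more narrowly.

```python
from statistics import mean, stdev


def classify_activity(history, count, k=2.0):
    """Label an interval's 'person' count against its historical counts.

    history: past counts for the same interval (at least two values).
    Returns "over", "under", or "expected".
    """
    m, sd = mean(history), stdev(history)
    if count > m + k * sd:
        return "over"
    if count < m - k * sd:
        return "under"
    return "expected"
```

For example, with past counts of 4, 5, 6, 5, and 5 for an interval, a count of 7 is labeled "over" and a count of 3 is labeled "under".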
Step 4: Web Services
With analytical understanding of the event classifier space and corresponding activity for specific classifiers by day of week and interval of day,
a model can now be constructed to determine the average time of first activity in the AM. That model is calculated on the server, based on the historical
event data and supplied to the device on demand through a Web service.
The Web service is an HTTP address with parameters that invokes logic to return a result, e.g. the average score for Person in a given interval of time.
Below are some examples of the aah-stats Web service:
The id= argument can be specified with any of the classifiers. The day= and interval= arguments can be specified with "all" or a number for day (0-6) or interval (0-95).
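A request to such a service could be assembled as in the sketch below. Only the id=, day=, and interval= arguments come from the description above; the base URL is a placeholder and the function name is hypothetical, and any additional arguments the real service may require are not covered.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.com/aah-stats"  # placeholder, not the real endpoint


def stats_url(classifier, day="all", interval="all", base=BASE_URL):
    """Build a query URL with the id=, day= and interval= arguments."""
    if day != "all" and not 0 <= int(day) <= 6:
        raise ValueError("day must be 'all' or 0-6")
    if interval != "all" and not 0 <= int(interval) <= 95:
        raise ValueError("interval must be 'all' or 0-95")
    query = urlencode({"id": classifier, "day": day, "interval": interval})
    return f"{base}?{query}"
```

Validating the ranges on the client side keeps malformed requests from reaching the server.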
This information is consumed by the RaspberryPi and utilized to compare current events with the historical population statistics.
Local conditional testing is currently in-progress, utilizing the
motion package capability for additional event processing.
Image captured using motion detection algorithm based on changes in pixel count (bounding box around centroid identified).
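The idea behind pixel-count motion detection can be illustrated with a simplified frame-differencing sketch. This is not the motion package's implementation; the thresholds, the function name, and the use of the changed pixels' extent as the bounding box are assumptions.

```python
def detect_motion(prev, curr, pixel_delta=25, min_changed=3):
    """Compare two equal-sized grayscale frames (2D lists of 0-255 values).

    Returns None when fewer than min_changed pixels differ by more than
    pixel_delta; otherwise returns the changed-pixel count, the centroid (x, y)
    of the changed pixels, and their bounding box (x0, y0, x1, y1).
    """
    changed = [
        (x, y)
        for y, (row_prev, row_curr) in enumerate(zip(prev, curr))
        for x, (p, c) in enumerate(zip(row_prev, row_curr))
        if abs(p - c) > pixel_delta
    ]
    if len(changed) < min_changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return {
        "changed": len(changed),
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "box": (min(xs), min(ys), max(xs), max(ys)),
    }
```

Raising min_changed makes the detector less sensitive to small changes such as lighting flicker, at the cost of missing small or distant movement.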
The necessary equipment for this project is relatively inexpensive:
A RaspberryPi3 with 32 GB uSD card and Playstation3 Eye USB camera
RaspberryPi3 ~ US$45.00
Enclosure, uSD card, power-supply ~ US$30
Playstation3 Eye camera ~ US$5 (WOW!)
TOTAL COST: ~ US$80! For comparison, a DropCam from Nest (aka Google) is ~ US$199.
We are making use of resin.io to manage the build and deployment process to the Raspberry Pi devices.
The resin.io service provides a customizable base image with which to "flash" the uSD card for the RaspberryPi. The image may be configured
with the SSID and password for the local WiFi network.
The "AgeAtHome" application we have defined provides a context in which devices participate. Each device is assigned to one application.
Once a device has been flashed and booted, it connects to the resin.io service and presents itself within the application context.
Each device associated with the application can be inspected, including summary status and logs (e.g. stderr), along with the ability to ssh(1) into a terminal for a command line interface:
Listing of motion detection volume data in file system
IBM IoTF Platform
I added the IBM IoT Foundation Quickstart for Raspberry Pi
to the environment and you can see a live stream
of instrumentation data. The dashboard below is shown once a device is registered to a Bluemix account and affiliated with an organization (another level of indirection
off your Bluemix account).
IBM IoTF Platform Dashboard
Changes were made to both the Dockerfile and the initial script to enable IoTF/QS; the sample C program only sends system status. The sample program will need
to be changed to process HTTP requests in order to send any JSON payloads (i.e. our events).
IBM IOTF Real-time Insights
The IBM IOTF Real-time Insights capabilities can be linked to the IOTF Platform
through a shared repository for the JSON events received by the platform; the JSON objects in that repository provide schemas for payload processing.
The IOTF-RTI environment provides for rules to be specified in conjunction with JSON
payload values using numeric comparison to static values, other payload values, and context parameters (n.b. unsure where or how these values are set).
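To make the rule model concrete, the sketch below evaluates one numeric comparison against a JSON payload, where the right-hand side is a static value, another payload value, or a context parameter. It is a generic illustration under assumed names; it does not use or reproduce the Real-time Insights API, and the payload fields in the example are invented.

```python
import operator

OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def evaluate_rule(payload, field, op, operand, context=None):
    """Compare payload[field] with a static number, another payload field,
    or a context parameter (looked up by name, payload first)."""
    left = payload[field]
    if isinstance(operand, str):
        if operand in payload:
            right = payload[operand]
        else:
            right = (context or {})[operand]
    else:
        right = operand
    return OPS[op](left, right)


# Hypothetical payload: is the CPU temperature above a static limit of 60?
print(evaluate_rule({"cputemp": 61.5}, "cputemp", ">", 60))
```

A rule that evaluates to true would then trigger the configured action, such as an email or web-hook.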
IBM IOTF Real-time Insights
IBM IOTF Real-time Insights rule condition specification
IBM IOTF Real-time Insights rule action specification
The following image is a more detailed diagram of the system operational components and process.
GitHub repositories are publicly accessible at:
Download RaspberryPi image (Ethernet only; working on WiFi)
Use Etcher to copy image to SD card
Insert SD card in RaspberryPi & connect camera and power
Configure RaspberryPi using Web browser (port 8080 ; image at port 8081)
Send me an email to set up the server side; working on automation
View your Looker graphs on-line (in progress)
Set up conditionals for notification (in progress)
is developing cognitive technology to build life
advisors to support the Elderly. These cognitive advisors will be
fueled by fusion of data such as IoT, wearables, social interactions,
health, and financial. To power the advisors, IBM Research has built
core technology, which we call the Knowledge Reactor, that is designed
to be a general-purpose reactive data fusion engine for Cognitive
IoT. We use the Watson IoT Platform for all the sensor interfaces
to feed into our reactive data fusion engine. The Knowledge Reactor
will normalize data across various sensors from different manufacturers,
heterogeneous device types, and so on and build a contextual model
to track the Elder's status and behavior, and produce alerts from
cognitive agents. The cognitive insights can easily feed into an
iPad app like the Elder ones used in the Japan Post project. Through
our Cognitive Eldercare Research, we will help governments, industries,
and companies around the world as they seek to develop products and
technology enabled services for consumers in the new longevity
If you've read all the way to here and you still want more information, you can find me at the following places; or click on the little blue circle in the lower right of your web browser to talk to me directly: