What is it?

Improve the ability of elderly people to age at home through an understanding of their daily activities, inferred from passive sensor analysis. This project explores IBM Cloud and Watson technologies for high-fidelity, low-latency, private sensing and responding at the edge.



The video below provides an overview.

How it Works

The need to help elderly individuals or couples remain in their homes is increasing as the global population ages. Cognitive processing offers a way to assist the elderly: by analyzing sensor information, it can identify moments when caregivers could offer assistance and support.

Rather than depending on sensors worn on the body (which must be remembered, recharged, etc.) or sensors installed on individual fixtures (e.g. sink, cabinet, refrigerator), this solution uses passive monitoring devices, notably video cameras with multi-channel microphones, to sense activity, record events, and build a model of normative behavior.

Installation on kitchen shelf
Initially we intend to monitor the kitchen and recognize the presence of a person (not a specific individual) in the room. From this simple event detection we will build a normative baseline of daily activity and detect when that activity exhibits aberrations (e.g. no activity in the kitchen by two standard deviations past the median time). Because the visual recognition algorithm will run on the local device (e.g. a RaspberryPi with camera), the issues of round-trip latency to the cloud, required bandwidth, and security or privacy concerns will be eliminated. Only the events generated by this recognition will be sent to the cloud to build (and update) the normative behavior model. This initial scenario may easily be extended to other passive monitoring capabilities (e.g. audio, motion (sonar), electrical circuits) and to deployment in other areas (e.g. entry way, bathroom, hallway) to support additional scenarios such as medication adherence, diet, and exercise.
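The aberration rule in the kitchen example can be sketched in Python. This is a minimal sketch, assuming the history is the minute-of-day of the first 'person' event on each previous day; the function names and the use of the sample standard deviation are illustrative choices, not the project's implementation.

```python
from statistics import median, stdev


def first_activity_threshold(history_minutes, k=2.0):
    """Latest 'normal' minute-of-day for the first kitchen activity.

    history_minutes: minute-of-day (0-1439) of the first 'person' event
    on each previous day (hypothetical input format).
    """
    if len(history_minutes) < 2:
        raise ValueError("need at least two days of history")
    return median(history_minutes) + k * stdev(history_minutes)


def is_aberrant(history_minutes, now_minute, seen_today, k=2.0):
    """True if no activity has been seen today and it is past the threshold."""
    return (not seen_today) and now_minute > first_activity_threshold(history_minutes, k)
```

For example, if the first activity on past days was between 7:00 and 7:40 (minutes 420 to 460, evenly spread), the threshold is about minute 471.6 (roughly 7:52), so no activity by 8:00 would be flagged.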

Home Assistant

The home automation package Home Assistant provides an integration point for both AgeAtHome devices and other IoT devices and sensors. You can find out more about Home Assistant at

First and Last

The table below shows the first and last events today in which a Human classifier was identified. The result comes from Looker-generated SQL that returns a JSON result, which is then cleaned up by a CGI script. There is a known bug: the public dashDB instance I am sharing has its timezone set to GMT, and this does not appear to be something I can change in a simple or straightforward fashion. The results may be NULL until I find a fix.

Location | First Seen | Last Seen
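One possible workaround for the GMT timezone problem is to convert timestamps to local time before picking the first and last events of the day, instead of relying on the database's timezone. A minimal sketch, assuming naive GMT timestamps and a fixed UTC offset (the -8 hours below is only an example and ignores daylight saving time):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the household's UTC offset; -8 hours is only an example.
LOCAL_TZ = timezone(timedelta(hours=-8))


def first_last_local(gmt_timestamps, day, tz=LOCAL_TZ):
    """Return (first, last) local datetimes of events on a local calendar day.

    gmt_timestamps: naive datetimes stored in GMT (as in the dashDB table).
    day: a datetime.date expressed in local time.
    """
    local = [ts.replace(tzinfo=timezone.utc).astimezone(tz) for ts in gmt_timestamps]
    todays = sorted(t for t in local if t.date() == day)
    if not todays:
        return None, None  # analogous to the NULL results in the table
    return todays[0], todays[-1]
```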

System Overview

The solution follows a "do-it-yourself" (DIY) approach and uses inexpensive hardware, open source software, and free services from the IBM Bluemix cloud. The local device consists of a RaspberryPi computer and a PlayStation3 Eye USB camera with four (4) internal microphones. The total cost of the hardware is under US$80 online.

The IBM Bluemix cloud provides a suite of services for the application, including this Web site, which is a Node.js CloudFoundry application. The Bluemix environment allows this application context to be bound to other services, notably the IBM Watson image recognition services. In addition, the Cloudant NoSQL repository stores the historical event and image recognition information.

Bluemix also provides the dashDB hybrid relational data warehousing service. This service automatically replicates from the Cloudant repository and provides an SQL interface for SELECT, PROJECT and JOIN. This SQL service can also be consumed by Watson Analytics and by other third-party software packages from IBM and its business partners (e.g. Looker).

Also on Bluemix, the IBM Internet of Things platform and the associated Real-Time Insights service provide device registration and system status monitoring for the RaspberryPi. General-purpose conditionals can be applied for integration with email, IFTTT, Node-RED and arbitrary web hooks.

In addition, this site uses Mixpanel to provide user tracking.


This project started with the AlchemyAPI recognition algorithm and was then extended to include the VisualInsights recognition algorithm. AlchemyAPI demonstrated a low signal-to-noise ratio, with most images classified as "NO_TAG," indicating that none of the known objects were identified; in addition, the algorithm returned only a single result.

The VisualInsights algorithm was made available as a beta in December 2015, and I added its analysis capabilities to augment the signal; the VisualInsights algorithm returns up to twenty-five (25) objects per call. The additional objects the VI algorithm could identify were a great benefit, but the signal, while improved, still included significant noise, specifically in the identification of "humans," which were split across various classifications without any hierarchical organization into groups.

Sadly, the VisualInsights algorithm was deprecated in June 2016, and the AlchemyAPI will also cease to operate in 2017. The new algorithm, Watson VisualRecognition, is the successor of AlchemyAPI and VisualInsights and supports multiple entities per image, but its default classifier still generates a poor signal with significant noise (n.b. there is now a hierarchy, but neither the classes nor the hierarchy is published, so they must be discovered from results).

Therefore, I embarked on building a training loop for whatever recognition algorithm I might use. This loop would capture the images from the camera's local storage (n.b. uSD card), present those images to the application user community (e.g. an elderly individual or couple), and enable manual classification for subsequent training, testing, and deployment of a model specific to both this application context (i.e. people detection) and the local environs (e.g. room location, dogs, cats, residents).

Collecting the images

The images needed to create the training data for Watson VR are stored on each device in a local directory (n.b. /var/lib/motion). The image file names correspond to the date and time of the image, as well as a monotonically increasing sequence number. Access to these images is provided through FTP, restricted to the local LAN.
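Because the file names encode the capture time and a sequence number, new images can be selected without opening them. The naming pattern below (a 14-digit timestamp, a dash, the sequence number, `.jpg`) is a hypothetical example; motion's file names are configurable, so the regular expression would need to match the actual setting.

```python
import re
from datetime import datetime

# Hypothetical file-name pattern: YYYYMMDDHHMMSS-<sequence>.jpg
IMAGE_NAME = re.compile(r"^(\d{14})-(\d+)\.jpg$")


def parse_image_name(name):
    """Return (capture datetime, sequence number), or None if the name doesn't match."""
    m = IMAGE_NAME.match(name)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y%m%d%H%M%S"), int(m.group(2))


def images_after(names, last_seq):
    """Names whose sequence number is greater than last_seq, in sequence order."""
    selected = []
    for name in names:
        parsed = parse_image_name(name)
        if parsed is not None and parsed[1] > last_seq:
            selected.append((parsed[1], name))
    return [name for _, name in sorted(selected)]
```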

When the end-user curates, a.k.a. labels, the images into their respective classes (see the next section), another service (aah-review) is invoked. The review service periodically collects new events stored by the device in the Cloudant NoSQL repository (e.g. rough-fog). Each new event includes the image identifier; the device is accessed via FTP and the image is collected and collated. When the process is complete, the count of images in each class is updated in Cloudant (e.g. rough-fog/review/all), along with the sequence number of the last event processed.
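The incremental bookkeeping of the review service can be sketched as follows. This assumes hypothetical event fields (`seq`, `image`, `class`) and passes the FTP retrieval in as a function, so the sketch shows only how the last processed sequence number and the per-class counts advance.

```python
from collections import Counter


def process_new_events(events, state, fetch_image):
    """Collect images for events newer than state['last_seq'].

    events: iterable of dicts with hypothetical keys 'seq', 'image', 'class'.
    state: dict with 'last_seq' and 'counts' (class -> count), similar in
           spirit to the rough-fog/review/all document.
    fetch_image: callable that retrieves one image (e.g. via FTP).
    Returns the new state; the input state is not modified.
    """
    counts = Counter(state.get("counts", {}))
    last_seq = state.get("last_seq", -1)
    for ev in sorted(events, key=lambda e: e["seq"]):
        if ev["seq"] <= last_seq:
            continue  # already processed in an earlier run
        fetch_image(ev["image"])
        counts[ev["class"]] += 1
        last_seq = ev["seq"]
    return {"last_seq": last_seq, "counts": dict(counts)}
```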

Labeling the images

Below is the user interface for labeling images. Options are available as buttons (e.g. person, kitchen, dog) based on previously assigned labels; new labels can be added in the text entry box. The image's initial classification and capture date are also shown.

Simple Web application to label images

Ideally, an image is labeled with an entity (e.g. a person) if and only if it contains that entity and does not contain any of the other entities of interest (e.g. dog or cat). The training set also requires negative examples that contain none of the entities (i.e. no person, dog or cat). To make this distinction, each camera installation has been assigned a corresponding label (e.g. "kitchen") that is used to identify the negative examples. Other locations may be labeled similarly (e.g. bathroom, dining room, living room).
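The labeling rule above can be expressed as a small function. This is a sketch assuming the entities of interest are exactly person, dog and cat, and that images containing more than one entity are simply excluded from training; the function name is hypothetical.

```python
ENTITIES = {"person", "dog", "cat"}  # example set of entities of interest


def training_label(entities_present, location_label):
    """Return the training class for an image, or None to exclude it.

    entities_present: set of entities the user marked as visible.
    location_label: the camera's pre-assigned label, e.g. "kitchen".
    """
    found = set(entities_present) & ENTITIES
    if not found:
        return location_label  # negative example: no entity in view
    if len(found) == 1:
        return found.pop()  # exactly one entity: positive example
    return None  # several entities: ambiguous, left out of training
```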

Labeled images are collated into a separate directory structure for their new classes, and symbolic links are used as a state-maintenance indicator (i.e. collected, labeled). Once images are labeled they are deemed ready for training; additional curation of the labeled images is performed in the training phase.

Training the classifiers

The Watson VisualRecognition service supports both initial learning and updates with new classes and images. The API does not report which images were used in training, either as positive or negative examples, so an independent record of the images used must be maintained. In addition, no standard practice is defined for validating or measuring the quality of the learned model, so independent testing and quality measurement must be constructed. Finally, since training is a required step anyway, other entities (e.g. myself, my wife, my kids) could also be identified and used to train Watson VR.

The training set is limited to 100 megabytes (MB) of data for each class, with a total maximum of 430 MB; the minimum number of labeled images is ten (10). Updates can be made against only one labeled set at a time, along with negative examples (i.e. images containing none of the previously labeled entities).
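Before uploading, images can be selected so that each class stays within the stated limits. A minimal sketch: it treats a megabyte as 1,000,000 bytes (a conservative assumption), takes images in the order given, and drops classes that cannot reach ten images.

```python
MB = 1_000_000  # assumption: decimal megabytes, the more conservative reading
CLASS_LIMIT = 100 * MB
TOTAL_LIMIT = 430 * MB
MIN_IMAGES = 10


def plan_training_set(classes):
    """Choose images per class that respect the size and count limits.

    classes: dict mapping class name -> list of (image_name, size_in_bytes).
    Returns dict class -> list of image names; classes that cannot reach
    MIN_IMAGES within the limits are left out.
    """
    plan, total = {}, 0
    for label, images in classes.items():
        chosen, size = [], 0
        for name, nbytes in images:
            if size + nbytes > CLASS_LIMIT or total + size + nbytes > TOTAL_LIMIT:
                break  # next image would exceed a limit
            chosen.append(name)
            size += nbytes
        if len(chosen) >= MIN_IMAGES:
            plan[label] = chosen
            total += size
    return plan
```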

Each learned model is referred to by both a name and a specific identifier. The name is used for the device (e.g. rough-fog), and the identifier determines the model and serves as an index for tracking which images have been used for training, both positive and negative examples. The train_vr script is still a work in progress. The log shows failures of the Watson VR API call (e.g. 413 Request Entity Too Large) followed by successful retries.
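Because the API does not report which images trained a model, the record keyed by classifier identifier might look like the in-memory ledger below. The class and method names are hypothetical, and persisting the ledger (e.g. in Cloudant) is left out.

```python
from collections import defaultdict


class TrainingLedger:
    """Independent record of which images trained which classifier_id."""

    def __init__(self):
        self._used = defaultdict(lambda: {"positive": set(), "negative": set()})

    def record(self, classifier_id, label, images, negative=False):
        """Remember that these images were sent for `label` to this model."""
        kind = "negative" if negative else "positive"
        self._used[classifier_id][kind].update((label, img) for img in images)

    def unused(self, classifier_id, label, candidates):
        """Candidates not yet used as positive examples of `label` for this model."""
        used = self._used[classifier_id]["positive"]
        return [img for img in candidates if (label, img) not in used]
```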

Results from Watson VR

Once the process has completed successfully, the updated model is recorded in Cloudant.

I copied a confusion matrix calculator and created a simple Web application to display the matrix for a given model and/or device (i.e. Watson VR classifier_id and name); the prototype is available below:

Simple Web application to view model confusion matrix

Training the Watson VR algorithm with the curated examples improved the results, but overall recall was less than 69% and typically under 40%.
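Recall for a class is the fraction of images truly in that class that the classifier also assigned to it. A minimal sketch of building a confusion matrix from actual and predicted labels and computing per-class recall (the function names are illustrative, not those of the calculator used here):

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count of each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))


def recall(matrix, label):
    """Fraction of images truly labeled `label` that were predicted as `label`."""
    relevant = sum(n for (a, _), n in matrix.items() if a == label)
    return matrix[(label, label)] / relevant if relevant else 0.0
```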

Process Model

The script executes a number of steps sequentially based on the output of the aah-classify Web application. The curated images are organized in the file-system in a directory structure corresponding to device and class, e.g. rough-fog/person.


There is an open source package called DIGITS that provides a graphical interface to numerous deep learning techniques using NVIDIA GPUs.

The curated image sets created using the Web application above are loaded into DIGITS and trained using the existing GoogLeNet pre-trained (i.e. weighted) Caffe network.

The results indicate significant performance on the Top-1 prediction (i.e. the highest-ranked class was correct). Below are the training graphs, indicating ~66% accuracy for rough-fog and ~72% accuracy for damp-cloud.

Results from DIGITS using Caffe & GoogLeNet
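Top-1 accuracy counts an image as correct when the highest-scoring class matches its true label. A short sketch, assuming one dictionary of class scores per image:

```python
def top1_accuracy(scores, truth):
    """Fraction of images whose highest-scoring class equals the true label.

    scores: list of dicts mapping class -> score, one dict per image.
    truth: list of true labels in the same order.
    """
    if not truth:
        return 0.0
    hits = sum(max(s, key=s.get) == t for s, t in zip(scores, truth))
    return hits / len(truth)
```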


With several weeks of data collected, patterns are emerging. The initial analysis step was to understand the classifiers tagged in the images.

Step 1: Watson Analytics

Data was processed from JSON in Cloudant into CSV files for each device. The resulting data listed all classifiers by fifteen (15) minute interval for each day.

The images below were created using Watson Analytics. It is simple to get a free account and load the CSV files; importing them into Watson Analytics, refining the data model, and exploring yields the following graphical displays of the classifier space.

Heatmap of classifiers
Primary Classifiers
Relationship between hour, day and classifier

The heatmap of classifiers and the primary classifiers across hour of day were both highly informative. This insight led to the selection of the following classifiers from the VisualInsights recognition algorithm for further investigation using Looker.

Step 2: Looker

Analysis of the underlying events that generated the classifications led to further analysis using Looker. Looker generates SQL against the dashDB replica of the Cloudant JSON event history and then makes that output available for visualization and for download as JSON. The following graphics are "live" views of the rough-fog and damp-cloud events.

Analysis of activity seen by rough-fog for past week
Analysis of activity seen by damp-cloud for past week

Step 3: Excel

The same UNIX script (mkclass) that converts the Cloudant JSON into CSV files for Watson Analytics also produces output suitable for Excel. The script builds the population statistics model for a classifier (or classifier set, e.g. 'people') on the specified device. The results are processed in the Excel spreadsheet to produce the charts below.

  1. All Activity: the sum of 'person' events in the kitchen
  2. Expected Activity: the 'person' count by 'interval' is "near" the mean of the interval count
  3. Over Activity: the 'person' count by 'interval' is +2SD or more over the mean of the interval count
  4. Under Activity: the 'person' count by 'interval' is 2SD or more under the mean of the interval count
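The Expected/Over/Under categories can be sketched as a comparison of one interval's count with the mean and standard deviation of past counts for the same interval. This assumes at least two historical counts per interval and uses the sample standard deviation; "near" is taken here to mean within two standard deviations.

```python
from statistics import mean, stdev


def classify_interval(count, history, k=2.0):
    """Label a 'person' count for one interval against that interval's history.

    history: past counts for the same (day, interval) slot (at least two).
    Returns 'over', 'under' or 'expected'.
    """
    mu, sd = mean(history), stdev(history)
    if count > mu + k * sd:
        return "over"
    if count < mu - k * sd:
        return "under"
    return "expected"
```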

Step 4: Web Services

With an analytical understanding of the event classifier space and the corresponding activity for specific classifiers by day of week and interval of day, a model can now be constructed to determine the average time of first activity in the morning. That model is calculated on the server from the historical event data and supplied to the device on demand through a Web service. The Web service is an HTTP address with parameters that invokes logic to return a result, e.g. the average score for Person in a given interval of time.

Below are some examples of the aah-stats Web service:

The id= argument can be any of the classifiers. The day= and interval= arguments can be "all" or a number: 0-6 for day, or 0-95 for interval.
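A sketch of building and validating an aah-stats query from these arguments. The base URL is a placeholder, since the service address is not given here:

```python
from urllib.parse import urlencode

# Placeholder base URL; substitute the actual aah-stats endpoint.
BASE_URL = "https://example.com/aah-stats"


def stats_url(classifier, day="all", interval="all"):
    """Build an aah-stats query using the id=, day= and interval= arguments."""
    for name, value, top in (("day", day, 6), ("interval", interval, 95)):
        if value != "all" and not (isinstance(value, int) and 0 <= value <= top):
            raise ValueError(f"{name} must be 'all' or 0-{top}")
    return BASE_URL + "?" + urlencode({"id": classifier, "day": day, "interval": interval})
```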

This information is consumed by the RaspberryPi and used to compare current events with the historical population statistics. Local conditional testing, using the motion package's capability for additional event processing, is currently in progress.


Image captured using a motion detection algorithm based on changes in pixel count (bounding box around the identified centroid).

High confidence example
Low confidence example
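The idea of detecting motion from a changed-pixel count, then locating the change with a centroid and bounding box, can be illustrated with simple frame differencing. This is a sketch of the general technique, not the motion package's actual algorithm; frames are assumed to be equally sized grayscale images given as lists of rows, and both thresholds are arbitrary example values.

```python
def detect_motion(prev, curr, pixel_threshold=30, min_changed=50):
    """Compare two grayscale frames (lists of rows of 0-255 ints).

    Returns None if fewer than min_changed pixels changed; otherwise a dict
    with the changed-pixel count, the centroid (x, y) and the bounding box
    (x_min, y_min, x_max, y_max) of the changed pixels.
    """
    changed = [(x, y)
               for y, (row_a, row_b) in enumerate(zip(prev, curr))
               for x, (a, b) in enumerate(zip(row_a, row_b))
               if abs(a - b) > pixel_threshold]
    if len(changed) < min_changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return {
        "count": len(changed),
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "box": (min(xs), min(ys), max(xs), max(ys)),
    }
```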


The necessary equipment for this project is relatively inexpensive: a RaspberryPi3 with a 32 GB uSD card and a PlayStation3 Eye USB camera.

TOTAL COST: ~US$80! For comparison, a Dropcam from Nest (a.k.a. Google) is ~US$199.


We are making use of to manage the build and deployment process to the Raspberry Pi devices.

The service provides a customizable base image with which to "flash" the uSD card for the RaspberryPi. The image may be configured with the SSID and password for the local WiFi network.

The "AgeAtHome" application we have defined provides a context in which devices participate. Each device is assigned to one application. Once a device has been flashed and booted, it connects to the service and presents itself within the application context.

Each device associated with the application can be inspected, including summary status and logs (e.g. stderr).

This includes the ability to ssh(1) into a terminal for a command-line interface:

Listing of motion detection volume data in file system

IBM IoTF Platform

I added the IBM IoT Foundation Quickstart for Raspberry Pi to the environment and you can see a live stream of instrumentation data. The dashboard below is shown once a device is registered to a Bluemix account and affiliated with an organization (another level of indirection off your Bluemix account).

IBM IoTF Platform Dashboard

Changes were made to both the Dockerfile and the initial script to enable IoTF/QS; the sample C program only sends system status. The sample program will need to be changed to process HTTP requests so that it can send arbitrary JSON payloads (i.e. our events).
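If MQTT is used for sending our own JSON events (rather than HTTP), Watson IoT Platform device clients use a client ID of the form `d:<org>:<type>:<id>` and publish events on `iot-2/evt/<event>/fmt/json`. The helper below only assembles those values; the device type, device ID and event fields are hypothetical, and the actual publish would be done with an MQTT client such as paho-mqtt.

```python
import json


def iotf_publish_args(org, device_type, device_id, event_id, event):
    """Return (host, client_id, topic, payload) for publishing a JSON event.

    org is "quickstart" for the Quickstart service; event is any
    JSON-serializable dict chosen by the application (hypothetical fields).
    """
    host = f"{org}.messaging.internetofthings.ibmcloud.com"
    client_id = f"d:{org}:{device_type}:{device_id}"
    topic = f"iot-2/evt/{event_id}/fmt/json"
    payload = json.dumps(event)
    return host, client_id, topic, payload
```

With paho-mqtt, these values would be passed to the client constructor (`client_id`), `connect(host, 1883)` and `publish(topic, payload)`.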

IBM IOTF Real-time Insights

The IBM IOTF Real-time Insights capabilities can be linked to the IOTF Platform through a shared repository of the JSON events received by the platform; the JSON objects in that repository provide schemas for payload processing. The IOTF-RTI environment allows rules to be specified against JSON payload values, using numeric comparison with static values, other payload values, and context parameters (n.b. it is unclear where or how these values are set).

IBM IOTF Real-time Insights rule specification

IBM IOTF Real-time Insights rule condition specification

IBM IOTF Real-time Insights rule action specification
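The kind of rule described above (comparing a payload value with a static value, another payload value, or a context parameter) can be mimicked locally. This is a sketch with a made-up operand notation (`$field` for payload values, `@name` for context parameters); it is not the Real-time Insights rule format.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def resolve(operand, payload, context):
    """A number, a payload field ('$field') or a context parameter ('@name')."""
    if isinstance(operand, str) and operand.startswith("$"):
        return payload[operand[1:]]
    if isinstance(operand, str) and operand.startswith("@"):
        return context[operand[1:]]
    return operand


def rule_fires(conditions, payload, context):
    """True when every (left, op, right) condition holds (logical AND)."""
    return all(OPS[op](resolve(left, payload, context), resolve(right, payload, context))
               for left, op, right in conditions)
```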

System Detail

The following image is a more detailed diagram of the system operational components and process.

GitHub repositories are publicly accessible at:

SETUP (in-progress):
  1. Download the RaspberryPi image (Ethernet only; working on WiFi)
  2. Use Etcher to copy the image to an SD card
  3. Insert the SD card in the RaspberryPi and connect the camera and power
  4. Configure the RaspberryPi using a Web browser (port 8080; image at port 8081)
  5. Send me an email to set up the server side; working on automation
  6. View your Looker graphs online (in progress)
  7. Set up conditionals for notification (in progress)

More Information

IBM Outthink Aging

IBM Research is developing cognitive technology to build life advisors to support the elderly. These cognitive advisors will be fueled by the fusion of data such as IoT, wearables, social interactions, health, and financial data. To power the advisors, IBM Research has built a core technology, which we call the Knowledge Reactor, designed to be a general-purpose reactive data fusion engine for Cognitive IoT. We use the Watson IoT Platform for all the sensor interfaces that feed into our reactive data fusion engine. The Knowledge Reactor will normalize data across various sensors from different manufacturers, heterogeneous device types, and so on, build a contextual model to track the elder's status and behavior, and produce alerts from cognitive agents. The cognitive insights can easily feed into an iPad app like the ones used for elders in the Japan Post project. Through our Cognitive Eldercare Research, we will help governments, industries, and companies around the world as they seek to develop products and technology-enabled services for consumers in the new longevity economy.

If you've read all the way to here and you still want more information, you can find me at the following places; or click on the little blue circle in the lower right of your web browser to talk to me directly: LinkedIn GitHub Twitter