Voice AI assistant using javascript, PHP, and Chat GPT

My disappointment with my Amazon Echo/Alexa device doubled every time I tried to use it and, after some recent exploration with live video streaming, I wanted to pair my desire for a quirky voice assistant with my desire to learn more about audio stream handling in javascript. The result is a cheesy AI voice assistant that sometimes feeds me a good dad joke.

Click here to give it a try for yourself!

How it works

Establishing Audio Input. The script begins by creating a media stream and getting the mic input. It then listens to and analyzes the background amplitude/sound level to determine how much background noise is present, so it can distinguish between background noise and the user talking to it. This eliminates the need for a “wake word” like “Alexa” or “Hey Google”. Once the baseline amplitude has been determined, we add a 70% buffer (with a floor of 10) to better distinguish background noise from the user speaking to the assistant. This baseline amplitude is shown as the “Threshold” on the UI.

const checkAmplitude = () => {
	whatsHappeningDiv.innerHTML = 'Calibrating . . .';
	analyser.getByteFrequencyData(dataArray);
	const average = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
	amplitudeSum += average;
	count++;

	if (count >= 100) { // 100 iterations * 50ms = 5 seconds of calibration
		backgroundAmplitude = Math.max(10, 1.7 * (amplitudeSum / count)); //The average initial amplitude detected, i.e. the background noise. Add 70% as a threshold buffer and set a floor of 10 so high-quality mics/very quiet environments don't let the average be 0.
		clearInterval(timer);
		resolve();
		whatsHappeningDiv.innerHTML = 'I\'m ready to listen.';
	}
};

Listening for Input. Next, the script simply listens to the audio stream indefinitely until the amplitude exceeds the baseline threshold defined above. The incoming amplitude is displayed to the user alongside the threshold amplitude. The user can recalibrate the baseline threshold by simply refreshing the page.

function updateAmplitude(stream) {
	const audioContext = new AudioContext();
	const analyser = audioContext.createAnalyser();
	const microphone = audioContext.createMediaStreamSource(stream);
	microphone.connect(analyser);
	analyser.fftSize = 32; //It's better to keep this low so the response generation is faster (less to analyze and average)
	const bufferLength = analyser.frequencyBinCount;
	const dataArray = new Uint8Array(bufferLength);

	const checkAmplitude = () => {
		analyser.getByteFrequencyData(dataArray);
		//console.log('Frequency Data Array:', dataArray); // Log the array data
		const average = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
		const amplitudeDisplay = document.getElementById('amplitudeDisplay');
		amplitudeDisplay.textContent = 'Amplitude: ' + average.toFixed(0) + '. Threshold: ' + Math.round(backgroundAmplitude) + '.';

		if (average > backgroundAmplitude && !isRecording && !isPaused && !isWaitingForResponse && !isAudioPlaying) {
			startRecording();
			console.log('Recording STARTED due to high amplitude.' + 'recording: ' + isRecording + 'pause:' + isPaused + 'waiting: ' + isWaitingForResponse + 'playing: ' + isAudioPlaying);
		} else if (average < backgroundAmplitude && isRecording && !isPaused) {
			if (!lowAmplitudeStartTime) {
				lowAmplitudeStartTime = Date.now();
			} else if (Date.now() - lowAmplitudeStartTime >= 3000) {
				stopRecording(); //If there's more than 3 seconds of quiet, stop recording.
				console.log('Recording STOPPED due to low amplitude.');
			}
		} else {
			lowAmplitudeStartTime = null;
		}
	};

	timer = setInterval(checkAmplitude, 50);
}

Recording Audio. Once the baseline amplitude threshold has been exceeded, the script begins recording audio. When the script detects audio below the baseline threshold for more than 3 seconds *or* the recording time exceeds 10 seconds, it stops recording and generates an MP3.

function startRecording() {
	if (!isRecording && !isWaitingForResponse && !isAudioPlaying) {
		mediaRecorder.start();
		isRecording = true;
		whatsHappeningDiv.innerHTML = 'Listening . . .';
		humanTextRequestDiv.innerHTML = '';
		assistantResponseTextDiv.innerHTML = '';
		recordingTimeout = setTimeout(stopRecording, 10000); //If listening for more than 10 seconds, stop.
		lowAmplitudeStartTime = null;
	}
}
...
function saveRecording(blob) {
	isWaitingForResponse = true;
	const xhr = new XMLHttpRequest();
	xhr.onload = function () {
		isWaitingForResponse = false;
		if (xhr.status === 200) {
			const responseJson = JSON.parse(xhr.responseText);
			humanTextRequestDiv.innerHTML = '<strong>What I heard:</strong> ' + responseJson.human_text_request;
			assistantResponseTextDiv.innerHTML = '<strong>My response:</strong> ' + responseJson.assistant_response_text;
			console.log('Recording saved successfully.');
			audioPlayer.src = responseJson.audio_src;
			audioPlayer.load();
			audioPlayer.play();
			console.log(responseJson);
			console.log('Audio src updated and reloaded: ' + responseJson.audio_src);
		} else {
			whatsHappeningDiv.innerHTML = 'Failed to save recording: ' + xhr.statusText;
			console.error('Failed to save recording:', xhr.statusText);
		}
	};
	xhr.open('POST', 'write_file.php');
	console.log('File sent to POST handler.');
	whatsHappeningDiv.innerHTML = 'Thinking about what you said . . .';
	xhr.send(blob);
}

Transcribing Audio to Text. From here, we shift from javascript to the PHP handler. The handler leverages OpenAI’s Whisper model via the audio transcriptions (speech-to-text) endpoint to transcribe the recording.

//A function to handle the repeat cURL calls
function callOpenAPI($url, $postData, $headers) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $response = curl_exec($ch);

    if (curl_errno($ch)) {
        $error = curl_error($ch);
        curl_close($ch);
        return (object)['errors' => $error];
    }

    curl_close($ch);
    //The speech (TTS) endpoint returns raw MP3 bytes rather than JSON, so fall back to the raw response when decoding fails.
    $decoded = json_decode($response);
    return $decoded === null ? $response : $decoded;
}

// Step 1: Transcribe the input
$postTranscriptionData = [
    'model' => 'whisper-1',
    'file' => curl_file_create($filename),
    'response_format' => 'verbose_json',
    'language' => 'en',
];
$response = callOpenAPI('https://api.openai.com/v1/audio/transcriptions', $postTranscriptionData, $headers);
$text = isset($response->text) ? $response->text : '';
$json_summary->human_text_request = $text;

Prompting Chat GPT for a Response. I wanted some quirky feedback, so I tweaked my prompt to return some dad jokes, but otherwise it’s pretty straightforward using the OpenAI Chat Completions endpoint.

// Step 2: Generate response from Chat GPT
$headers = [
    "Content-Type: application/json",
    "Authorization: Bearer $openAIToken",
];
$postGPTData = [
    'model' => 'gpt-4-turbo-preview',
    'messages' => [
        ['role' => 'system', 'content' => "You're a virtual assistant who is interpreting voice requests from users. They like to joke and enjoy sarcasm but appreciate factual and succinct responses. Keep your responses to their requests and questions to less than 100 words unless they ask for something longer. They love a good edgy or sarcastic dad joke if you can incorporate one as part of your response -- but don't make it too corny."],
        ['role' => 'user', 'content' => $text],
    ],
];
$response = callOpenAPI('https://api.openai.com/v1/chat/completions', json_encode($postGPTData), $headers);
$assistant_response = isset($response->choices[0]->message->content) ? $response->choices[0]->message->content : '';
$json_summary->assistant_response_text = $assistant_response;

Generating Text to Speech. Lastly, we convert the Chat GPT response to speech and save it as an MP3 file as well. The transcription, text response, and audio path are then bundled into json, passed back to the javascript, and then displayed and played for the user.

// Step 3: Generate speech response
$postTTSData = [
    'model' => 'tts-1',
    'input' => $assistant_response,
    'voice' => 'onyx',
];
$response = callOpenAPI('https://api.openai.com/v1/audio/speech', json_encode($postTTSData), $headers);
file_put_contents("recordings/{$file_id}_response.mp3", $response);
//$json_summary->test = $response;
$json_summary->audio_src = "recordings/{$file_id}_response.mp3";

The full javascript:

document.addEventListener('DOMContentLoaded', () => {
	const audioPlayer = document.getElementById('responseAudio');
	const whatsHappeningDiv = document.getElementById('whats_happening');
	const humanTextRequestDiv = document.getElementById('human_text_requestDiv');
	const assistantResponseTextDiv = document.getElementById('assistant_response_textDiv');
	const toggleMute = document.getElementById('toggle_mute');

	let mediaRecorder;
	let audioChunks = [];
	let isRecording = false;
	let isPaused = false;
	let isWaitingForResponse = false;
	let isAudioPlaying = false;
	let timer;
	let recordingTimeout;
	let lowAmplitudeStartTime;
	let backgroundAmplitude = 0;

	function calculateBackgroundAmplitude(stream) {
		return new Promise((resolve, reject) => {
			const audioContext = new AudioContext();
			const analyser = audioContext.createAnalyser();
			const microphone = audioContext.createMediaStreamSource(stream);
			microphone.connect(analyser);

			analyser.fftSize = 32; //It's better to keep this low so the response generation is faster (less to analyze and average)

			const bufferLength = analyser.frequencyBinCount;
			const dataArray = new Uint8Array(bufferLength);

			let amplitudeSum = 0;
			let count = 0;

			const checkAmplitude = () => {
				whatsHappeningDiv.innerHTML = 'Calibrating . . .';
				analyser.getByteFrequencyData(dataArray);
				const average = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
				amplitudeSum += average;
				count++;

				if (count >= 100) { // 100 iterations * 50ms = 5 seconds of calibration
					backgroundAmplitude = Math.max(10, 1.7 * (amplitudeSum / count)); //The average initial amplitude detected, i.e. the background noise. Add 70% as a threshold buffer and set a floor of 10 so high-quality mics/very quiet environments don't let the average be 0.
					clearInterval(timer);
					resolve();
					whatsHappeningDiv.innerHTML = 'I\'m ready to listen.';
				}
			};

			timer = setInterval(checkAmplitude, 50);
		});
	}

	if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
		navigator.mediaDevices.getUserMedia({ audio: true })
			.then(async stream => {
				await calculateBackgroundAmplitude(stream); // Call function to calculate background amplitude
				updateAmplitude(stream);
				mediaRecorder = new MediaRecorder(stream);
				mediaRecorder.ondataavailable = event => {
					audioChunks.push(event.data);
				};
				mediaRecorder.onstop = () => {
					const audioBlob = new Blob(audioChunks, { 'type': 'audio/mp3' });
					saveRecording(audioBlob);
					audioChunks = [];
				};
			})
			.catch(error => {
				console.error('Error accessing microphone:', error);
				whatsHappeningDiv.innerHTML = 'Please allow microphone access and then refresh the page if needed.';
			});
	} else {
		console.error('getUserMedia not supported in this browser.');
		whatsHappeningDiv.innerHTML = 'Your browser isn\'t supported.';
	}

	function startRecording() {
		if (!isRecording && !isWaitingForResponse && !isAudioPlaying) {
			mediaRecorder.start();
			isRecording = true;
			whatsHappeningDiv.innerHTML = 'Listening . . .';
			humanTextRequestDiv.innerHTML = '';
			assistantResponseTextDiv.innerHTML = '';
			recordingTimeout = setTimeout(stopRecording, 10000); //If listening for more than 10 seconds, stop.
			lowAmplitudeStartTime = null;
		}
	}

	function stopRecording() {
		if (isRecording) {
			clearTimeout(recordingTimeout);
			whatsHappeningDiv.innerHTML = 'No longer listening . . .';
			mediaRecorder.stop();
			isRecording = false;
		}
	}

	/*
	I intend to add the ability to mute TTS audio playback at some point.
	function muteAudio() {
		audioPlayer.muted = !audioPlayer.muted;
		toggleMute.innerHTML = audioPlayer.muted ? '<a onclick="muteAudio()">Toggle Mute Version 1</a>' : '<a onclick="muteAudio()">Toggle Mute Version 2</a>';
	}
	*/
	
	function handleAudioEvent(event) {
		if (event.type === 'play') {
			isAudioPlaying = true;
			console.log('Audio is playing.');
		} else if (event.type === 'ended') {
			isAudioPlaying = false;
			console.log('Audio has stopped playing.');
			whatsHappeningDiv.innerHTML = 'I\'m ready to listen again.';
		}
	}

	audioPlayer.addEventListener('play', handleAudioEvent);
	audioPlayer.addEventListener('ended', handleAudioEvent);

	function saveRecording(blob) {
		isWaitingForResponse = true;
		const xhr = new XMLHttpRequest();
		xhr.onload = function () {
			isWaitingForResponse = false;
			if (xhr.status === 200) {
				const responseJson = JSON.parse(xhr.responseText);
				humanTextRequestDiv.innerHTML = '<strong>What I heard:</strong> ' + responseJson.human_text_request;
				assistantResponseTextDiv.innerHTML = '<strong>My response:</strong> ' + responseJson.assistant_response_text;
				console.log('Recording saved successfully.');
				audioPlayer.src = responseJson.audio_src;
				audioPlayer.load();
				audioPlayer.play();
				console.log(responseJson);
				console.log('Audio src updated and reloaded: ' + responseJson.audio_src);
			} else {
				whatsHappeningDiv.innerHTML = 'Failed to save recording: ' + xhr.statusText;
				console.error('Failed to save recording:', xhr.statusText);
			}
		};
		xhr.open('POST', 'write_file.php');
		console.log('File sent to POST handler.');
		whatsHappeningDiv.innerHTML = 'Thinking about what you said . . .';
		xhr.send(blob);
	}

	function updateAmplitude(stream) {
		const audioContext = new AudioContext();
		const analyser = audioContext.createAnalyser();
		const microphone = audioContext.createMediaStreamSource(stream);
		microphone.connect(analyser);
		analyser.fftSize = 32; //It's better to keep this low so the response generation is faster (less to analyze and average)
		const bufferLength = analyser.frequencyBinCount;
		const dataArray = new Uint8Array(bufferLength);

		const checkAmplitude = () => {
			analyser.getByteFrequencyData(dataArray);
			//console.log('Frequency Data Array:', dataArray); // Log the array data
			const average = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
			const amplitudeDisplay = document.getElementById('amplitudeDisplay');
			amplitudeDisplay.textContent = 'Amplitude: ' + average.toFixed(0) + '. Threshold: ' + Math.round(backgroundAmplitude) + '.';

			if (average > backgroundAmplitude && !isRecording && !isPaused && !isWaitingForResponse && !isAudioPlaying) {
				startRecording();
				console.log('Recording STARTED due to high amplitude.' + 'recording: ' + isRecording + 'pause:' + isPaused + 'waiting: ' + isWaitingForResponse + 'playing: ' + isAudioPlaying);
			} else if (average < backgroundAmplitude && isRecording && !isPaused) {
				if (!lowAmplitudeStartTime) {
					lowAmplitudeStartTime = Date.now();
				} else if (Date.now() - lowAmplitudeStartTime >= 3000) {
					stopRecording(); //If there's more than 3 seconds of quiet, stop recording.
					console.log('Recording STOPPED due to low amplitude.');
				}
			} else {
				lowAmplitudeStartTime = null;
			}
		};

		timer = setInterval(checkAmplitude, 50);
	}

});

The full PHP script:

<?php
//A function to handle the repeat cURL calls
function callOpenAPI($url, $postData, $headers) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $response = curl_exec($ch);

    if (curl_errno($ch)) {
        $error = curl_error($ch);
        curl_close($ch);
        return (object)['errors' => $error];
    }

    curl_close($ch);
    //The speech (TTS) endpoint returns raw MP3 bytes rather than JSON, so fall back to the raw response when decoding fails.
    $decoded = json_decode($response);
    return $decoded === null ? $response : $decoded;
}

//Handling the audio input
$audio_data = file_get_contents('php://input');
$file_id = uniqid();
$filename = "recordings/$file_id.mp3";
file_put_contents($filename, $audio_data);

//Setting the standard header and prepping the response
$json_summary = new stdClass();
$openAIToken="";
$headers = [
    "Authorization: Bearer $openAIToken",
];

////////////////////////////////////////////////////////////
// Step 1: Transcribe the input
$postTranscriptionData = [
    'model' => 'whisper-1',
    'file' => curl_file_create($filename),
    'response_format' => 'verbose_json',
    'language' => 'en',
];
$response = callOpenAPI('https://api.openai.com/v1/audio/transcriptions', $postTranscriptionData, $headers);
$text = isset($response->text) ? $response->text : '';
$json_summary->human_text_request = $text;

////////////////////////////////////////////////////////////
// Step 2: Generate response from Chat GPT
$headers = [
    "Content-Type: application/json",
    "Authorization: Bearer $openAIToken",
];
$postGPTData = [
    'model' => 'gpt-4-turbo-preview',
    'messages' => [
        ['role' => 'system', 'content' => "You're a virtual assistant who is interpreting voice requests from users. They like to joke and enjoy sarcasm but appreciate factual and succinct responses. Keep your responses to their requests and questions to less than 100 words unless they ask for something longer. They love a good edgy or sarcastic dad joke if you can incorporate one as part of your response -- but don't make it too corny."],
        ['role' => 'user', 'content' => $text],
    ],
];
$response = callOpenAPI('https://api.openai.com/v1/chat/completions', json_encode($postGPTData), $headers);
$assistant_response = isset($response->choices[0]->message->content) ? $response->choices[0]->message->content : '';
$json_summary->assistant_response_text = $assistant_response;

////////////////////////////////////////////////////////////
// Step 3: Generate speech response
$postTTSData = [
    'model' => 'tts-1',
    'input' => $assistant_response,
    'voice' => 'onyx',
];
$response = callOpenAPI('https://api.openai.com/v1/audio/speech', json_encode($postTTSData), $headers);
file_put_contents("recordings/{$file_id}_response.mp3", $response);
//$json_summary->test = $response;
$json_summary->audio_src = "recordings/{$file_id}_response.mp3";

////////////////////////////////////////////////////////////
// Step 4: Return json
echo json_encode($json_summary);
?>

If you wanted to use the same CSS styling I have for my demo, here’s the full thing:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>The Better Alexa</title>
    <style>
        body {
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            margin: 0;
            background: linear-gradient(to bottom, #1f0036, #000000);
        }

        .centered-content {
            text-align: center;
            color: white;
            font-family: Helvetica, sans-serif;
        }

        .whats_happening {
            padding-bottom: 50px;
            font-size: 30px;
            font-weight: bold;

            --background: linear-gradient(to right, #553c9a 20%, #FFCC00 40%, #ee4b2b 60%, #ee4b2b 80%);
            background: linear-gradient(to right, #FEAC5E 20%, #C779D0 40%, #4BC0C8 60%, #FEAC5E 80%);
            background-size: 200% auto;

            color: #000;
            background-clip: text;
            -webkit-background-clip: text;
            text-fill-color: transparent;
            -webkit-text-fill-color: transparent;

            animation: shine 20s linear infinite;
        }

        @keyframes shine {
            to {
                background-position: 200% center;
            }
        }

        .toggle_mute {
            color: #ffffff;
            --font-size:80px;
            padding-bottom: 20px;
        }
        .human_text_requestDiv {
            color: #777;
            padding-bottom: 20px;
        }
        .assistant_response_textDiv {
            color: #ffffff;
        }
        .amplitudeDisplay {
            color: #555;
            font-size: 10px;
            padding-top: 50px;
        }

    </style>
</head>
<body>
    <div class="centered-content">
        <audio src="#" controls id="responseAudio" name="responseAudio" style="display: none;"></audio>
        <div id="whats_happening" class="whats_happening">Checking mic . . .</div>
        <div id="toggle_mute" class="toggle_mute"></div>
        <div id="human_text_requestDiv" class="human_text_requestDiv"></div>
        <div id="assistant_response_textDiv" class="assistant_response_textDiv"></div>
        <div id="amplitudeDisplay" class="amplitudeDisplay"></div>
    </div>
</body>
<script>
    document.addEventListener('DOMContentLoaded', () => {
        const audioPlayer = document.getElementById('responseAudio');
        const whatsHappeningDiv = document.getElementById('whats_happening');
        const humanTextRequestDiv = document.getElementById('human_text_requestDiv');
        const assistantResponseTextDiv = document.getElementById('assistant_response_textDiv');
        const toggleMute = document.getElementById('toggle_mute');

        let mediaRecorder;
        let audioChunks = [];
        let isRecording = false;
        let isPaused = false;
        let isWaitingForResponse = false;
        let isAudioPlaying = false;
        let timer;
        let recordingTimeout;
        let lowAmplitudeStartTime;
        let backgroundAmplitude = 0;

        function calculateBackgroundAmplitude(stream) {
            return new Promise((resolve, reject) => {
                const audioContext = new AudioContext();
                const analyser = audioContext.createAnalyser();
                const microphone = audioContext.createMediaStreamSource(stream);
                microphone.connect(analyser);

                analyser.fftSize = 32; //It's better to keep this low so the response generation is faster (less to analyze and average)

                const bufferLength = analyser.frequencyBinCount;
                const dataArray = new Uint8Array(bufferLength);

                let amplitudeSum = 0;
                let count = 0;

                const checkAmplitude = () => {
                    whatsHappeningDiv.innerHTML = 'Calibrating . . .';
                    analyser.getByteFrequencyData(dataArray);
                    const average = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
                    amplitudeSum += average;
                    count++;

                    if (count >= 100) { // 100 iterations * 50ms = 5 seconds of calibration
                        backgroundAmplitude = Math.max(10, 1.7 * (amplitudeSum / count)); //The average initial amplitude detected, i.e. the background noise. Add 70% as a threshold buffer and set a floor of 10 so high-quality mics/very quiet environments don't let the average be 0.
                        clearInterval(timer);
                        resolve();
                        whatsHappeningDiv.innerHTML = 'I\'m ready to listen.';
                    }
                };

                timer = setInterval(checkAmplitude, 50);
            });
        }

        if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
            navigator.mediaDevices.getUserMedia({ audio: true })
                .then(async stream => {
                    await calculateBackgroundAmplitude(stream); // Call function to calculate background amplitude
                    updateAmplitude(stream);
                    mediaRecorder = new MediaRecorder(stream);
                    mediaRecorder.ondataavailable = event => {
                        audioChunks.push(event.data);
                    };
                    mediaRecorder.onstop = () => {
                        const audioBlob = new Blob(audioChunks, { 'type': 'audio/mp3' });
                        saveRecording(audioBlob);
                        audioChunks = [];
                    };
                })
                .catch(error => {
                    console.error('Error accessing microphone:', error);
                    whatsHappeningDiv.innerHTML = 'Please allow microphone access and then refresh the page if needed.';
                });
        } else {
            console.error('getUserMedia not supported in this browser.');
            whatsHappeningDiv.innerHTML = 'Your browser isn\'t supported.';
        }

        function startRecording() {
            if (!isRecording && !isWaitingForResponse && !isAudioPlaying) {
                mediaRecorder.start();
                isRecording = true;
                whatsHappeningDiv.innerHTML = 'Listening . . .';
                humanTextRequestDiv.innerHTML = '';
                assistantResponseTextDiv.innerHTML = '';
                recordingTimeout = setTimeout(stopRecording, 10000); //If listening for more than 10 seconds, stop.
                lowAmplitudeStartTime = null;
            }
        }

        function stopRecording() {
            if (isRecording) {
                clearTimeout(recordingTimeout);
                whatsHappeningDiv.innerHTML = 'No longer listening . . .';
                mediaRecorder.stop();
                isRecording = false;
            }
        }

        /*
        I intend to add the ability to mute TTS audio playback at some point.
        function muteAudio() {
            audioPlayer.muted = !audioPlayer.muted;
            toggleMute.innerHTML = audioPlayer.muted ? '<a onclick="muteAudio()">Toggle Mute Version 1</a>' : '<a onclick="muteAudio()">Toggle Mute Version 2</a>';
        }
        */
		
        function handleAudioEvent(event) {
            if (event.type === 'play') {
                isAudioPlaying = true;
                console.log('Audio is playing.');
            } else if (event.type === 'ended') {
                isAudioPlaying = false;
                console.log('Audio has stopped playing.');
                whatsHappeningDiv.innerHTML = 'I\'m ready to listen again.';
            }
        }

        audioPlayer.addEventListener('play', handleAudioEvent);
        audioPlayer.addEventListener('ended', handleAudioEvent);

        function saveRecording(blob) {
            isWaitingForResponse = true;
            const xhr = new XMLHttpRequest();
            xhr.onload = function () {
                isWaitingForResponse = false;
                if (xhr.status === 200) {
                    const responseJson = JSON.parse(xhr.responseText);
                    humanTextRequestDiv.innerHTML = '<strong>What I heard:</strong> ' + responseJson.human_text_request;
                    assistantResponseTextDiv.innerHTML = '<strong>My response:</strong> ' + responseJson.assistant_response_text;
                    console.log('Recording saved successfully.');
                    audioPlayer.src = responseJson.audio_src;
                    audioPlayer.load();
                    audioPlayer.play();
                    console.log(responseJson);
                    console.log('Audio src updated and reloaded: ' + responseJson.audio_src);
                } else {
                    whatsHappeningDiv.innerHTML = 'Failed to save recording: ' + xhr.statusText;
                    console.error('Failed to save recording:', xhr.statusText);
                }
            };
            xhr.open('POST', 'write_file.php');
            console.log('File sent to POST handler.');
            whatsHappeningDiv.innerHTML = 'Thinking about what you said . . .';
            xhr.send(blob);
        }

        function updateAmplitude(stream) {
            const audioContext = new AudioContext();
            const analyser = audioContext.createAnalyser();
            const microphone = audioContext.createMediaStreamSource(stream);
            microphone.connect(analyser);
            analyser.fftSize = 32; //It's better to keep this low so the response generation is faster (less to analyze and average)
            const bufferLength = analyser.frequencyBinCount;
            const dataArray = new Uint8Array(bufferLength);

            const checkAmplitude = () => {
                analyser.getByteFrequencyData(dataArray);
                //console.log('Frequency Data Array:', dataArray); // Log the array data
                const average = dataArray.reduce((acc, val) => acc + val, 0) / bufferLength;
                const amplitudeDisplay = document.getElementById('amplitudeDisplay');
                amplitudeDisplay.textContent = 'Amplitude: ' + average.toFixed(0) + '. Threshold: ' + Math.round(backgroundAmplitude) + '.';

                if (average > backgroundAmplitude && !isRecording && !isPaused && !isWaitingForResponse && !isAudioPlaying) {
                    startRecording();
                    console.log('Recording STARTED due to high amplitude.' + 'recording: ' + isRecording + 'pause:' + isPaused + 'waiting: ' + isWaitingForResponse + 'playing: ' + isAudioPlaying);
                } else if (average < backgroundAmplitude && isRecording && !isPaused) {
                    if (!lowAmplitudeStartTime) {
                        lowAmplitudeStartTime = Date.now();
                    } else if (Date.now() - lowAmplitudeStartTime >= 3000) {
                        stopRecording(); //If there's more than 3 seconds of quiet, stop recording.
                        console.log('Recording STOPPED due to low amplitude.');
                    }
                } else {
                    lowAmplitudeStartTime = null;
                }
            };

            timer = setInterval(checkAmplitude, 50);
        }

    });
</script>
</html>

Home Automation Dashboard – Version 3

Over the past two years, I’ve had a few iterations of my home dashboard project. All of the integrations for a “smart home” have been rather dumb in the sense that they just handle static transactions or act only as a new channel for taking actions. I wanted to change this and start bringing actual intelligence into my “smart” devices.

A major problem in the current smart device landscape is the amount of proprietary software and devices that are suffocating innovation and stifling the convenience and luxury that a truly “smart home” can bring to consumers/homes of the future — this means improving my standard of living without effort, not just being a novelty device (a “smart” lightbulb that can be controlled through another novelty device like Amazon Alexa).

In this vein, I’ve been connecting my devices (not just my smart devices) into a single product that enables devices to interact with each other without my intervention. This project has slowly morphed from a UI that simply displayed information and allowed on/off toggling to an actual dashboard that will take actions automatically. There’s not much special behind many of these actions at the moment but it’s a starting point.

Home UI: Version 3

In the prior two iterations of my Home UI product, I focused on two static aspects: device functionality and data collection. With V3, I’ve shifted focus to merging those two and bringing in proactive, intelligent actions and notifications.

Key features

  • Building Habits and Accomplishing More: Using my calendar, weather forecast, my entertainment preferences, and my to-do lists, the system will make scheduling suggestions to help me build positive habits or remind me to take care of household tasks in a more timely manner. For example, the system knows that I enjoy going to the movies but also knows I enjoy doing things outdoors. The system will encourage an outdoor task if the weather is nice and suggest a movie when it’s raining/I have nothing else scheduled. Similarly, the system will suggest items from my to-do list based on their due date and priority (a toy sketch of this logic follows this list).
  • Commute Planning: the system collects real-time traffic information from Google Maps; toll, traffic alerts (crashes, special events, construction, etc), and camera feeds from WashDOT; and road condition information, including subsurface temperatures from WashDOT, to compare against my calendar for the day and recommend a time for traveling to/from work. For example, if there’s a Seahawks game in the evening, the system will recognize that and recommend an earlier or later departure to avoid sitting in traffic. Similarly, if I have an early meeting, the system will send me a push notification the night before to recommend setting an earlier alarm.
  • Device Event Bundling: a common use case in home automation, the system will take multiple actions across multiple devices based on a single trigger. For example: before leaving the house, I’m able to reduce my thermostat, turn off all lights, and set my security alarm without having to take each of those actions individually. This isn’t a new concept but it’s a nice implementation considering the various product types supported.
  • Neighborhood Awareness: police events around my home are pushed to me so I know when there was a burglary, car theft, or other concerning event near me. Others are stored and available in a map view.
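
To make that first feature concrete, here’s a toy sketch of the kind of rule the system applies when deciding what to nudge me about. The function name, array shapes, and thresholds are hypothetical placeholders; the real system pulls this data from the integrations listed below.

<?php
//A toy version of the "build habits" rule: pick a suggestion based on the weather, the to-do list, and the calendar.
//Array shapes and thresholds are hypothetical stand-ins for the real integrations.
function suggestActivity(array $forecast, array $todoList, array $calendar) {
    //Surface overdue or nearly-due to-do items first, highest priority at the top
    usort($todoList, function ($a, $b) {
        return [$a['due'], -$a['priority']] <=> [$b['due'], -$b['priority']];
    });
    if (!empty($todoList) && strtotime($todoList[0]['due']) <= strtotime('+1 day')) {
        return "Reminder: '" . $todoList[0]['task'] . "' is due soon.";
    }

    //Nothing urgent, so use the weather and schedule to pick between outdoor time and a movie
    $niceWeather = $forecast['precip_chance'] < 20 && $forecast['high_temp'] >= 60;
    $freeEvening = count($calendar) === 0;
    if ($niceWeather) {
        return 'The weather looks good. How about an outdoor task or a walk?';
    } elseif ($freeEvening) {
        return 'Rainy and nothing scheduled. Movie night?';
    }
    return 'Stick to the calendar for today.';
}

//Example with made-up data: an overdue chore wins over everything else
echo suggestActivity(
    ['precip_chance' => 70, 'high_temp' => 52],
    [['task' => 'Clean the gutters', 'due' => '2020-11-01', 'priority' => 2]],
    []
);
?>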

Full List of Features

  • Pipes RTSP feeds from security cameras and saves them to AWS S3 (30 days of storage for ~$1.50)
  • Detects motion in video feeds and triggers notifications
  • Push notifications for:
    • Motion detection from security cameras
    • Police events near my house
    • Traffic alerts that can impact my commute
    • To-Do list reminders and calendar reminders
  • SimpliSafe Security System integration
  • Nest thermostat API integration
  • Nest Hello doorbell camera integration
  • Police events, restaurant health inspection scores, building permit applications, and traffic information for my community are captured/plotted
  • YeeLight integration/control
  • Google Calendar integration
  • Stock price integration (for stock in my portfolio)
  • Amazon Echo Music integration (history only)
  • And a few other things I’ve shared before (such as my movie collection UI)

Hardware in Use

  • Nest thermostat
  • Nest Hello
  • Hikvision security cameras
  • SimpliSafe Alarm System
  • YeeLight light bulbs (I highly recommend these)
  • Raspberry Pi (handles some LAN things)

Software Used

The Underlying Logic for Expansion

The foundation of the system has three core components: 1) building and flattening a timeline for my persona so it knows what to recommend/do and when to recommend/do it, 2) data collection and transformation from a number of different sources, and 3) API/event handling for the devices I use (cell phone, Nest, security stuff, etc).

In order for the system to be most effective, it needs to know a bit about me – it needs data for intelligence. To enable this, I’ve integrated a ton of my day-to-day apps (calendar, note app, commute times, data from my android phone, etc.) so that it’s aware of what I need/want/plan to do. Using this, I can build a sufficient schedule on-the-fly and the system can accompany me by bringing relevant meta-data along the way.

When the persona and supplemental data are merged, the result is higher-quality, more intelligent recommendations.
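
As a rough illustration of what “flattening” a timeline means here, the sketch below merges a few sources into one chronological list that the rest of the system can walk through. The array keys are hypothetical stand-ins; in practice this data comes from the calendar, to-do, and commute integrations.

<?php
//A rough sketch of flattening several sources into a single persona timeline.
//The input arrays and keys are hypothetical stand-ins for the real integrations.
function buildTimeline(array $calendarEvents, array $todos, array $commuteWindows) {
    $timeline = [];
    foreach ($calendarEvents as $event) {
        $timeline[] = ['time' => $event['start'], 'type' => 'calendar', 'label' => $event['summary']];
    }
    foreach ($todos as $todo) {
        $timeline[] = ['time' => $todo['due'], 'type' => 'todo', 'label' => $todo['task']];
    }
    foreach ($commuteWindows as $window) {
        $timeline[] = ['time' => $window['depart'], 'type' => 'commute', 'label' => 'Suggested departure (' . $window['duration'] . ' min drive)'];
    }
    //Sort everything into one chronological stream the recommendation logic can walk
    usort($timeline, function ($a, $b) {
        return strtotime($a['time']) <=> strtotime($b['time']);
    });
    return $timeline;
}
?>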

1984

The downside to this approach is the obvious self-inflicted 1984 “big-brother” effect. I’m putting a lot of meta-data about my routine and my lifestyle into the system in an effort to reduce the number of small decisions I’m burdened with day-to-day. It sounds crazy just writing that out…I know this.

I see this as inevitable, though. In order for us to achieve the next level of immediacy and convenience, we’ll have to get used to the idea that the next generation of smart devices (ie the next generation of Google AI, Alexa, Siri, etc) will begin using more of the information they already know about us to improve the quality and effectiveness of the convenience we told ourselves we’d get when we purchased the current generation of these devices. Accepting this, I’m okay with sharing a small amount of additional detail alongside what I already share today into a system I control end-to-end.

What’s Next?

I’m working towards extending the personas concept through deeper integration. I want to focus on making the outputs surfaced to me higher value (ie more intelligent alerting and suggesting) while having to pay attention to less information.

In parallel, I want to continue shifting the system from primarily smart home to an intelligent assistance and entertainment console. I also see this evolving into hardware integrated into the house.

Consuming RTSP Stream and Saving to AWS S3

I wanted to stream and record my home security cameras to the cloud for three reasons: 1) if the NVR is stolen, I’ll have the footage stored remotely, 2) (more realistically) I want to increase the storage availability without having to add hard drives, and 3) I want to increase the ease-of-access for my recordings. There are a number of services that do this for you (such as Wowza) and you can also purchase systems that do this out-of-the-box. The downside to services like Wowza is cost — at least $50/month for a single channel streaming without any recording – and the out-of-the-box solutions are expensive and run on proprietary platforms that limit your use and access…plus it’s more fun to do it yourself and learn something.

The solution I arrived at was to use AWS Lightsail and S3. This gives me the low cost, ease of scale, and accessibility I desire. Due primarily to the transfer rate limits, Lightsail will only work for small, home setups but you could “upgrade” from Lightsail to EC2 to mitigate that. After all, Lightsail is just a pretty UI that takes away all the manual config work needed to set up an EC2 instance (in fact, Lightsail utilizes EC2 behind the scenes). If you prefer not to use Lightsail or EC2 at all, you could swap in a Raspberry Pi to do the grunt work locally and pass the files to S3. This would cut the monthly cost by ~$5 but comes with the maintenance of local hardware.

What we’re doing

In this guide, we’ll capture an RTSP stream from a Hikvision security camera NVR (which covers most Nightowl, LaView, and many other systems, as they all use a branded form of Hikvision’s software) and save the files to AWS S3 by:

  1. Creating an AWS Lightsail instance
  2. Installing openRTSP (via LiveMedia-Utils package)
  3. Capturing the RTSP stream and saving it locally to the Lightsail instance
  4. Installing the AWS PHP SDK and using it to sweep the video files from the Lightsail instance to S3

While the details below are specific to my setup, any RTSP stream (such as the NASA stream from the International Space Station) and any Linux server will work as well. Substitute as desired.

Step 1: Creating the Lightsail Instance

I’m going to use the $5/month LAMP w/PHP7 type so that we can have the 2TB of transfer. In my testing, this was sufficient for the number of cameras/channels I’m handling. You should do your own testing to determine whether this is right for you. Keep in mind that transfer is measured both in AND out and we’ll be transferring these files out to S3.

  1. Navigate to Lightsail
  2. Select [Create Instance].
  3. Here’s a screenshot of the instance I’m using:

Although 100% optional, I’d recommend going ahead and assigning a static IP  and setting up a connection in PuTTY. Otherwise, the web terminal window provided in the Lightsail UI will work – I find it a bit buggy, though.

Step 2: Install LiveMedia-Utils package

The LiveMedia-Utils package contains openRTSP which is the key to consuming and storing the feeds from our NVR. Once connected to our Lightsail instance, let’s:

sudo apt-get install livemedia-utils
cd /usr/src
sudo wget http://www.live555.com/liveMedia/public/live555-latest.tar.gz
sudo tar -xzf live555-latest.tar.gz
cd live
sudo ./genMakefiles linux
sudo make
sudo make install

At this point, openRTSP should be ready to go.

Step 3: Capturing the RTSP stream

I want to keep my video files contained so let’s create a new directory for them:

mkdir /home/bitnami/recordings
cd /home/bitnami/recordings

And now we’re ready to test! I’d recommend reviewing the list of options openRTSP offers before diving in. Here’s my set of options:

openRTSP -D 1 -c -B 10000000 -b 10000000 -4 -Q -F CAM1 -d 300 -P 300 -t -u <USERNAME> <PASSWORD> rtsp://<MYCAMIP>:554/Streaming/Channels/102

Some explanations:
-D 1 | Quit if nothing is received for 1 or more seconds
-c | Play continuously, even after the -d timeframe
-B 10000000 | Input buffer of 10MB.
-b 10000000 | Output buffer of 10MB (to the .mp4 file)
-4 | Write in .mp4 format
-Q | Display QOS statistics on exit
-F CAM1 | Prefix the .mp4 files with “CAM1”
-d 300 | Run openRTSP for this many seconds – essentially, the length of your files.
-P 300 | Start a new file every 300 seconds – essentially, the length of your individual files (so each 5 minute block of time will be a unique file)
-t | Use TCP instead of UDP
-u <> | My cam’s username, password, and the RTSP URL.

You can use tmux to let the openRTSP command continue to run in the background (otherwise, it’ll die when you close your terminal window). So:

tmux
openRTSP -D 1 -c -B 10000000 -b 10000000 -4 -Q -F CAM2 -d 300 -P 300 -t -u <username> <password> <rtspURL>

Then press ctrl+b followed by d to hop out of tmux and you can close the terminal window.

You should see your video files start populating in the /home/bitnami/recordings directory now:

Step 4: Install the AWS PHP SDK and move recordings to S3

As S3 is cheaper and since we only have 40GB of storage with our Lightsail instance, I’m going to move my recordings from Lightsail to S3 using PHP.

Before proceeding, Install the AWS PHP SDK.

Now that the SDK is installed, we can create a simple script and cron job to filter through the files in the /home/bitnami/recordings directory, determine their age, move the oldest to S3, and delete them from Lightsail. If my files are 5 minutes long, I’ll have my cron run every 5 minutes. Yes, there are more efficient ways of doing this but I’m okay with being scrappy in this situation.

I’d recommend taking a snapshot of your instance now that everything is set up, tested, and finalized. This enables you to tinker and try new things without worrying about having to repeat this process if you screw something up.

I’ll create a directory for my cron script and its log to live in and then create my cron file:

mkdir /home/bitnami/cron
cd /home/bitnami/cron
sudo nano move.php

Here’s the script (move.php) I wrote to handle the directory listing, sorting, movement to S3, and deletion from Lightsail:

<?php
//Include AWS SDK
require '/home/bitnami/vendor/autoload.php'; 

//Start S3 client
$s3 = new Aws\S3\S3Client([
  'region'  => 'us-west-2',
  'version' => 'latest',
  'credentials' => [
    'key' => '<iamkey>', //IAM user key
    'secret' => '<iamsecret>', //IAM user secret
  ]
]);

//Set timezone and get current time
date_default_timezone_set('America/Los_Angeles');
$currentTime=strtotime("now");
 
 //Get a list of all the items in the directory, ignoring those we don't want to mess with
$files = array_diff(scandir("/home/bitnami/recordings",1), array('.', '..','.mp4','_cron_camsstorevideos.sh'));

//Loop through those files
foreach($files as $file){
  $lastModified=date ("Y-m-d H:i:s", filemtime("/home/bitnami/recordings/$file"));//Separate out the "pretty" timestamp as we'll use it to rename our files.
  $lastModifiedEpoch=strtotime($lastModified);//Get the last modified time
  if($currentTime-$lastModifiedEpoch>30){ //If the difference between now and when the file was last modified is > 30 seconds (meaning it's finished writing to disk), take actions
    echo "\r\n Taking action! $file was last modified: " . date ("F d Y H:i:s", filemtime("/home/bitnami/recordings/$file"));
    //Save to S3
    $result = $s3->putObject([
    'Bucket' => '<bucketname>', //the S3 bucket name you're using
    'Key'    => "CAM1VIDEO @ $lastModified.mp4", //The new filename/S3 key for our video (we'll use the last modified time for this)
    'SourceFile' => "/home/bitnami/recordings/$file", //The source file for our video
    'StorageClass' => 'ONEZONE_IA' //I'm using one zone, infrequent access (IA) storage for this because it's cheaper
    ]);
    
    //Delete file from lightsail
    unlink("/home/bitnami/recordings/$file");
  }
}
?>

That’s it! As long as you have the write policy applied to your bucket, you should be good to go:

The last thing I’ll do is set a crontab to run the move.php script every 5 minutes and log the output:

*/5 * * * * sudo php /home/bitnami/cron/move.php >> /home/bitnami/cron/move.log 2>&1

Indexing my movie collection

#FirstWorldProblems – having so many DVDs that you forget what you already own and end up buying multiple copies of the same movie.  While 126 movies isn’t a massive collection, it’s enough for me to sometimes forget what I have when I’m pillaging the $5 bins at Best Buy and Target.
To solve for this, I created a Google Sheets list of my collection so I could check what I have from my phone.  After typing all the titles into the list, I realized it’d be very easy for me to use the code I wrote for my DirecTV project to scrape additional details for the movies and create a nice, simple UI….so I did:

What it does

  1. Using The Movie DB API, I pull several pieces of information about the film and store it locally: title, image, release date, rating, budget, revenue, runtime, synopsis, genres, cast, etc. (a minimal sketch of this lookup follows the list below)
  2. Storing it locally reduces repetitive, slow API calls and allows me to cleanly add additional attributes like whether it’s DVD, Blu-Ray, Google Movies, Amazon Video, etc.
  3. Adding new titles is easy – I just type in the name and the rest of the details populate immediately.
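
Here’s a minimal sketch of that lookup against The Movie DB’s search endpoint. The API key is a placeholder, the fields are trimmed down, and error handling is omitted, so treat it as a starting point rather than my exact code.

<?php
//Look up a title on The Movie DB and return the fields I store locally.
//<tmdb_api_key> is a placeholder; fields are trimmed for brevity.
function lookupMovie($title) {
    $apiKey = '<tmdb_api_key>';
    $searchUrl = 'https://api.themoviedb.org/3/search/movie?api_key=' . $apiKey . '&query=' . urlencode($title);
    $search = json_decode(file_get_contents($searchUrl), true);
    if (empty($search['results'])) {
        return null;
    }
    $movie = $search['results'][0];
    return [
        'title'        => $movie['title'],
        'release_date' => $movie['release_date'],
        'rating'       => $movie['vote_average'],
        'synopsis'     => $movie['overview'],
        'poster'       => 'https://image.tmdb.org/t/p/w342' . $movie['poster_path'],
    ];
}

//Example: print_r(lookupMovie('Jurassic Park'));
?>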

 
There are two views: one shown above for desktop and another, more compact view, when a tablet or mobile device is detected:
I’m not sure what’s next for this but it was a quick and fun project that connected my inner home automation and movie geeks.

Home Automation and NFC Tags

NFC has proven to be a pretty useless technology for cell phones (unless you’re one of the people who use Google Wallet/Apple Pay).  Nevertheless, I decided to buy some tags and play with them because they’re so damn cheap (just over a dollar each, depending on the type).
One useful application of NFC tags is setting “scenes” using my existing home automation setup.  By placing a tag where I usually set my phone at night, I can trigger several events all at once.  When I place my phone on my nightstand, the following events are triggered:

  1. If it’s a weekday, set my alarm for 7:00 AM.
  2. If an alarm was set, the phone will adjust its volume and say “Alarm set to 7 AM”.
  3. Using the same text to speech, the phone will say “Goodnight, Kevin.”
  4. After pausing for a few seconds, it’ll POST to a simple script I wrote and turn all the lights in the apartment off before setting its volume to mute for the remainder of the night.

It’s a simple way of automating my night time routine and is likely the most practical use of NFC tags with home automation (and it’s not super practical, at that).  If you want to recreate, here’s how I did it…

NFC Tag and App

I decided to go with the WhizTags brand because they boast more writeable space (888 bytes of writeable memory vs the standard 144 bytes). For reading and writing the tag, I went with the NFC Tools app. There’s no real reason for using this app – it just looked to be the most stable after a quick search.

Creating the Task

One of the benefits of the NFC Tools app is that you can export and import json tasks.  Here’s the json for the task described above:

[ {
    "tasks.profile.name":"bedtime post",
    "tasks.profile.date":"20161002T101631",
    "tasks.profile.length":11,
    "tasks.profile.size":226,
    "tasks.profile.data":[ {
        "tasks.profile.fields": {
            "field1": "5"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "53", "itemTask": "5", "itemDescription": "5", "itemHash": "16686621-29d1-46bf-80f0-2d8905abcfdb"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "0"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "819", "itemTask": "1", "itemHash": "b63e98a5-7385-4ae5-a71e-a76949092649", "itemDescription": "1 second"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field4": "true", "field1": "true", "field3": "true", "field8": "1", "field2": "true", "field5": "true", "field7": "false", "field6": "false"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "92", "itemTask": "f9", "itemDescription": "MON,TUE,WED,THU,FRI\nPerform the tasks below", "itemHash": "eb2223cb-c5a2-4d59-a1a8-961e1bb13dde"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field3": "0", "field1": "7am", "field2": "7"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "41", "itemTask": "7:0;7am", "itemDescription": "7am - 07:00", "itemHash": "ba40261a-84e3-437d-b1e4-b44dbc6e0b8f"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "Alarm set for seven am."
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "85", "itemTask": "Alarm set for seven am.", "itemHash": "02fc2b51-36ba-497a-adea-1620b5799d8c", "itemDescription": "Alarm set for seven am."
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "1"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "819", "itemTask": "2", "itemDescription": "2 seconds", "itemHash": "60c8cd67-00bb-4850-a5c2-cbe5883f3de4"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "1"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "90", "itemTask": "1", "itemHash": "78382993-811f-4692-b110-61ff20406731", "itemDescription": "Close your conditional block"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "Goodnight, Kevin."
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "85", "itemTask": "Goodnight, Kevin.", "itemDescription": "Goodnight, Kevin.", "itemHash": "acab1f45-d379-4af4-99f9-0ccb63db4ae3"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "2"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "819", "itemTask": "3", "itemDescription": "3 seconds", "itemHash": "37320287-c034-4742-a959-705ac3399800"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "0"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "false", "requestType": "53", "itemTask": "0", "itemHash": "d7a566bd-107b-4dae-aff4-f2ce832f06e6", "itemDescription": "0"
        }
    }
    ,
    {
        "tasks.profile.fields": {
            "field1": "http:\/\/hellokevin.com\/nfc\/goodnight.php", "field2": "status=false;"
        }
        ,
        "tasks.profile.config": {
            "itemUpdate": "true", "requestType": "110", "itemTask": "http:\/\/hellokevin.com\/nfc\/goodnight.php|status=false;", "itemHash": "e1e00d48-16fa-4942-9d9e-298b806a65c2", "itemDescription": "Request: http:\/\/hellokevin.com\/nfc\/goodnight.php\nPOST parameters :\nName: status \/ Value: false", "itemTaskExtra": null
        }
    }
    ]
}
]

The Script

The script used to turn the lights off is a modified version of this script, which I posted earlier.  Instead of controlling a single device, I simply added in all the lights in my home and added in the POST var.
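
For reference, here’s a stripped-down sketch of what that goodnight.php handler could look like. It assumes the Wink desired_state pattern shown further down this page; the device IDs, bearer token, and the status parameter name are placeholders.

<?php
//goodnight.php: a stripped-down sketch of the handler the NFC task POSTs to.
//Assumes the Wink desired_state pattern shown elsewhere on this page; IDs and token are placeholders.
$status = isset($_POST['status']) ? $_POST['status'] : 'false'; //"false" turns the lights off
$bearer_token = '<bearer_token>'; //fetched the same way as in the Wink token snippet further down
$light_ids = ['<device_id_1>', '<device_id_2>', '<device_id_3>']; //every bulb in the apartment

foreach ($light_ids as $device_id) {
    $ch = curl_init("https://api.wink.com/light_bulbs/" . $device_id . "/desired_state");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
    curl_setopt($ch, CURLOPT_POSTFIELDS, "{\"desired_state\": {\"powered\": " . $status . "}}");
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        "Content-Type: application/json",
        "Authorization: Bearer " . $bearer_token
    ));
    curl_exec($ch);
    curl_close($ch);
}
echo "Lights set to powered=" . $status;
?>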

Expanding the Home Dashboard

In the previous post, I outlined the Home Dashboard touchscreen for controlling lights, temperature/humidity, displaying Amazon Echo information, displaying who’s home (via bluetooth sniffing), and displaying what was being watched on DirecTv.  As this dashboard is intended to be a pseudo remote control for my home, I thought it made sense to be able to actually control the TV with it.
After a bit of tweaking and some UI work, the end result is a super-quick (HTTP response is ~30 milliseconds which feels nearly as quick as the standard DirecTv remote) interaction between device and DirecTv receiver.
From my home dashboard/touchscreen controller, I can select the title that’s currently playing to launch the remote control (pictured above).  You’ll notice that the touchscreen controller can do everything the standard remote can do, including guide browsing, DVR browsing, etc.
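
Under the hood, each button on the touchscreen remote boils down to a single HTTP GET against the receiver’s SHEF interface. Here’s roughly what that looks like; the receiver IP is a placeholder and the key names are assumptions that should be checked against the SHEF documentation.

<?php
//Send a single remote key press to a DirecTV receiver over its SHEF HTTP interface.
//<receiver_ip> is a placeholder; key names like 'guide', 'list', 'select', and 'chanup' are assumptions per the SHEF docs.
function sendRemoteKey($key) {
    $url = "http://<receiver_ip>:8080/remote/processKey?key=" . urlencode($key) . "&hold=keyPress";
    $response = file_get_contents($url);
    return json_decode($response, true);
}

//Example: pull up the guide from the touchscreen
//sendRemoteKey('guide');
?>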

Home Dashboard using a Raspberry Pi

After creating a desktop home automation dashboard and, later, a live stream “digital picture frame”, I got the idea to combine the two into an always-on control panel that condenses everything I care about into a single kiosk which can sit on my end table or nightstand.

What it does

It’s essentially a condensed UI of the desktop version linked above which uses the same databases and processes.

  • Current indoor temperature and humidity (via DHT11 sensor)
  • If my Amazon Echo is playing music, it’ll display the artist, song, and album
  • If I’m watching TV, it’ll show the title, channel, and image/movie poster
  • Display unique icons for each person in the house (by sniffing for their phone’s bluetooth signal)
  • It’ll show the status of my lights (on/off) and update if that status changes (using the Wink API)
  • Through touch screen, allow me to control my lights in near real-time.

Materials Used

How it works

Much of this (temperature, humidity, DirecTv and Wink control) is covered in “The Foundation” post.  Specific to collecting information from the Amazon Echo, I use IfTTT and the Maker channel.  Each time my Echo plays a song, IfTTT POSTs to a script similar to the one below, which stores the song in a MySQL database.  I can then query that, determine if the song is still playing, and publish it to the UI.

<?php
$conn = mysqli_connect(<credentials>);
if (!$conn) {
    die("Connection failed: " . mysqli_connect_error());
}
$song = $_REQUEST['song'];
$artist = $_REQUEST['artist'];
$album = $_REQUEST['album'];
$timestamp = $_REQUEST['timestamp'];
//Use a prepared statement so titles with quotes/apostrophes don't break the INSERT
$stmt = mysqli_prepare($conn, "INSERT INTO echo_history (artist, song, album, timestamp) VALUES (?, ?, ?, ?)");
mysqli_stmt_bind_param($stmt, "ssss", $artist, $song, $album, $timestamp);
if (mysqli_stmt_execute($stmt)) {
    echo "New record created successfully";
} else {
    echo "Error: " . mysqli_error($conn);
}
mysqli_close($conn);
?>

My Maker recipe simply makes a web request to a URL that looks something like this:

recordmusic.php?artist={{ArtistName}}&song={{SongName}}&album={{AlbumName}}&timestamp={{PlayDateTime}}

Method: POST.
That’s about it for the controller – quite simple and probably the most practical project I’ve done thus far.

Using a Pi to measure TV habits

As noted here, I’m using the DirecTV SHEF API and a Raspberry Pi to poll my DirecTV receivers every minute and store what they’re doing in a MySQL database. After ~6 months of storing that data, I thought it’d be interesting to analyze some of it.  After all, do I really watch enough HBO and Starz to warrant paying for them every month?
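
For context, the polling job is essentially a one-minute cron running a script like the sketch below: ask the receiver’s SHEF getTuned endpoint what’s on and log it. The receiver IP, database credentials, table layout, and exact response field names are assumptions here, so adjust them to whatever your receiver actually returns.

<?php
//Poll a DirecTV receiver's SHEF interface and log what's currently tuned. Run from cron every minute.
//The IP, credentials, table, and response field names are assumptions.
$tuned = json_decode(file_get_contents("http://<receiver_ip>:8080/tv/getTuned"), true);
if ($tuned === null) {
    die("Receiver did not respond.");
}

$conn = mysqli_connect('<host>', '<user>', '<pass>', '<db>');
$stmt = mysqli_prepare($conn, "INSERT INTO viewing_history (title, callsign, channel, polled_at) VALUES (?, ?, ?, NOW())");
$title = isset($tuned['title']) ? $tuned['title'] : '';
$callsign = isset($tuned['callsign']) ? $tuned['callsign'] : '';
$channel = isset($tuned['major']) ? $tuned['major'] : 0;
mysqli_stmt_bind_param($stmt, "ssi", $title, $callsign, $channel);
mysqli_stmt_execute($stmt);
mysqli_close($conn);
?>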

Channel Preferences

  • 25,900 minutes of TV watched (~2.25 hours per day…eek; still less than the national average!).
  • 2,612 minutes of that was recorded (10%).
  • NBC is our favorite channel (3,869 minutes watched, 15%).
  • E! is our second favorite channel (1,911 minutes watched, 7.37%). Gotta keep up with those Kardashians.
  • Premium movie channels (HBO, Starz, Encore, Cinemax, etc) were watched 6,870 minutes (26.52%) – apparently worth the money.
  • Premium movie channels were recorded only 571 minutes (lots of ad hoc movie watching, I guess).
  • NBC is our most recorded channel (479 minutes) followed by HGTV (391 minutes) and ABC (330).

Time Habits

  • Sunday is the most watched day (no surprise here) with 7,157 minutes watched (28%)
  • Saturday is the second with 5,385 (21%)
  • Wednesday is the least watched day with 1,144 (4.4%)
  • April was our biggest TV month with 6,413 minutes watched (24.76%)
  • June was our lowest month with 1,197 (4.62%) — July is around 10%.  The excitement of summer faded fast, apparently.
  • 8PM is our biggest TV hour with 1,312 minutes (15.14%).  This is followed by 7pm (13%) and 6pm (10%).
  • 6AM is our lowest TV hour with 68 minutes watched (0.26%).  This is followed by 5am (0.49%) and 4am (0.90%).

This is pointless data but it’s always interesting to assess your own habits.  And if you’re wondering, it took roughly 60 minutes (0.23%) to query and publish this post :).
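
In case you’re wondering what “query” means here, it’s nothing fancy – just GROUP BY aggregates over the polling table. A sketch, using the same assumed table/column names as above and assuming one logged row equals one minute of viewing:

<?php
// Hedged sketch: total minutes watched per channel.
$conn = mysqli_connect(<credentials>);
$result = mysqli_query($conn, "SELECT callsign, COUNT(*) AS minutes_watched FROM directv_history GROUP BY callsign ORDER BY minutes_watched DESC");
while ($row = mysqli_fetch_assoc($result)) {
	echo $row['callsign'] . ": " . $row['minutes_watched'] . " minutes\n";
}
mysqli_close($conn);
?>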

A month of tinkering

New Design

The original design didn’t feel “clean” and didn’t function too well on mobile or even tablet displays.  I changed that up a bit and the new design has a lot of transparent divs, bokeh background images, and some jQuery to make actions a bit smoother.
new ui

Wink Integration Improvements

The initial integration of the Wink API wasn’t that great.  I was using PHP to trigger shell scripts which would then make the API call – quite messy, with several opportunities for failure.  This method also made a new request for a bearer token each time an action was taken, so if I turned on three lights, I requested three unique tokens from the API.  I’ve since cleaned that up: I now use a single token per session and the API calls are all made in a single PHP file.  This still isn’t the cleanest or safest way to do this, but it works for my use case.
While doing this, I also added the ability to dim some lights (such as the kitchen light which we leave on during the night).  The next step is to fetch the current state of the lights so that we can eliminate the on/off option and simply toggle it (a hedged sketch of such a state lookup follows the control script below).  The problem with that is that the Wink Hub struggles to maintain accurate states for its devices.
dim.PNG
Gaining my Wink bearer token for the session:

<?php
$ch_token = curl_init();
curl_setopt($ch_token, CURLOPT_URL, "https://api.wink.com/oauth2/token");
curl_setopt($ch_token, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch_token, CURLOPT_HEADER, FALSE);
curl_setopt($ch_token, CURLOPT_POST, TRUE);
curl_setopt($ch_token, CURLOPT_POSTFIELDS, "{
  \"client_id\": \"<insert_here>\",
  \"client_secret\": \"<insert_here>\",
  \"username\": \"<insert_here>\",
  \"password\": \"<insert_here>\",
  \"grant_type\": \"password\"
}");
curl_setopt($ch_token, CURLOPT_HTTPHEADER, array(
  "Content-Type: application/json"
));
$ch_token_response = curl_exec($ch_token);
curl_close($ch_token);
$ch_token_json = json_decode($ch_token_response, true);
$bearer_token=$ch_token_json['access_token'];
?>

Wink Control:

<?php
$device_id=$_GET["device_id"];
$new_state=$_GET["new_state"];
$bearer_token=$_GET["bearer_token"];
if(is_numeric($new_state)){$action="brightness";} else {$action="powered";}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://api.wink.com/light_bulbs/".$device_id."/desired_state");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
curl_setopt($ch, CURLOPT_POSTFIELDS, "{
  \"desired_state\": {
    \"".$action."\": ".$new_state."
  }
}");
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
  "Content-Type: application/json",
  "Authorization: Bearer ".$bearer_token.""
));
$response = curl_exec($ch);
curl_close($ch);
if($new_state=="true"){echo "Turned On"; } else if($new_state=="false") { echo "Turned Off"; } else { echo "Light Dimmed: $new_state"; }
?>
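
For the state lookup mentioned above, something like the following should work. I haven’t wired this in yet, and the endpoint and the last_reading structure below are from memory of the Wink docs, so treat it as a sketch rather than working code.

<?php
// Hedged sketch: read a bulb's current (observed) state so the UI could show
// a single toggle instead of separate on/off buttons. Endpoint and field
// names are from memory and may not be exact.
$device_id=$_GET["device_id"];
$bearer_token=$_GET["bearer_token"];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://api.wink.com/light_bulbs/".$device_id);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
  "Content-Type: application/json",
  "Authorization: Bearer ".$bearer_token
));
$response = curl_exec($ch);
curl_close($ch);
$device = json_decode($response, true);
// last_reading holds the state Wink last observed (vs. desired_state, which is what we asked for)
$powered = $device['data']['last_reading']['powered'];
$brightness = $device['data']['last_reading']['brightness'];
echo $powered ? "On (brightness: ".$brightness.")" : "Off";
?>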

DirecTV Changes

I integrated TheMovieDB.org’s API to pull images of the movies or shows that are currently on.  This currently works very well for movies but it often fails to find images for shows, so I’ll loop back and fix that at some point in the future.  I also added a link to view the title on IMDB for easy access.  An alarm was added to trigger a notification if the DVR is nearly full.
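
The lookup itself is roughly the sketch below: search TheMovieDB by title, then build a poster URL from the first result. The API key is a placeholder, and since this uses the movie search endpoint, that may be part of why shows come up empty (they live under a separate search endpoint).

<?php
// Hedged sketch of the TheMovieDB.org lookup. API key is a placeholder.
$apiKey = "<insert_here>";
$title = urlencode($_GET["title"]);
$json = file_get_contents("https://api.themoviedb.org/3/search/movie?api_key=".$apiKey."&query=".$title);
$results = json_decode($json, true);
if (!empty($results['results'][0]['poster_path'])) {
	// Posters are served from TMDB's image CDN at a chosen width
	$poster = "https://image.tmdb.org/t/p/w500".$results['results'][0]['poster_path'];
	echo "<img src='".$poster."' alt='poster'>";
} else {
	echo "No poster found";
}
?>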

Mapping GPS Coordinates

Out of sheer curiosity, I decided to push my phone’s GPS coordinates to my server and plot them on a map.  I’m using the SendLocation app to push the coordinates to a script I have set up to listen for the app.  This may, perhaps, help me locate my phone one day if I ever lose it.  For now, though, it’s merely something for me to play with.  I’m also capturing things like speed so I can see when I’m in transit.  Essentially, this traces my steps.
gps
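
The listener script is nothing more than an endpoint that writes whatever the app sends into MySQL. A rough sketch is below; the parameter names depend on how SendLocation is configured, so treat them (and the table) as placeholders.

<?php
// Hedged sketch: the phone app hits this endpoint and the coordinates are
// appended to a MySQL table. Parameter names are placeholders.
$conn = mysqli_connect(<credentials>);
$lat = $_REQUEST['lat'];
$lon = $_REQUEST['lon'];
$speed = $_REQUEST['speed'];
$stmt = mysqli_prepare($conn, "INSERT INTO gps_history (lat, lon, speed, logged_at) VALUES (?, ?, ?, NOW())");
mysqli_stmt_bind_param($stmt, "ddd", $lat, $lon, $speed);
mysqli_stmt_execute($stmt);
mysqli_stmt_close($stmt);
mysqli_close($conn);
echo "OK";
?>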

MQ2 Sensor (Gas/Smoke)

I added an MQ2 sensor to the Pi and set it up to store the current state as well as send me a text message and email if it detects something.  Overall, the setup is pretty simple and certainly isn’t life-saving, but it does serve the goal of being able to monitor the home while away.
Here’s the Python file I use.  I schedule it with a cron job so it’s monitoring frequently enough to be useful (note the loop of 20 iterations with a 3-second sleep, so it runs for the full minute before the cron fires again):

import time, sys
import RPi.GPIO as GPIO
import MySQLdb
import smtplib
def sendemail(from_addr, to_addr_list, cc_addr_list,
              subject, message,
              login, password,
              smtpserver='smtp.gmail.com:587'):
    header  = 'From: %s\n' % from_addr
    header += 'To: %s\n' % ','.join(to_addr_list)
    header += 'Cc: %s\n' % ','.join(cc_addr_list)
    header += 'Subject: %s\n\n' % subject
    message = header + message

    server = smtplib.SMTP(smtpserver)
    server.starttls()
    server.login(login,password)
    problems = server.sendmail(from_addr, to_addr_list, message)
    server.quit()
GPIO.setmode(GPIO.BOARD)
GPIO.setup(11, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)
n = 20
def action(pin):
    print 'Sensor detected action!'
    db = MySQLdb.connect("","","","" )
    cursor = db.cursor()
    cursor.execute( 'insert into mq2(event) values("Gas detected")')
    db.commit()
    db.close()
    sendemail(from_addr    = '',
          to_addr_list = ['att phone email address'],
          cc_addr_list = [''],
          subject      = 'Gas Detected',
          message      = 'Elevated levels of gas or smoke detected.',
          login        = '',
          password     = '')
    return
GPIO.add_event_detect(11, GPIO.RISING)
GPIO.add_event_callback(11, action)
try:
    while n > 0:
#        print 'alive'
#        print n
        n = n-1
        time.sleep(3)
except KeyboardInterrupt:
    GPIO.cleanup()
    sys.exit()

Other Changes

  • I’ve changed the interior camera to disable itself if anyone is home.  There’s no need for it to run during that time and it also ensures an additional layer of security/privacy.
  • The date formats were updated to provide “friendlier” outputs.  I also added a time of day greeting for the active user – “Good morning/afternoon/evening/night”…
  • My favorite Google News RSS feeds were added to the left and a weather forecast link was added.  I also included a calendar with the eventual goal of integrating the Google Calendar API (I use Google Calendar quite heavily).  These are hidden by default in order to maintain the “clean” UI I’m going for (a quick sketch of the feed fetch is below).
news.png
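
Pulling an RSS feed into the dashboard is only a few lines of PHP with SimpleXML. The feed URL below is an example placeholder, not necessarily the exact feed I use.

<?php
// Hedged sketch: fetch an RSS feed and print the first few headlines.
$feed = simplexml_load_file("https://news.google.com/rss");
if ($feed === false) {
	exit("Could not load feed");
}
$count = 0;
foreach ($feed->channel->item as $item) {
	echo "<a href='".htmlspecialchars($item->link)."'>".htmlspecialchars($item->title)."</a><br>";
	if (++$count >= 5) break; // only show the first five headlines
}
?>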

The Foundation

Purpose

Let’s just get it out of the way now — there’s no true practical purpose or value in doing this.  I took this on as an experiment and opportunity to learn something new.

What is it?

Using a Raspberry Pi, some sensors, and a lot of Googling with trial and error, I took my first step into custom home automation (Wikipedia).  I can control lights, DirecTV receivers, and some appliances, measure indoor temperature and humidity, determine who is home, and view indoor/outdoor webcams through a single UI.

Screenshots

image

Control

Lighting Control
Each light uses a GE Link bulb which is connected to a Wink hub.  This allows for on/off control, dimming control, on/off scheduling, and dimming scheduling (such as gradual increases in brightness in the mornings).  Wink comes with a nice app but I opted to use their API so I could incorporate it into the custom UI/dashboard along with everything else.
Cameras
I’m using an old D-Link camera for outdoor views and the RPi camera for inside the apartment.  I set up scripts to take a snapshot once every minute, dump it into a MySQL db running on the Pi, and update the snapshot shown in the UI.
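That snapshot job is conceptually just a cron script like the sketch below; the raspistill flags, file path, and table/column names are examples rather than my exact setup.

<?php
// Hedged sketch of the once-a-minute snapshot job (run from cron on the Pi).
$file = "/var/www/html/snapshots/latest.jpg";
exec("raspistill -w 1280 -h 720 -o " . escapeshellarg($file));
$conn = mysqli_connect(<credentials>);
mysqli_query($conn, "INSERT INTO camera_snapshots (path, taken_at) VALUES ('" . mysqli_real_escape_string($conn, $file) . "', NOW())");
mysqli_close($conn);
?>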
Weather Reporting
Outdoor weather (temperature, “feels like” temperature, humidity, pressure, and wind speed) is pulled from Yahoo! XML weather feeds.
Indoor temperature and humidity is polled every minute using a DHT11 sensor attached to the RPi.  Historicals for all of these are stored in a MySQL database with the intention of graphing these some time in the future.  I’d like to incorporate a Nest-style thermostat for indoor climate control but, alas, I’m a renter and don’t want to deal with that.
Who’s Home?
Using a Bluetooth dongle attached to the RPi, I poll for cell phones to determine who is home and who is away.  Every minute, I log the status of all detected Bluetooth devices so we can see who’s around.  This is also stored in a MySQL db so I can go back in time.
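The check itself can be as simple as asking BlueZ for the name of each known phone – if a name comes back, the phone is in range. A hedged sketch (the MAC addresses and table are placeholders):

<?php
// Hedged sketch: presence detection by polling known phone MAC addresses.
// "hcitool name <MAC>" only prints a name if the device responds.
$phones = array(
	"Person1" => "AA:BB:CC:DD:EE:FF", // placeholder MAC addresses
	"Person2" => "11:22:33:44:55:66"
);
$conn = mysqli_connect(<credentials>);
foreach ($phones as $person => $mac) {
	$name = trim(shell_exec("hcitool name " . escapeshellarg($mac)));
	$home = ($name !== "") ? 1 : 0;
	mysqli_query($conn, "INSERT INTO presence_log (person, is_home, logged_at) VALUES ('" . mysqli_real_escape_string($conn, $person) . "', " . $home . ", NOW())");
}
mysqli_close($conn);
?>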
DirecTV Control
Using the DirecTV SHEF API, I currently poll the program title, the channel number, the station ID (i.e. NBC or HBO), the program rating, and whether or not the DVR is recording.  The API allows you to take full control of the receiver and do anything you could with the remote, but I don’t see much value in that since I can’t watch it while I’m away – so why have the functionality…
Appliance Control
Using WeMo plugs, I can power appliances on and off.  This came in handy at Christmas when the outlet was located directly behind the Christmas tree.  At this point, though, there aren’t many appliances I want to control with the WeMo, so I have a few of these sitting idle.

Automation and Availability

Amazon Echo Integration
All of these devices have been integrated with the Amazon Echo either via Echo skills or via IFTTT integration.  This allows all of the functionality above to be controlled via voice recognition.  There’s some trial and error in getting these set up correctly, but I think that’s mostly down to the Echo’s voice-recognition quality.
IFTTT
With IFTTT integration, I can do any number of things if desired.  One of the more useful IFTTT setups I’ve found is simply turning on the bedroom lamp ~8 minutes after my alarm goes off and gradually increasing the bulb’s brightness every minute.  Another possible option is to turn some lights on when the Pi’s Bluetooth dongle detects that I’m nearby.
Web Server
In order to make this accessible away from home, I installed Apache on the Pi and used ngrok to tunnel to localhost so that I don’t have to worry about the vulnerabilities of port forwarding on my router.  I have this forwarded over to a domain name I wasn’t using and added some .htaccess protection (among other things) to keep it private.

Future Plans

Living in a small apartment limits the value and the opportunities of home automation.  Things like adding reed switches to windows and doors don’t make sense in my scenario, as I doubt anyone will be climbing through my 7th-floor window or trying to break down my door.  Some more practical things I’ll be doing, though, are adding a gas, CO2, and smoke sensor to the Pi so that I’m alerted via text message and push notification if the Pi detects any of those levels becoming elevated…better than waiting on the neighbors to call the fire department, no?  I’d also like to add a PIR motion detector to trigger the Pi Cam to start capturing video instead of still snapshots if motion is detected during hours that I’m normally away from home.  I’ve had some trouble getting the motion detector to work but I’ll loop back to that eventually.