In response, researchers in the Future Interfaces Group at Carnegie Mellon University’s Human-Computer Interaction Institute (HCII) are developing a tool called EyeMU, which allows users to execute operations on a smartphone by combining gaze control and simple hand gestures.
With EyeMU, users can interact with their screens without lifting a finger. As more people use their smartphones to watch movies, edit video, read the news and keep up with social media, these devices have grown to accommodate larger screens and more processing power for increasingly demanding activities. The drawback is that these unwieldy phones often require a second hand or voice commands to operate, which can be awkward and inconvenient.
“We asked the question, ‘Is there a more natural mechanism to use to interact with the phone?’ And the precursor for a lot of what we do is to look at something,” said Karan Ahuja, a doctoral student in human-computer interaction. Gaze analysis and prediction aren’t new, but achieving an acceptable level of functionality on a smartphone would be a noteworthy advance.
“The eyes have what you would call the Midas touch problem,” said Chris Harrison, an associate professor in the HCII and director of the Future Interfaces Group. “You can’t have a situation in which something happens on the phone everywhere you look. Too many applications would open.”
CMU researchers show how gaze estimation using a phone’s user-facing camera can be paired with motion gestures to enable a rapid interaction technique on handheld phones.
Software that tracks the eyes with precision can solve this problem. Andy Kong, a senior majoring in computer science, had been interested in eye-tracking technologies since he first came to CMU. He found commercial versions pricey, so he wrote a program that used a laptop’s built-in camera to track the user’s eyes, which in turn moved the cursor around the screen — an important early step toward EyeMU.
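The core of an early prototype like Kong's is a mapping from a detected eye position in the camera frame to a cursor position on screen. The sketch below illustrates that idea under stated assumptions: the eye-detection step itself (e.g., via a computer-vision library) is taken as given, the function name and linear mapping are hypothetical, and this is not Kong's actual implementation.

```python
# Hypothetical sketch: map a detected eye position in camera-frame pixels
# to screen coordinates, as a simple webcam eye-tracker might. The
# detection step is assumed to have already produced (eye_x, eye_y).

def eye_to_cursor(eye_x, eye_y, frame_w, frame_h, screen_w, screen_h):
    """Linearly map an eye position in the camera frame to screen pixels.

    The camera image is mirrored relative to the user, so the x axis is
    flipped before scaling.
    """
    nx = 1.0 - eye_x / frame_w   # flip horizontally (mirror image)
    ny = eye_y / frame_h
    # Clamp to [0, 1] so noisy detections never move the cursor off screen.
    nx = min(max(nx, 0.0), 1.0)
    ny = min(max(ny, 0.0), 1.0)
    return round(nx * screen_w), round(ny * screen_h)


if __name__ == "__main__":
    # Eye detected at (160, 120) in a 640x480 frame, on a 1920x1080 screen.
    print(eye_to_cursor(160, 120, 640, 480, 1920, 1080))  # -> (1440, 270)
```

A real tracker would add smoothing and per-user calibration on top of such a mapping; raw gaze estimates are far too jittery to drive a cursor directly.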
“Current phones only respond when we ask them for things, whether by speech, taps or button clicks,” Kong said. “As widely used as phones are now, imagine how much more useful they would be if we could predict what the user wanted by analyzing gaze or other biometrics.”
Kong, the paper’s lead author, presented the team’s findings with Ahuja, Harrison and Mayank Goel, an assistant professor in the HCII, at last year’s International Conference on Multimodal Interaction. Having a peer-reviewed paper accepted to a major conference was a huge achievement for Kong, an undergraduate researcher.
Kong and Ahuja advanced that early prototype by using Google’s Face Mesh tool to study the gaze patterns of users looking at different areas of the screen and build a mapping from facial landmarks to screen position. Next, the team developed a gaze predictor that uses the smartphone’s front-facing camera to lock in what the viewer is looking at and register it as the target. The team then made the tool more useful by combining the gaze predictor with the smartphone’s built-in motion sensors to enable commands. For example, a user could look at a notification long enough to secure it as a target, then flick the phone to the left to dismiss it or to the right to respond to it. Similarly, a user might pull the phone closer to enlarge an image or move the phone away to disengage the gaze control, all while holding a large latte in the other hand.
“The big tech companies like Google and Apple have gotten pretty close with gaze prediction, but just staring at something alone doesn’t get you there,” Harrison said. “The real innovation in this project is the addition of a second modality, such as flicking the phone left or right, combined with gaze prediction. That’s what makes it powerful. It seems so obvious in retrospect, but it’s a clever idea that makes EyeMU much more intuitive.”