Tennis Serve Analyzer Expert System (Part 1)

Ariaan-Thor Ghatate
Nov 16, 2020

Hello! My name is Ariaan-Thor (Ariaan) Ghatate and I am currently a junior at St. Stephen’s Episcopal School in Austin, Texas. Other than coding, I enjoy participating in math competitions with my friends and I love playing tennis at a competitive level. Before my experience with Inspirit AI, I wasn’t very knowledgeable about AI and ML algorithms; most of what I knew came from articles discussing the cutting edge of this fascinating field. Those articles sparked my interest in the subject and made me realize just how unique it really is.

Motivation

One of my very first project ideas was to build an expert system that would analyze a video of someone hitting a tennis serve and provide accurate stats about it (location of the bounce, spin, speed, etc.). You are probably wondering: why a serve analyzer? In competitive tennis, the serve often decides whether you win or lose. A powerful serve gives you a huge advantage at the start of a point and makes it much easier to hold your service games. When I first started learning ML and AI, I was going through a slump in tennis and my serve was failing me time and time again. That’s when I decided to try to build a solution to this problem.

Goal

There are two main elements of this solution:

  1. Capturing the data that describes the serve (direction of swing, point of contact, angle of impact, etc.)
  2. Turning that data into a numerical rating, classifying the serve, and then modeling it.

Despite the simplicity of these steps, the solution is difficult to conceptualize and implement. This article will show you how I started building a model using AI and ML to analyze a serve. There are three major steps I needed to take in order to reach my goal:

  1. Exploring currently available solutions and then understanding what libraries I can utilize to enhance my model.
  2. Exploring different types of AI models and diving deeper into more libraries.
  3. Implementing AI models and optimizing for tennis serve description and ball path projection.

Solution

My first approach was to use some kind of device to acquire motion data for the serve. I tried many different products, but they were all underwhelming, expensive, and inaccurate. One of the best products I found was the Zepp Tennis 2 Swing and Match Analyzer. It worked well for tracking overall match stats and the number of shots of each type (serve, backhand, forehand, overhead, etc.), but it was still quite inaccurate with speed and spin. Over the course of multiple baskets of serves, my recorded serve speed had a large standard deviation even though I was hitting more or less the same type of serve with the same swing path. Furthermore, the design was not great: I had to attach a rubber mount to the grip of my racket, which completely messed up my feel for the racket and caused my game to deteriorate. Finally, the whole package cost me over $100 plus tax, which in my opinion was not worth the money. That was when I thought of using AI and video processing to acquire the data instead.

My first step was to identify the object of interest, namely the tennis ball. The first approach I investigated was R-CNN, short for “regions with convolutional neural networks.” As its name implies, it builds on a CNN, but instead of scanning a huge number of possible regions, it analyzes only about 2,000 regions called “region proposals.” It gets these regions from a selective search algorithm, which groups parts of the image by similarity (color, texture, and so on) to find areas that likely contain an object. The CNN then acts as a feature extractor: its output dense layer consists of the features extracted from each proposed region. These features are fed into a support vector machine, which outputs the probability that an object of a given class is within the region proposal. In addition to these predictions, the algorithm also predicts four offset values to increase the precision of the bounding box. For example, the algorithm could detect the presence of a person even though the person’s face isn’t fully within the region proposal; the offsets let it adjust the proposal’s bounding box based on context.

Despite being a major step up from a plain CNN, R-CNNs still take a long time to train and run (on the order of a minute per image at test time), which makes them impractical for analyzing video. In addition, selective search is a fixed algorithm, meaning no learning happens at that stage. This can lead to the generation of bad candidate region proposals.
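
To make that pipeline a bit more concrete, here is a minimal sketch of the R-CNN idea, assuming OpenCV’s selective search implementation (from opencv-contrib) for the region proposals and a pretrained torchvision ResNet as the feature extractor. The frame file name and the proposal counts are placeholders, and the per-class SVM scoring step is only noted in a comment rather than trained here.

```python
# Minimal R-CNN-style sketch: selective search proposes regions, a pretrained
# CNN turns each cropped region into a feature vector, and a separate
# classifier (an SVM in the original R-CNN) would score those features.
# Requires opencv-contrib-python, torch, and torchvision.
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

# 1. Region proposals via selective search (a fixed, non-learned algorithm).
image = cv2.imread("serve_frame.jpg")  # placeholder: one frame from a serve video
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
proposals = ss.process()[:2000]        # keep ~2000 (x, y, w, h) region proposals

# 2. A pretrained CNN as a feature extractor (classification head removed).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()      # output 512-dim features instead of class scores
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

features = []
with torch.no_grad():
    for (x, y, w, h) in proposals[:50]:  # only a handful of crops, for illustration
        crop = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        features.append(backbone(preprocess(crop).unsqueeze(0)))

# 3. In the full R-CNN pipeline, each feature vector would be scored by a
#    per-class SVM, and a bounding-box regressor would refine the proposal.
```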

After R-CNNs, I investigated YOLO and realized that it has many advantages over R-CNN. YOLO stands for “you only look once,” and the way it works is both distinctive and efficient. The previous object detection algorithms use regions to localize the object within the image: the network doesn’t look at the complete image, only at parts of it that have a high probability of containing the object. YOLO, in contrast, passes the entire image through a single convolutional network, which predicts the bounding boxes and the class probabilities for those boxes in one forward pass. Although YOLO is not quite as accurate as R-CNN, it mainly struggles with very small objects, and its speed is far ahead of other algorithms in the field: it can process roughly 45 frames per second.
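
As a rough illustration of how this could be applied to a serve video, the sketch below runs a pretrained YOLOv3 model through OpenCV’s DNN module frame by frame and keeps only detections of the COCO “sports ball” class. The “yolov3.cfg”, “yolov3.weights”, and “serve.mp4” file names are placeholders (the config and weights would come from the standard Darknet release trained on COCO), and the 0.5 confidence threshold is an arbitrary choice.

```python
# Rough sketch: run a pretrained YOLOv3 (via OpenCV's DNN module) over a serve
# video and report where the tennis ball appears in each frame.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getLayerNames()
out_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

SPORTS_BALL = 32                       # COCO class index for "sports ball"
cap = cv2.VideoCapture("serve.mp4")    # placeholder video file

while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]

    # One forward pass per frame -- this is the "you only look once" part.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(out_layers)

    for output in outputs:
        for det in output:             # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if class_id == SPORTS_BALL and scores[class_id] > 0.5:
                cx, cy = int(det[0] * w), int(det[1] * h)
                print(f"ball near ({cx}, {cy}) with confidence {scores[class_id]:.2f}")

cap.release()
```

One forward pass per frame is what makes YOLO attractive here: at roughly 45 frames per second it could keep up with standard video, whereas an R-CNN-style pipeline would fall far behind.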

Based on my initial research, I believe that using YOLO for the majority of the video is the most effective approach, though I may use other models where YOLO’s accuracy isn’t good enough. I’m currently looking into Faster R-CNN and how it works, but for now YOLO is my top option.

Ariaan-Thor (Ariaan) Ghatate is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes curious high school students globally to AI through live online classes. Learn more at https://www.inspiritai.com/.

Sources:

  1. https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
  2. https://medium.com/@ODSC/overview-of-the-yolo-object-detection-algorithm-7b52a745d3e0
  3. https://www.mathworks.com/help/vision/ug/getting-started-with-r-cnn-fast-r-cnn-and-faster-r-cnn.html
