High School Student Blog: Project Bridging AI and Robotics
Self Driving Robot, Part 1: Getting on the Road
The Project
This project is a self-driving robot that can follow roads and know what to do at stop signs and traffic lights, all with deep learning. My vision for this robot is for it to basically be a miniature version of a real self-driving car. This project was inspired by this video:
The Robot
A while back I received a SparkFun JetBot kit as a gift but never got around to opening it until recently. The kit contains an NVIDIA Jetson Nano, basically a Raspberry Pi on steroids, though it doesn't come with an integrated WiFi card. It is surprisingly powerful: 4 GB of RAM and 128 CUDA cores, running Ubuntu Linux 18.04. You could probably train machine learning models directly on this beast! This is what my JetBot looks like:
Since the kit is outdated and now retired from SparkFun, I had lots of trouble with its antiquated software. Updating the packages caused the software to not work at all, and if I left all the packages as they were, some of the newer software wouldn't work either. If I tried updating some packages, reinstalling others, and leaving some as is, the WiFi dongle drivers would fail more often than not. By this point I wanted to smash the robot to pieces, but instead I decided to buy some new parts that aren't in the kit. That's the beauty of hardware: you don't have to follow the guide exactly, you can tweak some things and try to get it to work.
The Track
As I mentioned previously, my vision for this self-driving robot is for it to basically be a miniature version of a self-driving car: recognize pedestrians, stop signs, traffic lights, etc. and know what actions to take in those situations. And so I put my artistic skills to use. The first obstacles I made were the three traffic lights. They are very simple: each uses only three LEDs, and they are all connected to an Arduino UNO that controls them. I think having the Arduino automate the light switching was a good idea, since I will be preoccupied with the robot most of the time.
I also made three little stop signs. Simple and easy. I wanted to make them myself rather than print them because my printer also has driver issues.
Next, I grabbed a big piece of cardboard and made a path by gluing darker paper onto it. I then made some holes for the wires of the traffic lights to go through. Finally, I taped everything onto the track.
I plan to make other obstacles in the near future: maybe a speed bump, or a toy car placed in the path so the robot can learn to stop when a car is in front of it, or some Lego minifigures acting as pedestrians. It's still a work in progress, and any suggestions would be much appreciated.
Deep Learning
Although my robot doesn't work (yet), I can still look at the NVIDIA JetBot repository and brainstorm how exactly I am going to tackle this project. The code for this road-following example is in that repository.
The first thing I deemed most important is for the robot to stay on the track and follow it, ignoring all the lights and signs for now. Then I can start making separate models for the robot to recognize the traffic lights, what color a traffic light is showing, and the stop signs. Once I get all of that nailed down, I can put it all together.
The Road Following
NVIDIA's Jupyter notebooks use pytorch instead of what I'm most familiar with, tensorflow. But I think the two are very similar and it wouldn't be a big issue to swap one for the other.
The three steps for the road following are as follows:
Data collection
Training
Deployment
The first and last steps will obviously be done directly on the robot, but for the second step you can either train the model directly on the robot with the Jetson Nano, or use a more powerful machine to train it, then save the model file and send it to the robot (probably what I will do). If you are adventurous, you can probably even try cloud training.
Data Collection
Data collection in this project works by taking photos manually. The robot displays the camera feed, and the user's goal is to place a moveable green dot in the middle of the road. Variety is key in this portion of the project: take photos with the robot everywhere on the road, turning on a sharp turn, going down a straight path, a little offset from the center, etc. The important thing is that the green dot stays in the middle of the road. Later in this blog post, I will explain what the green dot does.
Then we need to store all the photos taken as well as the respective (x, y) coordinates of the point. Instead of creating an extra CSV table for the point coordinates, we can just store them in the picture’s name; a clever way of saving space. On top of that, NVIDIA states that only 150–200 different images are needed, much less than I was expecting. All those images are then saved in a folder.
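As an illustration, here is a minimal Python sketch of how the coordinates could be packed into a file name. The exact naming format and the save_snapshot helper are my own assumptions, not NVIDIA's code:

```python
import os
import uuid

def save_snapshot(image_bytes, x, y, folder="dataset_xy"):
    """Save one camera frame, packing the green dot's (x, y) into the file name."""
    os.makedirs(folder, exist_ok=True)
    # e.g. "xy_102_087_1f3c9a....jpg" -> no separate CSV needed
    name = f"xy_{x:03d}_{y:03d}_{uuid.uuid4().hex}.jpg"
    with open(os.path.join(folder, name), "wb") as f:
        f.write(image_bytes)
    return name
```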
The green dot is the target path the robot will take. It is much like the carrot-on-a-stick item in Minecraft, where you can ride a pig forever because the pig chases the carrot. In this case, the carrot is the green dot. I am not sure if this is how real self-driving cars work, but in this scenario it is very clever.
Training
For this project, NVIDIA uses transfer learning instead of building its own model, which I think is a great idea because self-driving robots/cars have been made before and there is no need to build a model from scratch.
This project is an example of regression: the robot takes in a frame from the camera stream, and the model's goal is to spit out an (x, y) coordinate for the green dot, which the robot then follows. The model we use for this project is the ResNet18 CNN provided by pytorch.
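Roughly, the transfer-learning setup could look like this in pytorch; the details may differ from NVIDIA's notebook, but the idea is to reuse a pre-trained ResNet18 and swap its final layer for a two-output regression head:

```python
import torch
import torchvision

# Start from a ResNet18 pre-trained on ImageNet (transfer learning) and
# replace its classification head with a 2-unit layer that regresses (x, y).
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```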
The first thing we need to do is extract the (x, y) coordinates from the file name of each picture taken in the data collection portion. After that, we need to do some pre-processing, like converting the image frame into a numpy array. The pre-processing step returns the image as a tensor along with the (x, y) coordinates of the green dot as its label.
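Here is a sketch of what that dataset and pre-processing could look like. It assumes the file-name format from the earlier snippet and standard ImageNet normalization; the coordinate scaling is also my assumption, not NVIDIA's exact code:

```python
import glob
import os

import PIL.Image
import torch
import torchvision.transforms as transforms

class XYDataset(torch.utils.data.Dataset):
    """Loads the saved images and recovers each (x, y) label from the file name."""

    def __init__(self, folder="dataset_xy"):
        self.paths = glob.glob(os.path.join(folder, "*.jpg"))
        # Standard ImageNet normalization, since we start from a pre-trained ResNet18
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        # File name looks like "xy_102_087_<uuid>.jpg"
        parts = os.path.basename(path).split("_")
        x, y = int(parts[1]), int(parts[2])
        image = PIL.Image.open(path).convert("RGB")
        # Scale pixel coordinates to roughly [-1, 1] (assuming a 224x224 frame)
        # so the regression target stays small
        target = torch.tensor([(x - 112) / 112.0, (y - 112) / 112.0],
                              dtype=torch.float32)
        return self.transform(image), target
```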
Once we have the dataset processed and ready to go, we split it into train and test sets so we can verify the accuracy of the model; in this case, 90% of the data goes to training. We need this split to see how well our model is actually doing before we deploy it on the robot. NVIDIA uses a batch size of 8, but since I will train the model on my laptop, which has a much better GPU than the Jetson Nano, I can train faster with a bigger batch size.
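A quick sketch of the split and data loaders, continuing from the XYDataset above (the batch size of 32 is just my pick for a laptop GPU):

```python
from torch.utils.data import DataLoader, random_split

dataset = XYDataset("dataset_xy")

# 90% train / 10% test, as described above
n_test = int(0.1 * len(dataset))
train_set, test_set = random_split(dataset, [len(dataset) - n_test, n_test])

# NVIDIA uses batch_size=8 on the Nano; a laptop GPU can handle more, e.g. 32
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)
```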
Now it's time to train the regression model. For this, we feed the images and their (x, y) labels to the ResNet18. In each epoch we calculate the loss with the mean squared error function and optimize with the Adam optimizer. After evaluating the model once training is complete, we decide whether it is good enough to be saved as a .pth file. If not, the user can tweak parameters like the number of epochs, the minimum loss, etc. to try to improve the accuracy. Once the model is saved, we are ready to deploy it.
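Putting the pieces together, a simplified training loop might look like this (the epoch count and file name are illustrative, not NVIDIA's exact code):

```python
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters())
best_loss = float("inf")
NUM_EPOCHS = 50  # illustrative; tweak if the loss is still too high

for epoch in range(NUM_EPOCHS):
    # --- train ---
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = F.mse_loss(model(images), targets)  # mean squared error
        loss.backward()
        optimizer.step()

    # --- evaluate on the held-out test set ---
    model.eval()
    test_loss = 0.0
    with torch.no_grad():
        for images, targets in test_loader:
            images, targets = images.to(device), targets.to(device)
            test_loss += F.mse_loss(model(images), targets).item()
    test_loss /= len(test_loader)
    print(f"epoch {epoch}: test MSE = {test_loss:.4f}")

    # keep the best checkpoint as a .pth file
    if test_loss < best_loss:
        best_loss = test_loss
        torch.save(model.state_dict(), "road_following_model.pth")
```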
Deployment
The first thing we need to do is initialize the camera stream, which is fairly simple. Then we create some sliders that will help the user control the speed and smoothness of the robot's road following. Next, we create the function that pre-processes the camera frame, runs it through the neural network, computes the steering value, and controls the motors. In this function, we also incorporate the slider values to adjust the motor output. The steering angle is calculated from the (x, y) coordinate using trigonometry. The JetBot has a very nice observe() function that runs the network on the camera stream in a simple way.
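To make this concrete, here is a rough deployment sketch. It assumes the jetbot library's Robot and Camera classes and the model file from the training sketch; the fixed gains stand in for the sliders, and the trigonometry is a simplified version of what the notebook does:

```python
import numpy as np
import PIL.Image
import torch
import torchvision
import torchvision.transforms as transforms
from jetbot import Camera, Robot  # JetBot's own library; APIs may differ by version

# Rebuild the network and load the trained weights from the .pth file
device = torch.device("cuda")
model = torchvision.models.resnet18(pretrained=False)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("road_following_model.pth"))
model = model.to(device).eval()

robot = Robot()
camera = Camera.instance(width=224, height=224)

# Stand-ins for the speed/steering sliders
SPEED, STEERING_GAIN = 0.3, 0.4

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def execute(change):
    frame = change["new"]  # latest camera frame (numpy array)
    tensor = preprocess(PIL.Image.fromarray(frame)).unsqueeze(0).to(device)
    with torch.no_grad():
        x, y = model(tensor).squeeze().cpu().numpy()  # predicted green-dot position
    # Simplified trigonometry: the angle from straight ahead to the dot
    # becomes the steering value
    steering = np.arctan2(x, y) * STEERING_GAIN
    robot.left_motor.value = float(np.clip(SPEED + steering, 0.0, 1.0))
    robot.right_motor.value = float(np.clip(SPEED - steering, 0.0, 1.0))

# observe() calls execute() every time the camera produces a new frame
camera.observe(execute, names="value")
```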
That's all there is to it! Like I said before, the code is in NVIDIA's JetBot repository, and I am just explaining what it does. NVIDIA makes it clear and simple to understand, and making the JetBot software free and open-source makes it even better, because you get to tweak the code.
Closing Remarks
The road following seems straightforward, but now I need to think about how I am going to implement a traffic light detector and a stop sign detector. I will implement the stop sign detector first, because it is static and doesn't change colors like the traffic lights, so it will be easier. I still need to find inspiration, do research, and look for advice on how to make this. When I finally fix the issues on my robot and get the road following to work, I will hopefully collaborate with some of my fellow Inspirit AI ambassadors. Overall I think this is a very nice project. Combining machine learning and hardware is super fun and rewarding when it finally works. I hope you enjoyed the first part of this three-part blog series. :)
Jose S. Gallo is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-college enrichment program that exposes curious high school students globally to AI through live online classes. Learn more at https://www.inspiritai.com/.