The Machine Learning Process: Building Models from a High School Student

If you are in high school and interested in STEM, it is well worth your time to get familiar with how machine learning works. For one, it will continue to transform almost every part of today’s world. Second, A.I. may pique your interest as it did mine and become something you want to study in college.

Personally, I’m excited to have discovered my passion for machine learning as a junior in high school, because I can now choose a university with that interest in mind. If the rest of this post sounds interesting to you, read on to find out how to build your own machine learning models!

Step-by-Step Guide to Building Your Own Machine Learning Model

Step 1: Choosing Your Data Set

The basic machine learning process of training and testing models starts with choosing the data to use. Supervised learning requires a labeled dataset: each input is paired with the correct output, and the model learns the pattern that maps one to the other. Unsupervised learning, on the other hand, does not require labels. Instead, it leaves it up to the machine to group or categorize the data based on patterns it identifies on its own.
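To make the distinction concrete, here is a minimal sketch using scikit-learn on made-up 2-D points. The data, and the choice of logistic regression and k-means as the two example models, are illustrative assumptions. The supervised model is given the labels `y`; the unsupervised one only sees `X` and has to find the groups itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# Supervised: we also supply the correct label (0 or 1) for every row.
y = np.array([0] * 50 + [1] * 50)
clf = LogisticRegression().fit(X, y)
supervised_pred = clf.predict([[0.2, -0.1], [4.8, 5.3]])

# Unsupervised: only X is given; k-means groups points by similarity.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_  # arbitrary cluster numbers, not meaningful names

print(supervised_pred)
print(cluster_ids[:5], cluster_ids[-5:])
```

Note that k-means only returns arbitrary cluster numbers; unlike the supervised model, it has no idea what either group actually means.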

While supervised learning tends to produce more accurate results, it can be harder to apply in many real-world A.I. applications, because viable labeled datasets are hard to come by. In many cases where A.I. would be useful, such datasets simply don’t exist. For example, driver safety technology must handle a practically unlimited number of situations, each one different from the last.

No previously collected dataset could cover every input such a model might encounter, not to mention the difficulty of actually collecting that data.

Step 2: Importing Libraries, Testing & Training, and Building Models

The next steps in the machine learning process are to import helpful libraries (such as scikit-learn, NumPy, and pandas), split the data into training and testing sets, build models, and evaluate and compare their accuracy. Splitting the data ensures that a model isn’t simply rewarded for memorizing the examples it was trained on. If you train on one set and measure accuracy on another, the score reflects how well the model handles data it has never seen, which makes it far more meaningful.
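A rough sketch of these steps might look like the following. It uses scikit-learn’s bundled breast cancer dataset purely as a stand-in (not the NASA data described below), and the two models are arbitrary examples chosen for comparison.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset bundled with scikit-learn as pandas objects.
data = load_breast_cancer(as_frame=True)
X: pd.DataFrame = data.data
y: pd.Series = data.target

# Hold out 20% of rows for testing; stratify keeps the class ratio equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Two example models to compare.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

train_scores, test_scores = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)  # learn only from the training set
    train_scores[name] = accuracy_score(y_train, model.predict(X_train))
    test_scores[name] = accuracy_score(y_test, model.predict(X_test))  # unseen data

for name in models:
    print(f"{name}: train={train_scores[name]:.3f}, test={test_scores[name]:.3f}")
```

Comparing each model’s training accuracy with its test accuracy is a quick way to spot memorization: an unpruned decision tree typically scores close to 100% on the data it was trained on but lower on the held-out test set.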

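Confusion matrices are another helpful tool for analyzing a model’s performance. They report the number of true positives, false positives, true negatives, and false negatives, so you can see exactly which kinds of mistakes a model makes.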
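As a small illustration, scikit-learn’s `confusion_matrix` can be computed from a list of true labels and a list of predictions. The labels below are made up, with 1 standing for the positive class.

```python
from sklearn.metrics import confusion_matrix

# Made-up example: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=2 FP=2 FN=1 TP=3
```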

While working on my first machine learning project as a student participant in the Inspirit AI summer program, I saw an interesting example of why this matters. Our models were supposed to classify NASA telescope data as either exoplanets or non-exoplanets. Although one model received an accuracy score of 99%, its confusion matrix revealed that it had classified every actual exoplanet as a non-exoplanet. Because exoplanets made up such a small percentage of the data compared to non-exoplanets, the model had learned that classifying every data point as a non-exoplanet produced the “best results”.
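This 99% trap is easy to reproduce with hypothetical numbers. The sketch below assumes 10 exoplanets among 1,000 stars (not the real dataset’s proportions) and uses scikit-learn’s `DummyClassifier`, which always predicts the most common class, to stand in for a model that has learned to answer “non-exoplanet” every time.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels: 10 "exoplanets" (1) among 1,000 stars; the rest are 0.
y = np.zeros(1000, dtype=int)
y[:10] = 1
X = np.zeros((1000, 1))  # placeholder features; this baseline ignores them

# Always predicts the majority class (0 = non-exoplanet).
model = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = model.predict(X)

acc = accuracy_score(y, y_pred)
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
rec = recall_score(y, y_pred)  # fraction of real exoplanets that were found

print(f"accuracy={acc:.2f}")                    # accuracy=0.99
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")       # TN=990 FP=0 FN=10 TP=0
print(f"recall={rec:.2f}")                      # recall=0.00
```

Recall (the fraction of real exoplanets the model actually finds) exposes the problem immediately, which is why it is worth checking alongside accuracy on imbalanced data.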

Issues like these can be addressed by augmenting the data. In our case, this meant creating more data to feed our model so that it had a better chance to learn and produce more accurate results.
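One simple way to create more examples of a rare class is to oversample it. The sketch below is an illustrative assumption, not necessarily the method used in the project: it draws extra copies of the minority class with scikit-learn’s `resample`, then adds a little random noise so the new rows aren’t exact duplicates. The data and the noise level of 0.05 are made up.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 990 negatives and 10 positives, 3 features each.
X_neg = rng.normal(0.0, 1.0, size=(990, 3))
X_pos = rng.normal(3.0, 1.0, size=(10, 3))

# Draw positives with replacement until they match the negatives in count...
X_pos_up = resample(X_pos, replace=True, n_samples=len(X_neg), random_state=0)
# ...then jitter them slightly so the new rows are not exact copies.
X_pos_up = X_pos_up + rng.normal(0.0, 0.05, size=X_pos_up.shape)

# Combine into a balanced dataset with matching labels.
X_bal = np.vstack([X_neg, X_pos_up])
y_bal = np.concatenate([
    np.zeros(len(X_neg), dtype=int),
    np.ones(len(X_pos_up), dtype=int),
])
print(X_bal.shape, np.bincount(y_bal))
```

Augmentation like this should be applied only to the training set, after splitting; otherwise near-copies of the same examples can end up in both sets and inflate the test score.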
