Application Programming Interface (API)

One of the best ways of gathering data for data science projects is through the use of APIs. API stands for Application Programming Interface, and it’s something that allows computer programs to talk to each other. Many APIs operate within your operating system directly, but an API is also a way of accessing web data and pulling in data live from multiple sources.

The most common kind of API used in data science is the REST API, where REST stands for Representational State Transfer, the software architectural style of the World Wide Web. With a REST API, we access data on a web page via HTTP, and the data that we get is usually in JSON, or JavaScript Object Notation, format. What’s nice about this is that by using this
REST API, we can send data directly from that webpage into other programs for analysis. Also, REST APIs are language agnostic.

Some of the most popular APIs are social APIs. Facebook, for example, has one, and Twitter has a very commonly used API that you can get data from. Google Talk is one of the most popular, and Foursquare and SoundCloud are also very popular social APIs. You can build applications that connect with these social networks directly, and in some situations you can pull the data in for a data science project. Also very common are visual APIs, such as those for Google Maps, YouTube, AccuWeather, Pinterest, and Flickr.

Let me give you a brief example of a very,
very simple API through R. I’m going to get some data from a website called Ergast, which
is for automobile racing data. And here is what you see if you go to Ergast.com. It’s not much but if you click on that link,
it’ll take you to the developer API that has information about Formula One races from 1950
to 2014. All you have to do is create a URL. I’m going to do a particular search for the
people who won in 1957. This is my data in JSON format, the people
who won each race in 1957. Now that I know what the URL is, I can go
to R. I’m going to use a package called jsonlite, which is a way of accessing JSON data from
the web along with its dependency, curl, which is for URLs. If you don’t have those installed already,
you’ll want to use install.packages(), and I’ll use require() here to get jsonlite loaded. It loads curl by default. Then what I’m going to do is take that URL from earlier, which is right here, pass it to the fromJSON() command, and feed the result into an
object called F1 for Formula One. Now, if you want to see what’s in there, it’s
a list object. And so that means that it’s really big and
just lots and lots of text. But it is structured the same way we saw earlier. In fact, I’m going to pull up the structure
here and this is a way of indicating the nested structure. But what I’m going to do is get a piece of that list from the overall object F1: I go down a level to MRData, then another level to RaceTable, and so on until I get to Driver, and I feed that into an object called driver. Then I’ll get the column names for that. There are seven columns; I’m going to take four
of them and put them in a different order and then here are the first five drivers listed
in this data set. We have the first name, the last name, Juan
Fangio who won the first race from Argentina, one of the greatest drivers ever. And so what we have here is a wonderful way
of accessing this structured JSON data from the web and feeding it into R so we can use
it in our own analyses. In conclusion, APIs make structured web data
easy to get. You can get that data and put it directly
into the program you’re using for analysis. APIs are one of the best tools for data science
in terms of retrieving data that’s structured and can be incorporated into your analysis
to get the insights that you need.
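
To make the R steps above a little more concrete, here is a minimal sketch of the same idea. It is not the exact code from the walkthrough. Instead of calling the live service, it parses a small hand-written JSON sample that imitates the nested layout described above (MRData, then RaceTable, and so on down to Driver). The field names inside the sample, the object names sample_json, races and winners, and the choice of simplifyVector = FALSE are all assumptions made for illustration.

```r
# Assumes jsonlite is installed: install.packages("jsonlite")
library(jsonlite)

# Hand-written, abbreviated sample imitating the nested layout assumed for
# the API response (MRData > RaceTable > Races > Results > Driver).
# This is NOT real API output; the field names are assumptions.
sample_json <- '{
  "MRData": {
    "RaceTable": {
      "season": "1957",
      "Races": [
        {"round": "1", "raceName": "Argentine Grand Prix",
         "Results": [{"position": "1",
                      "Driver": {"driverId": "fangio",
                                 "givenName": "Juan",
                                 "familyName": "Fangio",
                                 "nationality": "Argentine"}}]},
        {"round": "3", "raceName": "Indianapolis 500",
         "Results": [{"position": "1",
                      "Driver": {"driverId": "hanks",
                                 "givenName": "Sam",
                                 "familyName": "Hanks",
                                 "nationality": "American"}}]}
      ]
    }
  }
}'

# fromJSON() accepts a JSON string; with a live endpoint you would pass the
# URL string instead. simplifyVector = FALSE keeps everything as plain
# nested lists, so each level is walked explicitly below.
f1 <- fromJSON(sample_json, simplifyVector = FALSE)

# Go down the levels: MRData, then RaceTable, then the list of races.
races <- f1$MRData$RaceTable$Races

# For each race, take the Driver entry of the first (winning) result and
# turn it into a one-row data frame; then stack the rows.
winners <- do.call(rbind, lapply(races, function(race) {
  d <- race$Results[[1]]$Driver
  data.frame(
    driverId    = d$driverId,
    givenName   = d$givenName,
    familyName  = d$familyName,
    nationality = d$nationality,
    stringsAsFactors = FALSE
  )
}))

# Inspect the column names, then keep a subset in a different order.
colnames(winners)
winners <- winners[, c("familyName", "givenName", "nationality")]

# Show the first rows (at most five).
head(winners, 5)
```

Walking the lists by hand is more verbose than relying on jsonlite's default simplification into data frames, but it makes each level of the nesting visible, which is the point of the example.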
