Cars93 Dataset In R Download

In this R tutorial, we will learn some basic functions using a used cars dataset. With this dataset, we will use data analysis to explore how a car's mileage relates to its price.

Install and Load Packages

Below are the packages and libraries that we will need to load to complete this tutorial.

Input:

install.packages("ggplot2")

library(ggplot2)

Download and Load the Used Cars Dataset

Since we will be using the used cars dataset, you will need to download it first. The dataset is already packaged and available for download from the dataset page or directly from here: Used Cars Dataset – usedcars.csv

Input:

usedcars <- read.csv("usedcars.csv", stringsAsFactors = FALSE)
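To make the effect of stringsAsFactors = FALSE concrete, here is a minimal sketch that does not need the real file. It assumes a made-up three-row data frame (sample_cars and its values are hypothetical, not rows from usedcars.csv), writes it to a temporary CSV, and reads it back the same way the tutorial does.

```r
# Hypothetical three-row sample standing in for usedcars.csv (values are made up)
sample_cars <- data.frame(
  year  = c(2011L, 2012L, 2010L),
  model = c("SEL", "SE", "SES"),
  price = c(20000L, 17500L, 15000L),
  stringsAsFactors = FALSE
)

# Write the sample to a temporary CSV file, then read it back as in the tutorial
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_cars, csv_path, row.names = FALSE)
cars_read <- read.csv(csv_path, stringsAsFactors = FALSE)

# Text columns stay character vectors instead of being converted to factors
print(class(cars_read$model))

# Quick sanity checks: dimensions and first rows
print(dim(cars_read))
print(head(cars_read))
```

With the real dataset, the same dim() and head() calls on usedcars are a quick way to confirm the import worked before going further.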

View the Used Cars Dataset Data

Once the data is imported, you can run a series of commands to inspect the used cars data.

A few that I chose to use are below:

str()

summary()

range()

diff()

str()

The str() function compactly displays the internal structure of an R object and is a diagnostic alternative to summary(). For a data frame, it prints one line per column showing the column's name, type, and first few values.

Input:

str(usedcars)

Output:

'data.frame':   150 obs. of  6 variables:
 $ year        : int  2011 2011 2011 2011 2012 2010 2011 2010 2011 2010 ...
 $ model       : chr  "SEL" "SEL" "SEL" "SEL" ...
 $ price       : int  21992 20995 19995 17809 17500 17495 17000 16995 16995 16995 ...
 $ mileage     : int  7413 10926 7351 11613 8367 25125 27393 21026 32655 36116 ...
 $ color       : chr  "Yellow" "Gray" "Silver" "Gray" ...
 $ transmission: chr  "AUTO" "AUTO" "AUTO" "AUTO" ...

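summary()

The summary() function is a generic function used to produce summaries of objects, including the results of model-fitting functions. For a data frame, it reports the minimum, quartiles, mean, and maximum of each numeric column, and the length, class, and mode of each character column.

Input:

summary(usedcars)

Output: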

      year         model               price          mileage
 Min.   :2000   Length:150         Min.   : 3800   Min.   :  4867
 1st Qu.:2008   Class :character   1st Qu.:10995   1st Qu.: 27200
 Median :2009   Mode  :character   Median :13592   Median : 36385
 Mean   :2009                      Mean   :12962   Mean   : 44261
 3rd Qu.:2010                      3rd Qu.:14904   3rd Qu.: 55125
 Max.   :2012                      Max.   :21992   Max.   :151479

    color           transmission
 Length:150         Length:150
 Class :character   Class :character
 Mode  :character   Mode  :character

In addition, you can summarize a single column of the used cars dataset. For example, let's compute a summary of only the year of the used cars.

Input:

summary(usedcars$year)

Output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2000    2008    2009    2009    2010    2012

range()

The range() function returns a vector containing the minimum and maximum of all the given arguments.

Input:

range(usedcars$price)

Output:

[1]  3800 21992

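In addition, you can apply the diff() function to the result of range(). diff() returns lagged differences between consecutive values, so for the two-element vector from range() it returns the maximum minus the minimum.

Input:

diff(range(usedcars$price))

Output:

[1] 18192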
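As a small illustration of why diff() is wrapped around range(), the sketch below compares diff() on a full vector with diff() on its range. The prices vector is hypothetical and not taken from usedcars.csv.

```r
# Hypothetical prices, not taken from usedcars.csv
prices <- c(10, 15, 12, 30)

# diff() on the full vector gives the lagged differences between neighbours
step_changes <- diff(prices)

# range() reduces the vector to c(min, max), so diff() returns a single spread
price_range  <- range(prices)
price_spread <- diff(price_range)

print(step_changes)
print(price_range)
print(price_spread)
```

On the full vector diff() returns one value per adjacent pair, while on the range it collapses to a single number, the same as max(prices) - min(prices).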

Quantile Function of Probabilities

The quantile() function produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.

Special cases of quantiles:

tertiles - 3 parts

quintiles - 5 parts

deciles - 10 parts

percentiles - 100 parts

The difference between the first quartile (Q1) and the third quartile (Q3) is known as the interquartile range (IQR).

Input:

Output:
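The relationship between the quartiles and the IQR can be sketched on a small vector. The values below are hypothetical and not taken from usedcars.csv; the sketch uses base R's quantile() with its default probabilities and the built-in IQR() function.

```r
# Hypothetical values, not taken from usedcars.csv
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# quantile() with its default probs returns the minimum, quartiles, and maximum
quartiles <- quantile(values)

# The IQR is the third quartile minus the first quartile
iqr_manual  <- unname(quartiles["75%"] - quartiles["25%"])
iqr_builtin <- IQR(values)

print(quartiles)
print(iqr_manual)
print(iqr_builtin)
```

The same two calls applied to usedcars$price would give the quartiles and IQR of the used car prices.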

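The probs parameter takes a vector of probabilities between 0 and 1 and specifies which quantiles to compute. When a requested quantile falls between two observations, quantile() interpolates between them, which is how it handles ties among values and data sets with no single middle value. For example, the following returns the 1st and 99th percentiles of price.

Input:

quantile(usedcars$price, probs = c(0.01, 0.99))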

Output:

      1%      99%
 5428.69 20505.00

seq()

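The seq() function is used to generate vectors of evenly spaced values. Passing such a vector as the probabilities to quantile() splits the data into equal parts; a step of 0.20 gives the quintiles.

Input:

quantile(usedcars$price, seq(from = 0, to = 1, by = 0.20))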

Output:

     0%     20%     40%     60%     80%    100%
 3800.0 10759.4 12993.8 13992.0 14999.0 21992.0
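To see what seq() contributes on its own, the following sketch prints the probability vector before handing it to quantile(). The values vector is hypothetical and chosen only so that the quintile boundaries are easy to follow.

```r
# Hypothetical values, not taken from usedcars.csv
values <- 1:11

# Six evenly spaced probabilities from 0 to 1 in steps of 0.20
probs <- seq(from = 0, to = 1, by = 0.20)
print(probs)

# Using them as probabilities yields the quintile boundaries
quintiles <- quantile(values, probs = probs)
print(quintiles)
```

Changing by to 0.10 would give deciles instead, since seq() then produces eleven probabilities.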

Used Car Boxplots

A boxplot is a common visualization of the five-number summary (minimum, first quartile, median, third quartile, and maximum). The boxplot() function produces box-and-whisker plot(s) of the given (grouped) values. As you will see below, the median is the dark line in the middle of the box.

In addition, you can pass extra parameters such as main and ylab to add a title to the figure and label the y-axis (vertical axis).
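The link between a boxplot and the five-number summary can be checked numerically. This is a minimal sketch on hypothetical values (not from usedcars.csv) using base R's fivenum() and the stats component that boxplot() returns; plot = FALSE is used so that nothing is drawn.

```r
# Hypothetical values, not taken from usedcars.csv
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_numbers <- fivenum(values)

# boxplot() returns the numbers it would draw; plot = FALSE skips the drawing
box_stats <- as.vector(boxplot(values, plot = FALSE)$stats)

print(five_numbers)
print(box_stats)
```

The two results agree here because this vector has no outliers; when outliers are present, the whisker ends in stats stop at the most extreme non-outlying points rather than at the minimum and maximum.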

Boxplot of Used Car Prices

Input:

boxplot(usedcars$price,
        main = "Boxplot of Used Car Prices",
        ylab = "Price ($)")

Output:

Boxplot of used car prices with a median of total cost

Boxplot of Used Car Mileage

Input:

boxplot(usedcars$mileage,
        main = "Boxplot of Used Car Mileage",
        ylab = "Odometer (mi.)")

Output:

Boxplot to view the used car mileage with a median of mileage.

Used Car Histograms

Histograms are another way to graphically depict the spread of a numeric variable. A histogram is similar to a boxplot in that it divides the variable's values into a predefined number of portions, called bins, that act as containers for values. The height of each bar shows how many values fall into that bin.
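The binning behind a histogram can be inspected without drawing anything. The sketch below assumes a hypothetical mileage vector in thousands of miles (mileage_k is not from usedcars.csv) and explicit bin edges every 25 thousand miles, then reads the counts from the object that hist() returns.

```r
# Hypothetical odometer readings in thousands of miles, not from usedcars.csv
mileage_k <- c(5, 12, 18, 25, 27, 31, 36, 44, 52, 95)

# Four bins with edges at 0, 25, 50, 75, and 100; plot = FALSE skips the drawing
bins <- hist(mileage_k, breaks = seq(0, 100, by = 25), plot = FALSE)

# Bin edges and the number of values that fall into each bin
print(bins$breaks)
print(bins$counts)
```

By default each bin includes its right edge, so the reading of 25 is counted in the first bin. If breaks is omitted, hist() chooses the bins itself.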

Histogram of Used Car Prices

Input:

hist(usedcars$price,
     main = "Histogram of Used Car Prices",
     xlab = "Price ($)")

Output:

Histogram to view used car prices.

Histogram of Used Car Mileage

Input:

hist(usedcars$mileage,
     main = "Histogram of Used Car Mileage",
     xlab = "Odometer (mi.)")

Output:

Histogram to view the used car mileage.

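Table

The table() function uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels. For a single column, it counts how often each value occurs. The prop.table() function converts those counts into proportions.

Input:

table(usedcars$color)
model_table <- table(usedcars$model)
prop.table(model_table)
round(prop.table(table(usedcars$color)) * 100, digits = 1)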

Output:

 Black   Blue   Gold   Gray  Green    Red Silver  White Yellow
    35     17      1     16      5     25     32     16      3

       SE       SEL       SES
0.5200000 0.1533333 0.3266667

 Black   Blue   Gold   Gray  Green    Red Silver  White Yellow
  23.3   11.3    0.7   10.7    3.3   16.7   21.3   10.7    2.0
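The step from counts to proportions to percentages can be followed on a tiny example. The colors vector below is hypothetical (six made-up cars, not rows from usedcars.csv).

```r
# Hypothetical colors for six cars, not taken from usedcars.csv
colors <- c("Black", "Silver", "Black", "Red", "Silver", "Black")

# Counts per color, sorted by color name
color_counts <- table(colors)

# Proportions of the total, which sum to 1
color_props <- prop.table(color_counts)

# Percentages rounded to one decimal place
color_pct <- round(color_props * 100, digits = 1)

print(color_counts)
print(color_props)
print(color_pct)
```

Multiplying by 100 and rounding is only a presentation step; the proportions from prop.table() carry the same information.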

Scatterplot

A scatterplot pairs up the values of two quantitative variables in a data set and displays them as geometric points inside a Cartesian diagram.

Input:

plot(x = usedcars$mileage, y = usedcars$price,
     main = "Scatterplot of Price vs. Mileage",
     xlab = "Used Car Odometer (mi.)",
     ylab = "Used Car Price ($)")

Output:

Scatterplot to visualize the price versus the mileage for each car

Value Matching

Let's say you wanted a vehicle in a specific color and only wanted to return the cars whose color matched. The match() function returns a vector of the positions of (first) matches of its first argument in its second.

%in%

%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.

Input:

usedcars$conservative <- usedcars$color %in% c("Black", "Gray", "Silver", "White")

table(usedcars$conservative)

Output:

FALSE  TRUE
   51    99

As we can see from the above output, 99 cars are TRUE, meaning their color is Black, Gray, Silver, or White. The remaining 51 cars do not meet the color criteria.
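The difference between match() and %in% is easiest to see side by side. The sketch below uses a hypothetical four-element colors vector (not from usedcars.csv) and the same set of conservative colors as above.

```r
# Hypothetical colors for four cars, not taken from usedcars.csv
colors <- c("Yellow", "Gray", "Silver", "Red")
conservative <- c("Black", "Gray", "Silver", "White")

# match() gives the position of each color in the lookup vector, or NA if absent
positions <- match(colors, conservative)

# %in% reduces that to a logical vector: TRUE where a match exists
is_conservative <- colors %in% conservative

print(positions)
print(is_conservative)
print(table(is_conservative))
```

Because %in% returns TRUE or FALSE for every element, its result can be stored directly as a new column and counted with table(), as done with usedcars$conservative.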

Posted by: redblues003.blogspot.com

Source: https://www.engineeringbigdata.com/used-cars-data-set-analysis-r/