Budgeting and planning is fun! Am I crazy???

Please tell me that you think budgeting and planning are fun too! This article is NOT going to be all about finance acronyms. When I went through Bynder’s budget process last year, there were a lot…

Smartphone

独家优惠奖金 100% 高达 1 BTC + 180 免费旋转




Used Car Data Set Modeling

Global sales of automobiles are forecast to fall to just under 64 million units in 2020, down from a peak of almost 80 million units in 2017

Driving a car is important for people in general because it provides status and the opportunity for personal control and autonomy. In sparsely populated areas, owning a car is even more important, since it provides the only opportunity for travelling long distances due to a lack of public transport. Global sales of automobiles are forecast to fall to just under 64 million units in 2020, down from a peak of almost 80 million units in 2017. This blog post can help the public if they want to sell their vehicles in different ways because this shows the reduction of price with vehicle’s condition. Each factor has its own effect on the vehicle price. World car sale trend:

Problem Statement

Which attribute contribute the most towards determining the car price?

Goal

Our goal is to make a model from which we can predict the attributes affecting the Used Car price.

Mini-problems(Sub-questions):

Understanding the German used car market:

This statistic shows the revenue of the used car market in Germany between 2000 and 2019. In 2019, the revenue amounted to roughly 89.73 billion euros. The biggest opportunity for growth of used car industry in Germany is that buying a used car is a more affordable alternative to purchasing a new car. When new cars are comparatively affordable, the appeal of used cars diminishes, which reduces industry sales. Likewise, as new car prices increase, demand for used cars grows. The price of new cars is expected to marginally increase in 2021, representing a potential opportunity for the industry.

Understanding the Data set:

Description:

Features:

From the total columns give, we remove the unnecessary columns which contain the data which was not important for the vehicle’s description. The main features we focus on were:

Our main focus is to clean the Data set and remove the garbage values. The given Data set is from Germany and it contains some words not from English dictionary. First step is to remove the Null values from this data set. So what we did to remove them is if the null values in the data set are less than 10% of the total data, remove them. if they are larger than this, edit them to a average range of data. Like there is a column which contain almost 80% null values of data set and it was unnecessary too, so we removed it.

Variation in price with Vehicle Type

Type of vehicle affects its price. The Convertible and Coupe have higher prices as compared to the other types because these cars are rare and popular since the starting age of the vehicles. The German market shows the variation of price with its vehicle type and it represent like this:

Above graph shows the variation of price according to the type of vehicle but if we see the average prices of all vehicles, the SUV’s are on the top because of there more resale values. It’s not necessary that a rare and expensive car has a good resale value.

Variation in price by Brand:

Vehicle Brand is another important factor which decides the price of vehicle. The main manufacturers in Germany are Audi, Porsche, BMW, and Mercedes. It means that these brands will have high priced vehicles. Below is the graph shows the Variation of vehicle price by its brand.

The graph below shows the average prices of vehicle brands. Porsche is on top because rarity and popularity of the brand. The vehicles with low price belongs to commonly used brands. There can be reason that these prices are not as same as the original used car price because these are the averages prices and the models of car range from 1980’s to 2019.

Effect of Covered Distance on price:

The factor which affects the price of a used vehicle is the covered distance. More a car drives, more its price reduces. Distance covered affects the engine of a car, so it directly affects the price of it. The average distance covered which doesn’t have serious effect on car price ranges from 10,000 to 15,000 kilometres. As the graph shown below, the price of car starts dropping after the covered distance is more than 15,000 kilometres.

Average price of car with Gearbox Configuration:

Gearbox configuration is an important factor for which people look for while buying a car, so it affects the car price. Automatic gear boxes are more complex than manuals, and you’ll generally find that automatic cars are more upmarket. This can also mean insurance premiums are often higher for automatic car, as the repairs involved are more expensive. The graph below shows the comparison of average car’s prices with gearbox configuration.

Average price of Vehicle with Fuel Type:

Fuel type of a car changes its price. According to a MIT center for energy and environmental policy research, “Germany imposes ownership taxes based on the vehicle’s engine size and carbon-dioxide emissions. These taxes vary quite substantially across vehicles within countries. For example, the purchase tax can be several thousand euros higher for a vehicle with very low fuel economy than for a vehicle with very high fuel economy. The taxes also vary by an order of magnitude across countries.” The graph below shows the average price of vehicle with its fuel type. Hybrid is one of the most expensive cars because of its engine mechanism.

Effect of postal code on price:

Postal code is one those factor which affects the price of vehicle sometimes and sometimes not. The rural areas have specific type of vehicles like trucks, mountain areas have jeeps and SUV, and cities have mostly every type of vehicle. So the price of vehicle varies from place to place sometimes. The graph below shows the relation of price with postal code.

Correlation of Data set:

This is a correlation graph of the whole data set which shows the relation of all attributes. Some of them might not have correlation, but some factors have important correlation.

Process:

Through the machine Learning, we aim to distinguish the car attribute which contribute the most towards car price.

For machine learning part, we have to convert data types of all columns to integer as we use sklearn to create our models and sklearn library does not work with other data types. We have sklearn’s built in LabelEncoder that allows to change datatype to integer. First we check which columns have non-integer datatype.

Now when all datatypes are in the form that we want, we then divide our dataframe into feature set and target variable and convert them to numpy array, as SkLearn exclusively works with array-like objects.

Once all the data is in numpy array form, the only thing left before applying our regression models is to split it into train and testdata and apply scaling to standardize our train data.

We then train three different regression models on our training data to get variety of answers.

First we apply Linear regression model to the training data and get feature importance for different features. We get the following result, which clearly indicates that kilometer driven by the car play the most important role in determining car price:

2. DecisionTreeRegressor:

Then we get Decisiontreeregressor algorithm to decide the feature importance and get the following results, which shows that registration year plays the most significant role.

3. RandomForestRegressor:

At last, we train RandomForestRegressor model on our dataset and again got registration year as the most significant attribute in deciding second hand car price.

So, to make final call about which algorithm answer to choose as the final, we then did each model’s prediction on the test data to find the Mean Squared Error(MSE), and the results are:

Since Random forest gives the least MSE on test data. We consider the registration year as the most important factor in deciding the price of the used car.

Add a comment

Related posts:

Web Application in 10 minutes with Streamlit

Creating Web Application for Data Science or Machine Learning projects with Streamlit in Python in just 15-20 lines of code.

Build native apps with GraalVM

GraalVM is a polyglot virtual machine that provides a shared runtime to execute applications written in multiple languages like Java, C, Python, and JavaScript. GraalVM also provides a Native Image…

What is paleo food

A Thai-inspired shrimp salad has got to be one of the most refreshing and satisfying meals out there. When you order it in a restaurant, the dressing tends to be loaded with sugar, so I made this…