context
a supermarket is a self-service shop offering a wide variety of food, beverages and household products, organized into sections. this kind of store is larger and has a wider selection than earlier grocery stores, but is smaller and more limited in the range of merchandise than a hypermarket or big-box market. in everyday usage, however, "grocery store" is synonymous with supermarket, and is not used to refer to other types of stores that sell groceries.
content
the dataset contains data on different stores of a supermarket company, keyed by store id. for ease of use, the store ids have been converted to positive integers.
store id: (index) id of the particular store.
store_area: physical area of the store in square yards.
items_available: number of different items available in the corresponding store.
daily_customer_count: average number of customers who visited the store per day over a month.
store_sales: sales (in us $) that the store made.
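as a minimal sketch of how these columns could be read and checked, the snippet below parses csv text with python's standard library. the column names are taken from the list above, and the real file's headers may differ in case or spacing. the two sample rows, the helper name load_stores, and the assumption that every value is an integer are illustrative only and do not come from the dataset.

```python
import csv
import io

# Hypothetical two-row sample; the values are made up for illustration only.
SAMPLE_CSV = """store id,store_area,items_available,daily_customer_count,store_sales
1,1500,1800,500,60000
2,1400,1700,300,40000
"""

# Column names as described above (assumed to match the file's header row).
EXPECTED_COLUMNS = [
    "store id",
    "store_area",
    "items_available",
    "daily_customer_count",
    "store_sales",
]


def load_stores(text):
    """Parse CSV text into a list of dicts, converting every value to int."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{col: int(row[col]) for col in EXPECTED_COLUMNS} for row in reader]


stores = load_stores(SAMPLE_CSV)
print(len(stores), "stores loaded")
```

to use the sketch with the downloaded file, read the file's contents into a string and pass it to load_stores in place of the sample text.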
what I have done
the dataset was downloaded from kaggle.com.
link from where the dataset was downloaded: https://www.kaggle.com/datasets/surajjha101/stores-area-and-sales-data
the link to the analysis and prediction: https://github.com/akshitanchan/ds-ml-notebooks/blob/main/sales-analysis-visualisation-and-prediction/sales-analysis-visualisation-and-prediction.ipynb
the python notebook presents a concise analysis and a linear regression model for a dataset of stores, covering their areas, available items, daily customer counts, and sales. the process includes data loading, exploration, preprocessing, visualization, and model building. the analysis checks for null values, drops irrelevant columns, explores descriptive statistics and skewness, visualizes feature distributions, creates joint plots, heatmaps, and pair plots, scales the data, trains a linear regression model, and evaluates its performance. the results are summarized with key metrics, including mean absolute error (mae), mean squared error (mse), root mean squared error (rmse), r-squared, root mean squared log error (rmsle), mean absolute percentage error (mape), and adjusted r-squared. the visualizations include scatter plots with regression lines, made with seaborn and plotly express, that compare predicted and actual sales.
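to make the listed metrics concrete, here is a minimal sketch that computes them from their standard textbook definitions using only python's standard library. it is not the notebook's own code, which may rely on library helpers instead. the function name regression_metrics, the n_features parameter, and the sample numbers are assumptions made for illustration. mape is returned as a fraction rather than a percentage, and the sketch assumes non-zero actual values and more samples than features plus one.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Compute common regression metrics from actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # r-squared: 1 minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # Adjusted r-squared penalizes the number of features used by the model.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    # RMSLE compares log(1 + value), so it needs values greater than -1.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )

    # MAPE as a fraction; multiply by 100 for a percentage.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {
        "mae": mae,
        "mse": mse,
        "rmse": rmse,
        "r2": r2,
        "adjusted_r2": adj_r2,
        "rmsle": rmsle,
        "mape": mape,
    }


# Made-up actual and predicted sales, for illustration only.
metrics = regression_metrics(
    [100, 200, 300, 400], [110, 190, 310, 390], n_features=1
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

adjusted r-squared is always at most r-squared, and the gap grows as more features are added relative to the number of samples, which is why it is reported alongside the plain r-squared.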