This report analyzes supermarket transaction data with the R programming language in order to determine sales and predict regular customers. Several analyses and data visualizations have been carried out. The data covers three years, 2013, 2014, and 2015, with one dataset provided for each day. In total, 1073 CSV files are available, each containing the details of the customers, their orders, and the quantities they purchased from the supermarket. For the analysis, three CSV files of the same date have been taken and analyzed. The visualizations have been produced to meet the research criteria and the requirements of the research question, and the three years of transactional information have been analyzed to develop suitable results.
In this research, customer analysis has been carried out on transactional datasets that cover three years of supermarket transactions. The data is recorded date-wise, with one dataset for each day, and all of it is supermarket-related information. Within this large collection, nine datasets, three for each year, have been chosen to execute the entire research process. Holiday details are missing from the data. All of the work has been done in the R language, where linear regression is used to meet the requirements (Kumari and Yadav, 2018). As a first step, three days of data have been taken from the full collection.
The datasets hold the transactional details of the supermarket for 2013, 2014, and 2015, and each dataset contains the same kind of information. The dataset for 2 January 2013 is described in detail here. It records the sales date, time, receipt number, customer number, and other transactional details, along with the barcode of each product. It also contains the total sales and the total records, and indicates whether each commodity carries an offer. Analysis of the dataset shows that most commodities do not carry any offer; only a few do. The 2014 dataset likewise records the offer given by type of commodity, and again most commodities carry no offer. The data contains both quantitative and qualitative values, that is, numerical and string information, and some of it is imperfect, so preprocessing in R is needed to convert the dataset into a suitable form before the analysis. The total number of purchases and the total money spent on each purchase have been calculated in order to understand the customer analytics of the organization. Regression analysis has been done on the dataset to develop a suitable result (Maulud and Abdulazeez, 2020). Clustering analysis and neural network analysis have also been executed in this research process.
All of this work has been done on the RStudio software platform.
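The calculation of purchase counts and money spent can be sketched in base R as follows. This is only an illustration under stated assumptions: the report does not list its exact column names, so `receipt_no`, `customer_no`, `quantity`, and `total_sale_incl_gst` are hypothetical, and the rows are made-up stand-in data.

```r
# Hypothetical transaction lines; column names and values are assumptions,
# not the real supermarket schema.
transactions <- data.frame(
  receipt_no = c(101, 101, 102, 103, 103, 103),
  customer_no = c("C1", "C1", "C2", "C1", "C1", "C1"),
  quantity = c(2, 1, 5, 1, 1, 3),
  total_sale_incl_gst = c(4.50, 2.25, 10.00, 1.20, 3.30, 6.00)
)

# One row per receipt: items bought and money spent on that purchase.
per_receipt <- aggregate(
  cbind(quantity, total_sale_incl_gst) ~ receipt_no + customer_no,
  data = transactions,
  FUN = sum
)

# One row per customer: number of purchases (receipts) and total spend.
purchases <- tapply(per_receipt$receipt_no, per_receipt$customer_no, length)
total_spent <- tapply(per_receipt$total_sale_incl_gst, per_receipt$customer_no, sum)
per_customer <- data.frame(
  customer_no = names(purchases),
  purchases = as.vector(purchases),
  total_spent = as.vector(total_spent)
)

print(per_customer)
```

Customers with many purchases in `per_customer` would be the natural candidates for the "regular customer" label, although the threshold for that label is a choice the report does not specify.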
Discussions
To meet the requirements, linear regression is applied to all of the selected datasets and its accuracy score is evaluated (Liu et al., 2019). The research questions have been answered through data visualization across the three years of supermarket datasets, and regression analysis has been done on the target column offer based on the value “total-sale-amount-inclusive GST”. The resulting model is intended to help the organization identify higher profit rates.
Data visualization transforms data into a more understandable format and helps identify outliers, trends, and patterns among data attributes. In this process, different charts, graphs, and mapped data have been generated that describe the patterns in these large data attributes. After all the data was merged into one data frame, the visualization process described above was applied. Various packages have been imported to execute the research process.
The above figure describes the merging of the datasets. Three years of datasets are provided, each holding day-by-day transactional details. Within this large collection, nine datasets have been chosen for regression analysis. The figure also shows the datasets being read into the RStudio platform.
The above figure describes the data concatenation, where the three large datasets have been merged into a single dataset. Visualization has been done on this merged dataset to provide a comparative analysis between the data attributes.
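Since the figures themselves are not reproduced here, the reading and concatenation step can be sketched in base R. The file names, folder, and columns below are hypothetical stand-ins written to a temporary directory so the example runs anywhere; the real files would be read from wherever the daily CSVs are stored.

```r
# Write three tiny stand-in files, one per year.
# File names and columns are assumptions, not the real supermarket files.
data_dir <- file.path(tempdir(), "daily_sales")
dir.create(data_dir, showWarnings = FALSE)
for (year in c(2013L, 2014L, 2015L)) {
  day <- data.frame(
    sale_date = sprintf("%d-01-02", year),
    receipt_no = 1:2,
    quantity = c(1, 3),
    offer = c("NO", "YES")
  )
  write.csv(day, file.path(data_dir, sprintf("sales_%d.csv", year)), row.names = FALSE)
}

# Read every CSV in the folder into a list of data frames.
csv_paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
daily <- lapply(csv_paths, read.csv, stringsAsFactors = FALSE)

# Stack the data frames row-wise; rbind needs identical column names in each file.
sales <- do.call(rbind, daily)

# Derive the year so the three years can be compared later.
sales$year <- as.integer(substr(sales$sale_date, 1, 4))

str(sales)
```

Reading the files into a list first and binding once with `do.call(rbind, ...)` avoids growing a data frame inside a loop, which matters when the number of daily files is large.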
Data preprocessing is an important step in this analysis. Here, null values are dropped and columns are converted to numerical form using R code. The "is.null" command is used to check for null values present in the datasets.
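A minimal sketch of this preprocessing step is given below, using made-up data and hypothetical column names. One point worth noting: in R, `is.null()` tests whether a whole object is `NULL`, while missing cells inside a data frame are `NA` and are detected with `is.na()` or `complete.cases()`, so the sketch uses those two functions.

```r
# Stand-in data with a missing value and a non-numeric entry (assumed layout).
raw <- data.frame(
  receipt_no = c(1, 2, 3, 4),
  quantity = c("2", "x", "5", NA),
  total_sale_incl_gst = c(4.5, 3.0, NA, 7.2),
  stringsAsFactors = FALSE
)

# Count missing cells per column before cleaning.
missing_per_column <- colSums(is.na(raw))

# Convert text to numbers; unparseable entries such as "x" become NA.
raw$quantity <- suppressWarnings(as.numeric(raw$quantity))

# Keep only rows with no missing value in any column.
clean <- raw[complete.cases(raw), ]

print(missing_per_column)
print(clean)
```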
The above pictorial representation shows the merged data, listing the data values for each attribute after the concatenation.
The above figure shows the conversion of categorical values from string to numerical form, which gives better visualization at prediction time. The offer column in the large dataset has been transformed in this way.
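One common way to perform such a conversion in R is through a factor, sketched below. The labels "NO" and "YES" are assumptions, since the report does not show the actual values stored in the offer column.

```r
# Hypothetical offer labels as they might appear in the raw data.
offers <- c("NO", "YES", "NO", "NO", "YES")

# Fix the level order explicitly so the numeric codes are predictable.
offer_factor <- factor(offers, levels = c("NO", "YES"))

# Factor codes start at 1, so subtract 1 to get a 0/1 indicator.
offer_code <- as.integer(offer_factor) - 1L

print(table(offers, offer_code))
```

Setting `levels` by hand is a deliberate choice here: without it, R orders levels alphabetically, and the meaning of each numeric code would depend on the labels present in the data.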
The above figure displays the histogram plot that has been generated based on the 2014 transactional dataset.
Here a histogram plot developed from the three transactional datasets is displayed. This customer-based histogram has been produced using R code.
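A histogram of this kind can be sketched with base R as follows. The values are simulated, because the report's data is not available here, and the variable name `spend_per_customer` is a hypothetical choice.

```r
# Simulated spend per customer; a stand-in for the real transactional data.
set.seed(1)
spend_per_customer <- round(rexp(200, rate = 1 / 20), 2)

# Compute the bins without drawing, so the counts can be inspected directly.
h <- hist(spend_per_customer, breaks = 10, plot = FALSE)
print(data.frame(bin_start = head(h$breaks, -1), count = h$counts))

# Draw the chart only in an interactive session such as RStudio.
if (interactive()) {
  plot(h, main = "Spend per customer", xlab = "Amount spent", ylab = "Customers")
}
```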
The above figure shows the total sale amount across all three years, based on the three transactional datasets.
This scatter plot has been obtained from the data, with the offer and the quantity taken into consideration.
The above figure describes the scatter plot for which all the data for the year 2014 is taken into consideration and evaluated.
The above figure shows the mean, median, and maximum values generated from these datasets. The dataset has been used here to develop a better output result.
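The summary values mentioned above can be computed per year with a few lines of base R. The figures below are invented for illustration, and `total_sale_incl_gst` is an assumed column name modelled on the "total-sale-amount-inclusive GST" field named in the report.

```r
# Made-up sale amounts for the three years.
sales <- data.frame(
  year = c(2013, 2013, 2014, 2014, 2015, 2015),
  total_sale_incl_gst = c(10, 30, 5, 15, 20, 40)
)

# Split the amounts by year, then compute the three statistics for each group.
by_year <- split(sales$total_sale_incl_gst, sales$year)
stats <- t(sapply(by_year, function(x) {
  c(mean = mean(x), median = median(x), max = max(x))
}))

print(stats)
```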
In this figure, the actual price is compared with the predicted price, with all the related prices taken into consideration.
The above figure shows the linear regression process executed on the RStudio platform, based on the transactional dataset.
The above figure shows the linear regression graph, in which a red regression line shows how the prediction changes. The relationship between the two variables is mapped through this graphical representation.
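The regression workflow described by these figures can be sketched as below. This is not the report's actual model: the data is simulated, and using `quantity` as the single predictor of `total_sale` is an assumption made so that a regression line can be drawn for two numeric variables.

```r
# Simulated data: sale amount grows roughly linearly with quantity.
set.seed(42)
n <- 100
quantity <- sample(1:10, n, replace = TRUE)
total_sale <- 5 + 3 * quantity + rnorm(n, sd = 1)
sales <- data.frame(quantity, total_sale)

# Hold out the last 20 rows to compare actual and predicted values.
train <- sales[1:80, ]
holdout <- sales[81:100, ]

# Fit a simple linear regression on the training rows.
fit <- lm(total_sale ~ quantity, data = train)

# Predict on the held-out rows and measure the error.
predicted <- predict(fit, newdata = holdout)
rmse <- sqrt(mean((holdout$total_sale - predicted)^2))
r_squared <- summary(fit)$r.squared

print(coef(fit))
print(c(rmse = rmse, r_squared = r_squared))

# Scatter plot with the fitted line in red (interactive sessions only).
if (interactive()) {
  plot(train$quantity, train$total_sale, xlab = "Quantity", ylab = "Total sale")
  abline(fit, col = "red")
}
```

Reporting the error on held-out rows rather than on the training rows gives a fairer picture of how well the fitted line predicts unseen transactions.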
Conclusion
In this project, a market analysis has been carried out for a large organization, with all the data evaluated in the R programming language. The regression between the offer and the total sales amount has been mapped through a graphical representation. Based on the data visualization, trends, outliers, and patterns have been identified. The data for 2013, 2014, and 2015 has been processed, and various visualizations have been produced. Linear regression has been used to obtain the accuracy and the prediction for the customer rate of the supermarket.
Reference