Data cleaning is the process of transforming raw data into data that is suitable for analysis or model-building. Note that you could also replace median with mean in the formula to fill missing values with each column's mean instead of its median.
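A minimal sketch of per-column median imputation with dplyr, assuming a small made-up data frame df (the name and values are hypothetical, not from the original dataset). Swapping median() for mean() gives the mean-based variant described above.

```r
library(dplyr)

# Hypothetical example data with missing values in both columns
df <- tibble(
  a = c(1, NA, 3, 5),
  b = c(NA, 10, 20, 30)
)

# For every numeric column, replace NA with that column's median
# (use mean(.x, na.rm = TRUE) instead for mean imputation)
df_filled <- df %>%
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x)))

df_filled
```

Here column a gets 3 (the median of 1, 3 and 5) and column b gets 20 (the median of 10, 20 and 30).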
Now that the tidyverse is loaded into memory, take a “glimpse” of the Brooklyn dataset:

glimpse(brooklyn)
## Observations: 20,185

If you are new to R and the tidyverse, we recommend starting with the Dataquest Introduction to Data Analysis in R course, the first course in the Dataquest Data Analyst in R path.

Notice that the second row has been removed from the data frame because every value in the second row duplicated the corresponding value in the first row.
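The text does not show the step that dropped the duplicate row. One common way to do this in the tidyverse is dplyr's distinct(), which keeps only the first occurrence of each fully duplicated row. A minimal sketch on a made-up data frame (column names and values are hypothetical):

```r
library(dplyr)

# Hypothetical data frame whose second row duplicates the first
df <- tibble(
  address    = c("1 Main St", "1 Main St", "2 Oak Ave"),
  sale_price = c(500000, 500000, 320000)
)

# Keep only the first occurrence of each fully duplicated row
df_unique <- distinct(df)

df_unique
```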
Take the column names of the NYC_property_sales data frame, replace every space with an underscore, and then convert all of the names to lower case.
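A minimal sketch of that renaming pipeline with stringr, assuming a tiny stand-in for NYC_property_sales (the two columns and their values are hypothetical; the real data frame has many more columns):

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for NYC_property_sales with spaced, upper-case names
NYC_property_sales <- tibble(
  `GROSS SQUARE FEET` = c(1200, 850),
  `SALE PRICE`        = c(500000, 320000)
)

# Replace spaces with underscores, then convert the names to lower case
names(NYC_property_sales) <- NYC_property_sales %>%
  names() %>%
  str_replace_all(" ", "_") %>%
  tolower()

names(NYC_property_sales)
# [1] "gross_square_feet" "sale_price"
```

Snake-case names like these can be used without backticks in later dplyr calls.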
The principles of tidy data provide a standard way to organise data values within a dataset. A standard makes initial data cleaning easier because you don't need to start from scratch and reinvent the wheel every time.

GROSS SQUARE FEET (i.e., the size of the property) is of type “double”, which is part of the “numeric” class in R.
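To check a column's type yourself, typeof() reports the underlying storage type and class() reports the class. A minimal sketch, using a made-up vector in place of the real GROSS SQUARE FEET column:

```r
# Hypothetical vector standing in for the GROSS SQUARE FEET column
gross_square_feet <- c(1200, 850, NA)

typeof(gross_square_feet)      # "double"  (storage type)
class(gross_square_feet)       # "numeric" (class)
is.numeric(gross_square_feet)  # TRUE
```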