Midterm, Part II

Questions will be listed on this page and also in the midterm.R script.

Write your answers and R code in the R script midterm.R.

Every sentence or word in the R script that is not R code must be preceded by the # symbol.

That way, when I run your script, your written answers will not cause errors, because R treats them as comments.

Do not comment out any R code or queries that are part of your answer.

I will run every line of code to check that the output is correct and answers the question.

1. Look at each of these data sets and explain why one data set is superior to the other

These two data sets cover similar topics: traffic stop data from San Diego and from Little Rock.

Make the case that one data set is better than the other.

Provide 3 concrete examples of stories that the better data set's structure makes it possible to research and tell.

This is the only question for which you may browse outside this page.

  • San Diego Police Vehicle Stops - link
  • Little Rock Traffic Violations - link

2. Import this data set

Assign it the name parkingtix.

http://andrewbatran.com/ccsu-2017/assets/data/2013.csv

This data set, from New York City, lists every parking ticket issued in 2013.
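A minimal sketch of the import step, assuming base R's read.csv() is acceptable and that the file has a header row. So that it runs offline, the example reads a few made-up inline rows through read.csv()'s text argument; the column names in those rows are hypothetical, not the real file's header. For the assignment itself you would pass the URL above instead.

```r
# Hypothetical inline rows standing in for the real 2013.csv file;
# these column names and values are assumptions for illustration only.
csv_text <- "Boro,Vehicle_Make,Fine
MANHATTAN,TOYOTA,$65
BROOKLYN,HONDA,$45"

# For the real data, replace text = csv_text with the URL, e.g.
# read.csv("http://andrewbatran.com/ccsu-2017/assets/data/2013.csv", stringsAsFactors = FALSE)
parkingtix <- read.csv(text = csv_text, stringsAsFactors = FALSE)
parkingtix
```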


3. Load the dplyr package
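Loading a package is a single library() call. The install line is left commented out in this sketch because it only needs to run once per machine, and only if dplyr is not already installed.

```r
# install.packages("dplyr")  # run once per machine if dplyr is missing
library(dplyr)

# dplyr now appears on the search path, so its verbs can be used
"package:dplyr" %in% search()
```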


4. What are the column names of the imported data frame?
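A sketch on a small stand-in data frame, since the real column names depend on the imported file; the three columns below are hypothetical. Base R's colnames() and names() return the same character vector for a data frame.

```r
# Hypothetical stand-in for parkingtix with made-up columns
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "BROOKLYN"),
  Vehicle_Make = c("TOYOTA", "HONDA"),
  Fine = c("$65", "$45"),
  stringsAsFactors = FALSE
)

colnames(parkingtix)  # names(parkingtix) gives the same result
```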


5. What are the first 5 rows of the imported data frame?
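A minimal sketch using head() with an explicit row count; the seven-row data frame and its ticket_id column are hypothetical stand-ins for the imported data.

```r
# Hypothetical seven-ticket stand-in for parkingtix
parkingtix <- data.frame(
  ticket_id = 1:7,
  Boro = c("MANHATTAN", "BROOKLYN", "QUEENS", "BRONX",
           "MANHATTAN", "STATEN ISLAND", "BROOKLYN"),
  stringsAsFactors = FALSE
)

first5 <- head(parkingtix, 5)  # same rows as parkingtix[1:5, ]
first5
```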


6. Complete this query

What is the structure of the data frame?

...(parkingtix)

7. Translate these lines of code into plain English

parkingtix %>%
  summarize(count = n())
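To see what a pipeline like this does, it can help to run it on a tiny data frame. The sketch below assumes dplyr is installed and uses a hypothetical three-ticket stand-in; the comment is one possible plain-English reading.

```r
library(dplyr)

# Hypothetical three-ticket stand-in for parkingtix
parkingtix <- data.frame(Boro = c("MANHATTAN", "BROOKLYN", "QUEENS"),
                         stringsAsFactors = FALSE)

# "Take parkingtix, then collapse it into a single row with a column
#  called count that holds the number of rows (here, tickets)."
result <- parkingtix %>%
  summarize(count = n())
result

# Base R gives the same number
nrow(parkingtix)
```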

8. What’s wrong with my code?

Which are the 5 most-ticketed types of cars in New York City?

What am I doing wrong? Fix the query below.

parkingtix %>%
  groupby(Vehicle_Make) %>%
  summarize(total=n()) %>%
  arrange(total) %>%
  head(5)
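For comparison, here is one corrected version of the query, run on a hypothetical stand-in data frame. It makes two changes: the dplyr verb is group_by() (there is no groupby()), and arrange() sorts in ascending order by default, so desc() is needed to put the most-ticketed makes first.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Vehicle_Make = c("TOYOTA", "HONDA", "TOYOTA", "FORD", "HONDA",
                   "TOYOTA", "NISSAN", "BMW", "FORD", "CHEVROLET"),
  stringsAsFactors = FALSE
)

top_makes <- parkingtix %>%
  group_by(Vehicle_Make) %>%   # group_by(), not groupby()
  summarize(total = n()) %>%
  arrange(desc(total)) %>%     # largest totals first
  head(5)
top_makes
```

dplyr's count(Vehicle_Make, sort = TRUE) is a shorter equivalent; it names the count column n instead of total.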

9. Fix the Fine column so it’s numeric

Use the gsub() function and the as.numeric() function.
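A minimal sketch, assuming the Fine values are stored as text such as "$65" (possibly with a thousands comma). Inside a regex character class, $ is a literal character, so one gsub() call strips both symbols before as.numeric() converts the result. The sample values are hypothetical.

```r
# Hypothetical Fine values stored as text
parkingtix <- data.frame(Fine = c("$65", "$115", "$1,000", "$45"),
                         stringsAsFactors = FALSE)

# Remove every "$" and "," and then convert to numbers
parkingtix$Fine <- as.numeric(gsub("[$,]", "", parkingtix$Fine))
str(parkingtix)
```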


10. What are the 10 most common violations?
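A sketch of the count-and-sort pattern, assuming a column named Violation (hypothetical; check the real name with colnames()). With fewer than 10 distinct violations, head(10) simply returns every row.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Violation = c("NO PARKING", "EXPIRED METER", "NO PARKING", "FIRE HYDRANT",
                "EXPIRED METER", "NO PARKING", "DOUBLE PARKING"),
  stringsAsFactors = FALSE
)

top_violations <- parkingtix %>%
  group_by(Violation) %>%
  summarize(total = n()) %>%
  arrange(desc(total)) %>%
  head(10)
top_violations
```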


11. In which Boro were blue cars ticketed the most?
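One way to approach this is to filter first and then count by group. The sketch assumes hypothetical Vehicle_Color and Boro columns with values spelled "BLUE" and "BROOKLYN"; the real file may code colors differently (for example with abbreviations), so inspecting unique(parkingtix$Vehicle_Color) first is a sensible check.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "BROOKLYN", "BROOKLYN", "QUEENS", "BROOKLYN", "MANHATTAN"),
  Vehicle_Color = c("BLUE", "BLUE", "BLUE", "RED", "BLUE", "WHITE"),
  stringsAsFactors = FALSE
)

blue_by_boro <- parkingtix %>%
  filter(Vehicle_Color == "BLUE") %>%  # keep only blue cars
  group_by(Boro) %>%
  summarize(total = n()) %>%
  arrange(desc(total))
blue_by_boro
```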


12. Which 5 violations generated the most money for the city?
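A sketch that sums fines per violation rather than counting tickets, assuming Fine has already been converted to numeric (as in question 9) and that the violation column is called Violation (hypothetical).

```r
library(dplyr)

# Hypothetical stand-in for parkingtix; Fine is already numeric
parkingtix <- data.frame(
  Violation = c("NO PARKING", "FIRE HYDRANT", "NO PARKING", "EXPIRED METER",
                "FIRE HYDRANT", "DOUBLE PARKING"),
  Fine = c(65, 115, 65, 35, 115, 115),
  stringsAsFactors = FALSE
)

top_revenue <- parkingtix %>%
  group_by(Violation) %>%
  summarize(revenue = sum(Fine, na.rm = TRUE)) %>%  # total dollars per violation
  arrange(desc(revenue)) %>%
  head(5)
top_revenue
```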


13. How many different (unique) parking violations are there?
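dplyr's n_distinct() counts unique values directly; base R's length(unique()) gives the same number. The Violation column below is a hypothetical stand-in.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Violation = c("NO PARKING", "FIRE HYDRANT", "NO PARKING", "EXPIRED METER"),
  stringsAsFactors = FALSE
)

n_violations <- parkingtix %>%
  summarize(unique_violations = n_distinct(Violation))
n_violations

# Base R alternative
length(unique(parkingtix$Violation))
```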


14. What are the 4 most expensive parking violation fines?
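Because the same fine repeats on every ticket for a violation, one approach is to reduce the data to distinct violation/fine pairs before sorting. The sketch assumes a numeric Fine column and a hypothetical Violation column; violations that share a fine will each appear.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix; Fine is already numeric
parkingtix <- data.frame(
  Violation = c("NO PARKING", "FIRE HYDRANT", "EXPIRED METER", "DOUBLE PARKING",
                "BUS STOP", "NO PARKING", "CROSSWALK"),
  Fine = c(65, 115, 35, 115, 165, 65, 95),
  stringsAsFactors = FALSE
)

priciest <- parkingtix %>%
  distinct(Violation, Fine) %>%  # one row per violation/fine pair
  arrange(desc(Fine)) %>%
  head(4)
priciest
```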


15. What are the 4 cheapest parking violation fines?
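The cheapest fines use the same pattern with the default ascending sort. As before, the numeric Fine column and the Violation column are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix; Fine is already numeric
parkingtix <- data.frame(
  Violation = c("NO PARKING", "FIRE HYDRANT", "EXPIRED METER", "DOUBLE PARKING",
                "BUS STOP", "NO PARKING", "CROSSWALK"),
  Fine = c(65, 115, 35, 115, 165, 65, 95),
  stringsAsFactors = FALSE
)

cheapest <- parkingtix %>%
  distinct(Violation, Fine) %>%
  arrange(Fine) %>%  # ascending is the default
  head(4)
cheapest
```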


16. Create a data frame called avgboro that contains the average and total fine amounts for each Boro
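A sketch that computes both statistics in one summarize() call and assigns the result to avgboro. It assumes a numeric Fine column and reads "total amount" as the dollar sum; the data are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix; Fine is already numeric
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "BROOKLYN", "MANHATTAN", "QUEENS", "BROOKLYN"),
  Fine = c(65, 115, 115, 35, 45),
  stringsAsFactors = FALSE
)

avgboro <- parkingtix %>%
  group_by(Boro) %>%
  summarize(avg_fine = mean(Fine, na.rm = TRUE),
            total_fines = sum(Fine, na.rm = TRUE))
avgboro
```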


17. Which model year of car was ticketed most often in Manhattan and Brooklyn?
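A sketch that reports the top year separately for each of the two Boros, assuming hypothetical Boro and Vehicle_Year columns with upper-case Boro names. The .groups argument needs dplyr 1.0 or later. Filtering on total == max(total) keeps every tied year rather than picking one arbitrarily.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "MANHATTAN", "MANHATTAN",
           "BROOKLYN", "BROOKLYN", "BROOKLYN", "QUEENS"),
  Vehicle_Year = c(2010, 2012, 2010, 2008, 2008, 2011, 2010),
  stringsAsFactors = FALSE
)

top_year <- parkingtix %>%
  filter(Boro %in% c("MANHATTAN", "BROOKLYN")) %>%
  group_by(Boro, Vehicle_Year) %>%
  summarize(total = n(), .groups = "drop_last") %>%  # still grouped by Boro
  filter(total == max(total)) %>%                     # top year(s) per Boro
  ungroup()
top_year
```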


18. What is the percentage breakdown of parking tickets across all the Boros?
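A sketch of the count-then-divide pattern. After summarize() on a single grouping column the result is no longer grouped, so sum(tickets) inside mutate() is the overall total. The Boro values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "MANHATTAN", "BROOKLYN", "QUEENS"),
  stringsAsFactors = FALSE
)

boro_pct <- parkingtix %>%
  group_by(Boro) %>%
  summarize(tickets = n()) %>%
  mutate(percent = 100 * tickets / sum(tickets)) %>%  # share of all tickets
  arrange(desc(percent))
boro_pct
```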


19. For each Boro, how many tickets were issued and how many distinct vehicles were ticketed?
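A sketch that puts both counts side by side: n() counts tickets and n_distinct() counts unique vehicles. It assumes vehicles can be identified by a license-plate column, here called Plate; that column name is hypothetical and should be checked against the real data.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "MANHATTAN", "MANHATTAN", "BROOKLYN", "BROOKLYN"),
  Plate = c("ABC1234", "ABC1234", "XYZ9876", "ABC1234", "JKL5555"),
  stringsAsFactors = FALSE
)

by_boro <- parkingtix %>%
  group_by(Boro) %>%
  summarize(tickets = n(),                 # every ticket
            vehicles = n_distinct(Plate))  # each plate counted once
by_boro
```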


20. What was the most common violation for each Boro?
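A sketch of a "top value per group" query, assuming hypothetical Boro and Violation columns. Grouping by both columns, counting, and then keeping the maximum count within each Boro returns one row per Boro (more if there is a tie). The .groups argument needs dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for parkingtix
parkingtix <- data.frame(
  Boro = c("MANHATTAN", "MANHATTAN", "MANHATTAN",
           "BROOKLYN", "BROOKLYN", "BROOKLYN"),
  Violation = c("NO PARKING", "NO PARKING", "FIRE HYDRANT",
                "EXPIRED METER", "EXPIRED METER", "NO PARKING"),
  stringsAsFactors = FALSE
)

top_violation <- parkingtix %>%
  group_by(Boro, Violation) %>%
  summarize(total = n(), .groups = "drop_last") %>%  # still grouped by Boro
  filter(total == max(total)) %>%                     # most common per Boro
  ungroup()
top_violation
```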