Chapter 2 Data sources
The data source is taken from the official side of Citibike website. Each file contains monthly trip data, which we take from 2018-01 to 2020-10, total 34 months.
[data source link:] https://s3.amazonaws.com/tripdata/index.html
There are 15 features in the dataset.
trip duration in seconds.(int 789 1541 1464)
start time (chr “2020-01-01 00:00:55.3900”)
end time (chr “2020-01-01 00:14:05.1470”)
start station id (int 504 3423 3687)
start station name (chr “1 Ave & E 16 St”)
start station latitude (num 40.7 40.7 40.7)
start station longitude (num -74 -74 -74)
end station id (int 307 3300 259)
end station name (chr “Canal St & Rutgers St”)
end station latitude (num 40.7 40.7 40.7)
end station longitude (num -74 -74 -74)
bike id (int 30326 17105 40177)
user type (chr “Subscriber” “Customer”)
birth year (int 1992 1969 1963)
gender (int 1 1 1)
[You can use the R code to download our files:] https://github.com/ds-eda-viz/NYC-Bike-Share/blob/main/sql/download_raw_data.R