Chapter 2 Data sources

The data source is taken from the official side of Citibike website. Each file contains monthly trip data, which we take from 2018-01 to 2020-10, total 34 months.

[data source link:] https://s3.amazonaws.com/tripdata/index.html

There are 15 features in the dataset.

  1. trip duration in seconds.(int 789 1541 1464)

  2. start time (chr “2020-01-01 00:00:55.3900”)

  3. end time (chr “2020-01-01 00:14:05.1470”)

  4. start station id (int 504 3423 3687)

  5. start station name (chr “1 Ave & E 16 St”)

  6. start station latitude (num 40.7 40.7 40.7)

  7. start station longitude (num -74 -74 -74)

  8. end station id (int 307 3300 259)

  9. end station name (chr “Canal St & Rutgers St”)

  10. end station latitude (num 40.7 40.7 40.7)

  11. end station longitude (num -74 -74 -74)

  12. bike id (int 30326 17105 40177)

  13. user type (chr “Subscriber” “Customer”)

  14. birth year (int 1992 1969 1963)

  15. gender (int 1 1 1)

[You can use the R code to download our files:] https://github.com/ds-eda-viz/NYC-Bike-Share/blob/main/sql/download_raw_data.R