visit
1546297300.400974
1546297344.968518
1546297402.914051
1546297466.029219
1546353011
1546356642
1546439450
1546443000
fmt = "YYYY-MM-dd hh:mm:ss"
col = f.col('ts').cast(TimestampType())
formatter = f.date_format(col, fmt)
data = data.withColumn('datetime', formatter)
data = data.withColumn('datetime', formatter)
data = data.withColumn('year', f.year('datetime'))
data = data.withColumn('month', f.month('datetime'))
data = data.withColumn('day', f.dayofmonth('datetime'))
data = data.withColumn('hour', f.hour('datetime'))
data = data.withColumn('minute', f.minute('datetime'))
+----------+-------------------+----+-----+---+----+------+
| ts| datetime|year|month|day|hour|minute|
+----------+-------------------+----+-----+---+----+------+
|1546300800|2019-01-01 12:00:00|2019| 1| 1| 12| 0|
|1546304400|2019-01-01 01:00:00|2019| 1| 1| 1| 0|
|1546308000|2019-01-01 02:00:00|2019| 1| 1| 2| 0|
|1546311600|2019-01-01 03:00:00|2019| 1| 1| 3| 0|
|1546315200|2019-01-01 04:00:00|2019| 1| 1| 4| 0|
|1546318800|2019-01-01 05:00:00|2019| 1| 1| 5| 0|
|1546322400|2019-01-01 06:00:00|2019| 1| 1| 6| 0|
|1546326000|2019-01-01 07:00:00|2019| 1| 1| 7| 0|
|1546329600|2019-01-01 08:00:00|2019| 1| 1| 8| 0|
|1546333200|2019-01-01 09:00:00|2019| 1| 1| 9| 0|
|1546336800|2019-01-01 10:00:00|2019| 1| 1| 10| 0|
|1546340400|2019-01-01 11:00:00|2019| 1| 1| 11| 0|
|1546344000|2019-01-01 12:00:00|2019| 1| 1| 12| 0|
|1546347600|2019-01-01 01:00:00|2019| 1| 1| 1| 0|
|1546351200|2019-01-01 02:00:00|2019| 1| 1| 2| 0|
|1546354800|2019-01-01 03:00:00|2019| 1| 1| 3| 0|
|1546358400|2019-01-01 04:00:00|2019| 1| 1| 4| 0|
|1546362000|2019-01-01 05:00:00|2019| 1| 1| 5| 0|
|1546365600|2019-01-01 06:00:00|2019| 1| 1| 6| 0|
|1546369200|2019-01-01 07:00:00|2019| 1| 1| 7| 0|
+----------+-------------------+----+-----+---+----+------+
fmt = "YYYY-MM-dd hh:mm:ss"
Do you see it? Right, these two lowercase letters h. Their meaning is not what you could expect. This time formatter returns you time in o'clock format, i.e., in the range from one to twelve! It is useful when you need to display your time in AM/PM format but not something one would like to see during data analysis.
Beware that the lowercase h in the timestamp means time o'clock, while the uppercase H is from 0 to 23.
The solution is simple: replace these misleading lowercase letters with the uppercase H, and you'll get your 0..23 range, as the following snippet shows.
fmt = "YYYY-MM-dd HH:mm:ss"
col = f.col('ts').cast(TimestampType())
formatter = f.date_format(col, fmt)
data = data.withColumn('datetime', formatter)
data = data.withColumn('year', f.year('datetime'))
data = data.withColumn('month', f.month('datetime'))
data = data.withColumn('day', f.dayofmonth('datetime'))
data = data.withColumn('hour', f.hour('datetime'))
data = data.withColumn('minute', f.minute('datetime'))
data.show()
+----------+-------------------+----+-----+---+----+------+
| ts| datetime|year|month|day|hour|minute|
+----------+-------------------+----+-----+---+----+------+
|1546300800|2019-01-01 00:00:00|2019| 1| 1| 0| 0|
|1546304400|2019-01-01 01:00:00|2019| 1| 1| 1| 0|
|1546308000|2019-01-01 02:00:00|2019| 1| 1| 2| 0|
|1546311600|2019-01-01 03:00:00|2019| 1| 1| 3| 0|
|1546315200|2019-01-01 04:00:00|2019| 1| 1| 4| 0|
|1546318800|2019-01-01 05:00:00|2019| 1| 1| 5| 0|
|1546322400|2019-01-01 06:00:00|2019| 1| 1| 6| 0|
|1546326000|2019-01-01 07:00:00|2019| 1| 1| 7| 0|
|1546329600|2019-01-01 08:00:00|2019| 1| 1| 8| 0|
|1546333200|2019-01-01 09:00:00|2019| 1| 1| 9| 0|
|1546336800|2019-01-01 10:00:00|2019| 1| 1| 10| 0|
|1546340400|2019-01-01 11:00:00|2019| 1| 1| 11| 0|
|1546344000|2019-01-01 12:00:00|2019| 1| 1| 12| 0|
|1546347600|2019-01-01 13:00:00|2019| 1| 1| 13| 0|
|1546351200|2019-01-01 14:00:00|2019| 1| 1| 14| 0|
|1546354800|2019-01-01 15:00:00|2019| 1| 1| 15| 0|
|1546358400|2019-01-01 16:00:00|2019| 1| 1| 16| 0|
|1546362000|2019-01-01 17:00:00|2019| 1| 1| 17| 0|
|1546365600|2019-01-01 18:00:00|2019| 1| 1| 18| 0|
|1546369200|2019-01-01 19:00:00|2019| 1| 1| 19| 0|
|1546372800|2019-01-01 20:00:00|2019| 1| 1| 20| 0|
|1546376400|2019-01-01 21:00:00|2019| 1| 1| 21| 0|
|1546380000|2019-01-01 22:00:00|2019| 1| 1| 22| 0|
|1546383600|2019-01-01 23:00:00|2019| 1| 1| 23| 0|
+----------+-------------------+----+-----+---+----+------+
If you're interested in some other posts where I write about Machine Learning and Python language in general, or you would like to reach me out, check out where the references to other stories and projects are listed.
References