Python, a super spreadsheet thanks to pandas

Matthieu Falce

20 February 2019

Agenda

  • data analysis?
  • spreadsheet vs programming language
  • introducing pandas
  • data analysis
    • context
    • quality
    • how do we answer business questions?
  • conclusion

Goal of this talk

  • why Python / pandas + Matthieu = 😍
  • a basic pandas tutorial
  • see how to pre-process data quickly
  • think about data in general

Who am I?

  • living in Lille for 4-5 years
    • originally from Monaco (you will see the connection very soon)
  • freelance in Python / data science / full-stack web (2 years)
    • consulting / development
    • training
    • innovation funding
  • co-organiser of the Python and data science meetups
    • data science meetup on 28 February: Tableau and Alteryx for BI
    • Python meetup on 14 March: Python and Data
    • coming soon (mid / late March): deep learning masterclasses

Data analysis

  • a sequence of steps to extract information from a "source"
    • retrieval
    • cleaning
    • information extraction (modelling)
    • improvement
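
The first three steps can be sketched on a tiny invented table. Everything here is an assumption made for illustration: the in-memory CSV, the column names `parking` and `free`, and the "model", which is only a mean per group. The last step (improvement) would mean going back over the earlier steps once the result has been examined.

```python
import io

import pandas as pd

# 1. Retrieval: an in-memory CSV stands in for a real source (invented data).
raw = io.StringIO("parking,free\nA,10\nA,obsolete\nB,4\nB,6\n")
df = pd.read_csv(raw)

# 2. Cleaning: non-numeric entries become NaN, then the incomplete rows are dropped.
df["free"] = pd.to_numeric(df["free"], errors="coerce")
clean = df.dropna()

# 3. Information extraction: here, the mean number of free spaces per parking.
summary = clean.groupby("parking")["free"].mean()
print(summary)
```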

Spreadsheet vs programming language

Spreadsheet

  • tabular data files
  • 2D / graphical manipulation of the data
  • data cleaning / exploration / processing
  • visualisation
  • business needs

Usable by "everyone"

Programming language

import xml.etree.ElementTree

import numpy as np
# assumption: `parser` is dateutil's date parser (the import is not shown on the slide)
from dateutil import parser


def parse(filename):
    # root element of the XML document
    e = xml.etree.ElementTree.parse(filename).getroot()
    total = {}
    total["totalPresent"] = e.find("totalPresent").text
    total["totalLibre"] = e.find("totalLibre").text

    parcs = {}
    # XPath notation
    # https://fr.wikipedia.org/wiki/XPath
    for parc in e.findall("Quartier/Parc"):
        nom = parc.find("libelleParc").text
        try:
            libre = int(parc.find("placesLibresParc").text)
        except ValueError:
            # some cells contain the text "obsolete"...
            libre = np.nan
        parcs[nom] = libre

    # the timestamp is read from the last car park seen in the loop
    date = parser.parse(parc.find("placesLibresUpdate").text)

    return date, total, parcs

Programming language

  • complex data (tree-structured...)
  • text-based
  • every operation is possible
  • complexity management

Usable by "specialists" (learning curve / IT management)

Complementary

  • adding a programming side to spreadsheets => macros
    • task automation
    • sharing of practices
  • adding a spreadsheet side to languages => dataframes
    • R / Python + pandas
    • makes data manipulation easier within the language
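
To give an idea of this "spreadsheet side", here is a minimal sketch of three familiar spreadsheet gestures written with a dataframe: a formula column, a filter and a pivot-style aggregation. The table and its column names are invented for the example.

```python
import pandas as pd

# Invented data, for illustration only.
df = pd.DataFrame({
    "district": ["north", "north", "south", "south"],
    "parking": ["A", "B", "C", "D"],
    "capacity": [100, 50, 200, 80],
    "free": [20, 5, 150, 40],
})

# New column computed from others, like a formula dragged down a column.
df["occupancy"] = 1 - df["free"] / df["capacity"]

# Row filter, like a spreadsheet auto-filter.
busy = df[df["occupancy"] > 0.7]

# Aggregation per group, like a pivot table.
per_district = df.groupby("district")[["capacity", "free"]].sum()
```

Each operation applies to a whole column at once, and the steps stay written down, so they can be re-run on a new file.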

Pandas / Dataframe

Scientific computing ecosystem

In [357]:
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, 8, 9, 10], 'col3':[11, 12, 13, 14, 15]}
df = pd.DataFrame(data=d)
df
Out[357]:
   col1  col2  col3
0     1     6    11
1     2     7    12
2     3     8    13
3     4     9    14
4     5    10    15
In [359]:
# vectorised / matrix-style operations
#df * 3
df * [1, 2, 3]
Out[359]:
   col1  col2  col3
0     1    12    33
1     2    14    36
2     3    16    39
3     4    18    42
4     5    20    45

Axes

  • Determines how certain operations (sums...) are carried out
+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
In [361]:
axis = 1
print("axis : ", axis) 
print(df.sum(axis=axis))
axis :  1
0    18
1    21
2    24
3    27
4    30
dtype: int64
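
For comparison, a short sketch of both axes side by side on the same small dataframe as above: `axis=0` walks down the rows and yields one value per column, `axis=1` walks across the columns and yields one value per row.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5],
                   "col2": [6, 7, 8, 9, 10],
                   "col3": [11, 12, 13, 14, 15]})

per_column = df.sum(axis=0)  # collapses the rows: one value per column
per_row = df.sum(axis=1)     # collapses the columns: one value per row

print(per_column)
print(per_row)
```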
In [369]:
# column selection
df.col1
type(df.col1)
df.col1.index
df.col1.values

# row selection
df.iloc[[0]]
Out[369]:
   col1  col2  col3
0     1     6    11
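
As a complement, a small sketch of the difference between selection by position (`iloc`) and selection by label (`loc`). The letter index is an assumption added here so that positions and labels cannot be confused.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5],
                   "col2": [6, 7, 8, 9, 10],
                   "col3": [11, 12, 13, 14, 15]},
                  index=list("abcde"))

by_position = df.iloc[0]        # first row, returned as a Series
by_label = df.loc["b", "col2"]  # row labelled "b", column "col2": a scalar
# boolean mask on the rows combined with a list of columns
subset = df.loc[df["col1"] > 3, ["col1", "col3"]]
```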

Time series

  • how a quantity evolves over time
  • pandas handles them particularly well
In [373]:
import numpy

# creation
date_range = pd.date_range('2011-01-01', '2011-02-01', freq='W')
valeurs = [i**2/5 for i in range(len(date_range))]
valeurs[3] = numpy.nan

time_serie = pd.Series(valeurs, index=date_range)
time_serie  # 2011-01-09     0.2

# indexing
time_serie["2011-01-06":"2011-01-25"]

# resampling / interpolation
time_serie = time_serie.resample('D').mean()
time_serie = time_serie.interpolate(method='linear')
time_serie  # 2011-01-09     0.2
Out[373]:
2011-01-02    0.000000
2011-01-03    0.028571
2011-01-04    0.057143
2011-01-05    0.085714
2011-01-06    0.114286
2011-01-07    0.142857
2011-01-08    0.171429
2011-01-09    0.200000
2011-01-10    0.285714
2011-01-11    0.371429
2011-01-12    0.457143
2011-01-13    0.542857
2011-01-14    0.628571
2011-01-15    0.714286
2011-01-16    0.800000
2011-01-17    0.971429
2011-01-18    1.142857
2011-01-19    1.314286
2011-01-20    1.485714
2011-01-21    1.657143
2011-01-22    1.828571
2011-01-23    2.000000
2011-01-24    2.171429
2011-01-25    2.342857
2011-01-26    2.514286
2011-01-27    2.685714
2011-01-28    2.857143
2011-01-29    3.028571
2011-01-30    3.200000
Freq: D, dtype: float64

Data

  • open data about car parks
  • the numbers of free and occupied parking spaces in Monaco are available online

Context -- Monaco

  • city-state
  • 2 km$^2$
  • "densely populated"
  • large number of commuters (bus / train / car)
  • parking almost exclusively underground

Map of the country

Map of the districts

Map of the car parks

  • 38 car parks
  • 10,000 spaces available

Presentation of the data

  • XML retrieved every 5 min

Presentation of the data

  • redundant information
  • potentially out of sync
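
One way to look at this redundancy is to compare a total declared by the feed with the sum of the per-car-park values. The sketch below uses invented numbers and hypothetical column names (`declared_total`, `A`, `B`); it only illustrates the kind of check involved and is not the check carried out in the talk.

```python
import pandas as pd

# Invented snapshots; in the real feed these values would come from the XML.
snapshots = pd.DataFrame(
    {"declared_total": [100, 95, 90],
     "A": [60, 55, 52],
     "B": [40, 40, 36]},
    index=pd.to_datetime(["2019-01-01 00:00",
                          "2019-01-01 00:05",
                          "2019-01-01 00:10"]),
)

# Total recomputed from the individual car parks, row by row.
computed_total = snapshots[["A", "B"]].sum(axis=1)

# A non-zero gap means the redundant values disagree for that snapshot.
gap = snapshots["declared_total"] - computed_total
inconsistent = snapshots[gap != 0]
print(inconsistent)
```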

On to the analysis

In [374]:
# imports
import numpy
import pandas
import matplotlib.pyplot as plt

from read_data import get_dataframe, parse, _get_dataframe
In [261]:
# magic command that makes the plots interactive in the notebook
%matplotlib notebook

Function details

  • use ? to display a function's documentation
  • use ?? to display a function's source code
In [375]:
get_dataframe??
#_get_dataframe??
#parse??

Retrieving the data

A caching mechanism is in place to speed things up.
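
The code of `get_dataframe` is not reproduced here, so the following is only a sketch of what such a cache can look like. The names `load_with_cache` and `slow_build` are hypothetical, and pickle is used to avoid an extra dependency, whereas the message printed by the notebook indicates a feather file.

```python
import os
import tempfile

import pandas as pd


def load_with_cache(cache_path, build):
    """Return the cached DataFrame if it exists, otherwise build and store it."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    df = build()
    df.to_pickle(cache_path)
    return df


def slow_build():
    # Stand-in for the expensive step (for example parsing many XML files).
    return pd.DataFrame({"free": [1, 2, 3]})


cache_path = os.path.join(tempfile.mkdtemp(), "cache.pkl")
first = load_with_cache(cache_path, slow_build)   # builds, then writes the cache
second = load_with_cache(cache_path, slow_build)  # reads the cache back
```

The second call never runs `slow_build`, which is the whole point: the expensive parsing happens once.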

In [378]:
df_places_libres = get_dataframe()
df = df_places_libres
df
using ./data/mars.feather stored dataframe
Out[378]:
ABBAYE ANNONCIADE ATHENA BOSIO C.C.F. CASINO CHPG 1 (HAUT) CHPG 2 (BAS) CONDAMINE COSTA ... REGULATION BUS ROQUEVILLE SQUARE GASTAUD ST ANTOINE ST CHARLES ST LAURENT ST NICOLAS STADE TESTIMONIO VISITATION
date
2018-12-31 23:58:45.513715 11.0 61.0 0.0 0.0 163.0 0.0 0.0 225.0 0.0 0.0 ... 1.0 16.0 0.0 36.0 1.0 0.0 68.0 139.0 0.0 20.0
2019-01-01 00:03:53.162553 11.0 61.0 0.0 0.0 161.0 0.0 0.0 225.0 0.0 0.0 ... 3.0 16.0 0.0 36.0 1.0 0.0 68.0 139.0 0.0 20.0
2019-01-01 00:08:55.503312 11.0 61.0 0.0 0.0 160.0 0.0 0.0 225.0 0.0 0.0 ... 3.0 16.0 0.0 36.0 1.0 0.0 69.0 137.0 0.0 20.0
2019-01-01 00:13:20.806756 11.0 61.0 0.0 0.0 160.0 0.0 0.0 225.0 0.0 0.0 ... 3.0 16.0 0.0 36.0 0.0 0.0 69.0 138.0 0.0 20.0
2019-01-01 00:18:25.333208 11.0 62.0 0.0 0.0 158.0 0.0 0.0 225.0 0.0 0.0 ... 3.0 16.0 0.0 36.0 1.0 0.0 69.0 138.0 0.0 20.0
2019-01-01 00:23:23.809532 11.0 62.0 0.0 0.0 152.0 29.0 0.0 223.0 3.0 0.0 ... 3.0 16.0 0.0 36.0 1.0 0.0 69.0 137.0 0.0 20.0
2019-01-01 00:28:27.377364 11.0 62.0 0.0 0.0 161.0 40.0 0.0 224.0 15.0 4.0 ... 3.0 17.0 2.0 36.0 6.0 0.0 68.0 136.0 0.0 20.0
2019-01-01 00:33:22.104059 11.0 64.0 0.0 0.0 176.0 46.0 0.0 224.0 19.0 6.0 ... 3.0 17.0 9.0 36.0 8.0 5.0 68.0 136.0 0.0 20.0
2019-01-01 00:38:29.046510 10.0 64.0 0.0 2.0 185.0 73.0 0.0 224.0 34.0 5.0 ... 3.0 17.0 10.0 36.0 14.0 8.0 68.0 136.0 0.0 20.0
2019-01-01 00:43:29.351093 10.0 65.0 0.0 4.0 200.0 79.0 0.0 224.0 41.0 7.0 ... 3.0 16.0 14.0 37.0 22.0 12.0 68.0 140.0 0.0 20.0
2019-01-01 00:48:36.611024 10.0 65.0 0.0 4.0 217.0 80.0 0.0 228.0 46.0 8.0 ... 3.0 16.0 14.0 37.0 29.0 16.0 68.0 142.0 0.0 20.0
2019-01-01 00:53:36.516809 9.0 66.0 0.0 6.0 241.0 81.0 0.0 230.0 55.0 5.0 ... 3.0 16.0 19.0 37.0 40.0 19.0 69.0 148.0 0.0 20.0
2019-01-01 00:58:38.622161 9.0 66.0 0.0 7.0 252.0 92.0 0.0 230.0 62.0 9.0 ... 3.0 18.0 15.0 37.0 52.0 22.0 69.0 152.0 0.0 20.0
2019-01-01 01:03:39.326647 9.0 67.0 0.0 7.0 269.0 93.0 0.0 230.0 74.0 11.0 ... 3.0 18.0 17.0 38.0 62.0 25.0 69.0 155.0 0.0 20.0
2019-01-01 01:08:57.683416 9.0 68.0 0.0 7.0 280.0 111.0 0.0 230.0 78.0 11.0 ... 3.0 19.0 12.0 39.0 66.0 26.0 69.0 157.0 0.0 20.0
2019-01-01 01:14:00.256048 7.0 70.0 0.0 11.0 289.0 116.0 0.0 231.0 86.0 13.0 ... 3.0 21.0 12.0 39.0 71.0 31.0 69.0 158.0 0.0 20.0
2019-01-01 01:18:09.696105 6.0 70.0 0.0 11.0 298.0 112.0 0.0 230.0 87.0 13.0 ... 3.0 23.0 14.0 39.0 75.0 31.0 70.0 159.0 0.0 20.0
2019-01-01 01:23:13.187660 6.0 72.0 0.0 11.0 308.0 113.0 0.0 230.0 91.0 13.0 ... 3.0 25.0 16.0 40.0 84.0 30.0 70.0 163.0 0.0 20.0
2019-01-01 01:28:12.490395 6.0 72.0 0.0 12.0 316.0 127.0 0.0 230.0 97.0 16.0 ... 3.0 28.0 20.0 41.0 93.0 29.0 70.0 167.0 0.0 20.0
2019-01-01 01:33:15.772777 6.0 73.0 0.0 14.0 334.0 131.0 0.0 230.0 107.0 17.0 ... 3.0 29.0 24.0 41.0 103.0 28.0 71.0 168.0 7.0 20.0
2019-01-01 01:38:10.126652 6.0 74.0 0.0 16.0 338.0 139.0 0.0 229.0 114.0 19.0 ... 3.0 31.0 26.0 42.0 109.0 28.0 71.0 172.0 7.0 20.0
2019-01-01 01:43:15.472919 6.0 77.0 0.0 18.0 352.0 146.0 0.0 233.0 120.0 20.0 ... 3.0 33.0 26.0 42.0 114.0 31.0 71.0 175.0 8.0 20.0
2019-01-01 01:48:16.924851 5.0 77.0 0.0 18.0 360.0 146.0 0.0 233.0 127.0 22.0 ... 3.0 33.0 27.0 42.0 120.0 34.0 71.0 182.0 8.0 20.0
2019-01-01 01:53:17.641957 5.0 77.0 0.0 20.0 369.0 146.0 0.0 233.0 131.0 25.0 ... 3.0 33.0 32.0 42.0 125.0 35.0 71.0 183.0 8.0 20.0
2019-01-01 01:58:43.727033 5.0 78.0 0.0 21.0 374.0 148.0 0.0 236.0 135.0 25.0 ... 3.0 35.0 37.0 42.0 130.0 34.0 72.0 184.0 8.0 20.0
2019-01-01 02:03:43.208523 5.0 79.0 0.0 23.0 383.0 156.0 0.0 235.0 140.0 26.0 ... 3.0 36.0 40.0 42.0 136.0 35.0 72.0 188.0 8.0 20.0
2019-01-01 02:08:52.602849 5.0 79.0 0.0 1.0 397.0 152.0 0.0 235.0 143.0 25.0 ... 3.0 37.0 46.0 42.0 139.0 36.0 72.0 190.0 8.0 20.0
2019-01-01 02:12:57.921321 5.0 79.0 0.0 1.0 406.0 157.0 0.0 235.0 148.0 25.0 ... 3.0 37.0 47.0 42.0 144.0 35.0 72.0 192.0 8.0 20.0
2019-01-01 02:17:57.898175 5.0 79.0 0.0 1.0 418.0 153.0 0.0 235.0 157.0 29.0 ... 3.0 37.0 56.0 42.0 144.0 35.0 72.0 193.0 8.0 20.0
2019-01-01 02:22:53.474961 5.0 79.0 0.0 1.0 421.0 156.0 0.0 234.0 162.0 29.0 ... 3.0 38.0 58.0 42.0 146.0 42.0 72.0 196.0 6.0 20.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-01-31 21:27:46.206819 10.0 22.0 0.0 0.0 524.0 18.0 0.0 235.0 53.0 29.0 ... 0.0 55.0 32.0 69.0 124.0 26.0 27.0 288.0 9.0 12.0
2019-01-31 21:33:50.135226 11.0 22.0 0.0 0.0 528.0 15.0 0.0 235.0 54.0 29.0 ... 0.0 55.0 34.0 69.0 124.0 26.0 27.0 290.0 9.0 12.0
2019-01-31 21:38:47.066052 11.0 22.0 0.0 0.0 531.0 8.0 0.0 235.0 56.0 29.0 ... 0.0 55.0 37.0 69.0 124.0 27.0 27.0 293.0 11.0 12.0
2019-01-31 21:43:51.447938 12.0 22.0 0.0 0.0 532.0 12.0 0.0 237.0 57.0 29.0 ... 0.0 56.0 39.0 69.0 124.0 27.0 27.0 294.0 11.0 12.0
2019-01-31 21:48:57.912843 12.0 6.0 0.0 0.0 533.0 17.0 0.0 237.0 57.0 29.0 ... 0.0 55.0 42.0 69.0 126.0 26.0 27.0 295.0 12.0 12.0
2019-01-31 21:53:10.028467 12.0 5.0 0.0 0.0 532.0 17.0 0.0 238.0 58.0 29.0 ... 0.0 55.0 41.0 69.0 126.0 26.0 27.0 296.0 12.0 12.0
2019-01-31 21:58:11.571964 12.0 5.0 0.0 0.0 534.0 18.0 0.0 238.0 59.0 29.0 ... 0.0 53.0 42.0 69.0 127.0 27.0 27.0 297.0 12.0 12.0
2019-01-31 22:03:13.412295 12.0 7.0 0.0 0.0 536.0 19.0 0.0 243.0 60.0 28.0 ... 0.0 53.0 43.0 69.0 127.0 27.0 27.0 303.0 11.0 12.0
2019-01-31 22:08:15.514640 12.0 7.0 0.0 0.0 535.0 20.0 0.0 244.0 61.0 28.0 ... 0.0 53.0 43.0 70.0 127.0 27.0 27.0 304.0 11.0 12.0
2019-01-31 22:13:20.492405 12.0 7.0 0.0 0.0 539.0 23.0 0.0 245.0 62.0 28.0 ... 0.0 54.0 44.0 70.0 127.0 27.0 27.0 304.0 11.0 12.0
2019-01-31 22:18:23.078005 13.0 7.0 0.0 0.0 538.0 27.0 0.0 245.0 64.0 29.0 ... 0.0 54.0 46.0 72.0 127.0 28.0 27.0 305.0 11.0 12.0
2019-01-31 22:23:26.651440 13.0 7.0 0.0 0.0 539.0 28.0 0.0 246.0 64.0 29.0 ... 0.0 54.0 47.0 72.0 128.0 28.0 27.0 310.0 11.0 12.0
2019-01-31 22:28:26.148944 11.0 7.0 0.0 0.0 539.0 30.0 0.0 247.0 65.0 30.0 ... 0.0 55.0 49.0 72.0 127.0 31.0 27.0 311.0 18.0 12.0
2019-01-31 22:33:33.933291 11.0 7.0 0.0 0.0 540.0 30.0 0.0 249.0 39.0 30.0 ... 0.0 55.0 54.0 72.0 130.0 31.0 27.0 311.0 18.0 12.0
2019-01-31 22:38:37.469309 11.0 7.0 0.0 0.0 539.0 32.0 0.0 248.0 43.0 30.0 ... 0.0 55.0 56.0 72.0 131.0 30.0 27.0 314.0 17.0 12.0
2019-01-31 22:43:51.681951 11.0 7.0 0.0 0.0 538.0 31.0 0.0 248.0 43.0 30.0 ... 0.0 55.0 58.0 72.0 131.0 30.0 27.0 317.0 17.0 12.0
2019-01-31 22:47:47.382396 11.0 7.0 0.0 0.0 539.0 35.0 0.0 248.0 45.0 30.0 ... 0.0 55.0 61.0 71.0 131.0 31.0 27.0 319.0 17.0 12.0
2019-01-31 22:52:48.298390 11.0 7.0 0.0 0.0 540.0 36.0 0.0 249.0 47.0 30.0 ... 0.0 55.0 60.0 71.0 133.0 31.0 27.0 320.0 19.0 12.0
2019-01-31 22:57:51.937293 11.0 8.0 0.0 0.0 540.0 41.0 0.0 249.0 48.0 30.0 ... 0.0 55.0 63.0 71.0 133.0 31.0 27.0 321.0 19.0 12.0
2019-01-31 23:03:01.586139 11.0 8.0 0.0 0.0 541.0 70.0 0.0 250.0 48.0 30.0 ... 0.0 55.0 64.0 71.0 133.0 32.0 27.0 321.0 19.0 12.0
2019-01-31 23:08:00.045280 11.0 8.0 0.0 0.0 540.0 99.0 0.0 250.0 48.0 30.0 ... 0.0 56.0 64.0 71.0 133.0 34.0 27.0 322.0 22.0 12.0
2019-01-31 23:13:05.809609 11.0 9.0 0.0 0.0 541.0 112.0 0.0 251.0 48.0 31.0 ... 0.0 56.0 70.0 71.0 135.0 34.0 27.0 322.0 22.0 12.0
2019-01-31 23:18:08.733336 11.0 9.0 0.0 0.0 541.0 123.0 0.0 252.0 49.0 32.0 ... 0.0 56.0 71.0 71.0 138.0 35.0 27.0 322.0 23.0 12.0
2019-01-31 23:23:10.170335 11.0 10.0 0.0 0.0 540.0 138.0 0.0 252.0 49.0 32.0 ... 0.0 56.0 72.0 71.0 138.0 35.0 27.0 323.0 23.0 12.0
2019-01-31 23:28:23.002361 11.0 10.0 0.0 0.0 540.0 153.0 0.0 252.0 49.0 32.0 ... 0.0 58.0 72.0 71.0 140.0 36.0 27.0 323.0 22.0 12.0
2019-01-31 23:33:34.376227 11.0 10.0 0.0 0.0 541.0 157.0 0.0 251.0 49.0 32.0 ... 0.0 59.0 72.0 71.0 140.0 36.0 27.0 323.0 22.0 13.0
2019-01-31 23:38:44.117248 11.0 10.0 0.0 0.0 542.0 161.0 0.0 252.0 50.0 32.0 ... 0.0 59.0 72.0 71.0 140.0 36.0 27.0 322.0 23.0 13.0
2019-01-31 23:43:44.644075 11.0 10.0 0.0 0.0 541.0 162.0 0.0 252.0 51.0 32.0 ... 0.0 60.0 72.0 70.0 140.0 37.0 27.0 322.0 24.0 13.0
2019-01-31 23:48:59.779344 11.0 10.0 0.0 0.0 542.0 167.0 0.0 252.0 51.0 32.0 ... 0.0 60.0 72.0 70.0 141.0 37.0 27.0 322.0 24.0 13.0
2019-01-31 23:54:01.214992 11.0 10.0 0.0 0.0 543.0 172.0 0.0 252.0 51.0 32.0 ... 0.0 60.0 72.0 70.0 141.0 37.0 27.0 314.0 24.0 13.0

8928 rows × 44 columns

In [379]:
df.describe()
Out[379]:
ABBAYE ANNONCIADE ATHENA BOSIO C.C.F. CASINO CHPG 1 (HAUT) CHPG 2 (BAS) CONDAMINE COSTA ... REGULATION BUS ROQUEVILLE SQUARE GASTAUD ST ANTOINE ST CHARLES ST LAURENT ST NICOLAS STADE TESTIMONIO VISITATION
count 8865.000000 8854.000000 8865.000000 8864.000000 8861.000000 8847.000000 8865.0 8865.000000 8861.000000 8862.000000 ... 8865.000000 8836.000000 8861.000000 8831.000000 8864.000000 8827.000000 8860.000000 8861.000000 8863.000000 8865.000000
mean 9.476368 38.172351 0.000226 6.910537 398.136215 199.657963 0.0 174.313480 86.913441 25.438389 ... 2.430795 53.476799 61.035662 99.672064 108.812613 34.140365 31.952822 322.871121 18.715446 16.493739
std 4.138936 33.725513 0.015019 10.748654 178.069002 130.961641 0.0 77.492839 57.309808 19.855032 ... 0.913435 37.488724 41.110327 97.942175 59.597936 24.016058 24.324109 317.795280 12.390793 3.231478
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 7.000000 9.000000 0.000000 0.000000 206.000000 70.000000 0.0 119.000000 48.000000 6.000000 ... 2.000000 29.000000 20.000000 28.000000 57.000000 6.000000 18.000000 85.000000 6.000000 15.000000
50% 9.000000 17.000000 0.000000 2.000000 513.000000 229.000000 0.0 204.000000 66.000000 24.000000 ... 3.000000 62.000000 71.000000 54.000000 138.000000 40.000000 23.000000 225.000000 21.000000 17.000000
75% 12.000000 80.000000 0.000000 10.000000 550.000000 308.000000 0.0 241.000000 142.000000 40.000000 ... 3.000000 73.000000 90.000000 248.000000 151.000000 54.000000 39.000000 461.000000 30.000000 19.000000
max 29.000000 87.000000 1.000000 54.000000 559.000000 456.000000 0.0 267.000000 199.000000 85.000000 ... 3.000000 903.000000 136.000000 273.000000 210.000000 88.000000 89.000000 930.000000 40.000000 20.000000

8 rows × 44 columns

In [344]:
df.plot(legend=False)
Out[344]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fda13056a58>

Data cleaning

  • estimating / correcting missing data
  • removing odd-looking columns
  • normalisation
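
The slides do not say which normalisation is meant. As one possible reading, the sketch below divides each column by its own maximum, so that car parks of very different sizes become comparable on a 0 to 1 scale. The data is invented.

```python
import pandas as pd

# Invented free-space counts for a small and a large car park.
df = pd.DataFrame({"small": [5.0, 10.0, 0.0],
                   "large": [100.0, 400.0, 200.0]})

# df.max() gives one maximum per column; the division is aligned on column names.
normalised = df / df.max()
print(normalised)
```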
In [380]:
trop_petits = set(df.columns[df.max() < 40])
bus = set(df.columns[["BUS" in nom for nom in df.columns]])
a_enlever = bus.union(trop_petits)

print("a enlever", a_enlever)
 
df = df.drop(columns=list(a_enlever))
df
a enlever {'PLATI', 'VISITATION', 'LES CARMES', 'PECHEURS BUS', 'ABBAYE', 'ECOLES', 'REGULATION BUS', 'DES OLIVIERS', 'GRIMALDI FORUM BUS', 'LES AGAVES', 'ATHENA', 'CHPG 1 (HAUT)'}
Out[380]:
ANNONCIADE BOSIO C.C.F. CASINO CHPG 2 (BAS) CONDAMINE COSTA ECOLES RDC ENGELIN GARE ... PORT QUAI ANTOINE 1ER ROQUEVILLE SQUARE GASTAUD ST ANTOINE ST CHARLES ST LAURENT ST NICOLAS STADE TESTIMONIO
date
2018-12-31 23:58:45.513715 61.0 0.0 163.0 0.0 225.0 0.0 0.0 32.0 58.0 0.0 ... 7.0 0.0 16.0 0.0 36.0 1.0 0.0 68.0 139.0 0.0
2019-01-01 00:03:53.162553 61.0 0.0 161.0 0.0 225.0 0.0 0.0 31.0 58.0 0.0 ... 7.0 0.0 16.0 0.0 36.0 1.0 0.0 68.0 139.0 0.0
2019-01-01 00:08:55.503312 61.0 0.0 160.0 0.0 225.0 0.0 0.0 31.0 58.0 1.0 ... 6.0 0.0 16.0 0.0 36.0 1.0 0.0 69.0 137.0 0.0
2019-01-01 00:13:20.806756 61.0 0.0 160.0 0.0 225.0 0.0 0.0 31.0 58.0 1.0 ... 6.0 0.0 16.0 0.0 36.0 0.0 0.0 69.0 138.0 0.0
2019-01-01 00:18:25.333208 62.0 0.0 158.0 0.0 225.0 0.0 0.0 31.0 60.0 1.0 ... 7.0 0.0 16.0 0.0 36.0 1.0 0.0 69.0 138.0 0.0
2019-01-01 00:23:23.809532 62.0 0.0 152.0 29.0 223.0 3.0 0.0 31.0 60.0 2.0 ... 10.0 7.0 16.0 0.0 36.0 1.0 0.0 69.0 137.0 0.0
2019-01-01 00:28:27.377364 62.0 0.0 161.0 40.0 224.0 15.0 4.0 31.0 60.0 2.0 ... 12.0 17.0 17.0 2.0 36.0 6.0 0.0 68.0 136.0 0.0
2019-01-01 00:33:22.104059 64.0 0.0 176.0 46.0 224.0 19.0 6.0 31.0 60.0 9.0 ... 14.0 19.0 17.0 9.0 36.0 8.0 5.0 68.0 136.0 0.0
2019-01-01 00:38:29.046510 64.0 2.0 185.0 73.0 224.0 34.0 5.0 30.0 59.0 13.0 ... 16.0 28.0 17.0 10.0 36.0 14.0 8.0 68.0 136.0 0.0
2019-01-01 00:43:29.351093 65.0 4.0 200.0 79.0 224.0 41.0 7.0 29.0 61.0 23.0 ... 21.0 31.0 16.0 14.0 37.0 22.0 12.0 68.0 140.0 0.0
2019-01-01 00:48:36.611024 65.0 4.0 217.0 80.0 228.0 46.0 8.0 30.0 62.0 36.0 ... 23.0 40.0 16.0 14.0 37.0 29.0 16.0 68.0 142.0 0.0
2019-01-01 00:53:36.516809 66.0 6.0 241.0 81.0 230.0 55.0 5.0 31.0 63.0 46.0 ... 28.0 43.0 16.0 19.0 37.0 40.0 19.0 69.0 148.0 0.0
2019-01-01 00:58:38.622161 66.0 7.0 252.0 92.0 230.0 62.0 9.0 33.0 65.0 58.0 ... 30.0 49.0 18.0 15.0 37.0 52.0 22.0 69.0 152.0 0.0
2019-01-01 01:03:39.326647 67.0 7.0 269.0 93.0 230.0 74.0 11.0 33.0 67.0 65.0 ... 30.0 53.0 18.0 17.0 38.0 62.0 25.0 69.0 155.0 0.0
2019-01-01 01:08:57.683416 68.0 7.0 280.0 111.0 230.0 78.0 11.0 34.0 67.0 71.0 ... 31.0 57.0 19.0 12.0 39.0 66.0 26.0 69.0 157.0 0.0
2019-01-01 01:14:00.256048 70.0 11.0 289.0 116.0 231.0 86.0 13.0 34.0 69.0 81.0 ... 35.0 58.0 21.0 12.0 39.0 71.0 31.0 69.0 158.0 0.0
2019-01-01 01:18:09.696105 70.0 11.0 298.0 112.0 230.0 87.0 13.0 34.0 71.0 83.0 ... 37.0 60.0 23.0 14.0 39.0 75.0 31.0 70.0 159.0 0.0
2019-01-01 01:23:13.187660 72.0 11.0 308.0 113.0 230.0 91.0 13.0 35.0 71.0 92.0 ... 42.0 59.0 25.0 16.0 40.0 84.0 30.0 70.0 163.0 0.0
2019-01-01 01:28:12.490395 72.0 12.0 316.0 127.0 230.0 97.0 16.0 37.0 72.0 99.0 ... 44.0 67.0 28.0 20.0 41.0 93.0 29.0 70.0 167.0 0.0
2019-01-01 01:33:15.772777 73.0 14.0 334.0 131.0 230.0 107.0 17.0 36.0 73.0 104.0 ... 44.0 66.0 29.0 24.0 41.0 103.0 28.0 71.0 168.0 7.0
2019-01-01 01:38:10.126652 74.0 16.0 338.0 139.0 229.0 114.0 19.0 40.0 75.0 109.0 ... 51.0 70.0 31.0 26.0 42.0 109.0 28.0 71.0 172.0 7.0
2019-01-01 01:43:15.472919 77.0 18.0 352.0 146.0 233.0 120.0 20.0 43.0 77.0 115.0 ... 52.0 74.0 33.0 26.0 42.0 114.0 31.0 71.0 175.0 8.0
2019-01-01 01:48:16.924851 77.0 18.0 360.0 146.0 233.0 127.0 22.0 43.0 77.0 125.0 ... 52.0 72.0 33.0 27.0 42.0 120.0 34.0 71.0 182.0 8.0
2019-01-01 01:53:17.641957 77.0 20.0 369.0 146.0 233.0 131.0 25.0 43.0 77.0 132.0 ... 54.0 75.0 33.0 32.0 42.0 125.0 35.0 71.0 183.0 8.0
2019-01-01 01:58:43.727033 78.0 21.0 374.0 148.0 236.0 135.0 25.0 43.0 77.0 137.0 ... 54.0 79.0 35.0 37.0 42.0 130.0 34.0 72.0 184.0 8.0
2019-01-01 02:03:43.208523 79.0 23.0 383.0 156.0 235.0 140.0 26.0 43.0 78.0 138.0 ... 55.0 88.0 36.0 40.0 42.0 136.0 35.0 72.0 188.0 8.0
2019-01-01 02:08:52.602849 79.0 1.0 397.0 152.0 235.0 143.0 25.0 45.0 79.0 110.0 ... 55.0 95.0 37.0 46.0 42.0 139.0 36.0 72.0 190.0 8.0
2019-01-01 02:12:57.921321 79.0 1.0 406.0 157.0 235.0 148.0 25.0 45.0 83.0 113.0 ... 55.0 97.0 37.0 47.0 42.0 144.0 35.0 72.0 192.0 8.0
2019-01-01 02:17:57.898175 79.0 1.0 418.0 153.0 235.0 157.0 29.0 47.0 83.0 122.0 ... 56.0 102.0 37.0 56.0 42.0 144.0 35.0 72.0 193.0 8.0
2019-01-01 02:22:53.474961 79.0 1.0 421.0 156.0 234.0 162.0 29.0 47.0 83.0 127.0 ... 56.0 108.0 38.0 58.0 42.0 146.0 42.0 72.0 196.0 6.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-01-31 21:27:46.206819 22.0 0.0 524.0 18.0 235.0 53.0 29.0 41.0 89.0 63.0 ... 9.0 45.0 55.0 32.0 69.0 124.0 26.0 27.0 288.0 9.0
2019-01-31 21:33:50.135226 22.0 0.0 528.0 15.0 235.0 54.0 29.0 42.0 90.0 63.0 ... 11.0 52.0 55.0 34.0 69.0 124.0 26.0 27.0 290.0 9.0
2019-01-31 21:38:47.066052 22.0 0.0 531.0 8.0 235.0 56.0 29.0 42.0 90.0 63.0 ... 11.0 57.0 55.0 37.0 69.0 124.0 27.0 27.0 293.0 11.0
2019-01-31 21:43:51.447938 22.0 0.0 532.0 12.0 237.0 57.0 29.0 42.0 90.0 65.0 ... 12.0 59.0 56.0 39.0 69.0 124.0 27.0 27.0 294.0 11.0
2019-01-31 21:48:57.912843 6.0 0.0 533.0 17.0 237.0 57.0 29.0 42.0 90.0 66.0 ... 12.0 61.0 55.0 42.0 69.0 126.0 26.0 27.0 295.0 12.0
2019-01-31 21:53:10.028467 5.0 0.0 532.0 17.0 238.0 58.0 29.0 43.0 90.0 66.0 ... 13.0 63.0 55.0 41.0 69.0 126.0 26.0 27.0 296.0 12.0
2019-01-31 21:58:11.571964 5.0 0.0 534.0 18.0 238.0 59.0 29.0 43.0 90.0 65.0 ... 14.0 64.0 53.0 42.0 69.0 127.0 27.0 27.0 297.0 12.0
2019-01-31 22:03:13.412295 7.0 0.0 536.0 19.0 243.0 60.0 28.0 43.0 90.0 66.0 ... 14.0 64.0 53.0 43.0 69.0 127.0 27.0 27.0 303.0 11.0
2019-01-31 22:08:15.514640 7.0 0.0 535.0 20.0 244.0 61.0 28.0 43.0 90.0 66.0 ... 22.0 67.0 53.0 43.0 70.0 127.0 27.0 27.0 304.0 11.0
2019-01-31 22:13:20.492405 7.0 0.0 539.0 23.0 245.0 62.0 28.0 44.0 90.0 67.0 ... 23.0 70.0 54.0 44.0 70.0 127.0 27.0 27.0 304.0 11.0
2019-01-31 22:18:23.078005 7.0 0.0 538.0 27.0 245.0 64.0 29.0 44.0 90.0 65.0 ... 23.0 70.0 54.0 46.0 72.0 127.0 28.0 27.0 305.0 11.0
2019-01-31 22:23:26.651440 7.0 0.0 539.0 28.0 246.0 64.0 29.0 44.0 91.0 66.0 ... 24.0 71.0 54.0 47.0 72.0 128.0 28.0 27.0 310.0 11.0
2019-01-31 22:28:26.148944 7.0 0.0 539.0 30.0 247.0 65.0 30.0 44.0 90.0 65.0 ... 26.0 73.0 55.0 49.0 72.0 127.0 31.0 27.0 311.0 18.0
2019-01-31 22:33:33.933291 7.0 0.0 540.0 30.0 249.0 39.0 30.0 44.0 90.0 65.0 ... 28.0 81.0 55.0 54.0 72.0 130.0 31.0 27.0 311.0 18.0
2019-01-31 22:38:37.469309 7.0 0.0 539.0 32.0 248.0 43.0 30.0 44.0 91.0 65.0 ... 27.0 80.0 55.0 56.0 72.0 131.0 30.0 27.0 314.0 17.0
2019-01-31 22:43:51.681951 7.0 0.0 538.0 31.0 248.0 43.0 30.0 37.0 91.0 66.0 ... 27.0 85.0 55.0 58.0 72.0 131.0 30.0 27.0 317.0 17.0
2019-01-31 22:47:47.382396 7.0 0.0 539.0 35.0 248.0 45.0 30.0 37.0 91.0 66.0 ... 29.0 91.0 55.0 61.0 71.0 131.0 31.0 27.0 319.0 17.0
2019-01-31 22:52:48.298390 7.0 0.0 540.0 36.0 249.0 47.0 30.0 37.0 91.0 66.0 ... 31.0 93.0 55.0 60.0 71.0 133.0 31.0 27.0 320.0 19.0
2019-01-31 22:57:51.937293 8.0 0.0 540.0 41.0 249.0 48.0 30.0 37.0 90.0 66.0 ... 31.0 101.0 55.0 63.0 71.0 133.0 31.0 27.0 321.0 19.0
2019-01-31 23:03:01.586139 8.0 0.0 541.0 70.0 250.0 48.0 30.0 38.0 90.0 66.0 ... 31.0 103.0 55.0 64.0 71.0 133.0 32.0 27.0 321.0 19.0
2019-01-31 23:08:00.045280 8.0 0.0 540.0 99.0 250.0 48.0 30.0 41.0 89.0 67.0 ... 33.0 109.0 56.0 64.0 71.0 133.0 34.0 27.0 322.0 22.0
2019-01-31 23:13:05.809609 9.0 0.0 541.0 112.0 251.0 48.0 31.0 41.0 89.0 67.0 ... 34.0 113.0 56.0 70.0 71.0 135.0 34.0 27.0 322.0 22.0
2019-01-31 23:18:08.733336 9.0 0.0 541.0 123.0 252.0 49.0 32.0 41.0 89.0 67.0 ... 36.0 115.0 56.0 71.0 71.0 138.0 35.0 27.0 322.0 23.0
2019-01-31 23:23:10.170335 10.0 0.0 540.0 138.0 252.0 49.0 32.0 41.0 90.0 68.0 ... 38.0 120.0 56.0 72.0 71.0 138.0 35.0 27.0 323.0 23.0
2019-01-31 23:28:23.002361 10.0 0.0 540.0 153.0 252.0 49.0 32.0 41.0 90.0 68.0 ... 40.0 127.0 58.0 72.0 71.0 140.0 36.0 27.0 323.0 22.0
2019-01-31 23:33:34.376227 10.0 0.0 541.0 157.0 251.0 49.0 32.0 42.0 91.0 68.0 ... 40.0 134.0 59.0 72.0 71.0 140.0 36.0 27.0 323.0 22.0
2019-01-31 23:38:44.117248 10.0 0.0 542.0 161.0 252.0 50.0 32.0 42.0 91.0 68.0 ... 40.0 135.0 59.0 72.0 71.0 140.0 36.0 27.0 322.0 23.0
2019-01-31 23:43:44.644075 10.0 0.0 541.0 162.0 252.0 51.0 32.0 42.0 91.0 68.0 ... 40.0 141.0 60.0 72.0 70.0 140.0 37.0 27.0 322.0 24.0
2019-01-31 23:48:59.779344 10.0 0.0 542.0 167.0 252.0 51.0 32.0 42.0 91.0 68.0 ... 40.0 146.0 60.0 72.0 70.0 141.0 37.0 27.0 322.0 24.0
2019-01-31 23:54:01.214992 10.0 0.0 543.0 172.0 252.0 51.0 32.0 42.0 91.0 68.0 ... 40.0 148.0 60.0 72.0 70.0 141.0 37.0 27.0 314.0 24.0

8928 rows × 32 columns

Studying missing values

In [347]:
nb_nan = df.isna().sum().sum()
print("Number of missing values:", nb_nan)
Number of missing values: 2243
In [348]:
# analysis by row and by column
df.isnull().sum(axis=0).sort_values()
Out[348]:
LARVOTTO             63
QUAI ANTOINE 1ER     63
CHPG 2 (BAS)         63
PECHEURS             63
LOUIS II             63
ENGELIN              63
LA DIGUE             63
JARDIN EXOTIQUE      63
BOSIO                64
ST CHARLES           64
GARE                 64
OSTENDE              65
TESTIMONIO           65
COSTA                66
STADE                67
C.C.F.               67
PAPALINS             67
CONDAMINE            67
SQUARE GASTAUD       67
ST NICOLAS           68
PLACE D ARMES        68
ECOLES RDC           69
HELIPORT             69
PORT                 73
ANNONCIADE           74
MOULINS              74
GRIMALDI FORUM       74
LA COLLE             76
CASINO               81
ROQUEVILLE           92
ST ANTOINE           97
ST LAURENT          101
dtype: int64
In [349]:
t = df.isnull().sum(axis=1)
plt.figure()
t.plot()
plt.show()

Downsampling and cleaning

In [318]:
df = df.resample('H').mean()
df = df.interpolate(method='linear')
df
Out[318]:
ANNONCIADE BOSIO C.C.F. CASINO CHPG 2 (BAS) CONDAMINE COSTA ECOLES RDC ENGELIN GARE ... PORT QUAI ANTOINE 1ER ROQUEVILLE SQUARE GASTAUD ST ANTOINE ST CHARLES ST LAURENT ST NICOLAS STADE TESTIMONIO
date
2018-12-31 23:00:00 61.000000 0.000000 163.000000 0.000000 225.000000 0.000000 0.000000 32.000000 58.000000 0.000000 ... 7.000000 0.000000 16.000000 0.000000 36.000000 1.000000 0.000000 68.000000 139.000000 0.000000
2019-01-01 00:00:00 63.250000 1.916667 185.250000 43.333333 225.583333 22.916667 3.666667 30.833333 60.333333 16.000000 ... 15.000000 19.500000 16.416667 6.916667 36.333333 14.583333 6.833333 68.500000 139.916667 0.000000
2019-01-01 01:00:00 72.916667 13.833333 323.916667 127.333333 231.250000 103.916667 17.083333 37.916667 72.750000 101.083333 ... 43.833333 65.833333 27.333333 21.916667 40.583333 96.000000 30.166667 70.333333 168.583333 3.833333
2019-01-01 02:00:00 78.833333 5.750000 427.666667 158.166667 234.833333 165.416667 29.916667 46.583333 83.750000 132.000000 ... 57.500000 110.250000 39.583333 59.583333 42.750000 154.666667 38.166667 72.416667 197.666667 8.083333
2019-01-01 03:00:00 79.833333 10.000000 487.250000 212.166667 236.166667 164.000000 44.416667 52.000000 87.916667 165.833333 ... 68.750000 133.583333 44.500000 83.500000 46.750000 184.416667 36.916667 73.750000 211.166667 10.000000
2019-01-01 04:00:00 81.000000 12.000000 520.166667 272.416667 239.416667 172.083333 53.750000 53.833333 91.250000 187.000000 ... 75.416667 144.583333 45.250000 99.333333 48.083333 201.250000 44.166667 73.083333 225.916667 10.916667
2019-01-01 05:00:00 81.166667 12.000000 533.500000 309.500000 242.083333 179.833333 58.750000 54.666667 91.416667 198.333333 ... 78.416667 172.500000 45.833333 114.416667 48.916667 208.000000 49.916667 73.250000 231.583333 12.583333
2019-01-01 06:00:00 81.666667 12.000000 544.500000 354.666667 227.583333 180.750000 58.500000 54.833333 91.000000 199.500000 ... 80.583333 190.416667 46.000000 122.250000 50.000000 156.416667 60.083333 73.750000 234.000000 11.416667
2019-01-01 07:00:00 81.000000 10.666667 548.166667 370.833333 225.750000 181.666667 59.583333 55.000000 90.916667 199.750000 ... 82.083333 197.750000 45.750000 123.166667 50.000000 158.916667 62.000000 72.916667 233.000000 8.000000
2019-01-01 08:00:00 80.833333 10.000000 548.916667 374.416667 223.500000 132.333333 60.583333 54.416667 90.833333 196.583333 ... 83.416667 206.583333 47.000000 120.333333 50.000000 159.000000 62.666667 72.833333 233.666667 9.583333
2019-01-01 09:00:00 80.166667 10.000000 547.416667 374.166667 215.583333 63.750000 61.583333 53.666667 90.583333 199.833333 ... 83.750000 209.000000 48.000000 112.833333 50.416667 159.416667 62.750000 73.000000 233.250000 10.333333
2019-01-01 10:00:00 81.000000 10.000000 543.000000 352.250000 207.916667 60.250000 65.166667 54.833333 90.833333 200.916667 ... 85.166667 199.083333 50.083333 107.750000 50.000000 161.083333 55.916667 72.416667 231.916667 12.000000
2019-01-01 11:00:00 81.500000 12.916667 536.416667 280.916667 191.833333 48.833333 61.750000 55.000000 89.916667 202.166667 ... 83.416667 177.916667 51.500000 100.250000 50.000000 157.166667 51.000000 73.000000 230.750000 9.000000
2019-01-01 12:00:00 83.000000 13.000000 529.166667 169.500000 167.916667 138.166667 58.750000 54.833333 89.333333 198.500000 ... 80.333333 130.333333 53.000000 63.333333 47.916667 154.333333 43.750000 73.000000 229.250000 9.250000
2019-01-01 13:00:00 83.500000 12.416667 521.333333 67.750000 163.166667 129.416667 51.583333 55.666667 89.250000 200.500000 ... 75.750000 52.833333 53.500000 28.083333 46.583333 151.416667 32.416667 74.166667 225.750000 5.500000
2019-01-01 14:00:00 84.000000 11.750000 514.916667 10.916667 149.250000 107.416667 46.250000 55.000000 90.750000 199.833333 ... 70.083333 3.083333 53.666667 8.666667 44.166667 153.000000 26.666667 75.000000 224.000000 4.750000
2019-01-01 15:00:00 83.666667 10.666667 502.750000 7.166667 130.250000 69.166667 40.416667 52.583333 91.416667 197.750000 ... 72.750000 1.666667 53.833333 1.250000 44.500000 152.166667 28.916667 75.000000 222.166667 4.500000
2019-01-01 16:00:00 84.750000 11.583333 498.333333 4.333333 136.333333 65.250000 51.166667 53.916667 87.000000 196.166667 ... 80.250000 3.000000 48.583333 2.666667 47.750000 146.500000 43.000000 74.000000 225.750000 8.833333
2019-01-01 17:00:00 84.833333 11.833333 508.250000 10.666667 156.750000 88.583333 45.916667 56.833333 84.500000 196.583333 ... 82.916667 6.000000 46.833333 3.250000 50.500000 148.250000 42.000000 74.000000 227.833333 11.666667
2019-01-01 18:00:00 85.500000 12.833333 523.750000 34.416667 174.500000 134.083333 38.583333 58.416667 82.500000 204.666667 ... 85.833333 34.166667 47.333333 16.916667 53.083333 149.000000 59.166667 72.083333 229.000000 13.583333
2019-01-01 19:00:00 86.166667 13.500000 530.833333 89.750000 198.083333 155.416667 47.750000 59.000000 84.916667 208.833333 ... 85.833333 77.500000 50.166667 48.333333 52.166667 159.666667 66.666667 73.916667 231.916667 10.416667
2019-01-01 20:00:00 87.000000 14.000000 536.916667 135.083333 209.416667 129.500000 33.916667 58.583333 90.166667 216.750000 ... 81.750000 106.083333 50.666667 56.583333 52.833333 163.333333 75.000000 74.000000 231.166667 12.250000
2019-01-01 21:00:00 86.333333 6.250000 541.833333 157.250000 218.000000 54.833333 39.583333 59.000000 93.000000 68.250000 ... 76.416667 135.416667 51.166667 72.000000 54.000000 164.833333 77.166667 73.750000 233.166667 14.666667
2019-01-01 22:00:00 83.000000 7.000000 546.333333 161.833333 228.500000 60.666667 41.083333 58.916667 93.166667 66.833333 ... 78.750000 167.250000 53.000000 88.833333 54.000000 167.250000 77.916667 74.000000 234.833333 12.000000
2019-01-01 23:00:00 83.000000 6.083333 548.416667 186.833333 228.750000 64.583333 42.916667 58.250000 91.166667 68.500000 ... 82.916667 186.750000 54.416667 97.750000 54.000000 169.500000 81.750000 74.000000 234.000000 13.083333
2019-01-02 00:00:00 82.750000 6.000000 549.333333 236.583333 231.916667 64.500000 43.000000 58.916667 90.833333 72.750000 ... 81.583333 195.416667 55.000000 88.333333 54.000000 170.833333 73.000000 73.916667 235.583333 16.000000
2019-01-02 01:00:00 83.000000 6.000000 555.000000 305.250000 236.500000 65.000000 48.166667 59.000000 91.166667 70.916667 ... 43.000000 189.333333 55.333333 81.000000 54.083333 164.833333 64.250000 74.000000 237.333333 15.583333
2019-01-02 02:00:00 83.000000 6.000000 554.916667 345.833333 239.250000 65.000000 50.333333 59.000000 92.666667 70.000000 ... 43.000000 173.416667 55.583333 81.000000 55.000000 149.333333 65.416667 40.583333 235.500000 14.916667
2019-01-02 03:00:00 82.750000 5.750000 554.666667 367.666667 240.083333 65.833333 50.750000 58.833333 93.000000 69.083333 ... 44.000000 173.000000 56.000000 81.500000 54.833333 148.666667 67.000000 24.000000 235.416667 15.000000
2019-01-02 04:00:00 83.000000 5.000000 553.750000 381.916667 241.000000 67.000000 51.000000 59.000000 93.000000 69.083333 ... 44.000000 172.666667 56.000000 82.000000 54.083333 171.916667 67.250000 23.416667 234.666667 15.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-01-30 18:00:00 38.333333 0.000000 229.166667 207.416667 177.416667 19.916667 3.833333 37.666667 62.833333 70.750000 ... 23.916667 87.083333 36.916667 59.583333 50.250000 69.166667 35.083333 21.000000 174.166667 5.666667
2019-01-30 19:00:00 40.916667 0.000000 359.666667 268.500000 212.666667 15.333333 7.500000 42.166667 74.416667 69.750000 ... 16.000000 73.583333 46.166667 59.916667 61.666667 103.833333 52.250000 21.500000 233.750000 6.166667
2019-01-30 20:00:00 43.666667 0.000000 457.750000 287.916667 238.000000 29.250000 12.666667 41.083333 78.000000 76.416667 ... 7.416667 63.166667 50.000000 58.833333 69.416667 130.666667 55.416667 22.000000 275.166667 8.000000
2019-01-30 21:00:00 18.454545 0.000000 512.545455 290.090909 243.272727 43.545455 15.636364 41.818182 79.545455 79.272727 ... 7.909091 70.272727 54.454545 66.272727 73.909091 138.636364 56.727273 24.909091 292.909091 9.454545
2019-01-30 22:00:00 13.916667 0.000000 524.416667 298.416667 249.250000 51.416667 16.083333 43.750000 80.250000 81.583333 ... 20.333333 87.666667 63.333333 79.000000 79.666667 140.750000 62.583333 26.000000 308.750000 14.000000
2019-01-30 23:00:00 15.000000 0.000000 527.833333 310.750000 252.750000 59.250000 19.583333 45.000000 79.666667 84.000000 ... 32.833333 112.750000 63.916667 81.166667 80.833333 142.833333 69.500000 26.000000 314.833333 21.500000
2019-01-31 00:00:00 14.750000 0.000000 548.416667 323.416667 255.166667 61.666667 20.000000 46.000000 79.833333 73.500000 ... 36.916667 122.583333 64.000000 83.916667 80.000000 143.000000 71.750000 26.000000 276.833333 26.416667
2019-01-31 01:00:00 15.000000 0.000000 555.000000 332.666667 250.583333 61.500000 17.333333 46.000000 79.000000 60.750000 ... 40.000000 125.250000 63.916667 85.666667 80.000000 143.000000 71.750000 26.000000 232.583333 26.833333
2019-01-31 02:00:00 15.000000 0.000000 554.833333 342.916667 247.916667 62.000000 17.000000 42.583333 78.833333 60.500000 ... 40.000000 127.000000 63.583333 86.000000 76.583333 143.916667 71.916667 25.916667 231.250000 27.000000
2019-01-31 03:00:00 14.666667 0.000000 554.833333 352.583333 247.416667 62.000000 16.833333 42.000000 79.000000 60.083333 ... 40.000000 127.000000 64.000000 86.000000 71.000000 144.583333 72.000000 25.916667 231.333333 26.916667
2019-01-31 04:00:00 14.833333 0.000000 554.000000 352.750000 247.000000 62.000000 17.000000 42.000000 79.000000 59.166667 ... 40.000000 127.000000 63.916667 86.000000 70.000000 148.500000 48.166667 24.916667 230.750000 27.000000
2019-01-31 05:00:00 14.000000 0.000000 553.916667 357.833333 247.750000 61.250000 15.583333 41.833333 78.750000 60.750000 ... 39.583333 126.166667 63.750000 86.000000 67.500000 147.750000 44.583333 24.833333 230.000000 27.416667
2019-01-31 06:00:00 12.916667 0.000000 551.416667 353.500000 237.500000 61.333333 12.916667 41.083333 78.083333 60.083333 ... 40.833333 121.916667 61.416667 83.166667 63.166667 146.000000 42.833333 25.000000 223.666667 27.333333
2019-01-31 07:00:00 9.166667 0.000000 535.833333 241.583333 211.000000 59.500000 8.416667 38.833333 70.250000 57.250000 ... 37.500000 115.333333 46.833333 67.416667 53.833333 135.583333 34.333333 24.416667 180.500000 19.750000
2019-01-31 08:00:00 5.000000 0.000000 449.250000 138.750000 156.000000 52.583333 3.083333 25.000000 57.166667 46.583333 ... 24.166667 80.500000 21.083333 34.333333 37.416667 99.250000 6.916667 23.333333 94.750000 4.916667
2019-01-31 09:00:00 0.750000 0.000000 302.416667 54.916667 86.000000 44.500000 3.333333 24.666667 49.000000 34.916667 ... 6.083333 65.833333 0.750000 3.666667 25.416667 30.416667 0.000000 23.333333 14.500000 0.750000
2019-01-31 10:00:00 1.666667 0.000000 194.666667 0.000000 42.000000 28.000000 5.166667 14.083333 50.083333 22.750000 ... 0.500000 54.916667 525.916667 1.083333 23.416667 3.333333 0.083333 19.083333 2.416667 0.333333
2019-01-31 11:00:00 5.000000 0.000000 186.333333 0.000000 24.666667 19.333333 2.500000 8.416667 50.416667 15.583333 ... 0.916667 45.416667 75.750000 0.416667 22.166667 1.250000 0.250000 16.083333 4.250000 0.833333
2019-01-31 12:00:00 5.166667 0.000000 198.750000 0.000000 68.083333 10.166667 2.750000 15.333333 68.500000 14.666667 ... 0.250000 24.666667 1.166667 0.083333 14.666667 7.583333 0.750000 16.666667 9.166667 0.333333
2019-01-31 13:00:00 3.750000 0.000000 224.416667 0.000000 40.166667 9.083333 0.416667 16.916667 63.666667 27.083333 ... 0.333333 16.166667 0.000000 3.500000 13.583333 0.166667 0.000000 19.416667 30.750000 0.166667
2019-01-31 14:00:00 5.500000 0.000000 189.916667 0.000000 10.500000 8.416667 0.333333 14.833333 56.250000 23.250000 ... 1.916667 25.750000 4.500000 0.416667 15.500000 0.833333 0.000000 21.000000 29.000000 1.583333
2019-01-31 15:00:00 8.750000 0.000000 128.500000 0.000000 18.000000 6.916667 0.666667 14.833333 54.333333 24.333333 ... 7.416667 27.750000 2.916667 2.250000 16.333333 1.583333 0.333333 20.500000 30.333333 3.833333
2019-01-31 16:00:00 16.166667 0.000000 147.833333 0.000000 58.916667 16.500000 2.000000 11.750000 59.166667 32.000000 ... 12.500000 47.250000 9.916667 15.833333 21.166667 5.000000 0.916667 22.916667 64.166667 4.166667
2019-01-31 17:00:00 10.583333 0.000000 162.666667 39.333333 133.916667 25.750000 6.583333 30.583333 58.916667 46.000000 ... 25.666667 63.500000 28.416667 31.666667 30.583333 13.083333 2.583333 22.750000 105.166667 5.166667
2019-01-31 18:00:00 8.500000 0.000000 226.333333 83.333333 182.333333 8.750000 12.666667 43.583333 71.000000 56.083333 ... 23.500000 76.833333 40.500000 39.250000 37.916667 54.500000 8.833333 23.166667 182.500000 5.250000
2019-01-31 19:00:00 16.250000 0.000000 350.166667 65.416667 208.750000 12.750000 16.000000 44.500000 83.666667 55.333333 ... 14.250000 83.250000 50.500000 33.500000 50.166667 89.166667 17.000000 25.750000 220.333333 8.666667
2019-01-31 20:00:00 20.583333 0.000000 463.583333 14.250000 229.416667 16.750000 24.416667 40.916667 86.583333 58.250000 ... 6.416667 43.500000 52.333333 17.916667 64.833333 111.166667 19.500000 27.000000 257.083333 8.416667
2019-01-31 21:00:00 17.833333 0.000000 523.416667 14.500000 234.833333 48.833333 29.000000 41.833333 89.166667 63.416667 ... 10.583333 51.583333 54.666667 33.250000 68.583333 124.083333 25.583333 27.000000 288.833333 9.666667
2019-01-31 22:00:00 7.083333 0.000000 538.500000 29.333333 246.750000 53.416667 29.333333 41.500000 90.416667 65.750000 ... 25.416667 78.833333 54.416667 52.000000 71.166667 129.333333 29.333333 27.000000 311.583333 15.000000
2019-01-31 23:00:00 9.454545 0.000000 541.090909 137.636364 251.454545 49.363636 31.545455 41.181818 90.181818 67.545455 ... 37.454545 126.454545 57.727273 70.272727 70.727273 138.090909 35.363636 27.000000 321.454545 22.545455

745 rows × 32 columns

Scale normalization

  • the car parks differ greatly in size
  • we divide each car park's values by its total number of spaces, approximated here by the largest number of free spaces observed
In [382]:
pourcentage_libre = df.apply(lambda x: x / x.max())
pourcentage_libre.plot(legend=False)
Out[382]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fda12afd160>
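As a minimal sketch of this normalization, assuming made-up figures: `df_demo`, its column names and its values are hypothetical and are not taken from the Monaco dataset. Dividing by the column maximum only approximates the true capacity if each car park was completely empty at least once during the period.

```python
import pandas as pd

# Hypothetical free-space counts for two car parks of very different sizes
df_demo = pd.DataFrame(
    {"BIG": [500, 250, 0], "SMALL": [20, 10, 5]},
    index=pd.to_datetime(
        ["2019-01-01 03:00", "2019-01-01 08:00", "2019-01-01 10:00"]
    ),
)

# Divide each column by its own maximum: every column now lies in [0, 1],
# so both car parks can be drawn on the same axis
share_free = df_demo / df_demo.max()

# Same result as the apply/lambda form used in the cell above
share_free_apply = df_demo.apply(lambda x: x / x.max())

print(share_free)
```

The broadcast form `df_demo / df_demo.max()` and the `apply` form give the same values; the first is simply shorter.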

On to the business questions

  • are the car parks big enough?
  • is there a recurring time pattern in the number of free spaces?
  • from which direction do commuters arrive?
  • are the patterns the same in summer and in winter?
  • ...

Are the car parks big enough?

We check whether, at any point, there are no free spaces left at all.

In [322]:
total_places_libres = df.sum(axis=1)
total_places_libres.describe()
plt.figure()
total_places_libres.plot()
plt.show()
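Rather than reading the answer off the plot, the same check can be made numerically. This is a sketch on hypothetical data (`df_demo` and its values are invented for illustration): it looks for the lowest total of free spaces and the timestamps at which the total reaches zero.

```python
import pandas as pd

# Hypothetical free-space counts; every car park is full at 10:00
df_demo = pd.DataFrame(
    {"A": [120, 3, 0], "B": [40, 0, 0]},
    index=pd.to_datetime(
        ["2019-01-01 03:00", "2019-01-01 08:00", "2019-01-01 10:00"]
    ),
)

# Free spaces across all car parks at each timestamp
total_free = df_demo.sum(axis=1)

lowest = total_free.min()        # smallest total ever observed
when = total_free.idxmin()       # timestamp of that minimum
saturated = total_free[total_free == 0]  # moments with no space anywhere

# Share of the time each individual car park is completely full
share_time_full = (df_demo == 0).mean()

print(lowest, when)
print(share_time_full)
```

A total that never gets close to zero suggests overall capacity is sufficient, even if some individual car parks are regularly full.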

Do the car parks follow the same trends?

Are they all almost full at 8 a.m. and almost empty at night?

In [351]:
coeff_variation = df.std() / df.mean()
coeff_variation.dropna().sort_values()
Out[351]:
ENGELIN             0.162077
GRIMALDI FORUM      0.283716
MOULINS             0.378335
LOUIS II            0.408160
LARVOTTO            0.429566
PECHEURS            0.430500
CHPG 2 (BAS)        0.444560
C.C.F.              0.447256
JARDIN EXOTIQUE     0.474097
ST CHARLES          0.547712
ECOLES RDC          0.556609
QUAI ANTOINE 1ER    0.607473
OSTENDE             0.641370
CASINO              0.655930
CONDAMINE           0.659389
TESTIMONIO          0.662062
GARE                0.673498
SQUARE GASTAUD      0.673546
LA COLLE            0.692405
ROQUEVILLE          0.701028
ST LAURENT          0.703451
LA DIGUE            0.758203
PLACE D ARMES       0.759196
ST NICOLAS          0.761251
PORT                0.772705
COSTA               0.780515
ANNONCIADE          0.883506
ST ANTOINE          0.982644
STADE               0.984279
PAPALINS            1.023247
HELIPORT            1.158611
BOSIO               1.555401
dtype: float64
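The coefficient of variation is the standard deviation divided by the mean. It has no unit, so it compares how much each car park fluctuates regardless of its size. A minimal sketch on hypothetical data (`df_demo` and its column names are invented); the all-zero column shows one way a NaN can appear, which is presumably why the cell above calls `dropna()`.

```python
import pandas as pd

# Hypothetical free-space counts: same mean, very different spread,
# plus a car park that always reports zero free spaces
df_demo = pd.DataFrame(
    {
        "STEADY": [90, 100, 110],
        "SWINGING": [0, 100, 200],
        "CLOSED": [0, 0, 0],
    }
)

# pandas computes the sample standard deviation (ddof=1) by default
cv = df_demo.std() / df_demo.mean()

# CLOSED gives 0 / 0 = NaN and is dropped before sorting
ranked = cv.dropna().sort_values()

print(ranked)
```

A low value means occupancy barely moves; a value near or above 1 means the swings are as large as the average level itself.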
In [385]:
#pourcentage_libre[["ENGELIN", "PAPALINS", "STADE", "BOSIO"]].plot()
pourcentage_libre[["STADE"]].plot()
Out[385]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fda127dd080>
In [354]:
import calmap
calmap.calendarplot(df.STADE)
plt.show() 

Conclusions

  • python / pandas / matplotlib => make data analysis easier
    • a game changer compared with plain numpy
  • data > everything else
    • where it comes from
    • work out the provider's intent
      • what are the biases?
    • the rest comes down to mastering the tools (mathematical, technical...)
  • understanding the business side matters
    • no "silver bullets" in data science
  • always aim to answer a question
    • otherwise you flit from one thing to the next
    • computer disease (R. Feynman): “Anybody who works with computers knows about [it]. It's a very serious disease and it interferes completely with the work. The trouble with computers is that you 'play' with them!”

Thank you

Questions?