model persistence doc, added improvements from ogrisel comments

Commit df27e261 authored 11 years ago by Raul Garreta Committed by Ignacio Rossi 11 years ago
--- a/doc/model_persistence.rst
+++ b/doc/model_persistence.rst
@@ -36,12 +36,25 @@ persistence model, namely `pickle <http://docs.python.org/library/pickle.html>`_
 In the specific case of the scikit, it may be more interesting to use
 joblib's replacement of pickle (``joblib.dump`` & ``joblib.load``),
-which is more efficient on big data, but can only pickle to the disk
+which is more efficient on objects that carry large numpy arrays internally as
-and not to a string::
+is often the case for fitted scikit-learn estimators, but can only pickle to the
+disk and not to a string::
  >>> from sklearn.externals import joblib
  >>> joblib.dump(clf, 'filename.pkl') # doctest: +SKIP
+Later you can load back the pickled model (possibly in another Python process)
+with::
+  >>> clf = joblib.load('filename.pkl') # doctest:+SKIP
+.. note::
+   joblib.dump returns a list of filenames. Each individual numpy array
+   contained in the `clf` object is serialized as a separate file on the
+   filesystem. All files are required in the same folder when reloading the
+   model with joblib.load.
 Security & maintainability limitations
 --------------------------------------

--- a/doc/tutorial/basic/tutorial.rst
+++ b/doc/tutorial/basic/tutorial.rst
@@ -234,7 +234,19 @@ and not to a string::
  >>> from sklearn.externals import joblib
  >>> joblib.dump(clf, 'filename.pkl') # doctest: +SKIP
-It's important for you to know that pickle has some security and maintainability
+Later you can load back the pickled model (possibly in another Python process)
-issues. Please refer to section :ref:`model_persistence` for more detailed
+with::
-information about model persistence with scikit-learn.
+  >>> clf = joblib.load('filename.pkl') # doctest:+SKIP
+.. note::
+   joblib.dump returns a list of filenames. Each individual numpy array
+   contained in the `clf` object is serialized as a separate file on the
+   filesystem. All files are required in the same folder when reloading the
+   model with joblib.load.
+Note that pickle has some security and maintainability issues. Please refer to
+section :ref:`model_persistence` for more detailed information about model
+persistence with scikit-learn.
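
The changed text contrasts pickling to disk with pickling to a string. As a rough illustration of that distinction, here is a minimal sketch that uses only the standard-library ``pickle`` module, not joblib. The ``model_state`` dictionary is a hypothetical stand-in for a fitted estimator's parameters and is not a scikit-learn object. The file name ``filename.pkl`` mirrors the one used in the diff.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state of a fitted estimator
# (an assumption for illustration, not a scikit-learn object).
model_state = {"coef": [0.5, -1.2], "intercept": 0.1}

# Standard pickle can serialize to an in-memory byte string ...
payload = pickle.dumps(model_state)
state_from_string = pickle.loads(payload)

# ... or to a file on disk, which is the only mode the changed text
# describes for joblib.dump.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "filename.pkl")
    with open(path, "wb") as f:
        pickle.dump(model_state, f)
    with open(path, "rb") as f:
        state_from_disk = pickle.load(f)
```

Both round trips should give back an object equal to the original. The security caveat from the diff still applies: only unpickle data from a source you trust.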