diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 182999a5832178b8715b367756ea7130f4a6b397..7f20cc293541aaa01cbc59019a8d52244940d470 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -12,10 +12,10 @@ How to contribute
 -----------------
 
 The preferred way to contribute to scikit-learn is to fork the
-[main repository](http://github.com/scikit-learn/scikit-learn/) on
+[main repository](https://github.com/scikit-learn/scikit-learn) on
 GitHub:
 
-1. Fork the [project repository](http://github.com/scikit-learn/scikit-learn):
+1. Fork the [project repository](https://github.com/scikit-learn/scikit-learn):
    click on the 'Fork' button near the top of the page. This creates a copy of
    the code under your account on the GitHub server.
diff --git a/doc/README b/doc/README
index d859cf63b2ade8c4bd124c829a165b8f78504b2b..745143024535f8d068341cd04fbe054f61f18317 100644
--- a/doc/README
+++ b/doc/README
@@ -35,5 +35,5 @@ to update the http://scikit-learn.org/dev tree of the website.
 
 The configuration of this server is managed at:
 
- http://github.com/scikit-learn/sklearn-docbuilder
+ https://github.com/scikit-learn/sklearn-docbuilder
diff --git a/doc/about.rst b/doc/about.rst
index 5d7a014f41bed5b801a3a17f8f0b3633820b9eae..69bea48faac85d2dd11c56d455c4c694278932bd 100644
--- a/doc/about.rst
+++ b/doc/about.rst
@@ -63,7 +63,7 @@ High quality PNG and SVG logos are available in the `doc/logos/ <https://github.
 Funding
 -------
 
-`INRIA <http://www.inria.fr>`_ actively supports this project. It has
+`INRIA <https://www.inria.fr>`_ actively supports this project. It has
 provided funding for Fabian Pedregosa (2010-2012), Jaques Grobler
 (2012-2013) and Olivier Grisel (2013-2015) to work on this project
 full-time. It also hosts coding sprints and other events.
@@ -88,9 +88,9 @@ Environment also funds several students to work on the project part-time.
    :width: 200pt
    :align: center
 
-The following students were sponsored by `Google <http://code.google.com/opensource/>`_
+The following students were sponsored by `Google <https://developers.google.com/open-source/>`_
 to work on scikit-learn through the
-`Google Summer of Code <http://en.wikipedia.org/wiki/Google_Summer_of_Code>`_
+`Google Summer of Code <https://en.wikipedia.org/wiki/Google_Summer_of_Code>`_
 program.
 
 - 2007 - David Cournapeau
@@ -102,14 +102,14 @@ program.
 
 It also provided funding for sprints and events around scikit-learn. If you
 would like to participate in the next Google Summer of code program, please
 see `this page
-<http://github.com/scikit-learn/scikit-learn/wiki/SummerOfCode>`_
+<https://github.com/scikit-learn/scikit-learn/wiki/SummerOfCode>`_
 
 The `NeuroDebian <http://neuro.debian.net>`_ project providing `Debian
 <http://www.debian.org>`_ packaging and contributions is supported by
 `Dr. James V. Haxby <http://haxbylab.dartmouth.edu/>`_ (`Dartmouth
-College <http://www.dartmouth.edu/~psych/>`_).
+College <http://pbs.dartmouth.edu>`_).
 
-The `PSF <http://www.python.org/psf/>`_ helped find and manage funding for our
+The `PSF <https://www.python.org/psf/>`_ helped find and manage funding for our
 2011 Granada sprint.
 More information can be found `here
 <https://github.com/scikit-learn/scikit-learn/wiki/Past-sprints#granada-19th-21th-dec-2011>`__
@@ -121,12 +121,12 @@ Donating to the project
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 If you are interested in donating to the project or to one of our code-sprints, you can use
-the *Paypal* button below or the `NumFOCUS Donations Page <http://numfocus.org/donatejoin/>`_ (if you use the latter, please indicate that you are donating for the scikit-learn project).
+the *Paypal* button below or the `NumFOCUS Donations Page <http://www.numfocus.org/support-numfocus.html>`_ (if you use the latter, please indicate that you are donating for the scikit-learn project).
 
 All donations will be handled by `NumFOCUS
-<http://numfocus.org/donations>`_, a non-profit-organization which is
+<http://www.numfocus.org>`_, a non-profit-organization which is
 managed by a board of `Scipy community members
-<http://numfocus.org/board>`_. NumFOCUS's mission is to foster
+<http://www.numfocus.org/board>`_. NumFOCUS's mission is to foster
 scientific computing software, in particular in Python. As a fiscal home
 of scikit-learn, it ensures that money is available when needed to keep
 the project funded and available while in compliance with tax regulations.
diff --git a/doc/datasets/twenty_newsgroups.rst b/doc/datasets/twenty_newsgroups.rst
index 0a2f313934f507f8b7ac44fb68aec056a5cc858f..01c2a53ff77e5401068b3991fe052ff5559ddfcb 100644
--- a/doc/datasets/twenty_newsgroups.rst
+++ b/doc/datasets/twenty_newsgroups.rst
@@ -111,7 +111,7 @@ components by sample in a more than 30000-dimensional space
 ready-to-use tfidf features instead of file names.
 
 .. _`20 newsgroups website`: http://people.csail.mit.edu/jrennie/20Newsgroups/
-.. _`TF-IDF`: http://en.wikipedia.org/wiki/Tf-idf
+.. _`TF-IDF`: https://en.wikipedia.org/wiki/Tf-idf
 
 
 Filtering text for more realistic training
diff --git a/doc/developers/advanced_installation.rst b/doc/developers/advanced_installation.rst
index 8fa4d5bb98c4a96d4db84e85748e27ad9bd70a3e..29e8e54d275d3f216a02471a3b392f8c7a26d052 100644
--- a/doc/developers/advanced_installation.rst
+++ b/doc/developers/advanced_installation.rst
@@ -140,7 +140,7 @@ from source package
 ~~~~~~~~~~~~~~~~~~~
 
 download the source package from
-`pypi <http://pypi.python.org/pypi/scikit-learn/>`_,
+`pypi <https://pypi.python.org/pypi/scikit-learn>`_,
 , unpack the sources and cd into the source directory.
 
 this packages uses distutils, which is the default way of installing
@@ -163,12 +163,12 @@ or alternatively (also from within the scikit-learn source folder)::
 windows
 -------
 
-first, you need to install `numpy <http://numpy.scipy.org/>`_ and `scipy
+first, you need to install `numpy <http://www.numpy.org/>`_ and `scipy
 <http://www.scipy.org/>`_ from their own official installers.
 
 wheel packages (.whl files) for scikit-learn from `pypi
 <https://pypi.python.org/pypi/scikit-learn/>`_ can be installed with the `pip
-<http://pip.readthedocs.org/en/latest/installing.html>`_ utility.
+<https://pip.readthedocs.org/en/stable/installing/>`_ utility.
 
 open a console and type the following to install or upgrade scikit-learn to
 the latest stable release::
@@ -280,9 +280,7 @@ path environment variable.
 
 for 32-bit python it is possible use the standalone installers for
 `microsoft visual c++ express 2008 <http://go.microsoft.com/?linkid=7729279>`_
-for python 2 or
-`microsoft visual c++ express 2010 <http://go.microsoft.com/?linkid=9709949>`_
-or python 3.
+for python 2 or microsoft visual c++ express 2010 for python 3.
 once installed you should be able to build scikit-learn without any
 particular configuration by running the following command in the scikit-learn
diff --git a/doc/developers/contributing.rst b/doc/developers/contributing.rst
index 604326b34298241adbcd955a236f8331d562be72..ed89072d596a5846338ac0e2381841accdcf09a8 100644
--- a/doc/developers/contributing.rst
+++ b/doc/developers/contributing.rst
@@ -7,7 +7,7 @@ Contributing
 
 This project is a community effort, and everyone is welcome to contribute.
 
-The project is hosted on http://github.com/scikit-learn/scikit-learn
+The project is hosted on https://github.com/scikit-learn/scikit-learn
 
 Scikit-learn is somewhat :ref:`selective <selectiveness>` when it comes to
 adding new algorithms, and the best way to contribute and to help the project
@@ -19,7 +19,7 @@ Submitting a bug report
 
 In case you experience issues using this package, do not hesitate to submit a
 ticket to the
-`Bug Tracker <http://github.com/scikit-learn/scikit-learn/issues>`_. You are
+`Bug Tracker <https://github.com/scikit-learn/scikit-learn/issues>`_. You are
 also welcome to post feature requests or pull requests.
 
@@ -29,7 +29,7 @@ Retrieving the latest code
 ==========================
 
 We use `Git <http://git-scm.com/>`_ for version control and
-`GitHub <http://github.com/>`_ for hosting our main repository.
+`GitHub <https://github.com/>`_ for hosting our main repository.
 
 You can check out the latest sources with the command::
 
@@ -82,14 +82,14 @@ How to contribute
 -----------------
 
 The preferred way to contribute to scikit-learn is to fork the `main
-repository <http://github.com/scikit-learn/scikit-learn/>`__ on GitHub,
+repository <https://github.com/scikit-learn/scikit-learn/>`__ on GitHub,
 then submit a "pull request" (PR):
 
- 1. `Create an account <https://github.com/signup/free>`_ on
+ 1. `Create an account <https://github.com/join>`_ on
     GitHub if you do not already have one.
 
  2. Fork the `project repository
-    <http://github.com/scikit-learn/scikit-learn>`__: click on the 'Fork'
+    <https://github.com/scikit-learn/scikit-learn>`__: click on the 'Fork'
     button near the top of the page. This creates a copy of the code under
     your account on the GitHub server.
@@ -237,8 +237,8 @@ and are viewable in a web browser. See the README file in the doc/ directory
 for more information.
 
 For building the documentation, you will need `sphinx
-<http://sphinx.pocoo.org/>`_,
-`matplotlib <http://matplotlib.sourceforge.net/>`_ and
+<http://sphinx-doc.org/>`_,
+`matplotlib <http://matplotlib.org>`_ and
 `pillow <http://pillow.readthedocs.org/en/latest/>`_.
 
 **When you are writing documentation**, it is important to keep a good
@@ -297,7 +297,7 @@ Finally, follow the formatting rules below to make it consistently good:
 Testing and improving test coverage
 ------------------------------------
 
-High-quality `unit testing <http://en.wikipedia.org/wiki/Unit_testing>`_
+High-quality `unit testing <https://en.wikipedia.org/wiki/Unit_testing>`_
 is a corner-stone of the scikit-learn development process. For this
 purpose, we use the `nose <http://nose.readthedocs.org/en/latest/>`_
 package. The tests are functions appropriately named, located in `tests`
@@ -313,7 +313,7 @@ We expect code coverage of new features to be at least around 90%.
 .. note:: **Workflow to improve test coverage**
 
    To test code coverage, you need to install the `coverage
-   <http://pypi.python.org/pypi/coverage>`_ package in addition to nose.
+   <https://pypi.python.org/pypi/coverage>`_ package in addition to nose.
 
    1. Run 'make test-coverage'.
      The output lists for each file the line numbers that are not tested.
@@ -392,7 +392,7 @@ the review easier so new code can be integrated in less time.
 
 Uniformly formatted code makes it easier to share code ownership. The
 scikit-learn project tries to closely follow the official Python guidelines
-detailed in `PEP8 <http://www.python.org/dev/peps/pep-0008/>`_ that
+detailed in `PEP8 <https://www.python.org/dev/peps/pep-0008>`_ that
 detail how code should be formatted and indented. Please read it and
 follow it.
 
@@ -414,7 +414,7 @@ In addition, we add the following guidelines:
 
 * **Please don't use** ``import *`` **in any case**. It is considered harmful
   by the `official Python recommendations
-  <http://docs.python.org/howto/doanddont.html#from-module-import>`_.
+  <https://docs.python.org/2/howto/doanddont.html#from-module-import>`_.
   It makes the code harder to read as the origin of symbols is no longer
   explicitly referenced, but most important, it prevents using a static
   analysis tool like `pyflakes
diff --git a/doc/developers/performance.rst b/doc/developers/performance.rst
index d476c45d0b181347ebf6708677911739855ea971..5127fddbc8aeda060bc3ba38b32ec7784e9922cc 100644
--- a/doc/developers/performance.rst
+++ b/doc/developers/performance.rst
@@ -40,7 +40,7 @@ this means trying to **replace any nested for loops by calls to equivalent
 Numpy array methods**. The goal is to avoid the CPU wasting time in the
 Python interpreter rather than crunching numbers to fit your statistical
 model. It's generally a good idea to consider NumPy and SciPy performance tips:
-http://wiki.scipy.org/PerformanceTips
+http://scipy.github.io/old-wiki/pages/PerformanceTips
 
 Sometimes however an algorithm cannot be expressed efficiently in simple
 vectorized Numpy code. In this case, the recommended strategy is the
@@ -304,7 +304,7 @@ Memory usage profiling
 ======================
 
 You can analyze in detail the memory usage of any Python code with the help of
-`memory_profiler <http://pypi.python.org/pypi/memory_profiler>`_. First,
+`memory_profiler <https://pypi.python.org/pypi/memory_profiler>`_. First,
 install the latest version::
 
     $ pip install -U memory_profiler
@@ -401,7 +401,7 @@ project.
 TODO: html report, type declarations, bound checks, division by zero checks,
 memory alignment, direct blas calls...
 
-- http://www.euroscipy.org/file/3696?vid=download
+- https://www.youtube.com/watch?v=gMvkiQ-gOW8
 - http://conference.scipy.org/proceedings/SciPy2009/paper_1/
 - http://conference.scipy.org/proceedings/SciPy2009/paper_2/
 
@@ -421,8 +421,8 @@ Using yep and google-perftools
 
 Easy profiling without special compilation options use yep:
 
-- http://pypi.python.org/pypi/yep
-- http://fseoane.net/blog/2011/a-profiler-for-python-extensions/
+- https://pypi.python.org/pypi/yep
+- http://fa.bianp.net/blog/2011/a-profiler-for-python-extensions
 
 .. note::
 
@@ -430,7 +430,7 @@ Easy profiling without special compilation options use yep:
    can be triggered with the ``--lines`` option. However this does not seem
    to work correctly at the time of writing. This issue can be tracked on the
    `project issue tracker
-   <https://code.google.com/p/google-perftools/issues/detail?id=326>`_.
+   <https://github.com/gperftools/gperftools>`_.
 
@@ -460,7 +460,7 @@ TODO: give a simple teaser example here.
 
 Checkout the official joblib documentation:
 
-- http://packages.python.org/joblib/
+- https://pythonhosted.org/joblib
 
 .. _warm-restarts:
diff --git a/doc/developers/utilities.rst b/doc/developers/utilities.rst
index 88096a1b77519f04163933ccd4865f00cee04bc1..9ef9f6cd3a88607268906ef5db19eef00b5e260a 100644
--- a/doc/developers/utilities.rst
+++ b/doc/developers/utilities.rst
@@ -93,7 +93,7 @@ Efficient Linear Algebra & Array Operations
   by directly calling the BLAS ``nrm2`` function. This is more stable than
   ``scipy.linalg.norm``. See `Fabian's blog post
-  <http://fseoane.net/blog/2011/computing-the-vector-norm/>`_ for a discussion.
+  <http://fa.bianp.net/blog/2011/computing-the-vector-norm>`_ for a discussion.
 
 - :func:`extmath.fast_logdet`: efficiently compute the log of the determinant
   of a matrix.
diff --git a/doc/install.rst b/doc/install.rst
index 7edcd72c9a4d7cc1da4b6a9e274db0fa9811f0b2..0b58c0b6e28a281866e763359c0991d9b9c5a487 100644
--- a/doc/install.rst
+++ b/doc/install.rst
@@ -51,8 +51,8 @@ Canopy and Anaconda for all supported platforms
 -----------------------------------------------
 
 `Canopy
-<http://www.enthought.com/products/canopy>`_ and `Anaconda
-<https://store.continuum.io/cshop/anaconda/>`_ both ship a recent
+<https://www.enthought.com/products/canopy>`_ and `Anaconda
+<https://www.continuum.io/downloads>`_ both ship a recent
 version of scikit-learn, in addition to a large set of scientific python
 library for Windows, Mac OSX and Linux.
 
@@ -83,9 +83,8 @@ Anaconda offers scikit-learn as part of its free distribution.
 Python(x,y) for Windows
 -----------------------
 
-The `Python(x,y) <https://code.google.com/p/pythonxy/>`_ project distributes
-scikit-learn as an additional plugin, which can be found in the `Additional
-plugins <http://code.google.com/p/pythonxy/wiki/AdditionalPlugins>`_ page.
+The `Python(x,y) <https://python-xy.github.io>`_ project distributes
+scikit-learn as an additional plugin.
 
 For installation instructions for particular operating systems or for compiling
diff --git a/doc/modules/clustering.rst b/doc/modules/clustering.rst
index 49efecb697741edcffd18db18ecc049a4aca785c..70b5b1f879160ee6412f89a06a471675478187f6 100644
--- a/doc/modules/clustering.rst
+++ b/doc/modules/clustering.rst
@@ -1010,7 +1010,7 @@ random labelings by defining the adjusted Rand index as follows:
 .. topic:: References
 
  * `Comparing Partitions
-   <http://www.springerlink.com/content/x64124718341j1j0/>`_
+   <http://link.springer.com/article/10.1007%2FBF01908075>`_
   L. Hubert and P. Arabie, Journal of Classification 1985
 
  * `Wikipedia entry for the adjusted Rand index
@@ -1170,7 +1170,7 @@ calculated using a similar form to that of the adjusted Rand index:
  * Vinh, Epps, and Bailey, (2009). "Information theoretic measures
    for clusterings comparison". Proceedings of the 26th Annual International
   Conference on Machine Learning - ICML '09.
-   `doi:10.1145/1553374.1553511 <http://dx.doi.org/10.1145/1553374.1553511>`_.
+   `doi:10.1145/1553374.1553511 <https://dl.acm.org/citation.cfm?doid=1553374.1553511>`_.
   ISBN 9781605585161.
 
  * Vinh, Epps, and Bailey, (2010). Information Theoretic Measures for
diff --git a/doc/modules/computational_performance.rst b/doc/modules/computational_performance.rst
index cc5a792a47d572cc3a69f4d78eb25d79570a478d..a3a488ca6ddcc3910024a34aeae294092d7101b8 100644
--- a/doc/modules/computational_performance.rst
+++ b/doc/modules/computational_performance.rst
@@ -241,8 +241,8 @@ Linear algebra libraries
 As scikit-learn relies heavily on Numpy/Scipy and linear algebra in general it
 makes sense to take explicit care of the versions of these libraries.
 Basically, you ought to make sure that Numpy is built using an optimized `BLAS
-<http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms>`_ /
-`LAPACK <http://en.wikipedia.org/wiki/LAPACK>`_ library.
+<https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms>`_ /
+`LAPACK <https://en.wikipedia.org/wiki/LAPACK>`_ library.
 
 Not all models benefit from optimized BLAS and Lapack implementations. For
 instance models based on (randomized) decision trees typically do not rely on
@@ -308,7 +308,7 @@ compromise between model compactness and prediction power. One can also
 further tune the ``l1_ratio`` parameter (in combination with the
 regularization strength ``alpha``) to control this tradeoff.
 
-A typical `benchmark <https://github.com/scikit-learn/scikit-learn/tree/master/benchmarks/bench_sparsify.py>`_
+A typical `benchmark <https://github.com/scikit-learn/scikit-learn/blob/master/benchmarks/bench_sparsify.py>`_
 on synthetic data yields a >30% decrease in latency when both the model and
 input are sparse (with 0.000024 and 0.027400 non-zero coefficients ratio
 respectively). Your mileage may vary depending on the sparsity and size of
diff --git a/doc/modules/cross_validation.rst b/doc/modules/cross_validation.rst
index c4e4965457f013a241fdc7eb16622f9c3b9267ae..92be9e6e9ebf7b7f5e216dee7b912e89f80a69de 100644
--- a/doc/modules/cross_validation.rst
+++ b/doc/modules/cross_validation.rst
@@ -66,7 +66,7 @@ and the results can depend on a particular random choice for the pair of
 (train, validation) sets.
 
 A solution to this problem is a procedure called
-`cross-validation <http://en.wikipedia.org/wiki/Cross-validation_(statistics)>`_
+`cross-validation <https://en.wikipedia.org/wiki/Cross-validation_(statistics)>`_
 (CV for short).
 A test set should still be held out for final evaluation,
 but the validation set is no longer needed when doing CV.
@@ -337,11 +337,11 @@ fold cross validation should be preferred to LOO.
 
 * `<http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html>`_;
 * T. Hastie, R. Tibshirani, J. Friedman, `The Elements of Statistical Learning
-  <http://www-stat.stanford.edu/~tibs/ElemStatLearn>`_, Springer 2009;
+  <http://statweb.stanford.edu/~tibs/ElemStatLearn>`_, Springer 2009
 * L. Breiman, P. Spector `Submodel selection and evaluation in regression: The X-random case
   <http://digitalassets.lib.berkeley.edu/sdtr/ucb/text/197.pdf>`_, International Statistical Review 1992;
 * R. Kohavi, `A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
-  <http://www.cs.iastate.edu/~jtian/cs573/Papers/Kohavi-IJCAI-95.pdf>`_, Intl. Jnt. Conf. AI;
+  <http://web.cs.iastate.edu/~jtian/cs573/Papers/Kohavi-IJCAI-95.pdf>`_, Intl. Jnt. Conf. AI
 * R. Bharat Rao, G. Fung, R. Rosales, `On the Dangers of Cross-Validation. An Experimental Evaluation
   <http://www.siam.org/proceedings/datamining/2008/dm08_54_Rao.pdf>`_, SIAM 2008;
 * G. James, D. Witten, T. Hastie, R Tibshirani, `An Introduction to
diff --git a/doc/modules/decomposition.rst b/doc/modules/decomposition.rst
index a7905ffebed2acdf0b28f477a3b7bdf0884934d4..aec4a74dca9eb1c38ecc749f378fb3018469ff18 100644
--- a/doc/modules/decomposition.rst
+++ b/doc/modules/decomposition.rst
@@ -732,7 +732,7 @@ and the regularized objective function is:
 .. topic:: References:
 
     * `"Learning the parts of objects by non-negative matrix factorization"
-      <http://hebb.mit.edu/people/seung/papers/ls-lponm-99.pdf>`_
+      <http://www.columbia.edu/~jwp2128/Teaching/W4721/papers/nmf_nature.pdf>`_
      D. Lee, S. Seung, 1999
 
    * `"Non-negative Matrix Factorization with Sparseness Constraints"
diff --git a/doc/modules/density.rst b/doc/modules/density.rst
index c9f5c271f7f15ec2190181671130f2c3779903e9..f96f4004e73238fe17e3ea65fe30610e17808128 100644
--- a/doc/modules/density.rst
+++ b/doc/modules/density.rst
@@ -139,7 +139,7 @@ The kernel density estimator can be used with any of the valid distance
 metrics (see :class:`sklearn.neighbors.DistanceMetric` for a list of
 available metrics), though the results are properly normalized only
 for the Euclidean metric. One particularly useful metric is the
-`Haversine distance <http://en.wikipedia.org/wiki/Haversine_formula>`_
+`Haversine distance <https://en.wikipedia.org/wiki/Haversine_formula>`_
 which measures the angular distance between points on a sphere. Here
 is an example of using a kernel density estimate for a visualization
 of geospatial data, in this case the distribution of observations of two
diff --git a/doc/modules/ensemble.rst b/doc/modules/ensemble.rst
index 2a340f7a19e03078e39f55825073511cd20acaed..d00a9e968836353fcad736dfa7e31d4c09d7c7bf 100644
--- a/doc/modules/ensemble.rst
+++ b/doc/modules/ensemble.rst
@@ -414,7 +414,7 @@ decision trees).
 Gradient Tree Boosting
 ======================
 
-`Gradient Tree Boosting <http://en.wikipedia.org/wiki/Gradient_boosting>`_
+`Gradient Tree Boosting <https://en.wikipedia.org/wiki/Gradient_boosting>`_
 or Gradient Boosted Regression Trees (GBRT) is a generalization
 of boosting to arbitrary
 differentiable loss functions. GBRT is an accurate and effective
diff --git a/doc/modules/feature_extraction.rst b/doc/modules/feature_extraction.rst
index 9a64c9d8ac80c39d63846b8f9802cfc2f77aabe2..c0cfa5d3184edb2f94df9529e9fdca66d58a5e31 100644
--- a/doc/modules/feature_extraction.rst
+++ b/doc/modules/feature_extraction.rst
@@ -552,7 +552,7 @@ For an introduction to Unicode and character encodings in general, see
 Joel Spolsky's `Absolute Minimum Every Software Developer Must Know
 About Unicode <http://www.joelonsoftware.com/articles/Unicode.html>`_.
 
-.. _`ftfy`: http://github.com/LuminosoInsight/python-ftfy
+.. _`ftfy`: https://github.com/LuminosoInsight/python-ftfy
 
 
 Applications and examples
@@ -748,7 +748,7 @@ An interesting development of using a :class:`HashingVectorizer` is the ability
 to perform `out-of-core`_ scaling. This means that we can learn from data that
 does not fit into the computer's main memory.
 
-.. _out-of-core: http://en.wikipedia.org/wiki/Out-of-core_algorithm
+.. _out-of-core: https://en.wikipedia.org/wiki/Out-of-core_algorithm
 
 A strategy to implement out-of-core scaling is to stream data to the estimator
 in mini-batches. Each mini-batch is vectorized using :class:`HashingVectorizer`
diff --git a/doc/modules/feature_selection.rst b/doc/modules/feature_selection.rst
index 1fee0773aede8fc80e67913f78148db99856fc12..63acf09c7903baff4a8b83b2d92471ce1fe68704 100644
--- a/doc/modules/feature_selection.rst
+++ b/doc/modules/feature_selection.rst
@@ -225,7 +225,7 @@ alpha parameter, the fewer features selected.
 **Reference** Richard G. Baraniuk "Compressive Sensing", IEEE Signal
    Processing Magazine [120] July 2007
-   http://dsp.rice.edu/files/cs/baraniukCSlecture07.pdf
+   http://dsp.rice.edu/sites/dsp.rice.edu/files/cs/baraniukCSlecture07.pdf
 
 .. _randomized_l1:
diff --git a/doc/modules/gaussian_process.rst b/doc/modules/gaussian_process.rst
index efe9ad862eed2c6eee37f319a0f1812b6429cd67..44e4eec8775293bc8846b7cc576d813e8842fefb 100644
--- a/doc/modules/gaussian_process.rst
+++ b/doc/modules/gaussian_process.rst
@@ -887,7 +887,7 @@ toolbox.
 .. topic:: References:
 
     * `DACE, A Matlab Kriging Toolbox
-      <http://www2.imm.dtu.dk/~hbn/dace/>`_ S Lophaven, HB Nielsen, J
+      <http://imedea.uib-csic.es/master/cambioglobal/Modulo_V_cod101615/Lab/lab_maps/krigging/DACE-krigingsoft/dace/dace.pdf>`_ S Lophaven, HB Nielsen, J
      Sondergaard 2002,
 
     * W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, and M.D.
diff --git a/doc/modules/kernel_approximation.rst b/doc/modules/kernel_approximation.rst
index 80da380746514bc046a2fedd5ebac91bf506d66a..fd0fe7be0b1d88d05e684e2681356d8fc77f95db 100644
--- a/doc/modules/kernel_approximation.rst
+++ b/doc/modules/kernel_approximation.rst
@@ -13,7 +13,7 @@ algorithms.
 .. currentmodule:: sklearn.linear_model
 
 The advantage of using approximate explicit feature maps compared to the
-`kernel trick <http://en.wikipedia.org/wiki/Kernel_trick>`_,
+`kernel trick <https://en.wikipedia.org/wiki/Kernel_trick>`_,
 which makes use of feature maps implicitly, is that explicit mappings
 can be better suited for online learning and can significantly reduce the cost
 of learning with very large datasets.
@@ -196,12 +196,12 @@ or store training examples.
    <http://www.robots.ox.ac.uk/~vgg/rg/papers/randomfeatures.pdf>`_
    Rahimi, A. and Recht, B. - Advances in neural information processing 2007,
 
 .. [LS2010] `"Random Fourier approximations for skewed multiplicative histogram kernels"
-   <http://sminchisescu.ins.uni-bonn.de/papers/lis_dagm10.pdf>`_
+   <http://www.maths.lth.se/matematiklth/personal/sminchis/papers/lis_dagm10.pdf>`_
   Random Fourier approximations for skewed multiplicative histogram kernels - Lecture Notes for Computer Sciencd (DAGM)
 
 .. [VZ2010] `"Efficient additive kernels via explicit feature maps"
-   <http://eprints.pascal-network.org/archive/00006964/01/vedaldi10.pdf>`_
+   <https://www.robots.ox.ac.uk/~vgg/publications/2011/Vedaldi11/vedaldi11.pdf>`_
   Vedaldi, A. and Zisserman, A. - Computer Vision and Pattern Recognition 2010
 
 .. [VVZ2010] `"Generalized RBF feature maps for Efficient Detection"
-   <http://eprints.pascal-network.org/archive/00007024/01/inproceedings.pdf.8a865c2a5421e40d.537265656b616e7468313047656e6572616c697a65642e706466.pdf>`_
+   <https://www.robots.ox.ac.uk/~vgg/publications/2010/Sreekanth10/sreekanth10.pdf>`_
   Vempati, S. and Vedaldi, A. and Zisserman, A. and Jawahar, CV - 2010
diff --git a/doc/modules/label_propagation.rst b/doc/modules/label_propagation.rst
index ead3051470a4e350933d2c47e0014239bc38128a..f4af92eff56e96549701d6f0af0b8ce6a374a596 100644
--- a/doc/modules/label_propagation.rst
+++ b/doc/modules/label_propagation.rst
@@ -7,7 +7,7 @@ Semi-Supervised
 .. currentmodule:: sklearn.semi_supervised
 
 `Semi-supervised learning
-<http://en.wikipedia.org/wiki/Semi-supervised_learning>`_ is a situation
+<https://en.wikipedia.org/wiki/Semi-supervised_learning>`_ is a situation
 in which in your training data some of the samples are not labeled.
 The semi-supervised estimators in :mod:`sklearn.semi_supervised` are able to
 make use of this additional unlabeled data to better capture the shape of
diff --git a/doc/modules/learning_curve.rst b/doc/modules/learning_curve.rst
index 8708ef8c7acdf8909fe5ab920097ecbf2f127fec..39ecbcbe76a58a53a8e18053178fdb8582b2a4ea 100644
--- a/doc/modules/learning_curve.rst
+++ b/doc/modules/learning_curve.rst
@@ -29,7 +29,7 @@ very well, i.e. it is very sensitive to varying training data (high variance).
 Bias and variance are inherent properties of estimators and we usually have to
 select learning algorithms and hyperparameters so that both bias and variance
 are as low as possible (see `Bias-variance dilemma
-<http://en.wikipedia.org/wiki/Bias-variance_dilemma>`_). Another way to reduce
+<https://en.wikipedia.org/wiki/Bias-variance_dilemma>`_). Another way to reduce
 the variance of a model is to use more training data. However, you should only
 collect more training data if the true function is too complex to be
 approximated by an estimator with a lower variance.
diff --git a/doc/modules/linear_model.rst b/doc/modules/linear_model.rst
index ccca390d7edd57b567af90dda7c34e954916ebe0..435e026f29450f0c127d4526556a1ab349be60b9 100644
--- a/doc/modules/linear_model.rst
+++ b/doc/modules/linear_model.rst
@@ -530,7 +530,7 @@ parameters in the estimation procedure: the regularization parameter is
 not set in a hard sense but tuned to the data at hand.
 
 This can be done by introducing `uninformative priors
-<http://en.wikipedia.org/wiki/Non-informative_prior#Uninformative_priors>`__
+<https://en.wikipedia.org/wiki/Non-informative_prior#Uninformative_priors>`__
 over the hyper parameters of the model.
 The :math:`\ell_{2}` regularization used in `Ridge Regression`_ is equivalent
 to finding a maximum a-postiori solution under a Gaussian prior over the
@@ -579,7 +579,7 @@ The prior for the parameter :math:`w` is given by a spherical Gaussian:
 
    \mathcal{N}(w|0,\lambda^{-1}\bold{I_{p}})
 
 The priors over :math:`\alpha` and :math:`\lambda` are chosen to be `gamma
-distributions <http://en.wikipedia.org/wiki/Gamma_distribution>`__, the
+distributions <https://en.wikipedia.org/wiki/Gamma_distribution>`__, the
 conjugate prior for the precision of the Gaussian.
 
 The resulting model is called *Bayesian Ridge Regression*, and is similar to the
@@ -691,10 +691,8 @@ Logistic regression
 
 Logistic regression, despite its name, is a linear model for classification
 rather than regression. Logistic regression is also known in the literature as
-logit regression, maximum-entropy classification (MaxEnt) or the log-linear
-classifier. In this model, the probabilities describing the possible outcomes
-of a single trial are modeled using a `logistic function
-<http://en.wikipedia.org/wiki/Logistic_function>`_.
+logit regression, maximum-entropy classification (MaxEnt)
+or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a `logistic function <https://en.wikipedia.org/wiki/Logistic_function>`_.
 
 The implementation of logistic regression in scikit-learn can be accessed from
 class :class:`LogisticRegression`. This implementation can fit binary, One-vs-
@@ -995,7 +993,7 @@ performance.
 
 .. topic:: References:
 
- * http://en.wikipedia.org/wiki/RANSAC
+ * https://en.wikipedia.org/wiki/RANSAC
 * `"Random Sample Consensus: A Paradigm for Model Fitting with Applications
    to Image Analysis and Automated Cartography"
    <http://www.cs.columbia.edu/~belhumeur/courses/compPhoto/ransac.pdf>`_
@@ -1022,7 +1020,7 @@ better than an ordinary least squares in high dimension.
 
 .. topic:: References:
 
- * http://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator
+ * https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator
 
 Theoretical considerations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1063,7 +1061,7 @@ considering only a random subset of all possible combinations.
 
 .. topic:: References:
 
- .. [#f1] Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang: `Theil-Sen Estimators in a Multiple Linear Regression Model. <http://www.math.iupui.edu/~hpeng/MTSE_0908.pdf>`_
+ .. [#f1] Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang: `Theil-Sen Estimators in a Multiple Linear Regression Model. <http://home.olemiss.edu/~xdang/papers/MTSE.pdf>`_
 
 .. [#f2] T. Kärkkäinen and S. Äyrämö: `On Computation of Spatial Median for Robust Data Mining. <http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf>`_
diff --git a/doc/modules/manifold.rst b/doc/modules/manifold.rst
index 09a3ba222ca6c49dc04731bbaee12514ba9884d0..b1b0aac40e76919eda88de41aabc2ce43ae1086e 100644
--- a/doc/modules/manifold.rst
+++ b/doc/modules/manifold.rst
@@ -343,7 +343,7 @@ The overall complexity of spectral embedding is
 
   * `"Laplacian Eigenmaps for Dimensionality Reduction and Data Representation"
-    <http://www.cse.ohio-state.edu/~mbelkin/papers/LEM_NC_03.pdf>`_
+    <http://web.cse.ohio-state.edu/~mbelkin/papers/LEM_NC_03.pdf>`_
     M. Belkin, P. Niyogi, Neural Computation, June 2003; 15 (6):1373-1396
 
@@ -397,7 +397,7 @@ The overall complexity of standard LTSA is
 Multi-dimensional Scaling (MDS)
 ===============================
 
-`Multidimensional scaling <http://en.wikipedia.org/wiki/Multidimensional_scaling>`_
+`Multidimensional scaling <https://en.wikipedia.org/wiki/Multidimensional_scaling>`_
 (:class:`MDS`) seeks a low-dimensional
 representation of the data in which the distances respect well the
 distances in the original high-dimensional space.
@@ -461,15 +461,15 @@ order to avoid that, the disparities :math:`\hat{d}_{ij}` are normalized.
 .. topic:: References:
 
   * `"Modern Multidimensional Scaling - Theory and Applications"
-    <http://www.springer.com/statistics/social+sciences+%26+law/book/978-0-387-25150-9>`_
+    <http://www.springer.com/fr/book/9780387251509>`_
    Borg, I.; Groenen P. Springer Series in Statistics (1997)
 
   * `"Nonmetric multidimensional scaling: a numerical method"
-    <http://www.springerlink.com/content/tj18655313945114/>`_
+    <http://link.springer.com/article/10.1007%2FBF02289694>`_
    Kruskal, J. Psychometrika, 29 (1964)
 
   * `"Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis"
-    <http://www.springerlink.com/content/010q1x323915712x/>`_
+    <http://link.springer.com/article/10.1007%2FBF02289565>`_
    Kruskal, J. Psychometrika, 29, (1964)
 
 .. _t_sne:
@@ -604,7 +604,7 @@ the internal structure of the data.
    van der Maaten, L.J.P.
 
   * `"Accelerating t-SNE using Tree-Based Algorithms."
-    <http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf>`_
+    <https://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf>`_
    L.J.P. van der Maaten. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
 Tips on practical use
diff --git a/doc/modules/mixture.rst b/doc/modules/mixture.rst
index a56160367fbb8b03c52ea8ec06b38bd80c50083f..c2ee9da243a798f9a800371626576235dfe61231 100644
--- a/doc/modules/mixture.rst
+++ b/doc/modules/mixture.rst
@@ -122,7 +122,7 @@ data is that it is one usually doesn't know which points came from
 which latent component (if one has access to this information it gets
 very easy to fit a separate Gaussian distribution to each set of
 points). `Expectation-maximization
-<http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm>`_
+<https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm>`_
 is a well-founded statistical
 algorithm to get around this problem by an iterative process. First
 one assumes random components (randomly centered on data points,
@@ -287,7 +287,7 @@ An important question is how can the Dirichlet process use an infinite,
 unbounded number of clusters and still be consistent. While a full explanation
 doesn't fit this manual, one can think of its `chinese restaurant process
-<http://en.wikipedia.org/wiki/Chinese_restaurant_process>`_
+<https://en.wikipedia.org/wiki/Chinese_restaurant_process>`_
 analogy to help understanding it. The
 chinese restaurant process is a generative story for the Dirichlet process.
 Imagine a chinese restaurant with an infinite number of
diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst
index ea69944e1d5622786c497697850356d2af63ee72..05bd3e60bee204508a73c609b1514b13b8290c4d 100644
--- a/doc/modules/model_evaluation.rst
+++ b/doc/modules/model_evaluation.rst
@@ -314,7 +314,7 @@ Accuracy score
 --------------
 
 The :func:`accuracy_score` function computes the
-`accuracy <http://en.wikipedia.org/wiki/Accuracy_and_precision>`_, either the fraction
+`accuracy <https://en.wikipedia.org/wiki/Accuracy_and_precision>`_, either the fraction
 (default) or the count (normalize=False) of correct predictions.
 
@@ -332,7 +332,7 @@ defined as
 
    \texttt{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)
 
 where :math:`1(x)` is the `indicator function
-<http://en.wikipedia.org/wiki/Indicator_function>`_.
+<https://en.wikipedia.org/wiki/Indicator_function>`_.
 
   >>> import numpy as np
   >>> from sklearn.metrics import accuracy_score
@@ -378,7 +378,7 @@ Confusion matrix
 ----------------
 
 The :func:`confusion_matrix` function evaluates
 classification accuracy by computing the `confusion matrix
-<http://en.wikipedia.org/wiki/Confusion_matrix>`_.
+<https://en.wikipedia.org/wiki/Confusion_matrix>`_.
 
 By definition, entry :math:`i, j` in a confusion matrix is
 the number of observations actually in group :math:`i`, but
@@ -457,7 +457,7 @@ Hamming loss
 -------------
 
 The :func:`hamming_loss` computes the average Hamming loss or `Hamming
-distance <http://en.wikipedia.org/wiki/Hamming_distance>`_ between two sets
+distance <https://en.wikipedia.org/wiki/Hamming_distance>`_ between two sets
 of samples.
 
 If :math:`\hat{y}_j` is the predicted value for the :math:`j`-th label of
@@ -470,7 +470,7 @@ Hamming loss :math:`L_{Hamming}` between two samples is defined as:
 
    L_{Hamming}(y, \hat{y}) = \frac{1}{n_\text{labels}} \sum_{j=0}^{n_\text{labels} - 1} 1(\hat{y}_j \not= y_j)
 
 where :math:`1(x)` is the `indicator function
-<http://en.wikipedia.org/wiki/Indicator_function>`_. ::
+<https://en.wikipedia.org/wiki/Indicator_function>`_. ::
 
   >>> from sklearn.metrics import hamming_loss
   >>> y_pred = [1, 2, 3, 4]
@@ -501,7 +501,7 @@ Jaccard similarity coefficient score
 -------------------------------------
 
 The :func:`jaccard_similarity_score` function computes the average (default)
 or sum of `Jaccard similarity coefficients
-<http://en.wikipedia.org/wiki/Jaccard_index>`_, also called the Jaccard index,
+<https://en.wikipedia.org/wiki/Jaccard_index>`_, also called the Jaccard index,
 between pairs of label sets.
 
 The Jaccard similarity coefficient of the :math:`i`-th samples,
@@ -537,12 +537,12 @@ Precision, recall and F-measures
 ---------------------------------
 
 Intuitively, `precision
-<http://en.wikipedia.org/wiki/Precision_and_recall#Precision>`_ is the ability
+<https://en.wikipedia.org/wiki/Precision_and_recall#Precision>`_ is the ability
 of the classifier not to label as positive a sample that is negative, and
-`recall <http://en.wikipedia.org/wiki/Precision_and_recall#Recall>`_ is the
+`recall <https://en.wikipedia.org/wiki/Precision_and_recall#Recall>`_ is the
 ability of the classifier to find all the positive samples.
 
-The `F-measure <http://en.wikipedia.org/wiki/F1_score>`_
+The `F-measure <https://en.wikipedia.org/wiki/F1_score>`_
 (:math:`F_\beta` and :math:`F_1` measures) can be interpreted as a weighted
 harmonic mean of the precision and recall. A
 :math:`F_\beta` measure reaches its best value at 1 and its worst score at 0.
@@ -747,7 +747,7 @@ Hinge loss
 ----------
 
 The :func:`hinge_loss` function computes the average distance between
 the model and the data using
-`hinge loss <http://en.wikipedia.org/wiki/Hinge_loss>`_, a one-sided metric
+`hinge loss <https://en.wikipedia.org/wiki/Hinge_loss>`_, a one-sided metric
 that considers only prediction errors. (Hinge
 loss is used in maximal margin classifiers such as support vector machines.)
@@ -868,7 +868,7 @@ Matthews correlation coefficient
 ---------------------------------
 
 The :func:`matthews_corrcoef` function computes the
-`Matthew's correlation coefficient (MCC) <http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
+`Matthew's correlation coefficient (MCC) <https://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
 for binary classes. Quoting Wikipedia:
@@ -904,7 +904,7 @@ Receiver operating characteristic (ROC)
 ---------------------------------------
 
 The function :func:`roc_curve` computes the
-`receiver operating characteristic curve, or ROC curve <http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_.
+`receiver operating characteristic curve, or ROC curve <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_.
 Quoting Wikipedia :
 
   "A receiver operating characteristic (ROC), or simply ROC curve, is a
@@ -944,7 +944,7 @@ operating characteristic (ROC) curve, which is also denoted by
 AUC or AUROC. By computing the
 area under the roc curve, the curve information is summarized in one number.
 For more information see the `Wikipedia article on AUC
-<http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve>`_.
+<https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`_.
 
   >>> import numpy as np
   >>> from sklearn.metrics import roc_auc_score
@@ -1006,7 +1006,7 @@ then the 0-1 loss :math:`L_{0-1}` is defined as:
 
   L_{0-1}(y_i, \hat{y}_i) = 1(\hat{y}_i \not= y_i)
 
 where :math:`1(x)` is the `indicator function
-<http://en.wikipedia.org/wiki/Indicator_function>`_.
+<https://en.wikipedia.org/wiki/Indicator_function>`_.
 
   >>> from sklearn.metrics import zero_one_loss
@@ -1094,7 +1094,7 @@ score.
 This metric will yield better scores if you are able to give better rank to
 the labels associated with each sample. The obtained score is always strictly
 greater than 0, and the best value is 1. If there is exactly one relevant
 label per sample, label ranking average precision is equivalent to the `mean
-reciprocal rank <http://en.wikipedia.org/wiki/Mean_reciprocal_rank>`_.
+reciprocal rank <https://en.wikipedia.org/wiki/Mean_reciprocal_rank>`_.
 
 Formally, given a binary indicator matrix of the ground truth labels
 :math:`y \in \mathcal{R}^{n_\text{samples} \times n_\text{labels}}` and the
@@ -1198,11 +1198,11 @@ Explained variance score
 -------------------------
 
 The :func:`explained_variance_score` computes the `explained variance
-regression score <http://en.wikipedia.org/wiki/Explained_variation>`_.
+regression score <https://en.wikipedia.org/wiki/Explained_variation>`_.
 
 If :math:`\hat{y}` is the estimated target output, :math:`y` the corresponding
 (correct) target output, and :math:`Var` is `Variance
-<http://en.wikipedia.org/wiki/Variance>`_, the square of the standard deviation,
+<https://en.wikipedia.org/wiki/Variance>`_, the square of the standard deviation,
 then the explained variance is estimated as follow:
 
 .. math::
@@ -1234,7 +1234,7 @@ Mean absolute error
 -------------------
 
 The :func:`mean_absolute_error` function computes `mean absolute
-error <http://en.wikipedia.org/wiki/Mean_absolute_error>`_, a risk
+error <https://en.wikipedia.org/wiki/Mean_absolute_error>`_, a risk
 metric corresponding to the expected value of the absolute error loss or
 :math:`l1`-norm loss.
@@ -1269,7 +1269,7 @@ Mean squared error
 -------------------
 
 The :func:`mean_squared_error` function computes `mean square
-error <http://en.wikipedia.org/wiki/Mean_squared_error>`_, a risk
+error <https://en.wikipedia.org/wiki/Mean_squared_error>`_, a risk
 metric corresponding to the expected value of the squared (quadratic) error
 loss or loss.
@@ -1334,7 +1334,7 @@ R² score, the coefficient of determination
 -------------------------------------------
 
 The :func:`r2_score` function computes R², the `coefficient of
-determination <http://en.wikipedia.org/wiki/Coefficient_of_determination>`_.
+determination <https://en.wikipedia.org/wiki/Coefficient_of_determination>`_.
 It provides a measure of how well future samples are likely to be predicted
 by the model. Best possible score is 1.0 and it can be negative (because the
 model can be arbitrarily worse). A constant model that always
diff --git a/doc/modules/model_persistence.rst b/doc/modules/model_persistence.rst
index dfa0d4646638e43b15eb87e18386308074e03f6c..a87688bb4c01a2e9d099cf4575b320547c861613 100644
--- a/doc/modules/model_persistence.rst
+++ b/doc/modules/model_persistence.rst
@@ -14,7 +14,7 @@ Persistence example
 -------------------
 
 It is possible to save a model in the scikit by using Python's built-in
-persistence model, namely `pickle <http://docs.python.org/library/pickle.html>`_::
+persistence model, namely `pickle <http://docs.python.org/2/library/pickle.html>`_::
 
   >>> from sklearn import svm
   >>> from sklearn import datasets
diff --git a/doc/modules/neighbors.rst b/doc/modules/neighbors.rst
index 6e3001822edfe195e0b718a1dbac5ef07f7c6970..2a1d61ca8bf9dba8e5353581d2a760a2d3c9880c 100644
--- a/doc/modules/neighbors.rst
+++ b/doc/modules/neighbors.rst
@@ -685,6 +685,6 @@ candidates, the speedup compared to brute force search is approximately
    '06.
    47th Annual IEEE Symposium
 
  * `“LSH Forest: Self-Tuning Indexes for Similarity Search”
-   <http://wwwconference.org/proceedings/www2005/docs/p651.pdf>`_,
+   <http://infolab.stanford.edu/~bawa/Pub/similarity.pdf>`_,
   Bawa, M., Condie, T., Ganesan, P., WWW '05 Proceedings of the 14th
   international conference on World Wide Web Pages 651-660
diff --git a/doc/modules/neural_networks_supervised.rst b/doc/modules/neural_networks_supervised.rst
index f805c46bbb8d341e39c6d667ff98b1242a8f874a..265adac4b263a7627ee341b307441773ee862dfe 100644
--- a/doc/modules/neural_networks_supervised.rst
+++ b/doc/modules/neural_networks_supervised.rst
@@ -129,7 +129,7 @@ of probability estimates :math:`P(y|x)` per sample :math:`x`::
           [ 0.,  1.]])
 
 :class:`MLPClassifier` supports multi-class classification by
-applying `Softmax <http://en.wikipedia.org/wiki/Softmax_activation_function>`_
+applying `Softmax <https://en.wikipedia.org/wiki/Softmax_activation_function>`_
 as the output function.
 
 Further, the algorithm supports :ref:`multi-label classification <multiclass>`
@@ -198,9 +198,9 @@ Algorithms
 ==========
 
 MLP trains using `Stochastic Gradient Descent
-<http://en.wikipedia.org/wiki/Stochastic_gradient_descent>`_,
+<https://en.wikipedia.org/wiki/Stochastic_gradient_descent>`_,
 `Adam <http://arxiv.org/abs/1412.6980>`_, or
-`L-BFGS <http://en.wikipedia.org/wiki/Limited-memory_BFGS>`__.
+`L-BFGS <https://en.wikipedia.org/wiki/Limited-memory_BFGS>`__.
 Stochastic Gradient Descent (SGD) updates parameters using the gradient of
 the loss function with respect to a parameter that needs adaptation, i.e.
diff --git a/doc/modules/preprocessing.rst b/doc/modules/preprocessing.rst
index 9f94b5d762a594dfce1be68a2667c6379b2f1e26..37789df450aac2b8768eea81e8b7831ee751c38e 100644
--- a/doc/modules/preprocessing.rst
+++ b/doc/modules/preprocessing.rst
@@ -259,7 +259,7 @@ such as the dot-product or any other kernel to quantify the similarity of any
 pair of samples.
 
 This assumption is the base of the `Vector Space Model
-<http://en.wikipedia.org/wiki/Vector_Space_Model>`_ often used in text
+<https://en.wikipedia.org/wiki/Vector_Space_Model>`_ often used in text
 classification and clustering contexts.
 
 The function :func:`normalize` provides a quick and easy way to perform this
@@ -322,7 +322,7 @@ Feature binarization
 features to get boolean values**. This can be useful for downstream
 probabilistic estimators that make assumption that the input data
 is distributed according to a multi-variate `Bernoulli distribution
-<http://en.wikipedia.org/wiki/Bernoulli_distribution>`_. For instance,
+<https://en.wikipedia.org/wiki/Bernoulli_distribution>`_. For instance,
 this is the case for the :class:`sklearn.neural_network.BernoulliRBM`.
 
 It is also common among the text processing community to use binary
@@ -517,7 +517,7 @@ In some cases, only interaction terms among features are required, and it can be
 The features of X have been transformed from :math:`(X_1, X_2, X_3)` to
 :math:`(1, X_1, X_2, X_3, X_1X_2, X_1X_3, X_2X_3, X_1X_2X_3)`.
 
-Note that polynomial features are used implicitily in `kernel methods <http://en.wikipedia.org/wiki/Kernel_method>`_ (e.g., :class:`sklearn.svm.SVC`, :class:`sklearn.decomposition.KernelPCA`) when using polynomial :ref:`svm_kernels`.
+Note that polynomial features are used implicitily in `kernel methods <https://en.wikipedia.org/wiki/Kernel_method>`_ (e.g., :class:`sklearn.svm.SVC`, :class:`sklearn.decomposition.KernelPCA`) when using polynomial :ref:`svm_kernels`.
 See :ref:`example_linear_model_plot_polynomial_interpolation.py` for Ridge regression using created polynomial features.
diff --git a/doc/modules/random_projection.rst b/doc/modules/random_projection.rst
index e6ef3cb63e02a035886f048fe8bb7537bf3d633f..d0f733b532c54c6338a3880c6f3d8f691df9a7b3 100644
--- a/doc/modules/random_projection.rst
+++ b/doc/modules/random_projection.rst
@@ -22,7 +22,7 @@ technique for distance based method.
 .. topic:: References:
 
   * Sanjoy Dasgupta. 2000.
-    `Experiments with random projection. <http://cseweb.ucsd.edu/users/dasgupta/papers/randomf.pdf>`_
+    `Experiments with random projection. <http://cseweb.ucsd.edu/~dasgupta/papers/randomf.pdf>`_
     In Proceedings of the Sixteenth conference on Uncertainty in artificial
     intelligence (UAI'00), Craig Boutilier and Moisés Goldszmidt (Eds.). Morgan
     Kaufmann Publishers Inc., San Francisco, CA, USA, 143-151.
@@ -41,7 +41,7 @@ The Johnson-Lindenstrauss lemma
 
 The main theoretical result behind the efficiency of random projection is the
 `Johnson-Lindenstrauss lemma (quoting Wikipedia)
-<http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma>`_:
+<https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma>`_:
 
   In mathematics, the Johnson-Lindenstrauss lemma is a result
   concerning low-distortion embeddings of points from high-dimensional
diff --git a/doc/modules/sgd.rst b/doc/modules/sgd.rst
index 3869b7594d0960d6eb69f7a936aa572d60a8c63e..2ac5647002d31e8e239b2e5da2812d6d51c563fb 100644
--- a/doc/modules/sgd.rst
+++ b/doc/modules/sgd.rst
@@ -9,8 +9,8 @@ Stochastic Gradient Descent
 
 **Stochastic Gradient Descent (SGD)** is a simple yet very efficient
 approach to discriminative learning of linear classifiers under
 convex loss functions such as (linear) `Support Vector Machines
-<http://en.wikipedia.org/wiki/Support_vector_machine>`_ and `Logistic
-Regression <http://en.wikipedia.org/wiki/Logistic_regression>`_.
+<https://en.wikipedia.org/wiki/Support_vector_machine>`_ and `Logistic
+Regression <https://en.wikipedia.org/wiki/Logistic_regression>`_.
 Even though SGD has been around in the machine learning community for
 a long time, it has received a considerable amount of attention just
 recently in the context of large-scale learning.
@@ -212,7 +212,7 @@ Stochastic Gradient Descent for sparse data
 intercept.
 
 There is built-in support for sparse data given in any matrix in a format
-supported by `scipy.sparse <http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.html>`_. For maximum efficiency, however, use the CSR
+supported by `scipy.sparse <https://docs.scipy.org/doc/scipy/reference/sparse.html>`_. For maximum efficiency, however, use the CSR
 matrix format as defined in `scipy.sparse.csr_matrix
 <http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html>`_.
diff --git a/doc/modules/tree.rst b/doc/modules/tree.rst
index 591786ac8605328964da8869a3d43f79d81af3fe..118d22de7d291c4e504bbfed6b5722fa40202734 100644
--- a/doc/modules/tree.rst
+++ b/doc/modules/tree.rst
@@ -410,8 +410,8 @@ and threshold that yield the largest information gain at each node.
 
 scikit-learn uses an optimised version of the CART algorithm.
 
-.. _ID3: http://en.wikipedia.org/wiki/ID3_algorithm
-.. _CART: http://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees
+.. _ID3: https://en.wikipedia.org/wiki/ID3_algorithm
+.. _CART: https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees
 .. _tree_mathematical_formulation:
@@ -500,9 +500,9 @@ criterion to minimise is the Mean Squared Error
 
 .. topic:: References:
 
- * http://en.wikipedia.org/wiki/Decision_tree_learning
+ * https://en.wikipedia.org/wiki/Decision_tree_learning
 
- * http://en.wikipedia.org/wiki/Predictive_analytics
+ * https://en.wikipedia.org/wiki/Predictive_analytics
 
  * L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification
    and Regression Trees. Wadsworth, Belmont, CA, 1984.
diff --git a/doc/presentations.rst b/doc/presentations.rst
index 4a0c08546e4364a0621f70bc995fc48fe95b8792..52977d3daa61e6a0bdac6fdfb819b2a88497b4b4 100644
--- a/doc/presentations.rst
+++ b/doc/presentations.rst
@@ -9,7 +9,7 @@ New to Scientific Python?
 ==========================
 For those that are still new to the scientific Python ecosystem, we highly
 recommend the `Python Scientific Lecture Notes
-<http://scipy-lectures.github.io/>`_. This will help you find your footing a
+<http://scipy-lectures.org>`_. This will help you find your footing a
 bit and will definitely improve your scikit-learn experience. A basic
 understanding of NumPy arrays is recommended to make the most of scikit-learn.
@@ -20,7 +20,7 @@ There are several online tutorials available which are geared toward
 specific subject areas:
 
 - `Machine Learning for NeuroImaging in Python <http://nilearn.github.io/>`_
-- `Machine Learning for Astronomical Data Analysis <http://astroml.github.com/sklearn_tutorial/>`_
+- `Machine Learning for Astronomical Data Analysis <https://github.com/astroML/sklearn_tutorial>`_
 
 .. _videos:
 
@@ -50,7 +50,7 @@ Videos
   section :ref:`stat_learn_tut_index`.
 
 - `Statistical Learning for Text Classification with scikit-learn and NLTK
-  <http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-statistical-machine-learning-for-text-classification-with-scikit-learn-4898362>`_
+  <http://www.pyvideo.org/video/417/pycon-2011--statistical-machine-learning-for-text>`_
  (and `slides <http://www.slideshare.net/ogrisel/statistical-machine-learning-for-text-classification-with-scikitlearn-and-nltk>`_)
  by `Olivier Grisel`_ at PyCon 2011
 
@@ -58,21 +58,21 @@ Videos
  use NLTK and scikit-learn to solve real-world text classification
  tasks and compares against cloud-based solutions.
 
-- `Introduction to Interactive Predictive Analytics in Python with scikit-learn <http://www.youtube.com/watch?v=Zd5dfooZWG4>`_
+- `Introduction to Interactive Predictive Analytics in Python with scikit-learn <https://www.youtube.com/watch?v=Zd5dfooZWG4>`_
  by `Olivier Grisel`_ at PyCon 2012
 
  3-hours long introduction to prediction tasks using scikit-learn.
 
-- `scikit-learn - Machine Learning in Python <http://marakana.com/s/scikit-learn_machine_learning_in_python,1152/index.html>`_
+- `scikit-learn - Machine Learning in Python <https://newcircle.com/s/post/1152/scikit-learn_machine_learning_in_python>`_
  by `Jake Vanderplas`_ at the 2012 PyData workshop at Google
 
  Interactive demonstration of some scikit-learn features. 75 minutes.
 
-- `scikit-learn tutorial <http://vimeo.com/53062607>`_ by `Jake Vanderplas`_ at PyData NYC 2012
+- `scikit-learn tutorial <https://vimeo.com/53062607>`_ by `Jake Vanderplas`_ at PyData NYC 2012
 
  Presentation using the online tutorial, 45 minutes.
 
 .. _Gael Varoquaux: http://gael-varoquaux.info
-.. _Jake Vanderplas: http://www.astro.washington.edu/users/vanderplas/
-.. _Olivier Grisel: http://twitter.com/ogrisel
+.. _Jake Vanderplas: http://staff.washington.edu/jakevdp
+.. _Olivier Grisel: https://twitter.com/ogrisel
diff --git a/doc/related_projects.rst b/doc/related_projects.rst
index ea021dc568e4b836bfa83ec0457d68b013c01ba7..fdd66e97ed95cea0d2d4d9ccc3402aedb5f37ab8 100644
--- a/doc/related_projects.rst
+++ b/doc/related_projects.rst
@@ -148,7 +148,7 @@ Domain specific packages
 
 - `AstroML <http://www.astroml.org/>`_ Machine learning for astronomy.
 
-- `MSMBuilder <http://www.msmbuilder.org/>`_ Machine learning for protein
+- `MSMBuilder <http://msmbuilder.org/>`_ Machine learning for protein
   conformational dynamics time series.
 
 Snippets and tidbits
diff --git a/doc/testimonials/testimonials.rst b/doc/testimonials/testimonials.rst
index d4c27b7f4594e645df3206b89265ecb2f9f83ad7..672b581820f8c18844272c7b66eff385b8a01bc1 100644
--- a/doc/testimonials/testimonials.rst
+++ b/doc/testimonials/testimonials.rst
@@ -82,7 +82,7 @@ Gaël Varoquaux, research at Parietal
   </span>
 
-`Evernote <http://evernote.com>`_
+`Evernote <https://evernote.com>`_
 ----------------------------------
 
 .. raw:: html
@@ -149,7 +149,7 @@ Alexandre Gramfort, Assistant Professor
   </span>
 
-`AWeber <http://aweber.com/>`_
+`AWeber <http://www.aweber.com>`_
 ------------------------------------------
 
 .. raw:: html
@@ -158,7 +158,7 @@ Alexandre Gramfort, Assistant Professor
 
 .. image:: images/aweber.png
    :width: 120pt
-   :target: http://aweber.com/
+   :target: http://www.aweber.com
 
 .. raw:: html
 
@@ -188,7 +188,7 @@ Michael Becker, Software Engineer, Data Analysis and Management Ninjas
   </span>
 
-`Yhat <http://yhathq.com/>`_
+`Yhat <https://www.yhat.com>`_
 ------------------------------------------
 
 .. raw:: html
@@ -197,7 +197,7 @@ Michael Becker, Software Engineer, Data Analysis and Management Ninjas
 
 .. image:: images/yhat.png
    :width: 120pt
-   :target: http://yhathq.com/
+   :target: https://www.yhat.com
 
 .. raw:: html
 
@@ -322,8 +322,8 @@ Eustache Diemert, Lead Scientist Bestofmedia Group
   </span>
 
-`Change.org <http://www.change.org>`_
---------------------------------------------------
+`Change.org <https://www.change.org>`_
+--------------------------------------
 
 .. raw:: html
@@ -331,7 +331,7 @@ Eustache Diemert, Lead Scientist Bestofmedia Group
 
 .. image:: images/change-logo.png
    :width: 120pt
-   :target: http://www.change.org
+   :target: https://www.change.org
 
 .. raw:: html
 
@@ -423,8 +423,8 @@ Daniel Weitzenfeld, Senior Data Scientist at HowAboutWe
   </span>
 
-`PeerIndex <http://www.peerindex.com/>`_
-----------------------------------------
+`PeerIndex <https://www.brandwatch.com/peerindex-and-brandwatch>`_
+------------------------------------------------------------------
 
 .. raw:: html
 
@@ -519,8 +519,8 @@ David Koh - Senior Data Scientist at OkCupid
   </span>
 
-`Lovely <https://www.livelovely.com/>`_
------------------------------------------
+`Lovely <https://livelovely.com/>`_
+-----------------------------------
 
 .. raw:: html
@@ -528,7 +528,7 @@ David Koh - Senior Data Scientist at OkCupid
 
 .. image:: images/lovely.png
    :width: 120pt
-   :target: https://www.livelovely.com
+   :target: https://livelovely.com
 
 .. raw:: html
 
diff --git a/doc/tutorial/basic/tutorial.rst b/doc/tutorial/basic/tutorial.rst
index 873f9f611a798f285418fc5d034b676249d61275..f7e49d4e704c1f63d417055f890b9c93e76bc7ec 100644
--- a/doc/tutorial/basic/tutorial.rst
+++ b/doc/tutorial/basic/tutorial.rst
@@ -6,7 +6,7 @@ An introduction to machine learning with scikit-learn
 
topic:: Section contents In this section, we introduce the `machine learning - <http://en.wikipedia.org/wiki/Machine_learning>`_ + <https://en.wikipedia.org/wiki/Machine_learning>`_ vocabulary that we use throughout scikit-learn and give a simple learning example. @@ -15,22 +15,22 @@ Machine learning: the problem setting ------------------------------------- In general, a learning problem considers a set of n -`samples <http://en.wikipedia.org/wiki/Sample_(statistics)>`_ of +`samples <https://en.wikipedia.org/wiki/Sample_(statistics)>`_ of data and then tries to predict properties of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional entry -(aka `multivariate <http://en.wikipedia.org/wiki/Multivariate_random_variable>`_ +(aka `multivariate <https://en.wikipedia.org/wiki/Multivariate_random_variable>`_ data), it is said to have several attributes or **features**. We can separate learning problems in a few large categories: - * `supervised learning <http://en.wikipedia.org/wiki/Supervised_learning>`_, + * `supervised learning <https://en.wikipedia.org/wiki/Supervised_learning>`_, in which the data comes with additional attributes that we want to predict (:ref:`Click here <supervised-learning>` to go to the scikit-learn supervised learning page).This problem can be either: * `classification - <http://en.wikipedia.org/wiki/Classification_in_machine_learning>`_: + <https://en.wikipedia.org/wiki/Classification_in_machine_learning>`_: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of classification problem would @@ -41,19 +41,19 @@ We can separate learning problems in a few large categories: limited number of categories and for each of the n samples provided, one is to try to label them with the correct category or class. - * `regression <http://en.wikipedia.org/wiki/Regression_analysis>`_: + * `regression <https://en.wikipedia.org/wiki/Regression_analysis>`_: if the desired output consists of one or more continuous variables, then the task is called *regression*. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight. - * `unsupervised learning <http://en.wikipedia.org/wiki/Unsupervised_learning>`_, + * `unsupervised learning <https://en.wikipedia.org/wiki/Unsupervised_learning>`_, in which the training data consists of a set of input vectors x without any corresponding target values. 
The goal in such problems may be to discover groups of similar examples within the data, where - it is called `clustering <http://en.wikipedia.org/wiki/Cluster_analysis>`_, + it is called `clustering <https://en.wikipedia.org/wiki/Cluster_analysis>`_, or to determine the distribution of data within the input space, known as - `density estimation <http://en.wikipedia.org/wiki/Density_estimation>`_, or + `density estimation <https://en.wikipedia.org/wiki/Density_estimation>`_, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of *visualization* (:ref:`Click here <unsupervised-learning>` @@ -74,7 +74,7 @@ Loading an example dataset -------------------------- `scikit-learn` comes with a few standard datasets, for instance the -`iris <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_ and `digits +`iris <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ and `digits <http://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits>`_ datasets for classification and the `boston house prices dataset <http://archive.ics.uci.edu/ml/datasets/Housing>`_ for regression. @@ -144,7 +144,7 @@ Learning and predicting In the case of the digits dataset, the task is to predict, given an image, which digit it represents. We are given samples of each of the 10 possible classes (the digits zero through nine) on which we *fit* an -`estimator <http://en.wikipedia.org/wiki/Estimator>`_ to be able to *predict* +`estimator <https://en.wikipedia.org/wiki/Estimator>`_ to be able to *predict* the classes to which unseen samples belong. In scikit-learn, an estimator for classification is a Python object that @@ -152,7 +152,7 @@ implements the methods ``fit(X, y)`` and ``predict(T)``. An example of an estimator is the class ``sklearn.svm.SVC`` that implements `support vector classification -<http://en.wikipedia.org/wiki/Support_vector_machine>`_. The +<https://en.wikipedia.org/wiki/Support_vector_machine>`_. 
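A minimal sketch of the ``fit``/``predict`` pattern described here, assuming only the bundled digits dataset and illustrative ``SVC`` parameter values (the particular ``gamma`` and ``C`` below are not prescribed by this passage)::

    from sklearn import datasets, svm

    # Load the bundled handwritten-digits data: X holds the samples,
    # y the integer class labels (0 through 9).
    digits = datasets.load_digits()
    X, y = digits.data, digits.target

    # Fit the estimator on all but the last sample ...
    clf = svm.SVC(gamma=0.001, C=100.)
    clf.fit(X[:-1], y[:-1])

    # ... and predict the class of the held-out sample.
    print(clf.predict(X[-1:]))

The same fitted object can later be serialized with the standard-library ``pickle`` module, as noted in the model-persistence passage further down.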
The constructor of an estimator takes as arguments the parameters of the model, but for the time being, we will consider the estimator as a black box:: @@ -207,7 +207,7 @@ Model persistence ----------------- It is possible to save a model in the scikit by using Python's built-in -persistence model, namely `pickle <http://docs.python.org/library/pickle.html>`_:: +persistence model, namely `pickle <https://docs.python.org/2/library/pickle.html>`_:: >>> from sklearn import svm >>> from sklearn import datasets diff --git a/doc/tutorial/statistical_inference/finding_help.rst b/doc/tutorial/statistical_inference/finding_help.rst index 96e1ebd7907235eb4ac49a12907475dd4efd443b..3dc1e3215eef665bd714d861e7c8973b3035994d 100644 --- a/doc/tutorial/statistical_inference/finding_help.rst +++ b/doc/tutorial/statistical_inference/finding_help.rst @@ -7,7 +7,7 @@ The project mailing list If you encounter a bug with ``scikit-learn`` or something that needs clarification in the docstring or the online documentation, please feel free to -ask on the `Mailing List <http://scikit-learn.sourceforge.net/support.html>`_ +ask on the `Mailing List <http://scikit-learn.org/stable/support.html>`_ Q&A communities with Machine Learning practitioners @@ -26,7 +26,7 @@ Q&A communities with Machine Learning practitioners Quora has a topic for Machine Learning related questions that also features some interesting discussions: - http://quora.com/Machine-Learning + https://www.quora.com/topic/Machine-Learning Have a look at the best questions section, eg: `What are some good resources for learning about machine learning`_. @@ -35,8 +35,8 @@ Q&A communities with Machine Learning practitioners .. _`good freely available textbooks on machine learning`: http://metaoptimize.com/qa/questions/186/good-freely-available-textbooks-on-machine-learning -.. _`What are some good resources for learning about machine learning`: http://www.quora.com/What-are-some-good-resources-for-learning-about-machine-learning +.. _`How do I learn machine learning?`: https://www.quora.com/How-do-I-learn-machine-learning-1 --- _'An excellent free online course for Machine Learning taught by Professor Andrew Ng of Stanford': https://www.coursera.org/course/ml +-- _'An excellent free online course for Machine Learning taught by Professor Andrew Ng of Stanford': https://www.coursera.org/learn/machine-learning --- _'Another excellent free online course that takes a more general approach to Artificial Intelligence':http://www.udacity.com/overview/Course/cs271/CourseRev/1 +-- _'Another excellent free online course that takes a more general approach to Artificial Intelligence': https://www.udacity.com/course/intro-to-artificial-intelligence--cs271 diff --git a/doc/tutorial/statistical_inference/index.rst b/doc/tutorial/statistical_inference/index.rst index 19cfa0130232572126e97b6fec1fdac69ae57959..a298e61d03b133923d059f9bd68e72ae8cbc44f0 100644 --- a/doc/tutorial/statistical_inference/index.rst +++ b/doc/tutorial/statistical_inference/index.rst @@ -6,7 +6,7 @@ A tutorial on statistical-learning for scientific data processing .. topic:: Statistical learning - `Machine learning <http://en.wikipedia.org/wiki/Machine_learning>`_ is + `Machine learning <https://en.wikipedia.org/wiki/Machine_learning>`_ is a technique with a growing importance, as the size of the datasets experimental sciences are facing is rapidly growing. 
Problems it tackles range from building a prediction function @@ -15,14 +15,14 @@ A tutorial on statistical-learning for scientific data processing This tutorial will explore *statistical learning*, the use of machine learning techniques with the goal of `statistical inference - <http://en.wikipedia.org/wiki/Statistical_inference>`_: + <https://en.wikipedia.org/wiki/Statistical_inference>`_: drawing conclusions on the data at hand. Scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (`NumPy <http://www.scipy.org>`_, `SciPy <http://www.scipy.org>`_, `matplotlib - <http://matplotlib.sourceforge.net/>`_). + <http://matplotlib.org>`_). .. include:: ../../includes/big_toc_css.rst diff --git a/doc/tutorial/statistical_inference/supervised_learning.rst b/doc/tutorial/statistical_inference/supervised_learning.rst index 20cd7cae578eed575e30477ee43edfada2548422..d5e69e15dd0f45b7d94a621d3a2bd51a480aa9b4 100644 --- a/doc/tutorial/statistical_inference/supervised_learning.rst +++ b/doc/tutorial/statistical_inference/supervised_learning.rst @@ -13,7 +13,7 @@ Supervised learning: predicting an output variable from high-dimensional observa are trying to predict, usually called "target" or "labels". Most often, ``y`` is a 1D array of length ``n_samples``. - All supervised `estimators <http://en.wikipedia.org/wiki/Estimator>`_ + All supervised `estimators <https://en.wikipedia.org/wiki/Estimator>`_ in scikit-learn implement a ``fit(X, y)`` method to fit the model and a ``predict(X)`` method that, given unlabeled observations ``X``, returns the predicted labels ``y``. @@ -59,7 +59,7 @@ k-Nearest neighbors classifier ------------------------------- The simplest possible classifier is the -`nearest neighbor <http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_: +`nearest neighbor <https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm>`_: given a new observation ``X_test``, find in the training set (i.e. the data used to train the estimator) the observation with the closest feature vector. (Please see the :ref:`Nearest Neighbors section<neighbors>` of the online @@ -128,7 +128,7 @@ require more training data than the current estimated size of the entire internet (±1000 Exabytes or so). This is called the -`curse of dimensionality <http://en.wikipedia.org/wiki/Curse_of_dimensionality>`_ +`curse of dimensionality <https://en.wikipedia.org/wiki/Curse_of_dimensionality>`_ and is a core problem that machine learning addresses. Linear model: from regression to sparsity @@ -265,9 +265,9 @@ diabetes dataset rather than our synthetic data:: Capturing in the fitted parameters noise that prevents the model to generalize to new data is called - `overfitting <http://en.wikipedia.org/wiki/Overfitting>`_. The bias introduced + `overfitting <https://en.wikipedia.org/wiki/Overfitting>`_. The bias introduced by the ridge regression is called a - `regularization <http://en.wikipedia.org/wiki/Regularization_%28machine_learning%29>`_. + `regularization <https://en.wikipedia.org/wiki/Regularization_%28machine_learning%29>`_. .. _sparsity: @@ -339,7 +339,7 @@ application of Occam's razor: *prefer simpler models*. Different algorithms can be used to solve the same mathematical problem. 
For instance the ``Lasso`` object in scikit-learn solves the lasso regression problem using a - `coordinate decent <http://en.wikipedia.org/wiki/Coordinate_descent>`_ method, + `coordinate decent <https://en.wikipedia.org/wiki/Coordinate_descent>`_ method, that is efficient on large datasets. However, scikit-learn also provides the :class:`LassoLars` object using the *LARS* algorthm, which is very efficient for problems in which the weight vector estimated @@ -356,7 +356,7 @@ Classification :align: right For classification, as in the labeling -`iris <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_ task, linear +`iris <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ task, linear regression is not the right approach as it will give too much weight to data far from the decision frontier. A linear approach is to fit a sigmoid function or **logistic** function: diff --git a/doc/tutorial/statistical_inference/unsupervised_learning.rst b/doc/tutorial/statistical_inference/unsupervised_learning.rst index 23db7aa11500e8dd46080b361cc55a6e829ad41c..4d9f4b0ea053baad025dee6f03d34f82a92f3256 100644 --- a/doc/tutorial/statistical_inference/unsupervised_learning.rst +++ b/doc/tutorial/statistical_inference/unsupervised_learning.rst @@ -105,8 +105,8 @@ algorithms. The simplest clustering algorithm is Clustering in general and KMeans, in particular, can be seen as a way of choosing a small number of exemplars to compress the information. - The problem is sometimes known as - `vector quantization <http://en.wikipedia.org/wiki/Vector_quantization>`_. + The problem is sometimes known as + `vector quantization <https://en.wikipedia.org/wiki/Vector_quantization>`_. For instance, this can be used to posterize an image:: >>> import scipy as sp diff --git a/doc/tutorial/text_analytics/data/languages/fetch_data.py b/doc/tutorial/text_analytics/data/languages/fetch_data.py index be7450cde26b35f2deee130c1a81724c591f6fea..5c5c36a322cafda60291583076033b7349dfc144 100644 --- a/doc/tutorial/text_analytics/data/languages/fetch_data.py +++ b/doc/tutorial/text_analytics/data/languages/fetch_data.py @@ -19,7 +19,7 @@ import codecs pages = { u'ar': u'http://ar.wikipedia.org/wiki/%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9%8A%D8%A7', u'de': u'http://de.wikipedia.org/wiki/Wikipedia', - u'en': u'http://en.wikipedia.org/wiki/Wikipedia', + u'en': u'https://en.wikipedia.org/wiki/Wikipedia', u'es': u'http://es.wikipedia.org/wiki/Wikipedia', u'fr': u'http://fr.wikipedia.org/wiki/Wikip%C3%A9dia', u'it': u'http://it.wikipedia.org/wiki/Wikipedia', diff --git a/doc/tutorial/text_analytics/working_with_text_data.rst b/doc/tutorial/text_analytics/working_with_text_data.rst index b553aa4c1975dfb62c82c1ba0cc9ae75c631f7e3..75c333c641bbd44c3e80d40e720086a2dad27a11 100644 --- a/doc/tutorial/text_analytics/working_with_text_data.rst +++ b/doc/tutorial/text_analytics/working_with_text_data.rst @@ -251,7 +251,7 @@ corpus. This downscaling is called `tf–idf`_ for "Term Frequency times Inverse Document Frequency". -.. _`tf–idf`: http://en.wikipedia.org/wiki/Tf–idf +.. _`tf–idf`: https://en.wikipedia.org/wiki/Tf–idf Both **tf** and **tf–idf** can be computed as follows:: @@ -553,7 +553,7 @@ upon the completion of this tutorial: at the :ref:`Multiclass and multilabel section <multiclass>` * Try using :ref:`Truncated SVD <LSA>` for - `latent semantic analysis <http://en.wikipedia.org/wiki/Latent_semantic_analysis>`_. + `latent semantic analysis <https://en.wikipedia.org/wiki/Latent_semantic_analysis>`_. 
* Have a look at using :ref:`Out-of-core Classification diff --git a/doc/whats_new.rst b/doc/whats_new.rst index 3ebbb72c54d3bb7b83c9b8a3f7597f8ad3bc0eae..a2c7f919ac3ca4313391136c226e4f7516002d97 100644 --- a/doc/whats_new.rst +++ b/doc/whats_new.rst @@ -3204,7 +3204,7 @@ as well as several new algorithms and documentation improvements. This release also includes the dictionary-learning work developed by `Vlad Niculae`_ as part of the `Google Summer of Code -<http://code.google.com/soc/>`_ program. +<https://developers.google.com/open-source/gsoc>`_ program. @@ -3897,7 +3897,7 @@ Earlier versions Earlier versions included contributions by Fred Mailhot, David Cooke, David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. -.. _Olivier Grisel: http://twitter.com/ogrisel +.. _Olivier Grisel: https://twitter.com/ogrisel .. _Gael Varoquaux: http://gael-varoquaux.info @@ -3915,27 +3915,27 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Vlad Niculae: http://vene.ro -.. _Edouard Duchesnay: http://www.lnao.fr/spip.php?rubrique30 +.. _Edouard Duchesnay: https://sites.google.com/site/duchesnay/home -.. _Peter Prettenhofer: http://sites.google.com/site/peterprettenhofer/ +.. _Peter Prettenhofer: https://sites.google.com/site/peterprettenhofer/ .. _Alexandre Passos: http://atpassos.me -.. _Nicolas Pinto: http://pinto.scripts.mit.edu/ +.. _Nicolas Pinto: https://twitter.com/npinto -.. _Virgile Fritsch: http://parietal.saclay.inria.fr/Members/virgile-fritsch +.. _Virgile Fritsch: https://github.com/VirgileFritsch -.. _Bertrand Thirion: http://parietal.saclay.inria.fr/Members/bertrand-thirion +.. _Bertrand Thirion: https://team.inria.fr/parietal/bertrand-thirions-page .. _Andreas Müller: http://peekaboo-vision.blogspot.com -.. _Matthieu Perrot: http://www.lnao.fr/spip.php?rubrique19 +.. _Matthieu Perrot: http://brainvisa.info/biblio/lnao/en/Author/PERROT-M.html -.. _Jake Vanderplas: http://www.astro.washington.edu/users/vanderplas/ +.. _Jake Vanderplas: http://staff.washington.edu/jakevdp/ .. _Gilles Louppe: http://www.montefiore.ulg.ac.be/~glouppe/ -.. _INRIA: http://inria.fr +.. _INRIA: http://www.inria.fr .. _Parietal Team: http://parietal.saclay.inria.fr/ @@ -3943,23 +3943,23 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _David Warde-Farley: http://www-etud.iro.umontreal.ca/~wardefar/ -.. _Brian Holt: http://info.ee.surrey.ac.uk/Personal/B.Holt/ +.. _Brian Holt: http://personal.ee.surrey.ac.uk/Personal/B.Holt .. _Satrajit Ghosh: http://www.mit.edu/~satra/ -.. _Robert Layton: http://www.twitter.com/robertlayton +.. _Robert Layton: https://twitter.com/robertlayton -.. _Scott White: http://twitter.com/scottblanc +.. _Scott White: https://twitter.com/scottblanc .. _Jaques Grobler: https://github.com/jaquesgrobler/scikit-learn/wiki/Jaques-Grobler .. _David Marek: http://www.davidmarek.cz/ -.. _@kernc: http://github.com/kernc +.. _@kernc: https://github.com/kernc -.. _Christian Osendorfer: http://osdf.github.com +.. _Christian Osendorfer: https://osdf.github.io -.. _Noel Dawe: http://noel.dawe.me +.. _Noel Dawe: https://github.com/ndawe .. _Arnaud Joly: http://www.ajoly.org @@ -4017,7 +4017,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Nikolay Mayorov: https://github.com/nmayorov -.. _Jatin Shah: http://jatinshah.org/ +.. _Jatin Shah: https://github.com/jatinshah .. 
_Dougal Sutherland: https://github.com/dougalsutherland @@ -4031,7 +4031,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Florian Wilhelm: https://github.com/FlorianWilhelm -.. _Fares Hedyati: https://github.com/fareshedyati +.. _Fares Hedyati: http://www.eecs.berkeley.edu/~fareshed .. _Matt Pico: https://github.com/MattpSoftware @@ -4041,7 +4041,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Clemens Brunner: https://github.com/cle1109 -.. _Martin Billinger: https://github.com/kazemakase +.. _Martin Billinger: http://tnsre.embs.org/author/martinbillinger .. _Matteo Visconti di Oleggio Castello: http://www.mvdoc.me @@ -4053,7 +4053,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Cathy Deng: https://github.com/cathydeng -.. _Will Dawson: http://dawsonresearch.com +.. _Will Dawson: http://www.dawsonresearch.com .. _Balazs Kegl: https://github.com/kegl @@ -4065,7 +4065,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Hanna Wallach: http://dirichlet.net/ -.. _Yan Yi: http://www.seowyanyi.org +.. _Yan Yi: http://seowyanyi.org .. _Kyle Beauchamp: https://github.com/kyleabeauchamp @@ -4075,7 +4075,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Dan Blanchard: https://github.com/dan-blanchard -.. _Eric Martin: http://ericmart.in +.. _Eric Martin: http://www.ericmart.in .. _Nicolas Goix: https://webperso.telecom-paristech.fr/front/frontoffice.php?SP_ID=241 @@ -4104,7 +4104,7 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson. .. _Daniel Galvez: https://github.com/galv .. _Jacob Schreiber: https://github.com/jmschrei .. _Ankur Ankan: https://github.com/ankurankan -.. _Valentin Stolbunov: http://vstolbunov.com +.. _Valentin Stolbunov: http://www.vstolbunov.com .. _Jean Kossaifi: https://github.com/JeanKossaifi .. _Andrew Lamb: https://github.com/andylamb .. _Graham Clenaghan: https://github.com/gclenaghan diff --git a/examples/applications/plot_species_distribution_modeling.py b/examples/applications/plot_species_distribution_modeling.py index d327a086c6722cf7850c5f7bb4251ac0f2c0774c..6dab5fa8c906392c2e6157e2a57b47ceb0daaf90 100644 --- a/examples/applications/plot_species_distribution_modeling.py +++ b/examples/applications/plot_species_distribution_modeling.py @@ -13,17 +13,17 @@ density estimation problem and use the `OneClassSVM` provided by the package `sklearn.svm` as our modeling tool. The dataset is provided by Phillips et. al. (2006). If available, the example uses -`basemap <http://matplotlib.sourceforge.net/basemap/doc/html/>`_ +`basemap <http://matplotlib.org/basemap>`_ to plot the coast lines and national boundaries of South America. The two species are: - `"Bradypus variegatus" - <http://www.iucnredlist.org/apps/redlist/details/3038/0>`_ , + <http://www.iucnredlist.org/details/3038/0>`_ , the Brown-throated Sloth. - `"Microryzomys minutus" - <http://www.iucnredlist.org/apps/redlist/details/13408/0>`_ , + <http://www.iucnredlist.org/details/13408/0>`_ , also known as the Forest Small Rice Rat, a rodent that lives in Peru, Colombia, Ecuador, Peru, and Venezuela. 
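The density-estimation idea behind this example can be sketched in a few lines; the coordinates below are synthetic stand-ins for the Phillips et al. occurrence records, and the ``OneClassSVM`` parameters are illustrative rather than taken from the example script::

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Synthetic (latitude, longitude) pairs standing in for observed
    # occurrence records of a single species.
    rng = np.random.RandomState(0)
    coords = rng.normal(loc=[-10.0, -60.0], scale=[5.0, 5.0], size=(200, 2))

    # Fit a one-class SVM as a density-like model of the observed locations.
    model = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5)
    model.fit(coords)

    # Higher decision-function values mean "more like the training points";
    # evaluated over a grid, these scores can be drawn as a distribution map.
    print(model.decision_function(coords[:5]))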
diff --git a/examples/applications/wikipedia_principal_eigenvector.py b/examples/applications/wikipedia_principal_eigenvector.py index 06b02eb721c5482d81b03712599318b5ebdcb339..17babb68caad8cda28a894d5e88617f0b910c1a2 100644 --- a/examples/applications/wikipedia_principal_eigenvector.py +++ b/examples/applications/wikipedia_principal_eigenvector.py @@ -8,7 +8,7 @@ graph is to compute the principal eigenvector of the adjacency matrix so as to assign to each vertex the values of the components of the first eigenvector as a centrality score: - http://en.wikipedia.org/wiki/Eigenvector_centrality + https://en.wikipedia.org/wiki/Eigenvector_centrality On the graph of webpages and links those values are called the PageRank scores by Google. @@ -20,7 +20,7 @@ this eigenvector centrality. The traditional way to compute the principal eigenvector is to use the power iteration method: - http://en.wikipedia.org/wiki/Power_iteration + https://en.wikipedia.org/wiki/Power_iteration Here the computation is achieved thanks to Martinsson's Randomized SVD algorithm implemented in the scikit. diff --git a/examples/calibration/plot_calibration.py b/examples/calibration/plot_calibration.py index 299f924e2a4688737025bc712b0c1e8afcc806b4..b38b25812bb7f42a4a6d09a40c586a3853c91d03 100644 --- a/examples/calibration/plot_calibration.py +++ b/examples/calibration/plot_calibration.py @@ -11,7 +11,7 @@ while others being under-confident. Thus, a separate calibration of predicted probabilities is often desirable as a postprocessing. This example illustrates two different methods for this calibration and evaluates the quality of the returned probabilities using Brier's score -(see http://en.wikipedia.org/wiki/Brier_score). +(see https://en.wikipedia.org/wiki/Brier_score). Compared are the estimated probability using a Gaussian naive Bayes classifier without calibration, with a sigmoid calibration, and with a non-parametric diff --git a/examples/datasets/plot_iris_dataset.py b/examples/datasets/plot_iris_dataset.py index 2436dac67253c788e0e51cad5325d3d6d53e34d2..fc8790762d1de588ace0d309f474f8f54cbcb9eb 100644 --- a/examples/datasets/plot_iris_dataset.py +++ b/examples/datasets/plot_iris_dataset.py @@ -13,7 +13,7 @@ The rows being the samples and the columns being: Sepal Length, Sepal Width, Petal Length and Petal Width. The below plot uses the first two features. -See `here <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_ for more +See `here <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ for more information on this dataset. """ print(__doc__) diff --git a/examples/decomposition/plot_pca_iris.py b/examples/decomposition/plot_pca_iris.py index 67a679e8e567764ed746f4c808dcbe7e39e2ce35..f8451915b44128180fe174cabb67beb4427dee03 100644 --- a/examples/decomposition/plot_pca_iris.py +++ b/examples/decomposition/plot_pca_iris.py @@ -8,7 +8,7 @@ PCA example with Iris Data-set Principal Component Analysis applied to the Iris dataset. -See `here <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_ for more +See `here <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ for more information on this dataset. 
""" diff --git a/examples/linear_model/plot_iris_logistic.py b/examples/linear_model/plot_iris_logistic.py index 4cd705dc32df3e73f666713d806cb46ffa8ddeef..5186a775cd8aa5fffd18454e310d6b7aa89e9b68 100644 --- a/examples/linear_model/plot_iris_logistic.py +++ b/examples/linear_model/plot_iris_logistic.py @@ -7,7 +7,7 @@ Logistic Regression 3-class Classifier ========================================================= Show below is a logistic-regression classifiers decision boundaries on the -`iris <http://en.wikipedia.org/wiki/Iris_flower_data_set>`_ dataset. The +`iris <https://en.wikipedia.org/wiki/Iris_flower_data_set>`_ dataset. The datapoints are colored according to their labels. """ diff --git a/examples/manifold/plot_manifold_sphere.py b/examples/manifold/plot_manifold_sphere.py index 77b37e787778594ee5aed108afa8061bf133c287..744eb2f37675b58be8583552763f71080679c037 100644 --- a/examples/manifold/plot_manifold_sphere.py +++ b/examples/manifold/plot_manifold_sphere.py @@ -24,7 +24,7 @@ high-dimensional space, unlike other manifold-learning algorithms, it does not seeks an isotropic representation of the data in the low-dimensional space. Here the manifold problem matches fairly that of representing a flat map of the Earth, as with -`map projection <http://en.wikipedia.org/wiki/Map_projection>`_ +`map projection <https://en.wikipedia.org/wiki/Map_projection>`_ """ # Author: Jaques Grobler <jaques.grobler@inria.fr> diff --git a/examples/neighbors/plot_species_kde.py b/examples/neighbors/plot_species_kde.py index 95f4417ce1bcaa75da7d579140bdf961084c683f..c582d76a9bf69c033be8db9b642a1ae08c0dab0e 100644 --- a/examples/neighbors/plot_species_kde.py +++ b/examples/neighbors/plot_species_kde.py @@ -7,7 +7,7 @@ density estimate) on geospatial data, using a Ball Tree built upon the Haversine distance metric -- i.e. distances over points in latitude/longitude. The dataset is provided by Phillips et. al. (2006). If available, the example uses -`basemap <http://matplotlib.sourceforge.net/basemap/doc/html/>`_ +`basemap <http://matplotlib.org/basemap>`_ to plot the coast lines and national boundaries of South America. This example does not perform any learning over the data diff --git a/examples/plot_johnson_lindenstrauss_bound.py b/examples/plot_johnson_lindenstrauss_bound.py index 7530f76d94aa88acd4fa7d3b02c56f6051f03ebf..b2dc902c71c52eee852a3b38cc6507226e380355 100644 --- a/examples/plot_johnson_lindenstrauss_bound.py +++ b/examples/plot_johnson_lindenstrauss_bound.py @@ -8,7 +8,7 @@ The `Johnson-Lindenstrauss lemma`_ states that any high dimensional dataset can be randomly projected into a lower dimensional Euclidean space while controlling the distortion in the pairwise distances. -.. _`Johnson-Lindenstrauss lemma`: http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma +.. _`Johnson-Lindenstrauss lemma`: https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma Theoretical bounds diff --git a/sklearn/covariance/shrunk_covariance_.py b/sklearn/covariance/shrunk_covariance_.py index 86f1545abba134e43ce8406b133e8230aa13d523..37c10a909cf17c0672278cdc6e4eab94ab46f916 100644 --- a/sklearn/covariance/shrunk_covariance_.py +++ b/sklearn/covariance/shrunk_covariance_.py @@ -441,7 +441,7 @@ def oas(X, assume_centered=False): The formula we used to implement the OAS does not correspond to the one given in the article. It has been taken from the MATLAB program available from the author's webpage - (https://tbayes.eecs.umich.edu/yilun/covestimation). 
+ (http://tbayes.eecs.umich.edu/yilun/covestimation). """ X = np.asarray(X) @@ -485,7 +485,7 @@ class OAS(EmpiricalCovariance): The formula used here does not correspond to the one given in the article. It has been taken from the Matlab program available from the - authors' webpage (https://tbayes.eecs.umich.edu/yilun/covestimation). + authors' webpage (http://tbayes.eecs.umich.edu/yilun/covestimation). Parameters ---------- diff --git a/sklearn/datasets/descr/breast_cancer.rst b/sklearn/datasets/descr/breast_cancer.rst index cb652b7f13168e1916ae043902e8ca4d8fe10687..8e12472941a667a2458cedfdb85482d3576becee 100644 --- a/sklearn/datasets/descr/breast_cancer.rst +++ b/sklearn/datasets/descr/breast_cancer.rst @@ -81,8 +81,6 @@ https://goo.gl/U2Uwz2 Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. -A few of the images can be found at -http://www.cs.wisc.edu/~street/images/ Separating plane described above was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree diff --git a/sklearn/datasets/descr/linnerud.rst b/sklearn/datasets/descr/linnerud.rst index 6e5a9b94cf6bf0d2077e89b511a769eba60d8bc5..d790d3c0c9086b3685a26c5ad6632fdbf8302956 100644 --- a/sklearn/datasets/descr/linnerud.rst +++ b/sklearn/datasets/descr/linnerud.rst @@ -18,5 +18,4 @@ The Linnerud dataset constains two small dataset: References ---------- - * http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=mixOmics:linnerud * Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic. diff --git a/sklearn/datasets/samples_generator.py b/sklearn/datasets/samples_generator.py index a1168ab0245d226c49b7badd93be20412852ec99..5f6d1a71b503d7f21de69773cfa16a5c99b5d174 100644 --- a/sklearn/datasets/samples_generator.py +++ b/sklearn/datasets/samples_generator.py @@ -1294,7 +1294,7 @@ def make_swiss_roll(n_samples=100, noise=0.0, random_state=None): ---------- .. [1] S. Marsland, "Machine Learning: An Algorithmic Perspective", Chapter 10, 2009. - http://www-ist.massey.ac.nz/smarsland/Code/10/lle.py + http://seat.massey.ac.nz/personal/s.r.marsland/Code/10/lle.py """ generator = check_random_state(random_state) diff --git a/sklearn/feature_selection/univariate_selection.py b/sklearn/feature_selection/univariate_selection.py index 52b630c5b501e569aa5e7e0a3190df3c15e73f47..1292252fdc8b71714e27e911047b3ce167c2bfee 100644 --- a/sklearn/feature_selection/univariate_selection.py +++ b/sklearn/feature_selection/univariate_selection.py @@ -562,7 +562,7 @@ class SelectFdr(_BaseFilter): References ---------- - http://en.wikipedia.org/wiki/False_discovery_rate + https://en.wikipedia.org/wiki/False_discovery_rate See also -------- diff --git a/sklearn/gaussian_process/gaussian_process.py b/sklearn/gaussian_process/gaussian_process.py index 7fce663936f2f01c5dcccd34eddfe85d76737313..912ec72289e537f025a23f4aa324bb0aac3c1359 100644 --- a/sklearn/gaussian_process/gaussian_process.py +++ b/sklearn/gaussian_process/gaussian_process.py @@ -203,7 +203,7 @@ class GaussianProcess(BaseEstimator, RegressorMixin): .. [NLNS2002] `H.B. Nielsen, S.N. Lophaven, H. B. Nielsen and J. Sondergaard. DACE - A MATLAB Kriging Toolbox.` (2002) - http://www2.imm.dtu.dk/~hbn/dace/dace.pdf + http://imedea.uib-csic.es/master/cambioglobal/Modulo_V_cod101615/Lab/lab_maps/krigging/DACE-krigingsoft/dace/dace.pdf .. [WBSWM1992] `W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, and M.D. Morris (1992). 
Screening, predicting, and computer diff --git a/sklearn/isotonic.py b/sklearn/isotonic.py index 01fd8cb1c1ce3d4ec191b42d23dcd013ba31b2c5..456c1983cef7d055271347f4fdf0a7ed775178e3 100644 --- a/sklearn/isotonic.py +++ b/sklearn/isotonic.py @@ -49,7 +49,7 @@ def check_increasing(x, y): References ---------- Fisher transformation. Wikipedia. - http://en.wikipedia.org/w/index.php?title=Fisher_transformation + https://en.wikipedia.org/wiki/Fisher_transformation """ # Calculate Spearman rho estimate and set return accordingly. @@ -62,7 +62,7 @@ def check_increasing(x, y): F_se = 1 / math.sqrt(len(x) - 3) # Use a 95% CI, i.e., +/-1.96 S.E. - # http://en.wikipedia.org/wiki/Fisher_transformation + # https://en.wikipedia.org/wiki/Fisher_transformation rho_0 = math.tanh(F - 1.96 * F_se) rho_1 = math.tanh(F + 1.96 * F_se) diff --git a/sklearn/linear_model/least_angle.py b/sklearn/linear_model/least_angle.py index 70c93c7be5bc8593ee707e5daa662c80a4f16d27..9fce600950c9e31c812052875bbb4903b4078410 100644 --- a/sklearn/linear_model/least_angle.py +++ b/sklearn/linear_model/least_angle.py @@ -135,13 +135,13 @@ def lars_path(X, y, Xy=None, Gram=None, max_iter=500, References ---------- .. [1] "Least Angle Regression", Effron et al. - http://www-stat.stanford.edu/~tibs/ftp/lars.pdf + http://statweb.stanford.edu/~tibs/ftp/lars.pdf .. [2] `Wikipedia entry on the Least-angle regression - <http://en.wikipedia.org/wiki/Least-angle_regression>`_ + <https://en.wikipedia.org/wiki/Least-angle_regression>`_ .. [3] `Wikipedia entry on the Lasso - <http://en.wikipedia.org/wiki/Lasso_(statistics)#Lasso_method>`_ + <https://en.wikipedia.org/wiki/Lasso_(statistics)#Lasso_method>`_ """ @@ -1402,8 +1402,8 @@ class LassoLarsIC(LassoLars): Hui Zou, Trevor Hastie, and Robert Tibshirani Ann. Statist. Volume 35, Number 5 (2007), 2173-2192. - http://en.wikipedia.org/wiki/Akaike_information_criterion - http://en.wikipedia.org/wiki/Bayesian_information_criterion + https://en.wikipedia.org/wiki/Akaike_information_criterion + https://en.wikipedia.org/wiki/Bayesian_information_criterion See also -------- diff --git a/sklearn/linear_model/perceptron.py b/sklearn/linear_model/perceptron.py index 0eb2ac2d3af0b6e91b01949b3430d8901e658cc7..76f8c648c72018aec86b5667e8a4de0befc54508 100644 --- a/sklearn/linear_model/perceptron.py +++ b/sklearn/linear_model/perceptron.py @@ -84,7 +84,7 @@ class Perceptron(BaseSGDClassifier, _LearntSelectorMixin): References ---------- - http://en.wikipedia.org/wiki/Perceptron and references therein. + https://en.wikipedia.org/wiki/Perceptron and references therein. """ def __init__(self, penalty=None, alpha=0.0001, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, eta0=1.0, n_jobs=1, diff --git a/sklearn/linear_model/ransac.py b/sklearn/linear_model/ransac.py index 58187de99395970bcbf7cf417b5bbebc0303b46b..5b3b27e2e6ff3804f7c090f78f1999b40bcc3b3c 100644 --- a/sklearn/linear_model/ransac.py +++ b/sklearn/linear_model/ransac.py @@ -170,7 +170,7 @@ class RANSACRegressor(BaseEstimator, MetaEstimatorMixin, RegressorMixin): References ---------- - .. [1] http://en.wikipedia.org/wiki/RANSAC + .. [1] https://en.wikipedia.org/wiki/RANSAC .. [2] http://www.cs.columbia.edu/~belhumeur/courses/compPhoto/ransac.pdf .. 
[3] http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf """ diff --git a/sklearn/linear_model/sgd_fast.pyx b/sklearn/linear_model/sgd_fast.pyx index 56c087dea0d080b98ad39508f5e57174e826db1e..df9fb8be09ed161571f6c716fbd58f824c2cb0d4 100644 --- a/sklearn/linear_model/sgd_fast.pyx +++ b/sklearn/linear_model/sgd_fast.pyx @@ -242,7 +242,7 @@ cdef class Huber(Regression): Variant of the SquaredLoss that is robust to outliers (quadratic near zero, linear in for large errors). - http://en.wikipedia.org/wiki/Huber_Loss_Function + https://en.wikipedia.org/wiki/Huber_Loss_Function """ cdef double c diff --git a/sklearn/linear_model/theil_sen.py b/sklearn/linear_model/theil_sen.py index b4204a381974edf7256651e921c40cadd623a619..0764304559ddd77cddde7c2f10469eb90c43efec 100644 --- a/sklearn/linear_model/theil_sen.py +++ b/sklearn/linear_model/theil_sen.py @@ -276,7 +276,7 @@ class TheilSenRegressor(LinearModel, RegressorMixin): ---------- - Theil-Sen Estimators in a Multiple Linear Regression Model, 2009 Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang - http://www.math.iupui.edu/~hpeng/MTSE_0908.pdf + http://home.olemiss.edu/~xdang/papers/MTSE.pdf """ def __init__(self, fit_intercept=True, copy_X=True, diff --git a/sklearn/manifold/spectral_embedding_.py b/sklearn/manifold/spectral_embedding_.py index bcaac9dd032cf5e6162db845166ee9ef47fef0f5..d8a69c402122e816a423cc8ed4cf21f2be9a7f71 100644 --- a/sklearn/manifold/spectral_embedding_.py +++ b/sklearn/manifold/spectral_embedding_.py @@ -195,7 +195,7 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None, References ---------- - * http://en.wikipedia.org/wiki/LOBPCG + * https://en.wikipedia.org/wiki/LOBPCG * Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method diff --git a/sklearn/metrics/classification.py b/sklearn/metrics/classification.py index d5743fbb9934acfcb0b7b4c5fb64b1bb5305f9cd..351ce46f2c7bdb720274f614a88256ee856b055d 100644 --- a/sklearn/metrics/classification.py +++ b/sklearn/metrics/classification.py @@ -212,7 +212,7 @@ def confusion_matrix(y_true, y_pred, labels=None, sample_weight=None): References ---------- .. [1] `Wikipedia entry for the Confusion matrix - <http://en.wikipedia.org/wiki/Confusion_matrix>`_ + <https://en.wikipedia.org/wiki/Confusion_matrix>`_ Examples -------- @@ -369,7 +369,7 @@ def jaccard_similarity_score(y_true, y_pred, normalize=True, References ---------- .. [1] `Wikipedia entry for the Jaccard index - <http://en.wikipedia.org/wiki/Jaccard_index>`_ + <https://en.wikipedia.org/wiki/Jaccard_index>`_ Examples @@ -451,7 +451,7 @@ def matthews_corrcoef(y_true, y_pred, sample_weight=None): <http://dx.doi.org/10.1093/bioinformatics/16.5.412>`_ .. [2] `Wikipedia entry for the Matthews Correlation Coefficient - <http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_ + <https://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_ Examples -------- @@ -639,8 +639,7 @@ def f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', References ---------- - .. [1] `Wikipedia entry for the F1-score - <http://en.wikipedia.org/wiki/F1_score>`_ + .. [1] `Wikipedia entry for the F1-score <https://en.wikipedia.org/wiki/F1_score>`_ Examples -------- @@ -750,7 +749,7 @@ def fbeta_score(y_true, y_pred, beta, labels=None, pos_label=1, Modern Information Retrieval. Addison Wesley, pp. 327-328. .. 
[2] `Wikipedia entry for the F1-score - <http://en.wikipedia.org/wiki/F1_score>`_ + <https://en.wikipedia.org/wiki/F1_score>`_ Examples -------- @@ -934,10 +933,10 @@ def precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None, References ---------- .. [1] `Wikipedia entry for the Precision and recall - <http://en.wikipedia.org/wiki/Precision_and_recall>`_ + <https://en.wikipedia.org/wiki/Precision_and_recall>`_ .. [2] `Wikipedia entry for the F1-score - <http://en.wikipedia.org/wiki/F1_score>`_ + <https://en.wikipedia.org/wiki/F1_score>`_ .. [3] `Discriminative Methods for Multi-labeled Classification Advances in Knowledge Discovery and Data Mining (2004), pp. 22-30 by Shantanu @@ -1478,7 +1477,7 @@ def hamming_loss(y_true, y_pred, classes=None, sample_weight=None): 3(3), 1-13, July-September 2007. .. [2] `Wikipedia entry on the Hamming distance - <http://en.wikipedia.org/wiki/Hamming_distance>`_ + <https://en.wikipedia.org/wiki/Hamming_distance>`_ Examples -------- @@ -1643,7 +1642,7 @@ def hinge_loss(y_true, pred_decision, labels=None, sample_weight=None): References ---------- .. [1] `Wikipedia entry on the Hinge loss - <http://en.wikipedia.org/wiki/Hinge_loss>`_ + <https://en.wikipedia.org/wiki/Hinge_loss>`_ .. [2] Koby Crammer, Yoram Singer. On the Algorithmic Implementation of Multiclass Kernel-based Vector @@ -1812,7 +1811,7 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None): References ---------- - http://en.wikipedia.org/wiki/Brier_score + https://en.wikipedia.org/wiki/Brier_score """ y_true = column_or_1d(y_true) y_prob = column_or_1d(y_prob) diff --git a/sklearn/metrics/cluster/supervised.py b/sklearn/metrics/cluster/supervised.py index 294d66b8c85c68551eb939cf922e9f0cb1b4bfe9..77c9c50436061f81306de7e03beea4ff52b578fd 100644 --- a/sklearn/metrics/cluster/supervised.py +++ b/sklearn/metrics/cluster/supervised.py @@ -177,9 +177,9 @@ def adjusted_rand_score(labels_true, labels_pred, max_n_classes=5000): .. [Hubert1985] `L. Hubert and P. Arabie, Comparing Partitions, Journal of Classification 1985` - http://www.springerlink.com/content/x64124718341j1j0/ + http://link.springer.com/article/10.1007%2FBF01908075 - .. [wk] http://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index + .. [wk] https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index See also -------- @@ -702,7 +702,7 @@ def adjusted_mutual_info_score(labels_true, labels_pred, max_n_classes=5000): <http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf>`_ .. [2] `Wikipedia entry for the Adjusted Mutual Information - <http://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_ + <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_ """ labels_true, labels_pred = check_clusterings(labels_true, labels_pred) diff --git a/sklearn/metrics/cluster/unsupervised.py b/sklearn/metrics/cluster/unsupervised.py index a0d2aaa24ef382788a11ed8488958e636bd6cb0f..4a8ff450e7ceab6f49d8b9f06157dedb81101905 100644 --- a/sklearn/metrics/cluster/unsupervised.py +++ b/sklearn/metrics/cluster/unsupervised.py @@ -78,7 +78,7 @@ def silhouette_score(X, labels, metric='euclidean', sample_size=None, <http://www.sciencedirect.com/science/article/pii/0377042787901257>`_ .. 
[2] `Wikipedia entry on the Silhouette Coefficient - <http://en.wikipedia.org/wiki/Silhouette_(clustering)>`_ + <https://en.wikipedia.org/wiki/Silhouette_(clustering)>`_ """ X, labels = check_X_y(X, labels) @@ -158,7 +158,7 @@ def silhouette_samples(X, labels, metric='euclidean', **kwds): <http://www.sciencedirect.com/science/article/pii/0377042787901257>`_ .. [2] `Wikipedia entry on the Silhouette Coefficient - <http://en.wikipedia.org/wiki/Silhouette_(clustering)>`_ + <https://en.wikipedia.org/wiki/Silhouette_(clustering)>`_ """ le = LabelEncoder() diff --git a/sklearn/metrics/ranking.py b/sklearn/metrics/ranking.py index 807741e65a7cd97ec9fd1473958a95b298716fe9..da75465597f6937cc03ea6b8973ac94980115b86 100644 --- a/sklearn/metrics/ranking.py +++ b/sklearn/metrics/ranking.py @@ -155,7 +155,7 @@ def average_precision_score(y_true, y_score, average="macro", References ---------- .. [1] `Wikipedia entry for the Average precision - <http://en.wikipedia.org/wiki/Average_precision>`_ + <https://en.wikipedia.org/wiki/Average_precision>`_ See also -------- @@ -227,7 +227,7 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None): References ---------- .. [1] `Wikipedia entry for the Receiver operating characteristic - <http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_ + <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_ See also -------- @@ -482,7 +482,7 @@ def roc_curve(y_true, y_score, pos_label=None, sample_weight=None, References ---------- .. [1] `Wikipedia entry for the Receiver operating characteristic - <http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_ + <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_ Examples diff --git a/sklearn/metrics/regression.py b/sklearn/metrics/regression.py index 672fdb17e30c0a12918c69f8091eb924bf3bfc60..af3a02d6f33f96791c7251e6880d2185053a024f 100644 --- a/sklearn/metrics/regression.py +++ b/sklearn/metrics/regression.py @@ -425,7 +425,7 @@ def r2_score(y_true, y_pred, References ---------- .. [1] `Wikipedia entry on the Coefficient of determination - <http://en.wikipedia.org/wiki/Coefficient_of_determination>`_ + <https://en.wikipedia.org/wiki/Coefficient_of_determination>`_ Examples -------- diff --git a/sklearn/neighbors/classification.py b/sklearn/neighbors/classification.py index 64f6df27a6c3ce8e16a6af0255c9e50900646e85..86224cac1526a55cc356de2b6cd93787dc496c76 100644 --- a/sklearn/neighbors/classification.py +++ b/sklearn/neighbors/classification.py @@ -114,7 +114,7 @@ class KNeighborsClassifier(NeighborsBase, KNeighborsMixin, but different labels, the results will depend on the ordering of the training data. - http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm + https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm """ def __init__(self, n_neighbors=5, @@ -312,7 +312,7 @@ class RadiusNeighborsClassifier(NeighborsBase, RadiusNeighborsMixin, See :ref:`Nearest Neighbors <neighbors>` in the online documentation for a discussion of the choice of ``algorithm`` and ``leaf_size``. 
- http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm + https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm """ def __init__(self, radius=1.0, weights='uniform', diff --git a/sklearn/neighbors/regression.py b/sklearn/neighbors/regression.py index 0b1d4c03622dbb9838b63352d48e2c1e6d4116d6..b38de5acceaf32b93ef1779bfcd790a665221b5b 100644 --- a/sklearn/neighbors/regression.py +++ b/sklearn/neighbors/regression.py @@ -112,7 +112,7 @@ class KNeighborsRegressor(NeighborsBase, KNeighborsMixin, but different labels, the results will depend on the ordering of the training data. - http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm + https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm """ def __init__(self, n_neighbors=5, weights='uniform', @@ -250,7 +250,7 @@ class RadiusNeighborsRegressor(NeighborsBase, RadiusNeighborsMixin, See :ref:`Nearest Neighbors <neighbors>` in the online documentation for a discussion of the choice of ``algorithm`` and ``leaf_size``. - http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm + https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm """ def __init__(self, radius=1.0, weights='uniform', diff --git a/sklearn/neighbors/unsupervised.py b/sklearn/neighbors/unsupervised.py index 590069b9ed55e6ed2d9dad38f61011339d38bd4d..7231c820976a4f90228ec34f431e962a3b0a4281 100644 --- a/sklearn/neighbors/unsupervised.py +++ b/sklearn/neighbors/unsupervised.py @@ -110,7 +110,7 @@ class NearestNeighbors(NeighborsBase, KNeighborsMixin, See :ref:`Nearest Neighbors <neighbors>` in the online documentation for a discussion of the choice of ``algorithm`` and ``leaf_size``. - http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm + https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm """ def __init__(self, n_neighbors=5, radius=1.0, diff --git a/sklearn/preprocessing/data.py b/sklearn/preprocessing/data.py index 9c2866cbfe16df0abd22f38a4ec4d92456d4ebe8..e7453d00a75d4243cebf47b7a794caf08c05023d 100644 --- a/sklearn/preprocessing/data.py +++ b/sklearn/preprocessing/data.py @@ -941,8 +941,8 @@ class RobustScaler(BaseEstimator, TransformerMixin): ----- See examples/preprocessing/plot_robust_scaling.py for an example. - http://en.wikipedia.org/wiki/Median_(statistics) - http://en.wikipedia.org/wiki/Interquartile_range + https://en.wikipedia.org/wiki/Median_(statistics) + https://en.wikipedia.org/wiki/Interquartile_range """ def __init__(self, with_centering=True, with_scaling=True, copy=True): diff --git a/sklearn/random_projection.py b/sklearn/random_projection.py index 1a1414e3fc8237fbc48ddf88e00232a1c2c8ca77..19235732e71e23d25f84c76b899660f475123881 100644 --- a/sklearn/random_projection.py +++ b/sklearn/random_projection.py @@ -12,7 +12,7 @@ samples of the dataset. The main theoretical result behind the efficiency of random projection is the `Johnson-Lindenstrauss lemma (quoting Wikipedia) -<http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma>`_: +<https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma>`_: In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional @@ -110,7 +110,7 @@ def johnson_lindenstrauss_min_dim(n_samples, eps=0.1): References ---------- - .. [1] http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma + .. [1] https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma .. [2] Sanjoy Dasgupta and Anupam Gupta, 1999, "An elementary proof of the Johnson-Lindenstrauss Lemma." 
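A quick way to see the bound in action is to call the ``johnson_lindenstrauss_min_dim`` helper referenced in this hunk; the sample count and ``eps`` values below are arbitrary::

    from sklearn.random_projection import johnson_lindenstrauss_min_dim

    # Minimum number of random projection components needed (per the lemma)
    # to keep pairwise distances of n_samples points within a (1 +/- eps)
    # factor, for a few illustrative distortion levels.
    for eps in (0.1, 0.3, 0.5):
        print(eps, johnson_lindenstrauss_min_dim(n_samples=1000, eps=eps))

The same bound is what the module's random projection transformers rely on when ``n_components='auto'``.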
@@ -584,7 +584,7 @@ class SparseRandomProjection(BaseRandomProjection): http://www.stanford.edu/~hastie/Papers/Ping/KDD06_rp.pdf .. [2] D. Achlioptas, 2001, "Database-friendly random projections", - http://www.cs.ucsc.edu/~optas/papers/jl.pdf + https://users.soe.ucsc.edu/~optas/papers/jl.pdf """ def __init__(self, n_components='auto', density='auto', eps=0.1, diff --git a/sklearn/tree/tree.py b/sklearn/tree/tree.py index d33f2fbadcb80ace1a1c846795a2caa128fcd2a8..c2ba3f9e91f2bb5d348cdf09d3cd35cfb7714628 100644 --- a/sklearn/tree/tree.py +++ b/sklearn/tree/tree.py @@ -649,7 +649,7 @@ class DecisionTreeClassifier(BaseDecisionTree, ClassifierMixin): References ---------- - .. [1] http://en.wikipedia.org/wiki/Decision_tree_learning + .. [1] https://en.wikipedia.org/wiki/Decision_tree_learning .. [2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification and Regression Trees", Wadsworth, Belmont, CA, 1984. @@ -880,7 +880,7 @@ class DecisionTreeRegressor(BaseDecisionTree, RegressorMixin): References ---------- - .. [1] http://en.wikipedia.org/wiki/Decision_tree_learning + .. [1] https://en.wikipedia.org/wiki/Decision_tree_learning .. [2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification and Regression Trees", Wadsworth, Belmont, CA, 1984. diff --git a/sklearn/utils/linear_assignment_.py b/sklearn/utils/linear_assignment_.py index edcedc4dba23f3cd247dbf32607bb78eb2f8d6c6..5282c84e211307add3f127fe418aa89957fcdfbe 100644 --- a/sklearn/utils/linear_assignment_.py +++ b/sklearn/utils/linear_assignment_.py @@ -50,7 +50,7 @@ def linear_assignment(X): *Journal of the Society of Industrial and Applied Mathematics*, 5(1):32-38, March, 1957. - 5. http://en.wikipedia.org/wiki/Hungarian_algorithm + 5. https://en.wikipedia.org/wiki/Hungarian_algorithm """ indices = _hungarian(X).tolist() indices.sort()
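For reference, the ``linear_assignment`` routine documented in this last hunk can be exercised on a small, made-up cost matrix (the values below are arbitrary); it returns the minimum-cost (row, column) matching found by the Hungarian algorithm::

    import numpy as np
    from sklearn.utils.linear_assignment_ import linear_assignment

    # cost[i, j] is the cost of assigning row i to column j.
    cost = np.array([[4, 1, 3],
                     [2, 0, 5],
                     [3, 2, 2]])

    indices = linear_assignment(cost)
    print(indices)                                    # matched (row, column) pairs
    print(cost[indices[:, 0], indices[:, 1]].sum())   # total cost of the matching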