PEP: 376
Title: Changing the .egg-info structure
Version: $Revision: 75414 $
Last-Modified: $Date: 2009-10-14 14:39:17 -0400 (Wed, 14 Oct 2009) $
Author: Tarek Ziadé <tarek@ziade.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:


Abstract
========

This PEP proposes various enhancements for Distutils:

- A new format for the .egg-info structure.
- Some APIs to read the metadata of a distribution.
- A replacement for PEP 262.
- An uninstall feature.

Definitions
===========

A **distribution** is a collection of files, which can be Python modules,
extensions, or data. A distribution is managed by a special module called
`setup.py` which contains a call to the `distutils.core.setup` function.
The arguments passed to that function describe the distribution, like
its `name`, its `version`, and so on.

Distutils provides, among other things, **commands** that can be called
through the shell using the `setup.py` script. An `sdist` command is provided
for instance to create a source distribution archive. An `install` command
is also provided to perform an installation of the distribution in the Python
installation the script is invoked with::

    $ python setup.py install

See the Distutils [#distutils]_ documentation for more information.

Once installed, the elements are located in various places in the system, like:

- In Python's site-packages (Python modules, Python modules organized into
  packages, Extensions, etc.)
- In Python's `include` directory.
- In Python's `bin` or `Scripts` directory.
- Etc.

Rationale
=========

There are two problems right now in the way distributions are installed in
Python:

- There are too many ways to do it.
- There is no API to get the metadata of installed distributions.

How distributions are installed
-------------------------------

Right now, when a distribution is installed in Python, the elements it
contains are installed in various directories.

The pure Python code, for instance, is installed in the `purelib` directory
which is located in the Python installation at ``lib/python2.6/site-packages``
for example under Unix-like systems or Mac OS X, and in ``Lib\site-packages``
under Windows. This is done with the Distutils `install` command, which calls
various subcommands.

The `install_egg_info` subcommand is called during this process in order to
create an `.egg-info` file in the `purelib` directory.

For example, for the `docutils` distribution, which contains one package, an
extra module, and executable scripts, three elements are installed in
`site-packages`:

- `docutils`: The ``docutils`` package.
- `roman.py`: An extra module used by `docutils`.
- `docutils-0.5-py2.6.egg-info`: A file containing the distribution metadata
  as described in PEP 314 [#pep314]_. This file corresponds to the file
  called `PKG-INFO`, built by the `sdist` command.

Some executable scripts, such as `rst2html.py`, are also added in the
`bin` directory of the Python installation.

Another project called `setuptools` [#setuptools]_ has two other formats 
to install distributions, called `EggFormats` [#eggformats]_:

- a self-contained `.egg` directory, that contains all the distribution files
  and the distribution metadata in a file called `PKG-INFO` in a subdirectory
  called `EGG-INFO`. `setuptools` creates other files in that directory that can
  be considered as complementary metadata.

- a `.egg-info` directory installed in `site-packages`, that contains the same
  files `EGG-INFO` has in the `.egg` format.

The first format is automatically used when you install a distribution that
uses the ``setuptools.setup`` function in its setup.py file, instead of
the ``distutils.core.setup`` one.

The `setuptools` project also provides an executable script called
`easy_install` [#easyinstall]_ that installs all distributions, including
distutils-based ones, in self-contained `.egg` directories.

If you want a standalone `.egg-info` directory, i.e. the second
`setuptools` format, you have to force it when you work with a
setuptools-based distribution or with the `easy_install` script.
You can force it by using the `--single-version-externally-managed` option
**or** the `--root` option.

This option is used by:

- the `pip` [#pip]_ installer.
- the Fedora packagers [#fedora]_.
- the Debian packagers [#debian]_.

Uninstall information
---------------------

Distutils doesn't provide an `uninstall` command. If you want to uninstall
a distribution, you have to be a power user and remove the various elements
that were installed, and then look over the `.pth` files to clean them if
necessary.

And the process differs depending on the tools you have used to install the
distribution and if the distribution's `setup.py` uses Distutils or
Setuptools.

Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another distribution by
removing a file that is shared among several distributions.

But there's a common behavior: when you install a distribution, files are
copied in your system. And it's possible to keep track of these files for
later removal.

What this PEP proposes
----------------------

To address those issues, this PEP proposes a few changes:

- A new `.egg-info` structure using a directory, based on one format of
  the `EggFormats` standard from `setuptools`.
- New APIs in `pkgutil` to be able to query the information of installed
  distributions.
- A de-facto replacement for PEP 262.
- An uninstall function and an uninstall script in Distutils.


.egg-info becomes a directory
=============================

As explained earlier, the `EggFormats` standard from `setuptools` proposes two 
formats to install the metadata information of a distribution:

- A self-contained directory that can be zipped or left unzipped and contains
  the distribution files *and* an `.egg-info` directory containing the 
  metadata.

- A distinct `.egg-info` directory located in the site-packages directory,
  with the metadata inside.

This PEP proposes to keep just one format and make it the standard way to
install the metadata of a distribution: a distinct `.egg-info` directory
located in the site-packages directory, containing the metadata.

This `.egg-info` directory contains a `PKG-INFO` file built by the 
`write_pkg_file` method of the `Distribution` class in Distutils.

This change does not impact Python itself because the metadata files are not
used anywhere yet in the standard library besides Distutils.

It does impact the `setuptools` and `pip` projects, but given the fact that
they already work with a directory that contains a `PKG-INFO` file, the change
will have no deep consequences.

Let's take an example of the new format with the `docutils` distribution. 
The elements installed in `site-packages` are::

    - docutils/
    - roman.py
    - docutils-0.5.egg-info/
        PKG-INFO

The syntax of the egg-info directory name is as follows::

    name + '-' + version + '.egg-info'

The egg-info directory name is created using a new function called
``egginfo_dirname(name, version)`` added to ``pkgutil``. ``name`` is
converted to a standard distribution name by replacing any runs of
non-alphanumeric characters with a single '-'. ``version`` is converted
to a standard version string. Spaces become dots, and all other
non-alphanumeric characters (except dots) become dashes, with runs of
multiple dashes condensed to a single dash. Both attributes are then
converted into their filename-escaped form, i.e. any '-' characters are
replaced with '_' other than the one in 'egg-info' and the one 
separating the name from the version number.

Examples::

    >>> egginfo_dirname('docutils', '0.5')
    'docutils-0.5.egg-info'

    >>> egginfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.egg-info'

    >>> egginfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.egg-info'
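
The rules above can be sketched in a few lines. This is only an
illustration of the escaping steps as described in this section, not the
actual ``pkgutil`` implementation; treating only ASCII letters and digits
as alphanumeric is an assumption of the sketch::

    import re

    def egginfo_dirname(name, version):
        """Build an .egg-info directory name from a name and a version."""
        # Name: runs of non-alphanumeric characters collapse to one dash.
        name = re.sub(r'[^A-Za-z0-9]+', '-', name)
        # Version: spaces become dots, then runs of anything that is
        # neither alphanumeric nor a dot collapse to one dash.
        version = re.sub(r'[^A-Za-z0-9.]+', '-', version.replace(' ', '.'))
        # Filename escaping: the remaining dashes become underscores, so
        # the only dashes left are the separator and the one in 'egg-info'.
        return '%s-%s.egg-info' % (name.replace('-', '_'),
                                   version.replace('-', '_'))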

Adding a RECORD file in the .egg-info directory
===============================================

A `RECORD` file is added inside the `.egg-info` directory at installation
time when installing a source distribution using the `install` command.
Notice that when installing a binary distribution created with the `bdist`
command or a `bdist`-based command, the `RECORD` file will be installed as
well since these commands use the `install` command to create binary
distributions.

The `RECORD` file holds the list of installed files. These correspond
to the files listed by the `record` option of the `install` command, and will
be generated by default. This allows the implementation of an uninstallation
feature, as explained later in this PEP. The `install` command also provides 
an option to prevent the `RECORD` file from being written and this option 
should be used when creating system packages.

Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.

This RECORD file is inspired by PEP 262 FILES [#pep262]_.

The RECORD format
-----------------

The `RECORD` file is a CSV file, composed of records, one line per
installed file. The ``csv`` module is used to read the file, with
these options:

- field delimiter: `,`
- quoting char: `"`
- line terminator: ``os.linesep`` (so ``\r\n`` or ``\n``)

Each record is composed of three elements:

- the file's full **path**

  - if the installed file is located under the directory that contains the
    `.egg-info` directory of the package, it's a '/'-separated relative
    path, no matter what the target system is. This makes this information
    cross-compatible and allows simple installations to be relocatable.

  - if the installed file is located under ``sys.prefix`` or
    ``sys.exec_prefix``, it's a '/'-separated relative path prefixed
    by the `$PREFIX` or the `$EXEC_PREFIX` string. The `install` command
    decides which prefix to use depending on the files. For instance if
    it's an executable script defined in the `scripts` option of the
    setup script, `$EXEC_PREFIX` will be used. If `install` doesn't know
    which prefix to use, `$PREFIX` is preferred.

- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
  generated files don't have any hash because they are automatically produced
  from `py` files. So checking the hash of the corresponding `py` file is
  enough to decide if the file and its associated `pyc` or `pyo` files have
  changed.

- the file's size in bytes

The ``csv`` module is used to generate this file, so the field separator is
",". Any field containing a "," character is quoted automatically by ``csv``.

When the file is read, the `U` option is used so the universal newline
support (see PEP 278 [#pep278]_) is activated, avoiding any trouble
reading a file produced on a platform that uses a different new line
terminator.
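
As an illustration, a RECORD file with these options could be written and
read back as follows. The helper names (``record_row``, ``write_record``,
``read_record``) are hypothetical and not part of the proposed API. The
sketch targets Python 3, where the ``csv`` documentation recommends
opening files with ``newline=''`` so the module handles line terminators
itself. Leaving the hash empty for byte-compiled files is this sketch's
reading of the rule above::

    import csv
    import hashlib
    import os

    def record_row(base_dir, relpath):
        """Return a (path, md5, size) row for a file under base_dir."""
        full_path = os.path.join(base_dir, *relpath.split('/'))
        with open(full_path, 'rb') as f:
            data = f.read()
        # Assumption: .pyc/.pyo files are listed without a hash.
        if relpath.endswith(('.pyc', '.pyo')):
            digest = ''
        else:
            digest = hashlib.md5(data).hexdigest()
        return (relpath, digest, str(len(data)))

    def write_record(record_path, rows):
        """Write rows using the delimiter, quoting and terminator above."""
        with open(record_path, 'w', newline='') as f:
            writer = csv.writer(f, delimiter=',', quotechar='"',
                                lineterminator=os.linesep)
            writer.writerows(rows)

    def read_record(record_path):
        """Read a RECORD file back as a list of tuples of strings."""
        with open(record_path, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=',', quotechar='"')
            return [tuple(row) for row in reader]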

Example
-------

Back to our `docutils` example, we now have::

    - docutils/
    - roman.py
    - docutils-0.5.egg-info/
        PKG-INFO
        RECORD

And the RECORD file contains (extract)::

    docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
    docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
    roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
    $EXEC_PREFIX/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
    docutils-0.5.egg-info/PKG-INFO,6fe57de576d749536082d8e205b77748,195
    docutils-0.5.egg-info/RECORD

Notice that:

- the `RECORD` file can't contain a hash of itself and is just mentioned here
- `docutils` and `docutils-0.5.egg-info` are located in `site-packages` so the file
  paths are relative to it.

Adding an INSTALLER file in the .egg-info directory
===================================================

The `install` command has a new option called `installer`. This option
is the name of the tool used to invoke the installation. It's a normalized
lower-case string matching `[a-z0-9_\-\.]`. For example::

    $ python setup.py install --installer=pkg-system

It defaults to `distutils` if not provided.

When a distribution is installed, the INSTALLER file is generated in the
`.egg-info` directory with this value, to keep track of **who** installed the
distribution. The file is a single-line text file.
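
A minimal sketch of how the INSTALLER file could be written and read
back is shown here. The function names are hypothetical, and applying
the character class to the whole name (one or more characters) is an
assumption, since the text only gives the allowed characters::

    import os
    import re

    # Assumption: every character of the name must be in the class
    # given in the text, and the name must not be empty.
    _INSTALLER_NAME = re.compile(r'[a-z0-9_\-\.]+\Z')

    def write_installer(egginfo_dir, installer='distutils'):
        """Write the single-line INSTALLER file into egginfo_dir."""
        if not _INSTALLER_NAME.match(installer):
            raise ValueError('invalid installer name: %r' % installer)
        with open(os.path.join(egginfo_dir, 'INSTALLER'), 'w') as f:
            f.write(installer + '\n')

    def read_installer(egginfo_dir):
        """Return the installer name stored in egginfo_dir."""
        with open(os.path.join(egginfo_dir, 'INSTALLER')) as f:
            return f.readline().strip()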

Adding a REQUESTED file in the .egg-info directory
==================================================

If a distribution is installed by direct user request (the usual
case), a file REQUESTED is added to the .egg-info directory of the
installed distribution. The REQUESTED file may be empty, or may
contain a marker comment line beginning with the "#" character.

If an install tool installs a distribution automatically, as a
dependency of another distribution, the REQUESTED file should not be
created.

The ``install`` command of distutils by default creates the REQUESTED
file. It accepts --requested and --no-requested options to explicitly
specify whether the file is created.

If a package that was already installed on the system as a dependency
is later installed by name, the distutils ``install`` command will
create the REQUESTED file in the .egg-info directory of the existing
installation.

Rationale
---------

Some install tools automatically detect unfulfilled dependencies and
install them. These tools may also want to be able to alert the user
if distributions are left on the system in an "orphan" state:
installed as a dependency of another distribution, which has since
been removed.

In order to provide information about orphaned dependencies, knowing
the dependency graph for installed distributions does not suffice (a
package that is not required by any other package may be on the system
because the user needs it directly). It is also necessary to know, for
each installed package, one additional bit of information: whether it
was installed "by request" or solely as a dependency.

Each (un)install tool could of course record that bit in its own
separate metadata cache, but this will break down if multiple tools
are used to work with installed packages on the same system. If
distutils takes care of this bit of metadata in a standard way,
multiple tools can cooperate and correctly handle orphaned
dependencies.

(In contrast, it is not necessary for distutils to record or manage
the full dependency graph for installed packages, because the list of
installed packages and their dependency metadata, standardized in PEP
345, allow any tool to independently calculate the dependency graph.)

An (un)installer tool which works with dependencies could use the
REQUESTED metadata for orphan detection as follows: an orphaned
distribution is any installed distribution that doesn't have a
REQUESTED file and is not required by any other installed
distribution.
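
This rule can be sketched as follows. The input format (a mapping from
distribution name to a ``(requested, requires)`` pair) is purely
illustrative; a real tool would build it from the REQUESTED files and
the dependency metadata of the installed distributions::

    def find_orphans(installed):
        """Return the sorted names of orphaned distributions.

        ``installed`` maps each distribution name to a tuple
        ``(requested, requires)``: whether a REQUESTED file is present,
        and the names of the distributions it requires.
        """
        required = set()
        for requested, requires in installed.values():
            required.update(requires)
        return sorted(name
                      for name, (requested, requires) in installed.items()
                      if not requested and name not in required)

Note that this performs a single pass: removing an orphan may turn its
own dependencies into orphans, so a tool would have to repeat the check
after each removal.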

The availability of the REQUESTED metadata of course does not obligate
any tool to provide this orphan-detection feature, or to implement it
in a certain way; for instance, distutils has no opinion about whether
tools should automatically remove newly-orphaned dependencies at
uninstall time.


New APIs in pkgutil
===================

To use the `.egg-info` directory content, we need to add in the standard
library a set of APIs. The best place to put these APIs is `pkgutil`.

Query functions
---------------

The new functions added to ``pkgutil`` are:

- ``get_distributions()`` -> iterator of ``Distribution`` instances.

  Provides an iterator that looks for ``.egg-info`` directories in 
  ``sys.path`` and returns ``Distribution`` instances for
  each one of them.

- ``get_distribution(name)`` -> ``Distribution`` or None.

  Scans all elements in ``sys.path`` and looks for all directories ending with
  ``.egg-info``. Returns a ``Distribution`` corresponding to the 
  ``.egg-info`` directory that contains a PKG-INFO that matches `name` 
  for the `name` metadata.

  Notice that there should be at most one result. The first result found
  is returned. If the directory is not found, returns None.

- ``get_file_users(path)`` -> iterator of ``Distribution`` instances.

  Iterates over all distributions to find out which distributions use ``path``.
  ``path`` can be a local absolute path or a relative '/'-separated path.
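
The lookup performed by these functions can be approximated with the
following sketch. It is not the proposed implementation: the helper
names are hypothetical, it returns directory paths rather than
``Distribution`` instances, and it only reads the ``Name`` header of
`PKG-INFO` instead of loading the full metadata::

    import os

    def iter_egginfo_dirs(paths):
        """Yield the .egg-info directories found in the given paths."""
        for entry in paths:
            if not os.path.isdir(entry):
                continue  # skip missing entries and non-directories
            for filename in sorted(os.listdir(entry)):
                candidate = os.path.join(entry, filename)
                if (filename.endswith('.egg-info')
                        and os.path.isdir(candidate)):
                    yield candidate

    def read_name(egginfo_dir):
        """Return the Name field of PKG-INFO, or None if unavailable."""
        pkg_info = os.path.join(egginfo_dir, 'PKG-INFO')
        if not os.path.isfile(pkg_info):
            return None
        with open(pkg_info) as f:
            for line in f:
                if line.startswith('Name:'):
                    return line[len('Name:'):].strip()
        return None

    def find_egginfo_dir(name, paths):
        """Return the first .egg-info directory whose Name matches."""
        for egginfo_dir in iter_egginfo_dirs(paths):
            if read_name(egginfo_dir) == name:
                return egginfo_dir
        return None

Calling ``find_egginfo_dir('docutils', sys.path)`` would then mimic the
directory search behind ``get_distribution('docutils')``.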

Distribution class
------------------

A new class called ``Distribution`` is created with the path of the
`.egg-info` directory provided to the constructor. It reads the metadata
contained in `PKG-INFO` when it is instantiated.

``Distribution(path)`` -> instance

  Creates a ``Distribution`` instance for the given ``path``.

``Distribution`` provides the following attributes:

- ``name``: The name of the distribution.

- ``metadata``: A ``DistributionMetadata`` instance loaded with the
  distribution's PKG-INFO file.

- ``requested``: A boolean that indicates whether the REQUESTED
  metadata file is present (in other words, whether the package was
  installed by user request).

And the following methods:

- ``get_installed_files(local=False)`` -> iterator of (path, md5, size)

  Iterates over the `RECORD` entries and returns a tuple ``(path, md5, size)``
  for each line. If ``local`` is ``True``, the path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.

  A local absolute path is an absolute path in which occurrences of '/'
  have been replaced by the system separator given by ``os.sep``.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in `RECORD`. ``path``
  can be a local absolute path or a relative '/'-separated path.

- ``get_egginfo_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file pointed to by ``path``, which
  must be located under the `.egg-info` directory.

  ``path`` has to be a '/'-separated path relative to the `.egg-info`
  directory or an absolute path.

  If ``path`` is an absolute path and doesn't start with the `.egg-info`
  directory path, a ``DistutilsError`` is raised.

  If ``binary`` is ``True``, opens the file in read-only binary mode (`rb`),
  otherwise opens it in read-only mode (`r`).

- ``get_egginfo_files(local=False)`` -> iterator of paths

  Iterates over the `RECORD` entries and returns the path of each line that
  points to a file located in the `.egg-info` directory or one of its
  subdirectories.

  If ``local`` is ``True``, each path is transformed into a
  local absolute path. Otherwise the raw value from `RECORD` is returned.
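
The conversion from a raw `RECORD` path to a local path, as used when
``local`` is ``True``, might look like the sketch below. The function
name and its parameters are hypothetical; the base directories are
passed explicitly so the logic does not depend on the running
interpreter::

    import os

    def to_local_path(record_path, egginfo_parent, prefix, exec_prefix):
        """Convert a '/'-separated RECORD path into a local path.

        ``egginfo_parent`` is the directory that contains the .egg-info
        directory; ``prefix`` and ``exec_prefix`` stand for the values
        of sys.prefix and sys.exec_prefix.
        """
        if record_path.startswith('$EXEC_PREFIX/'):
            base = exec_prefix
            relative = record_path[len('$EXEC_PREFIX/'):]
        elif record_path.startswith('$PREFIX/'):
            base = prefix
            relative = record_path[len('$PREFIX/'):]
        else:
            base = egginfo_parent
            relative = record_path
        # os.path.join inserts the separator of the local system.
        return os.path.join(base, *relative.split('/'))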


Notice that the API is organized in five classes that work with directories 
and Zip files (so it works with files included in Zip files, see PEP 273 for
more details [#pep273]_). These classes are described in the documentation 
of the prototype implementation for interested readers [#prototype]_.

Usage example
-------------

Let's use some of the new APIs with our `docutils` example::

    >>> from pkgutil import get_distribution, get_file_users
    >>> dist = get_distribution('docutils')
    >>> dist.name
    'docutils'
    >>> dist.metadata.version
    '0.5'

    >>> for path, hash, size in dist.get_installed_files():
    ...     print '%s %s %s' % (path, hash, size)
    ...
    docutils/__init__.py b690274f621402dda63bf11ba5373bf2 9544
    docutils/core.py 9c4b84aff68aa55f2e9bf70481b94333 66188
    roman.py a4b84aff68aa55f2e9bf70481b943D3 234
    $EXEC_PREFIX/bin/rst2html.py a4b84aff68aa55f2e9bf70481b943D3 234
    docutils-0.5.egg-info/PKG-INFO 6fe57de576d749536082d8e205b77748 195
    docutils-0.5.egg-info/RECORD None None

    >>> dist.uses('docutils/core.py')
    True

    >>> dist.uses('/usr/local/bin/rst2html.py')
    True

    >>> dist.get_egginfo_file('PKG-INFO')
    <open file at ...>

    >>> dist.requested
    True

PEP 262 replacement
===================

In the past an attempt was made to create an installation database (see PEP 262
[#pep262]_).

Extract from PEP 262 Requirements:

    "We need a way to figure out what distributions, and what versions of
    those distributions, are installed on a system..."


Since the APIs proposed in the current PEP provide everything needed to meet
this requirement, PEP 376 replaces PEP 262 and becomes the official 
`installation database` standard.

The new version of PEP 345 (XXX work in progress) extends the Metadata
standard and fulfills the requirements described in PEP 262, like the
`REQUIRES` section.

Adding an Uninstall function
============================

Distutils already provides a very basic way to install a distribution, which
is running the `install` command over the `setup.py` script of the
distribution.

Distutils will provide a very basic ``uninstall`` function, that is added
in ``distutils.util`` and takes the name of the distribution to uninstall
as its argument. ``uninstall`` uses the APIs described earlier and removes
all unique files, as long as their hash didn't change. Then it removes empty
directories left behind.

``uninstall`` returns a list of uninstalled files::

    >>> from distutils.util import uninstall
    >>> uninstall('docutils')
    ['/opt/local/lib/python2.6/site-packages/docutils/core.py',
     ...
     '/opt/local/lib/python2.6/site-packages/docutils/__init__.py']

If the distribution is not found, a ``DistutilsUninstallError`` is raised.
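
The removal logic described above (skip shared files, skip files whose
hash changed, then prune empty directories) can be sketched as follows.
This is not the proposed ``uninstall`` function: the name
``remove_recorded_files`` and its arguments are hypothetical, and it
only handles paths relative to ``base_dir``::

    import hashlib
    import os

    def remove_recorded_files(base_dir, rows, used_elsewhere=()):
        """Remove unmodified, unshared files listed in RECORD rows.

        ``rows`` is an iterable of (path, md5, size) tuples with
        '/'-separated relative paths; ``used_elsewhere`` lists the
        paths that another distribution also uses.
        """
        base_dir = os.path.normpath(base_dir)
        removed = []
        for relpath, md5, size in rows:
            full_path = os.path.join(base_dir, *relpath.split('/'))
            if relpath in used_elsewhere or not os.path.isfile(full_path):
                continue
            if md5:
                with open(full_path, 'rb') as f:
                    if hashlib.md5(f.read()).hexdigest() != md5:
                        continue  # modified since installation
            os.remove(full_path)
            removed.append(full_path)
        # Prune directories emptied by the removal, deepest first,
        # without ever touching base_dir itself or anything above it.
        parents = {os.path.dirname(path) for path in removed}
        for directory in sorted(parents, key=len, reverse=True):
            while (directory.startswith(base_dir + os.sep)
                   and os.path.isdir(directory)
                   and not os.listdir(directory)):
                os.rmdir(directory)
                directory = os.path.dirname(directory)
        return removed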

Filtering
---------

To make it a reference API for third-party projects that wish to control
how `uninstall` works, a second callable argument can be used. It's
called for each file that is about to be removed. If the callable returns
``True``, the file is removed. If it returns ``False``, it's left alone.

Examples::

    >>> import logging
    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('docutils', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('docutils', _dry_run)

Of course, a third-party tool can use ``pkgutil`` APIs to implement
its own uninstall feature.

Installer marker
----------------

As explained earlier in this PEP, the `install` command adds an `INSTALLER`
file in the `.egg-info` directory with the name of the installer.

To avoid removing distributions that were installed by another packaging
system, the ``uninstall`` function takes an extra argument ``installer``
which defaults to ``distutils``.

When called, ``uninstall`` checks that the ``INSTALLER`` file matches
this argument. If not, it raises a ``DistutilsUninstallError``::

    >>> uninstall('docutils')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'

    >>> uninstall('docutils', installer='cool-pkg-manager')

This allows a third-party application to use the ``uninstall`` function
and strongly suggest that no other program remove a distribution it has
previously installed. This is useful when a third-party program that relies
on Distutils APIs does extra steps on the system at installation time
that it has to undo at uninstallation time.

Adding an Uninstall script
==========================

An `uninstall` script is added to Distutils and is used like this::

    $ python -m distutils.uninstall packagename

Notice that the script doesn't check whether the removal of a distribution
breaks another distribution. However, by using the uninstall function, it
makes sure that none of the files it removes are used by any other
distribution.

Also note that this uninstall script pays no attention to the
REQUESTED metadata; that is provided only for use by external tools to
provide more advanced dependency management.

Backward compatibility and roadmap
==================================

These changes don't introduce any compatibility problems with the previous
version of Distutils, and will also work with existing third-party tools.

The plan is to include the functionality outlined in this PEP in distutils for
Python 2.7 and Python 3.2. A backport of the new distutils for 2.5, 2.6, 3.0
and 3.1 is provided so people can benefit from these new features.

Distributions installed using existing, pre-standardization formats do not have
the necessary metadata available for the new API, and thus will be
ignored. Third-party tools may of course continue to support previous
formats in addition to the new format, in order to ease the transition.


References
==========

.. [#distutils]
   http://docs.python.org/distutils

.. [#pep262]
   http://www.python.org/dev/peps/pep-0262

.. [#pep314]
   http://www.python.org/dev/peps/pep-0314

.. [#setuptools]
   http://peak.telecommunity.com/DevCenter/setuptools

.. [#easyinstall]
   http://peak.telecommunity.com/DevCenter/EasyInstall

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats

.. [#pep273]
   http://www.python.org/dev/peps/pep-0273

.. [#pep278]
   http://www.python.org/dev/peps/pep-0278

.. [#fedora]
   http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools

.. [#debian]
   http://wiki.debian.org/DebianPython/NewPolicy

.. [#prototype]
   http://bitbucket.org/tarek/pep376/   

Acknowledgements
================

Jim Fulton, Ian Bicking, Phillip Eby, and many people at Pycon and Distutils-SIG.

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
