Discussion:
HTTPS, CPAN, and dist integrity
Michiel Beijen
2015-02-03 22:25:26 UTC
Hi,

This Saturday at FOSDEM in the hallway I had some discussions with
leont, Tux and later also with .. oh I guess that was RJBS? I did not
introduce myself, very bad. Hi!

Basically I think the whole CPAN setup with 200+ mirrors sounded great
back in the 1990s and it is still widely touted as a feature of CPAN.
But I'm a bit concerned about package integrity. Most Linux
distributions that use mirrors (and whose packages and ISOs are
typically LOTS bigger) have a system in place where they verify their
packages with GPG keys. If you do that then having many mirrors
outside of your control using plain HTTP is not a problem, but Perl
does not *really* have something like that. Yeah of course there is
the signatures list, which is GPG signed, but this signature is not
checked 'out of the box' as far as I know.
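For comparison with the distro approach, a client-side "out of the box" check could be as small as the following Python sketch. It shells out to the `gpg` binary to verify a clearsigned file (a signed checksum list, say) and treats any failure, including a missing `gpg`, as "not verified". The helper name `gpg_verify` is hypothetical, and it assumes the signing key has already been imported into the user's keyring.

```python
import shutil
import subprocess


def gpg_verify(path: str) -> bool:
    """Return True if gpg reports a good signature on the clearsigned file `path`.

    Hypothetical helper: assumes the signer's public key is already in the
    user's keyring. It does not check how far that key is trusted, only
    that the signature itself is good.
    """
    gpg = shutil.which("gpg")
    if gpg is None:
        # No gpg installed: nothing can be verified, so report failure.
        return False
    result = subprocess.run(
        [gpg, "--batch", "--verify", path],
        capture_output=True,  # gpg prints its status messages on stderr
    )
    # gpg exits with 0 for a good signature; a bad signature, a missing key,
    # unsigned input or an unreadable file all give a non-zero exit status.
    return result.returncode == 0
```

A client would only go on to trust the checksums in the file when `gpg_verify(...)` returns True.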

So assuming you can't really verify the integrity of a module on a
mirror from the client, I think it would be best not to use any
mirrors.
As far as I know, with StrawberryPerl or a client like cpanm, you only
use one mirror anyway. Maybe the parties involved can share how much
bandwidth they use, so we can see whether it would be feasible to switch
to *one* source for CPAN, possibly with a CDN underneath. MetaCPAN
seems to have a decent CDN now, has SSL certificates and a complete
index. I think they should be able to handle the additional data, but
this is just based on my gut feeling about the scale of the thing,
average dist size and such, not on actual facts.

The other problem is how to securely connect to the mirror. There is
no support for SSL in core perl. But I think in many cases it would
be an acceptable solution to install IO::Socket::SSL from your Linux
distribution's packages, and then have the CPAN client 'auto-select' the
https version of the CPAN mirror. If desired, the CPAN client could
complain about not having SSL when IO::Socket::SSL is not installed.
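The 'auto-select' behaviour described above might look roughly like this Python sketch, where "can the standard `ssl` module be imported?" stands in for "is IO::Socket::SSL installed?". With TLS available, an http mirror URL is rewritten to https; without it, the client keeps plain HTTP and complains. The helper names and the warning text are hypothetical, and it assumes each mirror serves the same tree over https.

```python
import sys
from urllib.parse import urlsplit


def tls_available() -> bool:
    """Stand-in for "is IO::Socket::SSL installed?": can the ssl module load?"""
    try:
        import ssl  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


def pick_mirror_url(url: str, have_tls: bool) -> str:
    """Return the https version of an http mirror URL when TLS is available.

    Hypothetical helper: assumes the mirror serves the same tree over https.
    Non-http URLs (https://, file://, ...) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    if have_tls:
        return parts._replace(scheme="https").geturl()
    # No TLS support: keep plain HTTP, but complain loudly about it.
    print(f"warning: SSL support not installed, fetching {url} over plain HTTP",
          file=sys.stderr)
    return url
```

For example, `pick_mirror_url("http://www.cpan.org/", tls_available())` gives `https://www.cpan.org/` on a system with TLS support, while a user's own `file://` entries pass through untouched.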

Please let me know if this would be feasible and what your concerns would be.

I'd be willing to contribute patches to the cpanpm client to use HTTPS
if available, and to rip out the mirrorlist stuff.
--
Michiel
Mike Doherty
2015-02-04 00:16:07 UTC
Doesn't cpan know how to use curl or wget if the system has them installed?
Probably easier to bootstrap TLS support in perl that way.

-Mike
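Borrowing TLS from external tools in this way can be sketched in Python: look for `curl`, then `wget`, on the PATH and build a download command for whichever is found. The helper name and the curl-before-wget preference are assumptions; the flags used are standard options of those two tools.

```python
import shutil
from typing import List, Optional


def download_command(url: str, dest: str) -> Optional[List[str]]:
    """Build an argv list that downloads `url` to `dest` with curl or wget.

    Hypothetical helper: prefers curl, falls back to wget, and returns None
    when neither is on the PATH. Both tools check TLS certificates by
    default for https URLs.
    """
    curl = shutil.which("curl")
    if curl:
        # -f: fail on HTTP errors, -sS: quiet but still show errors,
        # -L: follow redirects, -o: write the body to `dest`
        return [curl, "-fsSL", "-o", dest, url]
    wget = shutil.which("wget")
    if wget:
        # -q: quiet, -O: write the body to `dest`
        return [wget, "-q", "-O", dest, url]
    return None
```

A caller would pass the list to `subprocess.run(cmd, check=True)` and decide separately what to do (fall back to plain HTTP, or refuse) when it gets None.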
Michiel Beijen
2015-02-04 07:17:57 UTC
Absolutely correct! I forgot about that; it is used for perl < 5.14 where
there is no HTTP::Tiny or LWP in core. And yes, this would be the best way
to go about this I think.

Can anyone fill me in on the feasibility of directing all CPAN clients to
*one* site, i.e. https://cpan.metacpan.org/ ?
--
Michiel
Cosimo Streppone
2015-02-04 09:29:55 UTC
Post by Michiel Beijen
Can anyone fill me in on the feasibility of directing all CPAN clients to
*one* site, i.e. https://cpan.metacpan.org/ ?
Having multiple mirrors is IMO one of the many things
that CPAN got right from the start.

Other similar but centralized package repositories
have failed (and continue to fail) miserably.
Why go centralised if the problem is elsewhere?

CDNs, while being distributed, are managed centrally
by one entity, who also pays the bandwidth/service cost.
--
Cosimo
Michiel Beijen
2015-02-04 09:36:41 UTC
Hi Cosimo,
Post by Cosimo Streppone
CDNs, while being distributed, are managed centrally
by one entity, who also pays the bandwidth/service cost.
Yeah of course, it should not be one **host** - but it can still be
one URL which leverages a CDN, right? Are you saying "CDN bad,
mirrors good"?

I understand this might mean more bandwidth cost for whoever pays
the bills, which is why I asked how we could find out how much
traffic would be involved.

If bandwidth is a big concern, another possibility would be to
fetch the checksums for the dists from one source via HTTPS, download
the dist itself from a mirror, and then verify the checksum.
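That split (trusted digest over HTTPS, bulk bytes from any mirror) can be sketched in a few lines of Python. The helper name is hypothetical, SHA-256 is an assumed choice of digest, and how the trusted digest is fetched is left out.

```python
import hashlib


def verify_dist(data: bytes, expected_sha256: str) -> bool:
    """Check a dist fetched from an untrusted mirror against a SHA-256 digest
    obtained separately from a trusted source over HTTPS.

    Hypothetical helper: assumes the trusted source publishes hex SHA-256
    digests; fetching that digest is not shown here.
    """
    actual = hashlib.sha256(data).hexdigest()
    # hexdigest() is lower-case, so normalise the published value first.
    return actual == expected_sha256.strip().lower()
```

Here `data` would be the bytes of the tarball just downloaded from the mirror, and the client would only unpack and build it when `verify_dist` returns True. For very large files the same hash object can be fed in chunks with `update()` instead of hashing one bytes object.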

--
Michiel
Cosimo Streppone
2015-02-04 10:28:45 UTC
Post by Michiel Beijen
Yeah of course, it should not be one **host** - but it can still be
one URL which leverages a CDN, right? Are you saying "CDN bad,
mirrors good"?
Of course I am not saying that :-)

Just, it's easy to look at the neighbor's garden and think
the grass is greener.

I've been in other gardens, and it's often not the case :)

/C
David Cantrell
2015-02-04 11:46:19 UTC
Post by Michiel Beijen
Basically I think the whole CPAN setup with 200+ mirrors sounded great
back in the 1990s and it is still widely touted as a feature of CPAN.
Having a zillion mirrors is no longer a killer feature - the net is now
much better connected, bandwidth is cheap, and site reliability is much
higher than it used to be. However, the ability to easily create a
mirror is still a nifty feature. It makes it dead easy to:

* have a mirror on my laptop for hacking on the move;
* have a customised module repository where all the normal tools "just
work"

The latter is really important. It lets companies add their non-public
code to a CPAN mirror-a-like. It lets you "pin" some of your
dependencies to particular versions. It lets you do things like the
cpXXXan.
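Much of what makes a customised mirror-a-like work with the normal tools is its package index. As a rough Python sketch, assuming the usual CPAN layout where clients read `modules/02packages.details.txt.gz` (a block of `Key: value` header lines, a blank line, then one `package version path` line per package, with paths relative to `authors/id/`), a private repository could generate that index like this. The function name, the `Written-By` value and the package names in the usage are hypothetical, and this is not the code PAUSE itself runs.

```python
from email.utils import formatdate
from typing import Iterable, Optional, Tuple


def packages_details(entries: Iterable[Tuple[str, Optional[str], str]],
                     url: str) -> str:
    """Build the text of a modules/02packages.details.txt index.

    `entries` are (package, version, path) triples, with `path` relative to
    authors/id/ and a version of None written as "undef".
    Hypothetical helper, not the code PAUSE uses.
    """
    # Sort case-insensitively by package name so the output is stable.
    rows = sorted(entries, key=lambda entry: entry[0].lower())
    header = [
        "File:         02packages.details.txt",
        "URL:          " + url,
        "Description:  Package names found in directory $CPAN/authors/id/",
        "Columns:      package name, version, path",
        "Intended-For: Automated fetch routines, namespace documentation.",
        "Written-By:   hypothetical index script",
        "Line-Count:   %d" % len(rows),
        "Last-Updated: " + formatdate(usegmt=True),
    ]
    body = [
        "%-30s %8s  %s" % (pkg, "undef" if ver is None else ver, path)
        for pkg, ver, path in rows
    ]
    # A blank line separates the header from the package lines.
    return "\n".join(header) + "\n\n" + "\n".join(body) + "\n"
```

The returned text would then be gzipped to `modules/02packages.details.txt.gz` next to an `authors/id/` tree holding the tarballs, and the repository's URL (a `file://` URL for a copy on a laptop) added to the client's urllist. Pinning a dependency then amounts to listing only the version you want.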
--
David Cantrell | Godless Liberal Elitist

" In My Egotistical Opinion, most people's ... programs should be
indented six feet downward and covered with dirt. "
--Blair P. Houghton
Michiel Beijen
2015-02-04 12:40:51 UTC
Hi David,
Post by David Cantrell
However, the ability to easily create a
mirror is still a nifty feature. It makes it dead easy to:
* have a mirror on my laptop for hacking on the move;
* have a customised module repository where all the normal tools "just
work"
The latter is really important. It lets companies add their non-public
code to a CPAN mirror-a-like. It lets you "pin" some of your
dependencies to particular versions. It lets you do things like the
cpXXXan.
I'm not saying that all mirrors should go, and I'm not saying that you
should not be able to insert your own servers (or file locations) in
your urllist! That's a useful feature and should absolutely stay.

What I'm saying is that I think the *default* out-of-the-box setup should
use some central SSL-enabled website; currently the latest CPAN uses
http://www.cpan.org by default.
--
Michiel
