Discussion:
How might we mark a test suite that isn't parallelizable?
brian d foy
2013-05-02 19:39:13 UTC
Permalink
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?

It seems to me that by the time the tests are running, it's too late
because they are already in parallel and the best we can do is issue a
warning or decide to fail.
Karen Etheridge
2013-05-02 19:51:06 UTC
Permalink
Post by brian d foy
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?
When can a test not be parallelizable? Most of the examples that I can
think of (depending on a file resource, etc) smell like a design failure.
Tests should usually be able to use a unique temp file, etc.
Post by brian d foy
It seems to me that by the time the tests are running, it's too late
because they are already in parallel and the best we can do is issue a
warning or decide to fail.
Is it too late if the test itself declares "I cannot be parallelized"? If
not, then this declaration could be worked into Test::DescribeMe and
friends.
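As a rough sketch of the idea (this is not Test::DescribeMe's actual
interface, and HARNESS_IS_PARALLEL is a made-up variable standing in for
whatever signal the harness might provide):

    use strict;
    use warnings;
    use Test::More;

    # Hypothetical: HARNESS_IS_PARALLEL is not a real Test::Harness variable;
    # it stands in for whatever "you are running in parallel" signal the
    # harness might one day provide.
    if ($ENV{HARNESS_IS_PARALLEL}) {
        plan skip_all => 'this test must not run in parallel with others';
    }

    ok 1, 'serial-only work would go here';
    done_testing;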
Mark Stosberg
2013-05-02 20:01:23 UTC
Permalink
Post by Karen Etheridge
When can a test not be parallelizable? Most of the examples that I can
think of (depending on a file resource, etc) smell like a design failure.
Tests should usually be able to use a unique temp file, etc.
Here's an example:

Say you are testing a web application that does a bulk export of a
database table.

The test works by doing a slow "count(*)" on the table, then having the
app do something that should generate 2 rows, then running another slow
"count(*)" on the table, and finally checking that the new value is the
original value plus 2.

This isn't parallel safe, because other tests running in parallel could
insert rows into the table in between the before and after counts.
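For concreteness, a minimal sketch of that kind of test (the connection
details, the table, and the do_bulk_export() helper are all illustrative,
not the application's real code):

    use strict;
    use warnings;
    use Test::More;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=testdb', '', '', { RaiseError => 1 });

    # Stand-in for the application action under test.
    sub do_bulk_export {
        my ($db) = @_;
        $db->do('INSERT INTO export_rows (name) VALUES (?)', undef, $_) for qw(a b);
    }

    my ($before) = $dbh->selectrow_array('SELECT count(*) FROM export_rows');
    do_bulk_export($dbh);
    my ($after)  = $dbh->selectrow_array('SELECT count(*) FROM export_rows');

    # Any other test that inserts into export_rows between the two counts
    # breaks this assertion even though the export itself is correct.
    is $after, $before + 2, 'bulk export added exactly two rows';

    done_testing;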

One solution is to use a unit test instead. If you do that, the test and
the application can be made to share the same database handle. In
PostgreSQL, you can create a temporary table of the same name which
masks the original. Thus, your test has exclusive access to the table,
and the test can be made to run parallel-safe. It may also run much
faster, as the temporary table may have 10 rows in it instead of 100,000.
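A minimal DBI sketch of that temporary-table trick (connection details and
the "customers" table name are illustrative):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=testdb', '', '', { RaiseError => 1 });

    # The temporary table has the same name and shape as the real one and
    # masks it for this session only, so the test gets exclusive access.
    $dbh->do('CREATE TEMPORARY TABLE customers (LIKE customers INCLUDING ALL)');

    # Seed just the handful of rows this test needs instead of 100,000.
    $dbh->do('INSERT INTO customers (name) VALUES (?)', undef, $_)
        for qw(alice bob);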

Other kinds of mocking could be used at this point as well.

Alternatively, you could still use a functional-style test by using
Test::WWW::Mechanize::PSGI instead of testing through your web server.
In this arrangement, the app and the test also run in the same process,
and can be made to share the same database handle, allowing the same
kinds of solutions as above.
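A bare-bones sketch of that arrangement (the app.psgi file and the /export
route are assumptions):

    use strict;
    use warnings;
    use Test::More;
    use Plack::Util;
    use Test::WWW::Mechanize::PSGI;

    # Load the PSGI app into this process instead of talking to a web server.
    my $app  = Plack::Util::load_psgi('app.psgi');
    my $mech = Test::WWW::Mechanize::PSGI->new( app => $app );

    # App and test now share one process, so they can share a database
    # handle, temporary tables, mocks, and so on.
    $mech->get_ok('/export', 'export page responds');
    $mech->content_contains('alice', 'export includes the row we seeded');

    done_testing;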

I'll be talking more about related topics at YAPC::NA in my talk on
improving the performance of large test suites.

Mark
David Cantrell
2013-05-03 12:17:12 UTC
Permalink
Post by Karen Etheridge
Post by brian d foy
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?
When can a test not be parallelizable?
When you use Test::Class and all your tests inherit this:

sub start_from_a_known_state_because_doing_anything_else_is_stupid :Test(startup) {
    shift()->truncate_all_database_tables();
}

No, you can't have your tests clean up after themselves. For two
reasons. First, that means you have to scatter house-keeping crap all
over your tests. Second, if you have a test failure you want the
results to be sitting there in the database to help with debugging.
--
David Cantrell | even more awesome than a panda-fur coat

All children should be aptitude-tested at an early age and,
if their main or only aptitude is for marketing, drowned.
Mark Stosberg
2013-05-03 15:03:28 UTC
Permalink
Post by David Cantrell
No, you can't have your tests clean up after themselves. For two
reasons. First, that means you have to scatter house-keeping crap all
over your tests. Second, if you have a test failure you want the
results to be sitting there in the database to help with debugging.
There is another way to have tests clean up after themselves, which
addresses both of the shortcomings you mention.

First, I have functions like "insert_test_user()". At the end of these
functions, there is another call like this:

schedule_test_user_for_deletion(...)

That inserts a row into a table "test_ids_to_delete", which includes
columns for the table name and primary key of the entity to delete.
Another column has the insertion timestamp.
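In sketch form, with the users schema and column names assumed (the table
and helper names are the ones just described):

    use strict;
    use warnings;

    # $dbh is assumed to be a shared DBI handle.
    sub insert_test_user {
        my ($dbh, %args) = @_;
        $dbh->do('INSERT INTO users (email) VALUES (?)', undef, $args{email});
        my $id = $dbh->last_insert_id(undef, undef, 'users', 'id');
        schedule_test_user_for_deletion($dbh, users => $id);
        return $id;
    }

    # Register the row for later deletion instead of cleaning up inline.
    sub schedule_test_user_for_deletion {
        my ($dbh, $table, $id) = @_;
        $dbh->do(
            'INSERT INTO test_ids_to_delete (table_name, row_id, inserted_at)
             VALUES (?, ?, now())',
            undef, $table, $id,
        );
    }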

So, there's no "clean-up" code in all of our test scripts, and we have
the data there for debugging when the tests are done.

In Jenkins, after the test suite runs, a job to "Delete Test Data" is
kicked off, which deletes all test data older than an hour (longer than it
takes the test suite to run).
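The cleanup job itself can be a few lines of DBI (connection details and
column names are again assumptions):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=appdb', '', '', { RaiseError => 1 });

    # Find registrations older than an hour and delete the rows they point at.
    my $stale = $dbh->selectall_arrayref(
        q{SELECT id, table_name, row_id FROM test_ids_to_delete
          WHERE inserted_at < now() - interval '1 hour'},
        { Slice => {} },
    );

    for my $row (@$stale) {
        # Table names come from our own registration helpers, not user input.
        $dbh->do("DELETE FROM $row->{table_name} WHERE id = ?", undef, $row->{row_id});
        $dbh->do('DELETE FROM test_ids_to_delete WHERE id = ?', undef, $row->{id});
    }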

There's a third reason not to do in-line test clean-up, which is that a
SQL "DELETE" operation can be relatively slow, especially when complex
referential integrity (RI) is involved. Doing this "offline" as we do speeds that up.

There are still a few places where we have clean-up code in tests, but
it is the exception. Those are the cases in which we can't use functions
like "insert_test_user()".

For example, if we are creating an entity by filling out a form on the
web and submitting it, then we may need to manually register that for
clean up later.

Mark
David Cantrell
2013-05-03 15:34:44 UTC
Permalink
Post by Mark Stosberg
Post by David Cantrell
No, you can't have your tests clean up after themselves. For two
reasons. First, that means you have to scatter house-keeping crap all
over your tests. Second, if you have a test failure you want the
results to be sitting there in the database to help with debugging.
There is another way to have tests clean up after themselves, which
addresses both the shortcoming you address.
First, I have functions like "insert_test_user()". At the end of these
schedule_test_user_for_deletion(...)
That inserts a row into a table "test_ids_to_delete" which includes
column for the table name and primary key of the entity to delete.
Another column has the insertion timestamp.
So, there's no "clean-up" code in all of our test scripts, and we have
the data there for debugging when the tests are done.
OK, but you still have to clean out your database before you start each
independent chunk of your test suite, otherwise you start from an
unknown state. You might, for example, start with broken data still
hanging around from a previous test failure, which then causes your
correct code to fail, which is a massive pain in the arse if you are, as
you should be, using your tests as a debugging aid.

Consider, for example, your test for "retrieve a list of all customers
from the database and inflate them to objects". If some previous test
left a broken customer record in the database, then your code for
inflating customers to objects will (correctly but unexpectedly) fail.

If you want to paralellise tests that use a shared resource, such as a
database, then you will normally need to mock that shared resource.
Post by Mark Stosberg
In Jenkins, after the test suite runs, a job to "Delete Test Data" is
kicked off, which deletes all test data older than hour. (Longer than it
takes the test suite to run).
What about when you're not running under Jenkins. Like when you're
writing and testing your code. You still need to start testing from a
known state each time, which means you must clean out the database at
startup.
--
David Cantrell | Cake Smuggler Extraordinaire

Nuke a disabled unborn gay baby whale for JESUS!
Mark Stosberg
2013-05-03 15:48:19 UTC
Permalink
Post by David Cantrell
OK, but you still have to clean out your database before you start each
independent chunk of your test suite, otherwise you start from an
unknown state.
In a lot of cases, this isn't true. This pattern is quite common:

1. Insert entity.
2. Test with entity just inserted.

Since all my test cares about is the unique entity or entities it created,
the state of the rest of the database doesn't matter. The state that
matters is in a "known state".
Post by David Cantrell
What about when you're not running under Jenkins. Like when you're
writing and testing your code. You still need to start testing from a
known state each time, which means you must clean out the database at
startup.
We have a cron job that runs overnight to clean up anything that was
missed in the Jenkins runs.

We expect our tests to generally work in the face of a "dirty" database.
If they don't, that's considered a flaw in the test. This is important
because we run several tests against the same database at the same time.
Even if we wiped the database before we tested, all the other tests running
in parallel would be making the database "dirty" again. Thus, if a
pristine database were a requirement, only one test could run against the
database at a time.

We run our tests 4x parallel against the same database, matching the
cores available in the machine.

We also share the same database between developers and the test suite.
This "dirty" environment can work like a feature, as it can sometimes
produce unexpected and "interesting" states that would be missed by a
clean-room testing approach that controls the environment so carefully
that some real-world possibilities never come up.

For example, perhaps a column allows null values, but the test suite
never tests that case because it "shouldn't happen". A developer might
manually create a value, which could expose a problem spot-- perhaps the
field should be "not null", or the app should handle that case gracefully.

A perfect clean-room approach would cover all these cases, but I don't
assume our tests are perfect.

Mark
Ovid
2013-05-03 20:34:35 UTC
Permalink
Post by Mark Stosberg
Post by David Cantrell
OK, but you still have to clean out your database before you start each
independent chunk of your test suite, otherwise you start from an
unknown state.
1. Insert entity.
2. Test with entity just inserted.
Since all my test cares about is the unique entity or entities it created,
the state of the rest of the database doesn't matter. The state that
matters is in a "known state".
For many of the test suites I've worked on, the business rules are complex enough that this is a complete non-starter. I *must* have a database in a known-good state at the start of every test run.

    is $customer_table->count, 2, "We should find the correct number of records";
Post by Mark Stosberg
We have a cron job that runs overnight to clean up anything that was
missed in Jenkin's runs.
No offense, but that scares me. If this strategy was so successful, why do you even need to clean anything up? You can accumulate cruft forever, right?

For example, I might want to randomize the order in which I run my tests (theoretically, the order in which you run separate test cases SHOULD NOT MATTER), but if I don't have a clean environment, I can't know if a passing test is accidentally relying on something a previous test case created. This often manifests when a test suite passes but an individual test program fails (and vice versa). That's a big no-no. (Note that I distinguish between a test case and a test: a test case might insert some data, test it, insert more data, test the altered data, and so on. There are no guarantees in that scenario if I have a dirty database of unknown state).
Post by Mark Stosberg
We expect our tests to generally work in the face of a "dirty" database.
If they don't, that's considered a flaw in the test. 
Which implies that you might be unknowingly relying on something a previous test did, a problem I've repeatedly encountered in poorly designed test suites.
Post by Mark Stosberg
This is important
because we run several tests against the same database at the same time.
Even if we wiped the database before we tested, all the other tests running
in parallel would be making the database "dirty" again. Thus, if a
pristine database were a requirement, only one test could run against the
database at a time.
There are multiple strategies people use to get around this limitation, but this is the first time I've ever heard of anyone suggesting that a dirty test database is desirable.
Post by Mark Stosberg
We run our tests 4x parallel against the same database, matching the
cores available in the machine.
You run your tests against a different test database per pid.

Or you run them against multiple remote databases with TAP::Harness::Remote or TAP::Harness::Remote::EC2.

Or you run them single-threaded in a single process instead of multiple processes.

Or maybe profiling exposes issues that weren't previously apparent.

Or you fall back on a truncating strategy instead of rebuilding (http://www.slideshare.net/Ovid/turbo-charged-test-suites-presentation). That's often a lot faster.

There are so many ways of attacking this problem which don't involve trying to debug an unknown, non-deterministic state.
Post by Mark Stosberg
We also share the same database between developers and the test suite.
This "dirty" environment can work like a feature, as it can sometimes
produce unexpected and "interesting" states that were missed by a
clean-room testing approach that so carefully controlled the environment
that some real-world possibilities.
I've been there in one of my first attempts at writing tests about a decade ago. I got very tired of testing that I successfully altered the state of the database only to find out that another developer was running the test suite at the same time and also altered the state of the database and both of us tried to figure out why our tests were randomly failing.

I'll be honest, I've been doing testing for a long, long time and this is the first time that I can recall anyone arguing for an approach like this. I'm not saying you're wrong, but you'll have to do a lot of work to convince people that starting out with an effectively random environment is a good way to test code.

Cheers,
Ovid
--
Twitter - http://twitter.com/OvidPerl/
Buy my book - http://bit.ly/beginning_perl
Buy my other book - http://www.oreilly.com/catalog/perlhks/
Live and work overseas - http://www.overseas-exile.com/
chromatic
2013-05-03 20:40:24 UTC
Permalink
... you'll have to do a lot of work to convince people that starting out
with an effectively random environment is a good way to test code.
Before you dismiss the idea entirely, consider that our real live code running
for real live clients runs in an effectively random environment.

This reminds me of the (false) debate over using mock objects to enforce some
standard of unit testing purity. It's interesting if individual pieces of code
adhere to expected behavior in isolation, but when the code has to work, it's
not doing so in isolation.

Isolation of individual test cases is, of course, essential to
parallelization--but I'm trying to maximize the return on confidence for
testing investment in what's inherently a stochastic process, so pursuing
isolation between individual test units is rarely worth it, in my experience.
Writing the right tests which test the right things is.

-- c
Buddy Burden
2013-05-04 20:37:38 UTC
Permalink
Ovid,

I lean more towards Mark's approach on this one, albeit with a slight twist.
Post by Ovid
For many of the test suites I've worked on, the business rules are
complex enough that this is a complete non-starter. I *must* have a
database in a known-good state at the start of every test run.
Post by Ovid
is $customer_table->count, 2, "We should find the correct number of
records";

I've just never written a test like that. I can't think that I've ever
needed to, really ... although I think I vaguely remember writing something
which used $^T to guarantee I was only pulling records that I'd added, so
perhaps that's close. But that was only once.

We have several databases, but unit tests definitely don't have their own.
Typically unit tests run either against the dev database, or the QA
database. Primarily, they run against whichever database the current
developer has their config pointed to. This has to be the case, since
sometimes we make modifications to the schema. If the unit tests all ran
against their own database, then my unit tests for my new feature involving
the schema change would necessarily fail. Or, contrariwise, if I make the
schema modification on the unit test database, then every other dev's unit
tests would fail. I suppose if we were using MySQL, it might be feasible
to create a new database on the fly for every unit test run. When you're
stuck with Oracle though ... not so much. :-/

So all our unit tests just connect to whatever database you're currently
pointed at, and they all create their own data, and they all roll it back
at the end. In fact, our common test module (which is based on Test::Most)
does the rollback for you. In fact in fact, it won't allow you to commit.
So there's never anything to clean up.
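A rough sketch of that kind of shared test module (purely illustrative, not
our actual module; a stricter version would also override commit() to
refuse):

    package My::Test::DB;    # illustrative name
    use strict;
    use warnings;
    use DBI;

    my $dbh;

    # Every test gets the same handle with AutoCommit off, so all of its
    # inserts and updates live inside one never-committed transaction.
    sub test_dbh {
        $dbh //= DBI->connect(
            'dbi:Pg:dbname=devdb', '', '',
            { RaiseError => 1, AutoCommit => 0 },
        );
        return $dbh;
    }

    # Roll everything back when the test process exits; nothing to clean up.
    END { $dbh->rollback if $dbh }

    1;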

AFA leaving the data around for debugging purposes, we've never needed
that. The common test module exports a "DBdump" function that will dump
out whatever records you need. If you run into a problem with the data and
you need to see what the data is, you stick a DBdump in there. When you're
finished debugging, you either comment it out, or (better yet) just change
it from `diag DBdump` to `note DBdump` and that way you can get the dump
back any time just by adding -v to your prove.

AFAIK the only time anyone's ever asked me to make it possible for the data
to hang around afterwards was when the QA department was toying with the
idea of using the common test module to create test data for their manual
testing scenarios, but they eventually found another way around that.
Certainly no one's ever asked me to do so for a unit test. If they did,
there's a way to commit if you really really want to--I just don't tell
anyone what it is. ;->

Our data generation routines generate randomized data for things that have
to be unique (e.g. email addresses) using modules such as String::Random.
In the unlikely event that it gets a collision, it just retries a few
times. If a completely randomly generated string isn't unique after, say,
10 tries, you've probably got a bigger problem anyway. Once it's inserted,
we pull it back out again using whatever unique key we generated, so we
don't ever have a need to count records or anything like that. Perhaps
count the number of records _attached_ to a record we inserted previously
in the test, but that obviously isn't impacted by having extra data in the
table.
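The generate-and-retry idea, in sketch form ($dbh and the users table are
assumptions):

    use strict;
    use warnings;
    use String::Random;

    sub unique_test_email {
        my ($dbh) = @_;
        my $rand = String::Random->new;
        for (1 .. 10) {
            my $email = $rand->randregex('[a-z0-9]{12}') . '@example.com';
            my ($taken) = $dbh->selectrow_array(
                'SELECT count(*) FROM users WHERE email = ?', undef, $email,
            );
            return $email unless $taken;
        }
        # Ten collisions in a row on a 12-character random string means
        # something else is badly wrong.
        die "could not generate a unique test email address\n";
    }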

Unlike Mark, I won't say we _count_ on the random data being in the DB; we
just don't mind it. We only ever look at the data we just inserted. And,
since all unit test data is in a transaction (whether ours or someone
else's who happens to be running a unit test at the same time), the unit
tests can't conflict with each other, or with themselves (i.e. we do use
parallelization for all our unit tests). The only problems we ever see
with this approach are:

* The performance on the unit tests can be bad if lots and lots of things
are hitting the same tables at the same time.
* If the inserts or updates aren't judicious with their locking, some tests
can lock other tests out from accessing the table they want.

And the cool thing there is, both of those issues expose problems in the
implementation that need fixing anyway: scalability problems and potential
DB contention issues. So forcing people to fix those in order to make
their unit tests run smoothly is a net gain.

Anyways, just wanted to throw in yet another perspective.


-- Buddy
yary
2013-05-05 11:07:08 UTC
Permalink
I've done some heavy DB work/testing and like your idea of simply
turning off autocommit, rolling back for all the database tests. It's
not what we did- we just truncated all the test tables to start from a
good state, and the only parallel testing we did were specifically
designed concurrency tests. Not saying that's what I'd do again, just
for a data point.

Anyway, while rolling back after DB unit tests seems like a good
solution in general, and is something I would try in the future, it
can't work in all situations- specifically when you want to test
things that rely on transactions. Part of our
designed-to-be-concurrent tests had to do with subroutines that
required a lock so changes to 2-3 tables would be either all done, or
all rolled back on failure. Another needed an exclusive-select-row
lock. Those needed to commit so we could both see the data succeed and
so we could see the concurrent threads continue (or report "could not
get lock" properly, etc)

-y
Post by Buddy Burden
Ovid,
I lean more towards Mark's approach on this one, albeit with a slight twist.
...
Ovid
2013-05-06 06:03:49 UTC
Permalink
________________________________
Ovid,
I lean more towards Mark's approach on this one, albeit with a slight twist.
Given the very interesting discussion and the fact that I hate dogma (such as what I was throwing down), I have to say that I'm going to rethink my position on this. Thanks for all of the informative comments, everyone!

Cheers,
Ovid
--
Twitter - http://twitter.com/OvidPerl/
Buy my book - http://bit.ly/beginning_perl
Buy my other book - http://www.oreilly.com/catalog/perlhks/
Live and work overseas - http://www.overseas-exile.com/
David Cantrell
2013-05-07 14:17:17 UTC
Permalink
Post by Ovid
Post by Buddy Burden
I lean more towards Mark's approach on this one, albeit with a slight twist.
Given the very interesting discussion and the fact that I hate dogma (such as what I was throwing down), I have to say that I'm going to rethink my position on this. Thanks for all of the informative comments, everyone!
I lean very much towards starting from a known state. I don't care what
that state is, just that it's known. This is because I see the tests I
write as being primarily a debugging tool, not merely a way of proving
that what I've done is correct - we all know that tests can't do that,
right? Debugging is easiest if you don't have to sift through loads of
irrelevant noise to find the problem.

It's a good point that the live environment is unknown, and potentially
full of broken old rubbish from previous versions of the software. That
is why we do two things:

* first, when a bug gets reported in live, I like to create a test case
from it, using data that at least replicates the structure of what is
live. This will, necessarily, be an end-to-end test of the
whole application, from user interaction through all the layers that
make up the application down to the database, and back up through all
those layers again to produce broken output. As I debug and fix I may
also create unit tests for individual bits of the application.
* second, we don't just deploy from dev to live. We have a staging
environment in which real people hammer on the application. Ideally,
they'll record what they do and create repeatable tests completely
independently of the developers.
--
David Cantrell | semi-evolved ape-thing

You don't need to spam good porn
Buddy Burden
2013-05-07 18:05:52 UTC
Permalink
David,
Post by David Cantrell
* first, when a bug gets reported in live, I like to create a test case
from it, using data that at least replicates the structure of what is
live. This will, necessarily, be an end-to-end test of the
whole application, from user interaction through all the layers that
make up the application down to the database, and back up through all
those layers again to produce broken output. As I debug and fix I may
also create unit tests for individual bits of the application.
See, I would approach that differently. I would start with creating a test
for just the bit of data that I suspected was causing the problem. If that
worked (or didn't work, technically--if that reproduced the bug, I mean),
fine, I'm done. If that _doesn't_ work, though, I have to add more and
more bits of data specific to the user's exact situation ... which then
shows me how much that extra data is contributing to the problem vs how
much is irrelevant. By the time I'm done, I have exactly the data I need
to reproduce, which makes it _much_ easier to figure out what's wrong.

But then, on our team, we only produce unit tests. Integration tests are
produced by QA, and they might do it more as you suggest.

Of course, they don't get to start with an empty database either. It might
be _nice_ to start with a blank slate every time, I'm not sure--I'd worry
that I'd constantly miss scalability issues that are currently outed fairly
early with our current process--but the point is, I don't have that luxury,
whether I want to or not. ;->

(OTOH, I'll soon be moving to a new job where they don't even _have_ a QA
department, so I'll have to rethink the whole process. :-D )


-- Buddy
Lasse Makholm
2013-05-06 10:15:23 UTC
Permalink
Post by Buddy Burden
We have several databases, but unit tests definitely don't have their
own. Typically unit tests run either against the dev database, or the QA
database. Primarily, they run against whichever database the current
developer has their config pointed to. This has to be the case, since
sometimes we make modifications to the schema. If the unit tests all ran
against their own database, then my unit tests for my new feature involving
the schema change would necessarily fail. Or, contrariwise, if I make the
schema modification on the unit test database, then every other dev's unit
tests would fail. I suppose if we were using MySQL, it might be feasible
to create a new database on the fly for every unit test run. When you're
stuck with Oracle though ... not so much. :-/
Interesting... Developers in our project have a local copy of the
production database for working with but our unit test runs always create a
database from scratch and run all schema migrations on it before running
the tests. Creating and migrating the unit test DB usually takes between 10
and 30 seconds so setup time is not really an issue... We're currently on
MySQL but will be migrating to Oracle in the near future. Could you
elaborate on why this approach might not be viable on Oracle?

As to why we do this - I guess it's mainly history... We've only recently
cleaned up our tests to not rely on each other so we're only now getting to
a point where we can start running them in random order - let alone in
parallel... I guess the upsides of starting from a clean database are
mainly matters of convenience; single-digit IDs are easier to read
ten-digits ones and debugging failures is easier on a table with 10 rows
instead of 10 million. The flip-side is of course, as previously mentioned
is that production code is expected to work in a "dirty" rather than
"clean" environment...

Your points about parallelization and using it to flush out
locking/contention issues are interesting and something that we haven't
really explored in our test setup but something we could certainly benefit
from... (Having had our fair share of those issues in the past...)

/L
Post by Buddy Burden
So all our unit tests just connect to whatever database you're currently
pointed at, and they all create their own data, and they all roll it back
at the end. In fact, our common test module (which is based on Test::Most)
does the rollback for you. In fact in fact, it won't allow you to commit.
So there's never anything to clean up.
AFA leaving the data around for debugging purposes, we've never needed
that. The common test module exports a "DBdump" function that will dump
out whatever records you need. If you run into a problem with the data and
you need to see what the data is, you stick a DBdump in there. When you're
finished debugging, you either comment it out, or (better yet) just change
it from `diag DBdump` to `note DBdump` and that way you can get the dump
back any time just by adding -v to your prove.
AFAIK the only time anyone's ever asked me to make it possible for the
data to hang around afterwards was when the QA department was toying with
the idea of using the common test module to create test data for their
manual testing scenarios, but they eventually found another way around
that. Certainly no one's ever asked me to do so for a unit test. If they
did, there's a way to commit if you really really want to--I just don't
tell anyone what it is. ;->
Our data generation routines generate randomized data for things that have
to be unique (e.g. email addresses) using modules such as String::Random.
In the unlikely event that it gets a collision, it just retries a few
times. If a completely randomly generated string isn't unique after, say,
10 tries, you've probably got a bigger problem anyway. Once it's inserted,
we pull it back out again using whatever unique key we generated, so we
don't ever have a need to count records or anything like that. Perhaps
count the number of records _attached_ to a record we inserted previously
in the test, but that obviously isn't impacted by having extra data in the
table.
Unlike Mark, I won't say we _count_ on the random data being in the DB; we
just don't mind it. We only ever look at the data we just inserted. And,
since all unit test data is in a transaction (whether ours or someone
else's who happens to be running a unit test at the same time), the unit
tests can't conflict with each other, or with themselves (i.e. we do use
parallelization for all our unit tests). The only problems we ever see
* The performance on the unit tests can be bad if lots and lots of things
are hitting the same tables at the same time.
* If the inserts or updates aren't judicious with their locking, some
tests can lock other tests out from accessing the table they want.
And the cool thing there is, both of those issues expose problems in the
implementation that need fixing anyway: scalability problems and potential
DB contention issues. So forcing people to fix those in order to make
their unit tests run smoothly is a net gain.
Anyways, just wanted to throw in yet another perspective.
-- Buddy
Buddy Burden
2013-05-06 20:37:31 UTC
Permalink
Lasse,
Post by Lasse Makholm
Interesting... Developers in our project have a local copy of the
production database for working with but our unit test runs always create a
database from scratch and run all schema migrations on it before running
the tests. Creating and migrating the unit test DB usually takes between 10
and 30 seconds so setup time is not really an issue... We're currently on
MySQL but will be migrating to Oracle in the near future. Could you
elaborate on why this approach might not be viable on Oracle?

First of all: my condolences. :-D

Creating a "database" on MySQL is essentially the same as creating a
directory. Tables are files, so a collection of tables (a database) is
just a collection of files (i.e. a directory). However, in other RDBMSes
(such as Sybase or Informix--I lack enough experience with Postgres to
comment there) a database is a more complex creature, and takes a bit more
effort to create.

The situation in Oracle is much worse. What Oracle calls a "database" is
actually a server. That is, on Sybase or Informix (also I believe DB2
works this way), you can have several different databases on a single
server. Not so in Oracle. You need a whole new server for a new database,
which means a new port, and new entry in oranames.tns, etc etc etc. It's
quite a PITA. I've never seen a dev create a database in Oracle--it's
_always_ been something that the DBAs do. Maybe there's a way to do it,
particularly if you have Oracle's "personal" version, but I've just never
seen it done.

Other remarkably painful things to look forward to: Say goodbye to
autoincrement columns, and hello to "sequences." You know how you think
that NULL and the empty string are two different things? HA! No more temp
tables--Oracle has "global" (i.e. permanent) temp tables instead, which are
often useless for how you normally use temp tables. Probably there are
more things, but those are the most egregious that spring to mind ATM.


-- Buddy
Mark Stosberg
2013-05-29 14:00:48 UTC
Permalink
Post by Ovid
Post by Mark Stosberg
We have a cron job that runs overnight to clean up anything that was
missed in Jenkin's runs.
No offense, but that scares me. If this strategy was so successful,
why do you even need to clean anything up? You can accumulate cruft
forever, right?
Ha. Like any database, smaller ones perform better.
Post by Ovid
Post by Mark Stosberg
We expect our tests to generally work in the face of a "dirty"
database. If they don't, that's considered a flaw in the test.
Which implies that you might be unknowingly relying on something a
previous test did, a problem I've repeatedly encountered in poorly
designed test suites.
I just ran across a ticket now where our design was helpful, so I
thought I would share it. The story goes like this:

I was tracking down a test that sometimes failed. I found that the test
was expecting a "no results" state on the page, but sometimes other test
data created a "some results" state. The test failed at this point
because there was actually a bug in the code. This bug was not found by
the site's users, our client, our development team, or directly by the
automated test itself, but only because the environment it ran in
had some randomness in it.

Often I do intentionally create isolation among my tests, and sometimes
we have cases like this, which add value for us.
Post by Ovid
Your tests run against a different test database per pid.
Or you run them against multiple remote databases with
TAP::Harness::Remote or TAP::Harness::Remote::EC2.
Or you run them single-threaded in a single process instead of
multiple processes.
Or maybe profiling exposes issues that weren't previously apparent.
Or you fall back on a truncating strategy instead of rebuilding
(http://www.slideshare.net/Ovid/turbo-charged-test-suites-presentation).
That's often a lot faster.
There are so many ways of attacking this problem which don't involve
trying to debug an unknown, non-deterministic state.
And if I'm running 4 test files in parallel, would you expect me to be
setting up and tearing down a database with 50 tables and a significant
amount of data for each and every test file, so they don't interfere with
each other?

That seems rather wasteful, when we already have easy-to-implement
solutions to allow multiple tests to share the same database while still
achieving isolation between them when needed.
Post by Ovid
I'll be honest, I've been doing testing for a long, long time and this
is the first time that I can recall anyone arguing for an approach
like this. I'm not saying you're wrong, but you'll have to do a lot of
work to convince people that starting out with an effectively random
environment is a good way to test code.
I've been doing Perl testing for about 15 years myself. Many of our
tests have isolation designed in, because there's value in that.

There's also value in running some tests against ever-changing
datasets, which is more like the kind of data that actually occurs in
production.

Mark

Mark Stosberg
2013-05-02 19:51:11 UTC
Permalink
Post by brian d foy
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?
It seems to me that by the time the tests are running, it's too late
because they are already in parallel and the best we can do is issue a
warning or decide to fail.
I spent considerable time researching the topic of partially-parallel
test suites in Perl. Some of that research is published here:

http://mark.stosberg.com/blog/2012/08/running-perl-tests-in-parallel-with-prove-with-some-exceptions-2.html
http://stackoverflow.com/questions/11977015/how-to-run-some-but-not-all-tests-in-a-perl-test-suite-in-parallel

In the end, the most efficient path forward for my particular case was
to look harder at the exceptional cases, and find ways to make them
parallel-safe.

Also, higher level automation like Jenkins or a wrapper script could be
used to first run all the tests that can be run in parallel, and then
run serial-only tests.
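Such a wrapper might look roughly like this (the t/parallel and t/serial
layout is an assumption, not a prove convention):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use App::Prove;

    sub run_prove {
        my @args  = @_;
        my $prove = App::Prove->new;
        $prove->process_args(@args);
        $prove->run or die "test run failed: @args\n";
    }

    run_prove('-j4', '-r', 't/parallel');   # everything that is parallel-safe
    run_prove('-r',  't/serial');           # serial-only tests, one at a time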

I have hope for better support within prove in the future, but
pragmatically for now I've found it easier to address this through other parts
of the toolchain.

Mark
Ovid
2013-05-03 05:33:06 UTC
Permalink
On a side note, this is one of the reasons I want to add tags to Test::Class::Moose (http://blogs.perl.org/users/ovid/2013/04/adding-tags-to-testclassmoose.html). With that, marking a test as "parallelizable" becomes a trivial problem. This is, in part, because Test::Class::Moose is evolving into a testing platform for those who need more reporting and fine-grained control over their test suite, something which I think has currently been lacking in the Perl toolchain.

Currently, marking things as "parallelizable" could already be done using the include and exclude features, but that's based on the method names and isn't really scalable.
 
Cheers,
Ovid
--
Twitter - http://twitter.com/OvidPerl/
Buy my book - http://bit.ly/beginning_perl
Buy my other book - http://www.oreilly.com/catalog/perlhks/
Live and work overseas - http://www.overseas-exile.com/
________________________________
Sent: Thursday, 2 May 2013, 21:39
Subject: How might we mark a test suite that isn't parallelizable?
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?
It seems to me that by the time the tests are running, it's too late
because they are already in parallel and the best we can do is issue a
warning or decide to fail.
Olivier Mengué
2013-05-03 09:02:16 UTC
Permalink
Post by brian d foy
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?
It seems to me that by the time the tests are running, it's too late
because they are already in parallel and the best we can do is issue a
warning or decide to fail.
Note that Test::Synchronized may be a solution to your problem.

For a more general solution, a simple way to do it would be to let the
harness know, by their file path, that some tests must not be run in
parallel. Something like storing those tests in a particular place such as
"t/non-parallel/" that the harness would exclude from parallel testing and
would run sequentially after all the parallel tests.
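One way a harness could honour such a layout today is the TAP::Harness
"rules" feature (a sketch; the paths are illustrative, and with these rules
the non-parallel tests avoid running in parallel with each other rather
than being held until the very end):

    use strict;
    use warnings;
    use TAP::Harness;

    my $harness = TAP::Harness->new({
        jobs  => 4,
        rules => {
            par => [
                { seq => 't/non-parallel/*.t' },  # these never run in parallel with each other
                '*',                              # everything else may run in parallel
            ],
        },
    });

    $harness->runtests( glob('t/*.t'), glob('t/non-parallel/*.t') );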

A related question is: do we want the tests to be explicitly aware at
runtime that some other tests may be running in parallel (or, conversely,
that they are running in an exclusive mode)? That would give more
information to Test::Is/Test::DescribeMe, but I don't think that would be a
good idea. Non-parallel tests must stay the exception.

Olivier.
Leon Timmermans
2013-05-03 13:35:07 UTC
Permalink
Post by brian d foy
In HARNESS_OPTIONS we can set -jN to note we want parallel tests
running, but how can a particular module, which might be buried in the
dependency chain, tell the harness it can't do that?
It seems to me that by the time the tests are running, it's too late
because they are already in parallel and the best we can do is issue a
warning or decide to fail.
TAP::Harness already allows a module author to indicate which tests
can be run together and which not, using its rules feature. Not that
this feature is easily used from current build tools…

Leon