2018-11-24 19:33 — By Erik van Eykelen
Test data should be generated by code instead of relying on copies of production data.
As a developer or tester, it is important to work with a representative dataset in the apps you’re working on. Having no dataset, or only a very small one, makes it hard to test different scenarios and corner cases. A very large (production) database, on the other hand, may slow things down, and because its content is constantly changing, it is harder to quickly navigate to the data or screens you need to test.
Working with production data should be avoided because:
A script which generates test data should:
It should be easy to populate development, staging, or review app databases with generated data.
In the following example, 31 tables are populated with 4887 records, some random and some predictable:
```text
~/projects/some-rails-app>rails db:seed
Added 4 funds
Added 63 users
Added 11 companies
Added 11 company users
Added 21 ventures
Added 42 venture users
Added 84 educations
Added 126 experiences
Added 294 expertises
Added 168 achievements
Added 21 locations
Added 8 rounds
Added 1 deal types
Added 84 deals
Added 252 deal events
Added 441 watchlists
Added 441 early accesses
Added 63 participations
Added 252 documents
Added 189 assigned documents
Added 2 assets
Added 20 news items
Added 21 decks
Added 483 deck parts
Added 63 indicators
Added 588 indicator points
Added 84 deck bookmarks
Added 84 deck attachments
Added 63 messages
Added 21 venture updates
Added 882 notifications
```
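One way to produce a report like the one above is to wrap each group of inserts in a small helper that prints a count. The sketch below is an assumption, not the script that produced this output: `seed` is a hypothetical helper name, and plain hashes stand in for ActiveRecord models so the example runs without a database.

```ruby
# Hypothetical helper (not from the original app): runs a block that
# creates records, prints how many were added, and returns the records
# so later steps can reference them. The block must return an array
# (or an ActiveRecord relation), since only #size is called on it.
def seed(label)
  records = yield
  puts "Added #{records.size} #{label}"
  records
end

# Usage sketch with plain hashes standing in for model instances.
funds = seed("funds") do
  %w[Alpha Beta Gamma Delta].map { |name| { name: "#{name} Fund" } }
end
```

In a Rails `db/seeds.rb`, the block would instead return the records created with `create!`. Returning them lets later groups (company users, for example) link to records created earlier.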
It should be easy to edit existing or add new test data by using a “fake data” generator:
```ruby
# Create 20 users for each user type, using random names from Mockdata.
[User::USER_TYPE_INVESTOR, User::USER_TYPE_FOUNDER, User::USER_TYPE_ADMIN].each_with_index do |user_type, idx1|
  1.upto(20) do |idx2|
    first_name = Mockdata::People.first_name
    last_name = Mockdata::People.last_name
    User.create!(
      user_type: user_type,
      # The indexes keep each address unique, even when a name pair repeats.
      email_address: "#{first_name}.#{last_name}-#{idx1}-#{idx2}@example.com".downcase,
      phone_number: "+31 646 000 000",
      first_name: first_name,
      last_name: last_name,
      password: "11223344",
      gender: [User::GENDER_MALE, User::GENDER_FEMALE, User::GENDER_OTHER].sample
    )
  end
end
```
See https://github.com/evaneykelen/mockdata for an example of a Ruby library which provides fake names of people, companies, and projects. There are similar libraries for C# and other languages.
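If you would rather not add a dependency, a hand-rolled generator can play the same role for a handful of fields. The sketch below is hypothetical: `FakePeople`, its name lists, and `fake_user_attributes` are illustrative names, not the mockdata API. The function returns attribute hashes instead of calling `User.create!`, so it runs without Rails.

```ruby
# Hypothetical stand-in for a fake-data library: a few hard-coded names
# plus an injectable random number generator.
module FakePeople
  FIRST_NAMES = %w[Anna Bas Carla Daan Eva Floor].freeze
  LAST_NAMES  = %w[Jansen Bakker Visser Smit Meijer Mulder].freeze

  def self.first_name(rng: Random.new)
    FIRST_NAMES.sample(random: rng)
  end

  def self.last_name(rng: Random.new)
    LAST_NAMES.sample(random: rng)
  end
end

# Builds attribute hashes shaped like the User.create! call above,
# without touching a database. User types are plain strings here.
def fake_user_attributes(user_types, per_type, rng: Random.new)
  user_types.each_with_index.flat_map do |user_type, idx1|
    (1..per_type).map do |idx2|
      first = FakePeople.first_name(rng: rng)
      last = FakePeople.last_name(rng: rng)
      {
        user_type: user_type,
        # The indexes keep addresses unique even when names repeat.
        email_address: "#{first}.#{last}-#{idx1}-#{idx2}@example.com".downcase,
        first_name: first,
        last_name: last
      }
    end
  end
end

attrs = fake_user_attributes(%w[investor founder admin], 20, rng: Random.new(42))
```

Passing a seeded `Random` produces the same names on every run. This helps when a bug report refers to a specific test user.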
Running your test data script should generate actual database records:
```text
> ap User.first
User Load (0.9ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1]]
#<User:0x00007fd7b5b648d0> {
    :id => "0287ee7e-cae1-48d7-b093-64fd7840df46",
    :user_type => "investor",
    :gender => "male",
    :email_address => "investor@example.com",
    :phone_number => "+31646000000",
    :prefix => nil,
    :first_name => "Paul",
    :infix => nil,
    :last_name => "Graham",
    :postfix => nil,
    :website_url => nil,
    :blog_url => nil,
    :twitter_url => nil,
    :facebook_url => nil,
    :instagram_url => nil,
    :linkedin_url => nil,
    :password_digest => "$2a$10$nIWqjnHVTNaVuHixwiLCcOsA/GG54jkgW2TfBsC3wCaOfQdd5C3JW",
    :timezone => "Amsterdam",
    :signatory => false,
    :asset_id => nil,
    :last_viewed_dashboard_at => nil,
    :last_invited_at => Fri, 02 Nov 2018 12:00:35 UTC +00:00,
    :svg_path_paraph => nil,
    :svg_path_signature => nil,
    :created_at => Fri, 02 Nov 2018 12:00:20 UTC +00:00,
    :updated_at => Fri, 02 Nov 2018 12:00:35 UTC +00:00,
    :brand => "uplane"
}
```
The code and output snippets above are merely examples. Test data can be generated in any programming language, and the generator's interface can be a CLI or a GUI.
Tip: ensure the test data script cannot run against production databases by including a circuit breaker that detects, for example, an environment variable or some other signal that is only present in production environments.
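A circuit breaker can be as small as a check at the top of the seed script. This is a minimal sketch that rests on two assumptions: `RAILS_ENV=production` counts as a production signal, and so does the presence of a hypothetical `PRODUCTION_MARKER` variable. `ensure_not_production!` and `ProductionDatabaseError` are illustrative names.

```ruby
# Raised when the seed script detects a production environment.
class ProductionDatabaseError < StandardError; end

# Hypothetical circuit breaker. RAILS_ENV is the standard Rails variable;
# PRODUCTION_MARKER is an assumed variable set only on production hosts.
# The environment is a parameter (defaulting to ENV) so the check is testable.
def ensure_not_production!(env = ENV)
  if env["RAILS_ENV"] == "production" || env.key?("PRODUCTION_MARKER")
    raise ProductionDatabaseError, "Refusing to generate test data in production"
  end
  true
end

# At the top of the seed script, before any record is created:
# ensure_not_production!
```

Because the check raises instead of merely logging a warning, the script stops before it writes a single record.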
Tip: although generated test data is mostly random, it is wise to limit the randomness to achieve a level of predictability. For instance, it’s good to pick a random city name from a set of just 10 predefined names, so that testers always know which names they can enter in search fields.
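Limiting randomness can be as simple as sampling from a short, fixed list. In the sketch below, the ten city names and `random_city` are illustrative. Passing a seeded `Random` is an optional extra that also makes the picks repeatable between runs.

```ruby
# A fixed, small pool of city names that testers can memorize (example values).
CITIES = %w[Amsterdam Rotterdam Utrecht Groningen Eindhoven
            Tilburg Almere Breda Nijmegen Haarlem].freeze

# Picks one city from the pool; a seeded Random makes the sequence repeatable.
def random_city(rng = Random.new)
  CITIES.sample(random: rng)
end

rng = Random.new(1234)
cities = Array.new(5) { random_city(rng) }
puts cities.inspect
```

Testers only need to remember ten names to find results in a search field, while the data still varies from record to record.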