Skip to content

jiananlu/faked_csv

Repository files navigation

Build Status

Faking CSV data made easy

faked_csv, using the awesome Faker gem, helps you to generate CSV file with fake random data in your specified way. You may find it particularly useful in testing something out.

Installation

gem install faked_csv

Test your installation by running the following command in your terminal/console:

faked_csv --version

You can also install the gem using bundler.

The gem is developed and tested under Ruby 2.1.

Usage

For each faked CSV file, you first create a <name>.csv.json file that contains the configuration of how you want to generate the faked data, as hinted by the name, in json format.

Example

A simple example .csv.json file looks like the following. Noted, JSON file doesn't support inline comment, so the below example is only for illustration. You can use the example.csv.json file in the root folder of the source code as a boilerplate. More details on the configuration file are in later section.

{
  "rows": 200, <- how many rows in the generated CSV
  "fields": [  <- a list of fields/columns in the generated CSV
    {
      "name": "ID",        <- field/column name
      "type": "rand:char", <- type/format of the field
      "length": 5          <- 5 characters random word
    },
    {
      "name": "First Name",
      "type": "fixed",     <- values will only come from the below list
      "values": ["Peter", "Tom", "Jane", "Tony", "Steve", "John"]
    },
    {
      "name": "Last Name",
      "type": "faker:name:last_name", <- via Faker::Name.last_name method
      "inject": ["Lu", "Smith"]       <- "must-have" values
    },
    {
      "name": "City",
      "type": "faker:address:city", <- via Faker::Address.city method
      "rotate": "rows/5"            <- rows/5 (=>200/5 =>40) unique values
                                       (and only these values) will appear
    },
    {
      "name": "State",
      "type": "faker:address:state_abbr",
      "rotate": 10,    <- both rotate and inject? will make sure 10 unique 
                          values (including "CA") appear
      "inject": ["CA"]
    },
    {
      "name": "Age",
      "type": "rand:int",
      "range": [10, 80] <- range (including boundaries) for random integers
    },
    {
      "name": "Height",
      "type": "rand:float",
      "range": [150, 190],  <- range for random floats
      "inject": [200, 210], <- "must-have" values. can be any format
      "rotate": 20,         <- 20 unique values (including injects)
      "precision": 2        <- precision of the float numbers
    }
  ]
}

With the <name>.csv.json file, now you can generate a random CSV by invoking the following command:

faked_csv -i example.csv.json -o example.csv

The output CSV looks something like:

ID,First Name,Last Name,City,State,Age,Height
iW00K,Jane,Bosco,Lake Dandreland,ME,61,170.57
BzxTl,Steve,Smith,Uniqueside,PA,35,152.93
vobj8,Tony,Auer,Tressiestad,MS,63,160.86
X78RS,Steve,Cole,Port Zellachester,OH,72,159.84
0YpWG,John,Lu,East Vernaview,OH,18,200
4JaoZ,Peter,Simonis,Wernerchester,HI,25,161.64
hP48C,Tom,Lu,Uniqueside,ME,45,161.64
WDQpb,John,Casper,East Felicityshire,OH,41,170.57
wyPwB,Jane,Johns,Maggiehaven,CA,16,180.42
VCtYR,Peter,Jast,Schadenberg,ME,41,161.64
1VYOs,John,Daniel,Port Zellachester,HI,34,200
V7pg9,Tom,Mayert,Schadenberg,OH,71,152.93
SmE9w,Jane,Lu,Stephanchester,HI,25,176.95
ALyok,Tom,Smith,Ryanchester,PA,70,176.95
fgE5v,Peter,Bailey,Bednarstad,PA,67,170.24
eBIIM,Peter,Haley,East Vernaview,MS,65,161.64
5sv4L,Peter,Prosacco,Uniqueside,PA,50,178.25
a48IS,Peter,Marvin,Kossview,OH,63,200
9mLcZ,John,VonRueden,East Vernaview,PA,20,174.24
N8ysZ,Tony,Barrows,Wernerchester,HI,62,159.84
PxsfA,John,Lind,East Vernaview,OH,21,182.45
lKM33,Steve,Bosco,South Reese,OH,75,176.95
HR9H5,Jane,Torp,Tressiestad,PA,23,170.57
JTRCw,Steve,Hermann,Kunzefort,MS,43,152.93
Wndzb,Tony,McGlynn,Lake Dandreland,CA,52,160.86
ksmvF,Peter,Rutherford,Josiannetown,PA,12,184.08
cfsBG,Peter,Lebsack,Port Zellachester,OH,43,182.45
7ISEJ,Steve,Altenwerth,Stephanchester,HI,13,159.84
Letb8,John,Frami,South Reese,HI,65,170.57
......

CLI Options

-i to specify the input json configuration file. If omitted, the program will try to find ./faked.csv.json by default. You can also specify a remote URL of the configuration file.

faked_csv -i http://goo.gl/xDtkJs -o remote.csv

-o to specify the output CSV file path. If omitted, the program will print the data to stdout.

More details on configuration file

rows to specify how many rows you want to generate in the output CSV file.

fields to specify the list of all fields of the generated CSV file.

{
    "rows": 1000,
    "fields": [
        ...
    ]
}

For each field, two attributes are required:

name to specify the column display name.

type to sepcify the format of the data in this column.

Optional attributes help finer control the generated output.

inject to specify a list of values that must appear in this column. this is useful when you want to specifically inject some special values into your final CSV. The inject values can be in the same or different format as the type of this column.

rotate to specify the number of unique values that will be used in this column. This number will include the number of items you put into inject list.

"type": "faker:address:state_abbr",
"inject": ["CA", "HI"],
"rotate": 10

In the above example, totally 10 states will be considered in the generation process. 2 of them are given by you, i.e. CA and HI; 8 of them will be faked by calling Faker::Address.state_abbr() method. If you specify rows to be greater than 10, some values will be repeated, meaning only 10 unique values will appear and these 10 values are ensured to appear at least once. If you specify rows less than 10, only rows number of unique values will appear; and your inject values will have higher priority, meaning if rows is 2, then only CA and HI will appear in the final output.

Supported types

rand:int and rand:float

Random integers/floats. The following attributes are available:

range to specify the minimum and maximum value that will be generated.

"range": [10, 100] <- the min=10 and max=100. 10 and 100 are included.

If it's rand:float, you can specify precision attribute to indicate the digit(s) you want to keep for each of the data.

rand:char

Random characters. Only A-Z, a-z and 0-9 will be considered as characters.

length to specify how many characters in each of the generated data, e.g. 15 will give you something like 9mLcZHR9H5V7pg9. This is particularly useful if you want to mimic something like item ID, access token etc.

format to sepcify the format of the randomly generated strings. Any character in the format string will be print to the generated string unless it's prefixed with a '/' which will generate a corresponding character based on the format given after the '/' sign.

/w any letter between 'a' to 'z' /W any letter between 'A' to 'Z' /D any letter between 'A' to 'Z' or 'a' to 'z' /d any digit between 0 to 9 /@ alphanumeric, A-Z, a-z, or 0-9

Example:

001/W/W/d/@ might generate something like: 001HR9a

faker:[class]:[method]

This indicates you want to use one of the methods from Faker gem. You can find the list of all methods from Faker gem's documentation. Each method is a class name and a method name. Example, Faker::Internet.url can be mapped to faker:internet:url as a field type in the configuration. As current version, we don't support parameters into the faker method. Future version will add such support.

fixed

This indicates you only want to use the provided values in the output. Use values as a list to provide the values you want to use. They will be randomly picked independently each time; so no guarantee that every value in the list will be used. If you use fixed type, inject and rotate will be ignored.

inc:int

This indicates you want to generate inremental integers as the values for this column. Optional parameters are start to indicate the start count; step to indicate the step of each incrementation.

Example:

...
{
  "name": "ID",
  "type": "inc:int",
  "start": 3,
  "step": 2
}
...

will generate something like: [3, 5, 7, 9, 11, ...]

Contributing

  1. Fork it ( https://github.com/jiananlu/faked_csv/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

About

Generate fake CSV data in the specified way.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages