Polyvore to Monomere like data in convert_polyvore.py #1

Open
demonking2 opened this issue Jun 6, 2018 · 1 comment

Comments

@demonking2

Can you please share the structure of the data directories used in this script? The Polyvore website is down, and I am having a hard time understanding how you generated negatives from the data and what an ASIN is in the Polyvore data.

@shaform

shaform commented Jun 7, 2018

Hi demonking2,

The product ID on the Polyvore site is used as the ASIN.

If the Polyvore website were still alive, the crawler could be used to crawl the data. Specifically:

  1. polyvore_outfit_from_user and polyvore_outfit can be used to grab the urls of outfits as OUTFIT.jl.
  2. Using OUTFIT.jl as input, polyvore_outfit_set grabs information for each outfit as well as urls of the items in the outfits as OUTFIT_SET.jl.
  3. Manually extract the item URLs from OUTFIT_SET.jl and use them as input to polyvore_item, which grabs all item images into a directory and writes the item information to ITEM.jl.
  4. Run python -m cfl.scripts.preprocess_polyvore --items-store ITEMS_S3_STORE_PATH --image-dir IMAGES_DIR --output-dir data/polyvore, where IMAGES_DIR is the image output of polyvore_item and ITEMS_S3_STORE_PATH stores the crawled items on S3.
    ITEMS_S3_STORE_PATH has the following structure:
    ITEMS_S3_STORE_PATH/polyvore_item/: stores ITEM.jl
    ITEMS_S3_STORE_PATH/polyvore_outfit_set/: stores OUTFIT_SET.jl
  5. preprocess_polyvore produces the following data structure:
    data/polyvore/images: stores processed images
    data/polyvore/meta.txt: stores the product_id as well as categories of items.
    data/polyvore/cate_[n].txt: stores the product_ids for each category [n].
    data/polyvore/outfits.txt: stores the fav_count and items of each outfit.
  6. Run python -m cfl.keras.extract_v3 --input-dir data/polyvore/images --output-dir data/polyvore/latents, which produces latent vectors for each item image.
  7. Run ./experiments/polyvore/convert_polyvore.sh, which produces the following data structure:
    parsed_data/polyvore_random/top_to_other/train/meta.txt: item ids as well as categories
    parsed_data/polyvore_random/top_to_other/train/pairs_pos.txt: positive pairs
    parsed_data/polyvore_random/top_to_other/train/pairs_neg.txt: negative pairs
    parsed_data/polyvore_random/top_to_other/train/pairs_all.txt: all pairs
    parsed_data/polyvore_random/top_to_other/train/features.b: features of items
    parsed_data/polyvore_random/top_to_other/train/source.txt: item ids from source categories
    parsed_data/polyvore_random/top_to_other/train/target.txt: item ids from target categories
    parsed_data/polyvore_random/top_to_other/val: the same structure as top_to_other/train
    parsed_data/polyvore_random/top_to_other/test: the same structure as top_to_other/train
    parsed_data/polyvore_random/bottom_to_other: the same structure as top_to_other
    parsed_data/polyvore_random/shoe_to_other: the same structure as top_to_other
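
Step 3 above is a manual step. As a rough sketch, assuming OUTFIT_SET.jl is a JSON Lines file (one JSON object per line) and that each outfit record lists its items under a key such as `items`, each with a `url` field, the item URLs could be collected as shown here. Both key names and the function name are hypothetical and not confirmed by the crawler's actual output, so adjust them to the real records.

```python
import json


def extract_item_urls(lines, items_key="items", url_key="url"):
    """Collect unique item URLs from JSON Lines outfit records.

    `items_key` and `url_key` are hypothetical field names; change them
    to match whatever the crawler actually wrote into OUTFIT_SET.jl.
    """
    seen = set()
    urls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Records without the items key contribute nothing.
        for item in record.get(items_key, []):
            url = item.get(url_key)
            if url and url not in seen:
                seen.add(url)
                urls.append(url)  # keep first-seen order
    return urls
```

Because the function only iterates over lines, an open file handle for OUTFIT_SET.jl can be passed directly as `lines`, and the returned list can be written out one URL per line as input for polyvore_item.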

Negatives are generated by randomly sampling items from different categories, as described in the paper.
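
A minimal sketch of that idea, not the repository's actual implementation: for each positive (source, target) pair, draw a random item whose category differs from the source item's category and that does not already form a positive pair with it. The function name, the `categories` mapping, and the one-negative-per-positive ratio are assumptions made for illustration; the paper and convert_polyvore.py define the real procedure.

```python
import random


def sample_negatives(pos_pairs, categories, seed=0, max_tries=100):
    """Draw at most one random negative pair per positive pair.

    pos_pairs:  list of (source_id, target_id) tuples
    categories: dict mapping item id -> category label (assumed format)
    """
    rng = random.Random(seed)
    positives = set(pos_pairs)
    item_ids = sorted(categories)  # sorted so a fixed seed is reproducible
    negatives = []
    for source, _ in pos_pairs:
        # Rejection sampling: retry until a valid candidate is found.
        # If none is found within max_tries, this positive gets no negative.
        for _ in range(max_tries):
            candidate = rng.choice(item_ids)
            if (categories[candidate] != categories[source]
                    and (source, candidate) not in positives):
                negatives.append((source, candidate))
                break
    return negatives
```

Rejection sampling keeps the sketch short; for a large catalogue it may be cheaper to precompute the candidate pool per source category once instead of retrying.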
