NVIDIA Model Card

Last updated: July 2024

Model Details

  • Model name: Generative AI by Getty Images
  • Model release date: July 2024
  • Model version: Getty Images, Edify Image v3.0
  • Model summary: Generative AI by Getty Images is a commercially safe service built on a responsibly trained and clean foundational model. Key elements of this commercial safety are:
    • The model was only trained on high resolution, licensed or owned images and metadata from Getty Images' vast creative library.
      • All training data is owned or licensed.
      • Getty Images maintains model and property releases for images depicting persons and certain places (as necessary) included in the training set.
      • The model was not trained on any data or images scraped from the internet, generated synthetically, or produced by other generators.
    • The generator has been trained to not produce visuals that violate intellectual property or artist rights, including images of identifiable people, protected locations, trademarks or brands.
    • Getty Images blocks both prompts and generations in an effort to avoid visuals being generated that would create legal risks or be considered offensive.
    • This model is safe for commercial use. Safe for commercial use means that because the model was only trained with permissioned content, you may use the outputs for commercial purposes. Accordingly, Getty Images represents and warrants that necessary model and property releases have been obtained to avoid infringement of third-party intellectual property rights.
    • Legal indemnification is included for all generations, without requiring that assets are reviewed and cleared by Getty Images. Different monetary levels of indemnification are offered based on the package purchased.
    • Additionally, the model strives to promote diversity and representation of people through the diversity inherent in the training dataset and through custom model design.
    • The model is a custom architecture. It supports images up to 4K resolution using super-resolution techniques.

Terms of Use

The intended use of the model is for commercially safe, photorealistic image generation for creation & ideation. Users of the model are expected to act responsibly and are subject to the terms and conditions expressed in the Getty Images Site Terms of Use, Getty Images Content License Agreement and the applicable AI Image Generation Subscription Agreement which prohibit illegal and certain other uses.

Third-Party Community Consideration

This model has been developed and built to Getty Images requirements for this application and use case; more information: https://www.gettyimages.com/ai/generation/about.

  • References: This model is based on large-scale text-to-image diffusion models.
    • [1] Balaji, Y., Nah, S., Huang, X., Vahdat, A., Song, J., Kreis, K., Aittala, M., Aila, T., Laine, S., Catanzaro, B. and Karras, T., 2022. ediffi: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324.
    • [2] Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E.L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T. and Ho, J., 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, pp.36479-36494.
    • [3] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. and Chen, M., 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2), p.3.

Training Details

  • Model Architecture: Diffusion
    • Architecture Type: Convolutional Neural Network (CNN)
    • Network Architecture: U-Net-based

Inputs

  • Input Type(s): Text, Image
  • Input Format(s): Text: Raw Text, Image: JPG
  • Input Parameter(s): One Dimensional (1D)
  • Other Properties Related to Input: Max 250 words
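The 250-word input limit can be checked on the client side before a prompt is submitted. The following is a minimal sketch, not part of the Getty Images or NVIDIA API: the function names are hypothetical, and because this card does not define how a "word" is counted, whitespace-separated tokens are assumed.

```python
MAX_PROMPT_WORDS = 250  # from "Max 250 words" in the Inputs section


def count_words(prompt: str) -> int:
    """Count whitespace-separated tokens.

    Assumption: the card does not say how words are counted,
    so the service may count them differently.
    """
    return len(prompt.split())


def check_prompt(prompt: str) -> None:
    """Raise ValueError if the prompt is empty or exceeds the word limit."""
    n = count_words(prompt)
    if n == 0:
        raise ValueError("prompt is empty")
    if n > MAX_PROMPT_WORDS:
        raise ValueError(
            f"prompt has {n} words; the documented limit is {MAX_PROMPT_WORDS}"
        )


if __name__ == "__main__":
    check_prompt("a red bicycle leaning against a brick wall at sunset")
    print("prompt accepted")
```

A check like this only catches overlong prompts early; prompts that pass it may still be rejected by the prompt blocking described in the Limitations section.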

Outputs

  • Output Type(s): Image
  • Output Format: Red, Green, Blue (RGB)
  • Output Parameter(s): Two-Dimensional (2D)
  • Other Properties Related to Output: Configurable output sizes: 1024x1024, 1024x768, 1024x576, 576x1024, 768x1024 for 1K resolution; 4096x4096, 4096x3072, 4096x2304, 2304x4096, 3072x4096 for 4K resolution.
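The output sizes listed above can be captured in a small lookup table, which is useful when an application needs to map an arbitrary target size to one the model supports. This is an illustrative sketch only: the names SUPPORTED_SIZES, is_supported, and closest_size are hypothetical and are not part of any Getty Images or NVIDIA API. Each 4K size in the list is the corresponding 1K size multiplied by 4 on both sides.

```python
import math

# (width, height) pairs copied from the Outputs section of this card.
SUPPORTED_SIZES = {
    "1K": [(1024, 1024), (1024, 768), (1024, 576), (576, 1024), (768, 1024)],
    "4K": [(4096, 4096), (4096, 3072), (4096, 2304), (2304, 4096), (3072, 4096)],
}


def is_supported(width: int, height: int) -> bool:
    """Return True if (width, height) is one of the listed output sizes."""
    return any((width, height) in sizes for sizes in SUPPORTED_SIZES.values())


def closest_size(width: int, height: int, tier: str = "1K") -> tuple[int, int]:
    """Pick the listed size in `tier` whose aspect ratio is nearest the request.

    Ratios are compared on a log scale so that portrait and landscape
    mismatches are treated symmetrically.
    """
    target = math.log(width / height)
    return min(
        SUPPORTED_SIZES[tier],
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


if __name__ == "__main__":
    print(is_supported(1024, 768))
    print(closest_size(1920, 1080))
    print(closest_size(1080, 1920, tier="4K"))
```

Matching on aspect ratio rather than pixel count is a design choice made for this sketch; an application that cares more about total resolution could choose a different rule.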

Software Integration

  • Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere
  • Preferred/Supported Operating System(s): Linux

Training and Evaluation Datasets and Performance

  • Dataset: Licensed or owned high-resolution photography, illustrations, and still images from Getty Images' vast Creative Asset library, paired with detailed visual descriptions per asset. Descriptions and metadata attributes curated and crafted by Getty Images photographers and professional content editors are utilized. You can review this collection and metadata at gettyimages.com and istock.com.
  • Creator Compensation: Getty Images compensates contributors on an ongoing basis. This includes cases where contributors’ content is used as training data for AI. On an annual recurring basis, we will share the revenues generated from Generative AI by Getty Images with contributors whose content was used to train the AI Generator, allocating both a pro rata share in respect of every file and a share based on traditional licensing revenue.
  • Quality: It especially excels at content that is commercially viable, photorealistic people, and compelling creative concepts.
  • Performance: The model achieves an average of 9 seconds to generate 4 images.

Inference

  • Engine: TensorRT, Triton
  • Test Hardware: NVIDIA A100

Limitations

  1. People and object deformations: While the model addresses common issues in generative models, such as malformed limbs, hands, and disproportionate object sizes through careful design choices and custom loss functions, it can still occasionally produce images with malformed or disfigured human parts or objects.
  2. Offensive: The model might create unrealistic and potentially offensive representations of humans by merging independent features learned during training. We attempt to block many of these instances through prompt blocking and output blocking.
  3. Bias: While the model implements measures to generate more diverse representations of humans, the training dataset has some imbalances in the distribution of human attributes such as gender and ethnicity in relation to occupational roles, so outputs can be biased with respect to those attributes. Our custom prompting and custom model design aim to combat these biases, but they may still occasionally arise.
  4. Not safe for work: The model is supplemented by a language model which analyzes and filters text prompts, and an image filter that screens for inappropriate outputs. However, both these models can mistakenly filter “safe” prompts and images and may fail to filter unsafe prompts or images. This can arise from expertly designed adversarial input prompts or inherent limitations within the models.
  5. Contemporary: The training data covers up to October 2023 and only includes descriptions in English.
  6. Text: The model does not perform well at generating text in outputs.
  7. Fantastical and cinematic illustrations: The model’s strength is photorealism. As such, it does not perform well with fantastical or cinematic illustrative styles.

Contact

Please send model questions and comments to api@gettyimages.com or https://www.nvidia.com/en-us/support/submit-security-vulnerability/