GridMarkets PDG Support

Overview

Support for TOPs/PDG on the GridMarkets cloud is still under active development, but we decided to release this experimental build of the toolset as we are comfortable with the stability and pipeline efficiency of the build.  As a first release, there are several caveats which we would be remiss in not mentioning here.
  1. Though the GridMarkets PDG ROP node is more than capable of submitting dependent jobs on its own (similar to the Cloud Cache workflow), we have found that PDG's work item skipping behavior, which skips work items whose output files already exist, is not reliable enough to be used in a production environment where costs are time-based.
  2. The Cloud Prepare TOP was created to manage the data flow and cut off ancestor TOP nodes which have already been cached by previous jobs, forcing the desired behavior and avoiding PDG mistakenly recooking upstream work items. 
    1. Be aware that problems in upstream cached work items which do not cause a failure state, but which break downstream work items and would normally be caught during cooking, may go unnoticed.
  3. While it is not reliable, the Work Item skipping behavior does work on GridMarkets cloud, and has been shown to be a double-edged sword in testing.
    1. It can save time and money by not needing to recook frames that have been cached out already from jobs which have failed or been terminated due to lack of credits.  This may even pick up simulations, though that is untested.
    2. It can cause job runtime/cost estimates to be skewed.  If you are looking to estimate costs and/or cook times of a job when it is run on different machines, be sure to delete the outputs through Envoy before submitting the next test or submit it under a unique Job Name set on the Render Submit node or in the lower left of the Preflight window.
  4. When the jobs are cooked on the farm, the work items from the network are partitioned appropriately and each machine runs one partition.  This can lead to jobs appearing to take much longer when a full production run is submitted vs. the tests as each task on the job in Envoy consists of an evenly distributed number of work items. 
    1. For example, if a test is run with 100 work items on Economy (5 machines), each machine will be cooking 20 work items.  If the production run is 1000 work items being run on Standard (50 machines), each machine will still be cooking 20 work items and each task will finish in roughly the same amount of time.
    2. As an alternative, if the same test was done, but the production was run on Economy (5 machines) as well, then each task would be cooking 200 work items, and proportionally take ~10x as long.
    3. Information regarding what Work Items are being run on a given task can be found in the logs for the task.
  5. The toolset has been designed with Static Work Items in mind.  If you are utilizing dynamic work items that are added or removed at cook time rather than at generation, there will most likely be instances where computations are duplicated or work item counts are inaccurate.  Do so at your own risk.
  6. If you need information out of the TOP network while it is cooking, print() functions in the Cook tab of a Python Processor will write to the logs visible in Envoy.
    1. Be aware that overutilization of this will cause the logs visible in Envoy to be truncated.  The GridMarkets team will still be able to access the full logs, but it may cause debugging issues for users.
  7. This is mentioned in many places in the documentation and in warnings on the nodes themselves, but it is restated here for emphasis.  When submitting a PDG job, be sure to set the Budget to something reasonable for the job, considering it may take hours to complete many work items, and set the Budget Action to Terminate.
    1. These settings are required for any job to be eligible for refunds if overages occur.
    2. There is currently no way to change the default budget behavior, so the settings must be changed for every PDG job, for every submission.
    3. This requirement is in place to protect both the users and GridMarkets as PDG can explode into unexpectedly enormous compute times, which can lead to some very heavy costs for all parties.  We are proud to be able to offer this service and want to be sure that it remains cost effective for all parties so we can maintain support long into the future.
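
For caveat 6 above, one way to keep print() output small enough to avoid truncated logs in Envoy is to log only a sample of work items rather than every one.  The following is a minimal sketch in plain Python.  `should_log` and its `every` parameter are hypothetical names introduced for illustration and are not part of the GridMarkets toolset or PDG; the index and total would have to come from your own Python Processor script.

```python
def should_log(index, total, every=50):
    """Return True for the first item, the last item, and every `every`-th item.

    `index` is a zero-based position and `total` the number of items; both are
    assumed to be supplied by the calling script.
    """
    if total <= 0:
        return False
    return index == 0 or index == total - 1 or index % every == 0


# Example: only a handful of lines are printed for 1000 items.
total_items = 1000
for item_index in range(total_items):
    if should_log(item_index, total_items):
        print("processing work item %d of %d" % (item_index + 1, total_items))
```

The sampling interval is a trade-off: a larger `every` keeps the log short, while a smaller one gives finer progress information.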

GridMarkets PDG ROP Node

Submits TOPs/PDG jobs to the GridMarkets platform.

Usage

This tool groups work items and executes batches on each processing machine.  It is appropriate for wedges, chunking renders, and for variations on simulations.  In the case of simulations, each work item should account for a single variation so each machine is only working on one simulation.

To set up a job, target the TOP node you would like to process on the GridMarkets platform.  Select the service level setting which matches the setting on your account.  The actual number of machines used will be calculated, but will be capped by the setting for the service level.  Finally, select the type of job being run using the Submission Type drop-down.

When submitting a job, set a Job Budget in the Jobs tab of the Preflight with the command set to Terminate.  As this is an experimental tool, it might run into issues and this setting will prevent it from overspending on your account.  GridMarkets is not responsible for overspending if the job was submitted without a Terminate budget limit set.

ROP Input (optional):
        The job which this node creates will be processed as dependent to any input nodes.  If the input is another GridMarkets PDG ROP, be sure the input is targeting a GridMarkets Cloud Prepare TOP.

Downstream Output:
        Nodes wired into this output will create jobs dependent on the outputs from the TOPs network.
        

Parameters

Service Level:
        Determines the maximum number of machines used to create the batches.  The final number of tasks will vary based on the number of work items and how they are able to be spread over machines.

        Service Level    Machine Count
        Economy          5
        Standard         40
        Rush             100
        Custom           Set a custom number of nodes you would like to use.
                         This is still capped at the current service level setting for your account.
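
The relationship between work item count, service level, and work items per machine can be sketched as simple arithmetic.  This is an illustration only, under stated assumptions: that work items are split evenly (rounding up), that no more machines are used than there are work items, and that a custom count is clamped to the service level cap.  The function name `estimate_batches` and its parameters are hypothetical and do not correspond to the node's actual implementation.

```python
import math

# Machine caps per service level, copied from the Service Level table.
SERVICE_LEVEL_MACHINES = {"Economy": 5, "Standard": 40, "Rush": 100}


def estimate_batches(work_item_count, service_level, custom_count=None):
    """Return (machines_used, max_work_items_per_machine) for an even split.

    Assumption: the split is even and never uses more machines than there
    are work items.  The real partitioning may differ.
    """
    cap = SERVICE_LEVEL_MACHINES[service_level]
    # A custom machine count is still capped by the service level.
    requested = cap if custom_count is None else min(custom_count, cap)
    machines = max(1, min(requested, work_item_count))
    return machines, math.ceil(work_item_count / machines)


print(estimate_batches(100, "Economy"))   # test run: 100 work items
print(estimate_batches(1000, "Economy"))  # production run on the same level
```

Under these assumptions, a 1000 work item run on Economy gives each machine ten times as many work items as a 100 work item test, which is why individual tasks can appear to take much longer in production.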

TOP Path:
        Set the path to the node you would like this job to process.
        If you are submitting multiple jobs from the same TOP network, please use GridMarkets Cloud Prepare TOPs as the targets for your GridMarkets PDG nodes.
        These nodes are designed to work together to prevent duplication of computation and overspending.

Generate Work Items:
        Generates the work items for the target TOP node without cooking the network, unless upstream nodes need to be cooked to generate the work items for the target node.
        This will also update the information text below the Service Level parameters to let you know how many machines will actually be running and how many work items each machine will be processing.
        This uses the `hou.TopNode.generateStaticWorkItems()` method.  If problems arise in your pipeline, please reference the documentation for this method.
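
To check work item generation outside of the button, a roughly equivalent call can be made from a Python shell inside Houdini.  The sketch below assumes the method is `hou.TopNode.generateStaticWorkItems(block=...)` and that the generated items can be read from `getPDGNode().workItems`; verify both against the documentation for your Houdini version.  The helper name `count_static_work_items` and the node path in the usage note are hypothetical, and this is not necessarily how the GridMarkets node itself counts work items.

```python
def count_static_work_items(top_node):
    """Generate static work items for `top_node` and return how many exist.

    `top_node` is expected to be a hou.TopNode obtained inside Houdini.
    """
    # block=True waits for generation to finish before returning.
    top_node.generateStaticWorkItems(block=True)
    pdg_node = top_node.getPDGNode()
    if pdg_node is None:
        # No underlying PDG node is available yet, so nothing was generated.
        return 0
    return len(pdg_node.workItems)
```

Inside Houdini this might be called as `count_static_work_items(hou.node("/obj/topnet1/cloudprepare1"))`, where the path is a placeholder for your own target node.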

Submission Type:
        Set the appropriate submission type here.  This is used by GridMarkets to make sure that the job is given the correct machine configuration for it to run.

        Submission Type    Use Case
        Geometry           Used for any geometry caching.  This can be simulations or time-independent caching.
                           Be sure that simulations have the correct Frame Range setting and have "Cook Frames as Single Work Item" ticked before submitting.
        Mantra             Used for rendering from a Mantra TOP.
        Karma              Used for rendering from a Karma TOP.
                           XPU machines are currently not available for Karma rendering.  Select the Requires GPU Machine option with Karma selected to run on a GPU machine.  This machine may not have the CPU power needed, so render times may be longer than expected for certain renders.
        Redshift           Used for rendering from a Redshift TOP.
           
Requires GPU Machine:
        This setting is available for Geometry or Karma jobs and will swap the machine type selection from CPU machines to GPU machines on submission.


Cloud Prepare TOP

Controls network flow to prevent duplication of computation while cooking on the GridMarkets cloud.

Usage

This tool works in concert with the GridMarkets PDG ROP node to manage data flow through the PDG network while it is cooking on the farm. This is specifically designed to prevent ancestor nodes in the network from cooking when dependent jobs are processed.
This is an experimental toolset. This node is not required for submitting PDG jobs to GridMarkets, but errors which cause the recooks this node is designed to prevent will be considered User Error. The only jobs eligible for refunds are those which use this node for all but the last PDG submission and which also have a Terminate budget set.

Parameters

Create ROP
            Change the target ROP Path and the node name if desired, then hit the Create ROP button to generate a ROP with all of the parameters preset to target this node.
Job Settings
            Determines what type of job is being submitted. This should be set to the caching type of the input node or fit the requirements for the ancestor nodes to cook.
Requires GPU
            Tells GridMarkets to process the job associated with this node using a GPU machine.