
Pick Our Brain
Open Source Contributions + Knowledge Sharing = Better World
-
Science
Science: Like magic, but real! Today, we're celebrating National Science Appreciation Day by geeking out about the everyday wonders around us. From the code that powers your apps to the physics that makes 3D animation possible, we live in a world where 'impossible' things happen every day. And the best part? We can explain how!

-
IAM Auth for Django Database: passwordless, not painless
TL;DR:
Adding IAM Auth requires increasing the RDS server to at least 4x the cost of a server with password auth, and likely much more for production. This makes it non-viable for our immediate use case with a relatively low-stress app.
Goal: no database passwords in code/configs
We’re running a Wagtail CMS site (built on Django) and we don’t want to use passwords to authenticate Wagtail/Django to our database or other AWS services since they present a risk if discovered and are hard to manage outside our committed code repo.
Typical access to a PostgreSQL database uses credentials including host, database, username, and password, and Django settings include these. We do not want to store creds in code, so we want to leverage AWS IAM Roles, a cloud-native mechanism, to permit connections.
We are using SES for Django email with AWS IAM authentication, obviating the need for passwords to authenticate to their SMTP server. We want to do the same for RDS access to our PostgreSQL database.
This gets pretty nerdy, but we hope it helps others use IAM for Django RDS auth and avoid a little-publicized problem in the current AWS implementation.
IAM Auth, EC2 Role, Django DB Wrappers
I first tried the approach from https://stackoverflow.com/a/57923227/4880852 with the 10-line wrapper in your.package.postgresql/base.py and it worked great — for 15 minutes, after which the temporary token it got expired. Oops.
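For reference, that approach boils down to a custom Django database backend whose get_connection_params() fetches a fresh IAM token to use as the password. Here is a minimal sketch of the idea (not the exact StackOverflow code; the hard-coded region and module path are assumptions):

# your/package/postgresql/base.py -- minimal sketch of an IAM-token DB wrapper
import boto3
from django.db.backends.postgresql import base


class DatabaseWrapper(base.DatabaseWrapper):
    def get_connection_params(self):
        params = super().get_connection_params()
        rds = boto3.client('rds', region_name='us-east-1')  # region assumed for this sketch
        # generate_db_auth_token returns a presigned token valid for about 15 minutes.
        params['password'] = rds.generate_db_auth_token(
            DBHostname=params['host'],
            Port=params.get('port', 5432),
            DBUsername=params['user'],
        )
        return params

With ENGINE pointed at your.package.postgresql, each new connection gets a fresh token; the 15-minute token lifetime is what bit us above.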
Below we describe how we used the package mentioned in that post, with IAM and an EC2 Role. In the examples below, our top-level database is default “postgres” with user “rdsiamuser”, and our Django database is “rdsiam”. We use CloudFormation for our Infrastructure-as-Code to create all our resources.
Enable IAM Auth in CloudFormation
Enable IAM Auth in the CloudFormation definition of the database:
SQLDatabase:
  Type: 'AWS::RDS::DBInstance'
  Properties:
    Engine: postgres
    DBName: !Ref DBName
    MultiAZ: !Ref MultiAZDatabase
    MasterUsername: !Ref DBUser
    MasterUserPassword: !Ref DBPassword
    EnableIAMDatabaseAuthentication: true
    DBInstanceClass: !Ref DBInstanceClass
    AllocatedStorage: !Ref DBAllocatedStorage
We have to add an IAM Role to our EC2 instance so it can use IAM Auth to talk to RDS; in CloudFormation, like this:
RolePolicies:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: root
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Action: rds-db:connect
          Resource: !Sub "arn:aws:rds-db:${AWS::Region}:${AWS::AccountId}:dbuser:*/${DBUser}"
The "dbuser" is a literal AWS term, and ${DBUser} is the same as our Django setting, the top-level RDS user. We have to use the wildcard "*" for the database because CloudFormation gives us no way to determine the RDS DbiResourceId! 🙁
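If you ever want to tighten that wildcard after the stack exists, the resource ID (DbiResourceId) can be looked up post-deployment. A rough sketch with boto3, where the instance identifier and region are placeholders:

import boto3

rds = boto3.client('rds', region_name='us-east-1')  # region assumed for this sketch
# 'mydbinstance' is a placeholder for your actual RDS instance identifier.
info = rds.describe_db_instances(DBInstanceIdentifier='mydbinstance')
resource_id = info['DBInstances'][0]['DbiResourceId']
# A tightened policy resource would then look like:
#   arn:aws:rds-db:<region>:<account-id>:dbuser:<resource_id>/<db-user-name>
print(resource_id)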
Set IAM auth for RDS user
After CloudFormation creates the EC2 and RDS, we can connect to the DB with the initial password we gave it, then grant the initial DBUser the rds_iam role so it can authenticate with IAM. Since my RDS was inside a private VPC, I found it easiest to launch a PostgreSQL docker container on the EC2 and run the command there:
ec2# docker run -it postgres:alpine bash
docker# psql -h MyRdsDbHost.us-east-1.rds.amazonaws.com -U rdsiamuser postgres
postgres=> GRANT rds_iam TO rdsiamuser;
postgres=> \du
   Role name   |           Attributes            |        Member of
---------------+---------------------------------+--------------------------
 rdsiamuser    | Create role, Create DB         +| {rds_superuser,rds_iam}
               | Password valid until infinity   |
Now you can see the user has the rds_iam role, so it can authenticate with IAM. Warning: this will prevent that user from logging in with normal password credentials! I didn't see this mentioned in the AWS docs.
Of course the goal is to not have passwords in code, and the CloudFormation above likely has the password committed to the repo. But after GRANTing rds_iam, the password no longer works, so this security bug turns into a security feature. I see no way to apply this GRANT at RDS creation time, nor any other mechanism to enable IAM auth entirely through CloudFormation.
Django config
I then used the code referenced in the post at https://github.com/labd/django-iam-dbauth , adding it to my requirements.txt. After reading the code and walking through it I realized the README docs were incomplete and we must supply a region for it to work. This is what I ended up with in my settings/dev.py file:
DATABASES = {
    'default': {
        'HOST': os.environ['DATABASE_HOST'],
        'NAME': os.environ['DATABASE_NAME'],
        'USER': os.environ['DATABASE_USER'],
        'ENGINE': 'django_iam_dbauth.aws.postgresql',
        'OPTIONS': {
            'use_iam_auth': True,
            'region_name': 'us-east-1',
        },
    }
}
Look, ma — no PASSWORD! The ENGINE and OPTIONS were the critical bits; the other info comes from the environment our Docker container runs in, which is pretty standard.
I could then run Django on my EC2 (we do it inside Docker) and it worked great for more than 15 minutes, so we knew the django-iam-dbauth worked without token timeouts. All was well… until it wasn’t.
Django Failures, rdsauthproxy Failures
After about an hour, I started seeing authentication failures in the Django logs. We’re running it in gunicorn and saw:
[2021-05-06 18:23:31 +0000] [52] [DEBUG] GET /
psycopg2.OperationalError: FATAL: PAM authentication failed for user "rdsiamuser"
FATAL: pg_hba.conf rejects connection for host "10.42.9.163", user "rdsiamuser", database "rdsiam", SSL off
and later, timeouts from gunicorn, presumably because Django could not auth to build a response:
[2021-05-06 19:44:35 +0000] [50] [CRITICAL] WORKER TIMEOUT (pid:63)
[2021-05-06 19:44:36 +0000] [50] [WARNING] Worker with pid 63 was terminated due to signal 9
This was a test instance with virtually no load except ALB health probes (GET /). RDS was on a db.t3.micro which had been fine for our developers exercising it when using typical password auth, so something broke when we switched to IAM. It seemed to recover after a while, briefly, then failed again and never did recover.
A look at the RDS Logs showed the cause of the problem:
* connect to 127.0.0.1 port 1108 failed: Connection refused
* Failed to connect to rdsauthproxy port 1108: Connection refused
* Closing connection 0
2021-05-06 20:35:10 UTC:10.42.9.163(35716):rdsiamuser@rdsiam:[31065]:LOG: pam_authenticate failed: Permission denied
2021-05-06 20:35:10 UTC:10.42.9.163(35716):rdsiamuser@rdsiam:[31065]:FATAL: PAM authentication failed for user "rdsiamuser"
2021-05-06 20:35:10 UTC:10.42.9.163(35716):rdsiamuser@rdsiam:[31065]:DETAIL: Connection matched pg_hba.conf line 13: "hostssl all +rds_iam all pam"
2021-05-06 20:35:10 UTC:10.42.9.163(35718):rdsiamuser@rdsiam:[31067]:FATAL: pg_hba.conf rejects connection for host "10.42.9.163", user "rdsiamuser", database "rdsiam", SSL off
It appears that RDS has a proxy for PostgreSQL called "rdsauthproxy", but it died for some unexplained reason. It may have come back once or twice but eventually went down permanently, and IAM auth never worked again. I was able to stop then restart the RDS in the AWS console and the rdsauthproxy would come back, but it would soon go down again without a trace.
I found only one hit on this topic, from August 2020, with a "me too" from March 2021; I posted my own "me too" reply and got zero response from AWS: https://forums.aws.amazon.com/thread.jspa?threadID=326681
AWS says too small, known problem
I filed a support ticket with AWS. They said that my t3.micro instance has a "CPU Baseline" of 10% and had been consistently above it, that the T3 burstable "CPU Credit" balance dropped to 0 so it had used all its credits, that Freeable Memory dropped very low, and that swap was high.
It sounded like I was thrashing the underlying T3 instance and exhausting resources, and that was probably what was killing rdsauthproxy. But why was it fine, under much higher developer load, when using Password auth instead of IAM auth?
AWS Support then went on to say:
while our internal team is indeed investigating further on this, since this issue is not really a bug and is mostly related to resource throttling, there might not really be a “fix” for it, as such. Therefore, we recommend all our customers to ensure that they have enough resources to have a seamless experience and avoid such scenarios.
We have no guidance on how much we have to scale up our RDS instance before it stops dying: double? quadruple? a different instance type? Who knows… 🙁
db.t3.small seems to work — then falls over
I tweaked my CloudFormation to replace the micro instance with a db.t3.small and redeployed. Happily, it copied all the data and after a while the app started working again. Still, we're not putting any load on this test instance. For our QA and Prod servers, we'll have to watch load carefully; maybe use a db.m5.* instance instead of a burstable db.t3.* instance.
About 12 hours later, with zero app load, we saw auth failures and RDS load went from < 10% to about 30%. So this size is too small for anything with IAM auth.
Adding IAM auth doubles, quadruples, octuples cost
Our db.t3.micro had been running for months for our devs and for demoing to our customer; it never fell over and was snappy enough. It costs $0.018/hour, or about $13/month — totally reasonable for a Dev and maybe QA instance.
The db.t3.small fell over in 12 hours, and costs twice as much. A db.t3.medium costs twice that. So our dev instance is now costing 4x what it used to in order to support IAM auth, and we don’t know if it will fall over under anything but minimal load.
For QA, we’d need at least the same size, 4x the cost of the micro.
For Prod, we'd need 2x instances and probably an "m" instance so we don't run out of "t" burst credits. The minimum in that class is db.m6g.large at $0.159/hour, almost 10x the cost of our micro, and we need 2 for failover: about $230/month. That's a lot of money for our fairly-small commercial app's database, especially since using the same size in Dev and QA (with only one instance each) adds another $230/month. That's $500/month for a small app with Dev, QA, and Prod. Not a way to win customers.
The point is that the overhead of running IAM auth is causing us to increase our DB cost by roughly 4x to 10x compared to password-based auth!
Could we switch to Aurora? Let's presume it can do IAM auth without falling over. If we go to the calculator and pick the lowest-priced option (db.r4.2xlarge), the price is $847! OK, that option's out.
Maybe if our DB needs were big and we already required a db.t3.xlarge or db.m6g.large, the IAM penalty wouldn't be noticeable, but it's noncompetitive for our use case.
Locked Yourself Out?
So you've GRANTed rds_iam to your PostgreSQL user and locked yourself out of your database. Now what?
You can use the AWS console to disable IAM authentication, wait for it to reconfigure, and try again later. Or you can use the technique from https://aws.amazon.com/premiumsupport/knowledge-center/rds-postgresql-connect-using-iam/ to get a 15-minute password token, just like our Django plugin does via Boto3 calls.
On the EC2 with access to the RDS, use the "aws" CLI to generate a token. If this isn't installed on your EC2, you can use their Docker image. Generate an auth token:

export PGPASSWORD="$(aws rds generate-db-auth-token --hostname $RDSHOST --port 5432 --region us-east-1 --username rdsiamuser)"
Then, connect using a Dockerized Postgres client on the EC2:

docker run -it \
  -e PGPASSWORD=$PGPASSWORD \
  postgres:alpine \
  psql -h $RDSHOST -p 5432 \
  "sslmode=require dbname=postgres user=rdsiamuser"
This takes about 8 seconds on the first connection but it’s quick on subsequent connections; maybe the rdsauthproxy has a cache. Then you should be able to create another user with normal password creds or perhaps revoke the IAM creds to restore password login:
postgres=> REVOKE rds_iam FROM rdsiamuser;
Then you should be able to log in with a password and do whatever else you need.
-
V! Studios Wins 2020 Communicator Award of Distinction for Online Video
V! Studios has received a 2020 Communicator Award of Distinction for its online video series, NASA ScienceCasts. The NASA ScienceCast series highlights scientific research and discoveries, keeping audiences informed, advancing understanding, and bringing wonder through animation and visualizations.
The Communicator Awards are judged and overseen by the Academy of Interactive and Visual Arts (AIVA), an assembly of leading professionals from various disciplines of the visual arts dedicated to embracing progress and the evolving nature of traditional and interactive media.
NASA ScienceCasts have invited viewers to learn more about topics ranging from studying forest height using laser light from space, to black holes, to particle physics on the International Space Station. Episodes are produced in 4K for audiences to enjoy across many online platforms, including Facebook, YouTube, Twitter, and iTunes, as well as being broadcast on NASA TV.
With over 6,000 entries received from across the US and around the world, the Communicator Awards is the largest and most competitive awards program honoring creative excellence for communications professionals. “We are extremely proud to recognize the work received in the 26th Annual Communicator Awards. This class of entries embodies the best of the ever-evolving marketing and communications industry” noted Eva McCloskey, managing director of the AIVA.
The Communicator Awards is the leading international awards program recognizing big ideas in marketing and communications, and one of the largest awards of its kind in the world. Founded nearly three decades ago, The Communicator Awards honors work that transcends innovation and craft – work that makes a lasting impact – providing an equal chance of winning to all entrants regardless of company or agency size and project budget.
Headquartered in Tysons Corner, VA, V! Studios is a unique hybrid company, successfully combining left brain and right brain skills to weave technology, information, and the arts into innovative and effective products and services. Learn more about V! Studios services at: V-Studios.com
-
Serverless Step Functions with Callback
This is a demo of how you can use the “callback” pattern to restart a Step Functions state machine from within a Lambda function. It took me a while to dig through the AWS docs, sample code, and examples to unlock the mysteries, so I hope it saves you some time.
It is inspired by Ross Rhodes' tweet on callbacks with Step Functions. He used the AWS Cloud Development Kit and SQS, but I'll be using the Serverless Framework with direct Lambda calls because it's a pattern that comes up repeatedly in our use cases. Ben Kehoe wrote an excellent AWS Blog post on the same topic; he's using SNS Email for human approvals.
The SNS approach is also not exactly aligned with our current use cases, but SQS- and SNS-driven restarts are both likely something we'll need at some point.
All the code here is on our GitHub: https://github.com/v-studios/serverless-stepfunctions-callback
Our Real Use Case
Our application takes a file and uses a Lambda to split it into chunks which are dropped onto S3. Each chunk's S3 CreateObject event triggers a Lambda to process the chunk, so all the chunks get processed in parallel. Some chunks take longer than others, so once we determine that all the chunks are done, we want to restart our state machine. We do this by calling the Step Functions API directly, indicating success.
Demo Implementation
This demo code skips the complexity of our real app, allowing us to focus on the state machine stop and restart. We’ll use a random chance to decide when we’re done, with a chance that the processing function fails, so we can signal the failure. Our state machine has a handler for this, so it can do different things on success and failure.
Our preferred backend language is Python, so that’s what we’ll use for our Lambda handler. Translating to Node or some other Lambda language should be trivial: just map the two API calls we make to your Step Functions SDK.
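For orientation, here is roughly what that handler looks like end to end; this is a sketch, not the repo's exact code, and the 80% success chance is an assumption. We'll walk through the two key API calls below.

import json
import random

import boto3

SFN = boto3.client('stepfunctions')


def process_and_check_completion(event, context):
    # The state machine passes the callback token in the Payload (see serverless.yml below).
    task_token = event['taskToken']
    chance = random.random()
    if chance < 0.8:  # pretend 80% of runs find all chunks processed successfully
        SFN.send_task_success(
            taskToken=task_token,
            output=json.dumps({'msg': 'this goes to the next state',
                               'status': 'looking good'}))
    else:
        SFN.send_task_failure(
            taskToken=task_token,
            error='ProcessingFailed',
            cause=f'Something broke in our chunk processing chance={chance}')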
We’ve been using the Serverless Framework for a while for our commercial and government projects and really like it: it’s a pleasure to use and makes all the boring stuff go away. It takes care of the infrastructure so we don’t need to do our own CloudFormation, nor its shiny new cousin, Cloud Development Kit. Under the covers, Serverless does CloudFormation for us, and that’s just where it should be — under the covers, so we can inspect it if we need to, and ignore it most of the time.
Takahiro Horike’s Step Function plugin for the Serverless Framework makes it a breeze to describe state machines directly in our serverless.yml file.
Get it Running
Install the dependencies:
npm install
Assuming you’ve set your AWS credentials in your environment (we set AWS_PROFILE), deploy with Serverless; we use the default us-east-1 region and stage dev:
sls deploy
When done, you should see your functions and an HTTP endpoint we created to start the state machine:
Serverless: Packaging service…
…
Serverless: Stack update finished…
Service Information
service: serverless-stepfunctions-callback
stage: dev
region: us-east-1
stack: serverless-stepfunctions-callback-dev
resources: 15
api keys:
None
endpoints:
functions:
SplitDoc: serverless-stepfunctions-callback-dev-SplitDoc
ProcessAndCheckCompletion: serverless-stepfunctions-callback-dev-ProcessAndCheckCompletion
layers:
None
Serverless StepFunctions OutPuts
endpoints:
GET – https://yoururlhere.execute-api.us-east-1.amazonaws.com/dev/start

In the AWS console, you should see your state machine under Step Functions – State machines.
You can get details by clicking on the name; click the Definition tab to get the diagram.
Under the “Executions” tab, you can “Start execution”, and leave the default input alone. Depending on chance, it should go through ContinueProcess and succeed, or ProcessingFailed and fail. We can examine the inputs and outputs of each state, so here we look at ContinueProcess:
For the failure case, we examine ProcessingFailed and can see it has an Exception instead of Output:
For convenience, we added an HTTP endpoint to start the state machine; this simulates how our real application’s state machine is started by some external event, like dropping an object into S3 or a DynamoDB row change. You can use this to start the state machine from the CLI instead of the console:
curl https://yoururlhere.execute-api.us-east-1.amazonaws.com/dev/start
Do this a few times then look at the console to see the results; most will likely succeed, some will fail, due to the random chance.
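For reference, the Lambda behind that /start endpoint only needs to call the Step Functions StartExecution API. Here is a minimal sketch (not necessarily the repo's exact code; STATE_MACHINE_ARN is an assumed environment variable):

import json
import os

import boto3

SFN = boto3.client('stepfunctions')


def start(event, context):
    # Kick off a new execution of the state machine; the input can be any JSON.
    res = SFN.start_execution(
        stateMachineArn=os.environ['STATE_MACHINE_ARN'],
        input=json.dumps({'source': 'http-start'}))
    return {'statusCode': 200,
            'body': json.dumps({'executionArn': res['executionArn']})}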
On to the Code!
So how does this work? How are we defining the state machine, and how do we define the restart step, then how do we invoke it? We’ll ignore the overall state machine definition because it’s well-documented, so we can focus on the more subtle callback mechanism.
In serverless.yml, for the Resource we specify the waitForTaskToken magick incantation. Normally, our state machine would specify a Lambda function as its resource, but we can't do that when we want to wait. We then specify our Lambda under Parameters as FunctionName, and pass into it a Payload containing the Step Function $$.Task.Token:
WaitForCompletion:
  Type: Task
  Resource: arn:aws:states:::lambda:invoke.waitForTaskToken
  Parameters:
    FunctionName: ${self:service}-${opt:stage}-ProcessAndCheckCompletion
    Payload:
      taskToken.$: $$.Task.Token
  Next: ContinueProcess  # the happy path

The Lambda will need to call the Step Functions API with this Task.Token to flag success or failure, so it has to be an input to the function. We can add anything else we want as an input here too.
As usual, the state has a Next for the happy path, but here we’ve defined error handlers with the Catch directive. We first try to catch an error that we specify in our Lambda, then a catch-all in case anything else blows up (e.g., a Python exception due to bad code):
Catch:
  - ErrorEquals: ["ProcessingFailed"]
    Next: ProcessingFailed
  - ErrorEquals: ["States.TaskFailed"]
    Next: UnexpectedFailure

In our Lambda handler function, we don't actually do any processing in this demo. For the real application, we'd process our chunk and check for all the chunks being processed; if they're not all complete, we'd just return. Here, we pretend we have determined that all the chunks are done, and signal the Step Function state machine to continue:
task_token = event['taskToken']
SFN.send_task_success(
    taskToken=task_token,
    output=json.dumps({'msg': 'this goes to the next state',
                       'status': 'looking good'}))

We can set the output to be anything we want to feed to the next step in our state machine.
To indicate failure, we make a similar call, and can set optional error to a named code we can catch in our Step Function, and the cause to provide more details:
SFN.send_task_failure(
    taskToken=task_token,
    error='ProcessingFailed',
    cause=f'Something broke in our chunk processing chance={chance}')
If this gets executed, the ProcessingFailed should get caught by the Catch… ErrorEquals: ["ProcessingFailed"] clause in the state machine definition.

Conclusion
We now know how to define waitForTaskToken and pass tokens to Lambdas so they can signal success or failure to restart the state machine, and we can use it with the Serverless Framework's Step Functions plugin with ease. Step Functions invoke Lambdas as Tasks asynchronously, so we may have many opportunities to have the state machine pause and wait for completion of a longer-running Lambda, or many parallel Lambdas.
-
V! Studios Receives Nomination for a 2018 Emmy® Award
V! Studios has received a nomination for a 2018 Emmy® Award for its work on the NASA ScienceCasts episode, “Two Sides of the Same Star.” The NASA ScienceCast series highlights scientific research and discoveries, keeping audiences informed, advancing understanding, and bringing wonder through animation and visualizations.
“Two Sides of the Same Star” explores the nature of neutron stars. Animations are used throughout the episode to explain the variability of neutron stars’ magnetic fields, and the scientific debate over the evolutionary stages of a neutron star.
The nomination comes from the National Capital Chesapeake Bay Chapter (NCCB) of the National Academy of Television Arts & Sciences (NATAS) in the category of Health/Science – Program Feature/Segment. The NCCB is a non-profit, professional organization serving the Maryland, Virginia and Washington, DC television community. The NATAS Emmy® Award is the industry’s benchmark for the recognition of television excellence.
The 61st Emmy® Awards will be Livestreamed on June 22, 2019 at www.capitalemmys.tv/emmys.
Headquartered in Tysons Corner, VA, V! Studios is a unique hybrid company, successfully combining left brain and right brain skills to weave technology, information, and the arts into innovative and effective products and services. Learn more about V! Studios services at: V-Studios.com.
-
Quick process of adapting Megascan Atlas images into volumetric lighting scenes.
If you are like me, and want to use volumetric atmosphere in a scene that incorporates scanned imagery from Megascans (using their 2D scanned imagery known as Atlases) to populate the area, you are out of luck using the standard method of applying the images to planes and using the opacity channel to cut out the shape. This is because the outline of the plane is still visible in a scene that incorporates volumetric fog/lighting, etc.
To get around this limitation, I have found a moderately quick method to get the look you want with the proper shadows and such. This process involves three software packages: Adobe Photoshop, Adobe Illustrator, and Cinema 4D, plus the Megascan Atlas source file(s) from quixel.com.

For this blog, I'm going to talk about incorporating some seaweed images from Megascans into my underwater scene. Here is what the scene looks like with the alpha-channeled image plane cut-out approach. You'll notice the planes are easily visible along the bottom of the seabed, even though we 'cut out' the shape of the seaweed with the alpha channel/opacity channel of the seaweed.
With the process I'm going to describe below, here is what the scene will now look like:

When writing this blog, I wanted to encompass all levels of experience, so I apologize in advance if you already know many of these steps; hopefully you'll still find some useful information in this approach. There are many ways to achieve this look, and this is just one of them, but it was one I found quick and easy. There are plug-ins that can expedite this process, so feel free to experiment further. I've broken the process down into 26 steps. Here they are:
1) Navigate to the Megascan library residing on quixel.com to find some seaweed images to use on the seafloor (https://quixel.com/megascans/library?search=seaweed).
2) After selecting the 'Plant Seaweed' Atlas that I want to use, I download it to my desktop. You will notice that when you open it up, it contains a variety of files (Albedo, Bump, Specular, Normal, etc.) and that they are in 4K resolution. Depending on how close you are going to get to the image, you may only need a couple of the files. For my purpose, I'm going to use it to populate the seafloor in my scene and don't anticipate getting very close to it. So, I'll only use the 'Albedo' (for the color) and 'Opacity' (for the cutout) images, but feel free to use whichever maps make the most sense for your project. I'm also going to reduce the resolution quite a bit since I won't need that much detail and I want to conserve memory in Octane, since I'm running it on either a GTX 1060 or GTX 1070 most of the time.
3) Next, open up all the image files you'll want to use for your project in Adobe Photoshop (or whichever app you use). In my case, I'll open the Albedo and Opacity files.
4) Copy and paste each of the image files into layers on the 'Albedo' file. You want to have all the channels on one image as layers so that when we crop them, they will all line up perfectly.
5) Once you have that done, reduce the image to something more appropriate to your needs. In my case, I'm only going to need a resolution of 512×512 pixels. Go under Image>Image Size and select the resolution you want to shrink it to.
6) Then select the 'Opacity' layer and choose Image>Adjustments>Invert to invert the opacity layer to black on white, as Adobe Illustrator sees paths as black on white.
7) It should now look like this:
8) Do a 'Save As' of this Photoshop file as a native PSD format file. We are going to create two separate objects from this file: the seaweed on the left and the seaweed on the right. So, use the 'Crop' tool to crop the image on the left first. Crop it as close as you can to the borders of the seaweed image.
9) It should look like this now:
10) Next we will Save out the Opacity and Albedo layers as two separate images. Go to File>Export>Quick Export as PNG and save it as ‘left seaweed opacity’ (or something similar if you aren’t using Adobe Creative Cloud).
11) Next hide the ‘opacity layer’ layer and do another File>Export>Quick Export as PNG and save it as ‘left seaweed albedo’. Since it is the same dimensions as the opacity layer, when we import it into C4D as a texture it will fit perfectly.
12) Now go back several steps to right before we cropped the left seaweed image, so that you will have both images visible again. This time crop the right seaweed and repeat steps 9-11 (naming your files ‘right’ instead of ‘left’ – as I’m sure you already know) 🙂
13) Now we will jump into Adobe Illustrator. Once you have Illustrator open, open up the ‘left seaweed opacity’ file you had saved. Then select the left seaweed image.
14) Then open the 'Image Trace' window by going to Window>Image Trace; this will pop up the floating 'Image Trace' window.
15) In the ‘Image Trace’ window, select the ‘Preset’ drop down menu and choose ‘Silhouettes’ which I found worked well for this image.
16) It’s going to look a little blobby. Don’t worry, we’ll clean it up. Toggle down the ‘Advanced’ arrow to see additional controls. I found that the settings of 180 for the Threshold, 100% for the Paths, 100% for the Corners and 50 px for the Noise created a good clean image. You want to make sure you don’t adjust the Threshold and Noise setting too low or you’ll have free-hanging portions of your image which will import with issues into C4D.
17) Now we want to Save the file out as an Adobe Illustrator format file: File>Save As. Create a folder, or use one you already designated, and name it something logical like 'left seaweed-opacity.ai'.
18) You will get prompted by a dialog box asking you what version to save it as. You MUST select Illustrator version 8, as C4D only reads that format.
19) Okay, almost there. Now launch Cinema4D and open the ‘left seaweed-opacity.ai’ file. You’ll be prompted by a dialog box asking you what ‘Scale’ you want to bring it in as. I found that for my purposes, a Scale of 0.05 Centimeters and Connect Splines and Group Splines checked, worked well.
20) The imported Illustrator file should look like this in your C4D window. It should come in as a spline object.
21) Next, you’ll want to add an ‘Extrude’ attribute to the imported seaweed.
22) Put the ‘left seaweed-opacity’ spline object under the ‘Extrude’ attribute. Then in the ‘Extrude’ options, select the ‘Object’ tab and make the Z Movement something like .25
23) Now we need to texture it. Create an Octane Diffuse Material and Import the ‘Seaweed Left Albedo’ image we had created earlier and apply it to the Diffuse channel.
24) We need to apply the new material onto the Extruded object. However, change the ‘Projection’ method from ‘UVW Mapping’ to ‘Flat’.
25) Finally, Right-Mouse click the texture icon and select ‘Fit to Object’ and that’s it! Repeat the process for the ‘Right Seaweed’ image and any others that you want to use. You should probably name the file something like ‘Left Seaweed’ so you know what it is as you start importing additional seaweed objects.
26) Now copy and paste this object (or objects) into your volumetric scene project to replace the decal planes, and you have a clean model of the seaweed. This is actually very quick once you understand the process. Hope this helps!
Here is a finished render using the process I outlined in this blog, for both the seaweed particulates floating in the water and the seaweed growing on the seafloor:
-
Unlocking table data using open source OCR
This summer we were awarded a small research grant from NASA’s Technology Data and Innovation Division to investigate extracting structured information from scans of engineering documents, and we recently demoed our proof-of-concept app for the project to NASA’s Office of the CIO. Our previous work for the NASA Extra Vehicular Activities (EVA) Office used serverless cloud computing and optical character recognition (OCR) to extract unstructured text, and make documents searchable. For this project, NASA asked us to retrieve structured tabular data from the parts lists in their technical diagrams. Because manual entry of these details is tedious, slow, and error-prone, NASA is looking for software tools to assist human technicians by making this process easier, faster, and more accurate.
After surveying the literature, we came up with several candidate approaches. Though we initially expected to use OCR software to solve the entire problem, we found it was unable to reliably extract all the content from tables it identified in the diagrams. In the end, we came up with a three-step approach combining best-of-breed open-source tools: (1) use techniques from computer vision to identify horizontal and vertical lines; (2) cluster the parallel lines to infer table rows and columns (and, by extension, cells); (3) extract text from the cells using OCR.
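To make the three steps concrete, here is a minimal sketch of the idea (not our production code), using OpenCV to find the table lines and Tesseract, via pytesseract, to read each cell. The kernel lengths and thresholds are assumptions, and it presumes the table region has already been cropped out of the page:

import cv2
import numpy as np
import pytesseract


def cluster(coords, gap=5):
    """Collapse runs of nearby pixel coordinates into single line positions."""
    groups = []
    for c in coords:
        if groups and c - groups[-1][-1] <= gap:
            groups[-1].append(c)
        else:
            groups.append([c])
    return [int(np.mean(g)) for g in groups]


def extract_table(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

    # (1) Morphological opening with long, thin kernels keeps only long lines.
    horiz = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                             cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
    vert = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))

    # (2) Cluster the parallel lines into row and column boundaries.
    rows = cluster(np.where((horiz > 0).sum(axis=1) > horiz.shape[1] // 2)[0])
    cols = cluster(np.where((vert > 0).sum(axis=0) > vert.shape[0] // 2)[0])

    # (3) OCR the region between each adjacent pair of boundaries (a cell).
    table = []
    for y0, y1 in zip(rows, rows[1:]):
        table.append([pytesseract.image_to_string(gray[y0:y1, x0:x1],
                                                  config='--psm 6').strip()
                      for x0, x1 in zip(cols, cols[1:])])
    return table

In practice we tune the kernel lengths and clustering gap to the scan resolution and clean up the OCR output before writing the CSV.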
With the server-side algorithm identified, we developed a simple, focused UI to help users feed in the images of parts list tables. First, the user selects and uploads a document (Figure 1), which our software converts to an image for display. The user then “lassos” the desired table inside of this image (Figure 2). Finally, the server does the extraction and returns a downloadable CSV which the user can view/edit in Excel, Google Sheets, etc (Figure 3).
Since we can apply our technique to extract text and row, column, and cell relationships from any tabular data source and we can’t post NASA’s sensitive spaceflight hardware diagrams here, we’ll be substituting an engineering diagram we found on the Internet.

Figure 1: user-uploaded diagram 
Figure 2: user lassos table 
Figure 3: the extracted table text in a spreadsheet

As you can see, the accuracy of text extraction and row, column, and cell preservation is outstanding, even when starting with a low-resolution, low-contrast scan of a technical drawing.
We're very happy with the results of this quick proof-of-concept and look forward to applying it to new data sets and use cases to refine it further. We have some ideas for improving the feature set, and are really interested in comparing and/or combining it with AWS Textract to prepare data sets for domain-specific tabular data extraction AIs! If you're interested in scheduling a demo or have suggestions on future directions for this work, please contact us at info@v-studios.com or leave a comment below!
-
Lambda-generated Presigned S3 URLs with AES encryption: CORS is Hell
This is a follow-up to a previous post on how we use Lambdas to generate presigned URLs so that a user's browser can upload directly to S3. We now want our S3 bucket to enforce server-side encryption for all uploaded files. Getting all the pieces to work together was a bit hairy: bucket policies, URL settings, HTTP headers, and mostly the dreaded CORS configuration. This approach should be applicable to other upload properties as well. Finally, we close with a comparison of the default AWS signature algorithm and the newer V4 signatures.
Architecture
The upload portion of our architecture looks like the following diagram. An Angular application is served from an S3 bucket to the browser. It has a component to select a file and invoke a getUploadURL function which sends the filename and MIME type to a Lambda function; the Lambda calculates a presigned URL which permits uploading for a short time, using the IAM permissions applied to the Lambda. This allows the browser to do secure uploads without leaking credentials; more details on this are in that earlier post.
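For context, the getUploadURL Lambda is just a thin wrapper around generate_presigned_url behind API Gateway. Here is a minimal sketch, not our production handler, where the bucket env var, key prefix, and request body shape are assumptions:

import json
import os

import boto3

s3 = boto3.client('s3')
UPLOAD_BUCKET_NAME = os.environ['UPLOAD_BUCKET_NAME']  # assumed environment variable


def get_upload_url(event, context):
    # Expects a JSON body like {"filename": "mydoc.pdf", "content_type": "application/pdf"}
    body = json.loads(event['body'])
    url = s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': UPLOAD_BUCKET_NAME,
                'Key': 'doc_pdf/' + body['filename'],
                'ContentType': body['content_type'],
                'ServerSideEncryption': 'AES256'},
        ExpiresIn=300)
    return {'statusCode': 200,
            'headers': {'Access-Control-Allow-Origin': '*'},
            'body': json.dumps({'url': url})}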
Our system resources, policies, and Lambda code are defined using the Serverless Framework; it tames the complexity and makes deployment a breeze.

S3 Bucket Policy Enforces Crypto
We define a policy on our S3 bucket that requires uploads to use server-side encryption (SSE) with the AES-256 cypher. It does this by checking the appropriate headers supplied with the upload. Rather than repeat it here, check the AWS docs.

Lambda returns Presigned URLs with SSE
When we generate the presigned URL, we include a requirement for SSE using AES. We're using Python and the Boto3 SDK.

s3 = boto3.client('s3')  # See below about non-default Signature version 4
params = {'Bucket': UPLOAD_BUCKET_NAME,
          'Key': 'doc_pdf/' + filename,
          'ContentType': content_type,
          'ServerSideEncryption': 'AES256'}
url = s3.generate_presigned_url('put_object',
                                Params=params,
                                ExpiresIn=PSU_LIFETIME_SECONDS)

The URL we get includes query string parameters indicating we want x-amz-server-side-encryption, and the shape of the URL depends on the AWS signature version we're using (see below). This seems fine, but it doesn't actually force the encryption. The generated URL can only specify information on the URL's query strings, but S3 doesn't look at those; it looks for HTTP headers to tell it how to disposition the upload.

Browser Must Set SSE HTTP Headers
Since S3 wants HTTP headers to tell it to enable encryption (as well as Content-Type and other metadata), we must have our client code set them. In our Angular app, we do this:

putUploadFile(uploadURL: UploadURL, file: File, fileBody): Observable<any> {
  const headers = {
    'Content-Type': file.type,
    'x-amz-server-side-encryption': 'AES256',  // force SSE AES256 on PUT
  };
  const options = { 'headers': headers };
  return this.http.put(uploadURL.url, fileBody, options).pipe(
    tap(res => console.log(`putUploadFile got res=${JSON.stringify(res)}`)),
    catchError(this.handleError<UploadURL>('putUploadFile', null)));
}

Watching the browser console, we can grab the generated URL and use "curl" to PUT to the S3 bucket with the same presigned URL and HTTP headers; our upload works:

curl -v -X PUT \
  -H "Content-Type: application/pdf" \
  -H "x-amz-server-side-encryption: AES256" \
  --upload-file mydoc.pdf \
  $PresignedUrlWeGotFromLambda

However, when the Angular app does the HTTP PUT, it fails.

NG PUT requires S3 CORS allowing SSE Header
The console shows errors in the HTTP OPTIONS preflight check; this sure smells like a CORS problem. When we had our serverless.yml create our bucket, we defined a CORS configuration that allowed us to PUT and to specify Content-Type headers. We just need to add a new CORS setting to tolerate the SSE header.

Type: AWS::S3::Bucket
Properties:
  BucketName: ${self:custom.s3_name}
  CorsConfiguration:
    # Needed so WebUI can do OPTIONS preflight check
    CorsRules:
      - AllowedMethods:
          - PUT
        AllowedOrigins:
          - "*"
        AllowedHeaders:
          - content-type
          - x-amz-server-side-encryption

We could have configured it with AllowedHeaders: "*", but that's more permissive than we'd like, so we opt to be explicit in what we tolerate. We redeploy our Serverless stack to update the S3 configuration, and our app starts uploading successfully! If you're not doing this with Serverless, just update through the AWS Console or whatnot. Now we can see the files we uploaded are AES-256-encrypted.

AWS Signature: default versus V4
By default, the boto3 S3 client is not using AWS Signature Version 4, and the upload does work. We've used V4 before on other projects and understood it to be best practice; we thought it might be required, but it turns out it's not. However, we can enable V4, and it works great. Interestingly, the generated presigned URLs are very different.

In both cases, the base URL we get is the same:

https://myuploads-dev.s3.amazonaws.com/doc_pdf/mydoc.pdf

There are significant differences in the query string parameters appended to this. Below we show the decoded parameters for comparison.

Default Signature
We get an S3 client with the default signature algorithm:

s3 = boto3.client('s3')

The query string parameters are:

AWSAccessKeyId: ASPI31415926535
Signature: Vqfl0NqIrr6ifBB3f9T1hXI5/+U=
content-type: application/pdf
x-amz-server-side-encryption: AES256
x-amz-security-token: …
Expires: 1541015925

AWS V4 Signature
We can request the V4 signature like:

s3 = boto3.client('s3', config=Config(signature_version='s3v4'))

The query string parameters become:

X-Amz-Algorithm: AWS4-HMAC-SHA256
X-Amz-Credential: ASPI31415926535/20181031/us-east-1/s3/aws4_request
X-Amz-Date: 20181031T190519Z
X-Amz-Expires: 3600
X-Amz-SignedHeaders: content-type;host;x-amz-server-side-encryption
X-Amz-Security-Token: …
X-Amz-Signature: a22b58dce238ed393026027ec0b40a7ffd0a9647d792fb0cc3d720bc1cc89fe4

Wrap-up
There is a lot of cat-herding to make this work, but once in place, it works beautifully: enforced encryption, time-limited presigned URLs, and browser uploads to S3. Now that we know all the pieces that need to be addressed, we can use the same approach to add other S3 object properties, like read-only ACLs, expiration dates, etc.