# Security

# Orchestra Security

# Audit Logging

You can enable audit logging for the Orchestra service with the following settings:

orchestra.security.audit.enabled=true
orchestra.security.audit.log.path=/path/to/orchestra-audit.log

The following events are logged:

  • WORKFLOW_START: Logged when a workflow is started. The event contains the workflow name and the time.
  • WORKFLOW_PRIVILEGED_ACTION_EVENT: Logged when a workflow containing a security sensitive component is started (for example, a MessageTransformer or MessageFilter that uses the Python Scripting Engine). The event contains the workflow name, the time, and the reason for the privilege, e.g. WORKFLOW WITH 'PYTHON SCRIPTING ENGINE' STARTED.
  • PRIVILEGED_WORKFLOW_COMPONENT_FAILURE: Logged when a security sensitive component fails. The event contains the name of the workflow, the name of the component, and the type of the component.
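
As a rough illustration of how these events could be consumed, the sketch below counts audit events by type. It assumes that each event is written on its own line and that the line contains the event name verbatim. The actual log line format is not specified here, and the function name is hypothetical.

```python
from collections import Counter

# Event names taken from the list above.
AUDIT_EVENTS = (
    "WORKFLOW_START",
    "WORKFLOW_PRIVILEGED_ACTION_EVENT",
    "PRIVILEGED_WORKFLOW_COMPONENT_FAILURE",
)


def count_audit_events(lines):
    """Count audit events per type in an iterable of log lines.

    Assumption: one event per line, with the event name appearing verbatim.
    Lines that mention no known event name are ignored.
    """
    counts = Counter()
    for line in lines:
        for event in AUDIT_EVENTS:
            if event in line:
                counts[event] += 1
                break  # count each line at most once
    return counts
```

A file object can be passed directly, for example `count_audit_events(open("/path/to/orchestra-audit.log"))`, using the path configured in orchestra.security.audit.log.path.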

# Python Scripting Engine Security Risks

Orchestra components that use the Python Scripting Engine are considered security sensitive because Hume users can use them to write and execute arbitrary Python scripts. Other Orchestra components are security tested, but GraphAware has no control over the scripts that Hume users write and execute with security sensitive components. For that reason, security sensitive components must be explicitly enabled using the experimental flag.

Potential security risks include the following:

  • The Python script can create files
  • The Python script can make HTTP calls
  • The Python script can access environment variables
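
To make these risks concrete, the following sketch shows what an unrestricted Python script can do with the standard library alone. It is plain Python run outside Orchestra, not an Orchestra component, and the function name and file prefix are hypothetical.

```python
import os
import tempfile


def demonstrate_script_capabilities():
    """Create (and then delete) a file and inspect the process environment."""
    # A script can create files wherever the process user is allowed to write.
    fd, path = tempfile.mkstemp(prefix="orchestra-demo-")
    with os.fdopen(fd, "w") as handle:
        handle.write("created by a script\n")
    file_created = os.path.exists(path)
    os.remove(path)  # clean up the demonstration file

    # A script can read the environment of the process, which may hold secrets.
    env_names = sorted(os.environ)
    return {"file_created": file_created, "env_names": env_names}
```

Only the names of the environment variables are collected here; a hostile script could just as easily read their values.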

# Python Scripting Engine Security Risk Mitigation

# Option 1: Minimise Potential Risk

The security risks are bounded by the permissions of the Linux system user that runs the Orchestra service: a script can only reach the locations that this user can reach.

The easiest, least secure option is to minimise the potential risks by:

  • Preventing the use of the urllib module, because the scripting engine is not granted network access. GraphAware has already done this.
  • Making the Linux user that runs Orchestra a dedicated Orchestra user with the fewest possible privileges
  • Having a dedicated role in Hume for Orchestra workflow editors and assigning this role to trusted users only (i.e. users that otherwise have administrative privileges)

# Option 2: Harden Python Scripting Engine

It is possible to restrict what the Python Scripting Engine can do, by enabling the embedded Python Interpreter to run in a thread managed by a dedicated Java Security Manager.

The Java Security Permissions for it are listed below. You can refer to https://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html for a description of what the permission names mean and the security risks associated with the granted permissions.

grant {
    permission java.lang.RuntimePermission "createClassLoader";
    permission java.lang.RuntimePermission "getProtectionDomain";
    permission java.lang.RuntimePermission "accessDeclaredMembers";
    permission java.lang.RuntimePermission "getClassLoader";
    permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
    permission java.util.logging.LoggingPermission "control";
    permission java.io.FilePermission "/${java.io.tmpdir}/-", "read";
    permission java.io.FilePermission "${user.dir}/-", "read";
    permission java.io.FilePermission "/private/${java.io.tmpdir}/-", "read";
    permission java.io.FilePermission "${java.class.path}/-", "read";
    permission java.util.PropertyPermission "java.vm.name", "read";
    permission java.util.PropertyPermission "java.vm.vendor", "read";
    permission java.util.PropertyPermission "os.name", "read";
    permission java.util.PropertyPermission "os.arch", "read";
    permission java.util.PropertyPermission "line.separator", "read";
    permission java.util.PropertyPermission "jnr.ffi.provider", "read";
    permission java.util.PropertyPermission "sun.arch.data.model", "read";
    permission java.util.PropertyPermission "jnr.constants.fake", "read";
    permission java.util.PropertyPermission "user.dir", "read";
};
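
When reviewing a policy like the one above, it can help to list the granted permissions programmatically. The following is a minimal sketch that parses `permission` entries with a regular expression and reports the file targets granted a given action. It handles only the simple entry form shown above (no `signedBy` or `codeBase` clauses), and the function names are hypothetical.

```python
import re

# Matches: permission <class> "<target>"[, "<actions>"];
PERMISSION_RE = re.compile(
    r'permission\s+(?P<cls>[\w.]+)\s+"(?P<target>[^"]*)"'
    r'(?:\s*,\s*"(?P<actions>[^"]*)")?\s*;'
)


def parse_permissions(policy_text):
    """Return a list of dicts describing each permission entry."""
    entries = []
    for match in PERMISSION_RE.finditer(policy_text):
        actions = match.group("actions")
        entries.append({
            "class": match.group("cls"),
            "target": match.group("target"),
            "actions": [a.strip() for a in actions.split(",")] if actions else [],
        })
    return entries


def file_targets_with(entries, action):
    """Return FilePermission targets that are granted the given action."""
    return [
        entry["target"]
        for entry in entries
        if entry["class"] == "java.io.FilePermission" and action in entry["actions"]
    ]
```

For the policy above, `file_targets_with(entries, "write")` would be expected to return an empty list, since every file permission is read-only.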

To enable the Security Manager, configure the following:

hume.orchestra.security.serverPolicy.enabled=true

and add the argument -Dsun.misc.URLClassPath.disableJarChecking=true to the Java launch command of the systemd service. Full command example:

ExecStart=/usr/bin/java -Dsun.misc.URLClassPath.disableJarChecking=true -Djava.net.preferIPv4Stack=true -Djavax.net.ssl.trustStore=/opt/hume/security/hume-truststore -jar hume-orchestra-<version>.jar

Note: For Docker users, the launch command is added automatically.

# Limitations

The security permissions above still need to be handled with care when running Orchestra, for the following reasons:

  • Read access to the directory where the Orchestra jar is located needs to be granted, as the scripting engine needs to locate the jar and read its content
  • Files in the same directory or below can be read

Examples:

A MessageTransformer with the following content:

def __call__():
  with open('/opt/hume/application.yml', 'r') as f:
    body['app'] = f.read()
    return body

will work.

If the Orchestra Engine is running with a dedicated Linux system user, the following script:

def __call__():
  with open('/etc/passwd', 'r') as f:
    body['hack'] = f.read()
    return body

will not be allowed.
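
The difference between the two scripts can be pictured with a simplified model of a recursive `"<directory>/-"` file permission. The sketch below is not how the Java Security Manager is implemented. It only checks whether a path lies below a granted directory, it does not resolve symbolic links, and it assumes, purely for illustration, that `${user.dir}` is /opt/hume.

```python
import posixpath


def is_covered(path, directory):
    """Return True if path lies below directory.

    Simplified model of a "<directory>/-" grant: everything contained in the
    directory, recursively, is covered, but not the directory itself.
    Both arguments must be absolute POSIX paths.
    """
    path = posixpath.normpath(path)
    directory = posixpath.normpath(directory)
    if path == directory:
        return False
    return posixpath.commonpath([path, directory]) == directory


# Hypothetical working directory of the Orchestra process.
USER_DIR = "/opt/hume"

print(is_covered("/opt/hume/application.yml", USER_DIR))
print(is_covered("/etc/passwd", USER_DIR))
```

Normalising the path first means that a traversal attempt such as /opt/hume/../../etc/passwd is evaluated as /etc/passwd and is not treated as covered.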

Moreover, some module imports do not work, for technical reasons that are not yet resolved:

  • Importing the re module throws an exception, because the underlying module tries to create a cache on disk

A matrix of the available Python modules and their working state under the Security Manager will be provided in the future.

# java.util.regex as replacement

Jython can import Java classes from within the interpreter. The classes in Java's java.util.regex package can be used as a replacement for Python's re module.

### importing ###
# import re
# becomes
import java.util.regex.Pattern as Pattern


### simple match check ###
# if re.match(PATTERN, string):
#    does_match()
# else:
#    does_not_match()
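# becomes (lookingAt() anchors at the start only, like re.match;
# Pattern.matches() would require the whole string to match)
if Pattern.compile(PATTERN).matcher(string).lookingAt():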
    does_match()
else:
    does_not_match()


### RE matching ###
# match_object = re.match(PATTERN, string)
# becomes (a Matcher is always returned, never None; call lookingAt() or find() before reading groups)
match_object = Pattern.compile(PATTERN).matcher(string)

### Get the start character index for every match ###
# start_chars = [i.span()[0] for i in re.finditer(PATTERN, string)]
# becomes
def finditer(PATTERN,string):
  match = Pattern.compile(PATTERN).matcher(string)
  while match.find():
    yield match
start_chars = [i.start() for i in finditer(PATTERN,string)]


### RE replace ###
# purged = re.sub(r"[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]", "", text)
# becomes
pattern = Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]")
matcher = pattern.matcher(text)
purged = matcher.replaceAll("")

Note that re.match and java.util.regex.Matcher have different interfaces.

A complete reference for the java.util.regex package can be found in Oracle's Java documentation.

Jython handles dynamic type management automatically.

Further information is available in Jython's documentation.
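
If the same script has to run both inside Jython and in a regular Python interpreter (for example while testing it locally), the two regex back ends can be hidden behind one small helper. The following is a sketch under that assumption. The helper name `match_starts` is hypothetical, and the Java branch is only taken when the java.util.regex import succeeds, so the re module is never imported under Jython.

```python
try:
    # Succeeds only under Jython, where Java classes are importable.
    from java.util.regex import Pattern
    HAVE_JAVA_REGEX = True
except ImportError:
    # Regular Python interpreter: fall back to the standard re module.
    import re
    HAVE_JAVA_REGEX = False


def match_starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    if HAVE_JAVA_REGEX:
        matcher = Pattern.compile(pattern).matcher(text)
        starts = []
        while matcher.find():
            starts.append(matcher.start())
        return starts
    return [m.start() for m in re.finditer(pattern, text)]
```

Python and Java regex syntax overlap for common constructs but are not identical, so patterns should be checked against both engines before relying on this helper.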

# Option 3: Prevent the Use of Python Scripting Engine

The most secure option is to prevent the use of the Python Scripting Engine altogether by choosing one of the following alternatives:

  • a) Writing the Python code in a codebase that Hume users have no control over, exposing it through an HTTP endpoint, and letting Hume users call the endpoint with the Enricher component in Orchestra
  • b) Rewriting the Python code as a Java extension in a codebase that you control, and deploying it as an Orchestra plugin.
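
Alternative a) could look roughly like the sketch below, which exposes a Python function over HTTP using only the standard library. The transformation, the JSON payload shape, and the helper names are hypothetical. How the Enricher component is configured to call the endpoint is not covered here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def transform(body):
    """Hypothetical transformation: add an upper-cased copy of 'name'."""
    result = dict(body)
    result["name_upper"] = str(body.get("name", "")).upper()
    return result


class TransformHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
            if not isinstance(body, dict):
                raise ValueError("expected a JSON object")
            payload, status = transform(body), 200
        except ValueError:
            # Covers malformed JSON as well as non-object payloads.
            payload, status = {"error": "invalid JSON object"}, 400
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the operating system pick a free port."""
    return HTTPServer((host, port), TransformHandler)
```

Calling `make_server(port=8080).serve_forever()` would start the endpoint on a fixed port. Because the code lives outside Hume, workflow editors can call it but cannot change what it executes.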

# Orchestra Configuration Recommendations

Orchestra runs as a standalone component (meaning it is aware of neither the API nor a database), and for some of its features (auto-restart of workflows or workflow scheduling) it needs to know the workflow configurations without the API.

To do so, since version 2.7, the API sends the workflow configuration in the background to the Orchestra API, which in turn continuously keeps a copy of the configuration on disk.

WARNING

Workflow configurations contain Resource passwords as well.

In order to meet standard security requirements, the following recommendations are in place:

  1. Control where Orchestra stores the workflow configurations. To do so, provide the directories in the following sections of the Orchestra application.yml file:
[root@hume-vm orchestra]# cat application.yml
server:
  port: 8100
orchestra:
  datasource:
    filesystemRoot: /opt/hume/orchestra/config
  startup:
    workflows:
      directory: /opt/hume/orchestra/config
      ......

With the previous configuration in place, you will see content like the following in the directory after interacting with the Orchestra editor in the frontend (creating new workflows, editing existing ones, or starting one of them):

[root@hume-vm orchestra]# cd config/
[root@hume-vm config]# ll
total 204
-rw-r--r--. 1 hume hume   270 Dec 11 14:11 Import wines_197ebb87-94ef-4e9a-b0c9-eb3213161d05.json
-rw-r--r--. 1 hume hume 24762 Dec 11 14:11 Import wines_197ebb87-94ef-4e9a-b0c9-eb3213161d05.workflow.json
-rw-r--r--. 1 hume hume   256 Nov 16 13:30 Test1_31b7347c-fb79-4383-a17b-55bb8b2ed538.json
-rw-r--r--. 1 hume hume   259 Nov 16 13:27 Test1_31b7347c-fb79-4383-a17b-55bb8b2ed538.scheduler.json
-rw-r--r--. 1 hume hume 14287 Nov 20 11:05 Test1_31b7347c-fb79-4383-a17b-55bb8b2ed538.workflow.json
-rw-r--r--. 1 hume hume   147 Dec  5 10:33 copy_e8511fd9-f249-4802-aff4-e2a61d3a0371.workflow.json
-rw-r--r--. 1 hume hume   262 Dec  6 20:11 imported_ff7562d7-892b-416f-b279-f17914a6eff0.json
-rw-r--r--. 1 hume hume 32621 Dec  6 21:23 imported_ff7562d7-892b-416f-b279-f17914a6eff0.workflow.json
-rw-r--r--. 1 hume hume   276 Nov 25 22:15 secure workflow_4e1c511c-ddc4-4810-9496-b83ad632469d.json
-rw-r--r--. 1 hume hume 17168 Nov 25 22:15 secure workflow_4e1c511c-ddc4-4810-9496-b83ad632469d.workflow.json
-rw-r--r--. 1 hume hume   260 Dec  6 22:28 securea_ede1445d-b211-41b6-a42f-5a5430e27e9c.json
-rw-r--r--. 1 hume hume 13870 Dec  7 03:16 securea_ede1445d-b211-41b6-a42f-5a5430e27e9c.workflow.json
-rw-r--r--. 1 hume hume   256 Dec  6 21:51 testa_8d4e1930-8142-403e-898e-1ab6f7cfbf3e.json
-rw-r--r--. 1 hume hume 13567 Dec  6 21:45 testa_8d4e1930-8142-403e-898e-1ab6f7cfbf3e.workflow.json
-rw-r--r--. 1 hume hume  5668 Dec  6 21:42 testb_1baf3c49-89c0-407e-841c-686d156b2620.workflow.json
-rw-r--r--. 1 hume hume   256 Dec  6 21:44 testc_349dbe90-0832-48b0-bfcf-7d29774188ee.json
-rw-r--r--. 1 hume hume  4300 Dec  6 22:18 testc_349dbe90-0832-48b0-bfcf-7d29774188ee.workflow.json
-rw-r--r--. 1 hume hume   153 Dec  5 10:32 wines2.0wf_9b9f0ec7-be47-4227-a34c-c4759e1624d4.workflow.json
-rw-r--r--. 1 hume hume   266 Nov 16 14:30 workflow 2_30a12053-ad24-41d9-9c59-0861b98adac6.json
-rw-r--r--. 1 hume hume   264 Nov 16 13:45 workflow 2_30a12053-ad24-41d9-9c59-0861b98adac6.scheduler.json
-rw-r--r--. 1 hume hume 10417 Nov 16 13:52 workflow 2_30a12053-ad24-41d9-9c59-0861b98adac6.workflow.json
  2. Use Variable Encryption for Resources

When a Resource uses a sensitive variable, such as the password of a Neo4j instance, create it as an Ecosystem variable and enable Encryption on the variable.

It is then transmitted in its encrypted form and decrypted only at the last moment it is needed (when the workflow is started).

# Testing Security

Here is a scenario to validate that all the components of Hume have been properly configured for maximum security:

  1. Create a Variable neo4j-password-secure with a value, e.g. secret, and enable encryption:
  2. Create a Neo4j Resource and use $$neo4j-password-secure as the password value (the $$ form indicates an Ecosystem Variable)
  3. Verify that the password is stored encrypted in PostgreSQL:
[root@hume-vm ~]$ sudo -u postgres -i
-bash-4.2$ psql
psql (12.3)
Type "help" for help.

postgres=# \c hume;
You are now connected to database "hume" as user "postgres".

hume=# SELECT * FROM ecosystem_variable WHERE key = 'neo4j-password-secure';



                 uuid                 |       created_at        |       updated_at        | encrypted |          key          |          value           |           created_by_uuid            | last_updated_by_uuid
--------------------------------------+-------------------------+-------------------------+-----------+-----------------------+--------------------------+--------------------------------------+----------------------
 7b73b034-6865-46e9-9fac-77dc502c3c82 | 2020-12-16 09:53:32.079 | 2020-12-16 09:53:32.079 | t         | neo4j-password-secure | B6PwpCp14nqYsxGlFgyUkg== | d3fd8a5f-0451-4f27-8401-9163dac9d2df |
(1 row)

The value column should contain the encrypted value, not the plaintext one.

  4. Create a workflow in any Knowledge Graph, add a Neo4j Writer, and select the previously created resource.
  5. Check the workflow on disk in the /opt/hume/orchestra/config (example) directory:
[root@neossl-vm orchestra]# ls -l config/
total 212
-rw-r--r--. 1 hume hume  4353 Dec 16 10:00 security assessment_dd3dbe6b-825a-46d5-979f-1a63041ede75.workflow.json
[root@neossl-vm orchestra]#
  6. Verify that the file contains the encrypted value and not the plaintext one:
[root@hume-vm config]# cat security\ assessment_dd3dbe6b-825a-46d5-979f-1a63041ede75.workflow.json | grep secret
[root@hume-vm config]#

Optionally, search for AES, which is part of the ###AES- prefix of any encrypted variable in a workflow configuration:

cat security\ assessment_dd3dbe6b-825a-46d5-979f-1a63041ede75.workflow.json | grep AES
{"name":"security assessment_dd3dbe6b-825a-46d5-979f-1a63041ede75","autoStart":false,"components":[{"name":"Neo4j Writer_6d749714-5a23-4e36-bdf5-b80224afe24e","component":{"qualifiedName":"#Hume.Orchestra.Persistence.Neo4jWriter","name":"Neo4j Writer","type":"PERSISTENCE","state":"AVAILABLE","inputType":"javax.json.JsonObject","outputType":"javax.json.JsonObject","options":[{"group":"Main","name":"stream_records","type":"BOOLEAN","label":"Stream Records","required":true,"defaultValue":false,"overridable":true},{"group":"Advanced","name":"batch_writes","type":"BOOLEAN","label":"Batch Writes","required":false,"defaultValue":false,"overridable":true},{"group":"Advanced","name":"batch_size","type":"INTEGER","label":"Batch size","required":false,"defaultValue":20,"overridable":true},{"group":"Advanced","name":"aggregation_timeout","type":"INTEGER","label":"Aggregation Timeout","required":false,"defaultValue":300,"overridable":true},{"group":"Advanced","name":"max_retry","type":"INTEGER","label":"How many times to retry Transient Errors ?","required":false,"defaultValue":0,"overridable":true},{"group":"Advanced","name":"delay","type":"INTEGER","label":"Delay in millis between retries","required":false,"defaultValue":0,"overridable":true},{"group":"Main","name":"store_error_content","type":"BOOLEAN","label":"Store latest errors content?","required":true,"defaultValue":true,"overridable":true},{"group":"Main","name":"store_message_content","type":"BOOLEAN","label":"Store latest messages content?","required":true,"defaultValue":true,"overridable":true},{"group":"Failure Handling","name":"failure_handler","type":"COMPONENT_SELECTION_FAILURE_HANDLER","label":"Failure 
Handler","required":false,"defaultValue":null,"overridable":true}],"resource":"com.hume.ecosystem.resource.Neo4jResource","capabilityQualifiedName":null,"experimental":null,"dryRunSupported":false},"options":[{"name":"batch_writes","value":false},{"name":"batch_size","value":20},{"name":"delay","value":0},{"name":"stream_records","value":false},{"name":"store_error_content","value":true},{"name":"store_message_content","value":true},{"name":"max_retry","value":0},{"name":"aggregation_timeout","value":300},{"name":"failure_handler","value":null}],"resource":{"name":"neo4j-security-test_ab9a4862-ea16-445b-9896-b7283949016c","resource":{"qualifiedName":"#Hume.Orchestra.Resource.Neo4j","name":"Neo4j Database","parameters":[{"name":"host","type":"STRING","required":true},{"name":"port","type":"INTEGER","required":false},{"name":"user","type":"STRING","required":true},{"name":"password","type":"PASSWORD","required":true},{"name":"database","type":"STRING","required":false},{"name":"read_only","type":"BOOLEAN","required":false}],"group":"DATABASES","experimental":false},"parameters":{"host":"localhost","port":7687,"user":"neo4j","password":"###AES-B6PwpCp14nqYsxGlFgyUkg==","database":"neo4j","read_only":false}},"skill":null,"skillConfiguration":{"outputs":{},"inputs":{}}}],"links":[],"resources":[{"name":"neo4j-security-test_ab9a4862-ea16-445b-9896-b7283949016c","resource":{"qualifiedName":"#Hume.Orchestra.Resource.Neo4j","name":"Neo4j 
Database","parameters":[{"name":"host","type":"STRING","required":true},{"name":"port","type":"INTEGER","required":false},{"name":"user","type":"STRING","required":true},{"name":"password","type":"PASSWORD","required":true},{"name":"database","type":"STRING","required":false},{"name":"read_only","type":"BOOLEAN","required":false}],"group":"DATABASES","experimental":false},"parameters":{"host":"localhost","port":7687,"user":"neo4j","password":"###AES-B6PwpCp14nqYsxGlFgyUkg==","database":"neo4j","read_only":false}}],"skills":[],"workflowResources":[{"name":"neo4j-security-test_ab9a4862-ea16-445b-9896-b7283949016c","resource":{"qualifiedName":"#Hume.Orchestra.Resource.Neo4j","name":"Neo4j Database","parameters":[{"name":"host","type":"STRING","required":true},{"name":"port","type":"INTEGER","required":false},{"name":"user","type":"STRING","required":true},{"name":"password","type":"PASSWORD","required":true},{"name":"database","type":"STRING","required":false},{"name":"read_only","type":"BOOLEAN","required":false}],"group":"DATABASES","experimental":false},"parameters":{"host":"localhost","port":7687,"user":"neo4j","password":"###AES-B6PwpCp14nqYsxGlFgyUkg==","database":"neo4j","read_only":false}}]}
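
The grep check above can also be automated. The sketch below walks a workflow JSON document and reports every "password" value that does not start with the ###AES- prefix seen in the output above. It assumes that every password in the file is expected to come from an encrypted variable, so a resource configured with a plain password is reported as well. The function names are hypothetical.

```python
import json

ENCRYPTED_PREFIX = "###AES-"  # prefix observed in the workflow file above


def find_plaintext_passwords(node, path="$"):
    """Yield the JSON path of each "password" string lacking the prefix."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = "%s.%s" % (path, key)
            if key == "password" and isinstance(value, str):
                if not value.startswith(ENCRYPTED_PREFIX):
                    yield child
            else:
                yield from find_plaintext_passwords(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_plaintext_passwords(item, "%s[%d]" % (path, index))


def check_workflow_file(filename):
    """Return the paths of unencrypted passwords found in a workflow file."""
    with open(filename) as handle:
        return list(find_plaintext_passwords(json.load(handle)))
```

An empty list from `check_workflow_file` corresponds to the empty `grep secret` result in step 6, without needing to know the plaintext value in advance.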