28 changes: 19 additions & 9 deletions README.md
@@ -67,10 +67,19 @@ TiDE can be used in various environments. Below are the prerequisites and instru

1. In the command window, execute the following

```
mvn clean install -DskipTests
java -jar ./target/deid-3.0.21-SNAPSHOT-dataflow.jar --deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=text --phiFileName=./phi/phi_person_data_example.csv --personFile=./person_data/person.csv --inputResource=./sample_notes --outputResource=./output
```
1. If the notes file is prepared in text format:

```
mvn clean install -DskipTests
java -jar ./target/deid-3.0.31-SNAPSHOT-dataflow.jar --deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=text --phiFileName=./phi/phi_person_data_example.csv --personFile=./person_data/person.csv --inputResource=./sample_notes --outputResource=./output
```

2. If the notes file is prepared in jsonl format:

```
mvn clean install -DskipTests
java -jar ./target/deid-3.0.31-SNAPSHOT-dataflow.jar --deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=local --inputResource=./sample_notes_jsonl/notes.json --outputResource=./output --textIdFields="note_id" --textInputFields="note_text"
```

2. [Sample Input](#Sample-Input-Local)
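
Review note: the two commands in step 1 differ mainly in `--inputType`, which depends on whether `--inputResource` is a folder of text notes or a single jsonl file. A minimal sketch of that choice, assuming a hypothetical helper named `tide_input_type` that is not part of TiDE:

```shell
# Hypothetical helper (not part of TiDE): choose the --inputType value
# from the kind of path passed as --inputResource.
tide_input_type() {
  if [ -d "$1" ]; then
    echo "text"    # a folder of plain-text notes
  elif [ -f "$1" ]; then
    echo "local"   # a single newline-delimited JSON (jsonl) file
  else
    echo "unknown input: $1" >&2
    return 1
  fi
}
```

The result could be passed as `--inputType="$(tide_input_type ./sample_notes)"`. The other flags (`--phiFileName`, `--personFile`, `--textIdFields`, `--textInputFields`) still have to be supplied as in the commands above.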

@@ -94,7 +103,7 @@ TiDE can be used in various environments. Below are the prerequisites and instru

```java

java -jar /opt/deid/target/deid-3.0.21-SNAPSHOT-dataflow.jar --deidConfigFile=/workspaces/src/main/resources/--deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=text --phiFileName=/workspaces/phi/phi_person_data_example.csv --personFile=/workspaces/person_data/person.csv --inputResource=/workspaces/sample_notes --outputResource=/workspaces/output
java -jar /opt/deid/target/deid-3.0.31-SNAPSHOT-dataflow.jar --deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=text --phiFileName=/workspaces/phi/phi_person_data_example.csv --personFile=/workspaces/person_data/person.csv --inputResource=/workspaces/sample_notes --outputResource=/workspaces/output

```

@@ -119,7 +128,7 @@ TiDE can be used in various environments. Below are the prerequisites and instru


```java
java -jar -Xmx6g /opt/deid/target/deid-3.0.21-SNAPSHOT-dataflow.jar --deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=gcp_gcs --inputResource=gs://<INPUT_BUCKET_NAME>/sample_notes_jsonl/notes.json --outputResource=gs://<OUTPUT_BUCKET_NAME> --gcpCredentialsKeyFile=<SERVICE_ACCOUNT_KEY_DOWNLOADED> --textIdFields="id" --textInputFields="note"
java -jar -Xmx6g /opt/deid/target/deid-3.0.31-SNAPSHOT-dataflow.jar --deidConfigFile=./src/main/resources/deid_config_omop_genrep.yaml --annotatorConfigFile=./src/main/resources/annotator_config.yaml --inputType=gcp_gcs --inputResource=gs://<INPUT_BUCKET_NAME>/sample_notes_jsonl/notes.json --outputResource=gs://<OUTPUT_BUCKET_NAME> --gcpCredentialsKeyFile=<SERVICE_ACCOUNT_KEY_DOWNLOADED> --textIdFields="id" --textInputFields="note"
```

4. [Sample Input](#Sample-Input-GCP)
@@ -128,14 +137,15 @@ TiDE can be used in various environments. Below are the prerequisites and instru
### Sample Input Local

Sample Notes:
Please refer to ([sample notes folder](sample_notes))
For inputType="text": [sample notes folder](sample_notes)
For inputType="local": [sample notes jsonl folder](sample_notes_jsonl)

Input Arguments:

1. inputResource (mandatory) e.g. inputResource=/workspaces/sample_notes
When used with
1. "inputType=text", this argument specifies location of the folder with notes to be deid in text format. All files in this folder will be processed.
2. "inputType="local", this argument specifies the file with notes to be deid in newline delimited JSON files (jsonl) format.
2. "inputType=local", this argument specifies the file with notes to be deid in newline-delimited JSON (jsonl) format. You can use a jsonl file that contains only an id column and a free-text column as input if you only need NER or general pattern matching. If you have known PHI associated with the text, the PHI information must be embedded in the jsonl file.
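
Review note: a small illustration of the minimal jsonl input described above may help readers. The sketch below writes a two-line file whose field names match the `--textIdFields="note_id"` and `--textInputFields="note_text"` flags from the local jsonl command. The note contents and the temporary path are made up for illustration, and whether ids should be strings or numbers is an assumption.

```shell
# Hypothetical sample data: one JSON object per line, each with an id
# field and a free-text field (names taken from the jsonl command flags).
notes_dir="$(mktemp -d)"
notes_file="$notes_dir/notes.json"

cat > "$notes_file" <<'EOF'
{"note_id": "1", "note_text": "Patient seen on 01/02/2020 for follow-up."}
{"note_id": "2", "note_text": "No acute distress noted during the visit."}
EOF

# Each line is one note, so the line count equals the note count.
note_count="$(wc -l < "$notes_file" | tr -d ' ')"
echo "wrote $note_count notes to $notes_file"
```

A file like this would then be passed as `--inputResource` together with `--inputType=local`.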

### Sample Input GCP

@@ -487,7 +497,7 @@ deidJobs:
### Option two: Run DLP Native Job to find PHI

```
java -jar deid-3.0.21-SNAPSHOT.jar \
java -jar deid-3.0.31-SNAPSHOT.jar \
--gcpCredentialsKeyFile=<google_credential.json> \
--projectId=<google_project_id> \
--deidConfigFile=deid_config_omop_genrep.yaml \
5 changes: 3 additions & 2 deletions cloudbuild.yaml
@@ -3,11 +3,12 @@ steps:
entrypoint: mvn
args: ['-gs','maven/settings.xml','-Dmaven.test.skip=true','deploy']
env:
- 'REPO_USERNAME=$_REPO_USER'
- 'REPO_API_TOKEN=$_REPO_TOKEN'

# Docker Build
# Docker Build
- name: "gcr.io/cloud-builders/docker:20.10.14"
args: ["build", "-t", "${_ARTIFACT_REGISTRY_REPO}/tide:3.0.37${_SUFFIX}", "."]
args: ["build", "-t", "${_ARTIFACT_REGISTRY_REPO}/tide:3.0.30-SNAPSHOT", "."]

# Docker push to Google Artifact Registry
- name: "gcr.io/cloud-builders/docker:20.10.14"
70 changes: 70 additions & 0 deletions cloudbuild_release.yaml
@@ -0,0 +1,70 @@
# Access the id_github file from Secret Manager, and setup SSH
steps:
- name: 'gcr.io/cloud-builders/git'
secretEnv: ['SSH_KEY']
entrypoint: 'bash'
args:
- -c
- |
echo "$$SSH_KEY" >> /root/.ssh/id_rsa
chmod 400 /root/.ssh/id_rsa
cp known_hosts.github /root/.ssh/known_hosts
volumes:
- name: 'ssh'
path: /root/.ssh

- name: maven:3.9.0-eclipse-temurin-17-focal
entrypoint: bash
env:
- 'REPO_USERNAME=$_REPO_USER'
- 'REPO_API_TOKEN=$_REPO_TOKEN'
- 'BUILD_ID=$BUILD_ID'
- 'BRANCH_NAME=$BRANCH_NAME'
- 'SHORT_SHA=$SHORT_SHA'
args:
- -c
- |
apt -y update && apt -y install openssh-client && \
git config --global user.name "cloudbuild" && \
git config --global user.email "cloudbuild@example.com" && \
git config --global init.defaultBranch main && \
git branch -m ${BRANCH_NAME} && \
mvn -B -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn \
-gs maven/settings.xml \
release:prepare \
-DbranchName=${BRANCH_NAME} \
-Dproject.build=${BUILD_ID} \
-Dproject.commit=${SHORT_SHA} \
-Dproject.branch=${BRANCH_NAME} \
-DskipTests \
-Darguments="-DskipTests -Dmaven.javadoc.skip=true" \
-DtagNameFormat="v@{project.version}" \
-DscmCommentPrefix="[maven-release-plugin][ci skip]" && \
mvn -B -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn \
-gs maven/settings.xml \
release:perform \
-DbranchName=${BRANCH_NAME} \
-Dproject.build=${BUILD_ID} \
-Dproject.commit=${SHORT_SHA} \
-Dproject.branch=${BRANCH_NAME} \
-DskipTests \
-Darguments="-DskipTests -Dmaven.javadoc.skip=true"
volumes:
- name: 'ssh'
path: /root/.ssh

# Docker Build
- name: "gcr.io/cloud-builders/docker:20.10.14"
args: ["build", "-t", "${_ARTIFACT_REGISTRY_REPO}/tide:3.0.30", "."]

# Docker push to Google Artifact Registry
- name: "gcr.io/cloud-builders/docker:20.10.14"
args: ["push", "-a", "${_ARTIFACT_REGISTRY_REPO}/tide"]

availableSecrets:
secretManager:
- versionName: projects/som-irt-scci-dev/secrets/tide-github-key/versions/latest
env: 'SSH_KEY'

logsBucket: 'gs://cloud-build-dev.starr-data.us'
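
Review note: `release:prepare` derives the release version by dropping the `-SNAPSHOT` suffix from the POM version, and `-DtagNameFormat="v@{project.version}"` prefixes that version with `v`. A rough shell sketch of the resulting tag name, assuming the POM currently sits at `3.0.31-SNAPSHOT` (inferred from the jar name in the README, not confirmed by this diff):

```shell
# Assumed current POM version; replace with the real value from pom.xml.
snapshot_version="3.0.31-SNAPSHOT"

# The release version is the snapshot version without the -SNAPSHOT suffix.
release_version="${snapshot_version%-SNAPSHOT}"

# tagNameFormat="v@{project.version}" yields "v" followed by the release version.
tag_name="v${release_version}"
echo "$tag_name"   # prints v3.0.31
```

The Docker build step in this file tags the image with a hardcoded `3.0.30`, so the image tag and the Git tag may diverge unless both are kept in sync.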

99 changes: 73 additions & 26 deletions maven/settings.xml
@@ -8,14 +8,69 @@
<id>cloud-build</id>
<repositories>
<repository>
<id>starr-maven</id>
<url>artifactregistry://us-west1-maven.pkg.dev/som-rit-infrastructure-prod/starr-maven</url>
<id>central</id>
<name>Central Repository</name>
<url>https://repo.maven.apache.org/maven2</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>

<repository>
<id>io.cloudrepo.rit-public</id>
<url>https://susom.mycloudrepo.io/public/repositories/rit-public</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>

<repository>
<id>io.cloudrepo.rit-public-snapshot</id>
<url>https://susom.mycloudrepo.io/public/repositories/rit-public-snapshot</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>

<repository>
<id>io.cloudrepo.rit-ext-private</id>
<url>https://susom.mycloudrepo.io/repositories/rit-ext-private</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>

<repository>
<id>io.cloudrepo.rit-private</id>
<url>https://susom.mycloudrepo.io/repositories/rit-private</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>

<repository>
<id>io.cloudrepo.rit-private-snapshot</id>
<url>https://susom.mycloudrepo.io/repositories/rit-private-snapshot</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
<snapshotRepository>
<id>starr-maven-snapshot</id>
<url>artifactregistry://us-west1-maven.pkg.dev/som-rit-infrastructure-prod/starr-maven-snapshot</url>
</snapshotRepository>
</repositories>

<pluginRepositories>
<pluginRepository>
<releases>
<updatePolicy>never</updatePolicy>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
<id>central</id>
<name>Central Repository</name>
<url>https://repo.maven.apache.org/maven2</url>
</pluginRepository>
</pluginRepositories>

</profile>
</profiles>

@@ -25,26 +80,18 @@

<servers>
<server>
<id>artifact-registry</id>
<configuration>
<httpConfiguration>
<get>
<usePreemptive>true</usePreemptive>
</get>
<head>
<usePreemptive>true</usePreemptive>
</head>
<put>
<params>
<property>
<name>http.protocol.expect-continue</name>
<value>false</value>
</property>
</params>
</put>
</httpConfiguration>
</configuration>
<username>_json_key_base64</username>
<id>io.cloudrepo.rit-ext-private</id>
<username>${env.REPO_USERNAME}</username>
<password>${env.REPO_API_TOKEN}</password>
</server>
<server>
<id>io.cloudrepo.rit-private</id>
<username>${env.REPO_USERNAME}</username>
<password>${env.REPO_API_TOKEN}</password>
</server>
<server>
<id>io.cloudrepo.rit-private-snapshot</id>
<username>${env.REPO_USERNAME}</username>
<password>${env.REPO_API_TOKEN}</password>
</server>
</servers>
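
Review note: the new `<server>` entries read credentials from `${env.REPO_USERNAME}` and `${env.REPO_API_TOKEN}`, so a build where either variable is unset is likely to fail later with an authentication error. A minimal preflight sketch, assuming a hypothetical function named `check_repo_env` that is not part of this PR:

```shell
# Hypothetical preflight check: verify the environment variables that
# maven/settings.xml expects before invoking Maven.
check_repo_env() {
  missing=""
  [ -n "${REPO_USERNAME:-}" ] || missing="$missing REPO_USERNAME"
  [ -n "${REPO_API_TOKEN:-}" ] || missing="$missing REPO_API_TOKEN"
  if [ -n "$missing" ]; then
    echo "missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}
```

It could be chained in front of the existing build, for example `check_repo_env && mvn -gs maven/settings.xml -Dmaven.test.skip=true deploy`.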