SR · 4w ago

Error Ducklake: Writing parquet file fails.

ExecutionErr: execution error: "HTTP Error: Unable to connect to URL http://localhost:36867/api/w/testsr2/s3_proxy/_default_/datalake/main/states2/ducklake-01997aa2-1513-75ee-a66e-ff5a6e3d9aa3.parquet?uploads=: Forbidden (HTTP code 403)"

The metadata in Postgres (Neon) looks good, and upload and download via the S3 Browser work fine. A standalone DuckDB / DuckLake setup against the same S3 and Postgres infrastructure works just fine, so I think something is wrong with my Windmill configuration. Has anybody had similar problems? Does the s3_proxy translate "_default_"? Do I have to set special permissions? I have tried both the legacy permissions and the new advanced permissions. Can somebody give me guidance on how to diagnose this further? I am stuck, so any help is appreciated. Thank you.

I am running the self-hosted Docker version v1.547 with the EE image and license on a Windows machine.
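For reference, the standalone setup that works is roughly the following (a minimal sketch using the duckdb Python package; credentials, hosts, and bucket names are placeholders, not the actual values):

import duckdb

con = duckdb.connect()
# Extensions for DuckLake with a Postgres catalog and S3-compatible storage
for ext in ("ducklake", "postgres", "httpfs"):
    con.execute(f"INSTALL {ext}")
    con.execute(f"LOAD {ext}")

# Credentials for the Cloudflare S3-compatible bucket (placeholders)
con.execute("""
    CREATE SECRET r2 (
        TYPE s3,
        KEY_ID 'ACCESS_KEY_ID',
        SECRET 'SECRET_ACCESS_KEY',
        ENDPOINT 'ACCOUNT_ID.r2.cloudflarestorage.com'
    )
""")

# DuckLake: catalog metadata in Neon Postgres, parquet data on S3
con.execute("""
    ATTACH 'ducklake:postgres:host=HOST dbname=DB user=USER password=PASSWORD' AS dl
        (DATA_PATH 's3://BUCKET/datalake/main/states2')
""")

# Any write produces a new parquet file under DATA_PATH
con.execute("CREATE TABLE dl.demo AS SELECT 42 AS answer")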
rubenf · 4w ago
You're using Azure storage?
Diego · 4w ago
What do the backend logs say? There should be more info on the 403 there.
SR (OP) · 4w ago
DuckDB has issues with Azure, therefore we are using S3 on Cloudflare.
SR (OP) · 4w ago
Backend log = worker log? There are no obvious errors. Do I have to set a log level?

2025-09-24T15:26:11.367005Z INFO windmill-common/src/ee.rs:1286: disk stats for "wk-default-58dc978a819b-f1SpF": "/" - 964.8 GB,"/tmp/windmill/logs" - 964.8 GB
2025-09-24T15:26:15.854059Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:21.910671Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=31MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:27.976075Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:34.039443Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:40.081071Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:41.369825Z INFO windmill-common/src/ee.rs:1286: disk stats for "wk-default-58dc978a819b-f1SpF": "/" - 964.8 GB,"/tmp/windmill/logs" - 964.8 GB
2025-09-24T15:26:46.154590Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:52.219735Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1SpF hostname=58dc978a819b
2025-09-24T15:26:58.288481Z INFO windmill-worker/src/worker_utils.rs:84: ping update, memory: container=30MB, windmill=21MB worker=wk-default-58dc978a819b-f1Sp
Diego · 4w ago
No, I was indeed talking about the backend logs. The worker queries the S3 Proxy running on the backend. (I apologize for the unhelpful error message; I will be looking for a way to show nicer errors from the S3 Proxy when I have time.) You can see those logs in Logs > Service Logs > Server if you have the right permissions.
Diego · 4w ago
(screenshot attached)
SR (OP) · 4w ago
I only see errors in the worker log. There is nothing going on in the server log. To me it looks as if there is no S3_proxy running on the backend.
SR (OP) · 4w ago
If I use the S3 Browser (which works just fine), I do see activity in the server log, but then it is "job_helpers" and not "s3_proxy":

2025-09-25T11:59:27.225510Z INFO request: windmill-audit/src/audit_ee.rs:139: kind="audit" operation="variables.decrypt_secret" action_kind=Execute resource="u/rupprecht/meticulous_s3" parameters=null workspace_id="testsr2" username="backend" email="backend" method=GET uri=/api/w/testsr2/job_helpers/load_file_metadata?file_key=datalake%2Fmain%2Fstates3%2Fducklake-0199808e-74d1-762b-b27b-9f327e87a952.parquet traceId="5600f946-2988-48a7-b520-7df47dd4a60e" username="rupprecht" username="rupprecht" email="rupprecht@fastec.de" email="rupprecht@fastec.de" workspace_id="testsr2" workspace_id="testsr2" 2
rubenf · 4w ago
@Diego let's improve the error message propagation so we can root-cause this more easily. I'm not even sure whether the issue is what is returned by Neon, or a firewall within their org that catches this particular endpoint.
Diego · 4w ago
Can you try running this Bun script:
import * as wmill from 'windmill-client'

export async function main(x: string) {
  await wmill.writeS3File('s3:///datalake/main/states2/ducklake-019980aa-6391-7047-a139-e01c8fcdb657.parquet', 'hello world')
}
Just to check whether this errors out. I will try to fix the error messages tomorrow.
SR (OP) · 4w ago
No error.
Diego · 4w ago
Interesting, this must have something to do with the token authentication of the s3 proxy then. Investigating.

Hello @SR, sorry for the wait. Can you try this in Bun:
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3"

export async function main() {
  const baseUrl = process.env["WM_BASE_URL"]
  const s3 = new S3Client({
    region: "us-west-1",
    forcePathStyle: true,
    endpoint: baseUrl + '/api/w/' + process.env["WM_WORKSPACE"] + '/s3_proxy',
    credentials: {
      accessKeyId: process.env["WM_TOKEN"].split('.')[0] + '.' + process.env["WM_TOKEN"].split('.')[1],
      secretAccessKey: process.env["WM_TOKEN"].split('.')[2],
    },
  });
  const command = new PutObjectCommand({
    Bucket: "_default_",
    Key: "helloworld.txt",
    Body: "hello world",
  });

  try {
    const response = await s3.send(command);
    console.log("File uploaded successfully:", response);
  } catch (err) {
    console.error("RAW RESPONSE", (err as any).$response);
    console.error("Error uploading:", err);
  }
}
This uses the same S3 proxy as the DuckDB executor and will print the actual error. If no error occurs, that pinpoints the bug to a signature mismatch.

Unfortunately it is impossible for me to implement better error messages in DuckDB directly, because they only have predefined error messages and do not parse the XML error body: https://github.com/duckdb/duckdb-httpfs/blob/0989823e43554e8a00b31959a853e29ab9bd07f9/extension/httpfs/s3fs.cpp#L1147

However, I did test on cargo run, Docker, and Cloud, and DuckLake is indeed working (read and write). I am curious to find out why it doesn't work for you, because your configuration seems fine.
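For completeness, the proxy setup the executor performs can be approximated from a Python script roughly like this (a sketch only: the CREATE SECRET shape, and the assumption that DuckDB accepts an endpoint containing a URL path, are mine, not the executor's actual code):

import os
import duckdb


def main():
    # The worker job token has the form "<a>.<b>.<c>"; the proxy treats the
    # first two segments as the access key and the last one as the secret.
    parts = os.environ["WM_TOKEN"].split(".")
    key_id = f"{parts[0]}.{parts[1]}"
    secret = parts[2]

    base = os.environ.get("BASE_INTERNAL_URL") or os.environ["WM_BASE_URL"]
    workspace = os.environ["WM_WORKSPACE"]
    # DuckDB expects the endpoint without a scheme; whether it tolerates a
    # URL path here is an assumption of this sketch.
    endpoint = base.split("://", 1)[-1] + f"/api/w/{workspace}/s3_proxy"

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute(f"""
        CREATE SECRET wm_proxy (
            TYPE s3,
            KEY_ID '{key_id}',
            SECRET '{secret}',
            ENDPOINT '{endpoint}',
            URL_STYLE 'path',
            USE_SSL false
        )
    """)
    # A write through the proxy should surface the same 403 if signing is the issue
    con.execute("COPY (SELECT 1 AS x) TO 's3://_default_/helloworld.parquet' (FORMAT parquet)")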
SR (OP) · 4w ago
(screenshot attached)
SR (OP) · 4w ago
I am running Rancher Desktop with dockerd (moby).
Diego · 4w ago
Did you set your BASE_URL (in instance settings)?
SR (OP) · 4w ago
(screenshot attached)
Diego · 4w ago
base_url is wrong: http://localhost (port 80) is not accessible from the workers. For example, on cloud it's https://app.windmill.dev.
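A quick way to check which base URL a worker can actually reach is a Python script along these lines (assuming the standard docker-compose, where the server container is named windmill_server on port 8000; adjust for your setup):

import os
import urllib.request


def main():
    # Candidate URLs a worker might use to reach the backend
    candidates = [
        os.environ.get("BASE_INTERNAL_URL"),  # set for workers in the standard compose
        os.environ.get("WM_BASE_URL"),        # the instance base_url
        "http://windmill_server:8000",        # assumed compose service name (adjust)
    ]
    for base in filter(None, candidates):
        try:
            # /api/version is a cheap unauthenticated endpoint on the server
            with urllib.request.urlopen(f"{base}/api/version", timeout=5) as r:
                print(f"{base}: reachable, version {r.read().decode().strip()}")
        except Exception as e:
            print(f"{base}: NOT reachable ({e})")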
SR (OP) · 4w ago
But I am running a self-hosted local setup. I do not have a public base URL. Am I supposed to set up ngrok or a Cloudflare tunnel so that two local containers can talk to each other?
Diego · 4w ago
I will try to replicate it with Rancher Desktop. I'll keep you posted.
SR (OP) · 4w ago
Thank you for your support. DuckLake is a great feature; I would really like to use it.
Diego · 4w ago
Hello @SR, unfortunately I still could not reproduce. Can you try this Python code:
import os
import boto3
from botocore.exceptions import ClientError


def main():
    # Prefer the internal URL (reachable from workers), fall back to the base URL
    base_url = os.environ.get("BASE_INTERNAL_URL") or os.environ["WM_BASE_URL"]
    token = os.environ["WM_TOKEN"]
    workspace = os.environ["WM_WORKSPACE"]

    # Split the token
    token_parts = token.split(".")
    access_key_id = f"{token_parts[0]}.{token_parts[1]}"
    secret_access_key = token_parts[2]

    # Create S3 client
    s3 = boto3.client(
        "s3",
        region_name="us-west-1",
        endpoint_url=f"{base_url}/api/w/{workspace}/s3_proxy",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        config=boto3.session.Config(s3={"addressing_style": "path"}),
    )

    try:
        # Upload file
        response = s3.put_object(
            Bucket="_default_", Key="helloworld.txt", Body="hello world"
        )
        print("File uploaded successfully:", response)
    except ClientError as err:
        print("Error uploading:", err)
        print("Response:", err.response)


if __name__ == "__main__":
    main()
This should pinpoint it, and it doesn't rely on the workers calling the backend through the public base URL.
SR (OP) · 3w ago
Hi Diego, this week I was busy with some other projects. I will get back to testing Windmill next week and follow up with you. Thank you.
Diego · 3w ago
For info, the latest release will display the full S3 error in DuckDB.
.darrida · 3w ago
This was it for me, thanks. I still had my base_url set as ":443". When I changed it to my URL, DuckLake started working. (I specifically changed it in my docker-compose.yml file.)
SR (OP) · 3w ago
Hi Diego, I have tested a Kubernetes cluster and Docker Compose: the latest release (>1.555.0) is working fine with DuckLake. The base URL does not make any difference; even http://localhost works. I suppose there have been some changes in the backend?
Diego · 3w ago
Hello, yes, I changed the S3 endpoint resolution.
