Cold-emailed the founder, Aayush Gupta, about my TraceZero project and attached my GitHub, LinkedIn, and Twitter in the email.
Later I saw a notification that he had followed me on Twitter, but he wasn’t following me when I checked, which means he followed and quickly unfollowed me. After that I DM’ed him on Twitter.
He replied by email, saying to talk only via email, and gave me two assignments:
Clone the provekit repo, go through the README and run the full prepare, prove, and verify flow for the complete age-check Noir example circuit. The verify program should exit without error.
Congrats — now you're ready for the task!
Part 1:
Profile the runtime of the prove command and report your measurements in as much detail as you see fit.
If you go through the project README, you'll find several ways to profile the program and generate flamegraphs and related outputs. I also recommend using a Linux machine, since the perf command is very handy. Also, make sure to run the profiler on an actual machine, not a VM, since VM results are not representative for our use case.
Your report should include at least the following:
- A quick check: by how much would proving time decrease if field multiplication took half as long as it does now? Would the improvement be minor or significant, and why?
- The major areas where the prover spends its time (for example, commitment phase, sumcheck, field operations, etc.)
- The most fundamental operations in the overall proving flow
- Any other relevant data. Charts and images are welcome as well.
Part 2:
By now, you know what the full flow looks like. We now want to use that so if we were to use GPU acceleration, we'd know which areas we should target.
For this task we encourage using AI agents to help you move faster!
- Based on your findings from task one, prototype a webGPU integration with the existing wasm demo to get at least 6% improvement on the complete age check prove time.
- If working with wasm turns out to be too nuanced, you can do native GPU integration using Metal for apple chips and/or CUDA for Nvidia based chips.
For both tasks, please report the number of hours taken for each (a Dayflow screenshot is ideal), as well as note what was done with and without AI assistance and in what capacity. We strongly recommend including a PROMPTS.md with your submission, listing all the prompts you sent to any agents. We are looking for a submission by the end of the week at the latest, let us know ASAP if that does not work for you.
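The quick check in Part 1 is basically Amdahl's law. Here is a minimal sketch, assuming field multiplication takes a fraction `p` of the total prove time. The `p` values below are placeholders for illustration, not measurements from my report, and the function names are my own:

```python
def new_time_ratio(p: float, factor: float = 2.0) -> float:
    """Fraction of the original runtime that remains when the part
    taking share `p` of the runtime becomes `factor` times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return (1.0 - p) + p / factor


def overall_speedup(p: float, factor: float = 2.0) -> float:
    """Amdahl's law: end-to-end speedup for the same change."""
    return 1.0 / new_time_ratio(p, factor)


if __name__ == "__main__":
    # Hypothetical shares of prove time spent in field multiplication.
    for p in (0.1, 0.3, 0.6):
        saved = 1.0 - new_time_ratio(p)
        print(f"share={p:.0%}: prove time drops by {saved:.0%}, "
              f"speedup {overall_speedup(p):.2f}x")
```

Halving the cost of a part that takes share `p` saves `p / 2` of the total, so the improvement is only significant if field multiplication dominates the profile.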
Did both: succeeded in the first but sucked at the second. Sina, the guy from Paris who was reviewing my code, gave this feedback:
Thanks for submitting the assignment. I like how you’ve organized task 1, and I also agree with your results. However, Task 2 is incomplete and I have shared my feedback below.
Regarding next steps, please schedule a call via this [link](https://calendly.com/sina-zk/30min) and let us know by email!
Feedback on task 2:
**Design:**
- I like how you approached the problem, and you chose the right step to focus on.
- The benchmarking UI is also useful—it shows, for example, from which input sizes it becomes worthwhile to use the GPU.
- However, you overlooked what the task explicitly asked for, "...6% improvement on the **complete age check prove time**", and designed and implemented an isolated NTT module (notes on implementation below).
Essentially you implemented an isolated NTT module and benchmarked that. This does not reflect the same communication pattern as the NTT when it is actually called within the protocol. In the end, managing this CPU ↔ GPU communication will be the real challenge.
**Implementation:**
- I see that you implemented two shaders: `bn254_field.wgsl` and `ntt_butterfly.wgsl`. If you look at `ntt_butterfly.wgsl`, you’ll notice that it essentially duplicates all the code from `bn254_field.wgsl`. This makes `bn254_field.wgsl` redundant, and it is also unused in the project.
- Even `ntt_butterfly.wgsl` itself is unused and redundant, since all of its contents have been copied into `webgpu-ntt.mjs`, which is the only file actually used in the project.
So, in total, there are three files containing almost identical code, and two of them are not used at all.
- The NTT is implemented as an isolated module and is not integrated into the `prove` command. Therefore, the benchmarks do not accurately reflect what the task asked for.
So task two is incomplete. I would like to see it integrated in the prove flow and see the speedup there and discuss it in the meeting!
The mistake happened because of vibe coding. I rectified it in this PR: https://github.com/MdSadiqMd/provekit/pull/1
Later I had an interview with Sina. He introduced himself first, asked where I’m from, and we talked a bit. Then he asked whether I had been able to run the code. I said yes, he told me to run it, and it did not run. I was shocked. After that he asked questions like:
- What is an NTT? (I gave the answer.)
- What are the inputs to the NTT? (I fumbled and couldn’t answer.)
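For my own reference, here is the answer I should have given on the inputs. An NTT takes a vector of `n` field elements (with `n` a power of two for the radix-2 version), a primitive `n`-th root of unity in that field, and the field modulus. This is a minimal sketch over a toy prime (`p = 17`, `n = 4`), not ProveKit's or Whir's actual implementation, which works over the much larger BN254 field. All names here are my own:

```python
def ntt(values, omega, p):
    """Recursive radix-2 Cooley-Tukey NTT.

    values: list of n field elements mod p (n must be a power of two)
    omega:  primitive n-th root of unity mod p
    p:      prime modulus of the field
    Returns the evaluations of the polynomial with coefficients `values`
    at omega^0, omega^1, ..., omega^(n-1).
    """
    n = len(values)
    if n == 1:
        return values[:]
    omega_sq = omega * omega % p  # primitive (n/2)-th root of unity
    even = ntt(values[0::2], omega_sq, p)
    odd = ntt(values[1::2], omega_sq, p)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p           # butterfly: a + w*b
        out[i + n // 2] = (even[i] - t) % p  # butterfly: a - w*b
        w = w * omega % p
    return out


def naive_eval(values, omega, p):
    """O(n^2) reference: evaluate the polynomial at each power of omega."""
    n = len(values)
    return [sum(c * pow(omega, i * j, p) for j, c in enumerate(values)) % p
            for i in range(n)]


def intt(evals, omega, p):
    """Inverse NTT: run the NTT with omega^-1, then scale by n^-1 mod p."""
    n = len(evals)
    n_inv = pow(n, -1, p)  # modular inverse, Python 3.8+
    return [x * n_inv % p for x in ntt(evals, pow(omega, -1, p), p)]


# Toy parameters (illustrative only): 4 has multiplicative order 4 mod 17.
P, OMEGA = 17, 4
coeffs = [1, 2, 3, 4]
evals = ntt(coeffs, OMEGA, P)
```

The two inner lines marked "butterfly" are the operation a GPU shader like `ntt_butterfly.wgsl` would run in parallel across the vector.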
Later he asked me to explain the code. I did, and he looked satisfied, though I’m not sure. Then he asked about my projects; I showed him TraceZero and Multi-Party-Computation and told him where I got the inspiration (Umbra). I joked about how there aren’t many resources on ZK and about my code not running. I asked whether he knew any good ZKML projects, since I hadn’t found any; he named one, I pointed out mistakes in it, and he was impressed by my thinking. I also told him how I want to integrate nullifiers, which I’d heard about in Aayush Gupta’s talk at defcon, and how I suck at math. Overall I fumbled at first but was okay-ish by the end; basically I shone when he asked about frontend.
Here is why the code failed; I mailed them this later:
Hey Sina,
I had a great time chatting with you today.
I figured out why my code didn’t run during the interview: it was due to an update to the Whir dependency.
Full rant:
- Here’s the compatible version I was using in my PR: https://github.com/WizardOfMenlo/whir/commit/20aecf708c34b91f652c851f0722177bf08315af
- And here’s the breaking change: https://github.com/WizardOfMenlo/whir/commit/790bdf02cc469700fcb227aae48c8fbc25ca9f49, which modified the Reed-Solomon trait API. The newer version introduced the masked Reed-Solomon API with `next_order` and `evaluation_points` methods, while the older version had a simpler single-method trait that my GPU NTT implementation expects.
- This is where ProveKit was updated to an unstable version of Whir with respect to my application: https://github.com/worldfnd/ProveKit/commit/118a3a1
And unfortunately, I pulled the latest code right before the interview and built it during the interview (bad luck at its peak tho).
The solution is to switch the Whir version in Cargo.toml to "20aecf708c34b91f652c851f0722177bf08315af", which would have solved the issue.
An alternative solution would have been not pulling the latest code right before the interview - learnt it the hard way ig
It was quite a rant, but I learned a lot
Thanks for this tho