Draft: control nonce space and timeouts for all chip topologies #546
shufps merged 7 commits into `shufps:develop`
Replaces the static `setVrFrequency` with a computed Hash Counting Number (HCN) on register 0x10, based on bitaxeorg/ESP-Miner PR#420.

Formula: `HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5`

Core counts: BM1366 = 112, BM1368 = 80, BM1370 = 128.

NOTE: NerdQAxePlus previously used register 0x10 for the Version Rolling Frequency (`vrFreqToReg`). This change replaces that with the HCN value. The interaction between VR frequency and HCN on register 0x10 needs review: they produce very different values for the same register.
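As a rough illustration of the formula, here is a small C++ sketch. It assumes FREQ_MULT = 25 (the factor used in the thread's worked examples such as `(2^32/128/4) * 25/615 * 0.5`), takes the frequency in MHz, and uses hypothetical helper names (`nextPow2`, `computeHcn`) rather than the firmware's actual functions.

```cpp
#include <cstdint>

// Smallest power of two >= v (assumes 1 <= v <= 2^31).
constexpr std::uint32_t nextPow2(std::uint32_t v) {
    std::uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// Assumption: FREQ_MULT = 25, as in the worked examples in this thread.
constexpr double kFreqMult = 25.0;

// HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5
constexpr std::uint32_t computeHcn(double freq_mhz, std::uint32_t asics, std::uint32_t cores) {
    const double per_core = 4294967296.0 / nextPow2(cores) / nextPow2(asics);
    return static_cast<std::uint32_t>(per_core * kFreqMult / freq_mhz * 0.5);
}
```

Under these assumptions `computeHcn(615.0, 4, 128)` comes out at 170500. Because 80 and 112 both round up to 128, BM1366, BM1368 and BM1370 get the same HCN for the same frequency and chip count with this formula.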
Hi,
I experimentally verified that register 0x10 sets the version rolling frequency by logging nonce wrap-around times and how fast the version counter advances. I'm not sure what Bitaxe thinks this register is for, and I don't see (yet) why changing 0x10 would be necessary for SV2. The only case I found where tweaking 0x10 is necessary is when the ASIC clock is so high that the nonce wraps around in the search space before the version counter is incremented, leading to duplicate shares. Increasing the VR frequency fixes it then. I only observed that at ASIC frequencies of around 1100 MHz and higher.
Thanks for the clarification! That makes sense for V1/Extended Channel, where new work arrives every ~500 ms via the extranonce_2 increment. The reason we opened this is SV2 Standard Channel (from our SV2 PR #544): in Standard Channel, the pool provides the complete block header, so the miner has no extranonce to increment. The miner must rely entirely on the nonce (32-bit) plus version rolling to find shares. New work only arrives when the pool sends a new job, which is typically every 30-60 seconds depending on pool template settings. I just verified this on hardware: the miner produces duplicate shares within seconds in Standard Channel mode. The current VR frequency setting causes the ASIC to exhaust its search space too quickly. So the question is: can we adjust register 0x10 to give the ASIC enough nonce + version-rolling search space for 30-60 seconds of autonomous mining? That's also why we tagged @mutatrum and @adammwest: we're not ASIC firmware experts and wanted their input on the register 0x10 semantics.
Some concrete numbers on why this matters for SV2 Standard Channel. Total search space with the full nonce (2^32) + version rolling (mask 0xFFFF, i.e. 2^16 versions): […]
With pool template intervals of 30-60 seconds, only the NerdQAxe+ would be safe if the full search space is utilized. The NerdQAxe++ and NerdOCTAXE would additionally need ntime rolling to avoid exhausting the search space between templates. Current problem: the VR frequency of 25 kHz cycles through all 65536 versions in ~2.6 seconds, regardless of whether all nonces per version have been checked. That's why I'm seeing duplicate shares within seconds on hardware. The register 0x10 value needs to be tuned so that the version-rolling rate matches the ASIC's actual nonce-scanning speed, ensuring the full 2^48 search space is covered before wrapping around. And for devices above ~2.5 TH/s, ntime rolling support would be needed on top.
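To make the timing argument concrete, the sketch below computes how long a device needs to exhaust the 2^48 nonce/version combinations, and how long a 25 kHz version counter takes to wrap. The hashrates plugged in (about 5 TH/s for a NerdQAxe++, about 9 TH/s for an OCTAXE) are figures reported elsewhere in this thread; the function names are illustrative only.

```cpp
// 2^32 nonces times 2^16 versions (16-bit version-rolling mask 0xFFFF) = 2^48 hashes.
constexpr double kSearchSpace = 4294967296.0 * 65536.0;

// Seconds until a device hashing at `ths` TH/s has tried every nonce/version pair once.
constexpr double secondsToExhaust(double ths) {
    return kSearchSpace / (ths * 1e12);
}

// Seconds for the version counter to cycle through all 2^16 versions at `vr_hz` steps/s.
constexpr double versionWrapSeconds(double vr_hz) {
    return 65536.0 / vr_hz;
}
```

That gives roughly 56 s at 5 TH/s, 31 s at 9 TH/s, and about 2.6 s for a full version wrap at 25 kHz.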
IMHO no. I often read things about partitioning and how people believe it works, but I couldn't confirm any of that and always saw it as a big misunderstanding. Partitioning works via the chip ID, and some people think that distributing the chip IDs evenly across the possible 127 IDs, like Bitaxe does, automagically makes the chips use more search space. IMHO this is a misconception, and I never saw anything that would confirm it. This is clearly visible when checking the chip ID in the nonce: it's always the ID that was set during initialization, never some ID between two ASICs. And 0x10 needs some balancing between ASIC frequency, version-rolling frequency and job interval times. Just using 0x10 to try to buy more time won't work, especially not in the 1.5 s+ range.
|
@shufps fair points about partitioning. I'm not claiming to understand the ASIC internals better than you do. But the empirical evidence is clear: on the Bitaxe with the same BM1368/BM1370 ASICs, SV2 Standard Channel works with the register 0x10 change from Bitaxe ESP-Miner PR 420. Without it: duplicate shares within seconds. With it: stable mining. Same ASIC, same pool, same protocol. That PR was written by @adammwest, who has deep knowledge of these ASICs. I'd really value his input here on what register 0x10 actually controls and why the HCN calculation makes Standard Channel work on Bitaxe hardware. If there's a fundamental difference in how NerdQAxePlus configures the ASICs vs Bitaxe that makes this approach not work here, I'd like to understand that too. Happy to do more testing on hardware to figure this out.
The main difference is how the chip IDs are set. Maybe I'm wrong and everything works completely differently than I figured out, lol... That would really surprise me, because the mental picture I have is very consistent with everything I've learned over the last 2 years 😂
|
@shufps totally respect your 2 years of experience here! And the chip ID difference could well be a factor. But I can reproduce this directly on my Bitaxe (same BM1368 and BM1370 ASICs): […]
Same device, same pool, same ASIC; only the register 0x10 value changes. So whatever the HCN calculation does to that register, it measurably extends the search space on the Bitaxe. It would be interesting to test the same register value on NerdQAxePlus and see whether the chip ID difference matters or not.
I guess the best test would be to just copy how they set the chip IDs. That would be a couple of changed lines; I guess any AI could do that in 5 minutes. But there really is nothing else I can imagine that could make a difference.
|
Good idea 👍 We'll run some tests, both with the Bitaxe-style chip ID setup and with the HCN register value, and report back with the results.
|
There might be something in the middle. What surprised me a long time ago is that with Bitaxe PR 420, the nonce space is exhausted in the same time regardless of frequency. To be fair, this needs to be re-verified as it's from over a year ago, but it would mean Bitaxe is also missing a component somewhere in the init. Maybe we each have a partial picture: both work, but neither is the complete picture.
- Chip ID distribution: `256 / chip_counter` (instead of a hardcoded 2/4)
- `m_addressInterval` member for consistent chip addressing
- Nonce-to-ASIC mapping: `(bswap32(nonce) >> 17) / address_interval`
- `chipIndexFromAddr` uses `address_interval` (removed BM1370 override)
- All per-chip `CMD_WRITE_SINGLE` writes use `address_interval`
- Disabled `checkVrFrequencyChanged` (it overwrites the HCN on register 0x10)
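A minimal sketch of the addressing scheme in this commit, assuming chip i simply gets address `i * address_interval` and the reverse lookup divides by the interval. The function names are hypothetical stand-ins, not the repo's `chipIndexFromAddr` implementation.

```cpp
#include <cstdint>

// Chip address spacing from this commit: spread chip IDs evenly over 256 addresses.
constexpr std::uint32_t addressInterval(std::uint32_t chip_counter) {
    return 256u / chip_counter;
}

// Address written to chip i during init (instead of a hardcoded i * 2 or i * 4).
constexpr std::uint32_t chipAddress(std::uint32_t i, std::uint32_t interval) {
    return i * interval;
}

// Inverse mapping: chip address -> chip index, using the same interval.
constexpr std::uint32_t chipIndexFromAddrSketch(std::uint32_t addr, std::uint32_t interval) {
    return addr / interval;
}
```

For a 4-chip board the addresses become 0, 64, 128, 192 instead of 0, 4, 8, 12.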
|
@shufps here are our test results. You were right about the chip IDs being the key factor:

Test Results (NerdQAxe++, 4x BM1370 @ 615 MHz, SV2 Standard Channel)

Test 1: HCN on register 0x10, original chip IDs (0, 4, 8, 12)
Result: All 4 ASICs hash at full speed (~5 TH/s). V1 shares accepted, no regression. But in Standard Channel, the first few unique shares are accepted, then duplicates appear within seconds. Each nonce is reported 4x simultaneously: all 4 chips find the same nonce because they search the same nonce space.

Test 2: Same as Test 1 + disabled […]
|
Calculates how long the ASIC needs to exhaust the full nonce+version search space. Used to determine when ntime needs to be incremented for Standard Channel on multi-chip boards.
|
Update on the ntime rolling from our last post: instead of a fixed 5-second interval, we went with a dynamic approach. We ported […] Tested on both NerdQAxe++ and NerdOCTAXE-Gamma with SV2 Standard Channel, no duplicates: the NerdQAxe++ rolls at ~46 s (80% of ~57 s), the OCTAXE at ~25 s (80% of ~31 s). The counter resets on each new pool template, so with 30 s template intervals you mostly just see "#1". This can't go into this PR because it depends on code from both this PR and our SV2 PR (#544). It will follow as a separate PR once both are merged; the code is ready on our test branch: https://github.com/warioishere/ESP-Miner-NerdQAxePlus/tree/test/sv2-nonce-space-v2 The VR frequency question is still open.
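The dynamic roll interval can be sketched as follows, assuming a fixed 2^48 nonce/version space per ntime value and the 80% safety factor. The helper names and the simple roll-counter model are hypothetical; the actual code lives on the linked test branch.

```cpp
#include <cstdint>

// Milliseconds to exhaust 2^32 nonces x 2^16 versions at `ths` TH/s.
constexpr std::uint64_t searchSpaceMs(double ths) {
    return static_cast<std::uint64_t>(4294967296.0 * 65536.0 / (ths * 1e12) * 1000.0);
}

// Roll ntime once 80% of that window has passed, leaving a safety margin.
constexpr std::uint64_t ntimeRollIntervalMs(double ths) {
    return searchSpaceMs(ths) * 8 / 10;
}

// ntime for the n-th roll since the current template arrived; the roll count
// starts again at 0 whenever the pool sends a new template.
constexpr std::uint32_t rolledNtime(std::uint32_t template_ntime, std::uint32_t roll_count) {
    return template_ntime + roll_count;
}
```

At 5 TH/s this gives roughly 45 s and at 9 TH/s roughly 25 s, in the same range as the intervals observed on hardware.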
|
Hmm, interesting... so it seems the nonce space really is automagically partitioned evenly between the chip IDs. I always thought that wouldn't happen, but it seems I was wrong all along^^ But there is another problem: the dual pool scheduling relies on jobs being switched at short intervals like 500 ms. Letting a job run for 30 s or so somewhat breaks this. Btw, do we need to support Standard? The pragmatic approach would be to just use Extended -> voilà, problem solved. wdyt?
|
On the Dual Pool concern: the chip ID and HCN changes sit at the ASIC driver level, but job switching is controlled higher up. V1 and Extended keep sending new work every 500 ms regardless of HCN, so Dual Pool is not affected. Standard Channel has a […]

About whether we need Standard Channels: actually yes, ideally we should support them. The SV2 spec designed Standard Channels for end-mining devices doing Header-Only Mining. They just get a ready Merkle root and hash; no coinbase/extranonce handling is needed. Extended Channels are actually meant for proxies, not end devices. When a miner opens an Extended Channel directly to a pool, it's essentially doing the proxy's job: computing coinbase hashes, walking merkle paths, managing extranonce. That works fine, but it's a workaround.

The real use case for Standard Channels: someone running a JDC (Job Declarator Client) for template control. The JDC acts as a local proxy, opens an Extended Channel upstream and feeds Standard Jobs to downstream miners. Our devices could connect to the JDC via Standard Channel and just hash headers. That's the intended SV2 architecture, which also lets you run your own JD-Client in between to control your own block template. The SV2 folks are already doing great work here to make this quite easy to set up: https://github.com/stratum-mining/sv2-ui

For connecting directly to a pool without a proxy, Extended is the practical choice. But supporting both means the devices work best in both scenarios. And Dual Pool stays disabled for Standard Channel, so there's no conflict there.
Hmm, interesting, thanks for your explanation. But the chances of getting it accepted and merged might be a lot lower with Standard Channel support and all the changes it requires 😉 But it's really weird. It's like the Standard Channel was invented completely out of touch with reality. It's so weird that I actually wouldn't like to support it at all; who cares, Extended Channel is supported too and can do exactly what is needed.
|
Just curious: what makes Standard Channel feel out of touch to you? The way we see it, Standard Channel + JDC is kind of the whole point. The miner just hashes headers, and the JDC proxy does all the heavy lifting (template construction, extranonce management, merkle computation). The miner firmware stays dead simple and you get template control over your own Bitcoin node. That's the core SV2 decentralization use case. But hey, if Standard Channel is a blocker for merging, we're happy to remove it from the SV2 PR and keep it Extended-only. We can always add it later. The nonce space changes in this PR are independent anyway. cc @GitGab19 @plebhash, curious about your thoughts on Standard Channel for small miners / JDC setups.
Hmm, I think Standard Channel is perfectly fine for very slow devices like NerdMiners. But once you have to abuse […] It is more or less a coincidence that BM* chips seem to be able to extend their search space beyond their actual chip ID, so this still works in practice. My impression is that other ASICs may not be able to do that as easily, or at all. A well-thought-out protocol should have taken into account that the available search space can be exhausted very quickly on certain hardware. So my view is not that Standard Channel is useless, just that it may simply not be the best fit for BM* ASIC miners 🤷 For that reason I would tend to just ignore it for now and only support Extended Channel, which is exactly what we need here. It would also keep the required SV2 changes less invasive: leaving the core code untouched, avoiding potential regressions, and eliminating any weird special-casing based on which protocol is active. If there's ever a concrete need to support Standard Channel down the line, future-us can revisit it then.
|
ntime rolling isn't really a hack: the ntime field is part of the block header, […] The SV2 spec explicitly mentions that miners may need to roll ntime when the […] Behind a JDC proxy, the search space issue mostly goes away anyway. The JDC […] So the miner gets fresh Standard Jobs often enough that ntime rolling rarely […]

I actually don't know what's wrong with utilizing the full potential of the ASICs instead of relying on endless job creation to compensate for a wrongly configured ASIC. That is what I'd call a "hack". But yeah, Extended-only works for us for now. I see I can't really convince you here. I just want to push SV2 because it has exceptional advantages over SV1, and we've been using an outdated protocol for too long. I've tagged plebhash and GitGab19, the lead devs of SRI/SV2; maybe they can explain better than me that Standard Channels are not only useful for NerdMiners. I will disable Standard Channels on the SV2 PR in the meantime.
Nothing against the approach itself, but all the surrounding code was written around how the ASICs are currently configured, and just changing that has ripple effects. The best example is probably the dual pool scheduler, which is a deterministic scheduler built on a short (~500 ms) fixed job interval. Changing that interval could lead to weird pool hashrate statistics or problems regulating pool difficulty. So it's not that the idea is wrong; it's just that the cost of changing it outweighs the benefit for now. It is what it is, sorry 🤷
|
Just to be clear: the chip ID and HCN changes don't touch the job interval at all. V1 and Extended still send new work every 500 ms, and the dual pool scheduler runs exactly as before. The only difference is that each chip searches a unique nonce partition instead of all chips searching the same space. No ripple effects on scheduling or pool stats. The dual pool incompatibility only applies to Standard Channel (which doesn't do 500 ms job resends). We already block that combination in the UI: dual pool + Standard Channel is not selectable. Anyway, we've disabled Standard Channel on the SV2 PR (#544) for now, both hidden in the UI and forced to Extended in the backend. We can revisit it later if needed.
Yes, that's what I understood from what you wrote; thanks for clarifying anyway 👍 My arguments were maybe a bit mixed up. The reluctance about register 0x10 and the chip IDs is probably more pragmatism than a hard technical blocker: it would require updating log output, the nonce histogram and similar things (I think even hashrate register reading) that all build on current assumptions, and changing it for BM1366, BM1368 and BM1370, because the FW is used on multiple devices. The main concern was really Standard Channel and dual pool compatibility, which you've already addressed by disabling it for now. So we're good 🙂
|
@warioishere PR420 assumes that the address interval is […] As for the chip interval (as far as my understanding goes), see https://github.com/shufps/ESP-Miner-NerdQAxePlus/blob/develop/components/bm1397/bm1370.cpp#L75:

```cpp
// set chip address
for (uint8_t i = 0; i < chip_counter; i++) {
    setChipAddress(i * 4);
}
```

But as there are 4 chips in the NerdQAxe++ (I assume a chain length of 4), you can claim up to 4 out of the 64 reserved addresses, so the nonce range becomes 2^32 / 64 = 2^26 (not in any particular order/bit representation, just size).

@shufps if you make the register smaller, the rollover will be faster because the nonce space is smaller. If you only change the frequency, the rollover time stays the same, because the chip frequency changes the maximum value of 0x10.

This is an interesting observation. According to my understanding, the frequency must be bounded, as it changes the size of the nonce space, so with 4 chips and a high freq maybe you found an example of that frequency limit. It should have an upper bound.
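The size argument can be checked in a couple of lines: the interval determines how many of the 256 addresses are reserved, and the nonce space is split evenly between them (size only, ignoring bit layout as in the comment). The function names are illustrative.

```cpp
#include <cstdint>

// Address slots reserved when chips are spaced `address_interval` apart in 256 addresses.
constexpr std::uint32_t reservedSlots(std::uint32_t address_interval) {
    return 256u / address_interval;
}

// Size of the nonce range each slot gets (size only, no particular bit layout).
constexpr std::uint64_t nonceRangePerChip(std::uint32_t address_interval) {
    return (std::uint64_t{1} << 32) / reservedSlots(address_interval);
}
```

Interval 4 gives 64 slots and 2^26 nonces per chip; interval 64 gives 4 slots and 2^30 nonces per chip, i.e. 16x more per chip.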
|
bitcoin/bips#2116
|
PR420 assumes address_interval = 256 / next_power_of_two(chip_counter) to reserve the minimum bits for nonce space partitioning. For our boards (4, 8 chips) the result is identical but this is correct for non-power-of-two chain lengths. Moved next_power_of_two to asic.h as static inline.
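A small sketch contrasting the two interval formulas, with a hypothetical `nextPowerOfTwo` standing in for the `next_power_of_two` helper moved to asic.h (assumed to return the smallest power of two >= its argument).

```cpp
#include <cstdint>

// Stand-in for next_power_of_two (assumes 1 <= v <= 2^31).
constexpr std::uint32_t nextPowerOfTwo(std::uint32_t v) {
    std::uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// PR 420 style: round the chip count up so the interval stays a power of two.
constexpr std::uint32_t intervalPow2(std::uint32_t chip_counter) {
    return 256u / nextPowerOfTwo(chip_counter);
}

// Earlier variant in this PR: plain division.
constexpr std::uint32_t intervalPlain(std::uint32_t chip_counter) {
    return 256u / chip_counter;
}
```

For 4 or 8 chips both give the same result. For 3 chips the plain division gives 85, while the rounded version gives 64, i.e. exactly 4 slots (2 address bits).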
Even though @shufps already retracted this statement, I'll address it first: I'm not one of the original Sv2 spec authors, but I know that Standard Channels have existed in the Sv2 spec since its original draft. There are a few arguments for the existence of Standard Channels in the Sv2 spec: […]
Of course, there are always going to be tradeoffs in cases where an Extended Channel is split into multiple Standard Channels somewhere along the mining stack. Nevertheless, the arguments above still hold in the general sense, even if they are weaker in such specific cases.

About Version Rolling: please note that while […] Please let me know whether this is not clear from the spec, because it should be. If some implementation skips version rolling on Standard Jobs (or doesn't do it to the full extent), then the search space becomes smaller than it could have been, and share duplication will happen before the job is refreshed or ntime is increased. I have the impression that this is where the confusion arose, and I'd be happy to get feedback in case anyone thinks we can make this more explicit in the Sv2 spec.

About the hard hashrate ceiling: although we aim for the Sv2 spec to be a canonical document that's "written in stone", it has already had to undergo many adjustments over time, so it's not necessarily perfect as-is. The aspect that's admittedly still a bit unpolished is the hashrate threshold for Header-only Mining (HOM), because 280 TH/s is likely to become somewhat "obsolete" for industrial-scale mining in the near future. As @adammwest pointed out above, there are efforts to expand the number of rollable version bits, which should raise this threshold beyond 280 TH/s and solve this problem: […]
The alternative approach to expanding the Standard Job search space is rolling ntime (as in actual rolling, not just increasing it after 1 s has elapsed). While theoretically possible (as in consensus-valid), if applied at scale this approach could have unintended consequences on network difficulty adjustment and IMO should be discouraged in the community: stratum-mining/sv2-spec#187
Even though I understand why/how @warioishere arrived at this conclusion, I wouldn't necessarily frame Standard Channels as something that's only beneficial to small miners or JDC use cases. The range of legitimate use cases is broader, and could eventually bring real benefits to the industry (reduced network bandwidth and compute) if/when applied at scale. But yeah, there are a few moving parts with regard to Version Rolling and Sv2 spec polishing, which understandably cause confusion.
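The 280 TH/s Header-only Mining threshold follows from simple arithmetic, assuming 16 rollable version bits and ntime advancing once per second: 2^32 * 2^16 hashes per second. The sketch below (illustrative names) parameterises the number of version bits to show how widening them would move the ceiling.

```cpp
#include <cstdint>

// Hashes per ntime value for a header-only job: 2^32 nonces x 2^version_bits versions.
constexpr double hashesPerNtime(unsigned version_bits) {
    return 4294967296.0 * static_cast<double>(std::uint64_t{1} << version_bits);
}

// Hashrate (TH/s) at which one device uses up that space within one second,
// i.e. the ceiling if ntime only advances once per second.
constexpr double homCeilingThs(unsigned version_bits) {
    return hashesPerNtime(version_bits) / 1e12;
}
```

16 bits gives about 281 TH/s; each additional rollable version bit doubles it (17 bits gives about 563 TH/s).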
Standard Channel is actually something I wanted to have for other things, like a solar-powered, lightweight LoRa miner that shouldn't do any crypto on its own. The aim always was to just send a header as the workload and let it mine on that. There was one puzzle piece left that I couldn't answer myself though, and it's BM*-related: how can I let it mine for longer than 1.5 s before it wraps around in the search space? Sending new headers every 1.5 s via LoRa is a no-go (no longer fair use), and sending an entire mining.notify is too big (basically the same problem). But the 0x10 register that Adam explained above might be the answer to that. It could let a single ASIC mine for a couple of minutes when the search space has been extended over multiple chip-ID bits in the nonce. Please don't get me wrong, I'm not saying Standard Channel is useless. I just have the feeling it might not be the best fit for this particular project, but it might be a game changer for others 😅 And thanks a lot for taking the time to explain all of this! 🙌
|
The BM1370 can mine for several minutes without a new job if you extend the full nonce range. The BM1366/BM1368 are probably similar? Anything before that doesn't even get to a second.
Can't you increase ntime after 1 s has elapsed? That should safely reset the search space, and it's one of the main assumptions behind the 280 TH/s ceiling calculation (sorry if this is a dumb or uninformed question, I haven't really parsed all the details in this discussion!)
It's not a dumb question at all! I thought about that too... Yes, it would work, but it would be a special case that needs different treatment in the code (rolling ntime instead of enonce2), and I'm not convinced that supporting Standard Channel is really worth the effort when Extended would just work 🙊 And I haven't looked more deeply into whether ntime provides enough room to roll for an effective job switching time of ~500 ms, which would be required to work properly with the dual pool feature.
small detail |
|
|
```cpp
void Asic::setNonceSpace(float frequency, uint16_t asic_count, uint16_t cores) {
    int cores_up = next_power_of_two(cores);
    int asic_count_up = next_power_of_two(asic_count);
```
This is the part that assumes the chip address interval:

```cpp
int asic_count_up = next_power_of_two(asic_count);
```

would need to be

```cpp
int asic_count_up = 256 / address_interval;
```

That would mean you can have any address interval (in theory).
Good catch, fixed - now using 256/m_addressInterval instead of next_power_of_two(asic_count). Testing this now.
Separate observation: with the nonce space changes we see more hardware duplicate shares (~0.19% vs ~0.03% without). We compared the BM1370 init with Bitaxe early-access and tried removing register 0x68 (not present in Bitaxe) and the extra 0xA4 write after setNonceSpace - didn't help.
Any idea what could cause chips to find the same nonce+version more often with wider address intervals? Happens on both V1 and SV2 Extended and Standard so it's not protocol related.
Is the 0.19% real? We need thousands of shares to be certain of this, but for the Gamma there was a possible, theorised issue: supposedly the HCN is too big by 268.

If you ran at 615 MHz and address_interval is 256/4 = 64, then

`100 - 100 * 268 / (2^25 * 25 / 615 / 4 * 0.5) = 99.84`

i.e. ~0.15% dups, which is close to 0.19%.

It should scale worse when hcn_max shrinks: we have (HCN - 268) / HCN_MAX unique, so 268 / HCN_MAX = duplicates.

If you can test (I don't have the NerdQ device), use:
- address interval = 2
- hcn = hcn_max
- freq = 615
- hcn_max computed from `int asic_count_up = 256/address_interval;`

The expected result is

`100 - 100 * 268 / (2^25 * 25 / 615 / 128 * 0.5) = 94.97`

i.e. ~5% dups. In that case I need to update PR420 as well.

For the Gamma case, the solution would be:

```cpp
// HW errata of 134 per half clock cycle
int hcn = hcn_max - 268;
```
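The estimate can be reproduced as a short calculation. This sketch assumes 128 (rounded) cores, FREQ_MULT = 25 and a fixed overlap of 268 nonces per core range, exactly as in the comment; the function names are hypothetical.

```cpp
#include <cstdint>

// hcn_max as in the comment: (2^32 / 128 / asic_count_up) * 25 / freq * 0.5,
// with asic_count_up = 256 / address_interval.
constexpr double hcnMax(double freq_mhz, std::uint32_t address_interval) {
    return 4294967296.0 / 128.0 / (256.0 / address_interval) * 25.0 / freq_mhz * 0.5;
}

// Predicted fraction of duplicate shares if each core range overruns by 268 nonces.
constexpr double predictedDupFraction(double freq_mhz, std::uint32_t address_interval) {
    return 268.0 / hcnMax(freq_mhz, address_interval);
}
```

At 615 MHz this predicts about 0.16% duplicates with address interval 64 and about 5.0% with address interval 2.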
The 0.19% is from two NerdQAxe++ devices running side by side for 2 days, both over 50k shares. One with the nonce space patch, one without. Pretty consistent numbers.
Your math lines up almost perfectly with what we see. We'll test with address_interval=2 and hcn=hcn_max at 615MHz to verify the 5% prediction. If that confirms it we'll add the -268 correction.
Correction: 0.19% was total rejects. Actual hardware duplicates are ~0.16% (0.19% minus the 0.03% from the devices without the patch). That lines up even closer with your 0.15% calculation. Building the address_interval=2 test now.
Results with address_interval=2, hcn=hcn_max, freq=615MHz, 11h runtime, ~11100 shares: ~1.87% duplicates (1.90% total minus 0.03% baseline from devices without the patch).
That's about a third of your predicted 5%. The errata offset might be smaller than 268, or it scales differently than expected.
update: the duplicates come in bursts, not evenly distributed. Just jumped from 1.87% to 1.93% after a cluster. Still climbing slowly. Maybe the overlap only triggers under certain timing conditions, not on every nonce wrap.
> update: the duplicates come in bursts, not evenly distributed. Just jumped from 1.87% to 1.93% after a cluster. Still climbing slowly. Maybe the overlap only triggers under certain timing conditions, not on every nonce wrap.
That's expected for an HCN that is too big. There are 2 distinct types of duplicates: wrap-around duplicates, when the space ends and restarts, and what I call internal dups (maybe "overlapping-range duplicates" is a better name).

Essentially, the chip encodes info (core, chip) in some part of the nonce range, and the HCN can overwrite this. What you end up with is a portion of the nonce range overlapping, so you get solutions that appear very close together in time.

Imagine a fictitious scenario of a chip with 2 cores and a total nonce range of 256, with 128 spacing, where we set 130 as the size per core:

| Core | Start | End | Range |
|---|---|---|---|
| Core 0 | 0 | 130 | 0 -> 130 |
| Core 1 | 128 | 256 | 128 -> 256 |
We cover 100% of the range, but both cores are assigned the overlapping range 128->130, so we end up with some dups that come back at approximately the same time.
Thank you for the test!
I will update PR420; I will use 268 to be safe.
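The toy table can be checked by brute force. This sketch (hypothetical helper, toy numbers only) lets core k claim `size` nonces starting at `k * spacing`, clamped to the end of the range as in the table, and counts the nonces claimed by more than one core.

```cpp
// Toy model only: `cores` cores share a nonce range of size `total`. Core k claims
// `size` nonces starting at k * spacing, clamped to `total`.
// Returns how many nonces are claimed by more than one core.
constexpr unsigned overlappingNonces(unsigned total, unsigned cores,
                                     unsigned spacing, unsigned size) {
    unsigned dup = 0;
    for (unsigned n = 0; n < total; ++n) {
        unsigned owners = 0;
        for (unsigned k = 0; k < cores; ++k) {
            const unsigned start = k * spacing;
            const unsigned end = (start + size < total) ? start + size : total;
            if (n >= start && n < end) {
                ++owners;
            }
        }
        if (owners > 1) {
            ++dup;
        }
    }
    return dup;
}
```

With 2 cores, a range of 256, spacing 128 and size 130 it reports 2 overlapping nonces (128 and 129); with size 128 it reports none.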
I am still not sure you understood my proposal: Dual Pool mode would just not have been available for Standard Channels when implementing this PR. It still works on Sv1 and Sv2 Extended; both still use job switching at a 500 ms interval. Only Standard doesn't use it, and we could also have added a tooltip explaining this to the user.
As adammwest pointed out, the nonce space should be derived from the actual address_interval (256/interval) rather than next_power_of_two(asic_count). This correctly handles any chip address configuration.
Why would we add something that requires disabling another feature when it's not really needed? I guess we could discuss this much longer without ever reaching consensus 😅 No Standard Channel for now, but I'll have a look at this PR and "fix" the nonce space issue (well, actually adjust everything else that breaks when changing the chip IDs, if the PR hasn't already fixed it). But everything about "version rolling frequency" can be removed then, in the web UI too, because it won't be needed anymore. It was there for adjusting the "frequency" so that there are no duplicates on the QX. But Adam seems to be right and everything I did was BS in this case 😅
It's still a work in progress anyway, and maybe a proof of concept to get the ASIC working as intended and to help understand it better. No need to hurry on anything here. As you said, and I'm fine with that, Extended Channels are good for now.
Why "instead"? Wouldn't it make sense to unconditionally bump nTime every second? It keeps the timestamp accurate (that's more of an OCD thing than a real requirement, of course).
134 per half clock cycle = 268 nonce overlap between adjacent cores. Without correction ~0.15% duplicate shares on 4-chip boards.
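Putting the review suggestions together, a corrected HCN could look roughly like this: `asic_count_up` derived from the address interval, and the 268-nonce errata margin subtracted. This is a sketch under those assumptions (FREQ_MULT = 25, illustrative names), not the merged `setNonceSpace()`.

```cpp
#include <cstdint>

// Smallest power of two >= v (assumes 1 <= v <= 2^31).
constexpr std::uint32_t nextPow2(std::uint32_t v) {
    std::uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// Sketch: HCN from the address interval, minus the 268-nonce errata margin.
// Assumes FREQ_MULT = 25 and that hcn_max > 268.
constexpr std::uint32_t correctedHcn(double freq_mhz, std::uint32_t cores,
                                     std::uint32_t address_interval) {
    const double asic_count_up = 256.0 / address_interval;  // instead of next_pow2(asic_count)
    const double hcn_max = 4294967296.0 / nextPow2(cores) / asic_count_up * 25.0 / freq_mhz * 0.5;
    // HW errata: 134 per half clock cycle -> 268 nonces of overlap between adjacent cores.
    return static_cast<std::uint32_t>(hcn_max) - 268u;
}
```

For 4 chips at interval 64 and 615 MHz this yields 170500 - 268 = 170232.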
|
@Sjors good point: bumping ntime every second unconditionally makes sense. It's not aggressive rolling, just keeping the timestamp accurate, and it gives a natural job refresh point for all modes. @shufps this would also solve the dual pool concern for Standard Channel: every second the ntime increments, giving you a natural switching point between pools. No 500 ms job resend needed, just pick the right pool on each ntime tick. Don't get me wrong, just discussing :) I won't change anything on the SV2 PR anymore :)
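A once-per-second ntime bump is easy to express. This sketch (hypothetical names, no pool or scheduler logic) derives the header ntime from the time elapsed since the job arrived, and checks whether a device would still exhaust the 2^48 nonce/version space within a single second.

```cpp
#include <cstdint>

// Hypothetical: ntime to put in the header `elapsed_ms` after the job arrived,
// bumped by one for every full second, independent of channel type.
constexpr std::uint32_t currentNtime(std::uint32_t job_ntime, std::uint64_t elapsed_ms) {
    return job_ntime + static_cast<std::uint32_t>(elapsed_ms / 1000);
}

// True if a device at `ths` TH/s would exhaust 2^32 nonces x 2^16 versions
// before the next once-per-second ntime bump.
constexpr bool exhaustsWithinOneSecond(double ths) {
    return ths * 1e12 > 4294967296.0 * 65536.0;
}
```

For example, 2.5 s into a job the header carries `job_ntime + 2`. Under these assumptions, any device below roughly 281 TH/s never exhausts the space within one ntime value.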
Remove all VR frequency infrastructure (vrFreqToReg, vrRegToFreq, setVrFrequency, calculateSearchSpaceMs, getDefaultVrFrequency, NVS storage, HTTP API, Web UI) since the HCN-based nonce space calculation in setNonceSpace() replaces it correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Removed everything related to version rolling frequency^^ I'll check on the QX at >1100 MHz whether I see duplicates; if not, I'll merge this.

edit: Interesting, I don't seem to get duplicates at a job time of 10 s on an Octaxe, and I can confirm that the nonces now also use the previously unused chip ID bits, so the change really seems to have extended the search space per chip. Btw, double-clicking the danger zone button makes other edit fields appear, like the job interval time (and previously the version rolling frequency setting, which has now been removed)^^

edit2: Nonce evaluation says all bits in the nonce are now used during mining. This is really nice, love that! 🥰
|
Before merging: the OCTAXE search space is ~31 s at 9 TH/s and ~28 s when overclocked to 10 TH/s. At your 10 s job interval that works, but overclocked devices could still hit duplicates if the job interval exceeds the search space. This affects all modes, not just Standard Channel. As Sjors suggested, we could bump ntime every second unconditionally. That would: […]
How do you want to proceed: add ntime rolling to this PR and remove job switching, or merge as-is and handle it separately?
The 10 s were just a test to confirm that we use more bits of the nonce than before and don't generate duplicates.
No, this PR is only about fixing the nonce search space. The other one is only for SV2 Extended for now. ntime rolling, we will see.
|
Nice, I didn't see any duplicates on the QX at >=1100 MHz. I still have to test the NQ and NerdAxe too, because they use the BM1368 and BM1366. But this looks really nice 🥰 edit: NQ+ works too ✔️
|
I guess I'll just merge this and the other one and release a new beta^^

Summary
Port of Bitaxe ESP-Miner PR 420 - dynamic Hash Counting Number (HCN) calculation for BM1366/BM1368/BM1370 ASICs.
Required for SV2 Standard Channel support where no extranonce is available and the ASIC must search the full nonce space.
Changes
- Added `setNonceSpace(frequency, asic_count, cores)` to the `Asic` base class
- Added `getCoreCount()` to each ASIC subclass (BM1366=112, BM1368=80, BM1370=128)
- Replaced `setVrFrequency(vrFrequency)` with `setNonceSpace()` in each `init()`
- `HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5`

Open question: Register 0x10 conflict
NerdQAxePlus uses register 0x10 for Version Rolling Frequency (`vrFreqToReg`), while ESP-Miner uses it for the Hash Counting Number. These produce very different values:

| Firmware | Register 0x10 meaning | Value |
|---|---|---|
| NerdQAxePlus | Version Rolling Frequency | `VR_REG_PER_HZ / 25000` |
| ESP-Miner | Hash Counting Number | `(2^32/128/4) * 25/615 * 0.5` |

Need review: Can register 0x10 serve both purposes? Does the HCN calculation implicitly set the correct VR timing? Or do we need both writes?
cc @shufps @mutatrum @adammwest for review of the register 0x10 semantics.
Test plan