4

If more than 350 accounts are sent to this group-sorting algorithm, then it begins to just loose accounts. If 400 accounts are sent in, then only 399 accounts are added to hangoutGroups[][]

function commit(bytes32 POIuser) {
    uint groupNumber = uint(POIuser) / (uint(maxHash) / numGroups()) + 1;

    if(hangoutGroups[groupNumber].length >= groupSize) {
        for(uint i = 0; i < numUsers(); i++) {
            if(groupNumber - i >= 1) {
                if(hangoutGroups[groupNumber - i].length < groupSize) { groupNumber -= i; break; }
            }
            if(groupNumber + i <= numGroups()) {
                if(hangoutGroups[groupNumber + i].length < groupSize) { groupNumber += i; break; }
            }
        }
    } 

    hangoutGroups[groupNumber].push(POIuser);
}

If the loop code is changed slightly then it outputs 400

function commit(bytes32 POIuser) {
    uint groupNumber = uint(POIuser) / (uint(maxHash) / numGroups()) + 1;

    if(hangoutGroups[groupNumber].length >= groupSize) {
        for(uint i = 0; i < 2; i++) {
            if(groupNumber - i >= 1) {
                if(hangoutGroups[groupNumber - i].length < groupSize) { groupNumber -= i; break; }
            }
            if(groupNumber + i <= numGroups()) {
                if(hangoutGroups[groupNumber + i].length < groupSize) { groupNumber += i; break; }
            }
        }
    } 

    hangoutGroups[groupNumber].push(POIuser);
}

Then if the loop is incremented just one step, it looses an account again

function commit(bytes32 POIuser) {
    uint groupNumber = uint(POIuser) / (uint(maxHash) / numGroups()) + 1;

    if(hangoutGroups[groupNumber].length >= groupSize) {
        for(uint i = 0; i < 3; i++) {
            if(groupNumber - i >= 1) {
                if(hangoutGroups[groupNumber - i].length < groupSize) { groupNumber -= i; break; }
            }
            if(groupNumber + i <= numGroups()) {
                if(hangoutGroups[groupNumber + i].length < groupSize) { groupNumber += i; break; }
            }
        }
    } 

    hangoutGroups[groupNumber].push(POIuser);
}

This is the output for the loop with < numUsers(), 399 accounts

0, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 0

This is the output if the loops is limited to 2, 400 accounts

0, 6, 5, 1, 3, 4, 5, 5, 7, 5, 1, 4, 3, 5, 5, 5, 5, 5, 7, 6, 6, 7, 7, 7, 5, 2, 
5, 5, 4, 5, 5, 5, 3, 5, 5, 5, 3, 3, 5, 5, 4, 3, 5, 2, 5, 5, 7, 6, 5, 5, 1, 5,
7, 5, 4, 5, 6, 5, 6, 7, 8, 6, 5, 5, 4, 5, 5, 5, 8, 8, 8, 6, 7, 6, 5, 5, 4, 4,
5, 5, 4, 0

This is the output if the loop is incremented to 3, 399 accounts

0, 5, 5, 1, 3, 4, 5, 5, 5, 5, 3, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 5, 5, 4,
5, 5, 4, 5, 5, 5, 3, 5, 5, 5, 3, 3, 5, 5, 4, 3, 5, 2, 5, 5, 7, 6, 5, 5, 3, 5, 
5, 5, 5, 5, 5, 5, 6, 7, 8, 6, 5, 5, 5, 5, 6, 5, 6, 6, 9, 7, 7, 5, 5, 5, 5, 4,
5, 5, 4, 0

The accounts are generated by this code, and I've used hashes to simulate accounts

bytes32[] public fourHundredAddresses;


function generateFourHundredAddresses() { 
    for(uint i = 0; i < 400; i++) {
        fourHundredAddresses.push(sha3(i));
        numUsers++;
    }
}

function batchRegisterOne() {

    for(uint i = 0; i < fourHundredAddresses.length; i++) {
        commit(fourHundredAddresses[i]);
    }
}

The rest of the code

mapping(uint => bytes32[]) public hangoutGroups;

uint groupSize;
uint public numUsers;

    bytes32 maxHash = 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff;
    function numGroups() returns(uint){ return numUsers / groupSize;}

    function poi() {
        groupSize = 5;
    }

2 Answers2

1

Your problem is in if(groupNumber - i >= 1). You see, if i > groupNumber, then the uint will wrap around to 0xffff..., which is a huge number that is indeed >= 1. This huge number will also be saved in the mapping, no error here. But if you loop through number 1 to 100 in the mapping, you will not find this huge number.

The proper test is: if(groupNumber >= i + 1)


Apart from that, you are breaking a certain number of best practices:

  • avoid naming "address" something that is a bytes32 and not an address in Solidity.
  • you are looping a large number of times, that is a recipe for out-of-gas errors.
  • indeed your function generateFourHundredAddresses() fails unless you reduce looping to 152 times, with 4,700,000 gas.

Also you turn people away because you let us decipher and guess what your code is all about:

  • you say all these 0 ... 5 are an output, but there is no output to a function that returns such numbers. Oh, I guess, you meant the succession of groupNumber when we repetitively call commit. Oh, no, you meant that is the length of each hangoutGroup with their position being the index in the mapping. Phew!
  • you say there are 399 or 400 of such numbers, but I see only 82 for each of the 3 lists. Oh, right, you mean the sum of them, see previous point.
  • what is POIUser. Oh, I guess that's the same thing as "address" or as "account". Three different words for the same thing.
  • it looks like function poi() is the constructor. Hard to know without the contract name. Al the more so that best practice tells that the contract name should be capitalised.
1

Thanks @xavier-leprĂȘtre-b9lab. Replying here which is poor stack-exchange practice, but posted the question under a guest account, albeit using my name, had to re-register to reply and seem to not have enough reputation to comment.

As you say, the problem was indeed that it mapped to 0xFF...

I went over the problem with a few non-ethereum devs before I posted here, and neither could understand it. Is the mapping to 0xFF... unique to the EVM ? Could you point me in the right direction so I could read up on it ?

To answer this question for anyone who has the same problem, here is an example of how < 0 maps to the max hash, 1.1579 quindecillion.

enter image description here

Before I knew this, this problem boggled my mind, and I'm aware the question was poorly worded, but had no idea where to even begin to debug this. I'd managed to find the last account on what looked like hangoutGroups[-1] but I now understand that < 0 maps to 0xFF... Would appreciate a link to documentation where I can read up on why it does this.

The code is a group-sorting algorithm, which builds on http://github.com/d11e9/poi, and I've attempted to improve the distribution of accounts. It's deterministic, and commit() is called by each user, one transaction per commit, so gas costs for billions of accounts are not a problem there since each commit is a transaction.

I've also guessed that the loop shouldn't run too many times, since the distribution of accounts from g = n / (k /p) + 1 ought to be pretty good, so that the gas costs should not be a problem there, and this is why I tested the algorithm with mass-amounts of accounts. My tests went well for 200 accounts, then 300 accounts, then for 400 accounts I ran into this problem.

Massive thanks for your help in solving it, been trying to grasp what could be causing this for 4 days now. Would not have been able to solve it without your help. Now I'll continue to test it with 1000s of accounts.

Will be testing it on the Morden testnet later tonight with 2000 accounts, using this contract,

gist.github.com/anonymous/fdc34cf82468964d795220ef6935e647

If it scales to 2000 accounts when I batch-commit 50 accounts at a time, then I would guess that it scales to a quindecillion accounts if each account commits using an independent transaction. A second opinion there would be great.