MongoDB Benchmark Using YCSB
I recently ran a benchmark against MongoDB instances for a client. In this article I share the impact of differences in writeConcern, cacheSizeGB, architecture, and version, based on the results of that benchmark session.
For this benchmark session the chosen tool was YCSB (Yahoo! Cloud Serving Benchmark). You may ask why? Mainly because this was the tool the vendor used for benchmarking at MongoDB.local Milan 2024.
Once the tool was chosen, I provisioned the YCSB binaries on a server with no other workload and executed many "load" and "run" phases against the replica sets. Between executions I varied the writeConcern and the cacheSizeGB of the replica sets in architectures P (single node, primary only), PSA (Primary, Secondary, Arbiter), and PSS (Primary, Secondary, Secondary) on major versions 5.0, 6.0, and 7.0 (and also 4.4 for architecture P).
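As an illustrative sketch of those executions (the host name, workload file, and parameter values below are placeholders, not the exact ones from the session), a load/run pair looked roughly like this, with the writeConcern passed through the connection string so the MongoDB driver honors it:

```shell
# Load phase: insert 15M records with 128 client threads.
# w=majority in the URI sets the write concern used by the driver.
bin/ycsb load mongodb -s \
  -P workloads/workloada \
  -p mongodb.url="mongodb://mongo-primary:27017/ycsb?w=majority" \
  -p recordcount=15000000 \
  -threads 128

# Run phase: execute 15M operations against the loaded data set.
bin/ycsb run mongodb -s \
  -P workloads/workloada \
  -p mongodb.url="mongodb://mongo-primary:27017/ycsb?w=majority" \
  -p operationcount=15000000 \
  -threads 128
```

Repeating this pair while changing only `w=` in the URI (or the server-side default, as shown later) produces the writeConcern comparisons below.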
So, let's look at the results:
For analysis purposes, only executions with 128 and 256 threads and 15M records/operations were considered.
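YCSB prints its summary as comma-separated lines such as `[OVERALL], Throughput(ops/sec), 12500.5` and `[OVERALL], RunTime(ms), 120000`. Small helpers like the following (my own sketch for aggregating results, not part of the YCSB tooling) pull those figures out of a saved output file:

```shell
# Extract the overall throughput (ops/sec) from a saved YCSB output file.
# YCSB summary lines look like: "[OVERALL], Throughput(ops/sec), 12500.5"
extract_throughput() {
  grep '^\[OVERALL\], Throughput' "$1" | awk -F', ' '{ print $3 }'
}

# Same idea for the total runtime in milliseconds:
# "[OVERALL], RunTime(ms), 120000"
extract_runtime_ms() {
  grep '^\[OVERALL\], RunTime' "$1" | awk -F', ' '{ print $3 }'
}
```

Collecting these two numbers per execution is enough to build every comparison in this article.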
Impact of the writeConcern:
All architectures:

Considering loads at all replica set architectures and versions:
 With writeConcern majority, average throughput was 47% lower than with writeConcern 1.
 With writeConcern majority, average runtime was 33% longer than with writeConcern 1.

Considering runs at all replica set architectures and versions:
 With writeConcern majority, average throughput was 16% lower than with writeConcern 1.
 With writeConcern majority, average runtime was 15% longer than with writeConcern 1.
Replica set architecture with a single node:

Considering loads at the single-node replica set and all versions:
 With writeConcern majority, average throughput was 34% lower than with writeConcern 1.
 With writeConcern majority, average runtime was 26% longer than with writeConcern 1.

Considering runs at the single-node replica set and all versions:
 With writeConcern majority, average throughput was 9% lower than with writeConcern 1.
 With writeConcern majority, average runtime was 8% longer than with writeConcern 1.
Replica set architectures PSA and PSS:

Considering loads at the PSA and PSS replica sets and all versions:
 With writeConcern majority, average throughput was 49% lower than with writeConcern 1.
 With writeConcern majority, average runtime was 33% longer than with writeConcern 1.

Considering runs at the PSA and PSS replica sets and all versions:
 With writeConcern majority, average throughput was 18% lower than with writeConcern 1.
 With writeConcern majority, average runtime was 16% longer than with writeConcern 1.
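Switching the writeConcern between variants can also be done server-side: since MongoDB 4.4, the cluster-wide default writeConcern is changed with the `setDefaultRWConcern` admin command. A mongosh sketch, assuming a reachable replica set:

```shell
# Set the cluster-wide default writeConcern to 1...
mongosh --eval 'db.adminCommand({ setDefaultRWConcern: 1, defaultWriteConcern: { w: 1 } })'

# ...and back to majority for the next round of executions.
mongosh --eval 'db.adminCommand({ setDefaultRWConcern: 1, defaultWriteConcern: { w: "majority" } })'
```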
Difference between versions:
Considering all replica set architectures with writeConcern 1:
 4.4 vs 5.0:
 For loads:
 version 5.0 averaged 2% lower throughput than 4.4.
 version 5.0 averaged 2% longer runtime than 4.4.
 For runs:
 version 5.0 averaged 9% lower throughput than 4.4.
 version 5.0 averaged 10% longer runtime than 4.4.
 5.0 vs 6.0:
 For loads:
 version 6.0 averaged about the same throughput as 5.0.
 version 6.0 averaged about the same runtime as 5.0.
 For runs:
 version 6.0 averaged 1% lower throughput than 5.0.
 version 6.0 averaged 3% longer runtime than 5.0.
 6.0 vs 7.0:
 For loads:
 version 7.0 averaged 2% lower throughput than 6.0.
 version 7.0 averaged 2% longer runtime than 6.0.
 For runs:
 version 7.0 averaged 4% higher throughput than 6.0.
 version 7.0 averaged 6% shorter runtime than 6.0.
Considering all replica set architectures with writeConcern majority:
 4.4 vs 5.0:
 For loads:
 version 5.0 averaged 2% lower throughput than 4.4.
 version 5.0 averaged 3% longer runtime than 4.4.
 For runs:
 version 5.0 averaged 5% lower throughput than 4.4.
 version 5.0 averaged 5% longer runtime than 4.4.
 5.0 vs 6.0:
 For loads:
 version 6.0 averaged 12% lower throughput than 5.0.
 version 6.0 averaged 16% longer runtime than 5.0.
 For runs:
 version 6.0 averaged 11% lower throughput than 5.0.
 version 6.0 averaged 16% longer runtime than 5.0.
 6.0 vs 7.0:
 For loads:
 version 7.0 averaged 5% higher throughput than 6.0.
 version 7.0 averaged 6% shorter runtime than 6.0.
 For runs:
 version 7.0 averaged 7% higher throughput than 6.0.
 version 7.0 averaged 10% shorter runtime than 6.0.
Considering the replica set architectures PSA and PSS:

5.0 vs 6.0:

For writeConcern majority:

For loads:
 version 6.0 averaged 7% lower throughput than 5.0.
 version 6.0 averaged 11% longer runtime than 5.0.

For runs:
 version 6.0 averaged 9% lower throughput than 5.0.
 version 6.0 averaged 14% longer runtime than 5.0.

For writeConcern 1:

For loads:
 version 6.0 averaged 1% higher throughput than 5.0.
 version 6.0 averaged 1% shorter runtime than 5.0.

For runs:
 version 6.0 averaged 5% lower throughput than 5.0.
 version 6.0 averaged 8% longer runtime than 5.0.


6.0 vs 7.0:

For writeConcern majority:

For loads:
 version 7.0 averaged 5% higher throughput than 6.0.
 version 7.0 averaged 7% shorter runtime than 6.0.

For runs:
 version 7.0 averaged 9% higher throughput than 6.0.
 version 7.0 averaged 12% shorter runtime than 6.0.

For writeConcern 1:

For loads:
 version 7.0 averaged 1% lower throughput than 6.0.
 version 7.0 averaged 1% longer runtime than 6.0.

For runs:
 version 7.0 averaged 5% higher throughput than 6.0.
 version 7.0 averaged 7% shorter runtime than 6.0.

[!NOTE] Note: With the cacheSizeGB raised to 20, there were no relevant differences between versions 5.0, 6.0, and 7.0 in terms of runtime and throughput.
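For reference, the percentage deltas quoted in these comparisons follow the usual relative-difference formula, (new − old) / old × 100. A tiny awk helper (illustrative, not part of YCSB) makes that concrete:

```shell
# Percentage difference of "new" relative to "old" (e.g. average throughput).
pct_delta() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Example: an average of 9000 ops/sec against a baseline of 10000 ops/sec
# is a 10% drop.
pct_delta 10000 9000   # -10.0
```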
Impact of the cacheSizeGB:
Considering PSA and PSS architectures with any writeConcern and all versions:

1 vs 2:
 For loads:
 cacheSizeGB 2 averaged 3% higher throughput than cacheSizeGB 1.
 cacheSizeGB 2 averaged 1% shorter runtime than cacheSizeGB 1.
 For runs:
 cacheSizeGB 2 averaged 1% higher throughput than cacheSizeGB 1.
 cacheSizeGB 2 averaged 2% shorter runtime than cacheSizeGB 1.

2 vs 20:
 For loads:
 cacheSizeGB 20 averaged 10% higher throughput than cacheSizeGB 2.
 cacheSizeGB 20 averaged 10% shorter runtime than cacheSizeGB 2.
 For runs:
 cacheSizeGB 20 averaged 6% higher throughput than cacheSizeGB 2.
 cacheSizeGB 20 averaged 7% shorter runtime than cacheSizeGB 2.
Considering writeConcern majority:

1 vs 2:
 For loads:
 cacheSizeGB 2 averaged 6% higher throughput than cacheSizeGB 1.
 cacheSizeGB 2 averaged 6% shorter runtime than cacheSizeGB 1.
 For runs:
 cacheSizeGB 2 averaged 1% higher throughput than cacheSizeGB 1.
 cacheSizeGB 2 averaged 1% shorter runtime than cacheSizeGB 1.

2 vs 20:
 For loads:
 cacheSizeGB 20 averaged 8% higher throughput than cacheSizeGB 2.
 cacheSizeGB 20 averaged 8% shorter runtime than cacheSizeGB 2.
 For runs:
 cacheSizeGB 20 averaged 4% higher throughput than cacheSizeGB 2.
 cacheSizeGB 20 averaged 6% shorter runtime than cacheSizeGB 2.
Considering writeConcern 1:

1 vs 2:
 For loads:
 cacheSizeGB 2 averaged 2% lower throughput than cacheSizeGB 1.
 cacheSizeGB 2 averaged 2% longer runtime than cacheSizeGB 1.
 For runs:
 cacheSizeGB 2 averaged 5% higher throughput than cacheSizeGB 1.
 cacheSizeGB 2 averaged 7% shorter runtime than cacheSizeGB 1.

2 vs 20:
 For loads:
 cacheSizeGB 20 averaged 2% higher throughput than cacheSizeGB 2.
 cacheSizeGB 20 averaged 2% shorter runtime than cacheSizeGB 2.
 For runs:
 cacheSizeGB 20 averaged 1% higher throughput than cacheSizeGB 2.
 cacheSizeGB 20 averaged 1% shorter runtime than cacheSizeGB 2.
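For context on how these cache sizes were applied: cacheSizeGB maps to the WiredTiger internal cache, settable at mongod startup. The flag and configuration key below are the real mongod ones; the replica set name and paths are placeholders:

```shell
# Start mongod with a 20 GB WiredTiger cache (command-line form).
mongod --wiredTigerCacheSizeGB 20 --replSet rs0 --dbpath /data/db \
  --fork --logpath /var/log/mongodb/mongod.log

# The equivalent mongod.conf key is:
#   storage.wiredTiger.engineConfig.cacheSizeGB: 20
```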
[!NOTE] Difference between architectures PSA and PSS: The results were similar between the PSA and PSS replica set architectures.
Final considerations:
The following considerations are based on this benchmark session, so they may not hold for other environments or applications.
When a replica set has only one member, setting the default writeConcern to 1 may improve load throughput by 34% and run throughput by 9% compared with a default writeConcern of majority, so take care to set an adequate default writeConcern for each architecture.
In PSA and PSS architectures with writeConcern majority, raising the cacheSizeGB from 2 to 20 may increase load throughput by 10% and run throughput by 6%.
In PSA and PSS architectures with writeConcern majority, upgrading from version 6.0 to 7.0 may increase load throughput by 5% and run throughput by 9%.
In terms of cost efficiency, it may be worth upgrading from version 6.0 to 7.0 before raising the cacheSizeGB from 2 to 20; then evaluate performance and check whether increasing cacheSizeGB is still necessary.