Parallel execution using OpenMP takes longer than serial execution in C++: am I calculating execution time the right way?

OpenMP implements parallel processing with multithreading internally, and the benefit of multithreading only shows up with a large volume of data. With a very small volume of data you cannot measure the performance of a multithreaded application. The reasons:

a) To create a thread, the OS has to allocate memory for it, which takes time (even if only a tiny bit).

b) Running multiple threads requires context switching, which also takes time.

c) The memory allocated to the threads has to be released again, which also takes time.

d) The result depends on the number of processors and the total memory (RAM) in your machine.

So when you run a small operation on multiple threads, its performance will be about the same as that of a single thread, or worse once the overhead above is counted (by default the OS assigns one thread, called the main thread, to every process). So your result is what you should expect in this case. To measure the performance of a multithreaded design, use a large amount of data with a complex operation; only then will you see the difference.
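One way to act on this, as a minimal sketch: OpenMP's if clause lets a loop run in parallel only when the input is large enough to pay for the thread overhead. The function name sum_of_squares and the cutoff kParallelThreshold are hypothetical, and the cutoff value is a placeholder that would have to be found by measuring on your own machine.

```cpp
#include <vector>

// Hypothetical cutoff: below this size the loop stays serial.
// The right value depends on the machine and on the loop body.
constexpr long kParallelThreshold = 100000;

// Sums the squares of all elements; only uses multiple threads
// when the input is larger than kParallelThreshold.
double sum_of_squares(const std::vector<double>& data) {
    const long n = static_cast<long>(data.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+:total) if(n > kParallelThreshold)
    for (long i = 0; i < n; i++) {
        total += data[i] * data[i];
    }
    return total;
}
```

Compiled without OpenMP support the pragma is ignored and the function simply runs serially, so the result is the same either way.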

Because of your critical block, you cannot accumulate sum in parallel. Every time one thread reaches the critical section, all other threads have to wait.

The smart approach is to give each thread a temporary copy of sum that can be accumulated without synchronization, and to add up the per-thread results afterwards. OpenMP can do this for you automatically with the reduction clause. So your loop changes to:

#pragma omp parallel for reduction(+:sum)
for (i = 0; i < num_steps; i++) {
    // x is declared inside the loop body so that each thread gets its own copy
    const double x = (i + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
}

On my machine this runs 10 times faster than the version using the critical block (I also increased num_steps to reduce the influence of one-time costs like thread creation).

PS: I recommend using <chrono>, <boost/timer/timer.hpp>, or Google Benchmark to time your code.
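As a rough sketch of the <chrono> option, the helper below measures wall-clock time with std::chrono::steady_clock. The name elapsed_ms is hypothetical. The question's timing code is not shown here, so this is only a possible cause, but it is worth checking: std::clock reports processor time, which on many systems is added up across all threads, so it can make a parallel run look slower than it really was.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look like `double ms = elapsed_ms([&] { /* loop under test */ });`. A single run is noisy, so repeating the measurement a few times and comparing the results is usually a good idea.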