Broker randomly turns off

Dear community,
Within our system, we are running EMQX 5.0.15. We've noticed that the broker is often down: in the last 2 weeks alone, there were two cases where the broker went down and we had to restart it manually. Since the broker must always be up and running, is there any way to ensure that it keeps running?

Kind regards,
Preanger

Can you provide more log information?

Hello,

How do you start the broker?
We build deb and rpm packages that include a systemd unit file, which lets you start EMQX as a service. It restarts the broker on failure: emqx/emqx.service at e1864775311e9dede78a1a03d2b5e7d789d4d1f1 · emqx/emqx · GitHub
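To illustrate what "restarts the broker on failure" means, here is a minimal Python sketch of a restart-on-failure loop: it runs a command in the foreground and, if the process exits with a non-zero status, starts it again, up to a limit. It only illustrates the idea behind a policy such as systemd's `Restart=on-failure`; it is not a replacement for the unit file and not a supported way to run EMQX. The helper name `supervise` and the command you pass to it are hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay_s=1.0):
    """Run cmd; restart it after a non-zero exit, at most max_restarts times.

    Hypothetical helper for illustration only. Returns the exit codes
    observed, in order.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Run the command in the foreground and wait for it to exit.
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            # A clean exit is treated as intentional: do not restart.
            break
        if attempt < max_restarts:
            # Pause before restarting so a process that fails immediately
            # does not restart in a tight loop.
            time.sleep(delay_s)
    return exit_codes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python supervise.py <command that starts the broker in the foreground>
    print(supervise(sys.argv[1:]))
```

Note that a restart loop like this cannot fix a broker that fails every time it starts; it only helps when the process dies while it is running.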

@zhongwencool, thank you for your reply.
I noticed that the broker was down again this weekend. Attached is the log I could gather for the past day. Could it be related to SSL?

emqx.log.txt.zip (921 Bytes)

Hi @dmif, thank you for your reply. Could you provide us with more information on how to start EMQX as a service?
As far as I can tell, it isn't covered in the documentation…

Regards

The entries in this log file should not cause the node to die. Are there any other abnormal crash logs? Or is there a file called erl_crash.dump?

Yes, here is the erl_crash.dump file:

erl_crash.zip (530.5 KB)

Hi @dmif, thank you for your reply. Could you provide us with more information on how to start EMQX as a service?

Amazon Linux 2 | EMQX 5.0 Documentation, see "start EMQX with systemctl" (it uses Amazon Linux as an example, but the procedure is similar on all distros that use systemd as the init system).

Ah, I see that this solution is specific to Linux. Unfortunately, that doesn't work for us, since our broker runs on a Windows machine. Is it possible to run the broker as a service on Windows?

Hello,

I see. No, unfortunately we don't support this. Our Windows builds are mostly targeted at developers, so that they can test client code against EMQX running locally; they are not intended for production systems.

Thu Mar  9 12:42:21 2023
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{bad_return,{{kernel,start,[normal,[]]},{'EXIT',{{badmatch,{error,{bad_config,{handler,{kernel,{handler_not_added,{file_error,"log/emqx.log.siz",eacces}}}}}}},[{kernel,start,2,[{file,"kernel.erl"},{line,38}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}}})
System version: Erlang/OTP 24 [erts-12.3.2.8] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [jit]
Taints: crypto,asn1rt_nif,dyntrace,jiffy
Atoms: 53635
Calling Thread: scheduler:1
=scheduler:1

The crash dump was written on Thu Mar 9 12:42:21 2023, and it shows that EMQX had no permission to access log/emqx.*, so it failed to start.
I think this is a startup failure, not the random shutdown you described.
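The Slogan line of a crash dump is usually the first thing to read, and it is where this permission error shows up. As a sketch, assuming the dump is plain text containing a line that starts with `Slogan: ` (as in the excerpt posted above), the following Python extracts the slogan and any `{file_error,Path,Reason}` tuples in it. `crash_slogan` and `file_errors` are hypothetical helper names, not part of EMQX or Erlang/OTP.

```python
import re
import sys

# Matches Erlang tuples such as {file_error,"log/emqx.log.siz",eacces}
FILE_ERROR_RE = re.compile(r'\{file_error,"([^"]*)",(\w+)\}')


def crash_slogan(dump_text):
    """Return the text after 'Slogan: ' in a crash dump, or None if absent."""
    for line in dump_text.splitlines():
        if line.startswith("Slogan: "):
            return line[len("Slogan: "):]
    return None


def file_errors(slogan):
    """Return (path, reason) pairs for every file_error tuple in the slogan."""
    return FILE_ERROR_RE.findall(slogan or "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python crash_slogan.py path/to/erl_crash.dump
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        slogan = crash_slogan(f.read())
    print(slogan)
    for path, reason in file_errors(slogan):
        print(f"{path}: {reason}")
```

On the slogan quoted in this thread, `file_errors` returns `[("log/emqx.log.siz", "eacces")]`.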
I also notice you only have one scheduler. If the hardware configuration is too low-spec, that can also cause the operating system to shut down EMQX.
What is your machine configuration?

Dear @zhongwencool, thank you for your reply, and excuse my late reaction.
I am not sure I understand what you mean; could you elaborate on the low configuration and the single scheduler?
And what are the possible fixes for this problem?

Kind regards, Preanger

@zhongwencool pointed out this part of the crash log:

{file_error,"log/emqx.log.siz",eacces}

This reads as: EMQX failed to start because it couldn't access one of the files it uses for normal operation (the file is <emqx dir>/log/emqx.log.siz; eacces means "permission denied").
Also, the broker didn't turn off; it failed to start (see the application_start_failure part of the error).
So this error is caused by incorrect file permissions on the operating system. It could happen, for example, if you ran EMQX as root (or Administrator) and then restarted it as a regular user.
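One way to look for this kind of permission problem is to try opening each file in the log directory for writing, as the same user account that starts EMQX, while the broker is stopped. A minimal Python sketch, assuming the log directory is `<emqx dir>/log` (pass the real path); `unwritable_files` is a hypothetical helper name. It actually opens the files rather than using `os.access`, because on Windows `os.access` only looks at the read-only attribute and not at file ACLs.

```python
import os
import sys


def unwritable_files(log_dir):
    """Return (path, error) pairs for files in log_dir that cannot be opened for appending.

    Opening a file in append mode and closing it without writing leaves its
    contents unchanged. Subdirectories are skipped.
    """
    problems = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "ab"):
                pass
        except OSError as exc:
            problems.append((path, exc.strerror))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_log_perms.py "<emqx dir>/log"
    for path, err in unwritable_files(sys.argv[1]):
        print(f"cannot write {path}: {err}")
```

EMQX may also need to create new files in that directory, so the directory itself should be writable by the same account.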

I also notice you only have one scheduler. If the hardware configuration is too low-spec, that can also cause the operating system to shut down EMQX.

Schedulers usually correspond to CPU cores. Having only one scheduler could mean that there is only one CPU core, which is usually too low-spec for EMQX in a production environment.
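By default, the Erlang VM starts one scheduler per logical CPU core it detects, and the `[smp:N:M]` field in the System version line of a crash dump reports the total number of schedulers and the number online. A small Python sketch that reads that field and, for comparison, the number of logical CPUs the OS reports via `os.cpu_count()`; `parse_smp` is a hypothetical helper name.

```python
import os
import re

# Matches the scheduler field of an Erlang system version string, e.g. [smp:2:2]
SMP_RE = re.compile(r"\[smp:(\d+):(\d+)\]")


def parse_smp(system_version):
    """Return (schedulers, schedulers_online), or None if there is no smp field."""
    m = SMP_RE.search(system_version)
    return (int(m.group(1)), int(m.group(2))) if m else None


if __name__ == "__main__":
    line = ("Erlang/OTP 24 [erts-12.3.2.8] [source] [64-bit] [smp:2:2] "
            "[ds:2:2:10] [async-threads:1] [jit]")
    print("schedulers (total, online):", parse_smp(line))
    # Logical CPUs visible on the machine running this script; None if unknown.
    print("logical CPUs:", os.cpu_count())
```

For the System version line posted in this thread, `parse_smp` returns `(2, 2)`.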

I hope this helps.