tikazyq 5 years ago

Hi,

Thanks for the upvotes.

Crawlab is a Golang-based distributed web crawler management platform. It supports various languages, including Python, Node.js, Go, Java, and PHP, and various web crawler frameworks, including Scrapy, Puppeteer, and Selenium. Technically, you can run any spider on it. It supports both English and Chinese.

GitHub repo: https://github.com/tikazyq/crawlab

Demo: http://crawlab.cn/demo

Since its launch in March, Crawlab has received a lot of positive feedback, especially about its flexibility and appealing web UI. Crawlab is evolving fast, and we have developed many features through continuous iteration.

dragonsh 5 years ago

A good project. How does it compare to the Python-based Scrapy (scrapy.org)?

For admin, you can use Scrapinghub's open source tools or another project: https://github.com/Gerapy/Gerapy

  • chrischen 5 years ago

    Scrapy is a web crawler. This project is a web crawler management UI/platform, so it presumably manages your Scrapy crawlers/instances and schedules them.

    • dragonsh 5 years ago

      So if I understand correctly, Crawlab is another simple, easy-to-use admin for managing web crawlers; one still needs to use Scrapy or write their own crawlers. It should be similar to the admin tool I mentioned in my earlier comment and the ones at:

      https://github.com/topics/scrapy-ui

      https://github.com/topics/scrapyd-ui

      • tikazyq 5 years ago

        There are a couple of crawler management projects: scrapydweb, spiderkeeper, gerapy, crawlab. The first three are based on scrapyd.

realty_geek 5 years ago

Great to see an awesome product - primarily in Chinese!!! That will teach me to not take English language domination for granted!!

  • pattusk 5 years ago

    Certainly interesting to see English's domination increasingly challenged on open source tech projects. However, this makes contributing harder for non-Chinese speakers. I had a look at the GitHub issues page and all the discussion is in Chinese. Google Translate can help, but I'm not sure it would be enough for some subtle problems. Also not sure how communication would go with PRs if part of the team is strictly sinophone.

    Great project nonetheless. Will likely give it a try. Keep up the excellent work!

    • VvR-Ox 5 years ago

      We'll have to get used to this and I think it's actually a good thing.

      Of course, a lot of folks speak English, but Chinese is also very important and will become more so worldwide.

      • dvdkon 5 years ago

        I really appreciate having a "lingua franca" of programming. Projects in other languages are certainly interesting to see, but I also appreciate that most authors use English; it contributes to a larger worldwide community.

      • pictur 5 years ago

        do you really believe that?

  • tikazyq 5 years ago

    Thanks for the feedback. Actually, I have seen a lot of great Chinese projects on GitHub Trending, and sadly they are Chinese-only. I definitely agree they could do better by translating into English!

tikazyq 5 years ago

Thanks, all, for the upvotes and positive feedback on Crawlab. Crawlab is mainly focused on Chinese because it was initially promoted on mainland China tech sites (Juejin, V2ex, etc.). Due to the GFW, we cannot access information outside China, so it is difficult for us to hear feedback from non-Chinese developers.

We would definitely be happy if more contributors joined Crawlab development, so we will be working on improving multi-language support, including English documentation, a code of conduct, Contributing.md, and English communities. Our team is small (please check out the Contributors section), but its members come from top companies in China, and we would be happy to share knowledge between Chinese and non-Chinese developers.

Btw, what is the best tech community for this? (In China we have a WeChat group.)

atymic 5 years ago

Looks like a cool project, however I can't seem to get into the demo (it seems to indicate using admin/admin but that doesn't work).

Would be great to have an English language option on the demo login :)

  • tikazyq 5 years ago

    Thanks @atymic for the feedback. The initial admin password has been changed so that no harmful actions can be performed on the demo. You can still sign up to check out the demo.

    And we do have an English version, but not on the login page. We will definitely add it there.

captainmarble 5 years ago

Sounds like you're using Redis as a message broker for tasks here. Are you using Redis Streams?

  • tikazyq 5 years ago

    No, we are using Pub/Sub for message communication between nodes. For tasks, we are using a hashed list. The English documentation is missing, but we will add it later.
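
To illustrate the distinction drawn in the reply above, here is a minimal sketch of the two delivery patterns: publish/subscribe, where every subscribed node receives each message, and a task list, where each task is taken by exactly one consumer. It is an in-memory stand-in and does not talk to Redis or reflect Crawlab's actual code; `InMemoryBroker`, the channel name `nodes`, the list name `tasks`, and the message contents are all hypothetical.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for a message broker; NOT Redis and NOT Crawlab's code."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks
        self.task_lists = defaultdict(deque)  # list name -> pending tasks

    # Pub/Sub: every subscriber of a channel receives every message.
    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)
        # Report how many subscribers the message was delivered to.
        return len(self.subscribers[channel])

    # Task list: each task is handed to exactly one consumer, in FIFO order.
    def push_task(self, name, task):
        self.task_lists[name].append(task)

    def pop_task(self, name):
        tasks = self.task_lists[name]
        return tasks.popleft() if tasks else None


broker = InMemoryBroker()

# Two hypothetical nodes listen on the same channel.
received = {"node-a": [], "node-b": []}
broker.subscribe("nodes", received["node-a"].append)
broker.subscribe("nodes", received["node-b"].append)
delivered = broker.publish("nodes", "refresh-spiders")

# Tasks are queued once and consumed once.
broker.push_task("tasks", "task-1")
broker.push_task("tasks", "task-2")
first = broker.pop_task("tasks")
second = broker.pop_task("tasks")
third = broker.pop_task("tasks")  # the list is empty by now
```

The point of the split is that broadcast-style node messages and one-consumer task hand-off have different delivery guarantees, so they are modeled separately here.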

lidHanteyk 5 years ago

Cool stuff. Does it really run any language, or only languages that have had integrations written?

  • tikazyq 5 years ago

    Crawlab is based on shell execution, so basically anything that can be run in a shell can be run on Crawlab, i.e. any language.
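
As a rough sketch of what "based on shell execution" can look like, the snippet below runs an arbitrary command string through the system shell and captures its exit code and output. It is only an illustration using Python's standard `subprocess` module, not Crawlab's implementation (which is written in Go); the `run_spider` helper and the `echo` command are hypothetical placeholders for a real spider command such as a script invocation.

```python
import subprocess


def run_spider(command, timeout=60):
    """Run a spider's shell command and return (exit code, stdout, stderr).

    Hypothetical helper: the runner does not care which language the
    spider is written in, only that the command is runnable in a shell.
    """
    completed = subprocess.run(
        command,
        shell=True,           # hand the string to the system shell
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output as text instead of bytes
        timeout=timeout,      # raise TimeoutExpired if it runs too long
    )
    return completed.returncode, completed.stdout, completed.stderr


# Placeholder command; a real spider would be launched the same way.
code, out, err = run_spider("echo crawled 3 pages")
```

Because the command is passed to a shell as-is, a runner like this should only execute commands from trusted users.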