Request to all PureBasic aficionados

    Request to all PureBasic aficionados on the forum: Recently stumbled across the following PB code which uses Regex to identify whether or not a string constitutes a valid URL:
    Code:
    ; Source: https://www.purebasic.fr/english/viewtopic.php?f=12&t=44359
    
    ; Validates URLs
    ; --------------
    ; Must include a scheme such as http:// or ftp://
    ; Support for port numbers and numeric IPs
    ; 
    ; Returns bool (#True or #False)
    ; -----------------------------------------------
    
    Procedure.b ValidURL(url.s)
    
    regex.i
    pattern.s = "^([a-z0-9]+://)(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$"
        
    If CreateRegularExpression(regex, pattern)
    
        If MatchRegularExpression(regex, url)
            FreeRegularExpression(regex)
            ProcedureReturn #True
        EndIf
    
        ; No match: free the regex here, where it is known to have been created
        FreeRegularExpression(regex)
    
    EndIf
        
    ProcedureReturn #False
    
    EndProcedure
    Would any of you PB-literate intelligentsia be prepared to make this into a DLL which can return a list of all valid URLs from within a given string/table? Seems like a good regex which I'd hate to otherwise see go to waste.

    Or perhaps at least, give me some instruction on how to do it, myself? I do know how to make a very basic 'Hello World' DLL with Pure Basic, but don't yet understand how to translate the above code into something that would return a list of URLs from any given string/table.

    Have been blundering around in the PB Help file for hours but my grasp of PB is still so completely retarded at the moment, that it's just turning into an exercise in frustration.



  • #2
    I'll rewrite the code so that you can create a DLL. Firstly I have some comments about the code:

    The comments claim that the procedure returns a bool, however a boolean value in most cases is simply an int where 1 = true and 0 = false. It's not really efficient to return a byte (this procedure now returns a .b, a byte), as the register the result is returned in is the size of the processor's native datatype anyhow. A byte is 8 bits, but the result register is 32 bits wide on 32-bit x86 and 64 bits wide on x64. Funnily enough, even though PureBasic has #True and #False, PureBasic has no boolean type. #True is merely an integer constant for 1, and #False for 0.
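One practical consequence on the Lua side: since the DLL hands back a plain integer, a script should compare the result against 1 rather than test it for truthiness, because Lua treats the number 0 as true. A minimal sketch, where `to_boolean` is a hypothetical helper and the literals 1 and 0 merely stand in for whatever the DLL call would return:

```lua
-- Hypothetical helper: turn the integer result of a DLL call into a Lua boolean.
local function to_boolean(res)
    return res == 1
end

-- In Lua only nil and false are falsy, so the number 0 counts as true.
local zero_is_truthy = false
if 0 then
    zero_is_truthy = true
end

print(zero_is_truthy)   -- shows that 0 passes a bare "if" test
print(to_boolean(0))    -- the explicit comparison gives the intended answer
print(to_boolean(1))
```

This is why checking `res == 1` is the safer habit when the value originates from #True/#False in PureBasic.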

    Aside from that, all you have to do is change the 'Procedure' keyword to 'ProcedureDLL'. That tells the PureBasic compiler to export that symbol. Then, in Compiler -> Compiler Options, you change the Executable Format to 'Shared Dll' and you compile it to a DLL file. Nothing more to it!

    I would also suggest making the pattern a constant. As written, the procedure allocates memory for the string on every call because the pattern is placed in a variable, which is a bit overkill. A constant would handle the string as a literal. I took the liberty of rewriting the procedure.

    I had to put the code on PasteBin, as the forum is blocking the code for some reason;


    Test it well, as I have not been able to test it. AMS is crashing more often each day that passes over here.
    Bas Groothedde
    Imagine Programming :: Blog

    AMS8 Plugins
    IMXLH Compiler



    • #3
      I must add, AMS passes the parameters to the DLL in Ascii format, however later versions of PureBasic only support Unicode. A conversion is probably required before you try matching against the URL. If you want to avoid this, use an older version of PureBasic (like 5.46). I noticed this upon testing.

      *Edit* I added an example with MemoryEx that handles the Ascii to Unicode conversion for you. I would still suggest you use the Lua string pattern functions for this instead though. PureBasic is extremely powerful, however the Lua 5.1 pattern library can achieve the same and it would take one step of data processing less. The conversion to unicode wouldn't be required. I would suggest to use PureBasic if you use the older version, where the conversion is not required.
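To make the Ascii to Unicode step less abstract, here is a rough sketch of what such a conversion amounts to for plain ASCII text, assuming the Unicode build expects UTF-16 little-endian with a two-byte terminator. `ascii_to_utf16le` is a hypothetical helper written for illustration; it is not part of MemoryEx or AMS, and it deliberately rejects anything outside 7-bit ASCII:

```lua
-- Hypothetical helper: widen a 7-bit ASCII string to UTF-16LE bytes.
-- Every input byte becomes two bytes (the character followed by a zero byte),
-- and a two-byte zero terminator is appended at the end.
local function ascii_to_utf16le(s)
    local out = {}
    for i = 1, #s do
        local byte = s:byte(i)
        assert(byte < 128, "non-ASCII input is outside the scope of this sketch")
        out[#out + 1] = string.char(byte, 0)
    end
    out[#out + 1] = string.char(0, 0)
    return table.concat(out)
end

local wide = ascii_to_utf16le("http://a.b")
print(#wide)   -- length in bytes: twice the input length plus 2
```

The byte count is the reason a buffer for the converted text needs twice the string length plus two bytes.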
      Bas Groothedde


      • #4
        Cheers, IP. This is very instructive and insightful. Give me a cuppla-dayzz to fully digest everything you've outlined here; including your code-rewrite on PasteBin (muchas gracias, for that). And I shall post my progress.

        Actually, in the days since starting this thread, I've had my head buried in PureBasic again. And am now actually starting to become a little less PB-retarded. Just building basic PB 'vocabulary-skills' makes things a ton easier. And am now actually starting to have some fun with it.

        Noted your advice about 'perhaps sticking with Lua string pattern functions' for this task. On this point: some months ago, I started a thread here which evolved into a pretty decent solution for using str:match on URLs via this pattern:
        Code:
        https?://(([%w_.~!*:@&+$/?%%#-]-)(%w[-.%w]*%.)(%w%w%w?%w?)(:?)(%d*)(/?)([%w_.~!*:@&+$/?%%#=-]*))
        ... And although not perfect, still seems pretty robust. However Lua's incompatibility with actual Regex solutions started gnawing at me (like a splinter in the mind - as Morpheus would say). Which eventually led me to this article: In Search of the Perfect URL Validation Regex.
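For the original goal of pulling every URL out of a block of text, `string.gmatch` can do the collecting on the Lua side. A minimal sketch under stated assumptions: `find_urls` is a hypothetical helper, and the pattern is a deliberately simplified stand-in (http/https only), not the fuller pattern quoted above:

```lua
-- Hypothetical helper: collect every http/https URL-looking substring in a text.
-- The pattern is a simplified assumption, not a complete URL grammar.
local function find_urls(text)
    local urls = {}
    for url in string.gmatch(text, "https?://[%w_.~!*:@&+$/?%%#=-]+") do
        -- Trim trailing punctuation that usually belongs to the sentence, not the URL.
        url = url:gsub("[.,:!?]+$", "")
        urls[#urls + 1] = url
    end
    return urls
end

local found = find_urls("Docs at http://example.com/a?b=1, mirror https://foo.org/x.")
for i, url in ipairs(found) do
    print(i, url)
end
```

Swapping in a stricter pattern only changes the string passed to `string.gmatch`; the collecting loop stays the same.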

        Which in turn, tempted me to get a hold of RegexBuddy to start analyzing these suckers in detail. To me, logic seemed to suggest that a really robust regex for URL-matching which employed either PB or Dev-C++ to do the heavy lifting, might prove to be a 'superior' solution to Lua pattern-matching. But from what you're saying, I'm now gathering that this might NOT necessarily be the case? Because of the additional data-processing requirement? Is that correct?
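One concrete difference between the two engines: Lua patterns have no alternation operator, so a PCRE fragment such as `(ht|f)tp` has no direct equivalent and has to be spelled out as several patterns tried in turn. A small sketch, where `has_known_scheme` and the `schemes` list are hypothetical names chosen for illustration:

```lua
-- Lua patterns lack "|", so each alternative becomes its own anchored pattern.
local schemes = { "^https?://", "^ftps?://" }

-- Hypothetical helper: true when the URL starts with one of the listed schemes.
local function has_known_scheme(url)
    for _, pattern in ipairs(schemes) do
        if url:find(pattern) then
            return true
        end
    end
    return false
end

print(has_known_scheme("ftp://files.example.com"))
print(has_known_scheme("mailto:someone"))
```

Whether that extra bookkeeping matters depends on how complicated the regex is; for simple scheme checks the Lua version stays short.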

        Regardless, I'd still like to have a crack at this. I have on file, some old DLL-building tutorials from both Riz (in PB) and Reteset (in Dev C++). And with what you've outlined here (together with your code-rewrite), I think I can work out what to do now. Is actually kind of exciting - like a shiny new toy!

        I have the 'older' versions of PB on disc still, so if I manage to succeed in compiling something that actually seems to work, would u check my homework for me, sensei? LOL., in the meantime, I'll be like one of those ass-kissing kids who offer their teacher an apple.
        Here ya go, teach:



        • #5
          Nice to hear! I think it's always very cool how a very old compiler and IDE like PureBasic is still being used by so many people. It even is regularly updated with new features. You get all that for paying for it once, those guys are amazing; it's actually their hobby, not their day job - that makes it even more impressive to me.

          Yes, I remember seeing that thread back then. The funny part about Lua patterns is that they are actually applied to strings on the C side of Lua. The pattern engine was meant to be small and lightweight and is optimised for Lua strings. We should do a speed comparison of the two; I don't actually know which one will be faster in this scenario. I do believe that, considering we have to convert the Ascii text to Unicode, the overhead comes from the memory actions and not the conversion of the encoding; allocating memory from Lua in AMS through any plugin means you have to call one or more functions from Lua. When you use an older version of PB which supports the Ascii compilation mode, you don't need this and I actually think it might be faster than the Lua string pattern library.
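A speed comparison would need both sides timed, but the Lua half is easy to sketch with `os.clock`. The sample text, the simplified pattern and the `count_matches` helper below are all assumptions made for illustration; timing the DLL route would have to happen inside AMS and is not shown:

```lua
-- Hypothetical helper: count how many times a pattern matches in a text.
local function count_matches(text, pattern)
    local n = 0
    for _ in string.gmatch(text, pattern) do
        n = n + 1
    end
    return n
end

-- Build a sample containing 1000 URLs separated by ordinary words.
local sample = string.rep("visit http://example.com/page?id=1 today ", 1000)
local pattern = "https?://[%w_.~!*:@&+$/?%%#=-]+"

local start = os.clock()
local found = count_matches(sample, pattern)
local elapsed = os.clock() - start

print(found, elapsed)   -- match count and CPU seconds spent matching
```

The elapsed figure will vary per machine, so it is only meaningful next to a measurement of the DLL call taken on the same system.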

          I would also like to suggest that you use Visual Studio Express for your DLL compilations if you're using C++. The MSVC compiler has had many overhauls the past two years and it produces great optimised code for Windows systems these days. Dev-C++ uses MinGW, which in my experience still has a tonne of issues at this time. It is still a great compiler though.

          Anyhow, you can always throw your questions my way. Funnily enough, aside from being a software engineer, I am also a part-time teacher in the Netherlands - I don't mind teaching. That has always been the purpose of my freeware, open source stuff and contributions on forums and blogs.

          Thanks for the apple, have a good one! Here's a banana:
          Bas Groothedde


          • #6
            Originally posted by Imagine Programming
            ...Funnily enough, aside from being a software engineer, I am also a part-time teacher in the Netherlands - I don't mind teaching. That has always been the purpose of my freeware, open source stuff and contributions on forums and blogs.

            Thanks for the apple, have a good one! Here's a banana:
            To hear that you're using your powers to do good works, does not surprise me in the least, IP. Good for you, buddy.

            PS.
            Still working thru the PB stuff. Cuppla dayzz might actually be cuppla weekzz (sheepish grin).
            I'm a master procrastinator.



            • #7
              Originally posted by BioHazard
              To hear that you're using your powers to do good works, does not surprise me in the least, IP. Good for you, buddy.

              PS.
              Still working thru the PB stuff. Cuppla dayzz might actually be cuppla weekzz (sheepish grin).
              I'm a master procrastinator.
              Let me know if you need help!
              Bas Groothedde


              • #8
                Well now, IP - I dare say it's probably already been cuppla weekzz by now (by at least cuppla dayzz!)
                So, on with it!

                And LOL, I think someone should do an emoji for 'Con the Fruiterer'.
                Bewdiful!


                Let me preface this post by first saying that (despite your kind and gentle tutelage) while interfacing with PureBasic for this little endeavor, I've been feeling an awful lot like Homer here:

                That being said, I think I might actually have this figured out! Without a doubt have probably missed the obvious in about a hundred different ways, but did actually manage to successfully compile the DLL and get AMS to return the boolean result correctly.

                Initially, went at this by trying to get my head around your URL Check.apz. But that MemoryEx plugin of yours, sends me into a cold sweat. No offence but don't even TRY to teach me how to use that one, IP. Else you'll likely end up having to commit me to a mental-health ward somewhere.

                So, ended up going back to square-one, staring at your DLL code-rewrite on Pastebin until things started making sense. Compiled the DLL using v5.30 of PureBasic and managed to get AMS to return the boolean value. So I 'think' I've done it correctly. Have attached the PB source-files below (check my homework if u like, teach?). Although, it seems to be working A-OK, at this stage, I'm now concluding that the regex itself is actually NOT all that crash-hot. And am currently in search of a better one. (Thus far, am actually finding this regex a little better):
                Code:
                ^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
                ..although it too, fails miserably on complex/unusual URL strings. Still though, should be able to synthesize some kind of an improvement with all the regex-specialty tools which have 'mysteriously' come into my possession, recently.

                Do have a cuppla questions for you, though:
                • Just to clarify, when declaring a PB variable as "Protected" (as illustrated in your code-rewrite), this is just the same as when we assign a "local" tag to a variable in AMS? Correct?
                • When calling a DLL from AMS, I'm still not really clear on how to best determine the appropriate CallConvention. (ie. CDECL vs. STDCALL). At this point, I just use what 'seems to work'.
                And lastly,
                • Does PB demonstrate a 'preference' or 'bias' with regard to the language for which the regex in question was originally developed? (ie. Will PB interpret a regex written for C++ more effectively than say, one written for C#?). Just to clarify, have noted that the PureBasic Regex Library uses PCRE (Perl Compatible Regular Expressions). And I'm aware that http://www.pcre.org/pcre.txt hosts a complete list of supported patterns/arguments. But the info contained therein, is WAY over my head.

                ............................

                PS.
                Hey, have you ever taken a look at the Lua & PureBasic categories over on RosettaCode?
                Is actually a Programming Chrestomathy site with 695 languages! And is pretty razzle-dazzle awesome!
                Bewdiful!


                • #9
                  Hi there Bio,

                  Don't worry, good development takes time! It's fun to see people using PureBasic in the AMS culture again!

                  Don't be afraid of MemoryEx, it's actually not that complicated, especially not the library function set;

                  Code:
                  ValidURL = Library.Load("AutoPlay\\Docs\\URLCheck.dll");
                  This loads the DLL file and keeps it open until you close it. AMS' DLL.CallFunction closes a DLL right after the call, meaning no state can be stored in its memory. This is why the Library functions were added to MemoryEx.

                  And here's the other code with a bit of comment annotation
                  Code:
                  local url = Input.GetText(this);
                  
                  -- Reserve a bit of memory, at least twice the length of the URL (unicode uses 2 bytes per character instead of 1 in ASCII)
                  local buf = MemoryEx.AllocateEx(url:len() * 2 + 2);
                  if(not buf)then
                      error("out of memory");
                  end
                  
                  -- Write the URL string to the buffer, -1 indicates that we want to write the full length of the string and MEMEX_UNICODE means that we want to convert it to unicode during writing
                  buf:String(-1, MEMEX_UNICODE, url);
                  
                  -- Call the ValidURL function from our dll, buf:GetPointer gets the memory address of the buffer that contains our unicode URL
                  local res = ValidURL.ValidURL(buf:GetPointer());
                  if(res == 1)then
                      Label.SetText("Label2", "Valid!");
                  else
                      Label.SetText("Label2", "Invalid!");
                  end
                  
                  Debug.Print(tostring(res).."\r\n");
                  
                  -- release the memory we reserved on line 4 :)
                  buf:Free();
                  Good to hear you managed to get it to work though! The older PB versions are not even that much worse than the modern versions; new releases often fix a few bugs and add new features, but the rest doesn't change that much.

                  As for your questions:

                  1. When you use Protected in PureBasic, the variable is local to the procedure (even if a global with the same name exists) and it lives on the procedure's stack rather than in the program's global data. It's not the same per se as local in Lua, as local only changes a variable's scope. Protected can only be used inside procedures and uses the stack of the procedure to store the variable, which is faster. You could also use 'Define' in a procedure which would not make it global, but it would also not use stack memory.

                  2. The calling convention, which is one part of the application binary interface (ABI), is the agreement on how a call is made between two binary modules or even between two functions in the same module. It's how functions talk: it defines how arguments are transferred to a function, how results are handled and who is responsible for cleaning up after the function call has ended.

                  All PB procedures declared as `Procedure` and `ProcedureDLL` use the STDCALL convention, for example. All procedures declared as `ProcedureC` and `ProcedureCDLL` use the CDECL convention. So for a DLL built with `ProcedureDLL`, STDCALL is the one to pick on the AMS side.

                  3. The PureBasic Regex library uses the PCRE syntax, so you can follow that syntax!
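On question 1, the scoping half of the comparison is easy to see from the Lua side: a `local` declared inside a function hides a global of the same name without touching it. The variable and function names below are made up for illustration, and the sketch only demonstrates Lua's behaviour, not PureBasic's:

```lua
counter = 10   -- a global variable

local function bump()
    local counter = 0   -- shadows the global inside this function only
    counter = counter + 1
    return counter
end

local inside = bump()
print(inside, counter)   -- the local result, then the untouched global
```

Where the value is stored is a separate matter that Lua does not expose, which is the part of the comparison that does not carry over.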

                  Yes, I've seen RosettaCode many many times, and it's a great place for inspiration on very uncommon and common problems. Even though almost all code on the Rosetta website works, I would always encourage you to use it as an inspiration and not copy the code over to your project. It's a community driven website with a thousand code styles, and not all of them take enough care of e.g. speed and memory.

                  I'll look over your homework hopefully this afternoon! Haha
                  Bas Groothedde


                  • #10
                    I've looked at your PB code and it looks fine to me!
                    Bas Groothedde


                    • #11
                      Thanks for having invested some of your time into this one, IP. For me, the challenge here was to actually understand the PB code you rewrote, rather than just compiling it (which of course would've required no more of me than just point-n-click).

                      I found that by looking over Dean's old PB-DLL tutorial from years ago, it was very useful in helping understand how the AMS argument is passed to the DLL and how the DLL then returns its values. Then by applying his work comparatively with your own code, making the logical leaps and connections to understand the process, was much easier. Interesting how the 'actions' and 'teachings' of just one individual, can still 'echo down' over many years to influence the learning of others. (Something, I'm sure you're already finding to be the case with your own teaching work). Incidentally, that really 'made my day' when I heard you were working part-time as a teacher now. LOL, that's just awesome, man. Really.

                      Anyway, had fun with this. It for sure helped build my PB vocab a bit (although still woefully short) which I think is probably as fundamentally important as understanding the procedural logic itself. That is, if I'm to make any leaps forward with my overall PureBasic understanding.

                      PureBasic aside though, my next self-challenge will be in trying to figure out how to get a Lua utility library (for web data mining via curl & sqlite3) up and functioning in AMS. My spider-sense tells me I'm setting myself up for a massive migraine with this one - but LMAO - I just can't seem to help myself. Ahhh, cuppla dayzz!



                      • #12
                        Originally posted by BioHazard
                        Thanks for having invested some of your time into this one, IP. For me, the challenge here was to actually understand the PB code you rewrote, rather than just compiling it (which of course would've required no more of me than just point-n-click).

                        I found that by looking over Dean's old PB-DLL tutorial from years ago, it was very useful in helping understand how the AMS argument is passed to the DLL and how the DLL then returns its values. Then by applying his work comparatively with your own code, making the logical leaps and connections to understand the process, was much easier. Interesting how the 'actions' and 'teachings' of just one individual, can still 'echo down' over many years to influence the learning of others. (Something, I'm sure you're already finding to be the case with your own teaching work). Incidentally, that really 'made my day' when I heard you were working part-time as a teacher now. LOL, that's just awesome, man. Really.
                        Fun fact; Dean taught me the first steps of PureBasic back in the day. He introduced it to me, he taught me how to write a plugin for AMS and from that day, it escalated pretty quickly. When Dean got preoccupied with other matters in life, he transferred the ListIcon project to me and that project and its code taught me a huge amount of information. Many of the things I do today have been influenced by him, as I'm still using PureBasic for many projects.

                        Originally posted by BioHazard
                        Anyway, had fun with this. It for sure helped build my PB vocab a bit (although still woefully short) which I think is probably as fundamentally important as understanding the procedural logic itself. That is, if I'm to make any leaps forward with my overall PureBasic understanding.
                        Wherever I can help in the future, please do not hesitate to ask me questions. You could also ask for help on the PureBasic forum, however you can always pitch me a message!

                        Originally posted by BioHazard
                        PureBasic aside though, my next self-challenge will be in trying to figure out how to get a Lua utility library (for web data mining via curl & sqlite3) up and functioning in AMS. My spider-sense tells me I'm setting myself up for a massive migraine with this one - but LMAO - I just can't seem to help myself. Ahhh, cuppla dayzz!
                        You mean you want to write a Lua library, or write Lua code that utilizes these mentioned libraries?
                        Bas Groothedde


                        • #13
                          Originally posted by Imagine Programming
                          ...You mean you want to write a Lua library, or write Lua code that utilizes these mentioned libraries?
                          That'd definitely be the latter (wouldn't have a clue how to write a Lua library from scratch). Is just a pre-existing library (with dependencies) found on GitHub, here: https://github.com/mkottman/wdm There's even an example (google.lua) demonstrating usage that's already pre-written, so that part's not an issue.

                          The stumbling block I'm encountering is that although the GitHub page does provide links for the dependencies this thing uses (ie. luacurl, luasql, etc..) some of these links are dead. And when searching for the source files elsewhere, I have no idea how to recognize exactly what it is that I'm supposed to be looking for.

                          If I can get all the essential dependency files together, I'm sure I can figure out how to implement this thing easily enough - but at the moment am essentially clueless. I can see that what I'm attempting here is not exactly rocket-science, it's just an unfamiliar process to me. Any suggestions? (Ideally, at this stage, I'd just like to get the example 'google.lua' script shown there, up and running in an APZ, so I can start playing around with it to investigate its other potentialities).
