I'm using an SQLite database to store manually created labels for some data automatically queried from a live system. The data from the live system consists primarily of an address, comprised of 3 parts. Let's use URLs as an example, the three parts being the protocol, the domain and the path.

Initially, I would load a couple of 100k addresses into a table, with each field of the address being a column and all of them together forming the primary key. The labels are then in additional columns. I found myself repeating labels over and over based on certain patterns that the address would match.

Since I would always extend this table with new data and remove old data, this became too much of a hassle, so I tried a different approach, namely just loading the existing addresses into one table and then having other tables for the data, where I would write regex matchers for the address components:

    CREATE TABLE Websites (

Assume that I have already guaranteed (using other queries) that there is exactly one match in each label table for each address in the Websites table. I would now like to write a query that reconstructs a table like the original OldWebsites one by matching labels and automatically queried data:

    Label1 ON (Websites.protocol REGEXP '^' || Label1.protocol_re || '$' AND
               Websites.domain REGEXP '^' || Label1.domain_re || '$' AND
               Websites.path REGEXP '^' || Label1.path_re || '$')
    Label2 ON (Websites.protocol REGEXP '^' || Label2.protocol_re || '$' AND
               Websites.domain REGEXP '^' || Label2.domain_re || '$' AND
               Websites.path REGEXP '^' || Label2.path_re || '$')

This is really slow, especially for more label tables, using the PCRE sqlite3 extension for the REGEXP function. From my understanding, multiple inner joins should take at most the sum of the individual joins, correct?

I would like to know if there's a way to optimize this query using either parallelization (the query should ideally run from Python) or the knowledge that there is exactly 1 match in each Label table. Perhaps indexes are also helpful, but I have only a basic idea of what they are and no idea whether they would be of help here.

In the table where you have the column with the HTML code, add additional columns for data about the HTML. You will have to gather the data for the extra columns while you analyze the HTML with your framework. Do it with your programming language, in the framework that is using SQLite.

I have a table called personal_websessions that contains data in the following format:

    id_no | website_link

You can create this table using the following SQL commands:

    CREATE TABLE personal_websessions(id_no INTEGER PRIMARY KEY, website_link TEXT)
    INSERT INTO personal_websessions VALUES(1, ''), (2, ' '), (3, 'msn.com ')

I would like to perform a find and replace using regex. What I would like to do is: if the value in the website_link column is 'msn.com' or ' etc (so something with msn in the word), find that 'msn' and replace it with the string 'toast', but if it is not the word msn, leave it as it is.

So the example above - and will stay the same. Essentially I will have the following desired output below:

    id_no | website_link

I am currently using SQLite and I know that I will have to use the REPLACE function, as it can find a pattern and then provide a replacement. I know that the regex will be of the form (msn), as a grouping structure to match on, but I do not know how to write a regex match in SQLite. However in this link, they are not using any regex to match the words, just defining them.

I am really just trying to find out how to use a regex pattern to find and replace values in SQLite. I am using an RSQLite connection if that helps.

You can use SQLite's regexp function, but only after having it registered.

    con0 <- DBI::dbConnect(RSQLite::SQLite())
    # Register the REGEXP function on this connection.
    RSQLite::initRegExp(con0)
    DBI::dbExecute(con0, "CREATE TABLE personal_websessions(id_no INTEGER PRIMARY KEY, website_link TEXT)")
    DBI::dbExecute(con0, "INSERT INTO personal_websessions VALUES(1, ''), (2, ' '), (3, 'msn.com ')")
    DBI::dbExecute(con0, "INSERT INTO personal_websessions VALUES(4, '')")
    # Prefix match with LIKE.
    DBI::dbGetQuery(con0, "select * from personal_websessions where website_link like 'msn%'")
    # Whole-word match with REGEXP.
    DBI::dbGetQuery(con0, "select * from personal_websessions where website_link regexp '\\bmsn\\b'")

In order to replace the "msn" with "toast" (within the string, as a substring replacement), though, SQLite does not currently have native support for regex replacement (short of icu_replace.c). If you are confident that you will not find "msn" multiple times in one string (e.g., ""), though, you can find with a regex (as above) and then use the non-regex replace.
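Since the first question mentions running queries from Python, here is a minimal sketch of the same find-and-replace idea using Python's built-in `sqlite3` module. It registers a `regexp` function (SQLite evaluates `X REGEXP Y` as `regexp(Y, X)`) and a second user-defined function for the replacement. The name `regex_replace` is a hypothetical helper, not something SQLite ships, and the sample rows are made-up placeholders because the original values are not available.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite calls regexp(Y, X) for "X REGEXP Y": pattern first, value second.
    if pattern is None or value is None:
        return 0
    return int(re.search(pattern, value) is not None)


def regex_replace(pattern, replacement, value):
    # Hypothetical helper: regex substitution done in Python via re.sub.
    if value is None:
        return None
    return re.sub(pattern, replacement, value)


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.create_function("regex_replace", 3, regex_replace)

conn.execute(
    "CREATE TABLE personal_websessions("
    "id_no INTEGER PRIMARY KEY, website_link TEXT)"
)
# Placeholder rows (assumed for illustration only).
conn.executemany(
    "INSERT INTO personal_websessions VALUES (?, ?)",
    [
        (1, "msn.com"),
        (2, "www.msn.com/news"),
        (3, "example.org"),
        (4, "msnbc.example"),
    ],
)

word = r"\bmsn\b"  # "msn" as a whole word only
conn.execute(
    "UPDATE personal_websessions "
    "SET website_link = regex_replace(?, 'toast', website_link) "
    "WHERE website_link REGEXP ?",
    (word, word),
)

rows = conn.execute(
    "SELECT id_no, website_link FROM personal_websessions ORDER BY id_no"
).fetchall()
for row in rows:
    print(row)
```

Because the substitution happens in Python, it replaces every whole-word occurrence in a string, so it does not rely on "msn" appearing only once per value.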
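For the label-table question, the setup can be reproduced in a small script to make the join easier to time and experiment with. This is only a sketch under several assumptions: the `Websites` schema is cut off in the post, so the column types here are guessed from the query; the `label` column, the `SELECT`/`FROM`/`INNER JOIN` wording and the sample rows are hypothetical; and `REGEXP` is backed by Python's `re` module with a cache of compiled patterns rather than the PCRE extension.

```python
import re
import sqlite3
from functools import lru_cache


@lru_cache(maxsize=None)
def compile_pattern(pattern):
    # Compile each distinct pattern string once and reuse it.
    return re.compile(pattern)


def regexp(pattern, value):
    # SQLite calls regexp(Y, X) for "X REGEXP Y".
    if pattern is None or value is None:
        return 0
    return int(compile_pattern(pattern).search(value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

# Assumed schema and sample data, for illustration only.
conn.executescript("""
CREATE TABLE Websites (protocol TEXT, domain TEXT, path TEXT,
                       PRIMARY KEY (protocol, domain, path));
CREATE TABLE Label1 (protocol_re TEXT, domain_re TEXT, path_re TEXT, label TEXT);
CREATE TABLE Label2 (protocol_re TEXT, domain_re TEXT, path_re TEXT, label TEXT);
INSERT INTO Websites VALUES ('https', 'example.org', '/docs/intro'),
                            ('http', 'example.com', '/shop/cart');
INSERT INTO Label1 VALUES ('https?', 'example\\.org', '/docs/.*', 'documentation'),
                          ('https?', 'example\\.com', '/shop/.*', 'shop');
INSERT INTO Label2 VALUES ('https', '.*', '.*', 'secure'),
                          ('http', '.*', '.*', 'insecure');
""")

query = """
SELECT w.protocol, w.domain, w.path, l1.label, l2.label
FROM Websites AS w
INNER JOIN Label1 AS l1 ON (w.protocol REGEXP '^' || l1.protocol_re || '$' AND
                            w.domain REGEXP '^' || l1.domain_re || '$' AND
                            w.path REGEXP '^' || l1.path_re || '$')
INNER JOIN Label2 AS l2 ON (w.protocol REGEXP '^' || l2.protocol_re || '$' AND
                            w.domain REGEXP '^' || l2.domain_re || '$' AND
                            w.path REGEXP '^' || l2.path_re || '$')
ORDER BY w.domain
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Whether the pattern cache makes a noticeable difference on a few hundred thousand addresses would have to be measured on the real data; the sketch only gives a self-contained starting point for that measurement.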